[python] Checking if a string can be converted to float in Python

I have some Python code that runs through a list of strings and converts them to integers or floating point numbers where possible. Doing this for integers is pretty easy:

if element.isdigit():
  newelement = int(element)

Floating point numbers are more difficult. Right now I'm using partition('.') to split the string and checking that one or both sides are digits.

partition = element.partition('.')
if ((partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit())
    or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit())
    or (partition[0].isdigit() and partition[1] == '.' and partition[2] == '')):
  newelement = float(element)

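This works, but obviously the if statement is a bit of a bear. The other solution I considered is to wrap the conversion in a try/catch block and see if it succeeds, as described in this question.

Anyone have any other ideas? Opinions on the relative merits of the partition and try/catch approaches?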

Answer

I would just use..

try:
    float(element)
except ValueError:
    print("Not a float")

.. it's simple, and it works.

Another option would be a regular expression:

import re
if re.match(r'^-?\d+(?:\.\d+)?$', element) is None:
    print("Not float")
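
Note that this simple pattern is stricter than float() itself. A quick sketch comparing the two (the helper names and sample strings are my own, chosen for illustration):

```python
import re

# The pattern from the answer above: optional minus, digits, optional fraction.
SIMPLE_FLOAT = re.compile(r'^-?\d+(?:\.\d+)?$')

def matches_simple(s):
    """True if s matches the simple pattern."""
    return SIMPLE_FLOAT.match(s) is not None

def parses_as_float(s):
    """True if float() accepts s."""
    try:
        float(s)
        return True
    except ValueError:
        return False

# Illustrative inputs, not from the original answer.
for s in ["12.5", "-3", ".5", "1e5", "+2", "abc"]:
    print(repr(s), matches_simple(s), parses_as_float(s))
```

Strings such as ".5", "1e5" and "+2" are rejected by the pattern but accepted by float(), so the two checks are not interchangeable.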

Answer

Python method to check for float:

def isfloat(value):
  try:
    float(value)
    return True
  except ValueError:
    return False

Don't get bitten by the goblins hiding in the float boat! Do unit testing!

What is and is not a float may surprise you:

Command to parse                        Is it a float?  Comment
--------------------------------------  --------------- ------------
print(isfloat(""))                      False
print(isfloat("1234567"))               True
print(isfloat("NaN"))                   True            nan is also float
print(isfloat("NaNananana BATMAN"))     False
print(isfloat("123.456"))               True
print(isfloat("123.E4"))                True
print(isfloat(".1"))                    True
print(isfloat("1,234"))                 False
print(isfloat("NULL"))                  False           case insensitive
print(isfloat(",1"))                    False
print(isfloat("123.EE4"))               False
print(isfloat("6.523537535629999e-07")) True
print(isfloat("6e777777"))              True            This is same as Inf
print(isfloat("-iNF"))                  True
print(isfloat("1.797693e+308"))         True
print(isfloat("infinity"))              True
print(isfloat("infinity and BEYOND"))   False
print(isfloat("12.34.56"))              False           Two dots not allowed.
print(isfloat("#56"))                   False
print(isfloat("56%"))                   False
print(isfloat("0E0"))                   True
print(isfloat("x86E0"))                 False
print(isfloat("86-5"))                  False
print(isfloat("True"))                  False           Boolean is not a float.
print(isfloat(True))                    True            Boolean is a float
print(isfloat("+1e1^5"))                False
print(isfloat("+1e1"))                  True
print(isfloat("+1e1.3"))                False
print(isfloat("+1.3P1"))                False
print(isfloat("-+1"))                   False
print(isfloat("(1)"))                   False           brackets not interpreted
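
One case the table doesn't cover: float(None) raises TypeError rather than ValueError, so the isfloat above would let that exception propagate. A minimal sketch of a variant that also catches TypeError (the name isfloat_safe is mine, not part of the answer):

```python
def isfloat_safe(value):
    """Return True if float(value) succeeds, False otherwise."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        # ValueError: unparsable string; TypeError: unsupported type such as None.
        return False

# A few rows from the table above, plus None, as plain assertions.
assert isfloat_safe("123.456")
assert isfloat_safe("NaN")
assert not isfloat_safe("1,234")
assert not isfloat_safe("12.34.56")
assert not isfloat_safe(None)
```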

Answer

'1.43'.replace('.','',1).isdigit()

This returns True only if there is one or no '.' in the string of digits.

'1.4.3'.replace('.','',1).isdigit()

will return False

'1.ww'.replace('.','',1).isdigit()

will return False
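
Keep in mind that this check only covers plain unsigned decimals. A short sketch showing strings that float() accepts but this test rejects (the helper name and the sample inputs are my own):

```python
def looks_like_float(s):
    """The replace/isdigit check from above, wrapped in a function."""
    return s.replace('.', '', 1).isdigit()

# float() parses every one of these, but the check returns False for them.
for s in ["-1.5", "+2", "1e3", " 4.2 "]:
    print(repr(s), looks_like_float(s), float(s))
```

Signs, exponents and surrounding whitespace all make isdigit() fail, so this is only a fit when the input is known to be simple.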

Answer

TL;DR:

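If your input is mostly strings that can be converted to floats, the try: except: method is the best native Python method.
If your input is mostly strings that cannot be converted to floats, regular expressions or the partition method will be better.
If you are 1) unsure of your input or need more speed and 2) don't mind installing a third-party C extension, fastnumbers works very well.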

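There is another method available via a third-party module called fastnumbers (disclosure: I am the author); it provides a function called isfloat. I have taken the unittest example outlined by Jacob Gabrielson in this answer, but added the fastnumbers.isfloat method. I should also note that Jacob's example did not do justice to the regex option, because much of the time in that example was spent on global lookups caused by the dot operator. The regex function here binds the compiled pattern's match method up front, which gives a fairer comparison with try: except:.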

def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$").match
def is_float_re(str):
    return True if _float_regexp(str) else False

def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True
    else:
        return False

from fastnumbers import isfloat


if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('ttest.is_float_re("12.2x")', "import ttest").timeit()
            print 're happy:', timeit.Timer('ttest.is_float_re("12.2")', "import ttest").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('ttest.is_float_try("12.2x")', "import ttest").timeit()
            print 'try happy:', timeit.Timer('ttest.is_float_try("12.2")', "import ttest").timeit()

        def test_fn_perf(self):
            print
            print 'fn sad:', timeit.Timer('ttest.isfloat("12.2x")', "import ttest").timeit()
            print 'fn happy:', timeit.Timer('ttest.isfloat("12.2")', "import ttest").timeit()


        def test_part_perf(self):
            print
            print 'part sad:', timeit.Timer('ttest.is_float_partition("12.2x")', "import ttest").timeit()
            print 'part happy:', timeit.Timer('ttest.is_float_partition("12.2")', "import ttest").timeit()

    unittest.main()

On my machine, the output is:

fn sad: 0.220988988876
fn happy: 0.212214946747
.
part sad: 1.2219619751
part happy: 0.754667043686
.
re sad: 1.50515985489
re happy: 1.01107215881
.
try sad: 2.40243887901
try happy: 0.425730228424
.
----------------------------------------------------------------------
Ran 4 tests in 7.761s

OK

As you can see, regex is actually not as bad as it originally seemed, and if you have a real need for speed, the fastnumbers method is quite good.
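
The benchmark above is Python 2 code and depends on fastnumbers. For anyone wanting to rerun the try/regex part of the comparison on Python 3 with only the standard library, here is a rough sketch. The function names mirror the ones above, the bench helper and its repeat count are arbitrary choices of mine, and the numbers will differ from machine to machine.

```python
import re
import timeit

# Same pattern as above, with the match method bound once at import time.
_float_match = re.compile(
    r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$"
).match

def is_float_try(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def is_float_re(s):
    return _float_match(s) is not None

def bench(func, arg, number=100_000):
    """Time `number` calls of func(arg); returns elapsed seconds."""
    return timeit.timeit(lambda: func(arg), number=number)

if __name__ == "__main__":
    for name, func in [("try", is_float_try), ("re", is_float_re)]:
        print(name, "sad:", bench(func, "12.2x"))
        print(name, "happy:", bench(func, "12.2"))
```

The lambda adds a little call overhead to every measurement, so treat the results as relative rather than absolute.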

Answer

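If you care about performance (and I'm not suggesting you should), the try-based approach is the clear winner over the partition-based and regexp approaches, as long as you don't expect a lot of invalid strings. With many invalid strings it is potentially slower, presumably because of the cost of exception handling.

Again, I'm not suggesting you should care about performance; I'm just giving you the data in case you're doing this 10 billion times a second or something. Also, the partition-based code fails to handle at least one valid string.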

$ ./floatstr.py
F..
partition sad: 3.1102449894
partition happy: 2.09208488464
..
re sad: 7.76906108856
re happy: 7.09421992302
..
try sad: 12.1525540352
try happy: 1.44165301323
.
======================================================================
FAIL: test_partition (__main__.ConvertTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./floatstr.py", line 48, in test_partition
    self.failUnless(is_float_partition("20e2"))
AssertionError

----------------------------------------------------------------------
Ran 8 tests in 33.670s

FAILED (failures=1)

Here's the code (Python 2.6, regexp taken from John Gietzen's answer):

def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$")
def is_float_re(str):
    return re.match(_float_regexp, str)


def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True

if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):
        def test_re(self):
            self.failUnless(is_float_re("20e2"))

        def test_try(self):
            self.failUnless(is_float_try("20e2"))

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('floatstr.is_float_re("12.2x")', "import floatstr").timeit()
            print 're happy:', timeit.Timer('floatstr.is_float_re("12.2")', "import floatstr").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('floatstr.is_float_try("12.2x")', "import floatstr").timeit()
            print 'try happy:', timeit.Timer('floatstr.is_float_try("12.2")', "import floatstr").timeit()

        def test_partition_perf(self):
            print
            print 'partition sad:', timeit.Timer('floatstr.is_float_partition("12.2x")', "import floatstr").timeit()
            print 'partition happy:', timeit.Timer('floatstr.is_float_partition("12.2")', "import floatstr").timeit()

        def test_partition(self):
            self.failUnless(is_float_partition("20e2"))

        def test_partition2(self):
            self.failUnless(is_float_partition(".2"))

        def test_partition3(self):
            self.failIf(is_float_partition("1234x.2"))

    unittest.main()
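
If you want to run the correctness tests on a current interpreter: print is a function in Python 3, and failUnless / failIf are old aliases of assertTrue / assertFalse that, as far as I know, are no longer available in recent Python 3 releases. Here is a sketch of just those tests in Python 3 form. The timing tests are omitted, the partition function is restructured but meant to behave the same, and the "20e2" test is rewritten by me to document the known miss rather than fail.

```python
import unittest

def is_float_try(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def is_float_partition(element):
    head, dot, tail = element.partition('.')
    # Same three cases as above: "1.2", ".2" and "1."
    return dot == '.' and (
        (head.isdigit() and tail.isdigit())
        or (head == '' and tail.isdigit())
        or (head.isdigit() and tail == '')
    )

class ConvertTests(unittest.TestCase):
    def test_try(self):
        self.assertTrue(is_float_try("20e2"))

    def test_partition_misses_exponent(self):
        # Known limitation: "20e2" has no '.', so the partition check rejects it.
        self.assertFalse(is_float_partition("20e2"))

    def test_partition2(self):
        self.assertTrue(is_float_partition(".2"))

    def test_partition3(self):
        self.assertFalse(is_float_partition("1234x.2"))

# Run the suite programmatically instead of via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConvertTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```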

Answer

Just for variety, here is another way to do it.

>>> all([i.isnumeric() for i in '1.2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2.f'.split('.',1)])
False

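Edit: I'm sure this will not hold up for every float case, especially when there is an exponent. To address that, it looks like the following. This returns True only when val is a float and False for an int, but it is probably less performant than a regex.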

>>> def isfloat(val):
...     return all(i.isnumeric() or i in '.e' for i in val) and len(val.split('.')) == 2
...
>>> isfloat('1')
False
>>> isfloat('1.2')
True
>>> isfloat('1.2e3')
True
>>> isfloat('12e3')
False

Answer

This regex will check for scientific floating point numbers:

^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$

However, I believe your best bet is to use the parser in a try.
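
To tie this back to the original question (convert each string to an int or a float where possible), here is a minimal sketch of the try-based approach. The function name to_number and the choice to leave unparsable strings unchanged are my assumptions, not something the question specifies.

```python
def to_number(element):
    """Return int(element) or float(element) if either parses, else element."""
    for convert in (int, float):
        try:
            return convert(element)
        except ValueError:
            continue
    return element

values = ["42", "-7", "3.14", "1e3", "abc", ""]
print([to_number(v) for v in values])
# [42, -7, 3.14, 1000.0, 'abc', '']
```

Trying int first keeps "42" as an integer instead of turning it into 42.0.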