pycommons.strings package
Common string handling routines.

pycommons.strings.chars: constants for common characters.

    Constants
        A regular expression matching all characters that are non-line-breaking white space.
        A regular expression matching any white space or newline character.

    subscript()
        Transform a string into Unicode-based subscript. All characters that can be represented as subscript in Unicode are translated to subscript. Only a subset of the Latin characters can be converted to Unicode subscript. If any character cannot be translated, a KeyError is raised. White space is preserved.
        Parameters: s (str) – the string
        Returns: the string in subscript

    superscript()
        Transform a string into Unicode-based superscript. All characters that can be represented as superscript in Unicode are translated to superscript. Only a subset of the Latin characters can be converted to Unicode superscript. If any character cannot be translated, a KeyError is raised. White space is preserved.
        Parameters: s (str) – the string
        Returns: the string in superscript

pycommons.strings.enforce: routines for checking whether a value is a non-empty string without spaces.

    enforce_non_empty_str()
        Enforce that a text is a non-empty string.
        Parameters: value (Any) – the value to be checked
        Returns: the value, but as type str
        Raises:
            TypeError – if value is not a str
            ValueError – if value is empty

    enforce_non_empty_str_without_ws()
        Enforce that a text is a non-empty string without white space.
        Parameters: value (Any) – the value to be checked
        Returns: the value, but as type str
        Raises:
            TypeError – if value is not a str
            ValueError – if value is empty or contains any white space characters

pycommons.strings.string_conv: converting values to and from strings.

    bool_to_str()
        Convert a Boolean value to a string. This function is the inverse of str_to_bool().
        Parameters: value (bool) – the Boolean value
        Returns: "T" if value == True, "F" if value == False
        Raises: TypeError – if value is not a bool

    datetime_to_date_str()
        Convert a datetime object to a date string.
        Parameters: date (datetime) – the date
        Returns: the date string
        Raises: TypeError – if date is not an instance of datetime.datetime

    datetime_to_datetime_str()
        Convert a datetime object to a date-time string.
        Parameters: dateandtime (datetime) – the date and time
        Returns: the date-time string
        Raises: TypeError – if dateandtime is not an instance of datetime.datetime

    float_to_str()
        Convert a float to a string.
        Parameters: value (float) – the floating point value
        Returns: the string representation
        Raises:
            TypeError – if value is not a float
            ValueError – if value is not a number

    int_or_none_to_str()
        Convert an integer or None to a string. If value is None, "" is returned. If value is an instance of bool, a TypeError is raised. If value is an int, str(value) is returned. Otherwise, a TypeError is raised.
        Returns: the string representation, "" for None
        Raises: TypeError – if value is a bool (notice that bool is a subclass of int) or any other non-int type

    num_or_none_to_str()
        Convert a numerical type (int, float) or None to a string. If value is None, "" is returned. Otherwise, the result of num_to_str() is returned.
        Returns: the string representation, "" for None
        Raises:
            TypeError – if value is not None but instead is a bool (notice that bool is a subclass of int) or any other type that is neither int nor float
            ValueError – if value is not-a-number

    num_to_str()
        Transform a numerical value which is either int or float to a string. If value is an instance of int, the result of its conversion via str is returned. If value is an instance of bool, a TypeError is raised. Otherwise, the result of float_to_str() is returned.
        Returns: the string
        Raises:
            TypeError – if value is a bool (notice that bool is a subclass of int) or any other type that is neither int nor float
            ValueError – if value is not-a-number

    str_to_bool()
        Convert a string to a Boolean value. This function is the inverse of bool_to_str().
        Parameters: value (str) – the string value
        Returns: True if value == "T", False if value == "F"
        Raises:
            TypeError – if value is not a string
            ValueError – if value is neither "T" nor "F"

    str_to_int_or_none()
        Convert a string to an int or None. If value is None, None is returned. If value is empty or entirely composed of white space, None is returned. If value can be converted to an integer, an int with the corresponding value is returned. Otherwise, a ValueError is raised.
        Returns: the int or None
        Raises:
            TypeError – if value is neither a str nor None
            ValueError – if value is a str but cannot be base-10 converted to an integer

    str_to_num()
        Convert a string to an int or float. If value is not an instance of str, a TypeError is raised. If value can be converted to an integer, an int with the corresponding value is returned. If value can be converted to a float, a float with the appropriate value is returned. Otherwise, a ValueError is raised.
        Parameters: value (str) – the string value
        Returns: the int or float; integers are preferred wherever possible
        Raises:
            TypeError – if value is not a str
            ValueError – if value is a str but can be converted neither to an integer (base-10) nor to a float, or converts to a float which is not a number

    str_to_num_or_none()
        Convert a string to an int or float or None. If value is None, None is returned. If value is empty or entirely composed of white space, None is returned. If value can be converted to an integer, an int with the corresponding value is returned. If value can be converted to a float, a float with the appropriate value is returned. Otherwise, a ValueError is raised.
        Returns: the int or float or None
        Raises:
            TypeError – if value is neither a str nor None
            ValueError – if value is a str but can be converted neither to an integer (base-10) nor to a float, or converts to a float which is not a number

pycommons.strings.tools: routines for handling strings.

    get_prefix_str()
        Compute the common prefix of an iterable of strings.
        Parameters: strings (Union[str, Iterable[str]]) – the iterable of strings
        Returns: the common prefix
        Raises: TypeError – if the input is not a string or an iterable of strings, or contains any non-string element (before the prefix is determined)
        Notice: if the prefix is determined to be the empty string, the search stops. Non-str items that follow later in strings may then not raise a TypeError.

    replace_regex()
        Replace all occurrences of search in inside with replace. The replacement is repeated until no occurrence of search is found anymore. Since this may lead to an endless loop, a ValueError is raised if there are too many rounds of replacement.
        Parameters:
            search (Union[str, Pattern]) – the regular expression to search, either a string or a pattern
            replace (Union[str, Callable[[Match], str]]) – the string to replace it with, or a function receiving a re.Match instance and returning a replacement string
            inside (str) – the string in which to search/replace
        Returns: the new string after the recursive replacement
        Raises:
            TypeError – if any of the parameters is not of the right type
            ValueError – if there are 100000 recursive replacements or more, indicating that there could be an endless loop

    replace_str()
        Perform a recursive replacement of strings. All occurrences of find in src are replaced by replace. If that produces new instances of find, these are replaced as well, but only as long as the new string becomes shorter. See replace_regex() for regular-expression based replacements.
        Returns: the string src, with all occurrences of find replaced by replace
        Raises: TypeError – if any of the parameters are not strings

    split_str()
        Split a string by the given other string. The goal is to provide a less memory-intensive variant of the method str.split(): instead of creating a list with many small strings, the strings are created one at a time.
        Returns: each split element

Submodules
pycommons.strings.chars module
subscript(): a KeyError is raised if a character cannot be translated. White space is preserved.
s (str) – the string
>>> subscript("a0= 4(e)")
'ₐ₀₌ ₄₍ₑ₎'
>>> try:
...     subscript("a0=4(e)Y")
... except KeyError as ke:
...     print(ke)
'Y'
>>> try:
...     subscript(None)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'NoneType'
>>> try:
...     superscript(1)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'int'
superscript(): a KeyError is raised if a character cannot be translated. White space is preserved.
s (str) – the string
>>> superscript("a0 =4(e)")
'ᵃ⁰ ⁼⁴⁽ᵉ⁾'
>>> try:
...     superscript("a0=4(e)Y")
... except KeyError as ke:
...     print(ke)
'Y'
>>> try:
...     superscript(None)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'NoneType'
>>> try:
...     superscript(1)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'int'
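The translation described above can be pictured as a per-character table lookup. The following is only a minimal sketch under that assumption: the name `subscript_sketch` and the small hand-written table are hypothetical, and the library's real table is not shown in this document and presumably covers more characters.

```python
# Partial, hand-written table for illustration only (an assumption, not the
# library's table): digits plus a few symbols and the letters "a" and "e".
_SUBSCRIPT_SKETCH = {str(d): chr(0x2080 + d) for d in range(10)}
_SUBSCRIPT_SKETCH.update({
    "+": "\u208a", "-": "\u208b", "=": "\u208c",
    "(": "\u208d", ")": "\u208e", "a": "\u2090", "e": "\u2091",
})


def subscript_sketch(s: str) -> str:
    """Translate each character of `s` to subscript, keeping white space."""
    if not isinstance(s, str):
        raise TypeError(f"expected str, got {type(s).__name__}")
    # A plain dict lookup raises KeyError for characters that have no
    # subscript form in the table, mirroring the documented behavior.
    return "".join(c if c.isspace() else _SUBSCRIPT_SKETCH[c] for c in s)


print(subscript_sketch("a0= 4(e)"))
```

A superscript variant would differ only in the table it uses.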
pycommons.strings.enforce module
enforce_non_empty_str(): value (Any) – the value to be checked whether it is a non-empty string
>>> enforce_non_empty_str("1")
'1'
>>> enforce_non_empty_str(" 1 1 ")
' 1 1 '
>>> try:
...     enforce_non_empty_str("")
... except ValueError as ve:
...     print(ve)
Non-empty str expected, but got empty string.
>>> try:
...     enforce_non_empty_str(1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     enforce_non_empty_str(None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
enforce_non_empty_str_without_ws(): value (Any) – the value to be checked whether it is a non-empty string without any white space
>>> enforce_non_empty_str_without_ws("1")
'1'
>>> try:
...     enforce_non_empty_str_without_ws(" 1 1 ")
... except ValueError as ve:
...     print(ve)
No white space allowed in string, but got ' 1 1 '.
>>> try:
...     enforce_non_empty_str_without_ws("a\tb")
... except ValueError as ve:
...     print(ve)
No white space allowed in string, but got 'a\tb'.
>>> try:
...     enforce_non_empty_str_without_ws("012345678901234567890 12345678")
... except ValueError as ve:
...     print(ve)
No white space allowed in string, but got '012345678901234567890 12345678'.
>>> try:
...     enforce_non_empty_str_without_ws(
...         "012345678901234567890 1234567801234567890123456789012345678")
... except ValueError as ve:
...     print(str(ve)[10:])
pace allowed in string, but got '012345678901234567890 12345678...'.
>>> try:
...     enforce_non_empty_str_without_ws("")
... except ValueError as ve:
...     print(ve)
Non-empty str expected, but got empty string.
>>> try:
...     enforce_non_empty_str_without_ws(1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     enforce_non_empty_str_without_ws(None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
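The TypeError messages in the examples above are what Python produces when the unbound method `str.__len__` is called on a non-string. The sketch below uses that call to show how such a check could be written; whether the library does exactly this is an assumption, the name `enforce_no_ws_sketch` is hypothetical, and the truncation of long values in the error message is not reproduced.

```python
def enforce_no_ws_sketch(value):
    """Return `value` if it is a non-empty str without white space."""
    # Calling the unbound method rejects every non-str argument with a
    # TypeError, so no separate isinstance check is needed.
    if str.__len__(value) == 0:
        raise ValueError("Non-empty str expected, but got empty string.")
    if any(ch.isspace() for ch in value):
        raise ValueError(
            f"No white space allowed in string, but got {value!r}.")
    return value


print(enforce_no_ws_sketch("1"))
```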
pycommons.strings.string_conv module
bool_to_str(): the inverse of str_to_bool().
value (bool) – the Boolean value
>>> print(bool_to_str(True))
T
>>> print(bool_to_str(False))
F
>>> try:
...     bool_to_str("t")
... except TypeError as te:
...     print(te)
value should be an instance of bool but is str, namely 't'.
>>> try:
...     bool_to_str(None)
... except TypeError as te:
...     print(te)
value should be an instance of bool but is None.
datetime_to_date_str(): date (datetime) – the date. A TypeError is raised if date is not an instance of datetime.datetime.
>>> from datetime import datetime
>>> datetime_to_date_str(datetime(1999, 12, 21))
'1999‑12‑21'
>>> try:
...     datetime_to_date_str(None)
... except TypeError as te:
...     print(te)
date should be an instance of datetime.datetime but is None.
>>> try:
...     datetime_to_date_str(1)
... except TypeError as te:
...     print(te)
date should be an instance of datetime.datetime but is int, namely 1.
datetime_to_datetime_str(): dateandtime (datetime) – the date and time. A TypeError is raised if dateandtime is not an instance of datetime.datetime.
>>> from datetime import datetime
>>> datetime_to_datetime_str(datetime(1999, 12, 21, 13, 42, 23))
'1999\u201112\u201121\xa013:42'
>>> from datetime import timezone
>>> datetime_to_datetime_str(datetime(1999, 12, 21, 13, 42,
...     tzinfo=timezone.utc))
'1999\u201112\u201121\xa013:42\xa0UTC'
>>> try:
...     datetime_to_datetime_str(None)
... except TypeError as te:
...     print(te)
dateandtime should be an instance of datetime.datetime but is None.
>>> try:
...     datetime_to_datetime_str(1)
... except TypeError as te:
...     print(str(te)[:60])
dateandtime should be an instance of datetime.datetime but i
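The outputs above contain U+2011 (non-breaking hyphen) between the date fields and U+00A0 (non-breaking space) before the time and the time zone, and they omit the seconds. A minimal sketch that reproduces these two outputs could look as follows; the function name, the constant names and the zero-padding widths are assumptions made for illustration.

```python
from datetime import datetime, timezone

NBDASH = "\u2011"  # non-breaking hyphen (hypothetical constant name)
NBSP = "\xa0"      # non-breaking space (hypothetical constant name)


def datetime_str_sketch(dateandtime: datetime) -> str:
    """Format as year, month, day, hour:minute, plus the zone name if any."""
    if not isinstance(dateandtime, datetime):
        raise TypeError("dateandtime should be an instance of "
                        "datetime.datetime")
    text = (f"{dateandtime.year:04d}{NBDASH}{dateandtime.month:02d}"
            f"{NBDASH}{dateandtime.day:02d}"
            f"{NBSP}{dateandtime.hour:02d}:{dateandtime.minute:02d}")
    zone = dateandtime.tzname()  # None for naive datetime objects
    return f"{text}{NBSP}{zone}" if zone else text


print(repr(datetime_str_sketch(
    datetime(1999, 12, 21, 13, 42, tzinfo=timezone.utc))))
```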
float_to_str(): value (float) – the floating point value
>>> float_to_str(1.3)
'1.3'
>>> float_to_str(1.0)
'1'
>>> float_to_str(1e-5)
'1e-5'
>>> try:
...     float_to_str(1)
... except TypeError as te:
...     print(te)
value should be an instance of float but is int, namely 1.
>>> try:
...     float_to_str(None)
... except TypeError as te:
...     print(te)
value should be an instance of float but is None.
>>> from math import nan
>>> try:
...     float_to_str(nan)
... except ValueError as ve:
...     print(ve)
nan => 'nan' is not a permitted float.
>>> from math import inf
>>> float_to_str(inf)
'inf'
>>> float_to_str(-inf)
'-inf'
>>> float_to_str(1e300)
'1e300'
>>> float_to_str(-1e300)
'-1e300'
>>> float_to_str(-1e-300)
'-1e-300'
>>> float_to_str(1e-300)
'1e-300'
>>> float_to_str(1e1)
'10'
>>> float_to_str(1e5)
'100000'
>>> float_to_str(1e10)
'10000000000'
>>> float_to_str(1e20)
'1e20'
>>> float_to_str(1e030)
'1e30'
>>> float_to_str(0.0)
'0'
>>> float_to_str(-0.0)
'0'
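The examples above show a compact notation: no trailing `.0`, no `+` sign or leading zeros in the exponent, and `0` for both zeros. One way to obtain such strings is to post-process `repr`, as in the sketch below. This is an assumption about the approach, not the library's code, and the name `float_to_str_sketch` is hypothetical; for magnitudes not covered by the examples the real function may choose a different notation.

```python
from math import isnan


def float_to_str_sketch(value: float) -> str:
    """Render a float compactly by cleaning up its repr."""
    if not isinstance(value, float):
        raise TypeError("value should be an instance of float")
    if isnan(value):
        raise ValueError("nan is not a permitted float.")
    if value == 0.0:
        return "0"  # also covers -0.0
    # repr yields forms such as '1e-05', '1e+20', '10000000000.0', 'inf'.
    mantissa, sep, exponent = repr(value).partition("e")
    if mantissa.endswith(".0"):
        mantissa = mantissa[:-2]  # drop the redundant fraction
    if sep:
        # int() removes the '+' sign and leading zeros of the exponent.
        return f"{mantissa}e{int(exponent)}"
    return mantissa


for sample in (1.3, 1.0, 1e-5, 1e10, 1e20, -0.0):
    print(float_to_str_sketch(sample))
```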
>>> print(repr(int_or_none_to_str(None)))
''
>>> print(int_or_none_to_str(12))
12
>>> try:
...     int_or_none_to_str(True)
... except TypeError as te:
...     print(te)
value should be an instance of int but is bool, namely True.
>>> try:
...     int_or_none_to_str(False)
... except TypeError as te:
...     print(te)
value should be an instance of int but is bool, namely False.
>>> print(int_or_none_to_str(-10))
-10
>>> try:
...     int_or_none_to_str(1.0)
... except TypeError as te:
...     print(te)
value should be an instance of int but is float, namely 1.0.
num_or_none_to_str(): for any value other than None, the result of num_to_str() is returned.
>>> print(repr(num_or_none_to_str(None)))
''
>>> print(num_or_none_to_str(12))
12
>>> print(num_or_none_to_str(12.3))
12.3
>>> try:
...     num_or_none_to_str(True)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely True.
>>> try:
...     num_or_none_to_str(False)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely False.
>>> from math import nan
>>> try:
...     num_to_str(nan)
... except ValueError as ve:
...     print(ve)
nan => 'nan' is not a permitted float.
num_to_str(): for a float, the result of float_to_str() is returned. This means that nan yields a ValueError, while a bool or any value that is neither an int nor a float incurs a TypeError.
>>> num_to_str(1)
'1'
>>> num_to_str(1.5)
'1.5'
>>> try:
...     num_to_str(True)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely True.
>>> try:
...     num_to_str(False)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely False.
>>> try:
...     num_to_str("x")
... except TypeError as te:
...     print(te)
value should be an instance of float but is str, namely 'x'.
>>> try:
...     num_to_str(None)
... except TypeError as te:
...     print(te)
value should be an instance of float but is None.
>>> from math import inf, nan
>>> try:
...     num_to_str(nan)
... except ValueError as ve:
...     print(ve)
nan => 'nan' is not a permitted float.
>>> num_to_str(inf)
'inf'
>>> num_to_str(-inf)
'-inf'
str_to_bool(): the inverse of bool_to_str().
value (str) – the string value
>>> str_to_bool("T")
True
>>> str_to_bool("F")
False
>>> try:
...     str_to_bool("x")
... except ValueError as v:
...     print(v)
Expected 'T' or 'F', but got 'x'.
>>> try:
...     str_to_bool(1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     str_to_bool(None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> print(str_to_int_or_none(""))
None
>>> print(str_to_int_or_none("5"))
5
>>> print(str_to_int_or_none(None))
None
>>> print(str_to_int_or_none(" "))
None
>>> try:
...     print(str_to_int_or_none(1.3))
... except TypeError as te:
...     print(te)
value should be an instance of str but is float, namely 1.3.
>>> try:
...     print(str_to_int_or_none("1.3"))
... except ValueError as ve:
...     print(ve)
invalid literal for int() with base 10: '1.3'
str_to_num(): value (str) – the string value
>>> print(type(str_to_num("15.0")))
<class 'int'>
>>> print(type(str_to_num("15.1")))
<class 'float'>
>>> str_to_num("inf")
inf
>>> str_to_num(" -inf ")
-inf
>>> try:
...     str_to_num(21)
... except TypeError as te:
...     print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'int' object
>>> try:
...     str_to_num("nan")
... except ValueError as ve:
...     print(ve)
NaN is not permitted, but got 'nan'.
>>> try:
...     str_to_num("12-3")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '12-3'.
>>> str_to_num("1e34423")
inf
>>> str_to_num("-1e34423")
-inf
>>> str_to_num("-1e-34423")
0
>>> str_to_num("1e-34423")
0
>>> try:
...     str_to_num("-1e-34e4423")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '-1e-34e4423'.
>>> try:
...     str_to_num("T")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'T'.
>>> try:
...     str_to_num("F")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'F'.
>>> try:
...     str_to_num(None)
... except TypeError as te:
...     print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'NoneType' object
>>> try:
...     str_to_num("")
... except ValueError as ve:
...     print(ve)
Value '' becomes empty after stripping, cannot be converted to a number.
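The preference for integers shown above ("15.0" becomes an int, "1e-34423" becomes 0) can be expressed as a short chain of attempts. The following is a sketch of that idea using only built-in conversions; it is not the library's implementation, and the name `str_to_num_sketch` is hypothetical.

```python
from math import isfinite, isnan


def str_to_num_sketch(value: str):
    """Parse `value` as int if possible, else as float; reject NaN."""
    text = str.strip(value)  # raises TypeError for non-str arguments
    if not text:
        raise ValueError(f"Value {value!r} becomes empty after stripping.")
    try:
        return int(text)  # base-10 integer literals come first
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        raise ValueError(f"Invalid numerical value {value!r}.") from None
    if isnan(number):
        raise ValueError(f"NaN is not permitted, but got {value!r}.")
    # Prefer int whenever the float holds a finite whole number.
    if isfinite(number) and number.is_integer():
        return int(number)
    return number


print(type(str_to_num_sketch("15.0")), type(str_to_num_sketch("15.1")))
```

An "or None" variant would only need to return None for a None argument or a string that is empty after stripping, before delegating to this routine.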
>>> print(type(str_to_num_or_none("15.0")))
<class 'int'>
>>> print(type(str_to_num_or_none("15.1")))
<class 'float'>
>>> str_to_num_or_none("inf")
inf
>>> str_to_num_or_none(" -inf ")
-inf
>>> try:
...     str_to_num_or_none(21)
... except TypeError as te:
...     print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'int' object
>>> try:
...     str_to_num_or_none("nan")
... except ValueError as ve:
...     print(ve)
NaN is not permitted, but got 'nan'.
>>> try:
...     str_to_num_or_none("12-3")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '12-3'.
>>> str_to_num_or_none("1e34423")
inf
>>> str_to_num_or_none("-1e34423")
-inf
>>> str_to_num_or_none("-1e-34423")
0
>>> str_to_num_or_none("1e-34423")
0
>>> try:
...     str_to_num_or_none("-1e-34e4423")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '-1e-34e4423'.
>>> try:
...     str_to_num_or_none("T")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'T'.
>>> try:
...     str_to_num_or_none("F")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'F'.
>>> print(str_to_num_or_none(""))
None
>>> print(str_to_num_or_none(None))
None
>>> print(type(str_to_num_or_none("5.0")))
<class 'int'>
>>> print(type(str_to_num_or_none("5.1")))
<class 'float'>
pycommons.strings.tools module
get_prefix_str(): strings (Union[str, Iterable[str]]) – the iterable of strings
>>> get_prefix_str(["abc", "acd"])
'a'
>>> get_prefix_str(["xyz", "gsdf"])
''
>>> get_prefix_str([])
''
>>> get_prefix_str(["abx"])
'abx'
>>> get_prefix_str(("abx", ))
'abx'
>>> get_prefix_str({"abx", })
'abx'
>>> get_prefix_str("abx")
'abx'
>>> get_prefix_str(("\\relative.path", "\\relative.figure",
...                 "\\relative.code"))
'\\relative.'
>>> get_prefix_str({"\\relative.path", "\\relative.figure",
...                 "\\relative.code"})
'\\relative.'
>>> try:
...     get_prefix_str(None)
... except TypeError as te:
...     print(te)
strings should be an instance of any in {str, typing.Iterable} but is None.
>>> try:
...     get_prefix_str(1)
... except TypeError as te:
...     print(str(te)[:60])
strings should be an instance of any in {str, typing.Iterabl
>>> try:
...     get_prefix_str(["abc", "acd", 2, "x"])
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     get_prefix_str(["abc", "acd", None, "x"])
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> get_prefix_str(["xyz", "gsdf", 5])
''
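The last example illustrates the early exit: once the prefix is empty, later elements are no longer inspected, so the trailing 5 goes unnoticed. A minimal sketch of a prefix search with this property is given below; the name `common_prefix_sketch` is hypothetical and the real implementation may differ.

```python
def common_prefix_sketch(strings) -> str:
    """Return the longest common prefix; stop as soon as it is empty."""
    if isinstance(strings, str):
        return strings  # a single string is its own prefix
    prefix = None
    for item in strings:
        if not isinstance(item, str):
            raise TypeError(f"expected str, got {type(item).__name__}")
        if prefix is None:
            prefix = item
        else:
            # Count the leading characters that the two strings share.
            shared = 0
            for left, right in zip(prefix, item):
                if left != right:
                    break
                shared += 1
            prefix = prefix[:shared]
        if not prefix:
            return ""  # early exit: remaining items are never checked
    return prefix or ""


print(repr(common_prefix_sketch(["xyz", "gsdf", 5])))
```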
replace_regex():
search (Union[str, Pattern]) – the regular expression to search, either a string or a pattern
replace (Union[str, Callable[[Match], str]]) – the string to replace it with, or a function receiving a re.Match instance and returning a replacement string
inside (str) – the string in which to search/replace
>>> replace_regex('[ \t]+\n', '\n', ' bla \nxyz\tabc\t\n')
' bla\nxyz\tabc\n'
>>> replace_regex('[0-9]A', 'X', '23A7AA')
'2XXA'
>>> from re import compile as cpx
>>> replace_regex(cpx('[0-9]A'), 'X', '23A7AA')
'2XXA'
>>> def __repl(a):
...     print(repr(a))
...     return "y"
>>> replace_regex("a.b", __repl, "albaab")
<re.Match object; span=(0, 3), match='alb'>
<re.Match object; span=(3, 6), match='aab'>
'yy'
>>> def __repl(a):
...     print(repr(a))
...     ss = a.group()
...     print(ss)
...     return "axb"
>>> replace_regex("aa.bb", __repl, "aaaaaxbbbbb")
<re.Match object; span=(3, 8), match='aaxbb'>
aaxbb
<re.Match object; span=(2, 7), match='aaxbb'>
aaxbb
<re.Match object; span=(1, 6), match='aaxbb'>
aaxbb
<re.Match object; span=(0, 5), match='aaxbb'>
aaxbb
'axb'
>>> replace_regex("aa.bb", "axb", "aaaaaxbbbbb")
'axb'
>>> replace_regex("aa.bb", "axb", "".join("a" * 100 + "y" + "b" * 100))
'axb'
>>> replace_regex("aa.bb", "axb",
...               "".join("a" * 10000 + "y" + "b" * 10000))
'axb'
>>> try:
...     replace_regex(1, "1", "2")
... except TypeError as te:
...     print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...     replace_regex(None, "1", "2")
... except TypeError as te:
...     print(te)
search should be an instance of any in {str, typing.Pattern} but is None.
>>> try:
...     replace_regex("x", 2, "2")
... except TypeError as te:
...     print(te)
replace should be an instance of str or a callable but is int, namely 2.
>>> try:
...     replace_regex("x", None, "2")
... except TypeError as te:
...     print(te)
replace should be an instance of str or a callable but is None.
>>> try:
...     replace_regex(1, 1, "2")
... except TypeError as te:
...     print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...     replace_regex("yy", "1", 3)
... except TypeError as te:
...     print(te)
inside should be an instance of str but is int, namely 3.
>>> try:
...     replace_regex("adad", "1", None)
... except TypeError as te:
...     print(te)
inside should be an instance of str but is None.
>>> try:
...     replace_regex(1, "1", 3)
... except TypeError as te:
...     print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...     replace_regex(1, 3, 5)
... except TypeError as te:
...     print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...     replace_regex("abab|baab|bbab|aaab|aaaa|bbbb", "baba",
...                   "ababababab")
... except ValueError as ve:
...     print(str(ve)[:50])
Too many replacements, pattern re.compile('abab|ba
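The match spans printed in the callable examples above are consistent with one full substitution pass per round, repeated until a pass finds nothing. A minimal sketch of such a loop with a round limit follows. The name `replace_regex_sketch` and the `max_rounds` parameter are hypothetical; the documented limit of 100000 is used as the default, and type checking of the arguments is left out.

```python
import re


def replace_regex_sketch(search, replace, inside, max_rounds=100_000):
    """Apply `search` -> `replace` repeatedly until nothing matches."""
    pattern = re.compile(search)  # also accepts an already compiled pattern
    for _ in range(max_rounds):
        # subn performs one pass over the whole string and reports how
        # many non-overlapping matches were replaced.
        inside, count = pattern.subn(replace, inside)
        if count == 0:
            return inside
    raise ValueError(f"Too many replacements, pattern {pattern!r}.")


print(replace_regex_sketch("aa.bb", "axb", "aaaaaxbbbbb"))
```

The limit matters because a replacement can keep producing new matches forever, for example when the replacement text itself matches the pattern.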
replace_str(): see replace_regex() for regular-expression based replacements.
>>> replace_str("a", "b", "abc")
'bbc'
>>> replace_str("aa", "a", "aaaaa")
'a'
>>> replace_str("aba", "a", "abaababa")
'aa'
>>> replace_str("aba", "aba", "abaababa")
'abaababa'
>>> replace_str("aa", "aa", "aaaaaaaa")
'aaaaaaaa'
>>> replace_str("a", "aa", "aaaaaaaa")
'aaaaaaaaaaaaaaaa'
>>> replace_str("a", "xx", "aaaaaaaa")
'xxxxxxxxxxxxxxxx'
>>> try:
...     replace_str(None, "a", "b")
... except TypeError as te:
...     print(te)
replace() argument 1 must be str, not None
>>> try:
...     replace_str(1, "a", "b")
... except TypeError as te:
...     print(te)
replace() argument 1 must be str, not int
>>> try:
...     replace_str("a", None, "b")
... except TypeError as te:
...     print(te)
replace() argument 2 must be str, not None
>>> try:
...     replace_str("x", 1, "b")
... except TypeError as te:
...     print(te)
replace() argument 2 must be str, not int
>>> try:
...     replace_str("a", "v", None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> try:
...     replace_str("x", "xy", 1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
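The examples above fit a simple rule: always perform one pass of `str.replace`, then repeat only while the result is shorter than its input. Because the length strictly decreases, the loop must end. The sketch below implements this rule; it is an illustration, not the library's code, and the name `replace_str_sketch` is hypothetical.

```python
def replace_str_sketch(find: str, replace: str, src: str) -> str:
    """Replace `find` by `replace` in `src` while the string shrinks."""
    while True:
        new_src = src.replace(find, replace)
        # Stop once a pass no longer shortens the string; the result of
        # that final pass is still returned.
        if len(new_src) >= len(src):
            return new_src
        src = new_src


print(replace_str_sketch("aba", "a", "abaababa"))
```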
split_str(): a less memory-intensive variant of str.split(). This routine iteratively divides a given string based on a splitting character or string. It may be useful when dealing with a very big source string that should be split into smaller strings one at a time. Instead of creating a list with many small strings, as str.split() does, it creates these strings iteratively.
>>> list(split_str("", ""))
['']
>>> list(split_str("", "x"))
['']
>>> list(split_str("a", ""))
['a']
>>> list(split_str("abc", ""))
['a', 'b', 'c']
>>> list(split_str("a;b;c", ";"))
['a', 'b', 'c']
>>> list(split_str("a;b;c;", ";"))
['a', 'b', 'c', '']
>>> list(split_str(";a;b;;c;", ";"))
['', 'a', 'b', '', 'c', '']
>>> list(split_str("a;aaa;aba;aa;aca;a", "a;a"))
['', 'a', 'b', '', 'c', '']
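Lazy splitting of this kind can be written as a generator around `str.find`, so that only one piece exists at a time. The sketch below reproduces the examples above, including the special case of an empty separator; the name `split_str_sketch` and its parameter names are hypothetical, since the real signature is not shown in this document.

```python
from typing import Iterator


def split_str_sketch(source: str, separator: str) -> Iterator[str]:
    """Yield the pieces of `source` between occurrences of `separator`."""
    if not separator:
        # With an empty separator, yield single characters, or one empty
        # string if the source itself is empty.
        if source:
            yield from source
        else:
            yield ""
        return
    start = 0
    step = len(separator)
    while True:
        end = source.find(separator, start)
        if end < 0:
            yield source[start:]  # the remainder after the last separator
            return
        yield source[start:end]
        start = end + step


print(list(split_str_sketch("a;aaa;aba;aa;aca;a", "a;a")))
```

Iterating over the generator directly, rather than wrapping it in `list`, is what keeps memory use low for very large inputs.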