pycommons.strings package
Common string handling routines.

pycommons.strings.chars: Constants for common characters.

Constants: a regular expression matching all characters that are non-line-breaking white space, and a regular expression matching any white space or newline character.

subscript(): Transform a string into Unicode-based subscript. All characters that can be represented as subscript in Unicode will be translated to subscript. Notice that only a subset of the Latin characters can be converted to Unicode subscript. If any character cannot be translated, a KeyError is raised. White space is preserved.
Parameters: s (str) – the string
Returns: the string in subscript

superscript(): Transform a string into Unicode-based superscript. All characters that can be represented as superscript in Unicode will be translated to superscript. Notice that only a subset of the Latin characters can be converted to Unicode superscript. If any character cannot be translated, a KeyError is raised. White space is preserved.
Parameters: s (str) – the string
Returns: the string in superscript

pycommons.strings.enforce: Some string splitting and processing routines.

enforce_non_empty_str(): Enforce that a text is a non-empty string.
Parameters: value (Any) – the text
Returns: the text
Raises: TypeError – if text is not a str; ValueError – if text is empty

enforce_non_empty_str_without_ws(): Enforce that a text is a non-empty string without white space.
Parameters: value (Any) – the text
Returns: the text, if everything goes well
Raises: TypeError – if text is not a str; ValueError – if text is empty or contains any white space characters

enforce_str(): Return the input if it is a string, otherwise throw an error.
Parameters: value (Any) – the value
Returns: value if isinstance(value, str)
Raises: TypeError – if not isinstance(value, str)

pycommons.strings.string_conv: Converting stuff to and from strings.

bool_to_str(): Convert a Boolean value to a string. This function is the inverse of str_to_bool().
Parameters: value (bool) – the Boolean value
Returns: "T" if value == True, "F" if value == False
Raises: TypeError – if value is not a bool

datetime_to_date_str(): Convert a datetime object to a date string.
Parameters: date (datetime) – the date
Returns: the date string
Raises: TypeError – if date is not an instance of datetime.datetime

datetime_to_datetime_str(): Convert a datetime object to a date-time string.
Parameters: dateandtime (datetime) – the date and time
Returns: the date-time string
Raises: TypeError – if dateandtime is not an instance of datetime.datetime

float_to_str(): Convert a float to a string.
Parameters: value (float) – the floating point value
Returns: the string representation
Raises: TypeError – if value is not a float; ValueError – if value is not a number

int_or_none_to_str(): Convert an integer or None to a string. If value is None, "" is returned. If value is an instance of bool, a TypeError is raised. If value is an int, str(value) is returned. Otherwise, a TypeError is thrown.
Returns: the string representation, or "" if value is None
Raises: TypeError – if value is a bool (notice that bool is a subclass of int) or any other non-int type

num_or_none_to_str(): Convert a numerical type (int, float) or None to a string. If value is None, then "" is returned. Otherwise, the result of num_to_str() is returned.
Returns: the string representation, or "" if value is None
Raises: TypeError – if value is not None but instead is a bool (notice that bool is a subclass of int) or any other type that is neither int nor float; ValueError – if value is not-a-number

num_to_str(): Transform a numerical type (int or float) to a string. If value is an instance of bool, a TypeError will be raised. If value is any other instance of int, the result of its conversion via str will be returned. Otherwise, the result of float_to_str() is returned.
Returns: the string
Raises: TypeError – if value is a bool (notice that bool is a subclass of int) or any other type that is neither int nor float; ValueError – if value is not-a-number

str_to_bool(): Convert a string to a Boolean value. This function is the inverse of bool_to_str().
Parameters: value (str) – the string value
Returns: True if value == "T", False if value == "F"
Raises: TypeError – if value is not a string; ValueError – if value is neither "T" nor "F"

str_to_int_or_none(): Convert a string to an int or None. If value is None, then None is returned. If value is empty or entirely composed of white space, None is returned. If value can be converted to an integer, then an int with the corresponding value is returned. Otherwise, a ValueError is thrown.
Returns: the int or None
Raises: TypeError – if value is neither a str nor None; ValueError – if value is a str but cannot be base-10 converted to an integer

str_to_num(): Convert a string to an int or float. If value is not an instance of str, a TypeError will be raised. If value can be converted to an integer, then an int with the corresponding value is returned. If value can be converted to a float, a float with the appropriate value is returned. Otherwise, a ValueError is thrown.
Parameters: value (str) – the string value
Returns: the int or float; integers are preferred wherever possible
Raises: TypeError – if value is not a str; ValueError – if value is a str but cannot be converted to an integer (base-10) or converts to a float which is not a number

str_to_num_or_none(): Convert a string to an int or float or None. If value is None, then None is returned. If value is empty or entirely composed of white space, None is returned. If value can be converted to an integer, then an int with the corresponding value is returned. If value can be converted to a float, a float with the appropriate value is returned. Otherwise, a ValueError is thrown.
Returns: the int or float or None
Raises: TypeError – if value is neither a str nor None; ValueError – if value is a str but cannot be converted to an integer (base-10) or converts to a float which is not a number

pycommons.strings.tools: Routines for handling strings.

get_prefix_str(): Compute the common prefix of an iterable of strings.
Parameters: strings (Union[str, Iterable[str]]) – the iterable of strings
Returns: the common prefix
Raises: TypeError – if the input is not a string or an iterable of strings, or if it contains any non-string element (before the prefix is determined)
Notice: If the prefix is determined to be the empty string, the search stops. Non-str items that follow later in strings may therefore not raise a TypeError.

replace_regex(): Replace all occurrences of search in inside with replace. The replacement is applied repeatedly until no occurrence of search can be found anymore. This may lead to an endless loop, so a ValueError is thrown if there are too many replacement rounds.
Parameters: search (Union[str, Pattern]) – the regular expression to search, either a string or a pattern; replace (Union[str, Callable[[Match], str]]) – the string to replace it with, or a function receiving a re.Match instance and returning a replacement string; inside (str) – the string in which to search/replace
Returns: the new string after the recursive replacement
Raises: TypeError – if any of the parameters is not of the right type; ValueError – if there are 100000 recursive replacements or more, indicating that there could be an endless loop

replace_str(): Perform a recursive replacement of strings. All occurrences of find in src are replaced by replace. If that produces new instances of find, these are replaced as well, but only as long as each round makes the string shorter. In other words, the replacement continues only while the new string becomes shorter. See replace_regex() for regular-expression based replacements.
Returns: the string src, with all occurrences of find replaced by replace
Raises: TypeError – if any of the parameters are not strings

Submodules
pycommons.strings.chars module
Raises a KeyError if any character cannot be translated. White space is preserved.
Parameters: s (str) – the string
>>> subscript("a0= 4(e)")
'ₐ₀₌ ₄₍ₑ₎'
>>> try:
... subscript("a0=4(e)Y")
... except KeyError as ke:
... print(ke)
'Y'
>>> try:
... subscript(None)
... except TypeError as te:
... print(te)
descriptor '__iter__' requires a 'str' object but received a 'NoneType'
>>> try:
... superscript(1)
... except TypeError as te:
... print(te)
descriptor '__iter__' requires a 'str' object but received a 'int'
Raises a KeyError if any character cannot be translated. White space is preserved.
Parameters: s (str) – the string
>>> superscript("a0 =4(e)")
'ᵃ⁰ ⁼⁴⁽ᵉ⁾'
>>> try:
... superscript("a0=4(e)Y")
... except KeyError as ke:
... print(ke)
'Y'
>>> try:
... superscript(None)
... except TypeError as te:
... print(te)
descriptor '__iter__' requires a 'str' object but received a 'NoneType'
>>> try:
... superscript(1)
... except TypeError as te:
... print(te)
descriptor '__iter__' requires a 'str' object but received a 'int'
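The character-by-character translation described for subscript() and superscript() can be illustrated with a plain dictionary lookup. The following is only a minimal sketch, not the pycommons implementation: the names SUBSCRIPT_DIGITS and digits_to_subscript are hypothetical, and the table covers only the ten digits, whereas the real functions support more characters.

```python
# Hypothetical lookup table: only the digits 0-9, which map to the
# contiguous Unicode block U+2080..U+2089 (subscript zero to nine).
SUBSCRIPT_DIGITS = {str(d): chr(0x2080 + d) for d in range(10)}


def digits_to_subscript(s: str) -> str:
    """Translate digits to Unicode subscript, keeping white space."""
    # str.__iter__ raises a TypeError if s is not a str.
    # A character missing from the table raises a KeyError.
    return "".join(
        c if c.isspace() else SUBSCRIPT_DIGITS[c]
        for c in str.__iter__(s))


print(digits_to_subscript("10 2"))
```

Because the lookup uses plain indexing rather than dict.get, an unsupported character surfaces as a KeyError naming that character, which is the behavior the examples above show for "Y".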
pycommons.strings.enforce module
Parameters: value (Any) – the text
>>> enforce_non_empty_str("1")
'1'
>>> enforce_non_empty_str(" 1 1 ")
' 1 1 '
>>> try:
... enforce_non_empty_str("")
... except ValueError as ve:
... print(ve)
Non-empty str expected, but got ''.
>>> try:
... enforce_non_empty_str(1)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
... enforce_non_empty_str(None)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
Parameters: value (Any) – the text
>>> enforce_non_empty_str_without_ws("1")
'1'
>>> try:
... enforce_non_empty_str_without_ws(" 1 1 ")
... except ValueError as ve:
... print(ve)
No white space allowed in string, but got ' 1 1 '.
>>> try:
... enforce_non_empty_str_without_ws("a\tb")
... except ValueError as ve:
... print(ve)
No white space allowed in string, but got 'a\tb'.
>>> try:
... enforce_non_empty_str_without_ws("")
... except ValueError as ve:
... print(ve)
Non-empty str expected, but got ''.
>>> try:
... enforce_non_empty_str_without_ws(1)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
... enforce_non_empty_str_without_ws(None)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
Parameters: value (Any) – the value
>>> enforce_str("1")
'1'
>>> enforce_str("")
''
>>> try:
... enforce_str(1)
... except TypeError as te:
... print(te)
value should be an instance of str but is int, namely '1'.
>>> try:
... enforce_str(None)
... except TypeError as te:
... print(te)
value should be an instance of str but is None.
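The TypeError messages of the form "descriptor '__len__' requires a 'str' object" in the examples above are what Python produces when an unbound str method is called on a non-str argument. The sketch below shows how a type check and an emptiness check can be combined that way. It is an illustration under that assumption, not the pycommons source, and the name require_non_empty is hypothetical.

```python
from typing import Any


def require_non_empty(value: Any) -> str:
    """Return value if it is a non-empty str, otherwise raise."""
    # Calling the unbound method str.__len__ fails with a TypeError
    # for anything that is not a str, so no isinstance check is needed.
    if str.__len__(value) == 0:
        raise ValueError(f"Non-empty str expected, but got {value!r}.")
    return value


print(require_non_empty(" 1 1 "))
```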
pycommons.strings.string_conv module
This function is the inverse of str_to_bool().
Parameters: value (bool) – the Boolean value
>>> print(bool_to_str(True))
T
>>> print(bool_to_str(False))
F
>>> try:
... bool_to_str("t")
... except TypeError as te:
... print(te)
value should be an instance of bool but is str, namely 't'.
>>> try:
... bool_to_str(None)
... except TypeError as te:
... print(te)
value should be an instance of bool but is None.
Parameters: date (datetime) – the date
Raises: TypeError – if date is not an instance of datetime.datetime.
>>> datetime_to_date_str(datetime(1999, 12, 21))
'1999‑12‑21'
>>> try:
... datetime_to_date_str(None)
... except TypeError as te:
... print(te)
date should be an instance of datetime.datetime but is None.
>>> try:
... datetime_to_date_str(1)
... except TypeError as te:
... print(te)
date should be an instance of datetime.datetime but is int, namely '1'.
Parameters: dateandtime (datetime) – the date and time
Raises: TypeError – if dateandtime is not an instance of datetime.datetime.
>>> datetime_to_datetime_str(datetime(1999, 12, 21, 13, 42, 23))
'1999\u201112\u201121\xa013:42'
>>> from datetime import timezone
>>> datetime_to_datetime_str(datetime(1999, 12, 21, 13, 42,
... tzinfo=timezone.utc))
'1999\u201112\u201121\xa013:42\xa0UTC'
>>> try:
... datetime_to_datetime_str(None)
... except TypeError as te:
... print(te)
dateandtime should be an instance of datetime.datetime but is None.
>>> try:
... datetime_to_datetime_str(1)
... except TypeError as te:
... print(str(te)[:60])
dateandtime should be an instance of datetime.datetime but i
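The escapes \u2011 and \xa0 in the outputs above are the non-breaking hyphen and the non-breaking space. A date string of that shape can be produced with strftime followed by a character replacement. This is a sketch of one possible approach; the function name date_str_sketch is hypothetical, and how pycommons actually builds the string is not stated in this documentation.

```python
from datetime import datetime

NON_BREAKING_HYPHEN = "\u2011"


def date_str_sketch(date: datetime) -> str:
    """Format a datetime as YYYY-MM-DD using non-breaking hyphens."""
    if not isinstance(date, datetime):
        raise TypeError(
            "date should be an instance of datetime.datetime "
            f"but is {type(date).__name__}.")
    # Format with ordinary hyphens first, then swap them out so that
    # the date is never split across lines when rendered in text.
    return date.strftime("%Y-%m-%d").replace("-", NON_BREAKING_HYPHEN)


print(date_str_sketch(datetime(1999, 12, 21)))
```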
Parameters: value (float) – the floating point value
>>> float_to_str(1.3)
'1.3'
>>> float_to_str(1.0)
'1'
>>> float_to_str(1e-5)
'1e-5'
>>> try:
... float_to_str(1)
... except TypeError as te:
... print(te)
value should be an instance of float but is int, namely '1'.
>>> try:
... float_to_str(None)
... except TypeError as te:
... print(te)
value should be an instance of float but is None.
>>> from math import nan
>>> try:
... float_to_str(nan)
... except ValueError as ve:
... print(ve)
nan => 'nan' is not a permitted float.
>>> from math import inf
>>> float_to_str(inf)
'inf'
>>> float_to_str(-inf)
'-inf'
>>> float_to_str(1e300)
'1e300'
>>> float_to_str(-1e300)
'-1e300'
>>> float_to_str(-1e-300)
'-1e-300'
>>> float_to_str(1e-300)
'1e-300'
>>> float_to_str(1e1)
'10'
>>> float_to_str(1e5)
'100000'
>>> float_to_str(1e10)
'10000000000'
>>> float_to_str(1e20)
'1e20'
>>> float_to_str(1e030)
'1e30'
>>> float_to_str(0.0)
'0'
>>> float_to_str(-0.0)
'0'
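The outputs above suggest three rules: nan is rejected, integer-valued floats of moderate size are printed without a decimal point, and exponents are printed without a plus sign or leading zeros. The sketch below reproduces those examples with the standard library only. It is an approximation, not the pycommons algorithm: the name compact_float and the 1e16 cut-off (the point where Python's repr switches to exponent notation) are assumptions.

```python
from math import isinf, isnan


def compact_float(value: float) -> str:
    """Render a float compactly, e.g. 1.0 -> '1' and 1e-05 -> '1e-5'."""
    if not isinstance(value, float):
        raise TypeError(f"float expected, got {type(value).__name__}.")
    if isnan(value):
        raise ValueError("nan is not a permitted float.")
    if isinf(value):
        return "inf" if value > 0 else "-inf"
    # Assumed cut-off: below 1e16, integer-valued floats print as ints.
    if value.is_integer() and abs(value) < 1e16:
        return str(int(value))  # also maps -0.0 to '0'
    mantissa, sep, exponent = repr(value).partition("e")
    if sep:
        # int() drops the '+' sign and leading zeros of the exponent.
        return f"{mantissa}e{int(exponent)}"
    return mantissa


for x in (1.3, 1.0, 1e-5, 1e10, 1e20, -0.0):
    print(compact_float(x))
```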
>>> print(repr(int_or_none_to_str(None)))
''
>>> print(int_or_none_to_str(12))
12
>>> try:
... int_or_none_to_str(True)
... except TypeError as te:
... print(te)
value should be an instance of int but is bool, namely 'True'.
>>> try:
... int_or_none_to_str(False)
... except TypeError as te:
... print(te)
value should be an instance of int but is bool, namely 'False'.
>>> print(int_or_none_to_str(-10))
-10
>>> try:
... int_or_none_to_str(1.0)
... except TypeError as te:
... print(te)
value should be an instance of int but is float, namely '1.0'.
Otherwise, the result of num_to_str() is returned.
>>> print(repr(num_or_none_to_str(None)))
''
>>> print(num_or_none_to_str(12))
12
>>> print(num_or_none_to_str(12.3))
12.3
>>> try:
... num_or_none_to_str(True)
... except TypeError as te:
... print(te)
value should be an instance of any in {float, int} but is bool, namely 'True'.
>>> try:
... num_or_none_to_str(False)
... except TypeError as te:
... print(te)
value should be an instance of any in {float, int} but is bool, namely 'False'.
>>> from math import nan
>>> try:
... num_to_str(nan)
... except ValueError as ve:
... print(ve)
nan => 'nan' is not a permitted float.
Otherwise, the result of float_to_str() is returned. This means that nan will yield a ValueError and anything that is neither an int nor a float (including bool) will incur a TypeError.
>>> num_to_str(1)
'1'
>>> num_to_str(1.5)
'1.5'
>>> try:
... num_to_str(True)
... except TypeError as te:
... print(te)
value should be an instance of any in {float, int} but is bool, namely 'True'.
>>> try:
... num_to_str(False)
... except TypeError as te:
... print(te)
value should be an instance of any in {float, int} but is bool, namely 'False'.
>>> try:
... num_to_str("x")
... except TypeError as te:
... print(te)
value should be an instance of float but is str, namely 'x'.
>>> try:
... num_to_str(None)
... except TypeError as te:
... print(te)
value should be an instance of float but is None.
>>> from math import inf, nan
>>> try:
... num_to_str(nan)
... except ValueError as ve:
... print(ve)
nan => 'nan' is not a permitted float.
>>> num_to_str(inf)
'inf'
>>> num_to_str(-inf)
'-inf'
This function is the inverse of bool_to_str().
Parameters: value (str) – the string value
>>> str_to_bool("T")
True
>>> str_to_bool("F")
False
>>> try:
... str_to_bool("x")
... except ValueError as v:
... print(v)
Expected 'T' or 'F', but got 'x'.
>>> try:
... str_to_bool(1)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
... str_to_bool(None)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> print(str_to_int_or_none(""))
None
>>> print(str_to_int_or_none("5"))
5
>>> print(str_to_int_or_none(None))
None
>>> print(str_to_int_or_none(" "))
None
>>> try:
... print(str_to_int_or_none(1.3))
... except TypeError as te:
... print(te)
value should be an instance of str but is float, namely '1.3'.
>>> try:
... print(str_to_int_or_none("1.3"))
... except ValueError as ve:
... print(ve)
invalid literal for int() with base 10: '1.3'
Parameters: value (str) – the string value
>>> print(type(str_to_num("15.0")))
<class 'int'>
>>> print(type(str_to_num("15.1")))
<class 'float'>
>>> str_to_num("inf")
inf
>>> str_to_num(" -inf ")
-inf
>>> try:
... str_to_num(21)
... except TypeError as te:
... print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'int' object
>>> try:
... str_to_num("nan")
... except ValueError as ve:
... print(ve)
NaN is not permitted, but got 'nan'.
>>> try:
... str_to_num("12-3")
... except ValueError as ve:
... print(ve)
Invalid numerical value '12-3'.
>>> str_to_num("1e34423")
inf
>>> str_to_num("-1e34423")
-inf
>>> str_to_num("-1e-34423")
0
>>> str_to_num("1e-34423")
0
>>> try:
... str_to_num("-1e-34e4423")
... except ValueError as ve:
... print(ve)
Invalid numerical value '-1e-34e4423'.
>>> try:
... str_to_num("T")
... except ValueError as ve:
... print(ve)
Invalid numerical value 'T'.
>>> try:
... str_to_num("F")
... except ValueError as ve:
... print(ve)
Invalid numerical value 'F'.
>>> try:
... str_to_num(None)
... except TypeError as te:
... print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'NoneType' object
>>> try:
... str_to_num("")
... except ValueError as ve:
... print(ve)
Value '' becomes empty after stripping, cannot be converted to a number.
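The "integers are preferred" rule shown above ("15.0" becomes an int, "15.1" a float) can be sketched as a two-stage parse: try int first, then fall back to float and convert integer-valued results back to int. This is an illustration of the idea only. The name parse_num is hypothetical, and the real function may treat edge cases such as very large values differently.

```python
from math import isnan
from typing import Union


def parse_num(value: str) -> Union[int, float]:
    """Parse a string to an int if possible, otherwise to a float."""
    text = str.strip(value)  # raises a TypeError if value is not a str
    if not text:
        raise ValueError(f"Value {value!r} is empty after stripping.")
    try:
        return int(text)  # base-10 integer literals come first
    except ValueError:
        pass
    try:
        result = float(text)
    except ValueError as err:
        raise ValueError(f"Invalid numerical value {value!r}.") from err
    if isnan(result):
        raise ValueError(f"NaN is not permitted, but got {value!r}.")
    # is_integer() is False for inf, so infinities stay floats.
    return int(result) if result.is_integer() else result


print(type(parse_num("15.0")), type(parse_num("15.1")))
```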
>>> print(type(str_to_num_or_none("15.0")))
<class 'int'>
>>> print(type(str_to_num_or_none("15.1")))
<class 'float'>
>>> str_to_num_or_none("inf")
inf
>>> str_to_num_or_none(" -inf ")
-inf
>>> try:
... str_to_num_or_none(21)
... except TypeError as te:
... print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'int' object
>>> try:
... str_to_num_or_none("nan")
... except ValueError as ve:
... print(ve)
NaN is not permitted, but got 'nan'.
>>> try:
... str_to_num_or_none("12-3")
... except ValueError as ve:
... print(ve)
Invalid numerical value '12-3'.
>>> str_to_num_or_none("1e34423")
inf
>>> str_to_num_or_none("-1e34423")
-inf
>>> str_to_num_or_none("-1e-34423")
0
>>> str_to_num_or_none("1e-34423")
0
>>> try:
... str_to_num_or_none("-1e-34e4423")
... except ValueError as ve:
... print(ve)
Invalid numerical value '-1e-34e4423'.
>>> try:
... str_to_num_or_none("T")
... except ValueError as ve:
... print(ve)
Invalid numerical value 'T'.
>>> try:
... str_to_num_or_none("F")
... except ValueError as ve:
... print(ve)
Invalid numerical value 'F'.
>>> print(str_to_num_or_none(""))
None
>>> print(str_to_num_or_none(None))
None
>>> print(type(str_to_num_or_none("5.0")))
<class 'int'>
>>> print(type(str_to_num_or_none("5.1")))
<class 'float'>
pycommons.strings.tools module
Parameters: strings (Union[str, Iterable[str]]) – the iterable of strings
>>> get_prefix_str(["abc", "acd"])
'a'
>>> get_prefix_str(["xyz", "gsdf"])
''
>>> get_prefix_str([])
''
>>> get_prefix_str(["abx"])
'abx'
>>> get_prefix_str(("abx", ))
'abx'
>>> get_prefix_str({"abx", })
'abx'
>>> get_prefix_str("abx")
'abx'
>>> get_prefix_str(("\\relative.path", "\\relative.figure",
... "\\relative.code"))
'\\relative.'
>>> get_prefix_str({"\\relative.path", "\\relative.figure",
... "\\relative.code"})
'\\relative.'
>>> try:
... get_prefix_str(None)
... except TypeError as te:
... print(te)
strings should be an instance of any in {str, typing.Iterable} but is None.
>>> try:
... get_prefix_str(1)
... except TypeError as te:
... print(str(te)[:60])
strings should be an instance of any in {str, typing.Iterabl
>>> try:
... get_prefix_str(["abc", "acd", 2, "x"])
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
... get_prefix_str(["abc", "acd", None, "x"])
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> get_prefix_str(["xyz", "gsdf", 5])
''
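The early-exit behavior in the last example, where the trailing 5 never raises because the prefix is already empty, follows naturally from shrinking a candidate prefix one string at a time. The sketch below shows that idea. The name common_prefix_sketch is hypothetical and the code is not taken from pycommons. For lists of strings, the standard library function os.path.commonprefix computes the same character-wise prefix.

```python
from os.path import commonprefix
from typing import Iterable, Optional, Union


def common_prefix_sketch(strings: Union[str, Iterable[str]]) -> str:
    """Compute the longest common prefix of the given strings."""
    if isinstance(strings, str):
        return strings  # a single string is its own prefix
    prefix: Optional[str] = None
    for text in strings:
        if not isinstance(text, str):
            raise TypeError(f"str expected, got {type(text).__name__}.")
        if prefix is None:
            prefix = text
            continue
        # Count how many leading characters still agree.
        keep = 0
        for a, b in zip(prefix, text):
            if a != b:
                break
            keep += 1
        prefix = prefix[:keep]
        if not prefix:
            break  # nothing left to shrink; later items are not visited
    return prefix or ""


words = ["\\relative.path", "\\relative.figure", "\\relative.code"]
print(common_prefix_sketch(words) == commonprefix(words))
```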
Parameters:
search (Union[str, Pattern]) – the regular expression to search, either a string or a pattern
replace (Union[str, Callable[[Match], str]]) – the string to replace it with, or a function receiving a re.Match instance and returning a replacement string
inside (str) – the string in which to search/replace
>>> replace_regex('[ \t]+\n', '\n', ' bla \nxyz\tabc\t\n')
' bla\nxyz\tabc\n'
>>> replace_regex('[0-9]A', 'X', '23A7AA')
'2XXA'
>>> from re import compile as cpx
>>> replace_regex(cpx('[0-9]A'), 'X', '23A7AA')
'2XXA'
>>> def __repl(a):
... print(repr(a))
... return "y"
>>> replace_regex("a.b", __repl, "albaab")
<re.Match object; span=(0, 3), match='alb'>
<re.Match object; span=(3, 6), match='aab'>
'yy'
>>> def __repl(a):
... print(repr(a))
... ss = a.group()
... print(ss)
... return "axb"
>>> replace_regex("aa.bb", __repl, "aaaaaxbbbbb")
<re.Match object; span=(3, 8), match='aaxbb'>
aaxbb
<re.Match object; span=(2, 7), match='aaxbb'>
aaxbb
<re.Match object; span=(1, 6), match='aaxbb'>
aaxbb
<re.Match object; span=(0, 5), match='aaxbb'>
aaxbb
'axb'
>>> replace_regex("aa.bb", "axb", "aaaaaxbbbbb")
'axb'
>>> replace_regex("aa.bb", "axb", "".join("a" * 100 + "y" + "b" * 100))
'axb'
>>> replace_regex("aa.bb", "axb",
... "".join("a" * 10000 + "y" + "b" * 10000))
'axb'
>>> try:
... replace_regex(1, "1", "2")
... except TypeError as te:
... print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
... replace_regex(None, "1", "2")
... except TypeError as te:
... print(te)
search should be an instance of any in {str, typing.Pattern} but is None.
>>> try:
... replace_regex("x", 2, "2")
... except TypeError as te:
... print(te)
replace should be an instance of str or a callable but is int, namely '2'.
>>> try:
... replace_regex("x", None, "2")
... except TypeError as te:
... print(te)
replace should be an instance of str or a callable but is None.
>>> try:
... replace_regex(1, 1, "2")
... except TypeError as te:
... print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
... replace_regex("yy", "1", 3)
... except TypeError as te:
... print(te)
inside should be an instance of str but is int, namely '3'.
>>> try:
... replace_regex("adad", "1", None)
... except TypeError as te:
... print(te)
inside should be an instance of str but is None.
>>> try:
... replace_regex(1, "1", 3)
... except TypeError as te:
... print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
... replace_regex(1, 3, 5)
... except TypeError as te:
... print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
... replace_regex("abab|baab|bbab|aaab|aaaa|bbbb", "baba",
... "ababababab")
... except ValueError as ve:
... print(str(ve)[:50])
Too many replacements, pattern re.compile('abab|ba
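Repeating a regular-expression substitution until the text stops changing, with a cap on the number of rounds, can be sketched with re.sub as follows. This is a minimal illustration under stated assumptions rather than the pycommons implementation: the name replace_until_stable and the max_rounds parameter are hypothetical, and the default cap merely mirrors the 100000 limit mentioned in the description.

```python
import re
from typing import Callable, Pattern, Union


def replace_until_stable(
        search: Union[str, Pattern],
        replace: Union[str, Callable[[re.Match], str]],
        inside: str,
        max_rounds: int = 100_000) -> str:
    """Apply a regex substitution repeatedly until nothing changes."""
    pattern = re.compile(search)  # accepts a str or a compiled pattern
    for _ in range(max_rounds):
        new_text = pattern.sub(replace, inside)
        if new_text == inside:
            return inside  # fixed point reached
        inside = new_text
    raise ValueError(
        f"Too many replacements, pattern {pattern!r} probably loops.")


print(replace_until_stable("aa.bb", "axb", "aaaaaxbbbbb"))
```

Each round replaces all non-overlapping matches, so a pattern such as "aa.bb" collapses the surrounding runs one layer per round until only "axb" remains.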
See replace_regex() for regular-expression based replacements.
>>> replace_str("a", "b", "abc")
'bbc'
>>> replace_str("aa", "a", "aaaaa")
'a'
>>> replace_str("aba", "a", "abaababa")
'aa'
>>> replace_str("aba", "aba", "abaababa")
'abaababa'
>>> replace_str("aa", "aa", "aaaaaaaa")
'aaaaaaaa'
>>> replace_str("a", "aa", "aaaaaaaa")
'aaaaaaaaaaaaaaaa'
>>> replace_str("a", "xx", "aaaaaaaa")
'xxxxxxxxxxxxxxxx'
>>> try:
... replace_str(None, "a", "b")
... except TypeError as te:
... print(te)
replace() argument 1 must be str, not None
>>> try:
... replace_str(1, "a", "b")
... except TypeError as te:
... print(te)
replace() argument 1 must be str, not int
>>> try:
... replace_str("a", None, "b")
... except TypeError as te:
... print(te)
replace() argument 2 must be str, not None
>>> try:
... replace_str("x", 1, "b")
... except TypeError as te:
... print(te)
replace() argument 2 must be str, not int
>>> try:
... replace_str("a", "v", None)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> try:
... replace_str("x", "xy", 1)
... except TypeError as te:
... print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
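The rule that replacement continues only while the string gets shorter explains why replace_str("a", "aa", ...) terminates after a single round. A loop with that stopping condition can be sketched as below. The name replace_while_shrinking is hypothetical, and the code is an illustration consistent with the examples above, not the pycommons source.

```python
def replace_while_shrinking(find: str, replace: str, src: str) -> str:
    """Replace find by replace in src as long as src keeps shrinking."""
    new_len = str.__len__(src)  # raises a TypeError if src is not a str
    while True:
        src = src.replace(find, replace)
        old_len, new_len = new_len, len(src)
        if new_len >= old_len:
            # The last round did not shorten the string, so stop here.
            # This guarantees termination: the length is a non-negative
            # integer that strictly decreases in every continued round.
            return src


print(replace_while_shrinking("aba", "a", "abaababa"))
```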