pycommons.strings package

Common string handling routines.

Submodules

pycommons.strings.chars module

Constants for common characters.

pycommons.strings.chars.NBDASH: Final[str] = '‑'

A non-breaking hyphen

pycommons.strings.chars.NBSP: Final[str] = '\xa0'

A constant for non-breaking space

pycommons.strings.chars.NEWLINE: Final[str] = '\n\r\x85\u2028\u2029'

A string containing all line-breaking (newline) characters.

pycommons.strings.chars.WHITESPACE: Final[str] = '\t\x0b\x0c \xa0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u202f\u205f\u3000'

A string containing all white space characters that do not break lines.

pycommons.strings.chars.WHITESPACE_OR_NEWLINE: Final[str] = '\t\n\x0b\x0c\r \x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000'

A string containing all white space and newline characters.
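
Because these constants are plain strings of characters, one plausible use is a membership test or a regular-expression character class built from them. The sketch below copies the documented NEWLINE value into a local constant so that it runs without pycommons installed. The helper name split_lines is hypothetical and not part of the package.

```python
import re
from typing import Final

# Local copy of the documented NEWLINE value; in real code it would be
# imported from pycommons.strings.chars instead.
NEWLINE: Final[str] = "\n\r\x85\u2028\u2029"

# re.escape keeps the character class valid for any characters in the string.
_LINE_BREAKS: Final[re.Pattern] = re.compile(f"[{re.escape(NEWLINE)}]+")


def split_lines(text: str) -> list[str]:
    """Split text at runs of line-breaking characters (hypothetical helper)."""
    return [part for part in _LINE_BREAKS.split(text) if part]
```

A membership test such as `"\u2028" in NEWLINE` works the same way, because the constant is an ordinary str.
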

pycommons.strings.chars.subscript(s)[source]

Transform a string into Unicode-based subscript.

All characters that can be represented as subscript in Unicode are translated to subscript. Note that only a subset of the Latin characters can be converted to Unicode subscript. If any character cannot be translated, a KeyError is raised. White space is preserved.

Parameters:

s (str) – the string

Return type:

str

Returns:

the string in subscript

Raises:
  • KeyError – if s contains a character that cannot be translated

  • TypeError – if s is not a str

>>> subscript("a0= 4(e)")
'ₐ₀₌ ₄₍ₑ₎'
>>> try:
...     subscript("a0=4(e)Y")
... except KeyError as ke:
...     print(ke)
'Y'
>>> try:
...     subscript(None)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'NoneType'
>>> try:
...     subscript(1)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'int'
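
The translation described above can be pictured as a per-character table lookup. The following is a minimal sketch under stated assumptions, not the pycommons implementation: the table _SUBSCRIPTS is deliberately incomplete, and the name to_subscript is hypothetical.

```python
# Illustrative, incomplete mapping: digits, a few symbols, and the letters
# "a" and "e". The table actually used by pycommons may differ.
_SUBSCRIPTS: dict[str, str] = {str(i): chr(0x2080 + i) for i in range(10)}
_SUBSCRIPTS.update({
    "+": "\u208a", "-": "\u208b", "=": "\u208c",
    "(": "\u208d", ")": "\u208e", "a": "\u2090", "e": "\u2091",
})


def to_subscript(s: str) -> str:
    """Translate s character by character; unknown characters raise KeyError."""
    # White space is passed through unchanged, everything else is looked up.
    return "".join(c if c.isspace() else _SUBSCRIPTS[c] for c in s)
```

A plain dictionary lookup gives the documented failure mode for free: a character without a subscript form, such as "Y", surfaces as a KeyError.
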
pycommons.strings.chars.superscript(s)[source]

Transform a string into Unicode-based superscript.

All characters that can be represented as superscript in Unicode are translated to superscript. Note that only a subset of the Latin characters can be converted to Unicode superscript. If any character cannot be translated, a KeyError is raised. White space is preserved.

Parameters:

s (str) – the string

Return type:

str

Returns:

the string in superscript

Raises:
  • KeyError – if s contains a character that cannot be translated

  • TypeError – if s is not a str

>>> superscript("a0 =4(e)")
'ᵃ⁰ ⁼⁴⁽ᵉ⁾'
>>> try:
...     superscript("a0=4(e)Y")
... except KeyError as ke:
...     print(ke)
'Y'
>>> try:
...     superscript(None)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'NoneType'
>>> try:
...     superscript(1)
... except TypeError as te:
...     print(te)
descriptor '__iter__' requires a 'str' object but received a 'int'

pycommons.strings.enforce module

Routines for checking whether a value is a non-empty string, optionally without white space.

pycommons.strings.enforce.enforce_non_empty_str(value)[source]

Enforce that a text is a non-empty string.

Parameters:

value (Any) – the value to be checked whether it is a non-empty string

Return type:

str

Returns:

the value, but as type str

Raises:
  • TypeError – if value is not a str

  • ValueError – if value is empty

>>> enforce_non_empty_str("1")
'1'
>>> enforce_non_empty_str(" 1 1 ")
' 1 1 '
>>> try:
...     enforce_non_empty_str("")
... except ValueError as ve:
...     print(ve)
Non-empty str expected, but got empty string.
>>> try:
...     enforce_non_empty_str(1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     enforce_non_empty_str(None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
pycommons.strings.enforce.enforce_non_empty_str_without_ws(value)[source]

Enforce that a text is a non-empty string without white space.

Parameters:

value (Any) – the value to be checked whether it is a non-empty string without any white space

Return type:

str

Returns:

the value, but as type str

Raises:
  • TypeError – if value is not a str

  • ValueError – if value is empty or contains any white space characters

>>> enforce_non_empty_str_without_ws("1")
'1'
>>> try:
...     enforce_non_empty_str_without_ws(" 1 1 ")
... except ValueError as ve:
...     print(ve)
No white space allowed in string, but got ' 1 1 '.
>>> try:
...     enforce_non_empty_str_without_ws("a\tb")
... except ValueError as ve:
...     print(ve)
No white space allowed in string, but got 'a\tb'.
>>> try:
...     enforce_non_empty_str_without_ws("012345678901234567890 12345678")
... except ValueError as ve:
...     print(ve)
No white space allowed in string, but got '012345678901234567890 12345678'.
>>> try:
...     enforce_non_empty_str_without_ws(
...         "012345678901234567890 1234567801234567890123456789012345678")
... except ValueError as ve:
...     print(str(ve)[10:])
pace allowed in string, but got '012345678901234567890 12345678...'.
>>> try:
...     enforce_non_empty_str_without_ws("")
... except ValueError as ve:
...     print(ve)
Non-empty str expected, but got empty string.
>>> try:
...     enforce_non_empty_str_without_ws(1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     enforce_non_empty_str_without_ws(None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
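
The checks performed by the two functions above can be approximated as follows. This is a sketch under stated assumptions, not the library code: the name check_token is hypothetical, the error messages differ from those of pycommons, and str.isspace() stands in for the package's own definition of white space.

```python
def check_token(value: str) -> str:
    """Return value unchanged if it is a non-empty str without white space."""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    if not value:
        raise ValueError("expected a non-empty string")
    # Assumption: str.isspace() approximates the package's white space set.
    if any(c.isspace() for c in value):
        raise ValueError(f"white space is not allowed: {value!r}")
    return value
```

Returning the validated value lets the check be used inline, for example `name = check_token(raw_name)`.
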

pycommons.strings.string_conv module

Converting stuff to and from strings.

pycommons.strings.string_conv.bool_to_str(value)[source]

Convert a Boolean value to a string.

This function is the inverse of str_to_bool().

Parameters:

value (bool) – the Boolean value

Returns “T”:

if value == True

Returns “F”:

if value == False

Raises:

TypeError – if value is not a bool

Return type:

str

>>> print(bool_to_str(True))
T
>>> print(bool_to_str(False))
F
>>> try:
...     bool_to_str("t")
... except TypeError as te:
...     print(te)
value should be an instance of bool but is str, namely 't'.
>>> try:
...     bool_to_str(None)
... except TypeError as te:
...     print(te)
value should be an instance of bool but is None.
pycommons.strings.string_conv.datetime_to_date_str(date)[source]

Convert a datetime object to a date string.

Parameters:

date (datetime) – the date

Return type:

str

Returns:

the date string

Raises:

TypeError – if date is not an instance of datetime.datetime.

>>> from datetime import datetime
>>> datetime_to_date_str(datetime(1999, 12, 21))
'1999‑12‑21'
>>> try:
...     datetime_to_date_str(None)
... except TypeError as te:
...     print(te)
date should be an instance of datetime.datetime but is None.
>>> try:
...     datetime_to_date_str(1)
... except TypeError as te:
...     print(te)
date should be an instance of datetime.datetime but is int, namely 1.
pycommons.strings.string_conv.datetime_to_datetime_str(dateandtime)[source]

Convert a datetime object to a date-time string.

Parameters:

dateandtime (datetime) – the date and time

Return type:

str

Returns:

the date-time string

Raises:

TypeError – if dateandtime is not an instance of datetime.datetime.

>>> from datetime import datetime
>>> datetime_to_datetime_str(datetime(1999, 12, 21, 13, 42, 23))
'1999\u201112\u201121\xa013:42'
>>> from datetime import timezone
>>> datetime_to_datetime_str(datetime(1999, 12, 21, 13, 42,
...                                   tzinfo=timezone.utc))
'1999\u201112\u201121\xa013:42\xa0UTC'
>>> try:
...     datetime_to_datetime_str(None)
... except TypeError as te:
...     print(te)
dateandtime should be an instance of datetime.datetime but is None.
>>> try:
...     datetime_to_datetime_str(1)
... except TypeError as te:
...     print(str(te)[:60])
dateandtime should be an instance of datetime.datetime but i
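
The outputs above suggest that the result uses non-breaking hyphens and non-breaking spaces and omits the seconds. A minimal sketch that reproduces the two documented outputs with the standard library is shown below. The name format_datetime is hypothetical, and the real function may treat other time zones or edge cases differently.

```python
from datetime import datetime, timezone

NBDASH = "\u2011"  # non-breaking hyphen, same value as chars.NBDASH
NBSP = "\xa0"      # non-breaking space, same value as chars.NBSP


def format_datetime(dt: datetime) -> str:
    """Format dt as date and time with non-breaking separators (sketch)."""
    text = dt.strftime("%Y-%m-%d %H:%M")  # seconds are dropped
    zone = dt.tzname()  # None for naive datetime objects
    if zone:
        text = f"{text} {zone}"
    # Replace breakable separators so the result stays on one line.
    return text.replace("-", NBDASH).replace(" ", NBSP)
```

Note that this sketch would also replace a hyphen inside a time zone name, which may or may not match the library's behavior.
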
pycommons.strings.string_conv.float_to_str(value)[source]

Convert float to a string.

The floating point value value is converted to a string.

Parameters:

value (float) – the floating point value

Return type:

str

Returns:

the string representation

Raises:
  • TypeError – if value is not a float

  • ValueError – if value is not-a-number

>>> float_to_str(1.3)
'1.3'
>>> float_to_str(1.0)
'1'
>>> float_to_str(1e-5)
'1e-5'
>>> try:
...     float_to_str(1)
... except TypeError as te:
...     print(te)
value should be an instance of float but is int, namely 1.
>>> try:
...     float_to_str(None)
... except TypeError as te:
...     print(te)
value should be an instance of float but is None.
>>> from math import nan
>>> try:
...     float_to_str(nan)
... except ValueError as ve:
...     print(ve)
nan => 'nan' is not a permitted float.
>>> from math import inf
>>> float_to_str(inf)
'inf'
>>> float_to_str(-inf)
'-inf'
>>> float_to_str(1e300)
'1e300'
>>> float_to_str(-1e300)
'-1e300'
>>> float_to_str(-1e-300)
'-1e-300'
>>> float_to_str(1e-300)
'1e-300'
>>> float_to_str(1e1)
'10'
>>> float_to_str(1e5)
'100000'
>>> float_to_str(1e10)
'10000000000'
>>> float_to_str(1e20)
'1e20'
>>> float_to_str(1e030)
'1e30'
>>> float_to_str(0.0)
'0'
>>> float_to_str(-0.0)
'0'
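
The examples above show a compact rendering: a trailing ".0" is dropped, exponents lose their "+" sign and leading zeros, and negative zero becomes "0". One way to obtain the same outputs from repr() is sketched below. The name compact_float is hypothetical, and the approach is an assumption rather than the method pycommons actually uses.

```python
from math import isnan


def compact_float(value: float) -> str:
    """Render a float compactly (illustrative; not the pycommons code)."""
    if not isinstance(value, float):
        raise TypeError(f"expected float, got {type(value).__name__}")
    if isnan(value):
        raise ValueError("nan is not permitted")
    if value == 0.0:  # this comparison is also true for -0.0
        return "0"
    text = repr(value)  # shortest representation that round-trips
    mantissa, sep, exponent = text.partition("e")
    if mantissa.endswith(".0"):
        mantissa = mantissa[:-2]  # '1.0' becomes '1'
    if not sep:
        return mantissa  # no exponent part, includes 'inf' and '-inf'
    sign = "-" if exponent.startswith("-") else ""
    digits = exponent.lstrip("+-").lstrip("0")  # '+20' -> '20', '-05' -> '5'
    return f"{mantissa}e{sign}{digits}"
```

Because the sketch starts from repr(), the switch between plain and exponent notation follows Python's own rules, which is consistent with 1e10 printing in full and 1e20 printing as '1e20' above.
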
pycommons.strings.string_conv.int_or_none_to_str(value)[source]

Convert an integer or None to a string.

If value is None, “” is returned. If value is an instance of bool, a TypeError is raised. If value is an int, str(value) is returned. Otherwise, a TypeError is raised.

Parameters:

value (int | None) – the value

Return type:

str

Returns:

the string representation, “” for None

Returns “”:

if value is None

Returns int.__str__(value):

otherwise

Raises:

TypeError – if value is a bool (notice that bool is a subclass of int) or any other non-int type.

>>> print(repr(int_or_none_to_str(None)))
''
>>> print(int_or_none_to_str(12))
12
>>> try:
...     int_or_none_to_str(True)
... except TypeError as te:
...     print(te)
value should be an instance of int but is bool, namely True.
>>> try:
...     int_or_none_to_str(False)
... except TypeError as te:
...     print(te)
value should be an instance of int but is bool, namely False.
>>> print(int_or_none_to_str(-10))
-10
>>> try:
...     int_or_none_to_str(1.0)
... except TypeError as te:
...     print(te)
value should be an instance of int but is float, namely 1.0.
pycommons.strings.string_conv.num_or_none_to_str(value)[source]

Convert a numerical type (int, float) or None to a string.

If value is None, then “” is returned. Otherwise, the result of num_to_str() is returned.

Parameters:

value (int | float | None) – the value

Return type:

str

Returns:

the string representation, “” for None

Returns “”:

if value is None

Returns num_to_str(value):

otherwise

Raises:
  • TypeError – if value is not None but instead is a bool (notice that bool is a subclass of int) or any other type that is neither int nor float.

  • ValueError – if value is not-a-number

>>> print(repr(num_or_none_to_str(None)))
''
>>> print(num_or_none_to_str(12))
12
>>> print(num_or_none_to_str(12.3))
12.3
>>> try:
...     num_or_none_to_str(True)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely True.
>>> try:
...     num_or_none_to_str(False)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely False.
>>> from math import nan
>>> try:
...     num_or_none_to_str(nan)
... except ValueError as ve:
...     print(ve)
nan => 'nan' is not a permitted float.
pycommons.strings.string_conv.num_to_str(value)[source]

Transform a numerical value which is either int or float to a string.

If value is an instance of bool, a TypeError is raised. If value is an int, the result of its conversion via str is returned. Otherwise, the result of float_to_str() is returned. This means that nan yields a ValueError and anything that is neither an int nor a float incurs a TypeError.

Parameters:

value (int | float) – the value

Return type:

str

Returns:

the string

Raises:
  • TypeError – if value is a bool (notice that bool is a subclass of int) or any other type that is neither int nor float.

  • ValueError – if value is not-a-number

>>> num_to_str(1)
'1'
>>> num_to_str(1.5)
'1.5'
>>> try:
...     num_to_str(True)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely True.
>>> try:
...     num_to_str(False)
... except TypeError as te:
...     print(te)
value should be an instance of any in {float, int} but is bool, namely False.
>>> try:
...     num_to_str("x")
... except TypeError as te:
...     print(te)
value should be an instance of float but is str, namely 'x'.
>>> try:
...     num_to_str(None)
... except TypeError as te:
...     print(te)
value should be an instance of float but is None.
>>> from math import inf, nan
>>> try:
...     num_to_str(nan)
... except ValueError as ve:
...     print(ve)
nan => 'nan' is not a permitted float.
>>> num_to_str(inf)
'inf'
>>> num_to_str(-inf)
'-inf'
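
The explicit rejection of bool matters because bool is a subclass of int, so a plain isinstance(value, int) test accepts True and False. The sketch below illustrates only that ordering of checks. The name number_text is hypothetical, and float values are rendered with plain str() here rather than with the compact format of float_to_str().

```python
from typing import Union


def number_text(value: Union[int, float]) -> str:
    """Convert an int or float to text, rejecting bool (illustrative)."""
    # bool must be tested first: isinstance(True, int) evaluates to True.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"expected int or float, got {type(value).__name__}")
    if value != value:  # only nan is unequal to itself
        raise ValueError("nan is not permitted")
    return str(value)
```
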
pycommons.strings.string_conv.str_to_bool(value)[source]

Convert a string to a boolean value.

This function is the inverse of bool_to_str().

Parameters:

value (str) – the string value

Returns True:

if value == “T”

Returns False:

if value == “F”

Raises:
  • TypeError – if value is not a str

  • ValueError – if value is neither “T” nor “F”

Return type:

bool

>>> str_to_bool("T")
True
>>> str_to_bool("F")
False
>>> try:
...     str_to_bool("x")
... except ValueError as v:
...     print(v)
Expected 'T' or 'F', but got 'x'.
>>> try:
...     str_to_bool(1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     str_to_bool(None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
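
Since bool_to_str() and str_to_bool() are documented as inverses, a round trip through both should return the original value. The sketch below mirrors the documented “T”/“F” encoding with two hypothetical stand-ins, flag_to_text and text_to_flag, so that the property can be tried without pycommons installed. It is an illustration, not the library code.

```python
def flag_to_text(value: bool) -> str:
    """Encode a bool as 'T' or 'F' (stand-in for bool_to_str)."""
    if not isinstance(value, bool):
        raise TypeError(f"expected bool, got {type(value).__name__}")
    return "T" if value else "F"


def text_to_flag(value: str) -> bool:
    """Decode 'T' or 'F' into a bool (stand-in for str_to_bool)."""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    if value == "T":
        return True
    if value == "F":
        return False
    raise ValueError(f"Expected 'T' or 'F', but got {value!r}.")
```

With these definitions, `text_to_flag(flag_to_text(b)) is b` holds for both Boolean values.
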
pycommons.strings.string_conv.str_to_int_or_none(value)[source]

Convert a string to an int or None.

If value is None, then None is returned. If value is empty or consists entirely of white space, None is returned. If value can be converted to an integer, then an int with the corresponding value is returned. Otherwise, a ValueError is raised.

Parameters:

value (str | None) – the string value, or None

Return type:

int | None

Returns:

the int or None

Raises:
  • TypeError – if value is neither a str nor None

  • ValueError – if value is a str but cannot be base-10 converted to an integer

>>> print(str_to_int_or_none(""))
None
>>> print(str_to_int_or_none("5"))
5
>>> print(str_to_int_or_none(None))
None
>>> print(str_to_int_or_none("  "))
None
>>> try:
...     print(str_to_int_or_none(1.3))
... except TypeError as te:
...     print(te)
value should be an instance of str but is float, namely 1.3.
>>> try:
...     print(str_to_int_or_none("1.3"))
... except ValueError as ve:
...     print(ve)
invalid literal for int() with base 10: '1.3'
pycommons.strings.string_conv.str_to_num(value)[source]

Convert a string to an int or float.

If value is not an instance of str, a TypeError is raised. If value can be converted to an integer, then an int with the corresponding value is returned. If value can be converted to a float, a float with the appropriate value is returned. Otherwise, a ValueError is raised.

Parameters:

value (str) – the string value

Return type:

int | float

Returns:

the int or float; integers are preferred wherever possible

Raises:
  • TypeError – if value is not a str

  • ValueError – if value is a str but cannot be converted to an integer (base-10) or a float, or if it converts to a float which is not a number

>>> print(type(str_to_num("15.0")))
<class 'int'>
>>> print(type(str_to_num("15.1")))
<class 'float'>
>>> str_to_num("inf")
inf
>>> str_to_num("  -inf  ")
-inf
>>> try:
...     str_to_num(21)
... except TypeError as te:
...     print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'int' object
>>> try:
...     str_to_num("nan")
... except ValueError as ve:
...     print(ve)
NaN is not permitted, but got 'nan'.
>>> try:
...     str_to_num("12-3")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '12-3'.
>>> str_to_num("1e34423")
inf
>>> str_to_num("-1e34423")
-inf
>>> str_to_num("-1e-34423")
0
>>> str_to_num("1e-34423")
0
>>> try:
...     str_to_num("-1e-34e4423")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '-1e-34e4423'.
>>> try:
...     str_to_num("T")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'T'.
>>> try:
...     str_to_num("F")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'F'.
>>> try:
...     str_to_num(None)
... except TypeError as te:
...     print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'NoneType' object
>>> try:
...     str_to_num("")
... except ValueError as ve:
...     print(ve)
Value '' becomes empty after stripping, cannot be converted to a number.
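
The integer-first behavior shown above ("15.0" yields an int, underflowing values yield 0, infinities are kept as floats) can be sketched with the standard library as follows. The name parse_num and the error messages are hypothetical, and the real function may accept or reject inputs that this sketch treats differently.

```python
from math import isfinite, isnan
from typing import Union


def parse_num(value: str) -> Union[int, float]:
    """Parse text into an int if possible, else a float (illustrative)."""
    text = str.strip(value)  # raises TypeError if value is not a str
    if not text:
        raise ValueError("empty after stripping")
    try:
        return int(text)  # base-10 integers are tried first
    except ValueError:
        pass
    try:
        result = float(text)
    except ValueError as err:
        raise ValueError(f"Invalid numerical value {value!r}.") from err
    if isnan(result):
        raise ValueError(f"NaN is not permitted, but got {value!r}.")
    # Finite floats with an integral value are returned as int.
    if isfinite(result) and result.is_integer():
        return int(result)
    return result
```

Calling str.strip(value) instead of value.strip() makes a non-str argument fail with a TypeError, which matches the kind of message shown in the examples above.
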
pycommons.strings.string_conv.str_to_num_or_none(value)[source]

Convert a string to an int or float or None.

If value is None, then None is returned. If value is empty or consists entirely of white space, None is returned. If value can be converted to an integer, then an int with the corresponding value is returned. If value can be converted to a float, a float with the appropriate value is returned. Otherwise, a ValueError is raised.

Parameters:

value (str | None) – the string value

Return type:

int | float | None

Returns:

the int or float or None

Raises:
  • TypeError – if value is neither a str nor None

  • ValueError – if value is a str but cannot be converted to an integer (base-10) or a float, or if it converts to a float which is not a number

>>> print(type(str_to_num_or_none("15.0")))
<class 'int'>
>>> print(type(str_to_num_or_none("15.1")))
<class 'float'>
>>> str_to_num_or_none("inf")
inf
>>> str_to_num_or_none("  -inf  ")
-inf
>>> try:
...     str_to_num_or_none(21)
... except TypeError as te:
...     print(te)
descriptor 'strip' for 'str' objects doesn't apply to a 'int' object
>>> try:
...     str_to_num_or_none("nan")
... except ValueError as ve:
...     print(ve)
NaN is not permitted, but got 'nan'.
>>> try:
...     str_to_num_or_none("12-3")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '12-3'.
>>> str_to_num_or_none("1e34423")
inf
>>> str_to_num_or_none("-1e34423")
-inf
>>> str_to_num_or_none("-1e-34423")
0
>>> str_to_num_or_none("1e-34423")
0
>>> try:
...     str_to_num_or_none("-1e-34e4423")
... except ValueError as ve:
...     print(ve)
Invalid numerical value '-1e-34e4423'.
>>> try:
...     str_to_num_or_none("T")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'T'.
>>> try:
...     str_to_num_or_none("F")
... except ValueError as ve:
...     print(ve)
Invalid numerical value 'F'.
>>> print(str_to_num_or_none(""))
None
>>> print(str_to_num_or_none(None))
None
>>> print(type(str_to_num_or_none("5.0")))
<class 'int'>
>>> print(type(str_to_num_or_none("5.1")))
<class 'float'>

pycommons.strings.tools module

Routines for handling strings.

pycommons.strings.tools.get_prefix_str(strings)[source]

Compute the common prefix of an iterable of strings.

Parameters:

strings (Union[str, Iterable[str]]) – the iterable of strings

Return type:

str

Returns:

the common prefix

Raises:

TypeError – if the input is not a string or an iterable of strings, or if it contains any non-string element before the prefix is determined. Notice: if the prefix is determined to be the empty string, the search stops. Non-str items that follow later in strings may then not raise a TypeError.

>>> get_prefix_str(["abc", "acd"])
'a'
>>> get_prefix_str(["xyz", "gsdf"])
''
>>> get_prefix_str([])
''
>>> get_prefix_str(["abx"])
'abx'
>>> get_prefix_str(("abx", ))
'abx'
>>> get_prefix_str({"abx", })
'abx'
>>> get_prefix_str("abx")
'abx'
>>> get_prefix_str(("\\relative.path", "\\relative.figure",
...     "\\relative.code"))
'\\relative.'
>>> get_prefix_str({"\\relative.path", "\\relative.figure",
...     "\\relative.code"})
'\\relative.'
>>> try:
...     get_prefix_str(None)
... except TypeError as te:
...     print(te)
strings should be an instance of any in {str, typing.Iterable} but is None.
>>> try:
...     get_prefix_str(1)
... except TypeError as te:
...     print(str(te)[:60])
strings should be an instance of any in {str, typing.Iterabl
>>> try:
...     get_prefix_str(["abc", "acd", 2, "x"])
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     get_prefix_str(["abc", "acd", None, "x"])
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> get_prefix_str(["xyz", "gsdf", 5])
''
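
The early stop mentioned in the note on TypeError can be seen in a simple prefix computation. The following is a minimal sketch, not the pycommons implementation: the name common_prefix is hypothetical and no type checking is performed.

```python
from typing import Iterable, Optional, Union


def common_prefix(strings: Union[str, Iterable[str]]) -> str:
    """Compute the longest common prefix of the given strings (sketch)."""
    if isinstance(strings, str):
        return strings  # a single string is its own prefix
    prefix: Optional[str] = None
    for item in strings:
        if prefix is None:
            prefix = item  # the first element is the initial candidate
            continue
        matched = 0
        for a, b in zip(prefix, item):
            if a != b:
                break
            matched += 1
        prefix = prefix[:matched]
        if not prefix:
            break  # nothing in common: later items are never inspected
    return prefix or ""
```

Because the loop exits as soon as the prefix is empty, any elements after that point are never looked at, which is why a trailing non-str item can go unnoticed.
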
pycommons.strings.tools.replace_regex(search, replace, inside)[source]

Replace all occurrences of ‘search’ in ‘inside’ with ‘replace’.

This replacement procedure is applied repeatedly until no occurrence of search is found anymore. Since this may lead to an endless loop, a ValueError is raised if there are too many replacements.

Parameters:
  • search (Union[str, Pattern]) – the regular expression to search, either a string or a pattern

  • replace (Union[str, Callable[[Match], str]]) – the string to replace it with, or a function receiving a re.Match instance and returning a replacement string

  • inside (str) – the string in which to search/replace

Return type:

str

Returns:

the new string after the recursive replacement

Raises:
  • TypeError – if any of the parameters is not of the right type

  • ValueError – if there are 100000 recursive replacements or more, indicating that there could be an endless loop

>>> replace_regex('[ \t]+\n', '\n', ' bla \nxyz\tabc\t\n')
' bla\nxyz\tabc\n'
>>> replace_regex('[0-9]A', 'X', '23A7AA')
'2XXA'
>>> from re import compile as cpx
>>> replace_regex(cpx('[0-9]A'), 'X', '23A7AA')
'2XXA'
>>> def __repl(a):
...     print(repr(a))
...     return "y"
>>> replace_regex("a.b", __repl, "albaab")
<re.Match object; span=(0, 3), match='alb'>
<re.Match object; span=(3, 6), match='aab'>
'yy'
>>> def __repl(a):
...     print(repr(a))
...     ss = a.group()
...     print(ss)
...     return "axb"
>>> replace_regex("aa.bb", __repl, "aaaaaxbbbbb")
<re.Match object; span=(3, 8), match='aaxbb'>
aaxbb
<re.Match object; span=(2, 7), match='aaxbb'>
aaxbb
<re.Match object; span=(1, 6), match='aaxbb'>
aaxbb
<re.Match object; span=(0, 5), match='aaxbb'>
aaxbb
'axb'
>>> replace_regex("aa.bb", "axb", "aaaaaxbbbbb")
'axb'
>>> replace_regex("aa.bb", "axb", "".join("a" * 100 + "y" + "b" * 100))
'axb'
>>> replace_regex("aa.bb", "axb",
...               "".join("a" * 10000 + "y" + "b" * 10000))
'axb'
>>> try:
...    replace_regex(1, "1", "2")
... except TypeError as te:
...    print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...    replace_regex(None, "1", "2")
... except TypeError as te:
...    print(te)
search should be an instance of any in {str, typing.Pattern} but is None.
>>> try:
...    replace_regex("x", 2, "2")
... except TypeError as te:
...    print(te)
replace should be an instance of str or a callable but is int, namely 2.
>>> try:
...    replace_regex("x", None, "2")
... except TypeError as te:
...    print(te)
replace should be an instance of str or a callable but is None.
>>> try:
...    replace_regex(1, 1, "2")
... except TypeError as te:
...    print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...    replace_regex("yy", "1", 3)
... except TypeError as te:
...    print(te)
inside should be an instance of str but is int, namely 3.
>>> try:
...    replace_regex("adad", "1", None)
... except TypeError as te:
...    print(te)
inside should be an instance of str but is None.
>>> try:
...    replace_regex(1, "1", 3)
... except TypeError as te:
...    print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...    replace_regex(1, 3, 5)
... except TypeError as te:
...    print(str(te)[0:60])
search should be an instance of any in {str, typing.Pattern}
>>> try:
...     replace_regex("abab|baab|bbab|aaab|aaaa|bbbb", "baba",
...                   "ababababab")
... except ValueError as ve:
...     print(str(ve)[:50])
Too many replacements, pattern re.compile('abab|ba
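
The repeat-until-stable behavior described above can be sketched with re.subn, which reports how many substitutions it made. This is an illustration under stated assumptions: the name replace_until_stable is hypothetical, the limit here counts passes over the string and its default merely echoes the documented figure of 100000, and the real function may count and report differently.

```python
import re
from typing import Callable, Pattern, Union


def replace_until_stable(
        search: Union[str, Pattern],
        replace: Union[str, Callable[[re.Match], str]],
        inside: str, max_rounds: int = 100_000) -> str:
    """Apply the substitution repeatedly until nothing matches (sketch)."""
    pattern = re.compile(search)  # an already compiled pattern is reused
    for _ in range(max_rounds):
        inside, count = pattern.subn(replace, inside)
        if count == 0:
            return inside  # stable: no occurrence is left
    raise ValueError(
        f"Too many replacements, pattern {pattern!r} may loop forever.")
```

A replacement that re-creates its own match, such as replacing "a" with "a", never reaches a stable state and ends in the ValueError.
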
pycommons.strings.tools.replace_str(find, replace, src)[source]

Perform a recursive replacement of strings.

After applying this function, there will not be any occurrence of find left in src. All of them will have been replaced by replace. If that produces new instances of find, these will be replaced as well unless they do not make the string shorter. In other words, the replacement is continued only if the new string becomes shorter.

See replace_regex() for regular-expression based replacements.

Parameters:
  • find (str) – the string to find

  • replace (str) – the string with which it will be replaced

  • src (str) – the string in which we search

Return type:

str

Returns:

the string src, with all occurrences of find replaced by replace

Raises:

TypeError – if any of the parameters are not strings

>>> replace_str("a", "b", "abc")
'bbc'
>>> replace_str("aa", "a", "aaaaa")
'a'
>>> replace_str("aba", "a", "abaababa")
'aa'
>>> replace_str("aba", "aba", "abaababa")
'abaababa'
>>> replace_str("aa", "aa", "aaaaaaaa")
'aaaaaaaa'
>>> replace_str("a", "aa", "aaaaaaaa")
'aaaaaaaaaaaaaaaa'
>>> replace_str("a", "xx", "aaaaaaaa")
'xxxxxxxxxxxxxxxx'
>>> try:
...     replace_str(None, "a", "b")
... except TypeError as te:
...     print(te)
replace() argument 1 must be str, not None
>>> try:
...     replace_str(1, "a", "b")
... except TypeError as te:
...     print(te)
replace() argument 1 must be str, not int
>>> try:
...     replace_str("a", None, "b")
... except TypeError as te:
...     print(te)
replace() argument 2 must be str, not None
>>> try:
...     replace_str("x", 1, "b")
... except TypeError as te:
...     print(te)
replace() argument 2 must be str, not int
>>> try:
...     replace_str("a", "v", None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> try:
...     replace_str("x", "xy", 1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
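
The rule that replacement continues only while the string gets shorter guarantees termination without a round limit. A minimal sketch consistent with the examples above follows. The name replace_while_shrinking is hypothetical, and this is not the library code.

```python
def replace_while_shrinking(find: str, replace: str, src: str) -> str:
    """Replace find by replace until the string stops getting shorter."""
    while True:
        new = src.replace(find, replace)
        if len(new) >= len(src):
            # No progress in length: keep this last result and stop.
            return new
        src = new
```

Each further pass requires a strictly shorter string, so the loop runs at most len(src) additional times. This is also why replacing "a" with "aa" is applied exactly once.
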
pycommons.strings.tools.split_str(source, split_by)[source]

Split a string by the given other string.

The goal is to provide a less memory-intensive variant of the method str.split(). This routine iteratively divides a given string based on a splitting character or string. It may be useful when dealing with a very large source string that should be split into smaller strings one at a time. Instead of creating a list with many small strings, as str.split() does, it creates these strings iteratively.

Parameters:
  • source (str) – the source string

  • split_by (str) – the split string

Return type:

Generator[str, None, None]

Returns:

each split element

>>> list(split_str("", ""))
['']
>>> list(split_str("", "x"))
['']
>>> list(split_str("a", ""))
['a']
>>> list(split_str("abc", ""))
['a', 'b', 'c']
>>> list(split_str("a;b;c", ";"))
['a', 'b', 'c']
>>> list(split_str("a;b;c;", ";"))
['a', 'b', 'c', '']
>>> list(split_str(";a;b;;c;", ";"))
['', 'a', 'b', '', 'c', '']
>>> list(split_str("a;aaa;aba;aa;aca;a", "a;a"))
['', 'a', 'b', '', 'c', '']
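
A generator built on str.find() is one way to get the lazy splitting described above. The sketch below reproduces the listed examples, including the special handling of an empty separator. The name iter_split is hypothetical, and the real implementation may differ.

```python
from typing import Iterator


def iter_split(source: str, split_by: str) -> Iterator[str]:
    """Lazily yield the pieces of source separated by split_by (sketch)."""
    if not split_by:
        # Empty separator: yield single characters, or '' for empty input.
        if source:
            yield from source
        else:
            yield ""
        return
    start = 0
    step = len(split_by)
    while True:
        end = source.find(split_by, start)
        if end < 0:
            yield source[start:]  # the remainder after the last separator
            return
        yield source[start:end]
        start = end + step
```

Only one piece exists at a time unless the caller collects them, for example with `list(iter_split("a;b;c", ";"))`.
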