pycommons.ds package

Some common and shared data structures.

Submodules

pycommons.ds.cache module

A factory for functions checking whether argument values are new.

pycommons.ds.cache.str_is_new()[source]

Create a function returning True when seeing new str values.

Creates a function which returns True only the first time it receives a given string argument and False all subsequent times. This is based on https://stackoverflow.com/questions/27427067

Return type:

Callable[[str], bool]

Returns:

a function str_is_new(xx) that will return True the first time it encounters any value xx and False for all values it has already seen

>>> check = str_is_new()
>>> print(check("a"))
True
>>> print(check("a"))
False
>>> print(check("b"))
True
>>> print(check("b"))
False
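
The behaviour shown above can be approximated with a closure over a private set. This is only a minimal sketch and not necessarily how pycommons implements str_is_new(); the names make_str_is_new and is_new are hypothetical and chosen for illustration.

```python
from typing import Callable


def make_str_is_new() -> Callable[[str], bool]:
    """Hypothetical stand-in for str_is_new(), built on a private set."""
    seen: set[str] = set()  # every string observed so far

    def is_new(value: str) -> bool:
        # A value that is already stored has been seen before.
        if value in seen:
            return False
        seen.add(value)
        return True

    return is_new


check = make_str_is_new()
results = [check("a"), check("a"), check("b"), check("b")]
print(results)  # [True, False, True, False]
```

Each call to the factory creates its own set, so two checkers created this way do not share what they have seen.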

pycommons.ds.immutable_map module

An immutable version of the typing.Mapping interface.

class pycommons.ds.immutable_map.K

the type variable for mapping keys

alias of TypeVar('K')

class pycommons.ds.immutable_map.V

the type variable for mapping values

alias of TypeVar('V')

pycommons.ds.immutable_map.immutable_mapping(a)[source]

Create an immutable view of a Mapping.

Parameters:

a (Mapping[TypeVar(K), TypeVar(V)]) – the input Mapping

Return type:

Mapping[TypeVar(K), TypeVar(V)]

Returns:

an immutable view on the Mapping a (the view will change if a is changed, but you cannot change a via the view)

>>> x = {1: 1, 2: 7, 3: 8}
>>> y = immutable_mapping(x)
>>> x is y
False
>>> x == y
True
>>> x[1] == y[1]
True
>>> x[2] == y[2]
True
>>> x[3] == y[3]
True
>>> z = immutable_mapping(x)
>>> x is z
False
>>> x == z
True
>>> y is z
False
>>> z = immutable_mapping(y)
>>> x is z
False
>>> y is z
True
>>> x == z
True
>>> x[9] = 23
>>> y[9] == x[9]
True
>>> try:
...     y[1] = 2
... except TypeError as te:
...     print(te)
'mappingproxy' object does not support item assignment
>>> try:
...     immutable_mapping(5)
... except TypeError as e:
...     print(e)
a should be an instance of typing.Mapping but is int, namely 5.
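
The error message 'mappingproxy' in the example above suggests that the view behaves like the standard library's types.MappingProxyType. The following sketch reproduces the observable behaviour under that assumption. The function name read_only_view and its error text are hypothetical and not part of pycommons.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import TypeVar

K = TypeVar("K")
V = TypeVar("V")


def read_only_view(a: Mapping[K, V]) -> Mapping[K, V]:
    """Hypothetical re-creation of the behaviour shown above."""
    if not isinstance(a, Mapping):
        raise TypeError(f"a should be a Mapping but is {type(a).__name__}.")
    if isinstance(a, MappingProxyType):
        return a  # already a read-only view, so do not wrap it again
    return MappingProxyType(a)


source = {1: 1, 2: 7}
view = read_only_view(source)
source[9] = 23  # changes to the source are visible through the view
print(view[9])  # 23
print(read_only_view(view) is view)  # True
```

Because the proxy holds a reference to the original mapping rather than a copy, creating it is cheap, but anyone who still holds the original can change what the view shows.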

pycommons.ds.sequences module

Tools for working with sequences.

class pycommons.ds.sequences.T

the type of the element of the sequences to process

alias of TypeVar('T')

pycommons.ds.sequences.merge_sorted_and_return_unique(*seqs)[source]

Merge sorted sequences of integers and return only unique values.

You can provide multiple sequences, all of which must be sorted. This function merges them into a single sorted sequence that contains each element at most once. A typical use case is combining the result of pycommons.math.primes.primes() with some predefined values into one sorted sequence.

Notice that the elements of the sequences must support the less-than operator, i.e., have a __lt__ dunder method. Otherwise this function raises a TypeError.

The returned sequence is guaranteed to provide strictly increasing values.

Parameters:

seqs (Iterable[TypeVar(T)]) – the sequences, i.e., some instances of Iterable or Iterator

Return type:

Generator[TypeVar(T), None, None]

Returns:

a merged sequence of integers

Raises:

TypeError – if any of the provided iterators or any of their elements is None, or if any of the elements in seqs is not an Iterable.

>>> list(merge_sorted_and_return_unique([1, 2, 3,], [2, 2]))
[1, 2, 3]
>>> from pycommons.math.primes import primes
>>> list(merge_sorted_and_return_unique(primes(14), [1, 10]))
[1, 2, 3, 5, 7, 10, 11, 13]
>>> list(merge_sorted_and_return_unique(
...     primes(14), primes(17), [1, 2, 10, 100]))
[1, 2, 3, 5, 7, 10, 11, 13, 17, 100]
>>> try:
...     for _ in merge_sorted_and_return_unique(1):
...         pass
... except TypeError as te:
...     print(te)
'int' object is not iterable
>>> try:
...     for j in merge_sorted_and_return_unique([3], 1):
...         print(j)
... except TypeError as te:
...     print(te)
'int' object is not iterable
>>> try:
...     for j in merge_sorted_and_return_unique([None], [None]):
...         print(j)
... except TypeError as te:
...     print(te)
Element must not be None.
>>> try:
...     for j in merge_sorted_and_return_unique([None], [1]):
...         print(j)
... except TypeError as te:
...     print(te)
'<' not supported between instances of 'NoneType' and 'int'
>>> try:
...     for j in merge_sorted_and_return_unique(None, [1]):
...         print(j)
... except TypeError as te:
...     print(te)
'NoneType' object is not iterable
>>> try:
...     for j in merge_sorted_and_return_unique([print, len], [repr]):
...         print(j)
... except TypeError as te:
...     print(te)
'<' not supported between instances of 'builtin_function_or_method' and 'builtin_function_or_method'
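
The merging and de-duplication described above can be sketched with heapq.merge from the standard library. This is an illustration under stated assumptions, not the pycommons implementation: the name merge_unique is hypothetical, and the sketch omits the explicit None checks and error messages shown in the examples.

```python
from heapq import merge
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def merge_unique(*seqs: Iterable[T]) -> Iterator[T]:
    """Hypothetical sketch: merge sorted inputs, skipping repeated values."""
    have_last = False
    last = None
    for item in merge(*seqs):  # lazily yields all items in sorted order
        if have_last and not (last < item):
            continue  # not larger than the previous value, so a duplicate
        have_last = True
        last = item
        yield item


print(list(merge_unique([1, 2, 3], [2, 2])))      # [1, 2, 3]
print(list(merge_unique([2, 3, 5, 7], [1, 10])))  # [1, 2, 3, 5, 7, 10]
```

Only values strictly larger than the previously yielded one are passed on, which is what makes the output strictly increasing.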

pycommons.ds.sequences.reiterable(source)[source]

Ensure that an Iterable can be iterated over multiple times.

This function will solidify an Iterator into an Iterable. In Python, Iterator is a sub-class of Iterable. This means that if your function accepts instances of Iterable as input, it may expect to be able to iterate over them multiple times. However, if an Iterator is passed in, which also is an instance of Iterable and thus fulfills the function’s type requirement, this is not the case. A typical example of this would be if a Generator is passed in. A Generator is an instance of Iterator, which, in turn, is an instance of Iterable. However, you can iterate over a Generator only once.
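
The problem described above can be reproduced with plain Python. In this small sketch, the hypothetical function mean_and_max iterates over its argument twice; with a list this works, but with a Generator the second pass finds nothing left.

```python
from typing import Iterable


def mean_and_max(values: Iterable[int]) -> tuple[float, int]:
    """Hypothetical consumer that walks over its input twice."""
    total = 0
    count = 0
    for v in values:  # first pass
        total += v
        count += 1
    largest = max(values, default=-1)  # second pass
    return total / count, largest


print(mean_and_max([1, 2, 3]))  # (2.0, 3)
print(mean_and_max(i for i in [1, 2, 3]))  # (2.0, -1): nothing left to see
```

Both calls satisfy the Iterable type annotation, so a static type checker would not flag the second one.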

For such single-use objects, a new Iterable wrapper is created. This wrapper will iterate over the original sequence, but cache all elements in an internal list. When you iterate over the sequence again, the elements in the list will be used. This means that all elements of the original sequence will be stored in memory. However, they are only stored if/when they are actually accessed via the iteration sequence. If you do not iterate over them completely, they are not all stored.

This form of re-iteration is useful if the items come from a slow source or if you do not plan to use all of them. If you are going to use all elements several times anyway, it may be more efficient to simply wrap the original source into a tuple. But if, for example, your sequence results from walking a directory tree on the file system or from loading a file, then using reiterable() could be useful.

This is also true if you process the generated sequence in a way that may fail or terminate early. Loading all data into a tuple first is wasteful if the very first element you process then causes a failure or early termination. The small overhead of reiterable() may then well be worth it.

Of course, this only works if the Iterator is not used anywhere else after calling this function. If you extract elements from the original Iterator yourself, for example via next(), then reiterable() cannot work. However, if you only apply next() or other looping constructs to the Iterable returned by reiterable(), then you can iterate over, for example, a Generator as often as you want.

Parameters:

source (Union[Iterable[TypeVar(T)], Iterator[TypeVar(T)]]) – the data source

Return type:

Iterable[TypeVar(T)]

Returns:

an Iterable that can be iterated over multiple times

Raises:

TypeError – if source is neither an Iterable nor an Iterator.

>>> g = (i ** 2 for i in range(5))
>>> r = reiterable(g)
>>> tuple(r)
(0, 1, 4, 9, 16)
>>> tuple(r)
(0, 1, 4, 9, 16)
>>> tuple(r)
(0, 1, 4, 9, 16)
>>> tuple(r)
(0, 1, 4, 9, 16)
>>> tuple(g)
()
>>> g = (i ** 2 for i in range(5))
>>> r = reiterable(g)
>>> i1 = iter(r)
>>> i2 = iter(r)
>>> next(i1)
0
>>> next(i2)
0
>>> next(i2)
1
>>> next(i1)
1
>>> next(i1)
4
>>> next(i1)
9
>>> next(i2)
4
>>> next(i2)
9
>>> i3 = iter(r)
>>> next(i3)
0
>>> next(i3)
1
>>> next(i3)
4
>>> next(i3)
9
>>> next(i3)
16
>>> next(i2)
16
>>> try:
...     next(i2)
... except StopIteration as si:
...     print(type(si))
<class 'StopIteration'>
>>> try:
...     next(i3)
... except StopIteration as si:
...     print(type(si))
<class 'StopIteration'>
>>> next(i1)
16
>>> try:
...     next(i1)
... except StopIteration as si:
...     print(type(si))
<class 'StopIteration'>
>>> a = [1, 2, 3]
>>> reiterable(a) is a
True
>>> a = (1, 2, 3)
>>> reiterable(a) is a
True
>>> a = {1, 2, 3}
>>> reiterable(a) is a
True
>>> a = {1: 1, 2: 2, 3: 3}
>>> reiterable(a) is a
True
>>> k = a.keys()
>>> reiterable(k) is k
True
>>> k = a.values()
>>> reiterable(k) is k
True
>>> tuple(reiterable((x for x in range(5))))
(0, 1, 2, 3, 4)
>>> try:
...     reiterable(None)
... except TypeError as te:
...     print(str(te)[:60])
source should be an instance of any in {typing.Iterable, typ
>>> try:
...     reiterable(1)
... except TypeError as te:
...     print(str(te)[:60])
source should be an instance of any in {typing.Iterable, typ
>>> type(merge_sorted_and_return_unique([1, 2, 3,], [2, 2]))
<class 'generator'>
>>> type(reiterable(merge_sorted_and_return_unique([1, 2, 3,], [2, 2])))
<class 'pycommons.ds.sequences.__Reiterator'>
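
The lazy caching described for reiterable() can be sketched as a small wrapper class. This is a simplified illustration, not the actual __Reiterator: the class name CachingIterable and its attributes are hypothetical, and the sketch does not attempt to be thread-safe.

```python
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")


class CachingIterable(Generic[T]):
    """Hypothetical sketch of a lazily caching wrapper around an Iterator."""

    def __init__(self, source: Iterator[T]) -> None:
        self._source = source      # the single-use iterator
        self._cache: list[T] = []  # elements pulled from it so far

    def __iter__(self) -> Iterator[T]:
        index = 0
        while True:
            if index >= len(self._cache):
                # Nothing cached at this position yet: pull one more element.
                try:
                    self._cache.append(next(self._source))
                except StopIteration:
                    return  # the source is exhausted
            yield self._cache[index]
            index += 1


squares = CachingIterable(i ** 2 for i in range(5))
first = iter(squares)
print(next(first), next(first))  # 0 1
print(len(squares._cache))       # 2, only what was requested is stored
print(tuple(squares))            # (0, 1, 4, 9, 16)
print(tuple(squares))            # (0, 1, 4, 9, 16)
```

Every iterator created from the wrapper keeps its own position but shares the cache, so interleaved iterations see the same elements in the same order.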