pycommons.net package

Utilities for networking.

Submodules

pycommons.net.url module

Some string splitting and processing routines.

class pycommons.net.url.URL(value: Any, base_url: Any | None = None)

Bases: str

A normalized and expanded URL.

This is a very strict URL parsing routine: it only produces URLs that are safe to use in almost any environment and raises an exception for everything else.

We limit URLs to very few types and allowed schemes. Non-ASCII characters are not allowed, and neither are spaces, ‘%’, ‘*’, ‘?’, ‘+’, ‘&’, ‘<’, ‘>’, ‘,’, ‘$’, ‘§’, ‘'’, ‘"’, ‘[’, ‘]’, ‘{’, ‘}’, ‘(’, ‘)’, ‘`’, ‘\’, and a few more characters.

The character ‘@’ may occur at most once. Because ‘?’, ‘&’, and ‘%’ are forbidden, URLs cannot carry query parameters, and non-ASCII characters cannot be URL-escaped (percent-encoded) either. We thus limit URLs mainly to pointers to static content.
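The character rules above can be approximated with a small filter. This is a minimal sketch, not the pycommons implementation: the function name `check_characters` and the set `FORBIDDEN` are hypothetical, and since the text says "and a few more", the real blacklist is larger. ‘§’ is omitted from the set because the non-ASCII test already rejects it.

```python
# Hypothetical approximation of the character rules; NOT the pycommons code.
# FORBIDDEN is an assumed subset of the real blacklist.
FORBIDDEN: frozenset[str] = frozenset(" %*?+&<>,$'\"[]{}()`\\")


def check_characters(part: str) -> str:
    """Return `part` unchanged, or raise ValueError if it breaks a rule."""
    if not part.isascii():  # also rejects characters such as '§' or 'ä'
        raise ValueError(f"{part!r} contains non-ASCII characters.")
    for ch in part:
        if ch in FORBIDDEN:
            raise ValueError(f"{part!r} contains the forbidden text {ch!r}.")
    if part.count("@") > 1:  # '@' may occur at most once
        raise ValueError(f"{part!r} contains '@' more than once.")
    return part
```

For example, `check_characters("mailto://tweise@hfuu.edu.cn")` passes, while `"tweise@@hfuu.edu.cn"` or `"https://exa mple.com"` raise a `ValueError`.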

We also only permit simple schemes such as http, https, mailto, and ssh.
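A scheme whitelist can be sketched with the standard library's `urllib.parse.urlsplit`. The set `ALLOWED_SCHEMES` and the function `check_scheme` are hypothetical; the text only says "such as http, https, mailto, and ssh", so the exact set accepted by pycommons may differ.

```python
from urllib.parse import urlsplit

# Assumed whitelist, taken from the schemes named in the text.
ALLOWED_SCHEMES: frozenset[str] = frozenset({"http", "https", "mailto", "ssh"})


def check_scheme(url: str) -> str:
    """Return the scheme of `url`, or raise ValueError if it is not allowed."""
    scheme = urlsplit(url).scheme  # urlsplit already lower-cases the scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Invalid scheme {scheme!r} of url {url!r}.")
    return scheme
```

With this sketch, `check_scheme("ssh://git@example.com/abc")` returns `"ssh"`, whereas `"ftp://example.com"` is rejected, mirroring the `ftp` doctest below.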

The final URL also cannot contain ‘/./’ or ‘/../’, and none of its components may equal ‘..’. Neither the URL nor any of its components may be longer than 255 characters, and ‘://’ may occur at most once. A mailto or ssh URL must provide a username component.
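The structural rules of the previous paragraph can be expressed as a handful of string checks. This is a hedged sketch with a hypothetical function name `check_structure`; it only illustrates the listed rules and makes no claim about how pycommons orders or words its checks.

```python
from urllib.parse import urlsplit

MAX_LEN = 255  # length limit stated in the text


def check_structure(url: str) -> None:
    """Raise ValueError if `url` violates one of the structural rules."""
    if not 0 < len(url) <= MAX_LEN:
        raise ValueError(f"{url!r} has invalid length {len(url)}.")
    for bad in ("/./", "/../"):
        if bad in url:
            raise ValueError(f"{url!r} contains the forbidden text {bad!r}.")
    if url.count("://") > 1:
        raise ValueError(f"{url!r} contains '://' more than once.")
    # catches '..' components that '/../' misses, e.g. 'http://..'
    if ".." in url.split("/"):
        raise ValueError(f"{url!r} has a component equal to '..'.")
    parts = urlsplit(url)
    if parts.scheme in ("mailto", "ssh") and not parts.username:
        raise ValueError(f"{parts.scheme!r} url {url!r} needs a username.")
```

Since the whole URL is limited to 255 characters, the per-component limit is implied and not checked separately here.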

If a port is provided, it must be greater than 0 and less than 65536, and a host must be specified as well. A host or a port may only be specified if a netloc is present.

The value may also be a relative URL, which is then turned into an absolute URL using the base URL base_url. In that case, the same restrictions apply to the original relative URL, the base URL, and the resulting absolute URL.

The constructor tries to detect email addresses and turns them into valid mailto:// URLs. A single trailing ‘/’ character is removed.
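The email detection and trailing-slash handling might look roughly like the following. Both the deliberately simple email pattern `_EMAIL` and the function `normalize` are hypothetical; the actual detection logic in pycommons is not shown in this document and may differ.

```python
import re

# Assumed, deliberately simple email pattern: user@domain.tld, no scheme, no '/'.
_EMAIL = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]+")


def normalize(value: str) -> str:
    """Turn bare emails into mailto:// URLs and drop one trailing '/'."""
    if _EMAIL.fullmatch(value):
        value = "mailto://" + value
    elif value.startswith("mailto:") and not value.startswith("mailto://"):
        value = "mailto://" + value[len("mailto:"):]
    # drop exactly one trailing '/', but leave e.g. 'http://' untouched
    if value.endswith("/") and not value.endswith("//"):
        value = value[:-1]
    return value
```

With this sketch, `normalize("tweise@hfuu.edu.cn")` and `normalize("mailto:tweise@hfuu.edu.cn")` both yield `"mailto://tweise@hfuu.edu.cn"`, and `normalize("https://example.com/")` yields `"https://example.com"`, matching the doctests below.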

An instance of URL is also an instance of str, so you can use it as a string wherever you want. It additionally offers the following attributes:

  • scheme: the URL scheme, e.g., “http”

  • netloc: the URL network location, including user (if any), host, and port (if any)

  • host: the host of the URL

  • port: the port of the URL, or None if no port is specified

  • path: the path part of the URL (without the fragment part, if any), or None if no path part is specified

  • fragment: the fragment part of the path, or None if the path has no fragment
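The six attributes listed above correspond closely to the fields of the standard library's `urllib.parse.urlsplit` result. The hypothetical helper `components` below maps one onto the other; it assumes, as the doctests suggest, that an absent path or fragment is reported as None, whereas `urlsplit` itself uses the empty string.

```python
from urllib.parse import urlsplit


def components(url: str) -> dict[str, object]:
    """Hypothetical helper: map `url` to the six attributes listed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,      # includes user and port, if present
        "host": parts.hostname,      # netloc without user and port
        "port": parts.port,          # int or None
        "path": parts.path or None,  # urlsplit uses '' for "absent"
        "fragment": parts.fragment or None,
    }


print(components("http://example.com:34/index.html#1"))
```

For `"mailto://tweise@hfuu.edu.cn"`, this sketch gives the netloc `tweise@hfuu.edu.cn` and the host `hfuu.edu.cn`, in line with the first doctest below.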

>>> u1 = URL("mailto:tweise@hfuu.edu.cn")
>>> print(u1)
mailto://tweise@hfuu.edu.cn
>>> print(u1.scheme)
mailto
>>> print(u1.netloc)
tweise@hfuu.edu.cn
>>> print(u1.host)
hfuu.edu.cn
>>> print(u1.port)
None
>>> print(u1.path)
None
>>> print(u1.fragment)
None
>>> u = URL("tweise@hfuu.edu.cn")
>>> print(u)
mailto://tweise@hfuu.edu.cn
>>> print(u.scheme)
mailto
>>> print(u.netloc)
tweise@hfuu.edu.cn
>>> print(u.host)
hfuu.edu.cn
>>> print(u.port)
None
>>> print(u.path)
None
>>> print(u.fragment)
None
>>> URL("mailto://tweise@hfuu.edu.cn")
'mailto://tweise@hfuu.edu.cn'
>>> u2 = URL("https://example.com/abc")
>>> print(u2)
https://example.com/abc
>>> print(u2.scheme)
https
>>> print(u2.netloc)
example.com
>>> print(u2.host)
example.com
>>> print(u2.port)
None
>>> print(u2.path)
/abc
>>> print(u2.fragment)
None
>>> u1.host != u2.host
True
>>> u = URL("https://example.com/abc/")
>>> print(u)
https://example.com/abc
>>> print(u.scheme)
https
>>> print(u.netloc)
example.com
>>> print(u.host)
example.com
>>> print(u.port)
None
>>> print(u.path)
/abc
>>> print(u.fragment)
None
>>> u = URL("https://example.com/")
>>> print(u)
https://example.com
>>> print(u.scheme)
https
>>> print(u.netloc)
example.com
>>> print(u.host)
example.com
>>> print(u.port)
None
>>> print(u.path)
None
>>> print(u.fragment)
None
>>> u = URL("ssh://git@example.com/abc")
>>> print(u)
ssh://git@example.com/abc
>>> print(u.scheme)
ssh
>>> print(u.netloc)
git@example.com
>>> print(u.host)
example.com
>>> print(u.port)
None
>>> print(u.path)
/abc
>>> print(u.fragment)
None
>>> URL("1.txt", "http://example.com/thomasWeise")
'http://example.com/1.txt'
>>> URL("1.txt", "http://example.com/thomasWeise/")
'http://example.com/thomasWeise/1.txt'
>>> URL("../1.txt", "http://example.com/thomasWeise/")
'http://example.com/1.txt'
>>> URL("https://example.com/1.txt",
...     "http://github.com/thomasWeise/")
'https://example.com/1.txt'
>>> URL("http://example.com:123/1")
'http://example.com:123/1'
>>> u = URL("http://example.com:34/index.html#1")
>>> print(u)
http://example.com:34/index.html#1
>>> print(u.scheme)
http
>>> print(u.netloc)
example.com:34
>>> print(u.host)
example.com
>>> print(u.port)
34
>>> print(u.path)
/index.html
>>> print(u.fragment)
1
>>> try:
...     URL("tweise@@hfuu.edu.cn")
... except ValueError as ve:
...     print(ve)
URL part 'tweise@@hfuu.edu.cn' contains the forbidden text '@@'.
>>> try:
...     URL("http://example.com/index.html#")
... except ValueError as ve:
...     print(ve)
URL part must not end in '#', but 'http://example.com/index.html#' does.
>>> try:
...     URL("http://example.com/index.html@")
... except ValueError as ve:
...     print(ve)
URL part must not end in '@', but 'http://example.com/index.html@' does.
>>> try:
...     URL("https://example.com/abc(/23")
... except ValueError as ve:
...     print(ve)
URL part 'https://example.com/abc(/23' contains the forbidden text '('.
>>> try:
...     URL("https://example.com/abc]/23")
... except ValueError as ve:
...     print(ve)
URL part 'https://example.com/abc]/23' contains the forbidden text ']'.
>>> try:
...     URL("https://example.com/abcä/23")
... except ValueError as ve:
...     print(ve)
URL part 'https://example.com/abcä/23' contains non-ASCII characters.
>>> try:
...     URL("https://example.com/abc/./23")
... except ValueError as ve:
...     print(ve)
URL part 'https://example.com/abc/./23' contains the forbidden text '/./'.
>>> try:
...     URL("https://example.com/abc/../1.txt")
... except ValueError as ve:
...     print(str(ve)[:-4])
URL part 'https://example.com/abc/../1.txt' contains the forbidden text '/.
>>> try:
...     URL(r"https://example.com/abc\./23")
... except ValueError as ve:
...     print(ve)
URL part 'https://example.com/abc\\./23' contains the forbidden text '\\'.
>>> try:
...     URL("https://1.2.com/abc/23/../r")
... except ValueError as ve:
...     print(ve)
URL part 'https://1.2.com/abc/23/../r' contains the forbidden text '/../'.
>>> try:
...     URL("https://exa mple.com")
... except ValueError as ve:
...     print(ve)
URL part 'https://exa mple.com' contains the forbidden text ' '.
>>> try:
...     URL("ftp://example.com")
... except ValueError as ve:
...     print(str(ve)[:66])
Invalid scheme 'ftp' of url 'ftp://example.com' under base None, o
>>> try:
...     URL("http://example.com%32")
... except ValueError as ve:
...     print(str(ve))
URL part 'http://example.com%32' contains the forbidden text '%'.
>>> try:
...     URL("mailto://example.com")
... except ValueError as ve:
...     print(str(ve)[:66])
'mailto' url 'mailto://example.com' must contain '@' and have user
>>> try:
...     URL("ssh://example.com")
... except ValueError as ve:
...     print(str(ve)[:65])
'ssh' url 'ssh://example.com' must contain '@' and have username,
>>> try:
...     URL("ftp://example.com*32")
... except ValueError as ve:
...     print(str(ve))
URL part 'ftp://example.com*32' contains the forbidden text '*'.
>>> try:
...     URL("http://example.com/https://h")
... except ValueError as ve:
...     print(str(ve)[:74])
URL part 'http://example.com/https://h' contains the forbidden text '://ex
>>> try:
...     URL("http://user@example.com")
... except ValueError as ve:
...     print(str(ve)[:66])
'http' url 'http://user@example.com' must not contain '@' and have
>>> try:
...     URL("http://" + ("a" * 250))
... except ValueError as ve:
...     print(str(ve)[-30:])
aaaaa' has invalid length 257.
>>> try:
...     URL("http://.")
... except ValueError as ve:
...     print(ve)
URL part '.' contains the forbidden text '.'.
>>> try:
...     URL("http://..")
... except ValueError as ve:
...     print(ve)
URL part 'http://..' contains the forbidden text '..'.
>>> try:
...     URL("http://www.example.com/../1")
... except ValueError as ve:
...     print(ve)
URL part 'http://www.example.com/../1' contains the forbidden text '/../'.
>>> try:
...     URL("http://www.example.com/./1")
... except ValueError as ve:
...     print(ve)
URL part 'http://www.example.com/./1' contains the forbidden text '/./'.
>>> try:
...     URL("http://user@example.com/@1")
... except ValueError as ve:
...     print(str(ve)[:-9])
URL part 'http://user@example.com/@1' contains the forbidden text '@exampl
>>> try:
...     URL("http://:45/1.txt")
... except ValueError as ve:
...     print(ve)
URL 'http://:45/1.txt' has no host?
>>> try:
...     URL("http://example.com:-3/@1")
... except ValueError as ve:
...     print(ve)
Port could not be cast to integer value as '-3'
>>> try:
...     URL("http://example.com:0/@1")
... except ValueError as ve:
...     print(ve)
port=0 is invalid, must be in 1..65535.
>>> try:
...     URL("http://example.com:65536/@1")
... except ValueError as ve:
...     print(ve)
Port out of range 0-65535
>>> try:
...     URL(1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     URL(None)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'NoneType'
>>> try:
...     URL("http::/1.txt", 1)
... except TypeError as te:
...     print(te)
descriptor '__len__' requires a 'str' object but received a 'int'
>>> try:
...     URL("http::/1.txt?x=1")
... except ValueError as ve:
...     print(ve)
URL part 'http::/1.txt?x=1' contains the forbidden text '?'.
>>> try:
...     URL("http::/1.txt&x=1")
... except ValueError as ve:
...     print(ve)
URL part 'http::/1.txt&x=1' contains the forbidden text '&'.
>>> try:
...     URL("http::/1.+txt&x=1")
... except ValueError as ve:
...     print(ve)
URL part 'http::/1.+txt&x=1' contains the forbidden text '+'.
>>> try:
...     URL("http::/1*.+txt&x=1")
... except ValueError as ve:
...     print(ve)
URL part 'http::/1*.+txt&x=1' contains the forbidden text '*'.
>>> try:
...     URL("http://example.com#1#2")
... except ValueError as ve:
...     print(ve)
URL part '1#2' contains the forbidden text '#'.
fragment: Final[str | None]

the path fragment, i.e., the part following a “#”, if any (else None)

host: Final[str]

the host name, as a str

netloc: Final[str]

the network location, usually of the form “user@host:port”, i.e., composed of user name (if present), host, and port (if present)

path: Final[str | None]

the path, if any (else None), but without the fragment component

port: Final[int | None]

the port, if any (else None)

scheme: Final[str]

the protocol scheme, e.g., “https”