URL Helpers

werkzeug.urls

werkzeug.urls used to provide several wrapper functions for Python 2's urlparse, whose main purpose was to work around the behavior of the Py2 stdlib and its lack of unicode support. While this was already a somewhat inconvenient situation, it got even more complicated because Python 3's urllib.parse actually does handle unicode properly. In other words, this module would wrap two libraries with completely different behavior. So now this module contains a 2-and-3-compatible backport of Python 3's urllib.parse, which is mostly API-compatible.

  • class werkzeug.urls.BaseURL
  • Superclass of URL and BytesURL.

    • ascii_host
    • Works exactly like host but will return a result that is restricted to ASCII. If it finds a netloc that is not ASCII it will attempt to idna decode it. This is useful for socket operations when the URL might include internationalized characters.

    • auth

    • The authentication part in the URL if available, None otherwise.

    • decode_netloc()

    • Decodes the netloc part into a string.

    • decode_query(*args, **kwargs)

    • Decodes the query part of the URL. This is a shortcut for calling url_decode() on the query argument. The arguments and keyword arguments are forwarded to url_decode() unchanged.

    • get_file_location(pathformat=None)

    • Returns a tuple with the location of the file in the form (server, location). If the netloc is empty in the URL or points to localhost, it's represented as None.

The pathformat by default is autodetection but needs to be set when working with URLs of a specific system. The supported values are 'windows' when working with Windows or DOS paths and 'posix' when working with posix paths.

If the URL does not point to a local file, the server and location are both represented as None.

Parameters: pathformat – The expected format of the path component. Currently 'windows' and 'posix' are supported. Defaults to None which is autodetect.
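As an illustration, the described behavior for file URLs can be sketched with the stdlib. get_file_location_sketch is a hypothetical helper (posix paths only), not Werkzeug's actual implementation:

```python
from urllib.parse import urlsplit, unquote

def get_file_location_sketch(url):
    """Hypothetical sketch: return (server, location) for file:// URLs."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        # Not a local file: both parts are None.
        return None, None
    host = parts.hostname
    if host in (None, "", "localhost"):
        host = None  # an empty or localhost netloc is represented as None
    return host, unquote(parts.path)

print(get_file_location_sketch("file:///tmp/foo.txt"))  # (None, '/tmp/foo.txt')
```

A UNC-style URL such as file://server/share/f.txt yields ('server', '/share/f.txt') with this sketch.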

  • host
  • The host part of the URL if available, otherwise None. The host is either the hostname or the IP address mentioned in the URL. It will not contain the port.

  • join(*args, **kwargs)

  • Joins this URL with another one. This is just a convenience function for calling into url_join() and then parsing the return value again.

  • password

  • The password if it was part of the URL, None otherwise. This undergoes URL decoding and will always be a unicode string.

  • port

  • The port in the URL as an integer if it was present, None otherwise. This does not fill in default ports.

  • raw_password

  • The password if it was part of the URL, None otherwise. Unlike password this one is not being decoded.

  • raw_username

  • The username if it was part of the URL, None otherwise. Unlike username this one is not being decoded.

  • replace(**kwargs)

  • Return a URL with the same values, except for those parameters given new values by whichever keyword arguments are specified.

  • to_iri_tuple()

  • Returns a URL tuple that holds an IRI. This will try to decode as much information as possible in the URL without losing information, similar to how a web browser does it for the URL bar.

It’s usually more interesting to directly call uri_to_iri() which will return a string.

  • to_uri_tuple()
  • Returns a BytesURL tuple that holds a URI. This will encode all the information in the URL properly to ASCII using the rules a web browser would follow.

It’s usually more interesting to directly call iri_to_uri() which will return a string.

  • to_url()
  • Returns a URL string or bytes depending on the type of the information stored. This is just a convenience function for calling url_unparse() for this URL.

  • username

  • The username if it was part of the URL, None otherwise. This undergoes URL decoding and will always be a unicode string.
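Since the URL tuple is described as mostly API-compatible with urllib.parse, the stdlib result illustrates the same accessors. Note one difference: the stdlib does not URL-decode username and password, unlike the decoded attributes above.

```python
from urllib.parse import urlsplit

# Parse a URL with every component present.
url = urlsplit("http://user:secret@example.com:8080/path?q=1#frag")
assert url.hostname == "example.com"  # host only, never includes the port
assert url.port == 8080               # an int; no default ports filled in
assert url.username == "user"
assert url.password == "secret"
# A URL without an explicit port reports None, not 80.
assert urlsplit("http://example.com/").port is None
```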
  • class werkzeug.urls.BytesURL
  • Represents a parsed URL in bytes.

    • decode(charset='utf-8', errors='replace')
    • Decodes the URL to a tuple made out of strings. The charset is only being used for the path, query and fragment.

    • encode_netloc()

    • Returns the netloc unchanged as bytes.
  • class werkzeug.urls.Href(base='./', charset='utf-8', sort=False, key=None)
  • Implements a callable that constructs URLs with the given base. The function can be called with any number of positional and keyword arguments which are then used to assemble the URL. Works with URLs and posix paths.

Positional arguments are appended as individual segments to the path of the URL:

  >>> href = Href('/foo')
  >>> href('bar', 23)
  '/foo/bar/23'
  >>> href('foo', bar=23)
  '/foo/foo?bar=23'

If any of the arguments (positional or keyword) evaluates to None it will be skipped. If no keyword arguments are given the last argument can be a dict or MultiDict (or any other dict subclass), otherwise the keyword arguments are used for the query parameters, cutting off the first trailing underscore of the parameter name:

  >>> href(is_=42)
  '/foo?is=42'
  >>> href({'foo': 'bar'})
  '/foo?foo=bar'

Combining both methods is not allowed:

  >>> href({'foo': 'bar'}, bar=42)
  Traceback (most recent call last):
    ...
  TypeError: keyword arguments and query-dicts can't be combined

Accessing attributes on the href object creates a new href object with the attribute name as prefix:

  >>> bar_href = href.bar
  >>> bar_href("blub")
  '/foo/bar/blub'

If sort is set to True the items are sorted by key or the default sorting algorithm:

  >>> href = Href("/", sort=True)
  >>> href(a=1, b=2, c=3)
  '/?a=1&b=2&c=3'

New in version 0.5: sort and key were added.
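The behavior above can be approximated in a few lines with the stdlib. MiniHref is a simplified, hypothetical sketch, not Werkzeug's actual implementation; it ignores charset handling and MultiDict specifics:

```python
from urllib.parse import quote, urlencode

class MiniHref:
    """Hypothetical sketch of an Href-style URL builder."""

    def __init__(self, base="./", sort=False):
        self.base = base
        self.sort = sort

    def __getattr__(self, name):
        # Attribute access appends a path segment: href.bar -> base + '/bar'.
        return MiniHref(self.base.rstrip("/") + "/" + name, self.sort)

    def __call__(self, *path, **query):
        if path and isinstance(path[-1], dict):
            if query:
                raise TypeError("keyword arguments and query-dicts "
                                "can't be combined")
            query, path = path[-1], path[:-1]
        url = self.base
        for segment in path:
            if segment is not None:  # None segments are skipped
                url = url.rstrip("/") + "/" + quote(str(segment))
        # Trim the first trailing underscore so is_=42 becomes is=42.
        items = [(k[:-1] if k.endswith("_") else k, v)
                 for k, v in query.items() if v is not None]
        if self.sort:
            items.sort()
        if items:
            url += "?" + urlencode(items)
        return url

href = MiniHref("/foo")
print(href("bar", 23))  # /foo/bar/23
print(href(is_=42))     # /foo?is=42
```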

  • class werkzeug.urls.URL
  • Represents a parsed URL. This behaves like a regular tuple but also has some extra attributes that give further insight into the URL.

    • encode(charset='utf-8', errors='replace')
    • Encodes the URL to a tuple made out of bytes. The charset is only being used for the path, query and fragment.

    • encode_netloc()

    • Encodes the netloc part to an ASCII safe URL as bytes.
  • werkzeug.urls.iri_to_uri(iri, charset='utf-8', errors='strict', safe_conversion=False)
  • Convert an IRI to a URI. All non-ASCII and unsafe characters are quoted. If the URL has a domain, it is encoded to Punycode.
  >>> iri_to_uri('http://\u2603.net/p\xe5th?q=\xe8ry%DF')
  'http://xn--n3h.net/p%C3%A5th?q=%C3%A8ry%DF'

Parameters:

  • iri – The IRI to convert.
  • charset – The encoding of the IRI.
  • errors – Error handler to use during bytes.encode.
  • safe_conversion – Return the URL unchanged if it only contains ASCII characters and no whitespace. See the explanation below.

There is a general problem with IRI conversion with some protocols that are in violation of the URI specification. Consider the following two IRIs:

  magnet:?xt=uri:whatever
  itms-services://?action=download-manifest

After parsing, we don’t know if the scheme requires the //, which is dropped if empty, but conveys different meanings in the final URL if it’s present or not. In this case, you can use safe_conversion, which will return the URL unchanged if it only contains ASCII characters and no whitespace. This can result in a URI with unquoted characters if it was not already quoted correctly, but preserves the URL’s semantics. Werkzeug uses this for the Location header for redirects.

Changed in version 0.15: All reserved characters remain unquoted. Previously, only some reserved characters were left unquoted.

Changed in version 0.9.6: The safe_conversion parameter was added.

New in version 0.6.
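The core of the conversion can be sketched with the stdlib. iri_to_uri_sketch is a hypothetical simplification that ignores errors, safe_conversion, and userinfo handling:

```python
from urllib.parse import urlsplit, urlunsplit, quote

def iri_to_uri_sketch(iri):
    """Hypothetical sketch: punycode the host, percent-quote the rest."""
    parts = urlsplit(iri)
    # Encode an internationalized hostname to its ASCII (punycode) form.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host + (":%d" % parts.port if parts.port else "")
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe="/%"),      # keep existing %-escapes
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))

print(iri_to_uri_sketch("http://b\u00fccher.example/p\u00e5th?q=\u00e8ry"))
# http://xn--bcher-kva.example/p%C3%A5th?q=%C3%A8ry
```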

  • werkzeug.urls.uri_to_iri(uri, charset='utf-8', errors='werkzeug.url_quote')
  • Convert a URI to an IRI. All valid UTF-8 characters are unquoted, leaving all reserved and invalid characters quoted. If the URL has a domain, it is decoded from Punycode.
  >>> uri_to_iri("http://xn--n3h.net/p%C3%A5th?q=%C3%A8ry%DF")
  'http://\u2603.net/p\xe5th?q=\xe8ry%DF'

Parameters:

  • uri – The URI to convert.
  • charset – The encoding to encode unquoted bytes with.
  • errors – Error handler to use during bytes.encode. By default, invalid bytes are left quoted.

Changed in version 0.15: All reserved and invalid characters remain quoted. Previously, only some reserved characters were preserved, and invalid bytes were replaced instead of left quoted.

New in version 0.6.
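The reverse direction can be sketched the same way. uri_to_iri_sketch is a hypothetical simplification: unlike the real function, it unquotes reserved and invalid bytes too.

```python
from urllib.parse import urlsplit, urlunsplit, unquote

def uri_to_iri_sketch(uri):
    """Hypothetical sketch: decode the punycode host, unquote the rest."""
    parts = urlsplit(uri)
    # Decode a punycoded hostname back to its unicode form.
    host = parts.hostname.encode("ascii").decode("idna") if parts.hostname else ""
    netloc = host + (":%d" % parts.port if parts.port else "")
    return urlunsplit((parts.scheme, netloc,
                       unquote(parts.path),
                       unquote(parts.query),
                       unquote(parts.fragment)))

print(uri_to_iri_sketch("http://xn--bcher-kva.example/p%C3%A5th?q=%C3%A8ry"))
# http://bücher.example/påth?q=èry
```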

  • werkzeug.urls.url_decode(s, charset='utf-8', decode_keys=False, include_empty=True, errors='replace', separator='&', cls=None)
  • Parse a querystring and return it as MultiDict. There is a difference in key decoding on different Python versions. On Python 3 keys will always be fully decoded whereas on Python 2, keys will remain bytestrings if they fit into ASCII. On 2.x keys can be forced to be unicode by setting decode_keys to True.

If the charset is set to None no unicode decoding will happen and raw bytes will be returned.

Per default a missing value for a key will default to an empty key. If you don’t want that behavior you can set include_empty to False.

Per default encoding errors are ignored. If you want a different behavior you can set errors to 'replace' or 'strict'. In strict mode a HTTPUnicodeError is raised.

Changed in version 0.5: In previous versions “;” and “&” could be used for url decoding. This changed in 0.5 where only “&” is supported. If you want to use “;” instead a different separator can be provided.

The cls parameter was added.

Parameters:

  • s – a string with the query string to decode.
  • charset – the charset of the query string. If set to None no unicode decoding will take place.
  • decode_keys – Used on Python 2.x to control whether keys should be forced to be unicode objects. If set to True then keys will be unicode in all cases. Otherwise, they remain str if they fit into ASCII.
  • include_empty – Set to False if you don’t want empty values toappear in the dict.
  • errors – the decoding error behavior.
  • separator – the pair separator to be used, defaults to &
  • cls – an optional dict class to use. If this is not specified or None the default MultiDict is used.
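The stdlib near-equivalent is urllib.parse.parse_qsl; Werkzeug feeds such pairs into a MultiDict instead of returning a list:

```python
from urllib.parse import parse_qsl

# keep_blank_values=True mirrors include_empty=True: a bare key gets ''.
pairs = parse_qsl("a=1&a=2&b", keep_blank_values=True)
print(pairs)  # [('a', '1'), ('a', '2'), ('b', '')]

# Without it, valueless keys are dropped, like include_empty=False.
print(parse_qsl("a=1&b"))  # [('a', '1')]
```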
  • werkzeug.urls.url_decode_stream(stream, charset='utf-8', decode_keys=False, include_empty=True, errors='replace', separator='&', cls=None, limit=None, return_iterator=False)
  • Works like url_decode() but decodes a stream. The behavior of stream and limit follows functions like make_line_iter(). The generator of pairs is directly fed to the cls so you can consume the data while it’s parsed.

New in version 0.8.

Parameters:

  • stream – a stream with the encoded querystring
  • charset – the charset of the query string. If set to None no unicode decoding will take place.
  • decode_keys – Used on Python 2.x to control whether keys should be forced to be unicode objects. If set to True, keys will be unicode in all cases. Otherwise, they remain str if they fit into ASCII.
  • include_empty – Set to False if you don’t want empty values toappear in the dict.
  • errors – the decoding error behavior.
  • separator – the pair separator to be used, defaults to &
  • cls – an optional dict class to use. If this is not specifiedor None the default MultiDict is used.
  • limit – the content length of the URL data. Not necessary ifa limited stream is provided.
  • return_iterator – if set to True the cls argument is ignored and an iterator over all decoded pairs is returned.
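A minimal sketch of incremental pair parsing from a byte stream; iter_pairs is hypothetical, and the real function also handles charset decoding, limit, and cls:

```python
import io

def iter_pairs(stream, separator=b"&", chunk_size=4096):
    """Yield raw (key, value) byte pairs as the stream is consumed."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Everything before the last separator is complete; keep the tail.
        *complete, buf = buf.split(separator)
        for pair in complete:
            if pair:
                key, _, value = pair.partition(b"=")
                yield key, value
    if buf:
        key, _, value = buf.partition(b"=")
        yield key, value

pairs = list(iter_pairs(io.BytesIO(b"a=1&b=2&c"), chunk_size=3))
print(pairs)  # [(b'a', b'1'), (b'b', b'2'), (b'c', b'')]
```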
  • werkzeug.urls.url_encode(obj, charset='utf-8', encode_keys=False, sort=False, key=None, separator=b'&')
  • URL encode a dict/MultiDict. If a value is None it will not appear in the result string. Per default only values are encoded into the target charset strings. If encode_keys is set to True unicode keys are supported too.

If sort is set to True the items are sorted by key or the default sorting algorithm.

New in version 0.5: sort, key, and separator were added.

Parameters:

  • obj – the object to encode into a query string.
  • charset – the charset of the query string.
  • encode_keys – set to True if you have unicode keys. (Ignored on Python 3.x)
  • sort – set to True if you want parameters to be sorted by key.
  • separator – the separator to be used for the pairs.
  • key – an optional function to be used for sorting. For more detailscheck out the sorted() documentation.
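The stdlib counterpart is urllib.parse.urlencode; the None-filtering and sorting described above can be applied to the items first:

```python
from urllib.parse import urlencode

params = {"b": 2, "a": 1, "skip": None}
# Drop None values, then sort by key, as the sort option describes.
items = sorted((k, v) for k, v in params.items() if v is not None)
print(urlencode(items))  # a=1&b=2
```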
  • werkzeug.urls.url_encode_stream(obj, stream=None, charset='utf-8', encode_keys=False, sort=False, key=None, separator=b'&')
  • Like url_encode() but writes the results to a stream object. If the stream is None a generator over all encoded pairs is returned.

New in version 0.8.

Parameters:

  • obj – the object to encode into a query string.
  • stream – a stream to write the encoded object into or None if an iterator over the encoded pairs should be returned. In that case the separator argument is ignored.
  • charset – the charset of the query string.
  • encode_keys – set to True if you have unicode keys. (Ignored on Python 3.x)
  • sort – set to True if you want parameters to be sorted by key.
  • separator – the separator to be used for the pairs.
  • key – an optional function to be used for sorting. For more detailscheck out the sorted() documentation.
  • werkzeug.urls.url_fix(s, charset='utf-8')
  • Sometimes you get a URL by a user that just isn’t a real URL because it contains unsafe characters like ‘ ‘ and so on. This function can fix some of the problems in a similar way browsers handle data entered by the user:
  >>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffskl\xe4rung)')
  'http://de.wikipedia.org/wiki/Elf%20(Begriffskl%C3%A4rung)'

Parameters:

  • s – the string with the URL to fix.
  • charset – The target charset for the URL if the url was given as unicode string.
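The docstring example can be reproduced with a small stdlib sketch. url_fix_sketch is hypothetical and keeps fewer characters safe than Werkzeug's version:

```python
from urllib.parse import urlsplit, urlunsplit, quote, quote_plus

def url_fix_sketch(s):
    """Hypothetical sketch: percent-quote unsafe characters per component."""
    scheme, netloc, path, qs, frag = urlsplit(s)
    path = quote(path, safe="/%()")   # keep existing %-escapes and parens
    qs = quote_plus(qs, safe="=&%")
    return urlunsplit((scheme, netloc, path, qs, frag))

print(url_fix_sketch("http://de.wikipedia.org/wiki/Elf (Begriffskl\xe4rung)"))
# http://de.wikipedia.org/wiki/Elf%20(Begriffskl%C3%A4rung)
```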
  • werkzeug.urls.url_join(base, url, allow_fragments=True)
  • Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.

Parameters:

  • base – the base URL for the join operation.
  • url – the URL to join.
  • allow_fragments – indicates whether fragments should be allowed.
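The stdlib equivalent is urllib.parse.urljoin, which resolves the relative URL against the base:

```python
from urllib.parse import urljoin

# A bare name replaces the last path segment of the base.
assert urljoin("http://example.com/a/b", "c") == "http://example.com/a/c"
# A rooted path replaces the whole path.
assert urljoin("http://example.com/a/b", "/c") == "http://example.com/c"
# '..' segments are resolved.
assert urljoin("http://example.com/a/", "../x") == "http://example.com/x"
```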
  • werkzeug.urls.url_parse(url, scheme=None, allow_fragments=True)
  • Parses a URL from a string into a URL tuple. If the URL is lacking a scheme it can be provided as second argument. Otherwise, it is ignored. Optionally fragments can be stripped from the URL by setting allow_fragments to False.

The inverse of this function is url_unparse().

Parameters:

  • url – the URL to parse.
  • scheme – the default scheme to use if the URL is schemeless.
  • allow_fragments – if set to False a fragment will be removed from the URL.
  • werkzeug.urls.url_quote(string, charset='utf-8', errors='strict', safe='/:', unsafe='')
  • URL encode a single string with a given encoding.

Parameters:

  • s – the string to quote.
  • charset – the charset to be used.
  • safe – an optional sequence of safe characters.
  • unsafe – an optional sequence of unsafe characters.

New in version 0.9.2: The unsafe parameter was added.
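The stdlib counterpart is urllib.parse.quote, which has a safe parameter but no unsafe parameter:

```python
from urllib.parse import quote

assert quote("/foo bar", safe="/:") == "/foo%20bar"  # '/' and ':' kept as-is
assert quote("a&b=c", safe="") == "a%26b%3Dc"        # nothing kept safe
```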

  • werkzeug.urls.url_quote_plus(string, charset='utf-8', errors='strict', safe='')
  • URL encode a single string with the given encoding and convert whitespace to “+”.

Parameters:

  • s – The string to quote.
  • charset – The charset to be used.
  • safe – An optional sequence of safe characters.
  • werkzeug.urls.url_unparse(components)
  • The reverse operation to url_parse(). This accepts arbitrary as well as URL tuples and returns a URL as a string.

Parameters: components – the parsed URL as tuple which should be converted into a URL string.
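With the stdlib, urllib.parse.urlunsplit plays the same role for 5-tuples and round-trips with urlsplit:

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://example.com/path?q=1#frag"
# Parsing and unparsing is lossless for a well-formed URL.
assert urlunsplit(urlsplit(url)) == url
# Arbitrary (scheme, netloc, path, query, fragment) tuples work too.
assert urlunsplit(("https", "example.com", "/x", "a=1", "")) == "https://example.com/x?a=1"
```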

  • werkzeug.urls.url_unquote(string, charset='utf-8', errors='replace', unsafe='')
  • URL decode a single string with a given encoding. If the charset is set to None no unicode decoding is performed and raw bytes are returned.

Parameters:

  • s – the string to unquote.
  • charset – the charset of the query string. If set to None no unicode decoding will take place.
  • errors – the error handling for the charset decoding.
  • werkzeug.urls.url_unquote_plus(s, charset='utf-8', errors='replace')
  • URL decode a single string with the given charset and decode “+” to whitespace.

Per default encoding errors are ignored. If you want a different behavior you can set errors to 'replace' or 'strict'. In strict mode a HTTPUnicodeError is raised.

Parameters:

  • s – The string to unquote.
  • charset – the charset of the query string. If set to None no unicode decoding will take place.
  • errors – The error handling for the charset decoding.