Sqlite User-Defined Functions

The sqlite_udf playhouse module contains a number of user-defined functions, aggregates, and table-valued functions, which you may find useful. The functions are grouped in collections and you can register these user-defined extensions individually, by collection, or register everything.

Scalar functions are functions which take a number of parameters and return a single value. For example, converting a string to upper-case, or calculating the MD5 hex digest.

Aggregate functions are like scalar functions that operate on multiple rows of data, producing a single result. For example, calculating the sum of a list of integers, or finding the smallest value in a particular column.

Table-valued functions are simply functions that can return multiple rows of data. For example, a regular-expression search function that returns all the matches in a given string, or a function that accepts two dates and generates all the intervening days.

Note

To use table-valued functions, you will need to build the playhouse._sqlite_ext C extension.

Registering user-defined functions:

  1. db = SqliteDatabase('my_app.db')
  2. # Register *all* functions.
  3. register_all(db)
  4. # Alternatively, you can register individual groups. This will just
  5. # register the DATE and MATH groups of functions.
  6. register_groups(db, 'DATE', 'MATH')
  7. # If you only wish to register, say, the aggregate functions for a
  8. # particular group or groups, you can:
  9. register_aggregate_groups(db, 'DATE')

Using a library function (“hostname”):

  1. # Assume we have a model, Link, that contains lots of arbitrary URLs.
  2. # We want to discover the most common hosts that have been linked.
  3. query = (Link
  4. .select(fn.hostname(Link.url).alias('host'), fn.COUNT(Link.id))
  5. .group_by(fn.hostname(Link.url))
  6. .order_by(fn.COUNT(Link.id).desc())
  7. .tuples())
  8. # Print the hostname along with number of links associated with it.
  9. for host, count in query:
  10. print('%s: %s' % (host, count))

Functions, listed by collection name

Scalar functions are indicated by (f), aggregate functions by (a), and table-valued functions by (t).

CONTROL_FLOW

if_then_else(cond, truthy[, falsey=None])

Simple ternary-type operator, where, depending on the truthiness of the cond parameter, either the truthy or falsey value will be returned.

DATE

strip_tz(date_str)

Parameters:date_str – A datetime, encoded as a string.
Returns:The datetime with any timezone info stripped off.

The time is not adjusted in any way, the timezone is simply removed.

humandelta(nseconds[, glue=’, ])

Parameters:
  • nseconds (int) – Number of seconds, total, in timedelta.
  • glue (str) – Fragment to join values.
Returns:

Easy-to-read description of timedelta.

Example, 86471 -> “1 day, 1 minute, 11 seconds”

mintdiff(datetime_value)

Parameters:datetime_value – A date-time.
Returns:Minimum difference between any two values in list.

Aggregate function that computes the minimum difference between any two datetimes.

avgtdiff(datetime_value)

Parameters:datetime_value – A date-time.
Returns:Average difference between values in list.

Aggregate function that computes the average difference between consecutive values in the list.

duration(datetime_value)

Parameters:datetime_value – A date-time.
Returns:Duration from smallest to largest value in list, in seconds.

Aggregate function that computes the duration from the smallest to the largest value in the list, returned in seconds.

date_series(start, stop[, step_seconds=86400])

Parameters:
  • start (datetime) – Start datetime
  • stop (datetime) – Stop datetime
  • step_seconds (int) – Number of seconds comprising a step.

Table-value function that returns rows consisting of the date/+time values encountered iterating from start to stop, step_seconds at a time.

Additionally, if start does not have a time component and step_seconds is greater-than-or-equal-to one day (86400 seconds), the values returned will be dates. Conversely, if start does not have a date component, values will be returned as times. Otherwise values are returned as datetimes.

Example:

  1. SELECT * FROM date_series('2017-01-28', '2017-02-02');
  2. value
  3. -----
  4. 2017-01-28
  5. 2017-01-29
  6. 2017-01-30
  7. 2017-01-31
  8. 2017-02-01
  9. 2017-02-02

FILE

file_ext(filename)

Parameters:filename (str) – Filename to extract extension from.
Returns:Returns the file extension, including the leading “.”.

file_read(filename)

Parameters:filename (str) – Filename to read.
Returns:Contents of the file.

HELPER

gzip(data[, compression=9])

Parameters:
  • data (bytes) – Data to compress.
  • compression (int) – Compression level (9 is max).
Returns:

Compressed binary data.

gunzip(data)

Parameters:data (bytes) – Compressed data.
Returns:Uncompressed binary data.

hostname(url)

Parameters:url (str) – URL to extract hostname from.
Returns:hostname portion of URL

toggle(key)

Parameters:key – Key to toggle.

Toggle a key between True/False state. Example:

  1. >>> toggle('my-key')
  2. True
  3. >>> toggle('my-key')
  4. False
  5. >>> toggle('my-key')
  6. True

setting(key[, value=None])

Parameters:
  • key – Key to set/retrieve.
  • value – Value to set.
Returns:

Value associated with key.

Store/retrieve a setting in memory and persist during lifetime of application. To get the current value, only specify the key. To set a new value, call with key and new value.

clear_toggles()

Clears all state associated with the toggle() function.

clear_settings()

Clears all state associated with the setting() function.

MATH

randomrange(start[, stop=None[, step=None]])

Parameters:
  • start (int) – Start of range (inclusive)
  • end (int) – End of range(not inclusive)
  • step (int) – Interval at which to return a value.

Return a random integer between [start, end).

gauss_distribution(mean, sigma)

Parameters:
  • mean (float) – Mean value
  • sigma (float) – Standard deviation

sqrt(n)

Calculate the square root of n.

tonumber(s)

Parameters:s (str) – String to convert to number.
Returns:Integer, floating-point or NULL on failure.

mode(val)

Parameters:val – Numbers in list.
Returns:The mode, or most-common, number observed.

Aggregate function which calculates mode of values.

minrange(val)

Parameters:val – Value
Returns:Min difference between two values.

Aggregate function which calculates the minimal distance between two numbers in the sequence.

avgrange(val)

Parameters:val – Value
Returns:Average difference between values.

Aggregate function which calculates the average distance between two consecutive numbers in the sequence.

range(val)

Parameters:val – Value
Returns:The range from the smallest to largest value in sequence.

Aggregate function which returns range of values observed.

median(val)

Parameters:val – Value
Returns:The median, or middle, value in a sequence.

Aggregate function which calculates the middle value in a sequence.

Note

Only available if you compiled the _sqlite_udf extension.

STRING

substr_count(haystack, needle)

Returns number of times needle appears in haystack.

strip_chars(haystack, chars)

Strips any characters in chars from beginning and end of haystack.

damerau_levenshtein_dist(s1, s2)

Computes the edit distance from s1 to s2 using the damerau variant of the levenshtein algorithm.

Note

Only available if you compiled the _sqlite_udf extension.

levenshtein_dist(s1, s2)

Computes the edit distance from s1 to s2 using the levenshtein algorithm.

Note

Only available if you compiled the _sqlite_udf extension.

str_dist(s1, s2)

Computes the edit distance from s1 to s2 using the standard library SequenceMatcher’s algorithm.

Note

Only available if you compiled the _sqlite_udf extension.

regex_search(regex, search_string)

Parameters:
  • regex (str) – Regular expression
  • search_string (str) – String to search for instances of regex.

Table-value function that searches a string for substrings that match the provided regex. Returns rows for each match found.

Example:

  1. SELECT * FROM regex_search('\w+', 'extract words, ignore! symbols');
  2. value
  3. -----
  4. extract
  5. words
  6. ignore
  7. symbols