Sqlite User-Defined Functions

The sqlite_udf playhouse module contains a number of user-definedfunctions, aggregates, and table-valued functions, which you may find useful.The functions are grouped in collections and you can register theseuser-defined extensions individually, by collection, or register everything.

Scalar functions are functions which take a number of parameters and return asingle value. For example, converting a string to upper-case, or calculatingthe MD5 hex digest.

Aggregate functions are like scalar functions that operate on multiple rows ofdata, producing a single result. For example, calculating the sum of a list ofintegers, or finding the smallest value in a particular column.

Table-valued functions are simply functions that can return multiple rows ofdata. For example, a regular-expression search function that returns all thematches in a given string, or a function that accepts two dates and generatesall the intervening days.

Note

To use table-valued functions, you will need to build theplayhouse._sqlite_ext C extension.

Registering user-defined functions:

  1. db = SqliteDatabase('my_app.db')
  2.  
  3. # Register *all* functions.
  4. register_all(db)
  5.  
  6. # Alternatively, you can register individual groups. This will just
  7. # register the DATE and MATH groups of functions.
  8. register_groups(db, 'DATE', 'MATH')
  9.  
  10. # If you only wish to register, say, the aggregate functions for a
  11. # particular group or groups, you can:
  12. register_aggregate_groups(db, 'DATE')
  13.  
  14. # If you only wish to register a single function, then you can:
  15. from playhouse.sqlite_udf import gzip, gunzip
  16. db.register_function(gzip, 'gzip')
  17. db.register_function(gunzip, 'gunzip')

Using a library function (“hostname”):

  1. # Assume we have a model, Link, that contains lots of arbitrary URLs.
  2. # We want to discover the most common hosts that have been linked.
  3. query = (Link
  4. .select(fn.hostname(Link.url).alias('host'), fn.COUNT(Link.id))
  5. .group_by(fn.hostname(Link.url))
  6. .order_by(fn.COUNT(Link.id).desc())
  7. .tuples())
  8.  
  9. # Print the hostname along with number of links associated with it.
  10. for host, count in query:
  11. print('%s: %s' % (host, count))

Functions, listed by collection name

Scalar functions are indicated by (f), aggregate functions by (a), andtable-valued functions by (t).

CONTROL_FLOW

  • ifthen_else(_cond, truthy[, falsey=None])
  • Simple ternary-type operator, where, depending on the truthiness of thecond parameter, either the truthy or falsey value will bereturned.

DATE

  • striptz(_date_str)

Parameters:date_str – A datetime, encoded as a string.Returns:The datetime with any timezone info stripped off.

The time is not adjusted in any way, the timezone is simply removed.

  • humandelta(nseconds[, glue=', '])

Parameters:

  • nseconds (int) – Number of seconds, total, in timedelta.
  • glue (str) – Fragment to join values.Returns:Easy-to-read description of timedelta.

Example, 86471 -> “1 day, 1 minute, 11 seconds”

  • mintdiff(datetime_value)

Parameters:datetime_value – A date-time.Returns:Minimum difference between any two values in list.

Aggregate function that computes the minimum difference between any twodatetimes.

  • avgtdiff(datetime_value)

Parameters:datetime_value – A date-time.Returns:Average difference between values in list.

Aggregate function that computes the average difference between consecutivevalues in the list.

  • duration(datetime_value)

Parameters:datetime_value – A date-time.Returns:Duration from smallest to largest value in list, in seconds.

Aggregate function that computes the duration from the smallest to thelargest value in the list, returned in seconds.

  • dateseries(_start, stop[, step_seconds=86400])

Parameters:

  • start (datetime) – Start datetime
  • stop (datetime) – Stop datetime
  • step_seconds (int) – Number of seconds comprising a step.

Table-value function that returns rows consisting of the date/+time valuesencountered iterating from start to stop, step_seconds at a time.

Additionally, if start does not have a time component and step_seconds isgreater-than-or-equal-to one day (86400 seconds), the values returned willbe dates. Conversely, if start does not have a date component, values willbe returned as times. Otherwise values are returned as datetimes.

Example:

  1. SELECT * FROM date_series('2017-01-28', '2017-02-02');
  2.  
  3. value
  4. -----
  5. 2017-01-28
  6. 2017-01-29
  7. 2017-01-30
  8. 2017-01-31
  9. 2017-02-01
  10. 2017-02-02

FILE

  • fileext(_filename)

Parameters:filename (str) – Filename to extract extension from.Returns:Returns the file extension, including the leading “.”.

  • fileread(_filename)

Parameters:filename (str) – Filename to read.Returns:Contents of the file.

HELPER

  • gzip(data[, compression=9])

Parameters:

  • data (bytes) – Data to compress.
  • compression (int) – Compression level (9 is max).Returns:Compressed binary data.
  • gunzip(data)

Parameters:data (bytes) – Compressed data.Returns:Uncompressed binary data.

  • hostname(url)

Parameters:url (str) – URL to extract hostname from.Returns:hostname portion of URL

  • toggle(key)

Parameters:key – Key to toggle.

Toggle a key between True/False state. Example:

  1. >>> toggle('my-key')
  2. True
  3. >>> toggle('my-key')
  4. False
  5. >>> toggle('my-key')
  6. True
  • setting(key[, value=None])

Parameters:

  • key – Key to set/retrieve.
  • value – Value to set.Returns:Value associated with key.

Store/retrieve a setting in memory and persist during lifetime ofapplication. To get the current value, only specify the key. To set a newvalue, call with key and new value.

  • clear_toggles()
  • Clears all state associated with the toggle() function.
  • clear_settings()
  • Clears all state associated with the setting() function.

MATH

  • randomrange(start[, stop=None[, step=None]])

Parameters:

  • start (int) – Start of range (inclusive)
  • end (int) – End of range(not inclusive)
  • step (int) – Interval at which to return a value.

Return a random integer between [start, end).

  • gaussdistribution(_mean, sigma)

Parameters:

  • mean (float) – Mean value
  • sigma (float) – Standard deviation
  • sqrt(n)
  • Calculate the square root of n.
  • tonumber(s)

Parameters:s (str) – String to convert to number.Returns:Integer, floating-point or NULL on failure.

  • mode(val)

Parameters:val – Numbers in list.Returns:The mode, or most-common, number observed.

Aggregate function which calculates mode of values.

  • minrange(val)

Parameters:val – ValueReturns:Min difference between two values.

Aggregate function which calculates the minimal distance between twonumbers in the sequence.

  • avgrange(val)

Parameters:val – ValueReturns:Average difference between values.

Aggregate function which calculates the average distance between twoconsecutive numbers in the sequence.

  • range(val)

Parameters:val – ValueReturns:The range from the smallest to largest value in sequence.

Aggregate function which returns range of values observed.

  • median(val)

Parameters:val – ValueReturns:The median, or middle, value in a sequence.

Aggregate function which calculates the middle value in a sequence.

Note

Only available if you compiled the _sqlite_udf extension.

STRING

  • substrcount(_haystack, needle)
  • Returns number of times needle appears in haystack.
  • stripchars(_haystack, chars)
  • Strips any characters in chars from beginning and end of haystack.
  • dameraulevenshtein_dist(_s1, s2)
  • Computes the edit distance from s1 to s2 using the damerau variant of thelevenshtein algorithm.

Note

Only available if you compiled the _sqlite_udf extension.

  • levenshteindist(_s1, s2)
  • Computes the edit distance from s1 to s2 using the levenshtein algorithm.

Note

Only available if you compiled the _sqlite_udf extension.

  • strdist(_s1, s2)
  • Computes the edit distance from s1 to s2 using the standard librarySequenceMatcher’s algorithm.

Note

Only available if you compiled the _sqlite_udf extension.

  • regexsearch(_regex, search_string)

Parameters:

  • regex (str) – Regular expression
  • search_string (str) – String to search for instances of regex.

Table-value function that searches a string for substrings that matchthe provided regex. Returns rows for each match found.

Example:

  1. SELECT * FROM regex_search('\w+', 'extract words, ignore! symbols');
  2.  
  3. value
  4. -----
  5. extract
  6. words
  7. ignore
  8. symbols