[experimental] Natural Language Processing functions

Warning

This is an experimental feature that is currently in development and is not ready for general use. It will change in unpredictable backwards-incompatible ways in future releases. Set allow_experimental_nlp_functions = 1 to enable it.
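
As a minimal sketch of enabling the feature, the setting named above can be turned on for the current session, or attached to a single query with a SETTINGS clause. The stem call is only a placeholder query, and it assumes a server built with NLP support.

```sql
-- Enable the experimental NLP functions for the current session.
SET allow_experimental_nlp_functions = 1;

-- Alternatively, enable them for one query only.
SELECT stem('en', 'blessing') AS res
SETTINGS allow_experimental_nlp_functions = 1;
```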

stem

Performs stemming on a given word.

Syntax

    stem('language', word)

Arguments

  • language — Language whose rules will be applied. Must be lowercase. String.
  • word — Word that needs to be stemmed. Must be lowercase. String.

Examples

Query:

    SELECT arrayMap(x -> stem('en', x), ['I', 'think', 'it', 'is', 'a', 'blessing', 'in', 'disguise']) AS res;

Result:

    ┌─res────────────────────────────────────────────────┐
    │ ['I','think','it','is','a','bless','in','disguis'] │
    └────────────────────────────────────────────────────┘
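
Because both arguments must be lowercase, free-form text usually needs to be normalized before stemming. The following is a sketch under assumptions, not part of the original example: the sample sentence and the choice of a single space as the token separator are illustrative only, and real text may need more careful tokenization.

```sql
-- Lowercase the input, split it on single spaces, then stem each token.
SELECT arrayMap(
    x -> stem('en', x),
    splitByChar(' ', lower('I think it is a blessing in disguise'))
) AS res
SETTINGS allow_experimental_nlp_functions = 1;
```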

lemmatize

Performs lemmatization on a given word. Requires dictionaries to operate; these must be obtained separately and declared in the server configuration (see Configuration below).

Syntax

    lemmatize('language', word)

Arguments

  • language — Language whose rules will be applied. String.
  • word — Word that needs to be lemmatized. Must be lowercase. String.

Examples

Query:

    SELECT lemmatize('en', 'wolves');

Result:

    ┌─lemmatize('en', 'wolves')─┐
    │ "wolf"                    │
    └───────────────────────────┘

Configuration:

    <lemmatizers>
        <lemmatizer>
            <lang>en</lang>
            <path>en.bin</path>
        </lemmatizer>
    </lemmatizers>

synonyms

Finds synonyms of a given word. There are two types of synonym extensions: plain and wordnet.

With the plain extension type, provide a path to a simple text file in which each line is one synonym set. Words on a line must be separated by space or tab characters.

With the wordnet extension type, provide a path to a directory containing a WordNet thesaurus. The thesaurus must contain a WordNet sense index.

Syntax

    synonyms('extension_name', word)

Arguments

  • extension_name — Name of the extension in which the search will be performed. String.
  • word — Word that will be searched for in the extension. String.

Examples

Query:

    SELECT synonyms('list', 'important');

Result:

    ┌─synonyms('list', 'important')────────────┐
    │ ['important','big','critical','crucial'] │
    └──────────────────────────────────────────┘

Configuration:

    <synonyms_extensions>
        <extension>
            <name>en</name>
            <type>plain</type>
            <path>en.txt</path>
        </extension>
        <extension>
            <name>en</name>
            <type>wordnet</type>
            <path>en/</path>
        </extension>
    </synonyms_extensions>