Synonym Dictionary

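A Synonym dictionary defines and recognizes synonyms of tokens and replaces each token with its synonym. It does not support phrases. Phrase synonyms can be defined with a Thesaurus dictionary (for details, see Thesaurus Dictionary).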

Examples

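  • A Synonym dictionary can be used to work around linguistic problems. For example, to prevent the English stemmer from reducing the word "Paris" to "pari", add the line "Paris paris" to the Synonym dictionary file and place that dictionary before the predefined english_stem dictionary.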

    postgres=# SELECT * FROM ts_debug('english', 'Paris');
    alias | description | token | dictionaries | dictionary | lexemes
    -----------+-----------------+-------+----------------+--------------+---------
    asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | {pari}
    (1 row)
    postgres=# CREATE TEXT SEARCH DICTIONARY my_synonym (
        TEMPLATE = synonym,
        SYNONYMS = my_synonyms,
        FILEPATH = 'file:///home/dicts/'
    );
    postgres=# ALTER TEXT SEARCH CONFIGURATION english
        ALTER MAPPING FOR asciiword
        WITH my_synonym, english_stem;
    postgres=# SELECT * FROM ts_debug('english', 'Paris');
    alias | description | token | dictionaries | dictionary | lexemes
    -----------+-----------------+-------+---------------------------+------------+---------
    asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris}
    (1 row)
    postgres=# SELECT * FROM ts_debug('english', 'paris');
    alias | description | token | dictionaries | dictionary | lexemes
    -----------+-----------------+-------+---------------------------+------------+---------
    asciiword | Word, all ASCII | paris | {my_synonym,english_stem} | my_synonym | {paris}
    (1 row)
    postgres=# ALTER TEXT SEARCH DICTIONARY my_synonym (CASESENSITIVE = true);
    postgres=# SELECT * FROM ts_debug('english', 'Paris');
    alias | description | token | dictionaries | dictionary | lexemes
    -----------+-----------------+-------+---------------------------+------------+---------
    asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris}
    (1 row)
    postgres=# SELECT * FROM ts_debug('english', 'paris');
    alias | description | token | dictionaries | dictionary | lexemes
    -----------+-----------------+-------+---------------------------+------------+---------
    asciiword | Word, all ASCII | paris | {my_synonym,english_stem} | english_stem | {pari}
    (1 row)

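    Here, the full name of the synonym dictionary file is my_synonyms.syn (the SYNONYMS parameter gives the file name without the .syn suffix), and the file is located in /home/dicts/ on the primary node of the database the session is connected to. By default, matching is case-insensitive, so both 'Paris' and 'paris' are replaced with paris. After CASESENSITIVE is set to true, entries must match exactly: 'paris' no longer matches the entry "Paris", so it falls through to english_stem and is stemmed to pari. For the syntax and further parameters for creating and modifying a dictionary, see CREATE TEXT SEARCH DICTIONARY and ALTER TEXT SEARCH DICTIONARY.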

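  • An asterisk (*) at the end of a synonym in the dictionary file marks that synonym as a prefix. The asterisk is ignored by to_tsvector(), but to_tsquery() turns the synonym into a prefix-match term (see the section on processing queries).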

    Assume the dictionary file synonym_sample.syn contains the following:

    postgres pgsql
    postgresql pgsql
    postgre pgsql
    gogle googl
    indices index*

    Create and use the dictionary:

    postgres=# CREATE TEXT SEARCH DICTIONARY syn (
        TEMPLATE = synonym,
        SYNONYMS = synonym_sample
    );
    postgres=# SELECT ts_lexize('syn','indices');
    ts_lexize
    -----------
    {index}
    (1 row)
    postgres=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);
    postgres=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR asciiword WITH syn;
    postgres=# SELECT to_tsvector('tst','indices');
    to_tsvector
    -------------
    'index':1
    (1 row)
    postgres=# SELECT to_tsquery('tst','indices');
    to_tsquery
    ------------
    'index':*
    (1 row)
    postgres=# SELECT 'indexes are very useful'::tsvector;
    tsvector
    ---------------------------------
    'are' 'indexes' 'useful' 'very'
    (1 row)
    postgres=# SELECT 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');
    ?column?
    ----------
    t
    (1 row)
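The case folding and fall-through seen in the Paris example can be modeled in a few lines of Python. This is a simplified sketch, not the database's implementation. It assumes that a dictionary returns None for a token it does not recognize, so the next dictionary in the mapping (my_synonym, then english_stem) is tried. toy_english_stem is a hypothetical stand-in for english_stem: it only lower-cases and drops a trailing "s", unlike the real Snowball stemmer. The names make_synonym_dict, toy_english_stem and lexize_chain are illustrative only.

```python
from typing import Callable, Optional

# Toy model: a dictionary maps a token to a list of lexemes, or returns None
# when it does not recognize the token so that the next dictionary is consulted.
Dictionary = Callable[[str], Optional[list[str]]]


def make_synonym_dict(entries: dict[str, str], case_sensitive: bool = False) -> Dictionary:
    """Hypothetical synonym dictionary built from 'word synonym' pairs."""
    if not case_sensitive:
        # Case-insensitive mode: fold both sides of every entry to lower case.
        entries = {w.lower(): s.lower() for w, s in entries.items()}

    def lookup(token: str) -> Optional[list[str]]:
        key = token if case_sensitive else token.lower()
        synonym = entries.get(key)
        return [synonym] if synonym is not None else None

    return lookup


def toy_english_stem(token: str) -> Optional[list[str]]:
    """Hypothetical stand-in for english_stem; like a stemmer, it accepts every token."""
    word = token.lower()
    return [word[:-1] if word.endswith("s") else word]


def lexize_chain(
    token: str, chain: list[tuple[str, Dictionary]]
) -> tuple[Optional[str], Optional[list[str]]]:
    """Return (dictionary name, lexemes) from the first dictionary that recognizes token."""
    for name, dictionary in chain:
        lexemes = dictionary(token)
        if lexemes is not None:
            return name, lexemes
    return None, None  # no dictionary recognized the token


entries = {"Paris": "paris"}  # the line "Paris paris" in my_synonyms.syn

default_chain = [("my_synonym", make_synonym_dict(entries)), ("english_stem", toy_english_stem)]
strict_chain = [
    ("my_synonym", make_synonym_dict(entries, case_sensitive=True)),
    ("english_stem", toy_english_stem),
]

print(lexize_chain("paris", default_chain))  # ('my_synonym', ['paris'])
print(lexize_chain("Paris", strict_chain))   # ('my_synonym', ['paris'])
print(lexize_chain("paris", strict_chain))   # ('english_stem', ['pari'])
```

Setting case_sensitive=True mirrors CASESENSITIVE = true: the lower-case token no longer matches the entry "Paris" and is handed to the next dictionary in the chain.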
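The effect of the trailing asterisk can be sketched the same way. The toy parser below reads "word synonym" lines from the synonym_sample.syn contents shown above, records whether the synonym ends with *, and then shows the flag being dropped on the to_tsvector() side but kept as a :* prefix term on the to_tsquery() side. parse_syn_file, vector_lexeme, query_term and prefix_match are hypothetical helper names; the parser assumes the default case-insensitive mode and simply skips lines that lack a synonym.

```python
from typing import NamedTuple, Optional


class Synonym(NamedTuple):
    lexeme: str
    prefix: bool  # True when the entry in the file ended with '*'


def parse_syn_file(text: str) -> dict[str, Synonym]:
    """Toy parser for 'word synonym[*]' lines, case-insensitive mode."""
    table: dict[str, Synonym] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # this sketch skips blank lines and lines without a synonym
        word, synonym = parts[0].lower(), parts[1].lower()
        is_prefix = synonym.endswith("*")
        table[word] = Synonym(synonym[:-1] if is_prefix else synonym, is_prefix)
    return table


def vector_lexeme(table: dict[str, Synonym], token: str) -> Optional[str]:
    """to_tsvector() side: the prefix flag is ignored."""
    entry = table.get(token.lower())
    return entry.lexeme if entry is not None else None


def query_term(table: dict[str, Synonym], token: str) -> Optional[str]:
    """to_tsquery() side: a prefix synonym becomes a 'lexeme':* term."""
    entry = table.get(token.lower())
    if entry is None:
        return None
    return f"'{entry.lexeme}':*" if entry.prefix else f"'{entry.lexeme}'"


def prefix_match(prefix: str, vector_lexemes: set[str]) -> bool:
    """A prefix term matches any tsvector lexeme that starts with it."""
    return any(lexeme.startswith(prefix) for lexeme in vector_lexemes)


SAMPLE = """\
postgres pgsql
postgresql pgsql
postgre pgsql
gogle googl
indices index*
"""

table = parse_syn_file(SAMPLE)
# Whitespace-separated words, as in 'indexes are very useful'::tsvector
doc_lexemes = set("indexes are very useful".split())

print(vector_lexeme(table, "indices"))     # index
print(query_term(table, "indices"))        # 'index':*
print(prefix_match("index", doc_lexemes))  # True
```

Because the query term keeps the prefix flag, 'index':* matches the stored lexeme 'indexes', which is why the final @@ comparison in the session above returns t.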