Testing a Parser

The ts_parse function lets you test a text search parser directly.

  ts_parse(parser_name text, document text,
           OUT tokid integer, OUT token text) returns setof record

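ts_parse parses the given document and returns a set of records, one for each token the parser produces. Each record contains a tokid identifying the token type and the token text. For example: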

  openGauss=# SELECT * FROM ts_parse('default', '123 - a number');
   tokid | token
  -------+--------
      22 | 123
      12 |
      12 | -
       1 | a
      12 |
       1 | number
  (6 rows)
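
When inspecting parser output, the separator tokens can obscure the tokens of interest. The query below is a minimal sketch that filters them out. It assumes that the default parser reports whitespace and punctuation with tokid 12, as the sample output above suggests; check the token type list of your parser before relying on that number.

```sql
-- Keep only the non-separator tokens.
-- Assumption: tokid 12 marks separators, as seen in the sample output above.
SELECT tokid, token
FROM ts_parse('default', '123 - a number')
WHERE tokid <> 12;
```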

The ts_token_type function returns the token types of the specified parser together with their descriptions.

  ts_token_type(parser_name text, OUT tokid integer,
                OUT alias text, OUT description text) returns setof record

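ts_token_type returns a table describing every token type that the specified parser can recognize. For each token type, the table gives the integer tokid that the parser uses to label a token of that type, the alias that names the token type in text search configuration commands, and a short description. For example: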

  openGauss=# SELECT * FROM ts_token_type('default');
   tokid |      alias      |               description
  -------+-----------------+------------------------------------------
       1 | asciiword       | Word, all ASCII
       2 | word            | Word, all letters
       3 | numword         | Word, letters and digits
       4 | email           | Email address
       5 | url             | URL
       6 | host            | Host
       7 | sfloat          | Scientific notation
       8 | version         | Version number
       9 | hword_numpart   | Hyphenated word part, letters and digits
      10 | hword_part      | Hyphenated word part, all letters
      11 | hword_asciipart | Hyphenated word part, all ASCII
      12 | blank           | Space symbols
      13 | tag             | XML tag
      14 | protocol        | Protocol head
      15 | numhword        | Hyphenated word, letters and digits
      16 | asciihword      | Hyphenated word, all ASCII
      17 | hword           | Hyphenated word, all letters
      18 | url_path        | URL path
      19 | file            | File or path name
      20 | float           | Decimal notation
      21 | int             | Signed integer
      22 | uint            | Unsigned integer
      23 | entity          | XML entity
  (23 rows)
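
Because both functions expose a tokid column, their results can be combined so that each parsed token appears next to its readable type name. The following is a sketch rather than a documented recipe. The table aliases p and t are chosen here for illustration, and the row order is not guaranteed without an ORDER BY clause.

```sql
-- Label each token from ts_parse with the alias and description
-- reported by ts_token_type for the same parser.
SELECT p.tokid, t.alias, t.description, p.token
FROM ts_parse('default', '123 - a number') AS p
JOIN ts_token_type('default') AS t
  ON t.tokid = p.tokid;
```

Joining on tokid avoids hard-coding type numbers in the query, which helps when comparing the output of different parsers.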