Search profiling

How a query is interpreted

Example of a complex query:

  1. "hello world" @title "example program"~5 @body python -(php|perl) @* code

The full meaning of this search is:

  • Find the words ‘hello’ and ‘world’ adjacently in any field in a document;
  • Additionally, the same document must also contain the words ‘example’ and ‘program’ in the title field, with up to, but not including, 5 words between the words in question; (E.g. “example PHP program” would be matched however “example script to introduce outside data into the correct context for your program” would not because two terms have 5 or more words between them)
  • Additionally, the same document must contain the word ‘python’ in the body field, but not contain either ‘php’ or ‘perl’;
  • Additionally, the same document must contain the word ‘code’ in any field.

OR operator precedence is higher than AND, so “looking for cat | dog | mouse” means “looking for ( cat | dog | mouse )” and not “(looking for cat) | dog | mouse”.

To understand how a query will be executed, Manticore Search offer query profile tooling for viewing the query tree created by a query expression.

Profiling the query tree in SQL

When using SQL statement the full-text query profiling needs to be enabled before running the desired query:

  1. SET profiling =1;
  2. SELECT * FROM test WHERE MATCH('@title abc* @body hey');

To view the query tree, we must run SHOW PLAN right after the execution of the query:

  1. SHOW PLAN;

The command will return the structure of the executed query. Please note that the 3 statements - SET profiling, the query and SHOW - must run on the same session.

Profiling the query in HTTP JSON

When using the HTTP JSON protocol we can just enable "profile":true to get in response the full-text query tree structure.

  1. {
  2. "index":"test",
  3. "profile":true,
  4. "query":
  5. {
  6. "match_phrase": { "_all" : "had grown quite" }
  7. }
  8. }

The response will contain a profile object in which we can find a member query.

query property contains the transformed full-text query tree. Each node contains:

  • type: node type. Can be AND, OR, PHRASE, KEYWORD etc.
  • description: query subtree for this node shown as a string (in SHOW PLAN format)
  • children: child nodes, if any
  • max_field_pos: maximum position within a field

A keyword node will also provide:

  • word: transformed keyword.
  • querypos: position of this keyword in a query.
  • excluded: keyword excluded from query.
  • expanded: keyword added by prefix expansion.
  • field_start: keyword must occur at the very start of the field.
  • field_end: keyword must occur at the very end of the field.
  • boost: keyword IDF will be multiplied by this.
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java

SQL JSON PHP Python javascript Java

  1. SET profiling=1;
  2. SELECT * FROM test WHERE MATCH('@title abc* @body hey');
  3. SHOW PLAN \G
  1. POST /search
  2. {
  3. "index": "forum",
  4. "query": {"query_string": "i me"},
  5. "_source": { "excludes":["*"] },
  6. "limit": 1,
  7. "profile":true
  8. }
  1. $result = $index->search('i me')->setSource(['excludes'=>['*']])->setLimit(1)->profile()->get();
  2. print_r($result->getProfile());
  1. searchApi.search({"index":"forum","query":{"query_string":"i me"},"_source":{"excludes":["*"]},"limit":1,"profile":True})
  1. res = await searchApi.search({"index":"forum","query":{"query_string":"i me"},"_source":{"excludes":["*"]},"limit":1,"profile":true});
  1. query = new HashMap<String,Object>();
  2. query.put("query_string","i me");
  3. searchRequest = new SearchRequest();
  4. searchRequest.setIndex("forum");
  5. searchRequest.setQuery(query);
  6. searchRequest.setProfile(true);
  7. searchRequest.setLimit(1);
  8. searchRequest.setSort(new ArrayList<String>(){{
  9. add("*");
  10. }});
  11. searchResponse = searchApi.search(searchRequest);

Response

  1. *************************** 1\. row ***************************
  2. Variable: transformed_tree
  3. Value: AND(
  4. OR(fields=(title), KEYWORD(abcx, querypos=1, expanded), KEYWORD(abcm, querypos=1, expanded)),
  5. AND(fields=(body), KEYWORD(hey, querypos=2)))
  6. 1 row in set (0.00 sec)
  1. {
  2. "took":1503,
  3. "timed_out":false,
  4. "hits":
  5. {
  6. "total":406301,
  7. "hits":
  8. [
  9. {
  10. "_id":"406443",
  11. "_score":3493,
  12. "_source":{}
  13. }
  14. ]
  15. },
  16. "profile":
  17. {
  18. "query":
  19. {
  20. "type":"AND",
  21. "description":"AND( AND(KEYWORD(i, querypos=1)), AND(KEYWORD(me, querypos=2)))",
  22. "children":
  23. [
  24. {
  25. "type":"AND",
  26. "description":"AND(KEYWORD(i, querypos=1))",
  27. "children":
  28. [
  29. {
  30. "type":"KEYWORD",
  31. "word":"i",
  32. "querypos":1
  33. }
  34. ]
  35. },
  36. {
  37. "type":"AND",
  38. "description":"AND(KEYWORD(me, querypos=2))",
  39. "children":
  40. [
  41. {
  42. "type":"KEYWORD",
  43. "word":"me",
  44. "querypos":2
  45. }
  46. ]
  47. }
  48. ]
  49. }
  50. }
  51. }
  1. Array
  2. (
  3. [query] => Array
  4. (
  5. [type] => AND
  6. [description] => AND( AND(KEYWORD(i, querypos=1)), AND(KEYWORD(me, querypos=2)))
  7. [children] => Array
  8. (
  9. [0] => Array
  10. (
  11. [type] => AND
  12. [description] => AND(KEYWORD(i, querypos=1))
  13. [children] => Array
  14. (
  15. [0] => Array
  16. (
  17. [type] => KEYWORD
  18. [word] => i
  19. [querypos] => 1
  20. )
  21. )
  22. )
  23. [1] => Array
  24. (
  25. [type] => AND
  26. [description] => AND(KEYWORD(me, querypos=2))
  27. [children] => Array
  28. (
  29. [0] => Array
  30. (
  31. [type] => KEYWORD
  32. [word] => me
  33. [querypos] => 2
  34. )
  35. )
  36. )
  37. )
  38. )
  39. )
  1. {'hits': {'hits': [{u'_id': u'100', u'_score': 2500, u'_source': {}}],
  2. 'total': 1},
  3. 'profile': {u'query': {u'children': [{u'children': [{u'querypos': 1,
  4. u'type': u'KEYWORD',
  5. u'word': u'i'}],
  6. u'description': u'AND(KEYWORD(i, querypos=1))',
  7. u'type': u'AND'},
  8. {u'children': [{u'querypos': 2,
  9. u'type': u'KEYWORD',
  10. u'word': u'me'}],
  11. u'description': u'AND(KEYWORD(me, querypos=2))',
  12. u'type': u'AND'}],
  13. u'description': u'AND( AND(KEYWORD(i, querypos=1)), AND(KEYWORD(me, querypos=2)))',
  14. u'type': u'AND'}},
  15. 'timed_out': False,
  16. 'took': 0}
  1. {"hits": {"hits": [{"_id": "100", "_score": 2500, "_source": {}}],
  2. "total": 1},
  3. "profile": {"query": {"children": [{"children": [{"querypos": 1,
  4. "type": "KEYWORD",
  5. "word": "i"}],
  6. "description": "AND(KEYWORD(i, querypos=1))",
  7. "type": "AND"},
  8. {"children": [{"querypos": 2,
  9. "type": "KEYWORD",
  10. "word": "me"}],
  11. "description": "AND(KEYWORD(me, querypos=2))",
  12. "type": "AND"}],
  13. "description": "AND( AND(KEYWORD(i, querypos=1)), AND(KEYWORD(me, querypos=2)))",
  14. "type": "AND"}},
  15. "timed_out": False,
  16. "took": 0}
  1. class SearchResponse {
  2. took: 18
  3. timedOut: false
  4. hits: class SearchResponseHits {
  5. total: 1
  6. hits: [{_id=100, _score=2500, _source={}}]
  7. aggregations: null
  8. }
  9. profile: {query={type=AND, description=AND( AND(KEYWORD(i, querypos=1)), AND(KEYWORD(me, querypos=2))), children=[{type=AND, description=AND(KEYWORD(i, querypos=1)), children=[{type=KEYWORD, word=i, querypos=1}]}, {type=AND, description=AND(KEYWORD(me, querypos=2)), children=[{type=KEYWORD, word=me, querypos=2}]}]}}
  10. }

In some cases the evaluated query tree can be rather different from the original one because of expansions and other transformations.

  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java

SQL JSON PHP Python javascript Java

  1. SET profiling=1;
  2. SELECT id FROM forum WHERE MATCH('@title way* @content hey') LIMIT 1;
  3. SHOW PLAN;
  1. POST /search
  2. {
  3. "index": "forum",
  4. "query": {"query_string": "@title way* @content hey"},
  5. "_source": { "excludes":["*"] },
  6. "limit": 1,
  7. "profile":true
  8. }
  1. $result = $index->search('@title way* @content hey')->setSource(['excludes'=>['*']])->setLimit(1)->profile()->get();
  2. print_r($result->getProfile());
  1. searchApi.search({"index":"forum","query":{"query_string":"@title way* @content hey"},"_source":{"excludes":["*"]},"limit":1,"profile":true})
  1. res = await searchApi.search({"index":"forum","query":{"query_string":"@title way* @content hey"},"_source":{"excludes":["*"]},"limit":1,"profile":true});
  1. query = new HashMap<String,Object>();
  2. query.put("query_string","@title way* @content hey");
  3. searchRequest = new SearchRequest();
  4. searchRequest.setIndex("forum");
  5. searchRequest.setQuery(query);
  6. searchRequest.setProfile(true);
  7. searchRequest.setLimit(1);
  8. searchRequest.setSort(new ArrayList<String>(){{
  9. add("*");
  10. }});
  11. searchResponse = searchApi.search(searchRequest);

Response

  1. Query OK, 0 rows affected (0.00 sec)
  2. +--------+
  3. | id |
  4. +--------+
  5. | 711651 |
  6. +--------+
  7. 1 row in set (0.04 sec)
  8. +------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  9. | Variable | Value |
  10. +------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  11. | transformed_tree | AND(
  12. OR(
  13. OR(
  14. AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),
  15. OR(
  16. AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),
  17. AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),
  18. AND(fields=(title), KEYWORD(way, querypos=1, expanded)),
  19. OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),
  20. AND(fields=(content), KEYWORD(hey, querypos=2))) |
  21. +------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  22. 1 row in set (0.00 sec)
  1. {
  2. "took":33,
  3. "timed_out":false,
  4. "hits":
  5. {
  6. "total":105,
  7. "hits":
  8. [
  9. {
  10. "_id":"711651",
  11. "_score":2539,
  12. "_source":{}
  13. }
  14. ]
  15. },
  16. "profile":
  17. {
  18. "query":
  19. {
  20. "type":"AND",
  21. "description":"AND( OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)), OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))), AND(fields=(title), KEYWORD(way, querypos=1, expanded)), OR(fields=(title), KEYWORD(way*, querypos=1, expanded))), AND(fields=(content), KEYWORD(hey, querypos=2)))",
  22. "children":
  23. [
  24. {
  25. "type":"OR",
  26. "description":"OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)), OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))), AND(fields=(title), KEYWORD(way, querypos=1, expanded)), OR(fields=(title), KEYWORD(way*, querypos=1, expanded)))",
  27. "children":
  28. [
  29. {
  30. "type":"OR",
  31. "description":"OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)), OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))))",
  32. "children":
  33. [
  34. {
  35. "type":"AND",
  36. "description":"AND(fields=(title), KEYWORD(wayne, querypos=1, expanded))",
  37. "fields":["title"],
  38. "max_field_pos":0,
  39. "children":
  40. [
  41. {
  42. "type":"KEYWORD",
  43. "word":"wayne",
  44. "querypos":1,
  45. "expanded":true
  46. }
  47. ]
  48. },
  49. {
  50. "type":"OR",
  51. "description":"OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))",
  52. "children":
  53. [
  54. {
  55. "type":"AND",
  56. "description":"AND(fields=(title), KEYWORD(ways, querypos=1, expanded))",
  57. "fields":["title"],
  58. "max_field_pos":0,
  59. "children":
  60. [
  61. {
  62. "type":"KEYWORD",
  63. "word":"ways",
  64. "querypos":1,
  65. "expanded":true
  66. }
  67. ]
  68. },
  69. {
  70. "type":"AND",
  71. "description":"AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))",
  72. "fields":["title"],
  73. "max_field_pos":0,
  74. "children":
  75. [
  76. {
  77. "type":"KEYWORD",
  78. "word":"wayyy",
  79. "querypos":1,
  80. "expanded":true
  81. }
  82. ]
  83. }
  84. ]
  85. }
  86. ]
  87. },
  88. {
  89. "type":"AND",
  90. "description":"AND(fields=(title), KEYWORD(way, querypos=1, expanded))",
  91. "fields":["title"],
  92. "max_field_pos":0,
  93. "children":
  94. [
  95. {
  96. "type":"KEYWORD",
  97. "word":"way",
  98. "querypos":1,
  99. "expanded":true
  100. }
  101. ]
  102. },
  103. {
  104. "type":"OR",
  105. "description":"OR(fields=(title), KEYWORD(way*, querypos=1, expanded))",
  106. "fields":["title"],
  107. "max_field_pos":0,
  108. "children":
  109. [
  110. {
  111. "type":"KEYWORD",
  112. "word":"way*",
  113. "querypos":1,
  114. "expanded":true
  115. }
  116. ]
  117. }
  118. ]
  119. },
  120. {
  121. "type":"AND",
  122. "description":"AND(fields=(content), KEYWORD(hey, querypos=2))",
  123. "fields":["content"],
  124. "max_field_pos":0,
  125. "children":
  126. [
  127. {
  128. "type":"KEYWORD",
  129. "word":"hey",
  130. "querypos":2
  131. }
  132. ]
  133. }
  134. ]
  135. }
  136. }
  137. }
  1. Array
  2. (
  3. [query] => Array
  4. (
  5. [type] => AND
  6. [description] => AND( OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)), OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))), AND(fields=(title), KEYWORD(way, querypos=1, expanded)), OR(fields=(title), KEYWORD(way*, querypos=1, expanded))), AND(fields=(content), KEYWORD(hey, querypos=2)))
  7. [children] => Array
  8. (
  9. [0] => Array
  10. (
  11. [type] => OR
  12. [description] => OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)), OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))), AND(fields=(title), KEYWORD(way, querypos=1, expanded)), OR(fields=(title), KEYWORD(way*, querypos=1, expanded)))
  13. [children] => Array
  14. (
  15. [0] => Array
  16. (
  17. [type] => OR
  18. [description] => OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)), OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))))
  19. [children] => Array
  20. (
  21. [0] => Array
  22. (
  23. [type] => AND
  24. [description] => AND(fields=(title), KEYWORD(wayne, querypos=1, expanded))
  25. [fields] => Array
  26. (
  27. [0] => title
  28. )
  29. [max_field_pos] => 0
  30. [children] => Array
  31. (
  32. [0] => Array
  33. (
  34. [type] => KEYWORD
  35. [word] => wayne
  36. [querypos] => 1
  37. [expanded] => 1
  38. )
  39. )
  40. )
  41. [1] => Array
  42. (
  43. [type] => OR
  44. [description] => OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)), AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))
  45. [children] => Array
  46. (
  47. [0] => Array
  48. (
  49. [type] => AND
  50. [description] => AND(fields=(title), KEYWORD(ways, querypos=1, expanded))
  51. [fields] => Array
  52. (
  53. [0] => title
  54. )
  55. [max_field_pos] => 0
  56. [children] => Array
  57. (
  58. [0] => Array
  59. (
  60. [type] => KEYWORD
  61. [word] => ways
  62. [querypos] => 1
  63. [expanded] => 1
  64. )
  65. )
  66. )
  67. [1] => Array
  68. (
  69. [type] => AND
  70. [description] => AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))
  71. [fields] => Array
  72. (
  73. [0] => title
  74. )
  75. [max_field_pos] => 0
  76. [children] => Array
  77. (
  78. [0] => Array
  79. (
  80. [type] => KEYWORD
  81. [word] => wayyy
  82. [querypos] => 1
  83. [expanded] => 1
  84. )
  85. )
  86. )
  87. )
  88. )
  89. )
  90. )
  91. [1] => Array
  92. (
  93. [type] => AND
  94. [description] => AND(fields=(title), KEYWORD(way, querypos=1, expanded))
  95. [fields] => Array
  96. (
  97. [0] => title
  98. )
  99. [max_field_pos] => 0
  100. [children] => Array
  101. (
  102. [0] => Array
  103. (
  104. [type] => KEYWORD
  105. [word] => way
  106. [querypos] => 1
  107. [expanded] => 1
  108. )
  109. )
  110. )
  111. [2] => Array
  112. (
  113. [type] => OR
  114. [description] => OR(fields=(title), KEYWORD(way*, querypos=1, expanded))
  115. [fields] => Array
  116. (
  117. [0] => title
  118. )
  119. [max_field_pos] => 0
  120. [children] => Array
  121. (
  122. [0] => Array
  123. (
  124. [type] => KEYWORD
  125. [word] => way*
  126. [querypos] => 1
  127. [expanded] => 1
  128. )
  129. )
  130. )
  131. )
  132. )
  133. [1] => Array
  134. (
  135. [type] => AND
  136. [description] => AND(fields=(content), KEYWORD(hey, querypos=2))
  137. [fields] => Array
  138. (
  139. [0] => content
  140. )
  141. [max_field_pos] => 0
  142. [children] => Array
  143. (
  144. [0] => Array
  145. (
  146. [type] => KEYWORD
  147. [word] => hey
  148. [querypos] => 2
  149. )
  150. )
  151. )
  152. )
  153. )
  154. )
  1. {'hits': {'hits': [{u'_id': u'2811025403043381551',
  2. u'_score': 2643,
  3. u'_source': {}}],
  4. 'total': 1},
  5. 'profile': {u'query': {u'children': [{u'children': [{u'expanded': True,
  6. u'querypos': 1,
  7. u'type': u'KEYWORD',
  8. u'word': u'way*'}],
  9. u'description': u'AND(fields=(title), KEYWORD(way*, querypos=1, expanded))',
  10. u'fields': [u'title'],
  11. u'type': u'AND'},
  12. {u'children': [{u'querypos': 2,
  13. u'type': u'KEYWORD',
  14. u'word': u'hey'}],
  15. u'description': u'AND(fields=(content), KEYWORD(hey, querypos=2))',
  16. u'fields': [u'content'],
  17. u'type': u'AND'}],
  18. u'description': u'AND( AND(fields=(title), KEYWORD(way*, querypos=1, expanded)), AND(fields=(content), KEYWORD(hey, querypos=2)))',
  19. u'type': u'AND'}},
  20. 'timed_out': False,
  21. 'took': 0}
  1. {"hits": {"hits": [{"_id": "2811025403043381551",
  2. "_score": 2643,
  3. "_source": {}}],
  4. "total": 1},
  5. "profile": {"query": {"children": [{"children": [{"expanded": True,
  6. "querypos": 1,
  7. "type": "KEYWORD",
  8. "word": "way*"}],
  9. "description": "AND(fields=(title), KEYWORD(way*, querypos=1, expanded))",
  10. "fields": ["title"],
  11. "type": "AND"},
  12. {"children": [{"querypos": 2,
  13. "type": "KEYWORD",
  14. "word": "hey"}],
  15. "description": "AND(fields=(content), KEYWORD(hey, querypos=2))",
  16. "fields": ["content"],
  17. "type": "AND"}],
  18. "description": "AND( AND(fields=(title), KEYWORD(way*, querypos=1, expanded)), AND(fields=(content), KEYWORD(hey, querypos=2)))",
  19. "type": "AND"}},
  20. "timed_out": False,
  21. "took": 0}
  1. class SearchResponse {
  2. took: 18
  3. timedOut: false
  4. hits: class SearchResponseHits {
  5. total: 1
  6. hits: [{_id=2811025403043381551, _score=2643, _source={}}]
  7. aggregations: null
  8. }
  9. profile: {query={type=AND, description=AND( AND(fields=(title), KEYWORD(way*, querypos=1, expanded)), AND(fields=(content), KEYWORD(hey, querypos=2))), children=[{type=AND, description=AND(fields=(title), KEYWORD(way*, querypos=1, expanded)), fields=[title], children=[{type=KEYWORD, word=way*, querypos=1, expanded=true}]}, {type=AND, description=AND(fields=(content), KEYWORD(hey, querypos=2)), fields=[content], children=[{type=KEYWORD, word=hey, querypos=2}]}]}}
  10. }

Profiling without running a query

The SQL statement EXPLAIN QUERY allows displaying the execution tree of a provided full-text query without running an actual search query on the table.

  • SQL

SQL

  1. EXPLAIN QUERY index_base '@title running @body dog'\G

Response

  1. EXPLAIN QUERY index_base '@title running @body dog'\G
  2. *************************** 1\. row ***************************
  3. Variable: transformed_tree
  4. Value: AND(
  5. OR(
  6. AND(fields=(title), KEYWORD(run, querypos=1, morphed)),
  7. AND(fields=(title), KEYWORD(running, querypos=1, morphed))))
  8. AND(fields=(body), KEYWORD(dog, querypos=2, morphed)))

EXPLAIN QUERY ... option format=dot allows displaying the execution tree of a provided full-text query in hierarchical format suitable for visualization by existing tools, for example https://dreampuf.github.io/GraphvizOnline :

EXPLAIN QUERY graphviz example

  • SQL

SQL

  1. EXPLAIN QUERY tbl 'i me' option format=dot\G

Response

  1. EXPLAIN QUERY tbl 'i me' option format=dot\G
  2. *************************** 1. row ***************************
  3. Variable: transformed_tree
  4. Value: digraph "transformed_tree"
  5. {
  6. 0 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
  7. 0 -> 1
  8. 1 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
  9. 1 -> 2
  10. 2 [shape=record label="i | { querypos=1 }"]
  11. 0 -> 3
  12. 3 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
  13. 3 -> 4
  14. 4 [shape=record label="me | { querypos=2 }"]
  15. }

Viewing the match factors values

When expression ranker is used, it is possible to expose the values of the calculated factors using PACKEDFACTORS()).

The function returns:

  • the values of document level factors (like bm25, field_mask, doc_word_count)
  • list with each field that returned a hit (like lcs, hit_count, word_count, sum_idf, min_hit_pos etc.)
  • list with each keyword from the query and their tf and idf values

The values can be used to understand why certain documents get scored lower or higher in a search or to improve the existing ranking expression.

  • SQL

SQL

  1. SELECT id, PACKEDFACTORS() FROM test1 WHERE MATCH('test one') OPTION ranker=expr('1')\G

Response

  1. id: 1
  2. packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
  3. field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
  4. min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
  5. min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
  6. exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
  7. word0=(tf=1, idf=-0.062982),
  8. word1=(tf=1, idf=0.215338)
  9. 1 row in set (0.00 sec)

Boolean optimization

Queries may be automatically optimized if OPTION boolean_simplify=1 is specified. Some transformations performed by this optimization include:

  • Excess brackets: ((A | B) | C) becomes ( A | B | C ); ((A B) C) becomes ( A B C )
  • Excess AND NOT: ((A !N1) !N2) becomes (A !(N1 | N2))
  • Common NOT: ((A !N) | (B !N)) becomes ((A|B) !N)
  • Common Compound NOT: ((A !(N AA)) | (B !(N BB))) becomes (((A|B) !N) | (A !AA) | (B !BB)) if the cost of evaluating N is greater than the added together costs of evaluating A and B
  • Common subterm: ((A (N | AA)) | (B (N | BB))) becomes (((A|B) N) | (A AA) | (B BB)) if the cost of evaluating N is greater than the added together costs of evaluating A and B
  • Common keywords: (A | “A B”~N) becomes A; (“A B” | “A B C”) becomes “A B”; (“A B”~N | “A B C”~N) becomes (“A B”~N)
  • Common phrase: (“X A B” | “Y A B”) becomes (“(“X”|”Y”) A B”)
  • Common AND NOT: ((A !X) | (A !Y) | (A !Z)) becomes (A !(X Y Z))
  • Common OR NOT: ((A !(N | N1)) | (B !(N | N2))) becomes (( (A !N1) | (B !N2) ) !N) Note that optimizing the queries consumes CPU time, so for simple queries -or for hand-optimized queries- you’ll do better with the default boolean_simplify=0 value. Simplifications are often better for complex queries, or algorithmically generated queries.

Queries like “-dog”, which implicitly include all documents from the collection, can not be evaluated. This is both for technical and performance reasons. Technically, Manticore does not always keep a list of all IDs. Performance-wise, when the collection is huge (ie. 10-100M documents), evaluating such queries could take very long.

Search results

SQL

When you run a query via SQL over mysql protocol as a result you get the requested columns back or empty result set in case nothing is found.

  • SQL

SQL

  1. SELECT * FROM tbl;

Response

  1. +------+------+--------+
  2. | id | age | name |
  3. +------+------+--------+
  4. | 1 | 25 | joe |
  5. | 2 | 25 | mary |
  6. | 3 | 33 | albert |
  7. +------+------+--------+
  8. 3 rows in set (0.00 sec)

In addition to that you can use SHOW META call to see additional meta-information about the latest query.

  • SQL

SQL

  1. SELECT id,story_author,comment_author FROM hn_small WHERE story_author='joe' LIMIT 3; SHOW META;

Response

  1. ++--------+--------------+----------------+
  2. | id | story_author | comment_author |
  3. +--------+--------------+----------------+
  4. | 152841 | joe | SwellJoe |
  5. | 161323 | joe | samb |
  6. | 163735 | joe | jsjenkins168 |
  7. +--------+--------------+----------------+
  8. 3 rows in set (0.01 sec)
  9. +----------------+-------+
  10. | Variable_name | Value |
  11. +----------------+-------+
  12. | total | 3 |
  13. | total_found | 20 |
  14. | total_relation | gte |
  15. | time | 0.010 |
  16. +----------------+-------+
  17. 4 rows in set (0.00 sec)

In some cases, e.g. when you do faceted search you can get multiple result sets as a response to your SQL query.

  • SQL

SQL

  1. SELECT * FROM tbl WHERE MATCH('joe') FACET age;

Response

  1. +------+------+
  2. | id | age |
  3. +------+------+
  4. | 1 | 25 |
  5. +------+------+
  6. 1 row in set (0.00 sec)
  7. +------+----------+
  8. | age | count(*) |
  9. +------+----------+
  10. | 25 | 1 |
  11. +------+----------+
  12. 1 row in set (0.00 sec)

In case of a warning the result set will include a warning flag and you can see the warning using SHOW WARNINGS.

  • SQL

SQL

  1. SELECT * from tbl where match('"joe"/3'); show warnings;

Response

  1. +------+------+------+
  2. | id | age | name |
  3. +------+------+------+
  4. | 1 | 25 | joe |
  5. +------+------+------+
  6. 1 row in set, 1 warning (0.00 sec)
  7. +---------+------+--------------------------------------------------------------------------------------------+
  8. | Level | Code | Message |
  9. +---------+------+--------------------------------------------------------------------------------------------+
  10. | warning | 1000 | quorum threshold too high (words=1, thresh=3); replacing quorum operator with AND operator |
  11. +---------+------+--------------------------------------------------------------------------------------------+
  12. 1 row in set (0.00 sec)

If your query fails you will get an error:

  • SQL

SQL

  1. SELECT * from tbl where match('@surname joe');

Response

  1. ERROR 1064 (42000): index idx: query error: no field 'surname' found in schema

HTTP

Via HTTP JSON iterface query result is sent as a JSON document. Example:

  1. {
  2. "took":10,
  3. "timed_out": false,
  4. "hits":
  5. {
  6. "total": 2,
  7. "hits":
  8. [
  9. {
  10. "_id": "1",
  11. "_score": 1,
  12. "_source": { "gid": 11 }
  13. },
  14. {
  15. "_id": "2",
  16. "_score": 1,
  17. "_source": { "gid": 12 }
  18. }
  19. ]
  20. }
  21. }
  • took: time in milliseconds it took to execute the search
  • timed_out: if the query timed out or not
  • hits: search results. has the following properties:
  • total: total number of matching documents
  • hits: an array containing matches

Query result can also include query profile information, see Query profile.

Each match in the hits array has the following properties:

  • _id: match id
  • _score: match weight, calculated by ranker
  • _source: an array containing the attributes of this match

Source selection

By default all attributes are returned in the _source array. You can use the _source property in the request payload to select the fields you want to be included in the result set. Example:

  1. {
  2. "index":"test",
  3. "_source":"attr*",
  4. "query": { "match_all": {} }
  5. }

You can specify the attributes which you want to include in the query result as a string ("_source": "attr*") or as an array of strings ("_source": [ "attr1", "attri*" ]"). Each entry can be an attribute name or a wildcard (*, % and ? symbols are supported).

You can also explicitly specify which attributes you want to include and which to exclude from the result set using the includes and excludes properties:

  1. "_source":
  2. {
  3. "includes": [ "attr1", "attri*" ],
  4. "excludes": [ "*desc*" ]
  5. }

An empty list of includes is interpreted as “include all attributes” while an empty list of excludes does not match anything. If an attribute matches both the includes and excludes, then the excludes win.