Intervals query

The intervals query matches documents based on the proximity and order of matching terms. It applies a set of matching rules to terms contained in the specified field. The query generates sequences of minimal intervals that span terms in the text. You can combine the intervals and filter them by parent sources.

Consider an index containing the following documents:

    PUT /testindex/_doc/1
    {
      "title": "key-value pairs are efficiently stored in a hash table"
    }


    PUT /testindex/_doc/2
    {
      "title": "store key-value pairs in a hash map"
    }


For example, the following query searches for documents containing the phrase key-value pairs (with no gap separating the terms) followed by either hash table or hash map:

    GET /testindex/_search
    {
      "query": {
        "intervals": {
          "title": {
            "all_of": {
              "ordered": true,
              "intervals": [
                {
                  "match": {
                    "query": "key-value pairs",
                    "max_gaps": 0,
                    "ordered": true
                  }
                },
                {
                  "any_of": {
                    "intervals": [
                      {
                        "match": {
                          "query": "hash table"
                        }
                      },
                      {
                        "match": {
                          "query": "hash map"
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }


The query returns both documents:

Response

    {
      "took": 1011,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 0.25,
        "hits": [
          {
            "_index": "testindex",
            "_id": "2",
            "_score": 0.25,
            "_source": {
              "title": "store key-value pairs in a hash map"
            }
          },
          {
            "_index": "testindex",
            "_id": "1",
            "_score": 0.14285713,
            "_source": {
              "title": "key-value pairs are efficiently stored in a hash table"
            }
          }
        ]
      }
    }

Parameters

The query accepts the name of the field (<field>) as a top-level parameter:

    GET _search
    {
      "query": {
        "intervals": {
          "<field>": {
            ...
          }
        }
      }
    }


The <field> accepts the following rule objects that are used to match documents based on terms, order, and proximity.

| Rule | Description |
| :--- | :--- |
| match | Matches analyzed text. |
| prefix | Matches terms that start with a specified set of characters. |
| wildcard | Matches terms using a wildcard pattern. |
| fuzzy | Matches terms that are similar to the provided term within a specified edit distance. |
| all_of | Combines multiple rules using a conjunction (AND). |
| any_of | Combines multiple rules using a disjunction (OR). |
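
As a rough illustration of how a rule object nests under the field name, the following Python sketch builds request bodies for two of the rules listed above. The helper name intervals_query is hypothetical and is not part of any OpenSearch client; the field name title and the example terms are assumptions chosen to match the sample index on this page.

```python
import json

def intervals_query(field, rule_name, rule_params):
    """Wrap a single rule object in an intervals query body (hypothetical helper)."""
    return {"query": {"intervals": {field: {rule_name: rule_params}}}}

# A prefix rule: terms in the title field that start with "ha".
prefix_body = intervals_query("title", "prefix", {"prefix": "ha"})

# A fuzzy rule: terms similar to "tabel", using the default AUTO fuzziness.
fuzzy_body = intervals_query("title", "fuzzy", {"term": "tabel", "fuzziness": "AUTO"})

# Print the bodies as JSON so they can be pasted into a search request.
print(json.dumps(prefix_body, indent=2))
print(json.dumps(fuzzy_body, indent=2))
```

The printed JSON can be sent as the body of a GET /testindex/_search request.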

The match rule

The match rule matches analyzed text. The following table lists all parameters the match rule supports.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| query | Required | String | Text for which to search. |
| analyzer | Optional | String | The analyzer used to analyze the query text. Default is the analyzer specified for the <field>. |
| filter | Optional | Interval filter rule object | A rule used to filter returned intervals. |
| max_gaps | Optional | Integer | The maximum allowed number of positions between the matching terms. Terms further apart than max_gaps are not considered matches. If max_gaps is not specified or is set to -1, terms are considered matches regardless of their position. If max_gaps is set to 0, matching terms must appear next to each other. Default is -1. |
| ordered | Optional | Boolean | Specifies whether matching terms must appear in their specified order. Default is false. |
| use_field | Optional | String | Specifies to search this field instead of the top-level <field>. Terms are analyzed using the search analyzer specified for this field. By specifying use_field, you can search across multiple fields as if they were all the same field. For example, if you index the same text into stemmed and unstemmed fields, you can search for stemmed tokens that are near unstemmed ones. |

The prefix rule

The prefix rule matches terms that start with a specified set of characters (prefix). The prefix can expand to match at most 128 terms. If the prefix matches more than 128 terms, an error is returned. The following table lists all parameters the prefix rule supports.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| prefix | Required | String | The prefix used to match terms. |
| analyzer | Optional | String | The analyzer used to normalize the prefix. Default is the analyzer specified for the <field>. |
| use_field | Optional | String | Specifies to search this field instead of the top-level <field>. The prefix is normalized using the search analyzer specified for this field, unless you specify an analyzer. |

The wildcard rule

The wildcard rule matches terms using a wildcard pattern. The wildcard pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, an error is returned. The following table lists all parameters the wildcard rule supports.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| pattern | Required | String | The wildcard pattern used to match terms. Specify ? to match any single character or * to match zero or more characters. |
| analyzer | Optional | String | The analyzer used to normalize the pattern. Default is the analyzer specified for the <field>. |
| use_field | Optional | String | Specifies to search this field instead of the top-level <field>. The pattern is normalized using the search analyzer specified for this field, unless you specify an analyzer. |

Specifying patterns that start with * or ? can hinder search performance because it increases the number of iterations required to match terms.
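
The expansion limit described for the prefix and wildcard rules can be pictured with a small Python sketch. This is a simplified model, not the engine's implementation: the real expansion runs against the index's term dictionary, and the function names, the sample term list, and the ValueError used here are all assumptions made for illustration.

```python
import re

MAX_EXPANSIONS = 128  # limit stated on this page for prefix and wildcard rules

def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (zero or more characters) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def expand(pattern, indexed_terms, limit=MAX_EXPANSIONS):
    """Return the terms the pattern expands to, or raise if the limit is exceeded."""
    regex = wildcard_to_regex(pattern)
    matches = sorted(t for t in set(indexed_terms) if regex.fullmatch(t))
    if len(matches) > limit:
        raise ValueError(
            f"pattern {pattern!r} expands to {len(matches)} terms (limit {limit})"
        )
    return matches

# Hypothetical term dictionary for a small index.
terms = ["hash", "hashes", "map", "table", "tablet"]
print(expand("ha*", terms))    # a prefix expressed as a wildcard pattern
print(expand("tabl?", terms))  # exactly one character after "tabl"
```

A prefix rule behaves like a pattern that ends in *, which is why both rules are modeled by the same function here.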

The fuzzy rule

The fuzzy rule matches terms that are similar to the provided term within a specified edit distance. The fuzzy pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, an error is returned. The following table lists all parameters the fuzzy rule supports.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| term | Required | String | The term to match. |
| analyzer | Optional | String | The analyzer used to normalize the term. Default is the analyzer specified for the <field>. |
| fuzziness | Optional | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between wined and wind is 1. Valid values are non-negative integers or AUTO. The default, AUTO, chooses a value based on the length of each term and is a good choice for most use cases. |
| transpositions | Optional | Boolean | Setting transpositions to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the fuzziness option. For example, the distance between wind and wnid is 1 if transpositions is true (swap “n” and “i”) and 2 if it is false (delete “n”, insert “n”). If transpositions is false, rewind and wnid have the same distance (2) from wind, despite the more human-centric opinion that wnid is an obvious typo. The default is a good choice for most use cases. |
| prefix_length | Optional | Integer | The number of beginning characters left unchanged for fuzzy matching. Default is 0. |
| use_field | Optional | String | Specifies to search this field instead of the top-level <field>. The term is normalized using the search analyzer specified for this field, unless you specify an analyzer. |
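
The distances quoted for the fuzziness and transpositions parameters can be checked with a short Python sketch. It uses the optimal string alignment variant of edit distance as a stand-in; this is an assumption made for illustration, not a description of how the search engine computes fuzzy matches internally.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance using insert, delete, substitute, and optional adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # substitute (or keep)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap neighbors
    return d[-1][-1]

print(edit_distance("wined", "wind"))                       # 1
print(edit_distance("wind", "wnid", transpositions=True))   # 1
print(edit_distance("wind", "wnid", transpositions=False))  # 2
print(edit_distance("wind", "rewind", transpositions=False))  # 2
```

With transpositions disabled, wnid and rewind are equally far from wind, which is the situation the transpositions parameter description warns about.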

The all_of rule

The all_of rule combines multiple rules using a conjunction (AND). The following table lists all parameters the all_of rule supports.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| intervals | Required | Array of rule objects | An array of rules to combine. A document must match all rules in order to be returned in the results. |
| filter | Optional | Interval filter rule object | A rule used to filter returned intervals. |
| max_gaps | Optional | Integer | The maximum allowed number of positions between the matching terms. Terms further apart than max_gaps are not considered matches. If max_gaps is not specified or is set to -1, terms are considered matches regardless of their position. If max_gaps is set to 0, matching terms must appear next to each other. Default is -1. |
| ordered | Optional | Boolean | If true, intervals generated by the rules should appear in the specified order. Default is false. |

The any_of rule

The any_of rule combines multiple rules using a disjunction (OR). The following table lists all parameters the any_of rule supports.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| intervals | Required | Array of rule objects | An array of rules to combine. A document must match at least one rule in order to be returned in the results. |
| filter | Optional | Interval filter rule object | A rule used to filter returned intervals. |

The filter rule

The filter rule is used to restrict the results. The following table lists all parameters the filter rule supports.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| after | Optional | Query object | A query used to return intervals that follow an interval specified in the filter rule. |
| before | Optional | Query object | A query used to return intervals that are before an interval specified in the filter rule. |
| contained_by | Optional | Query object | A query used to return intervals contained by an interval specified in the filter rule. |
| containing | Optional | Query object | A query used to return intervals that contain an interval specified in the filter rule. |
| not_contained_by | Optional | Query object | A query used to return intervals that are not contained by an interval specified in the filter rule. |
| not_containing | Optional | Query object | A query used to return intervals that do not contain an interval specified in the filter rule. |
| not_overlapping | Optional | Query object | A query used to return intervals that do not overlap with an interval specified in the filter rule. |
| overlapping | Optional | Query object | A query used to return intervals that overlap with an interval specified in the filter rule. |
| script | Optional | Script object | A script used to match documents. This script must return true or false. |

Example: Filters

The following query searches for documents containing the words pairs and hash that are within five positions of each other and don’t contain the word efficiently between them:

    POST /testindex/_search
    {
      "query": {
        "intervals": {
          "title": {
            "match": {
              "query": "pairs hash",
              "max_gaps": 5,
              "filter": {
                "not_containing": {
                  "match": {
                    "query": "efficiently"
                  }
                }
              }
            }
          }
        }
      }
    }


The response contains only document 2:

Response

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 0.25,
        "hits": [
          {
            "_index": "testindex",
            "_id": "2",
            "_score": 0.25,
            "_source": {
              "title": "store key-value pairs in a hash map"
            }
          }
        ]
      }
    }
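
To see why only document 2 is returned, the following Python sketch replays the max_gaps and not_containing checks against both sample titles. It is a simplified model under stated assumptions: tokenize is a rough stand-in for the default analyzer (lowercasing and splitting on non-alphanumeric characters), find_interval is a hypothetical helper, and each query term occurs only once per title, so interval minimization does not come into play.

```python
import re

def tokenize(text):
    """Rough stand-in for the default analyzer (assumption): lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def find_interval(text, first, second, max_gaps, excluded):
    """Return (start, end, gaps) for an unordered two-term match, or None."""
    tokens = tokenize(text)
    for p, tok_p in enumerate(tokens):
        for q, tok_q in enumerate(tokens):
            if tok_p != first or tok_q != second:
                continue
            start, end = min(p, q), max(p, q)
            gaps = end - start - 1  # positions between the two terms
            # Keep the interval only if it is tight enough and does not
            # contain the excluded term (the not_containing filter).
            if gaps <= max_gaps and excluded not in tokens[start:end + 1]:
                return (start, end, gaps)
    return None

docs = {
    "1": "key-value pairs are efficiently stored in a hash table",
    "2": "store key-value pairs in a hash map",
}
for doc_id, title in docs.items():
    print(doc_id, find_interval(title, "pairs", "hash", 5, "efficiently"))
```

In document 1, pairs and hash are exactly five positions apart, so max_gaps alone would allow the match; the interval is rejected because efficiently falls inside it. In document 2, the terms are two positions apart and the interval contains no excluded term.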

Example: Script filters

Alternatively, you can write your own script filter to include with the intervals query using the following variables:

  • interval.start: The position (term number) where the interval starts.
  • interval.end: The position (term number) where the interval ends.
  • interval.gaps: The number of words between the terms.

For example, the following query searches for the words map and hash that are next to each other within the specified interval. Term positions are numbered starting with 0, so in the text store key-value pairs in a hash map, store is at position 0, key is at position 1, and so on. The interval must start after a (position 5) and end no later than the last term, map (position 7):

    POST /testindex/_search
    {
      "query": {
        "intervals": {
          "title": {
            "match": {
              "query": "map hash",
              "filter": {
                "script": {
                  "source": "interval.start > 5 && interval.end < 8 && interval.gaps == 0"
                }
              }
            }
          }
        }
      }
    }


The response contains document 2:

Response

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 0.5,
        "hits": [
          {
            "_index": "testindex",
            "_id": "2",
            "_score": 0.5,
            "_source": {
              "title": "store key-value pairs in a hash map"
            }
          }
        ]
      }
    }
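
The position numbering used by the script filter can be reproduced with a few lines of Python. This is only a sketch: tokenize approximates the default analyzer (an assumption), and script_filter restates the condition from the script source in Python rather than running the actual script.

```python
import re

def tokenize(text):
    """Rough stand-in for the default analyzer (assumption): lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def script_filter(start, end, gaps):
    """Python restatement of the script source used in the query."""
    return start > 5 and end < 8 and gaps == 0

tokens = tokenize("store key-value pairs in a hash map")
positions = {term: pos for pos, term in enumerate(tokens)}
print(positions)

# The interval covering "hash" and "map", regardless of their order in the query.
start = min(positions["hash"], positions["map"])
end = max(positions["hash"], positions["map"])
gaps = end - start - 1

print(start, end, gaps, script_filter(start, end, gaps))
```

The interval for hash and map starts at position 6 and ends at position 7 with no gaps, so the condition holds and document 2 is returned.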

Interval minimization

To ensure that queries run in linear time, the intervals query minimizes the intervals. For example, consider a document containing the text a b c d c. You can use the following query to search for d that is contained by a and c:

    POST /testindex/_search
    {
      "query": {
        "intervals": {
          "my_text": {
            "match": {
              "query": "d",
              "filter": {
                "contained_by": {
                  "match": {
                    "query": "a c"
                  }
                }
              }
            }
          }
        }
      }
    }


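The query returns no results. Because intervals are minimized, the match rule for a c produces only the shortest interval that contains both terms, a b c (positions 0 through 2). The longer interval a b c d c is discarded because it contains that shorter interval. The term d (position 3) lies outside a b c, so the filter finds no d contained by a c.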
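
The effect of minimization on this example can be reproduced with a brute-force Python sketch. The function minimal_intervals is hypothetical and enumerates every span for clarity; it does not reflect how the engine achieves linear time, only which intervals survive minimization.

```python
def minimal_intervals(tokens, terms):
    """Return the (start, end) spans that contain every term and no smaller such span."""
    wanted = set(terms)
    n = len(tokens)
    # Every span whose tokens include all of the wanted terms.
    covering = [
        (s, e)
        for s in range(n)
        for e in range(s, n)
        if wanted <= set(tokens[s:e + 1])
    ]
    # Drop any span that contains a different covering span.
    return [
        (s, e)
        for (s, e) in covering
        if not any(
            (s2, e2) != (s, e) and s <= s2 and e2 <= e for (s2, e2) in covering
        )
    ]

tokens = "a b c d c".split()
outer = minimal_intervals(tokens, ["a", "c"])  # intervals for the contained_by rule
inner = minimal_intervals(tokens, ["d"])       # intervals for the top-level match rule

# Keep inner intervals that lie entirely within some outer interval.
contained = [
    (s, e) for (s, e) in inner
    if any(lo <= s and e <= hi for (lo, hi) in outer)
]
print(outer, inner, contained)
```

The only interval for a c is (0, 2), the only interval for d is (3, 3), and the list of contained intervals is empty, which matches the empty result described above.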