[[phrase-matching]]
=== Phrase Matching

In the same way that the match query is the go-to query for standard
full-text search, the match_phrase query(((“proximity matching”, “phrase matching”)))(((“phrase matching”)))(((“match_phrase query”))) is the one you should reach for
when you want to find words that are near each other:

[source,js]

GET /my_index/my_type/_search
{
“query”: {
“match_phrase”: {
“title”: “quick brown fox”
}
}

}

// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json

Like the match query, the match_phrase query first analyzes the query
string to produce a list of terms. It then searches for all the terms, but
keeps only documents that contain all of the search terms, in the same
positions relative to each other. A query for the phrase quick fox
would not match any of our documents, because no document contains the word
quick immediately followed by fox.

[TIP]

The match_phrase query can also be written as a match query with type
phrase:

[source,js]

“match”: {
“title”: {
“query”: “quick brown fox”,
“type”: “phrase”
}

}

// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json

==================================================

==== Term Positions

When a string is analyzed, the analyzer returns not(((“phrase matching”, “term positions”)))(((“matchphrase query”, “position of terms”)))(((“position-aware matching”))) only a list of terms, but
also the _position
, or order, of each term in the original string:

[source,js]

GET /_analyze?analyzer=standard

Quick brown fox

// SENSE: 120_Proximity_Matching/05_Term_positions.json

This returns the following:

[role=”pagebreak-before”]

[source,js]

{
“tokens”: [
{
“token”: “quick”,
“start_offset”: 0,
“end_offset”: 5,
“type”: ““,
“position”: 1 <1>
},
{
“token”: “brown”,
“start_offset”: 6,
“end_offset”: 11,
“type”: ““,
“position”: 2 <1>
},
{
“token”: “fox”,
“start_offset”: 12,
“end_offset”: 15,
“type”: ““,
“position”: 3 <1>
}
]

}

<1> The position of each term in the original string.

Positions can be stored in the inverted index, and position-aware queries like
the match_phrase query can use them to match only documents that contain
all the words in exactly the order specified, with no words in-between.

==== What Is a Phrase

For a document to be considered a(((“match_phrase query”, “documents matching a phrase”)))(((“phrase matching”, “criteria for matching documents”))) match for the phrase ``quick brown fox,’’ the following must be true:

  • quick, brown, and fox must all appear in the field.

  • The position of brown must be 1 greater than the position of quick.

  • The position of fox must be 2 greater than the position of quick.

If any of these conditions is not met, the document is not considered a match.

[TIP]

Internally, the match_phrase query uses the low-level span query family to
do position-aware matching. (((“match_phrase query”, “use of span queries for position-aware matching”)))(((“span queries”)))Span queries are term-level queries, so they have
no analysis phase; they search for the exact term specified.

Thankfully, most people never need to use the span queries directly, as the
match_phrase query is usually good enough. However, certain specialized
fields, like patent searches, use these low-level queries to perform very
specific, carefully constructed positional searches.

==================================================