Typo tolerance

MeiliSearch is typo tolerant; this means it understands your search even if there are typos or spelling mistakes.

Example

On a movie dataset, let’s search for botman.

  {
    "hits": [
      {
        "title": "Batman: Hush"
      },
      {
        "title": "Batman vs. Teenage Mutant Ninja Turtles"
      },
      {
        "title": "Batman Ninja"
      },
      {
        "title": "Batman: Gotham by Gaslight"
      }
    ],
    "offset": 0,
    "limit": 20,
    "processingTimeMs": 1,
    "query": "botman"
  }

Typo tolerance rules

The typo rules are applied before the documents are sorted. They determine which documents contain words similar enough to the query words to be kept as candidates.

We use a prefix Levenshtein algorithm to check if the words match. The only difference from a standard Levenshtein algorithm is that it also accepts every word that starts with the query word. Therefore, a word is accepted if it matches the whole query word or starts with it, within the allowed number of typos.

The Levenshtein distance between two words M and P is defined as “the minimum cost of transforming M into P” by performing the following elementary operations:

  • substitution of a character of M by a different character from P. (e.g. kitten → sitten)
  • insertion in M of a character of P. (e.g. siting → sitting)
  • deletion of a character from M. (e.g. saturday → satuday)
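
As an illustration, the distance defined above can be computed with the classic dynamic-programming approach. This is a minimal sketch of the standard (non-prefix) Levenshtein distance, not MeiliSearch's actual implementation; the function name `levenshtein` is chosen here for the example.

```python
def levenshtein(m: str, p: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning m into p."""
    # previous[j] is the distance between the part of m processed so far and p[:j].
    previous = list(range(len(p) + 1))
    for i, char_m in enumerate(m, start=1):
        current = [i]  # turning m[:i] into the empty string takes i deletions
        for j, char_p in enumerate(p, start=1):
            cost = 0 if char_m == char_p else 1
            current.append(min(
                previous[j] + 1,         # deletion of char_m from m
                current[j - 1] + 1,      # insertion of char_p into m
                previous[j - 1] + cost,  # substitution (free if the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitten"))      # 1 (substitution)
print(levenshtein("siting", "sitting"))     # 1 (insertion)
print(levenshtein("saturday", "satuday"))   # 1 (deletion)
```

Each of the three example pairs from the list is one elementary operation apart, so each call returns 1.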

There are some rules about what can be considered “similar”. These rules apply to each query word individually, not to the whole query string.

  • If the query word is between 1 and 4 characters long, no typo is allowed. Only documents that contain words that start with or exactly match this query word are considered valid for this request.
  • If the query word is between 5 and 8 characters long, one typo is allowed. Documents that contain words that match with one typo are retained for the next steps.
  • If the query word contains more than 8 characters, we accept a maximum of two typos.
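
The three length rules above can be summarised as a small lookup. The sketch below only restates those thresholds; the function name `allowed_typos` is hypothetical and is not part of any MeiliSearch API.

```python
def allowed_typos(query_word: str) -> int:
    """Number of typos tolerated for one query word, per the length rules above."""
    length = len(query_word)
    if length <= 4:
        return 0  # 1 to 4 characters: no typo allowed
    if length <= 8:
        return 1  # 5 to 8 characters: one typo allowed
    return 2      # more than 8 characters: at most two typos


for word in ("sat", "botman", "saturday", "wednesdays"):
    print(word, allowed_typos(word))
```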

This means that “saturday”, which is 8 characters long, uses the second rule, and every document containing words that have at most one typo will match. For example:

  • “saturday” is accepted because it is the same word
  • “sat” is not accepted because the query word is not a prefix of it (the reverse is true: “sat” is a prefix of the query word)
  • “satuday” is accepted because it contains one typo
  • “sutuday” is not accepted because it contains two typos
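
To make these examples concrete, here is a rough sketch of a prefix Levenshtein check combined with the length rules. It assumes that "prefix Levenshtein" means taking the smallest distance between the query word and any prefix of the document word. The names `prefix_distance`, `allowed_typos` and `is_match` are hypothetical, and a real engine would likely use a more efficient technique than this table-based version.

```python
def prefix_distance(query_word: str, candidate: str) -> int:
    """Smallest Levenshtein distance between query_word and any prefix of candidate."""
    # previous[j] is the distance between the processed part of query_word
    # and candidate[:j].
    previous = list(range(len(candidate) + 1))
    for i, char_q in enumerate(query_word, start=1):
        current = [i]
        for j, char_c in enumerate(candidate, start=1):
            cost = 0 if char_q == char_c else 1
            current.append(min(
                previous[j] + 1,         # deletion from the query word
                current[j - 1] + 1,      # insertion into the query word
                previous[j - 1] + cost,  # substitution (free on a match)
            ))
        previous = current
    # Taking the minimum over the last row (instead of its last cell)
    # is what lets any prefix of candidate count as a match.
    return min(previous)


def allowed_typos(query_word: str) -> int:
    """Typos tolerated for a query word: 0 up to 4 chars, 1 up to 8, else 2."""
    length = len(query_word)
    if length <= 4:
        return 0
    return 1 if length <= 8 else 2


def is_match(query_word: str, candidate: str) -> bool:
    """True if candidate is similar enough to query_word (case-insensitive)."""
    distance = prefix_distance(query_word.lower(), candidate.lower())
    return distance <= allowed_typos(query_word)


for word in ("saturday", "sat", "satuday", "sutuday"):
    print(word, is_match("saturday", word))
```

Under these assumptions the loop reports a match for “saturday” and “satuday” and no match for “sat” and “sutuday”, in line with the list above. The same check accepts “Batman” for the query “botman”, since the two words are one substitution apart.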