Case-insensitive Search with ArangoSearch

You can normalize values so that matching is case-insensitive and ignores diacritics. Normalization can also be combined with other search techniques.

Normalizing a Single Token

Dataset: IMDB movie dataset

Custom Analyzer:

Create a norm Analyzer in arangosh to normalize case to all lowercase and to remove diacritics:

  //db._useDatabase("your_database"); // Analyzer will be created in current database
  var analyzers = require("@arangodb/analyzers");
  analyzers.save("norm_en", "norm", { locale: "en.utf-8", accent: false, case: "lower" }, []);
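To get a feel for what this Analyzer does to a value, the following plain JavaScript sketch approximates the normalization outside of ArangoDB. It is only an illustration: the helper name `approximateNorm` is hypothetical, and the real Analyzer does its own locale-aware processing, so results may differ for some inputs.

```javascript
// Rough approximation of a norm Analyzer configured with
// accent: false and case: "lower". Hypothetical helper for
// illustration only; it is not part of ArangoDB.
function approximateNorm(text) {
  return text
    .normalize("NFD")        // split accented characters into base character + combining marks
    .replace(/\p{M}/gu, "")  // remove the combining marks (the diacritics)
    .toLowerCase();          // convert to all lowercase
}

console.log(approximateNorm("thé mäTRïX")); // prints "the matrix"
```

Both the indexed title and the search term need to go through the same normalization for a match to be found.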

View definition (the queries below assume the View is named imdb):

  {
    "links": {
      "imdb_vertices": {
        "fields": {
          "title": {
            "analyzers": [
              "norm_en"
            ]
          }
        }
      }
    }
  }
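One way to apply such a definition is to create the View from arangosh. The sketch below assumes the View is called `imdb` (the name used in the queries of this section) and that the `imdb_vertices` collection and the `norm_en` Analyzer already exist. The `typeof db` guard is an addition for illustration, so that the snippet does nothing harmful when loaded outside arangosh.

```javascript
// Assumed View name, taken from the queries in this section.
const viewName = "imdb";

// Same link definition as in the View definition of this section.
const viewProperties = {
  links: {
    imdb_vertices: {
      fields: {
        title: { analyzers: ["norm_en"] }
      }
    }
  }
};

// `db` is only defined inside arangosh; skip the call elsewhere.
if (typeof db !== "undefined") {
  db._createView(viewName, "arangosearch", viewProperties);
}
```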

AQL queries:

Match a full movie title, ignoring capitalization and treating accented characters as their base characters. The search term is normalized with TOKENS() so that it is compared in the same form as the indexed values:

  FOR doc IN imdb
    SEARCH ANALYZER(doc.title == TOKENS("thé mäTRïX", "norm_en")[0], "norm_en")
    RETURN doc.title

Result:

  The Matrix

Match a title prefix (case-insensitive):

  FOR doc IN imdb
    SEARCH ANALYZER(STARTS_WITH(doc.title, "the matr"), "norm_en")
    RETURN doc.title

Result:

  The Matrix Revisited
  The Matrix
  The Matrix Reloaded
  The Matrix Revolutions
  The Matrix Trilogy
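The prefix in the query above is already lowercase and free of accents. If the prefix comes from user input, it likely needs to be normalized first, in the same way as the full-string example does with TOKENS(). The following arangosh sketch passes the prefix as a bind parameter and normalizes it inside the query. The sample prefix, the variable names, and the `typeof db` guard are assumptions made for illustration.

```javascript
// Prefix query that normalizes a user-supplied prefix with the
// same Analyzer that was used for indexing.
const query = `
  FOR doc IN imdb
    SEARCH ANALYZER(STARTS_WITH(doc.title, TOKENS(@prefix, "norm_en")[0]), "norm_en")
    RETURN doc.title
`;

// Hypothetical user input with mixed case and a diacritic.
const bindVars = { prefix: "Thé Matr" };

// `db` and `print` are only available inside arangosh.
if (typeof db !== "undefined") {
  print(db._query(query, bindVars).toArray());
}
```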