Fulltext queries

It is recommended to use AQL instead; see Fulltext functions.

ArangoDB can run queries on text contained in document attributes. To use this, a fulltext index must be defined for the attribute of the collection that contains the text. Creating the index will parse the text in the specified attribute for all documents of the collection. Only documents that contain a textual value in the indexed attribute will be indexed. For such documents, the text value will be parsed, and the individual words will be inserted into the fulltext index.

When a fulltext index exists, it can be queried using a fulltext query.

Fulltext

Queries the fulltext index:

  collection.fulltext(attribute, query)

The fulltext simple query function performs a fulltext search on the specified attribute using the specified query.

Details about the fulltext query syntax can be found below.

Note: the fulltext simple query function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for executing fulltext queries is to use an AQL query using the FULLTEXT AQL function as follows:

  FOR doc IN FULLTEXT(@@collection, @attributeName, @queryString, @limit)
    RETURN doc
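
From arangosh, the AQL query above might be run with bind parameters roughly as follows. This is a minimal sketch: the helper name buildFulltextQuery is hypothetical, and the commented-out db._query call assumes an arangosh session with a collection that has a fulltext index on the given attribute.

```javascript
// Hypothetical helper: pairs the AQL query with its bind parameters.
function buildFulltextQuery(collection, attributeName, queryString, limit) {
  return {
    query:
      "FOR doc IN FULLTEXT(@@collection, @attributeName, @queryString, @limit) " +
      "RETURN doc",
    bindVars: {
      "@collection": collection,     // collection bind parameter (@@collection in AQL)
      attributeName: attributeName,  // name of the indexed attribute
      queryString: queryString,      // fulltext query, e.g. "charlie,|eve"
      limit: limit                   // maximum number of documents to return
    }
  };
}

const q = buildFulltextQuery("emails", "content", "charlie,|eve", 10);

// In arangosh (assumed environment), the query could then be executed with:
// db._query(q.query, q.bindVars).toArray();
```

Passing the collection name and the query string as bind parameters keeps user-supplied search text out of the AQL string itself.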

Examples

  arangosh> db.emails.ensureFulltextIndex("content");
  arangosh> db.emails.save({ content:
  ........> "Hello Alice, how are you doing? Regards, Bob"});
  arangosh> db.emails.save({ content:
  ........> "Hello Charlie, do Alice and Bob know about it?"});
  arangosh> db.emails.save({ content: "I think they don't know. Regards, Eve" });
  arangosh> db.emails.fulltext("content", "charlie,|eve").toArray();

  {
    "fields" : [
      "content"
    ],
    "id" : "emails/73862",
    "isNewlyCreated" : true,
    "minLength" : 2,
    "name" : "idx_1655126002109513728",
    "sparse" : true,
    "type" : "fulltext",
    "unique" : false,
    "code" : 201
  }
  {
    "_id" : "emails/73866",
    "_key" : "73866",
    "_rev" : "_Z2KDPkS---"
  }
  {
    "_id" : "emails/73868",
    "_key" : "73868",
    "_rev" : "_Z2KDPkW---"
  }
  {
    "_id" : "emails/73870",
    "_key" : "73870",
    "_rev" : "_Z2KDPkW--A"
  }
  [
    {
      "_key" : "73868",
      "_id" : "emails/73868",
      "_rev" : "_Z2KDPkW---",
      "content" : "Hello Charlie, do Alice and Bob know about it?"
    },
    {
      "_key" : "73870",
      "_id" : "emails/73870",
      "_rev" : "_Z2KDPkW--A",
      "content" : "I think they don't know. Regards, Eve"
    }
  ]

Syntax

In the simplest form, a fulltext query contains just the sought word. If multiple search words are given in a query, they should be separated by commas. All search words will be combined with a logical AND by default, and only documents that contain all search words will be returned. This default behavior can be changed by providing extra control characters in the fulltext query, which are:

  • +: logical AND (intersection)
  • |: logical OR (union)
  • -: negation (exclusion)

Examples:

  • "banana": searches for documents containing “banana”
  • "banana,apple": searches for documents containing both “banana” AND “apple”
  • "banana,|orange": searches for documents containing either “banana” OR “orange” OR both
  • "banana,-apple": searches for documents that contain “banana” but NOT “apple”

Logical operators are evaluated from left to right.
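
The left-to-right evaluation can be illustrated with a small in-memory model. This is a sketch only, not ArangoDB's implementation: the function names tokenize and matchTexts are hypothetical, the tokenizer is a naive stand-in for the server's word splitting, only complete-word matches are modelled, and the handling of an operator on the very first search word is an assumption.

```javascript
// Simplified in-memory model of the operator semantics; not ArangoDB code.

// Naive tokenizer (assumption): lowercase, then split on anything
// that is not a letter or a digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Evaluate a query such as "banana,|orange" against an array of strings.
function matchTexts(texts, query) {
  let result = null; // null means "no search word processed yet"
  for (const part of query.split(",")) {
    let word = part.trim().toLowerCase();
    let op = "+"; // default operator: logical AND
    if (word.startsWith("+") || word.startsWith("|") || word.startsWith("-")) {
      op = word[0];
      word = word.slice(1);
    }
    const hits = texts.filter(t => tokenize(t).includes(word));
    if (result === null && op !== "-") {
      result = hits; // the first search word seeds the result
    } else if (op === "|") {
      result = result.concat(hits.filter(t => !result.includes(t))); // union
    } else if (op === "-") {
      result = (result || []).filter(t => !hits.includes(t)); // exclusion
    } else {
      result = result.filter(t => hits.includes(t)); // intersection
    }
  }
  return result || [];
}

const emails = [
  "Hello Alice, how are you doing? Regards, Bob",
  "Hello Charlie, do Alice and Bob know about it?",
  "I think they don't know. Regards, Eve"
];

// Selects the second and third texts, like the fulltext() example above.
console.log(matchTexts(emails, "charlie,|eve"));
```

Because each operator is applied to the result accumulated so far, the order of the search words matters: "hello,-charlie" first collects every text containing "hello" and then removes those that also contain "charlie".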

Each search word can optionally be prefixed with complete: or prefix:, with complete: being the default. This allows searching for complete words or for word prefixes. Suffix searches or any other forms of partial-word matching are currently not supported.

Examples:

  • "complete:banana": searches for documents containing the exact word “banana”
  • "prefix:head": searches for documents with words that start with prefix “head”
  • "prefix:head,banana": searches for documents that contain words starting with prefix “head” and that also contain the exact word “banana”

Complete match and prefix search options can be combined with the logical operators.
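
The difference between complete: and prefix: can be sketched as a check against a single indexed word. The helpers parseSearchWord and wordMatches are hypothetical names used for illustration; they assume the indexed word is already lowercased and do not reflect how the server implements matching.

```javascript
// Hypothetical helper: split a search word into its match mode and the word itself.
function parseSearchWord(term) {
  const m = /^(complete|prefix):(.*)$/.exec(term);
  // Without an explicit mode, complete: is the default.
  return m ? { mode: m[1], word: m[2] } : { mode: "complete", word: term };
}

// Check one already-tokenized, lowercased word against a search word.
function wordMatches(indexedWord, term) {
  const { mode, word } = parseSearchWord(term.toLowerCase());
  return mode === "prefix"
    ? indexedWord.startsWith(word) // prefix search
    : indexedWord === word;        // complete-word search
}

console.log(wordMatches("headache", "prefix:head"));   // prefix match
console.log(wordMatches("headache", "complete:head")); // no complete-word match
console.log(wordMatches("forehead", "prefix:head"));   // suffix only, so no match
```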

Please note that only words with a minimum length will get indexed. This minimum length can be defined when creating the fulltext index. For word tokenization, the libicu text boundary analysis is used, which takes into account the default language as defined at server startup (--server.default-language startup option). Generally, the word boundary analysis will filter out punctuation but will not do much more.

In particular, no word normalization, stemming, or similarity analysis will be performed when indexing or searching. If any of these features is required, it is suggested that the user does the text normalization on the client side, and provides for each document an extra attribute containing just a comma-separated list of normalized words. This attribute can then be indexed with a fulltext index, and the user can send fulltext queries for this index, with the fulltext queries also containing the stemmed or normalized versions of words as required by the user.
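
A client-side normalization step of this kind might look like the following sketch. The function name normalizeWords and the attribute name contentNormalized are hypothetical, and the chosen normalization (lowercasing, removing diacritics, dropping duplicates) is only one possible choice; stemming would have to be added by the user.

```javascript
// Hypothetical client-side normalization: lowercase, strip diacritics,
// de-duplicate, and join the words into a comma-separated list.
function normalizeWords(text) {
  const words = text
    .normalize("NFD")            // decompose accented letters into base letter + mark
    .replace(/\p{M}+/gu, "")     // drop the combining marks
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)    // split on anything that is not a letter or digit
    .filter(Boolean);
  return [...new Set(words)].join(",");
}

const doc = { content: "Café au lait, CAFÉ!" };
doc.contentNormalized = normalizeWords(doc.content);

// In arangosh (assumed environment), the document would then be saved and the
// extra attribute indexed, for example:
// db.emails.ensureFulltextIndex("contentNormalized");
// db.emails.save(doc);
```

Search words should pass through the same normalization before they are placed in a fulltext query, so that both sides of the comparison use the same word forms.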