Specify a Language for Text Index

This tutorial describes how to specify the default language associated with the text index and also how to create text indexes for collections that contain documents in different languages.

Specify the Default Language for a text Index

The default language associated with the indexed data determines the rules for parsing word roots (i.e., stemming) and ignoring stop words. The default language for the indexed data is english.

To specify a different language, use the default_language option when creating the text index. See Text Search Languages for the languages available for default_language.

The following example creates a text index on the content field of the quotes collection and sets the default_language to spanish:

  db.quotes.createIndex(
     { content : "text" },
     { default_language: "spanish" }
  )
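
As a rough sketch of how a script might assemble these arguments before calling createIndex, the helper below builds the key specification and the options document, and rejects language names outside a hand-copied list. The name textIndexArgs is hypothetical and not part of MongoDB or its drivers, and the language list is an assumption that should be checked against Text Search Languages for your server version.

```javascript
// Hypothetical helper, not part of MongoDB: builds the two arguments
// passed to createIndex() for a single-field text index.
// ASSUMPTION: this list was copied by hand; verify it against
// Text Search Languages before relying on it.
const SUPPORTED_LANGUAGES = new Set([
  "danish", "dutch", "english", "finnish", "french", "german", "hungarian",
  "italian", "norwegian", "portuguese", "romanian", "russian", "spanish",
  "swedish", "turkish", "none",
]);

function textIndexArgs(field, defaultLanguage = "english") {
  if (!SUPPORTED_LANGUAGES.has(defaultLanguage)) {
    throw new Error(`Unsupported text index language: ${defaultLanguage}`);
  }
  // First element: index keys. Second element: index options.
  return [{ [field]: "text" }, { default_language: defaultLanguage }];
}

const [keys, options] = textIndexArgs("content", "spanish");
console.log(keys, options);
// In the shell, the result would be passed on as:
//   db.quotes.createIndex(keys, options)
```

Validating the name on the client only gives an earlier, more readable error; the server remains the authority on which languages it accepts.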

Create a text Index for a Collection in Multiple Languages

Specify the Index Language within the Document

If a collection contains documents or embedded documents that are in different languages, include a field named language in the documents or embedded documents and specify as its value the language for that document or embedded document.

MongoDB will use the specified language for that document or embedded document when building the text index:

  • The specified language in the document overrides the default language for the text index.
  • The specified language in an embedded document overrides the language specified in an enclosing document or the default language for the index.

See Text Search Languages for a list of supported languages.

For example, a collection quotes contains multi-language documents that include the language field in the document and/or the embedded document as needed:

  {
     _id: 1,
     language: "portuguese",
     original: "A sorte protege os audazes.",
     translation:
       [
          {
             language: "english",
             quote: "Fortune favors the bold."
          },
          {
             language: "spanish",
             quote: "La suerte protege a los audaces."
          }
      ]
  }
  {
     _id: 2,
     language: "spanish",
     original: "Nada hay más surrealista que la realidad.",
     translation:
        [
          {
            language: "english",
            quote: "There is nothing more surreal than reality."
          },
          {
            language: "french",
            quote: "Il n'y a rien de plus surréaliste que la réalité."
          }
        ]
  }
  {
     _id: 3,
     original: "is this a dagger which I see before me.",
     translation:
     {
        language: "spanish",
        quote: "Es este un puñal que veo delante de mí."
     }
  }

If you create a text index on the original and translation.quote fields with the default language of English:

  db.quotes.createIndex( { original: "text", "translation.quote": "text" } )

Then, for the documents and embedded documents that contain the language field, the text index uses that language to determine word stems and apply other language-specific rules.

For embedded documents that do not contain the language field:

  • If the enclosing document contains the language field, then the index uses the document’s language for the embedded document.
  • Otherwise, the index uses the default language for the embedded documents.

For documents that do not contain the language field, the index uses the default language, which is English.
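
The resolution rules above can be sketched in plain JavaScript. This is only an illustration of the precedence order (embedded document, then enclosing document, then index default); the function name effectiveLanguages is hypothetical, and MongoDB applies the real rules internally while building the index.

```javascript
// Sketch of the precedence rules described above. effectiveLanguages is a
// hypothetical name; it is not a MongoDB or driver function.
function effectiveLanguages(doc, defaultLanguage = "english") {
  // A document without a language field falls back to the index default.
  const docLanguage = doc.language ?? defaultLanguage;
  // translation may be a single embedded document or an array of them.
  const translations = [].concat(doc.translation ?? []);
  return {
    original: docLanguage,
    // An embedded document without a language field inherits the
    // language of its enclosing document.
    translations: translations.map((t) => t.language ?? docLanguage),
  };
}

// Document 3 from the example: no top-level language field.
const doc3 = {
  _id: 3,
  original: "is this a dagger which I see before me.",
  translation: {
    language: "spanish",
    quote: "Es este un puñal que veo delante de mí.",
  },
};
console.log(effectiveLanguages(doc3));
// original resolves to "english", the single translation to "spanish"

// Hypothetical document (not in the collection above) whose embedded
// document has no language field and therefore inherits "portuguese".
const inherited = {
  language: "portuguese",
  original: "A sorte protege os audazes.",
  translation: [{ quote: "A sorte protege os audazes." }],
};
console.log(effectiveLanguages(inherited));
// both original and the translation resolve to "portuguese"
```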

Use any Field to Specify the Language for a Document

To use a field with a name other than language, include the language_override option when creating the index.

For example, run the following command to use idioma as the field name instead of language:

  db.quotes.createIndex( { quote : "text" },
                         { language_override: "idioma" } )

The documents of the quotes collection may specify a language with the idioma field:

  { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" }
  { _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." }
  { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }
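
To make the effect of language_override concrete, the sketch below looks up the language for a flat document using the field named in the index options. The function languageFor is a hypothetical illustration, not a MongoDB API; it assumes that a missing override option means the field name language, and that a missing default_language means english, as stated earlier in this tutorial.

```javascript
// Hypothetical illustration of how the override field is consulted.
// languageFor is not a MongoDB function.
function languageFor(doc, indexOptions = {}) {
  // Without language_override, the field is named "language".
  const overrideField = indexOptions.language_override ?? "language";
  // Without default_language, the index default is "english".
  const fallback = indexOptions.default_language ?? "english";
  return doc[overrideField] ?? fallback;
}

const indexOptions = { language_override: "idioma" };
const quotes = [
  { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" },
  { _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." },
  { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" },
];

for (const doc of quotes) {
  console.log(doc._id, languageFor(doc, indexOptions));
}
```

In this sketch, a document that has no idioma field falls back to the index default.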