Model Data to Support Keyword Search

Note

Keyword search is not the same as text search or full text search, and does not provide stemming or other text-processing features. See the Limitations of Keyword Indexes section for more information.

In version 2.4, MongoDB provides a text search feature. See Text Indexes for more information.

If your application needs to perform queries on the content of a field that holds text, you can perform exact matches on the text or use $regex to perform regular expression pattern matches. However, for many operations on text, these methods do not satisfy application requirements.

This pattern describes one method for supporting keyword search in MongoDB: store the keywords in an array in the same document as the text field. Combined with a multi-key index, this pattern can support your application's keyword search operations.

Pattern

To structure your documents to support keyword-based queries, create an array field in your documents and add the keywords as strings in the array. You can then create a multi-key index on the array and write queries that select values from the array.
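When keywords are added to an existing document over time, it helps to avoid storing the same keyword twice. The following is a minimal sketch, assuming the application builds its update documents in JavaScript: the hypothetical helper `addKeywordsUpdate` produces an update that uses MongoDB's `$addToSet` operator with the `$each` modifier, which adds each keyword only if the array does not already contain it. The helper name and the trimming step are illustrative choices, not part of the pattern itself.

```javascript
// Hypothetical helper: builds an update document that appends keywords
// to an array field, skipping any value the array already contains.
function addKeywordsUpdate(field, keywords) {
  // Trim surrounding whitespace and drop empty strings before storing.
  const cleaned = keywords
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  // $addToSet with $each adds each element only if it is not already present.
  return { $addToSet: { [field]: { $each: cleaned } } };
}

const update = addKeywordsUpdate("topics", [" whaling", "revenge", ""]);
console.log(JSON.stringify(update));
// {"$addToSet":{"topics":{"$each":["whaling","revenge"]}}}
```

In the mongo shell, an object like this would be passed as the second argument to an update method, for example db.volumes.updateOne( { title: "Moby-Dick" }, update ).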

Example

Suppose you have a collection of library volumes for which you want to provide topic-based search. For each volume, you add the array topics, and you add as many keywords as needed for a given volume.

For the Moby-Dick volume you might have the following document:

  { title : "Moby-Dick" ,
    author : "Herman Melville" ,
    published : 1851 ,
    ISBN : "0451526996" ,
    topics : [ "whaling" , "allegory" , "revenge" , "American" ,
      "novel" , "nautical" , "voyage" , "Cape Cod" ]
  }

You then create a multi-key index on the topics array:

  db.volumes.createIndex( { topics: 1 } )

The multi-key index creates a separate index entry for each keyword in the topics array. For example, the index contains one entry for whaling and another for allegory.
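The idea of one index entry per keyword can be modeled in a few lines. This is a simplified in-memory illustration, not MongoDB's implementation: a JavaScript `Map` stands in for the index, each distinct keyword is a key, and each key holds the `_id` values of the documents whose array contains it. The function names `buildMultikeyIndex` and `lookup`, and the second sample volume, are hypothetical.

```javascript
// Illustrative in-memory model of a multikey index (not MongoDB's
// implementation): each distinct array element becomes its own key,
// and each key points at the _id values of the documents containing it.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // keyword -> Set of _id values
  for (const doc of docs) {
    const raw = doc[field];
    if (raw === undefined) continue; // this sketch skips documents without the field
    // A scalar value is treated as a one-element array.
    const values = Array.isArray(raw) ? raw : [raw];
    for (const value of values) {
      if (!index.has(value)) index.set(value, new Set());
      index.get(value).add(doc._id);
    }
  }
  return index;
}

// Equality lookup: the _id values of documents whose array contains `keyword`.
function lookup(index, keyword) {
  return [...(index.get(keyword) ?? [])];
}

// Hypothetical sample data.
const volumes = [
  { _id: 1, title: "Moby-Dick", topics: ["whaling", "allegory", "revenge", "voyage"] },
  { _id: 2, title: "The Odyssey", topics: ["voyage", "epic"] },
];

const topicsIndex = buildMultikeyIndex(volumes, "topics");
console.log(topicsIndex.size);               // 5 distinct keywords
console.log(lookup(topicsIndex, "voyage"));  // [ 1, 2 ]
console.log(lookup(topicsIndex, "whaling")); // [ 1 ]
```

Because each keyword is its own key, an equality lookup on a single keyword finds every matching document without scanning the arrays themselves.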

You then query based on the keywords. For example:

  db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
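To match on more than one keyword, a query can use MongoDB's `$in` operator (the array contains at least one of the keywords) or `$all` operator (the array contains every keyword), and these queries can use the same multi-key index on topics. The following is a minimal sketch, assuming the application assembles its filters in JavaScript; the helper name `keywordFilter` and its `mode` parameter are hypothetical.

```javascript
// Hypothetical helper: builds a filter on an array field.
//   mode "any" -> $in  (at least one keyword present)
//   mode "all" -> $all (every keyword present)
function keywordFilter(field, keywords, mode = "any") {
  if (keywords.length === 1) {
    // A single keyword is a plain equality match, as in { topics: "voyage" }.
    return { [field]: keywords[0] };
  }
  const operator = mode === "all" ? "$all" : "$in";
  return { [field]: { [operator]: keywords } };
}

console.log(JSON.stringify(keywordFilter("topics", ["voyage"])));
// {"topics":"voyage"}
console.log(JSON.stringify(keywordFilter("topics", ["whaling", "revenge"], "all")));
// {"topics":{"$all":["whaling","revenge"]}}
```

In the mongo shell, such a filter would be the first argument to a query, for example db.volumes.find( keywordFilter( "topics", [ "whaling", "revenge" ], "all" ), { title: 1 } ).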

Note

An array with a large number of elements, such as one with several hundred or thousands of keywords, will incur greater indexing costs on insertion.
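A rough way to see where this cost comes from: inserting one document writes a multi-key index entry for each keyword in its array, not one entry per document. The arithmetic sketch below assumes one index entry per distinct array element, ignores edge cases such as empty arrays, and does not model the index's internal structure; the function name `indexKeysForInsert` is hypothetical.

```javascript
// Hypothetical estimate: index keys written when inserting one document,
// assuming one multi-key index entry per distinct array element.
function indexKeysForInsert(doc, field) {
  const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
  return new Set(values).size;
}

// A document with 2,000 keywords: "kw0" through "kw1999".
const bigDoc = { topics: Array.from({ length: 2000 }, (_, i) => "kw" + i) };
console.log(indexKeysForInsert(bigDoc, "topics")); // 2000
console.log(indexKeysForInsert({ topics: ["novel", "novel", "voyage"] }, "topics")); // 2
```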

Limitations of Keyword Indexes

MongoDB can support keyword searches using specific data models and multi-key indexes; however, these keyword indexes are not sufficient for, or comparable to, full-text products in the following respects:

  • Stemming. Keyword queries in MongoDB cannot parse keywords for root or related words.
  • Synonyms. Keyword-based search features must provide support for synonym or related queries in the application layer.
  • Ranking. The keyword lookups described in this document do not provide a way to weight results.
  • Asynchronous Indexing. MongoDB updates indexes synchronously as part of each write, which means that the indexes that support keyword search are always current and can operate in real time. However, asynchronous bulk indexes may be more efficient for some kinds of content and workloads.
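Because synonym handling lives in the application layer, one straightforward approach is to expand a search term into a list of equivalents before querying, then match any of them with `$in`. The sketch below assumes a hypothetical, hand-maintained synonym table; the names `SYNONYMS` and `synonymFilter` are illustrative, and the approach does nothing for stemming or ranking.

```javascript
// Hypothetical, application-maintained synonym table; MongoDB does not
// supply one for keyword indexes. A Map avoids inherited object keys.
const SYNONYMS = new Map([
  ["sea", ["ocean", "nautical"]],
  ["whale", ["whaling"]],
]);

// Expand a search term with its synonyms and build a $in filter, so a
// document matches if its array contains the term or any synonym.
function synonymFilter(field, term, synonyms = SYNONYMS) {
  const terms = [term, ...(synonyms.get(term) ?? [])];
  return { [field]: { $in: [...new Set(terms)] } };
}

console.log(JSON.stringify(synonymFilter("topics", "sea")));
// {"topics":{"$in":["sea","ocean","nautical"]}}
```

A term with no entry in the table falls back to a $in list containing only the term itself, which matches the same documents as a plain equality query.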