Search

A common task for web applications is to search some data in the database with user input. In a simple case, this could be filtering a list of objects by a category. A more complex use case might require searching with weighting, categorization, highlighting, multiple languages, and so on. This document explains some of the possible use cases and the tools you can use.

We'll refer to the same models used in Making queries.

Use Cases

Standard textual queries

Text-based fields have a selection of simple matching operations. For example, you may wish to allow looking up an author like so:

  >>> Author.objects.filter(name__contains='Terry')
  [<Author: Terry Gilliam>, <Author: Terry Jones>]

This is a very fragile solution as it requires the user to know an exact substring of the author's name. A better approach could be a case-insensitive match (icontains), but this is only marginally better.
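A minimal pure-Python sketch of why both lookups are fragile. It assumes the lookups behave like plain substring tests on a list of strings; the helper names `contains` and `icontains` are hypothetical stand-ins for the Django lookups, and `str.casefold()` only approximates the database's own case rules.

```python
# Hypothetical sample data standing in for Author.name values.
NAMES = ["Terry Gilliam", "Terry Jones", "Helen Mirren"]


def contains(values, term):
    """Case-sensitive substring match, roughly like the `contains` lookup."""
    return [v for v in values if term in v]


def icontains(values, term):
    """Case-insensitive substring match, roughly like `icontains`.

    str.casefold() approximates, but may differ from, the database's rules.
    """
    needle = term.casefold()
    return [v for v in values if needle in v.casefold()]


print(contains(NAMES, "Terry"))   # ['Terry Gilliam', 'Terry Jones']
print(contains(NAMES, "terry"))   # [] (wrong case, no match)
print(icontains(NAMES, "terry"))  # ['Terry Gilliam', 'Terry Jones']
print(icontains(NAMES, "Tery"))   # [] (a single typo still defeats it)
```

Ignoring case removes one failure mode, but the user must still type an exact substring.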

A database's more advanced comparison functions

If you're using PostgreSQL, Django provides a selection of database-specific tools to allow you to leverage more complex querying options. Other databases have different selections of tools, possibly via plugins or user-defined functions. Django doesn't include any support for them at this time. We'll use some examples from PostgreSQL to demonstrate the kind of functionality databases may have.

Searching in other databases

All of the searching tools provided by django.contrib.postgres are constructed entirely on public APIs such as custom lookups and database functions. Depending on your database, you should be able to construct queries to allow similar APIs. If there are specific things which cannot be achieved this way, please open a ticket.

In the above example, we determined that a case-insensitive lookup would be more useful. When dealing with non-English names, a further improvement is to use unaccented comparison:

  >>> Author.objects.filter(name__unaccent__icontains='Helen')
  [<Author: Helen Mirren>, <Author: Helena Bonham Carter>, <Author: Hélène Joy>]
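A rough pure-Python model of the unaccented, case-insensitive lookup above, under stated assumptions: accents are stripped from both the stored value and the search term, and `unaccent()` here is a hypothetical approximation built on Unicode NFKD decomposition, not PostgreSQL's dictionary-based `unaccent` extension.

```python
import unicodedata

# Hypothetical sample data standing in for Author.name values.
NAMES = ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy"]


def unaccent(text):
    """Strip accents by NFKD-decomposing and dropping combining marks.

    Only an approximation of PostgreSQL's dictionary-based unaccent().
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unaccent_icontains(values, term):
    """Unaccent and case-fold both the stored value and the search term,
    then test for a substring, roughly like name__unaccent__icontains."""
    needle = unaccent(term).casefold()
    return [v for v in values if needle in unaccent(v).casefold()]


print(unaccent_icontains(NAMES, "Helen"))
# ['Helen Mirren', 'Helena Bonham Carter', 'Hélène Joy']
print(unaccent_icontains(NAMES, "Hélène"))
# ['Hélène Joy']
```

The second call finds only one author because "helene" is not a substring of "helen mirren": substring matching only works in one direction.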

This shows another issue, where we are matching against a different spelling of the name. In this case we have an asymmetry, though: a search for Helen will pick up Helena or Hélène, but not the reverse. Another option would be to use a trigram_similar comparison, which compares sequences of letters.

For example:

  >>> Author.objects.filter(name__unaccent__lower__trigram_similar='Hélène')
  [<Author: Helen Mirren>, <Author: Hélène Joy>]

Now we have a different problem: the longer name of "Helena Bonham Carter" doesn't show up because it is much longer. Trigram searches consider all combinations of three letters and compare how many appear in both the search and source strings. For the longer name, there are more combinations which appear only in the source string, so it is no longer considered a close match.
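To make the arithmetic concrete, here is a small pure-Python reimplementation of trigram similarity. It is a sketch assuming the scheme described in the PostgreSQL pg_trgm documentation: lower-case the text, split it into words, pad each word with two leading spaces and one trailing space, and divide shared trigrams by all distinct trigrams. Real pg_trgm word splitting depends on the database locale, `unaccent()` is a hypothetical NFKD approximation, and the 0.3 threshold is assumed to be the default `pg_trgm.similarity_threshold` used by the `%` operator behind `trigram_similar`.

```python
import re
import unicodedata

# Hypothetical sample data standing in for Author.name values.
NAMES = ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy"]

# Default value of pg_trgm.similarity_threshold.
THRESHOLD = 0.3


def unaccent(text):
    """Approximate PostgreSQL's unaccent() by dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text):
    """Set of trigrams, following the scheme described in the pg_trgm docs.

    The text is lower-cased and split into words on non-alphanumeric
    characters; each word gets two leading spaces and one trailing space.
    """
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by the distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:  # neither string contains any words
        return 0.0
    return len(ta & tb) / len(union)


def trigram_similar(values, term):
    """Keep values whose unaccented similarity to term exceeds THRESHOLD."""
    query = unaccent(term)
    return [v for v in values if similarity(unaccent(v), query) > THRESHOLD]


for name in NAMES:
    print(f"{name}: {similarity(unaccent(name), unaccent('Hélène')):.3f}")
# Helen Mirren: 0.357
# Helena Bonham Carter: 0.217
# Hélène Joy: 0.636
print(trigram_similar(NAMES, "Hélène"))
# ['Helen Mirren', 'Hélène Joy']
```

"Helena Bonham Carter" shares five trigrams with "Helene", as "Helen Mirren" does. Its extra words, however, add many trigrams the query lacks, which enlarges the denominator and pushes the score below the threshold.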

The correct choice of comparison functions here depends on your particular data set, for example the language(s) used and the type of text being searched. All of the examples we've seen are on short strings where the user is likely to enter something close (by varying definitions) to the source data.

Simple database operations are too simple an approach when you start considering large blocks of text. Whereas the examples above can be thought of as operations on a string of characters, full text search looks at the actual words. Depending on the system used, it's likely to use some of the following ideas:

  • Ignoring "stop words" such as "a", "the", "and".
  • Stemming words, so that "pony" and "ponies" are considered similar.
  • Weighting words based on different criteria such as how frequently they appear in the text, or the importance of the fields, such as the title or keywords, that they appear in.

There are many alternatives for search software; some of the most prominent are Elastic and Solr. These are full document-based search solutions. To use them with data from Django models, you'll need a layer which translates your data into a textual document, including back-references to the database ids. When a search using the engine returns a certain document, you can then look it up in the database. There are a variety of third-party libraries which are designed to help with this process.
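The ideas listed above (stop words, stemming, weighting) and the id back-references can be illustrated with a deliberately tiny pure-Python sketch. It is not how Elastic, Solr or PostgreSQL implement them: the stop-word list, the suffix-stripping `stem()` helper, the field weights and the sample rows are all hypothetical simplifications, with plain dicts standing in for model instances.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

# Hypothetical field weights: a title match counts double.
FIELD_WEIGHTS = {"title": 2.0, "body": 1.0}

# Plain dicts standing in for model rows; "id" plays the primary key.
ROWS = [
    {"id": 1, "title": "Pony care", "body": "Feeding and grooming ponies."},
    {"id": 2, "title": "Horse breeds", "body": "Some breeds of pony are very small."},
    {"id": 3, "title": "The history of ponies", "body": None},
    {"id": 4, "title": "Cheese on toast", "body": "A quick snack."},
]


def stem(word):
    """Toy suffix stripping so that 'ponies' and 'pony' coincide.

    Not a real stemmer such as Porter or Snowball.
    """
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def terms(text):
    """Lower-case, split into words, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(rows):
    """Map term -> {row id: summed field weight}.

    Keeping the row id is the back-reference used to fetch the
    matching rows from the database after a search.
    """
    index = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for field, weight in FIELD_WEIGHTS.items():
            for term in terms(row[field] or ""):
                index[term][row["id"]] += weight
    return index


def search(index, query):
    """Ids of rows containing every query term, best score first."""
    query_terms = terms(query)
    if not query_terms:
        return []
    candidates = set.intersection(*(set(index.get(t, {})) for t in query_terms))
    scores = {i: sum(index[t][i] for t in query_terms) for i in candidates}
    return sorted(scores, key=lambda i: (-scores[i], i))


INDEX = build_index(ROWS)
ROWS_BY_ID = {row["id"]: row for row in ROWS}  # stands in for a pk lookup

ids = search(INDEX, "ponies")
print(ids)  # [1, 3, 2]
print([ROWS_BY_ID[i]["title"] for i in ids])
# ['Pony care', 'The history of ponies', 'Horse breeds']
```

Row 1 ranks first because "pony" appears in both its title (weight 2) and body (weight 1); row 3 has only a title match and row 2 only a body match.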

PostgreSQL support

PostgreSQL has its own full text search implementation built-in. While not as powerful as some other search engines, it has the advantage of being inside your database and so can easily be combined with other relational queries such as categorization.

The django.contrib.postgres module provides some helpers to make these queries. For example, a simple query might be to select all the blog entries which mention "cheese":

  >>> Entry.objects.filter(body_text__search='cheese')
  [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

You can also filter on a combination of fields and on related models:

  >>> from django.contrib.postgres.search import SearchVector
  >>> Entry.objects.annotate(
  ...     search=SearchVector('blog__tagline', 'body_text'),
  ... ).filter(search='cheese')
  [
      <Entry: Cheese on Toast recipes>,
      <Entry: Pizza recipes>,
      <Entry: Dairy farming in Argentina>,
  ]
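Why does the third entry only appear once the blog tagline is included? Here is a pure-Python sketch of what the two queries compare. It assumes that the listed fields are joined with spaces and that NULL values are treated as empty strings before being turned into a search document, roughly as Django's `SearchVector` does. The entry data is hypothetical, and the matching deliberately ignores the stemming, stop words and ranking that PostgreSQL's `to_tsvector` and `plainto_tsquery` handle.

```python
import re

# Hypothetical entries. "blog__tagline" holds the tagline of the entry's
# blog, i.e. the value the related lookup would reach.
ENTRIES = [
    {"headline": "Cheese on Toast recipes", "blog__tagline": "Easy snacks",
     "body_text": "Grill the cheese until it bubbles."},
    {"headline": "Pizza recipes", "blog__tagline": "Italian food",
     "body_text": "Top the base with cheese and tomato."},
    {"headline": "Dairy farming in Argentina", "blog__tagline": "All about cheese",
     "body_text": "Visiting farms near Buenos Aires."},
    {"headline": "Growing tomatoes", "blog__tagline": None,
     "body_text": "Plant them in spring."},
]


def search_vector(row, *fields):
    """Join the fields with spaces, treating None as '' (like COALESCE)."""
    return " ".join(row[field] or "" for field in fields)


def words(text):
    """Lower-cased words; no stemming or stop-word removal."""
    return set(re.findall(r"\w+", text.lower()))


def matches(document, query):
    """True if every query word occurs in the document (AND semantics)."""
    query_words = words(query)
    return bool(query_words) and query_words <= words(document)


# Roughly Entry.objects.filter(body_text__search='cheese')
body_only = [e["headline"] for e in ENTRIES
             if matches(search_vector(e, "body_text"), "cheese")]

# Roughly the SearchVector('blog__tagline', 'body_text') query
combined = [e["headline"] for e in ENTRIES
            if matches(search_vector(e, "blog__tagline", "body_text"), "cheese")]

print(body_only)  # ['Cheese on Toast recipes', 'Pizza recipes']
print(combined)   # ['Cheese on Toast recipes', 'Pizza recipes', 'Dairy farming in Argentina']
```

In this made-up data, "Dairy farming in Argentina" never mentions cheese in its body, but its blog's tagline does. Searching the combined document therefore picks it up.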

See the contrib.postgres Full text search document for complete details.