Full text search

The database functions in the django.contrib.postgres.search module make it easier to use PostgreSQL's full text search engine.

The examples in this document use the models defined in Making queries.

See also

For a high-level overview of searching, see the topic documentation.

The search lookup

A common way to use full text search is to search a single term against a single column in the database. For example:

  >>> Entry.objects.filter(body_text__search="Cheese")
  [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

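This creates a to_tsvector in the database from the body_text field and a plainto_tsquery from the search term 'Cheese', both using the default database search configuration. The results are obtained by matching the query against the vector.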
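The following rough, pure-Python model makes the vector/query matching more concrete. It assumes the 'simple' configuration, which only lowercases words and does no stemming or stop-word removal. The default configuration (PostgreSQL's default_text_search_config) is usually a language-specific one that does stem. The helper names are hypothetical, and the regex tokenizer is much cruder than PostgreSQL's parser.

```python
import re


def to_tsvector_simple(text):
    # Hypothetical stand-in for to_tsvector('simple', text): map each
    # lowercased word to its 1-based positions in the text.
    vector = {}
    for position, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        vector.setdefault(word, []).append(position)
    return vector


def plainto_tsquery_simple(text):
    # Hypothetical stand-in for plainto_tsquery('simple', text): every word
    # becomes a lexeme, and all lexemes are ANDed together.
    return set(re.findall(r"\w+", text.lower()))


def matches(vector, query_lexemes):
    # A plain query matches when every one of its lexemes is in the vector.
    return all(lexeme in vector for lexeme in query_lexemes)


vector = to_tsvector_simple("Cheese on Toast recipes")
print(vector)  # {'cheese': [1], 'on': [2], 'toast': [3], 'recipes': [4]}
print(matches(vector, plainto_tsquery_simple("Cheese")))  # True
print(matches(vector, plainto_tsquery_simple("cheese pizza")))  # False
```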

To use the search lookup, 'django.contrib.postgres' must be in your INSTALLED_APPS.
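A minimal settings excerpt for this requirement follows. The other entries are placeholders standing in for a typical project's apps; only "django.contrib.postgres" matters here.

```python
# settings.py (excerpt); entries other than django.contrib.postgres are
# illustrative placeholders for whatever your project already lists.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    # ... your other apps ...
    "django.contrib.postgres",
]
```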

SearchVector

class SearchVector(*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The Entry instances we’re searching belong to a Blog, which has a tagline field. To query against both fields, use a SearchVector:

  >>> from django.contrib.postgres.search import SearchVector
  >>> Entry.objects.annotate(
  ...     search=SearchVector("body_text", "blog__tagline"),
  ... ).filter(search="Cheese")
  [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

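The arguments to SearchVector can be any Expression or the name of a field. Multiple arguments are concatenated together with a space, so the search document includes them all.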
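The following sketch shows what joining the arguments amounts to. It assumes each value is converted to text and that a missing (NULL) value is treated as an empty string. In plain SQL, concatenating with NULL yields NULL, so this assumption keeps one empty column from wiping out the whole document. The helper search_document is hypothetical.

```python
def search_document(*values):
    # Hypothetical model of SearchVector("body_text", "blog__tagline"):
    # convert each value to text, assume None (NULL) becomes "", and join
    # the pieces with single spaces.
    return " ".join("" if value is None else str(value) for value in values)


print(search_document("Grilled cheese on toast", "All about cheese"))
# Grilled cheese on toast All about cheese
print(repr(search_document("Pizza recipes", None)))
# 'Pizza recipes '
```

The trailing space left by a missing value does not matter, because the text is split into words before it is turned into a vector.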

SearchVector objects can be combined together, allowing you to reuse them. For example:

  >>> Entry.objects.annotate(
  ...     search=SearchVector("body_text") + SearchVector("blog__tagline"),
  ... ).filter(search="Cheese")
  [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
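Combining vectors corresponds to PostgreSQL's tsvector || tsvector operator. That operator shifts every position in the right operand by the largest position in the left operand before merging. The sketch below models a tsvector as a {lexeme: [positions]} dict, which is an assumption that ignores weight labels. It reproduces that shifting.

```python
def concat_tsvectors(left, right):
    # Offset right-hand positions by the largest left-hand position, then
    # merge lexemes that appear on both sides.
    offset = max((p for positions in left.values() for p in positions), default=0)
    merged = {lexeme: list(positions) for lexeme, positions in left.items()}
    for lexeme, positions in right.items():
        merged.setdefault(lexeme, []).extend(p + offset for p in positions)
    return merged


left = {"a": [1], "b": [2]}
right = {"c": [1], "d": [2], "b": [3]}
print(concat_tsvectors(left, right))
# {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
```

Because positions keep increasing across the combined vector, proximity-based ranking and phrase queries still see the two fields as one ordered document.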

For an explanation of the config and weight parameters, see Changing the search configuration and Weighting queries.

SearchQuery

class SearchQuery(value, config=None, search_type='plain')

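SearchQuery translates the terms the user provides into a search query object that the database compares to a search vector. By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.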

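If search_type is 'plain', which is the default, the terms are treated as separate keywords. If search_type is 'phrase', the terms are treated as a single phrase. If search_type is 'raw', you can provide a formatted search query with terms and operators. If search_type is 'websearch', you can provide a formatted search query similar to the ones used by web search engines. 'websearch' requires PostgreSQL ≥ 11. Read PostgreSQL's Full Text Search docs to learn about the differences and syntax. Examples: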

  >>> from django.contrib.postgres.search import SearchQuery
  >>> SearchQuery("red tomato")  # two keywords
  >>> SearchQuery("tomato red")  # same results as above
  >>> SearchQuery("red tomato", search_type="phrase")  # a phrase
  >>> SearchQuery("tomato red", search_type="phrase")  # a different phrase
  >>> SearchQuery("'tomato' & ('red' | 'green')", search_type="raw")  # boolean operators
  >>> SearchQuery(
  ...     "'tomato' ('red' OR 'green')", search_type="websearch"
  ... )  # websearch operators
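The difference between 'plain' and 'phrase' lies in how the words are joined. plainto_tsquery ANDs them with &, so order does not matter. phraseto_tsquery links them with <-> (FOLLOWED BY), so order does matter. The hypothetical helper below builds the resulting tsquery text. It assumes the 'simple' configuration, so words are only lowercased and not stemmed.

```python
import re

# Operator used between words for each search type ('simple' config assumed).
OPERATORS = {"plain": " & ", "phrase": " <-> "}


def to_tsquery_text(text, search_type="plain"):
    # Lowercase the words, quote them as lexemes, and join them with the
    # operator for the chosen search type (KeyError for other types).
    words = [f"'{word}'" for word in re.findall(r"\w+", text.lower())]
    return OPERATORS[search_type].join(words)


print(to_tsquery_text("red tomato"))                        # 'red' & 'tomato'
print(to_tsquery_text("tomato red"))                        # 'tomato' & 'red'
print(to_tsquery_text("red tomato", search_type="phrase"))  # 'red' <-> 'tomato'
print(to_tsquery_text("tomato red", search_type="phrase"))  # 'tomato' <-> 'red'
```

The two plain queries contain the same ANDed lexemes, so they match the same rows. The two phrase queries require opposite word orders.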

SearchQuery terms can be combined logically to provide more flexibility:

  >>> from django.contrib.postgres.search import SearchQuery
  >>> SearchQuery("meat") & SearchQuery("cheese")  # AND
  >>> SearchQuery("meat") | SearchQuery("cheese")  # OR
  >>> ~SearchQuery("meat")  # NOT
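To show how &, | and ~ compose, here is a hypothetical miniature that overloads the same Python operators. Each query is evaluated against a set of lexemes. In the database, Django combines the queries with tsquery operators instead (&&, || and !!). This class is only an illustration, not SearchQuery itself.

```python
class Query:
    # Hypothetical stand-in for SearchQuery: wraps a predicate over a set
    # of document lexemes.
    def __init__(self, test):
        self.test = test

    @classmethod
    def term(cls, lexeme):
        return cls(lambda lexemes: lexeme in lexemes)

    def __and__(self, other):
        return Query(lambda lexemes: self.test(lexemes) and other.test(lexemes))

    def __or__(self, other):
        return Query(lambda lexemes: self.test(lexemes) or other.test(lexemes))

    def __invert__(self):
        return Query(lambda lexemes: not self.test(lexemes))

    def matches(self, lexemes):
        return self.test(lexemes)


doc = {"cheese", "toast"}
meat, cheese = Query.term("meat"), Query.term("cheese")
print((meat & cheese).matches(doc))  # False
print((meat | cheese).matches(doc))  # True
print((~meat).matches(doc))          # True
```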

See Changing the search configuration for an explanation of the config parameter.

SearchRank

class SearchRank(vector, query, weights=None, normalization=None, cover_density=False)

So far, we've returned every result for which there is any match between the vector and the query. You will likely want to order the results by some sort of relevancy. PostgreSQL provides a ranking function that takes into account how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document where they occur is. The better the match, the higher the rank value. To order by relevancy:

  >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
  >>> vector = SearchVector("body_text")
  >>> query = SearchQuery("cheese")
  >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by("-rank")
  [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See Weighting queries for an explanation of the weights parameter.

cover_density 参数设置为 True,启用覆盖密度排序,即考虑匹配的查询词的接近程度。

Provide an integer to the normalization parameter to control rank normalization. This integer is a bit mask, so you can combine multiple behaviors:

  >>> from django.db.models import Value
  >>> Entry.objects.annotate(
  ...     rank=SearchRank(
  ...         vector,
  ...         query,
  ...         normalization=Value(2).bitor(Value(4)),
  ...     )
  ... )

The PostgreSQL documentation has more details about the different rank normalization options.
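Because normalization is a bit mask, Value(2).bitor(Value(4)) yields 6, which turns on two behaviours at once. The table below restates the flags listed in PostgreSQL's ranking documentation, and the hypothetical helper decodes a mask into them. Note that flag 4 is documented as implemented only by ts_rank_cd, so it only has an effect with cover_density=True.

```python
# Rank normalization flags as described in PostgreSQL's ts_rank docs.
NORMALIZATION_FLAGS = {
    1: "divide by 1 + log(document length)",
    2: "divide by document length",
    4: "divide by mean harmonic distance between extents (ts_rank_cd only)",
    8: "divide by number of unique words",
    16: "divide by 1 + log(number of unique words)",
    32: "divide rank by itself + 1",
}


def describe_normalization(mask):
    # Hypothetical helper: list the behaviours switched on by a mask.
    if mask == 0:
        return ["ignore document length (default)"]
    return [text for bit, text in NORMALIZATION_FLAGS.items() if mask & bit]


mask = 2 | 4  # same value as Value(2).bitor(Value(4))
print(mask)  # 6
for line in describe_normalization(mask):
    print(line)
```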

SearchHeadline

class SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)

Accepts a single text field or an expression, a query, a config, and a set of options. Returns highlighted search results.

start_selstop_sel 参数设置为字符串值,用于在文档中高亮显示查询词。PostgreSQL 的默认值是 <b></b>

max_wordsmin_words 参数提供整数值,以确定最长和最短的标题。PostgreSQL 的默认值是 35 和 15。

short_word 参数提供一个整数值,以便在每个标题中丢弃这个长度或更少的字。PostgreSQL 的默认值是 3。

highlight_all 参数设置为 True,以使用整个文档来代替片段,并忽略 max_wordsmin_wordsshort_word 参数。这在 PostgreSQL 中是默认禁用的。

max_fragments 提供一个非零的整数值,以设置要显示的最大片段数。在 PostgreSQL 中默认是禁用的。

Set the fragment_delimiter string parameter to configure the delimiter between fragments. PostgreSQL's default is " ... ".

The PostgreSQL documentation has more details on highlighting search results.

Usage example:

  >>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
  >>> query = SearchQuery("red tomato")
  >>> entry = Entry.objects.annotate(
  ...     headline=SearchHeadline(
  ...         "body_text",
  ...         query,
  ...         start_sel="<span>",
  ...         stop_sel="</span>",
  ...     ),
  ... ).get()
  >>> print(entry.headline)
  Sandwich with <span>tomato</span> and <span>red</span> cheese.
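To show what start_sel and stop_sel do to the text, here is a hypothetical pure-Python approximation. It wraps every word whose lowercase form is a query term. PostgreSQL's ts_headline does more: it matches normalized lexemes and trims the text into a headline according to max_words, min_words and the other options, none of which this sketch models.

```python
import re


def highlight(text, terms, start_sel="<b>", stop_sel="</b>"):
    # Wrap each word that case-insensitively equals a query term.
    wanted = {term.lower() for term in terms}

    def wrap(match):
        word = match.group(0)
        return f"{start_sel}{word}{stop_sel}" if word.lower() in wanted else word

    return re.sub(r"\w+", wrap, text)


print(highlight("Sandwich with tomato and red cheese.", ["red", "tomato"], "<span>", "</span>"))
# Sandwich with <span>tomato</span> and <span>red</span> cheese.
```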

See Changing the search configuration for an explanation of the config parameter.

Changing the search configuration

You can specify the config attribute to a SearchVector and SearchQuery to use a different search configuration. This allows using different language parsers and dictionaries as defined by the database:

  >>> from django.contrib.postgres.search import SearchQuery, SearchVector
  >>> Entry.objects.annotate(
  ...     search=SearchVector("body_text", config="french"),
  ... ).filter(search=SearchQuery("œuf", config="french"))
  [<Entry: Pain perdu>]

The value of config could also be stored in another column:

  >>> from django.db.models import F
  >>> Entry.objects.annotate(
  ...     search=SearchVector("body_text", config=F("blog__language")),
  ... ).filter(search=SearchQuery("œuf", config=F("blog__language")))
  [<Entry: Pain perdu>]

Weighting queries

Not every field has the same relevance in a query, so you can set the weights of the various vectors before you combine them:

  >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
  >>> vector = SearchVector("body_text", weight="A") + SearchVector(
  ...     "blog__tagline", weight="B"
  ... )
  >>> query = SearchQuery("cheese")
  >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by(
  ...     "-rank"
  ... )

The weight should be one of the following letters: D, C, B, A. By default, these weights refer to the numbers 0.1, 0.2, 0.4, and 1.0, respectively. If you wish to weight them differently, pass a list of four floats to SearchRank as weights in the same order above:

  >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
  >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by("-rank")
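The order of the weights list is easy to get backwards, because it runs D, C, B, A rather than A to D. The hypothetical helper below pairs each letter with its value and falls back to the documented defaults.

```python
# Documented default weights, in the order SearchRank expects: D, C, B, A.
DEFAULT_WEIGHTS = (0.1, 0.2, 0.4, 1.0)


def weight_table(weights=DEFAULT_WEIGHTS):
    # Hypothetical helper: map each weight letter to the float it stands for.
    if len(weights) != 4:
        raise ValueError("weights must contain exactly four floats (D, C, B, A)")
    return dict(zip("DCBA", weights))


print(weight_table())                      # {'D': 0.1, 'C': 0.2, 'B': 0.4, 'A': 1.0}
print(weight_table([0.2, 0.4, 0.6, 0.8]))  # {'D': 0.2, 'C': 0.4, 'B': 0.6, 'A': 0.8}
```

With the custom list above, body_text (weight "A") counts 0.8 and blog__tagline (weight "B") counts 0.6.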

Performance

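No special database configuration is necessary to use any of these functions. However, if you're searching more than a few hundred records, you're likely to run into performance problems. Full text search is a more intensive process than comparing the size of an integer, for example.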

In the event that all the fields you’re querying on are contained within one particular model, you can create a functional GIN or GiST index which matches the search vector you wish to use. For example:

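  from django.contrib.postgres.indexes import GinIndex
  from django.contrib.postgres.search import SearchVector

  # Listed in the model's Meta.indexes
  GinIndex(
      SearchVector("body_text", "headline", config="english"),
      name="search_vector_idx",
  )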

The PostgreSQL documentation has details on creating indexes for full text search.
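The following conceptual sketch shows why such an index helps. A GIN index over a tsvector behaves like an inverted index that maps each lexeme to the rows containing it. A query can then consult only the entries for its own lexemes instead of re-parsing every row. This is a simplified model under that assumption, not how PostgreSQL actually stores a GIN index. The helper names and sample rows are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(rows):
    # Map each lowercased word to the set of row ids that contain it.
    index = defaultdict(set)
    for row_id, text in rows.items():
        for lexeme in re.findall(r"\w+", text.lower()):
            index[lexeme].add(row_id)
    return index


def search(index, *lexemes):
    # AND semantics: intersect the posting sets of all requested lexemes.
    postings = [index.get(lexeme, set()) for lexeme in lexemes]
    return set.intersection(*postings) if postings else set()


rows = {1: "Cheese on Toast recipes", 2: "Pizza recipes", 3: "Pain perdu"}
index = build_inverted_index(rows)
print(sorted(search(index, "recipes")))            # [1, 2]
print(sorted(search(index, "cheese", "recipes")))  # [1]
```

The index only pays off if it was built from the same expression the query uses, which is why the functional index above must match the search vector you query with.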

SearchVectorField

class SearchVectorField

If this approach becomes too slow, you can add a SearchVectorField to your model. You’ll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation. You can then query the field as if it were an annotated SearchVector:

  >>> Entry.objects.update(search_vector=SearchVector("body_text"))
  >>> Entry.objects.filter(search_vector="cheese")
  [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

Trigram similarity

Another approach to searching is trigram similarity. A trigram is a group of three consecutive characters. In addition to the trigram_similar, trigram_word_similar, and trigram_strict_word_similar lookups, you can use a couple of other expressions.

To use them, you need to activate the pg_trgm extension on PostgreSQL. You can install it using the TrigramExtension migration operation.

TrigramSimilarity

class TrigramSimilarity(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the trigram similarity between the two arguments.

Usage example:

  >>> from django.contrib.postgres.search import TrigramSimilarity
  >>> Author.objects.create(name="Katy Stevens")
  >>> Author.objects.create(name="Stephen Keats")
  >>> test = "Katie Stephens"
  >>> Author.objects.annotate(
  ...     similarity=TrigramSimilarity("name", test),
  ... ).filter(
  ...     similarity__gt=0.3
  ... ).order_by("-similarity")
  [<Author: Katy Stevens>, <Author: Stephen Keats>]
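The following pure-Python sketch shows how the similarity score is computed. It follows the rules documented for pg_trgm: lowercase the text, treat non-alphanumeric characters as word separators, pad each word with two leading spaces and one trailing space, and keep the distinct three-character groups. Similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings. This is an approximation and may differ from pg_trgm in edge cases such as locale-specific characters.

```python
import re


def trigrams(text):
    # Split on non-alphanumerics (underscore included), pad each word as
    # "  word ", and collect every distinct 3-character window.
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    # Shared trigrams over all distinct trigrams; guard against empty input.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(sorted(trigrams("cat")))                   # ['  c', ' ca', 'at ', 'cat']
print(round(similarity("word", "two words"), 6))  # 0.363636
```

Here "word" has 5 trigrams and "two words" has 10, and they share 4, so the score is 4 / (5 + 10 - 4) = 4/11.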

TrigramWordSimilarity

class TrigramWordSimilarity(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the trigram word similarity between the two arguments.

Usage example:

  >>> from django.contrib.postgres.search import TrigramWordSimilarity
  >>> Author.objects.create(name="Katy Stevens")
  >>> Author.objects.create(name="Stephen Keats")
  >>> test = "Kat"
  >>> Author.objects.annotate(
  ...     similarity=TrigramWordSimilarity(test, "name"),
  ... ).filter(
  ...     similarity__gt=0.3
  ... ).order_by("-similarity")
  [<Author: Katy Stevens>]

TrigramStrictWordSimilarity

class TrigramStrictWordSimilarity(string, expression, **extra)

New in Django 4.2.

Accepts a string or expression, and a field name or expression. Returns the trigram strict word similarity between the two arguments. Similar to TrigramWordSimilarity(), except that it forces extent boundaries to match word boundaries.

TrigramDistance

class TrigramDistance(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the trigram distance between the two arguments.

Usage example:

  >>> from django.contrib.postgres.search import TrigramDistance
  >>> Author.objects.create(name="Katy Stevens")
  >>> Author.objects.create(name="Stephen Keats")
  >>> test = "Katie Stephens"
  >>> Author.objects.annotate(
  ...     distance=TrigramDistance("name", test),
  ... ).filter(
  ...     distance__lte=0.7
  ... ).order_by("distance")
  [<Author: Katy Stevens>, <Author: Stephen Keats>]
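TrigramDistance maps to pg_trgm's <-> operator, which is documented as one minus the similarity score. A filter of distance__lte=0.7 therefore corresponds to similarity of at least 0.3. The self-contained sketch below reuses the pg_trgm trigram rules (lowercase, split on non-alphanumerics, pad words as "  word ") as an approximation.

```python
import re


def trigrams(text):
    # pg_trgm-style rules: lowercase, split on non-alphanumerics, pad each
    # word as "  word ", and keep the distinct 3-character groups.
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_distance(a, b):
    # Distance = 1 - (shared trigrams / all distinct trigrams).
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    similarity = len(ta & tb) / len(union) if union else 0.0
    return 1 - similarity


print(round(trigram_distance("word", "two words"), 6))  # 0.636364
print(trigram_distance("cat", "cat"))                   # 0.0
```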

TrigramWordDistance

class TrigramWordDistance(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the trigram word distance between the two arguments.

Usage example:

  >>> from django.contrib.postgres.search import TrigramWordDistance
  >>> Author.objects.create(name="Katy Stevens")
  >>> Author.objects.create(name="Stephen Keats")
  >>> test = "Kat"
  >>> Author.objects.annotate(
  ...     distance=TrigramWordDistance(test, "name"),
  ... ).filter(
  ...     distance__lte=0.7
  ... ).order_by("distance")
  [<Author: Katy Stevens>]

TrigramStrictWordDistance

class TrigramStrictWordDistance(string, expression, **extra)

New in Django 4.2.

Accepts a string or expression, and a field name or expression. Returns the trigram strict word distance between the two arguments.