normalizer

The normalizer property of keyword fields is similar to analyzer except that it guarantees that the analysis chain produces a single token.

The normalizer is applied prior to indexing the keyword, as well as at search-time when the keyword field is searched via a query parser such as the match query or via a term-level query such as the term query.

A simple normalizer called lowercase ships with Elasticsearch and can be used directly (a minimal sketch is shown further below). Custom normalizers can be defined as part of analysis settings as follows.

  PUT index
  {
    "settings": {
      "analysis": {
        "normalizer": {
          "my_normalizer": {
            "type": "custom",
            "char_filter": [],
            "filter": ["lowercase", "asciifolding"]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }

  PUT index/_doc/1
  {
    "foo": "BÀR"
  }

  PUT index/_doc/2
  {
    "foo": "bar"
  }

  PUT index/_doc/3
  {
    "foo": "baz"
  }

  POST index/_refresh

  GET index/_search
  {
    "query": {
      "term": {
        "foo": "BAR"
      }
    }
  }

  GET index/_search
  {
    "query": {
      "match": {
        "foo": "BAR"
      }
    }
  }

The above queries match documents 1 and 2, since BÀR is normalized to bar at index time and BAR is normalized to bar at query time.

  {
    "took": $body.took,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 2,
        "relation": "eq"
      },
      "max_score": 0.4700036,
      "hits": [
        {
          "_index": "index",
          "_type": "_doc",
          "_id": "1",
          "_score": 0.4700036,
          "_source": {
            "foo": "BÀR"
          }
        },
        {
          "_index": "index",
          "_type": "_doc",
          "_id": "2",
          "_score": 0.4700036,
          "_source": {
            "foo": "bar"
          }
        }
      ]
    }
  }
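
As noted earlier, the built-in lowercase normalizer can also be referenced directly in a mapping, without defining anything under the analysis settings. The following is a minimal sketch; the index name lowercase_example is only an illustration:

  PUT lowercase_example
  {
    "mappings": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "lowercase"
        }
      }
    }
  }

Unlike the custom my_normalizer above, this normalizer only lowercases its input; it does not fold accented characters, so a value such as BÀR would be indexed as bàr.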

The fact that keywords are converted prior to indexing also means that aggregations return normalized values:

  GET index/_search
  {
    "size": 0,
    "aggs": {
      "foo_terms": {
        "terms": {
          "field": "foo"
        }
      }
    }
  }

returns

  {
    "took": 43,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 3,
        "relation": "eq"
      },
      "max_score": null,
      "hits": []
    },
    "aggregations": {
      "foo_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "bar",
            "doc_count": 2
          },
          {
            "key": "baz",
            "doc_count": 1
          }
        ]
      }
    }
  }