Whitespace analyzer

The whitespace analyzer breaks text into terms whenever it encounters a whitespace character.

Example output

  POST _analyze
  {
    "analyzer": "whitespace",
    "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }

The above sentence would produce the following terms:

  [ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]

Configuration

The whitespace analyzer is not configurable.

Definition

It consists of:

Tokenizer: whitespace tokenizer
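
As a quick check of this definition, the same sample text can be run through the whitespace tokenizer on its own. Because the analyzer adds no filters, this request should produce the same terms as the analyzer example:

```console
POST _analyze
{
  "tokenizer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```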

If you need to customize the whitespace analyzer, recreate it as a custom analyzer and modify it, usually by adding token filters. The following request recreates the built-in whitespace analyzer and can serve as a starting point for further customization:

  PUT /whitespace_example
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "rebuilt_whitespace": {
            "tokenizer": "whitespace",
            "filter": [
            ]
          }
        }
      }
    }
  }

Add any token filters to the empty filter array.
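
As one possible customization, the sketch below adds the built-in lowercase token filter to the rebuilt analyzer and then runs the sample text through it. The index name whitespace_lowercase_example and the analyzer name whitespace_lowercase are hypothetical, chosen only for illustration:

```console
# Hypothetical index and analyzer names, used only for this sketch.
PUT /whitespace_lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}

# Analyze the sample sentence with the customized analyzer.
POST /whitespace_lowercase_example/_analyze
{
  "analyzer": "whitespace_lowercase",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

With this configuration, the text is still split only on whitespace, but each term is lowercased, so the sentence should yield [ the, 2, quick, brown-foxes, jumped, over, the, lazy, dog's, bone. ].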