Validating your Data and Structure

Fluent Bit is a powerful log processing tool that can deal with many different sources and formats. In addition, it provides several filters that can be used to perform custom modifications. This flexibility is valuable, but as your pipeline grows it is strongly recommended to validate your data and structure.

We encourage Fluent Bit users to integrate data validation in their CI systems.

A simplified view of our data processing pipeline is as follows:

[Figure 1: simplified view of the Fluent Bit data processing pipeline]

In a normal production environment, many Inputs, Filters, and Outputs are defined in the configuration, so integrating continuous validation of your configuration against expected results is a must. For this requirement, Fluent Bit provides a specific filter called Expect, which can be used to validate expected keys and values in your records and to take some action when an exception is found.

How it Works

As an example, consider the following pipeline, where the source of data is a plain file with JSON content and there are two filters: grep to exclude certain records and record_modifier to alter the record content by adding and removing specific keys.

[Figure 2: example pipeline with a tail input followed by grep and record_modifier filters]

Ideally you want to add validation checkpoints between each step, so you can know whether your data structure is correct; this is done with the expect filter.

[Figure 3: the same pipeline with expect filters inserted as validation checkpoints between each step]

The expect filter sets rules that aim to validate certain criteria, such as:

  • does the record contain a key A?
  • does the record not contain a key A?
  • is the value of key A equal to NULL?
  • is the value of key A different from NULL?
  • is the value of key A equal to B?

Every expect filter configuration can expose specific rules to validate the content of your records. It supports the following configuration properties:

| Property | Description |
| --- | --- |
| key_exists | Check if a key with a given name exists in the record. |
| key_not_exists | Check if a key does not exist in the record. |
| key_val_is_null | Check that the value of the key is NULL. |
| key_val_is_not_null | Check that the value of the key is NOT NULL. |
| key_val_eq | Check that the value of the key equals the given value in the configuration. |
| action | Action to take when a rule does not match. The available options are warn or exit. On warn, a warning message is sent to the logging layer when a mismatch of the rules above is found; using exit makes Fluent Bit abort with status code 255. |
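
As a quick illustration of how these properties come together, here is a minimal sketch of an expect filter section; the keys color and $label['name'] are taken from the example records used later on this page:

```
[FILTER]
    name                expect
    match               *
    # the record must contain a key called 'color'
    key_exists          color
    # the nested key 'name' inside the 'label' map must not be NULL
    key_val_is_not_null $label['name']
    # abort Fluent Bit with status code 255 when any rule does not match
    action              exit
```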

Start Testing

Consider a JSON file called data.log with the following content:

  1. {"color": "blue", "label": {"name": null}}
  2. {"color": "red", "label": {"name": "abc"}, "meta": "data"}
  3. {"color": "green", "label": {"name": "abc"}, "meta": null}

The following Fluent Bit configuration file sets up a pipeline that consumes the log above and applies an expect filter to validate that the keys color and label exist:

```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name        tail
    path        ./data.log
    parser      json
    exit_on_eof on

# First 'expect' filter to validate that our data was structured properly
[FILTER]
    name       expect
    match      *
    key_exists color
    key_exists $label['name']
    action     exit

[OUTPUT]
    name  stdout
    match *
```

Note that if for some reason the JSON parser fails or the parser json line is missing from the tail input, the expect filter will trigger the exit action. As a test, go ahead and comment out or remove the parser line.
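
For reference, the test described above only touches the [INPUT] section; commenting out the parser might look like this:

```
[INPUT]
    name        tail
    path        ./data.log
    # parser    json
    exit_on_eof on
```

Without the JSON parser the records no longer contain the color and $label['name'] keys, so the expect filter's rules fail and, because action is set to exit, Fluent Bit aborts with status code 255.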

As a second step, we will extend the pipeline by adding a grep filter to match records whose label map contains a key called name with the value abc, followed by an expect filter to re-validate that condition:

```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name        tail
    path        ./data.log
    parser      json
    exit_on_eof on

# First 'expect' filter to validate that our data was structured properly
[FILTER]
    name       expect
    match      *
    key_exists color
    key_exists label
    action     exit

# Match only records where the map 'label' contains a key 'name' with value 'abc'
[FILTER]
    name  grep
    match *
    regex $label['name'] ^abc$

# Check that in every record the value of $label['name'] equals 'abc'
[FILTER]
    name       expect
    match      *
    key_val_eq $label['name'] abc
    action     exit

# Append a new key to the record using an environment variable
[FILTER]
    name   record_modifier
    match  *
    record hostname ${HOSTNAME}

# Check that every record contains the 'hostname' key
[FILTER]
    name       expect
    match      *
    key_exists hostname
    action     exit

[OUTPUT]
    name  stdout
    match *
```

Deploying in Production

When deploying your configuration in production, you might want to remove the expect filters, since they add extra work at runtime that is unnecessary unless you want 100% coverage of checks while the service runs.
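
If you do want to keep some runtime validation in production without risking an abort, one possible compromise (a sketch based on the action property described above, not an official recommendation) is to switch the action from exit to warn, so mismatches are only reported to the logging layer:

```
[FILTER]
    name       expect
    match      *
    key_exists hostname
    # report mismatches instead of aborting with status code 255
    action     warn
```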