Stream Processor

Fluent Bit v1.1 comes with a new and optional Stream Processor Engine that allows to do data processing through SQL queries. This article covers the format of the expected configuration file.

For more details about the Stream Processor Engine use please refer to the following guide:

Concepts

The stream processor can be configured through defining Tasks which have a name and an execution SQL statement:

Concept Description
Task Definition of a Stream Processor task to be executed. A task is defined through a section called STREAM_TASK.
Name Tasks have a name for debugging and testing purposes.
Exec SQL statement to be executed when a Task runs.

Streams File Configuration

The Stream Processor is configured through a streams file that is referenced from the main fluent-bit.conf configuration file through the Streams_File key. The content of the streams file must have the following format specified in the table below:

Section Key Description Mandatory?
STREAM_TASK
Name Set a name for the task in question. The value is used as a reference only. Yes
Exec SQL statement to be executed by the task. Note that the SQL statement must be finished with a semicolon. The SQL statement must be set in one single line (no multiline support in the configuration) Yes

Configuration Example

Consider the following fluent-bit.conf configuration file:

  1. [SERVICE]
  2. Flush 1
  3. Log_Level info
  4. Streams_File stream_processor.conf
  5. [INPUT]
  6. Name cpu
  7. alias cpu_data
  8. [OUTPUT]
  9. Name stdout
  10. Match results

Now creates a stream_processor.conf configuration file with the following content:

  1. [STREAM_TASK]
  2. Name cpu_test
  3. Exec CREATE STREAM cpu WITH (tag='results') AS SELECT AVG(cpu_p) from STREAM:cpu_data WINDOW TUMBLING (5 SECOND);

On the query there are a few things happening:

  • Fluent Bit will gather CPU usage metrics through CPU input plugin (metrics are calculated by default every second).
  • Stream Processor have a Task attached to any incoming Stream of data called cpu_data (check the alias set in the Input section).
  • Stream Processor will aggregate the value of cpu_p record field and calculate it average during a window of 5 seconds.
  • Stream Processor every 5 seconds will send the results back into Fluent Bit pipeline with a tag called results.
  • Fluent Bit output section will match results tagged records and print them to the standard output interface.

You should see the following output in your terminal:

  1. $ bin/fluent-bit -c fluent-bit.conf
  2. Fluent Bit v1.1.0
  3. Copyright (C) Treasure Data
  4. [2019/05/17 11:26:34] [ info] [storage] initializing...
  5. [2019/05/17 11:26:34] [ info] [storage] in-memory
  6. [2019/05/17 11:26:34] [ info] [storage] normal synchronization mode, checksum disabled
  7. [2019/05/17 11:26:34] [ info] [engine] started (pid=16769)
  8. [2019/05/17 11:26:34] [ info] [sp] stream processor started
  9. [2019/05/17 11:26:34] [ info] [sp] registered task: cpu_test
  10. [0] results: [1558085199.000175517, {"AVG(cpu_p)"=>2.750000}]
  11. [0] results: [1558085204.000151430, {"AVG(cpu_p)"=>3.400000}]
  12. [0] results: [1558085209.000131753, {"AVG(cpu_p)"=>1.700000}]
  13. [0] results: [1558085214.000147562, {"AVG(cpu_p)"=>3.500000}]
  14. [0] results: [1558085219.000118591, {"AVG(cpu_p)"=>2.050000}]
  15. [0] results: [1558085224.000179645, {"AVG(cpu_p)"=>26.375000}]

If you want to learn more about our Stream Processor engine please read the official guide.