Pipelines

A detailed look at how to set up Promtail to process your log lines, including extracting metrics and labels.

Pipeline

A pipeline is used to transform a single log line, its labels, and its timestamp. A pipeline is composed of a set of stages. There are four types of stages:

  1. Parsing stages parse the current log line and extract data out of it. The extracted data is then available for use by other stages.
  2. Transform stages transform extracted data from previous stages.
  3. Action stages take extracted data from previous stages and do something with it. Actions can:
    1. Add or modify labels on the log line
    2. Change the timestamp of the log line
    3. Change the content of the log line
    4. Create a metric based on the extracted data
  4. Filtering stages optionally apply a subset of stages based on some condition (all four types appear in the sketch below).
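
As a quick illustration, here is a minimal, hypothetical pipeline that uses all four stage types in order; the regular expression, label names, and selector are made up for the example:

  pipeline_stages:
  # Parsing stage: extract level and msg from the log line into the
  # extracted map.
  - regex:
      expression: '.*level=(?P<level>[a-zA-Z]+).*msg="(?P<msg>[^"]*)".*'
  # Transform stage: normalize the extracted level to lower case.
  - template:
      source: level
      template: '{{ ToLower .Value }}'
  # Action stage: promote the extracted level to a label.
  - labels:
      level:
  # Filtering stage: only for entries now labeled level="error", rewrite
  # the log line to just the message text.
  - match:
      selector: '{level="error"}'
      stages:
      - output:
          source: msg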

Typical pipelines will start with a parsing stage (such as a regex or json stage) to extract data from the log line. Then, a series of action stages will be present to do something with that extracted data. The most common action stage will be a labels stage to turn extracted data into a label.

The match stage is also common; it selectively applies stages based on the current labels.

Note that pipelines cannot currently be used to deduplicate logs; Loki will receive the same log line multiple times if, for example:

  1. Two scrape configs read from the same file
  2. Duplicate log lines in a file are sent through a pipeline. Deduplication is not done.

However, Loki will perform some deduplication at query time for logs that have the exact same nanosecond timestamp, labels, and log contents.
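
The first case can arise from a configuration like this hypothetical one, where two jobs tail the same file and every line is therefore shipped twice (job names and the path are made up):

  scrape_configs:
  - job_name: app
    static_configs:
    - targets:
        - localhost
      labels:
        job: app
        __path__: /var/log/app.log
  # A second job tailing the same path: Loki receives each line from
  # /var/log/app.log once per job.
  - job_name: app-audit
    static_configs:
    - targets:
        - localhost
      labels:
        job: app-audit
        __path__: /var/log/app.log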

This documented example gives a good glimpse of what you can achieve with a pipeline:

  scrape_configs:
  - job_name: kubernetes-pods-name
    kubernetes_sd_configs: ....
    pipeline_stages:
    # This stage is only going to run if the scraped target has a label
    # of "name" with value "promtail".
    - match:
        selector: '{name="promtail"}'
        stages:
        # The regex stage parses out a level, timestamp, and component. At the end
        # of the stage, the values for level, timestamp, and component are only
        # set internally for the pipeline. Future stages can use these values and
        # decide what to do with them.
        - regex:
            expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'
        # The labels stage takes the level and component entries from the previous
        # regex stage and promotes them to a label. For example, level=error may
        # be a label added by this stage.
        - labels:
            level:
            component:
        # Finally, the timestamp stage takes the timestamp extracted from the
        # regex stage and promotes it to be the new timestamp of the log entry,
        # parsing it as an RFC3339Nano-formatted value.
        - timestamp:
            format: RFC3339Nano
            source: timestamp
    # This stage is only going to run if the scraped target has a label of
    # "name" with a value of "nginx".
    - match:
        selector: '{name="nginx"}'
        stages:
        # This regex stage extracts a new output by matching against some
        # values and capturing the rest.
        - regex:
            expression: \w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}(?P<output>.*)
        # The output stage changes the content of the captured log line by
        # setting it to the value of output from the regex stage.
        - output:
            source: output
    # This stage is only going to run if the scraped target has a label of
    # "name" with a value of "jaeger-agent".
    - match:
        selector: '{name="jaeger-agent"}'
        stages:
        # The JSON stage reads the log line as a JSON string and extracts
        # the "level" field from the object for use in further stages.
        - json:
            expressions:
              level: level
        # The labels stage pulls the value from "level" that was extracted
        # from the previous stage and promotes it to a label.
        - labels:
            level:
  - job_name: kubernetes-pods-app
    kubernetes_sd_configs: ....
    pipeline_stages:
    # This stage will only run if the scraped target has a label of "app"
    # with a value of *either* grafana or prometheus.
    - match:
        selector: '{app=~"grafana|prometheus"}'
        stages:
        # The regex stage will extract a level and component for use in further
        # stages, allowing the level to be defined as either lvl=<level> or
        # level=<level> and the component to be defined as either
        # logger=<component> or component=<component>.
        - regex:
            expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"
        # The labels stage then promotes the level and component extracted from
        # the regex stage to labels.
        - labels:
            level:
            component:
    # This stage will only run if the scraped target has a label of "app"
    # with a value of "some-app".
    - match:
        selector: '{app="some-app"}'
        stages:
        # The regex stage tries to extract a Go panic by looking for "panic:"
        # in the log message.
        - regex:
            expression: ".*(?P<panic>panic: .*)"
        # The metrics stage is going to increment a panic_total metric counter
        # which Promtail exposes. The counter is only incremented when panic
        # was extracted from the regex stage.
        - metrics:
            panic_total:
              type: Counter
              description: "total count of panic"
              source: panic
              config:
                action: inc

Data Accessible to Stages

The following sections further describe the types that are accessible to each stage (although not all may be used):

Label Set

The current set of labels for the log line. Initialized to be the set of labels that were scraped along with the log line. The label set is only modified by an action stage, but filtering stages read from it.

The final label set will be indexed by Loki and can be used for queries.
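
As a sketch of how an action stage changes what Loki indexes (the expression and the label values in the comment are hypothetical):

  pipeline_stages:
  - regex:
      expression: '.*level=(?P<level>[a-zA-Z]+).*'
  # If the scraped label set was {job="my-app"}, then after this stage a
  # matching line is indexed with {job="my-app", level="info"}.
  - labels:
      level: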

Extracted Map

A collection of key-value pairs extracted during a parsing stage. Subsequent stages operate on the extracted map, either transforming the pairs or taking action with them. At the end of a pipeline, the extracted map is discarded; for a parsing stage to be useful, it must always be paired with at least one action stage.

The extracted map is initialized with the same set of initial labels that were scraped along with the log line. This initial data allows for taking action on the values of labels inside pipeline stages that only manipulate the extracted map. For example, log entries tailed from files have the label filename, whose value is the file path that was tailed. When a pipeline executes for that log entry, the initial extracted map would contain filename with the same value as the label.
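
Because filename is already present in the extracted map before any parsing stage runs, a pipeline could act on it directly; in this sketch, source_file is a made-up label name:

  pipeline_stages:
  # No parsing stage is needed: filename is in the initial extracted map
  # for file-tailed entries. Promote its value to a label named source_file.
  - labels:
      source_file: filename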

Log Timestamp

The current timestamp for the log line. Action stages can modify this value. If left unset, it defaults to the time when the log was scraped.
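
For example, a timestamp stage can replace the scrape time with a time parsed out of the log line itself; this sketch assumes JSON log lines with a time field holding Unix epoch seconds:

  pipeline_stages:
  - json:
      expressions:
        ts: time
  # Parse the extracted value as Unix epoch seconds and make it the
  # entry's new timestamp.
  - timestamp:
      source: ts
      format: Unix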

The final value for the timestamp is sent to Loki.

Log Line

The current log line, represented as text. Initialized to be the text that Promtail scraped. Action stages can modify this value.

The final value for the log line is sent to Loki as the text content for the given log entry.
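
As an example of rewriting the line before it is sent, this sketch assumes JSON log lines with a message field and ships only that field's value as the final log text:

  pipeline_stages:
  - json:
      expressions:
        message: message
  # Replace the shipped log line with just the extracted message value.
  - output:
      source: message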

Stages

Parsing stages:

  • docker: Extract data by parsing the log line using the standard Docker format.
  • cri: Extract data by parsing the log line using the standard CRI format.
  • regex: Extract data using a regular expression.
  • json: Extract data by parsing the log line as JSON.

Transform stages:

  • template: Use Go templates to modify extracted data.

Action stages:

  • timestamp: Set the timestamp value for the log entry.
  • output: Set the log line text.
  • labels: Update the label set for the log entry.
  • metrics: Calculate metrics based on extracted data.

Filtering stages:

  • match: Conditionally run stages based on the label set.