file

The file source is used for log collection.

Example

```yaml
sources:
- type: file
  name: accesslog
```

Tips

If you use logconfig/clusterlogconfig to collect container logs, additional fields are added to the file source; please refer to the corresponding documentation.

paths

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| paths | string array | true | none | The collected paths are matched using glob expressions. Glob expansion expressions such as Brace Expansion and Glob Star are supported |

Example

Target files to be collected:

```
/tmp/loggie/service/order/access.log
/tmp/loggie/service/order/access.log.2022-04-11
/tmp/loggie/service/pay/access.log
/tmp/loggie/service/pay/access.log.2022-04-11
```

Corresponding configuration:

```yaml
sources:
- type: file
  paths:
  - /tmp/loggie/**/access.log{,.[2-9][0-9][0-9][0-9]-[01][0-9]-[0123][0-9]}
```

excludeFiles

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| excludeFiles | string array | false | none | Regular expressions for files to exclude from collection |

Example

```yaml
sources:
- type: file
  paths:
  - /tmp/*.log
  excludeFiles:
  - \.gz$
```

ignoreOlder

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| ignoreOlder | time.Duration | false | none | For example, 48h means to ignore files whose last update time is more than 2 days ago |

ignoreSymlink

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| ignoreSymlink | bool | false | false | Whether to ignore symbolic link (soft link) files |
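
For instance, a source that skips files not updated within the last two days and ignores symbolic links might be configured as follows (a minimal sketch combining the two fields above; name and paths are illustrative):

```yaml
sources:
- type: file
  name: accesslog
  paths:
  - /tmp/log/*.log
  ignoreOlder: 48h      # ignore files last updated more than 2 days ago
  ignoreSymlink: true   # skip symbolic links
```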

addonMeta

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| addonMeta | bool | false | false | Whether to add the default log collection state meta information |
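
To include this state meta information in events, set addonMeta on the source; a minimal sketch (name and path are illustrative):

```yaml
sources:
- type: file
  name: demo
  paths:
  - /var/log/*.log
  addonMeta: true   # attach collection state to each event
```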

event example

```json
{
  "body": "this is test",
  "state": {
    "pipeline": "local",
    "source": "demo",
    "filename": "/var/log/a.log",
    "timestamp": "2006-01-02T15:04:05.000Z",
    "offset": 1024,
    "bytes": 4096,
    "hostname": "node-1"
  }
}
```

state explanation:

  • pipeline: the name of the pipeline the source belongs to
  • source: the name of the source
  • filename: the path of the collected file
  • timestamp: the timestamp when the data was collected
  • offset: the offset of the collected data within the file
  • bytes: the number of bytes collected
  • hostname: the name of the node where Loggie is running

workerCount

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| workerCount | int | false | 1 | The number of worker threads (goroutines) that read file contents. Consider increasing it when there are more than 100 files on a single node |

readBufferSize

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| readBufferSize | int | false | 65536 | The amount of data read from the file at a time. The default is 64K (65536 bytes) |

maxContinueRead

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maxContinueRead | int | false | 16 | The number of times the same file is read consecutively. When this count is reached, reading is forced to switch to the next file. This mainly prevents active files from occupying reading resources all the time, which would keep inactive files from being read and collected for a long time |

maxContinueReadTimeout

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maxContinueReadTimeout | time.Duration | false | 3s | The maximum time to keep reading the same file. If this time is exceeded, reading is forced to switch to the next file. Similar to maxContinueRead |

inactiveTimeout

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| inactiveTimeout | time.Duration | false | 3s | If more than inactiveTimeout has passed since the file was last collected, the file is considered inactive (that is, the last log has been fully written) and the last line can be collected safely |

firstNBytesForIdentifier

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| firstNBytesForIdentifier | int | false | 128 | Use the first n bytes of the target file to generate a unique identifier for the file. If the file is smaller than n bytes, it will not be collected for the time being. The main purpose is to accurately identify a file in combination with the file inode, and to determine whether the file has been deleted or renamed |
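
The reading-related fields above (workerCount, readBufferSize, maxContinueRead, maxContinueReadTimeout, inactiveTimeout, firstNBytesForIdentifier) are all set directly on the file source. A combined tuning sketch might look like this (the values are only illustrative, not recommendations):

```yaml
sources:
- type: file
  name: accesslog
  paths:
  - /tmp/log/*.log
  workerCount: 4                 # more read goroutines for many files
  readBufferSize: 131072         # read 128K at a time
  maxContinueRead: 32            # switch files after 32 consecutive reads
  maxContinueReadTimeout: 5s     # or after 5s on the same file
  inactiveTimeout: 3s            # file considered inactive after 3s of no writes
  firstNBytesForIdentifier: 128  # bytes used to fingerprint a file
```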

charset

Encoding conversion, used to convert files in other encodings to utf-8.

Example

```yaml
sources:
- type: file
  name: demo
  paths:
  - /tmp/log/*.log
  fields:
    topic: "loggie"
  charset: "gbk"
```

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| charset | string | false | utf-8 | The character encoding of the collected file, which will be converted to utf-8 |

The currently supported encoding formats for converting to utf-8 are:

  • nop
  • plain
  • utf-8
  • gbk
  • big5
  • euc-jp
  • iso2022-jp
  • shift-jis
  • euc-kr
  • iso8859-6e
  • iso8859-6i
  • iso8859-8e
  • iso8859-8i
  • iso8859-1
  • iso8859-2
  • iso8859-3
  • iso8859-4
  • iso8859-5
  • iso8859-6
  • iso8859-7
  • iso8859-8
  • iso8859-9
  • iso8859-10
  • iso8859-13
  • iso8859-14
  • iso8859-15
  • iso8859-16
  • cp437
  • cp850
  • cp852
  • cp855
  • cp858
  • cp860
  • cp862
  • cp863
  • cp865
  • cp866
  • ebcdic-037
  • ebcdic-1040
  • ebcdic-1047
  • koi8r
  • koi8u
  • macintosh
  • macintosh-cyrillic
  • windows1250
  • windows1251
  • windows1252
  • windows1253
  • windows1254
  • windows1255
  • windows1256
  • windows1257
  • windows1258
  • windows874
  • utf-16be-bom
  • utf-16le-bom

lineDelimiter

Newline symbol configuration

Example

```yaml
sources:
- type: file
  name: demo
  lineDelimiter:
    type: carriage_return_line_feed
    value: "\r\n"
    charset: gbk
```

type

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| type | string | false | auto | value is only valid when type is custom |

Currently supported types are:

  • auto
  • line_feed
  • vertical_tab
  • form_feed
  • carriage_return
  • carriage_return_line_feed
  • next_line
  • line_separator
  • paragraph_separator
  • null_terminator

The corresponding newline symbols are:

```
auto: {'\u000A'},
line_feed: {'\u000A'},
vertical_tab: {'\u000B'},
form_feed: {'\u000C'},
carriage_return: {'\u000D'},
carriage_return_line_feed: []byte("\u000D\u000A"),
next_line: {'\u0085'},
line_separator: []byte("\u2028"),
paragraph_separator: []byte("\u2029"),
null_terminator: {'\u0000'},
```
value

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| value | string | false | \n | newline symbol |

charset

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| charset | string | false | utf-8 | newline symbol encoding |

multi

Multi-line collection configuration

Example

```yaml
sources:
- type: file
  name: accesslog
  multi:
    active: true
```

active

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| active | bool | false | false | whether to enable multi-line collection |

pattern

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| pattern | string | required when multi.active=true | false | A regular expression used to judge whether a line is the start of a brand-new log. For example, if it is configured as `^\[`, a line beginning with `[` is considered a new log; otherwise the line is merged into the previous log as part of it |

maxLines

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maxLines | int | false | 500 | The maximum number of lines a log can contain. The default is 500. If the limit is exceeded, the current log is forced to be sent and the excess is treated as a new log |

maxBytes

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maxBytes | int64 | false | 131072 | The maximum number of bytes a log can contain. The default is 128K. If the limit is exceeded, the current log is forced to be sent and the excess is treated as a new log |

timeout

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| timeout | time.Duration | false | 5s | How long to wait for a log to be assembled into a complete log. The default is 5s. If the limit is exceeded, the current log is sent and subsequent content is treated as a new log |
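
Putting the fields above together, a multi-line configuration for logs whose first line starts with `[` might look like this (the pattern and values are illustrative):

```yaml
sources:
- type: file
  name: accesslog
  paths:
  - /tmp/log/*.log
  multi:
    active: true
    pattern: '^\['      # a line starting with [ begins a new log
    maxLines: 500       # flush after 500 merged lines
    maxBytes: 131072    # flush after 128K of merged content
    timeout: 5s         # flush after waiting 5s for more lines
```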

ack

Configuration related to source acknowledgement (ack). If you need at-least-once delivery, you need to enable the ack mechanism, but it comes with some performance loss.

Caution

This configuration can only be configured in defaults

Example

```yaml
defaults:
  sources:
  - type: file
    ack:
      enable: true
```

enable

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| enable | bool | false | true | Whether to enable acknowledgement |

maintenanceInterval

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maintenanceInterval | time.Duration | false | 20h | Maintenance cycle, used to regularly clean up expired acknowledgement data (such as the ack information of files that are no longer collected) |
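
A sketch that also sets the maintenance cycle alongside enable (values are illustrative):

```yaml
defaults:
  sources:
  - type: file
    ack:
      enable: true
      maintenanceInterval: 20h   # clean up expired ack data every 20h
```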

db

Uses sqlite3 as the database to save the file name, file inode, collection offset and other information during collection. Used to restore the last collection progress after Loggie reloads or restarts.

Caution

This configuration can only be configured in defaults.

Example

```yaml
defaults:
  sources:
  - type: file
    db:
      file: "./data/loggie.db"
```

file

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| file | string | false | ./data/loggie.db | database file path |

tableName

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| tableName | string | false | registry | database table name |

flushTimeout

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| flushTimeout | time.Duration | false | 2s | The interval at which collection progress is written to the database |

bufferSize

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| bufferSize | int | false | 2048 | The buffer size for collection information written into the database |

cleanInactiveTimeout

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| cleanInactiveTimeout | time.Duration | false | 504h | Clean up outdated data in the database. If a record has not been updated for longer than the configured value, it is deleted. 504h (21 days) by default |

cleanScanInterval

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| cleanScanInterval | time.Duration | false | 1h | How often to check the database for outdated data. Every 1 hour by default |
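
A sketch combining the db fields above (the values shown are the documented defaults and only illustrative):

```yaml
defaults:
  sources:
  - type: file
    db:
      file: "./data/loggie.db"
      tableName: registry
      flushTimeout: 2s            # flush progress to the db every 2s
      bufferSize: 2048
      cleanInactiveTimeout: 504h  # drop records not updated for 21 days
      cleanScanInterval: 1h       # scan for outdated records hourly
```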

watcher

Configuration for monitoring file changes

Caution

This configuration can only be configured in defaults

Example

```yaml
defaults:
  sources:
  - type: file
    watcher:
      enableOsWatch: true
```

enableOsWatch

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| enableOsWatch | bool | false | true | Whether to enable the OS file-monitoring notification mechanism, for example inotify on Linux |

scanTimeInterval

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| scanTimeInterval | time.Duration | false | 10s | How often to check for file status changes (such as file creation, deletion, etc.). Every 10s by default |

maintenanceInterval

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maintenanceInterval | time.Duration | false | 5m | How often to run periodic maintenance work (such as reporting collection statistics, cleaning up files, etc.) |

fdHoldTimeoutWhenInactive

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| fdHoldTimeoutWhenInactive | time.Duration | false | 5m | If the time since the file was last collected exceeds this limit (the file has not been written to for a long time and is unlikely to be written again), the file handle is released to free system resources |

fdHoldTimeoutWhenRemove

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| fdHoldTimeoutWhenRemove | time.Duration | false | 5m | When a file is deleted before collection is complete, the maximum time to wait for collection to finish. If the limit is exceeded, the handle is released and the file is no longer collected, regardless of whether collection finished |

maxOpenFds

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maxOpenFds | int | false | 512 | The maximum number of open file handles. If the limit is exceeded, files will temporarily not be collected |

maxEofCount

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maxEofCount | int | false | 3 | The maximum number of times EOF is encountered in consecutive reads of a file. If the limit is exceeded, the file is considered temporarily inactive and enters the "zombie" queue, waiting to be reactivated by an update event |

cleanWhenRemoved

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| cleanWhenRemoved | bool | false | true | When a file is deleted, whether to synchronously delete the related collection information in the db |

readFromTail

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| readFromTail | bool | false | false | Whether to start collecting from the latest line of the file, ignoring existing history. Suitable for scenarios such as migrating collection systems |

taskStopTimeout

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| taskStopTimeout | time.Duration | false | 30s | The timeout for the collection task to exit. A fallback mechanism for cases where Loggie cannot be reloaded |
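
Assuming the fields above nest under the same watcher block as enableOsWatch in the example earlier, a combined sketch might look like this (values are illustrative, not recommendations):

```yaml
defaults:
  sources:
  - type: file
    watcher:
      enableOsWatch: true           # use inotify-style OS notifications
      scanTimeInterval: 10s         # poll for file changes every 10s
      maintenanceInterval: 5m
      fdHoldTimeoutWhenInactive: 5m # release handles of idle files
      fdHoldTimeoutWhenRemove: 5m   # grace period for deleted files
      maxOpenFds: 1024
      maxEofCount: 3
      cleanWhenRemoved: true
      readFromTail: false
      taskStopTimeout: 30s
```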

cleanFiles

Configuration related to file cleanup. Files that have been collected and have expired are deleted directly from the disk to free up disk space.

maxHistoryDays

| field | type | required | default | description |
| --- | --- | --- | --- | --- |
| maxHistoryDays | int | false | none | The maximum number of days to keep files after collection. If the limit is exceeded, the file is deleted directly from the disk. If not configured, files are never deleted |
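
Assuming cleanFiles nests under the file source like the other blocks above, a sketch that deletes collected files older than 7 days might look like this (the value is illustrative):

```yaml
sources:
- type: file
  name: accesslog
  paths:
  - /tmp/log/*.log
  cleanFiles:
    maxHistoryDays: 7   # delete collected files after 7 days
```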