Monitor and Log Tools

System Monitor

Currently, IoTDB lets users monitor system status with Java’s JConsole tool, or check data status through IoTDB’s open API.

System Status Monitoring

After starting JConsole and connecting to the IoTDB server, you get a basic view of the IoTDB system status (CPU usage, memory information, etc.). See the official documentation for more information.

JMX MBean Monitoring

By connecting to JMX with JConsole, you can see some system statistics and parameters. This section describes how to use the JConsole MBean tab to monitor the number of files opened by the IoTDB service process, the size of the data files, and so on. Once connected to JMX, you can find the MBean named org.apache.iotdb.service through the MBeans tab, as shown in the following figure.

[Figure: the org.apache.iotdb.service MBean in the JConsole MBeans tab]

There are several attributes under Monitor, including the number of files opened in different folders, data file size statistics, and the values of some system parameters. Double-clicking the value of an attribute displays a line chart of that attribute. Note that the opened file count statistics are currently supported only on macOS and most Linux distributions except CentOS. On unsupported operating systems these statistics return -2. See the following section for a description of each Monitor attribute.

MBean Monitor Attributes List
  • DataSizeInByte
Name: DataSizeInByte
Description: The total size of the data files.
Unit: Byte
Type: Long
  • FileNodeNum
Name: FileNodeNum
Description: The number of FileNodes. (Currently not supported)
Type: Long
  • OverflowCacheSize
Name: OverflowCacheSize
Description: The size of the out-of-order data cache. (Currently not supported)
Unit: Byte
Type: Long
  • BufferWriteCacheSize
Name: BufferWriteCacheSize
Description: The size of the BufferWriter cache. (Currently not supported)
Unit: Byte
Type: Long
  • BaseDirectory
Name: BaseDirectory
Description: The absolute directory of the data files.
Type: String
  • WriteAheadLogStatus
Name: WriteAheadLogStatus
Description: The status of the write-ahead log (WAL). True means WAL is enabled.
Type: Boolean
  • TotalOpenFileNum
Name: TotalOpenFileNum
Description: The total number of files opened by the IoTDB server process.
Type: Int
  • DeltaOpenFileNum
Name: DeltaOpenFileNum
Description: The number of TsFile files opened by the IoTDB server process.
Default Directory: /data/data/settled
Type: Int
  • WalOpenFileNum
Name: WalOpenFileNum
Description: The number of write-ahead-log files opened by the IoTDB server process.
Default Directory: /data/wal
Type: Int
  • MetadataOpenFileNum
Name: MetadataOpenFileNum
Description: The number of metadata files opened by the IoTDB server process.
Default Directory: /data/system/schema
Type: Int
  • DigestOpenFileNum
Name: DigestOpenFileNum
Description: The number of info files opened by the IoTDB server process.
Default Directory: /data/system/info
Type: Int
  • SocketOpenFileNum
Name: SocketOpenFileNum
Description: The number of socket links (TCP or UDP) of the operating system.
Type: Int
  • MergePeriodInSecond
Name: MergePeriodInSecond
Description: The interval at which the IoTDB service process periodically triggers the merge process.
Unit: Second
Type: Long
  • ClosePeriodInSecond
Name: ClosePeriodInSecond
Description: The interval at which the IoTDB service process periodically flushes in-memory data to disk.
Unit: Second
Type: Long
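
The attributes listed above can also be read programmatically over JMX instead of through the JConsole GUI. The following is a minimal sketch, not an official client. It assumes that the Monitor MBean is registered under the object name org.apache.iotdb.service:type=Monitor and that the JMX service is reachable through the standard RMI URL form; both are assumptions to verify against your deployment. The class name MonitorProbe and its helper methods are hypothetical.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class MonitorProbe {
    // Value returned by the open-file statistics on unsupported operating systems.
    static final int UNSUPPORTED = -2;

    // Builds the standard RMI-based JMX service URL for a host and port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Turns a raw open-file count into a readable string.
    static String describeOpenFileCount(int value) {
        return value == UNSUPPORTED ? "unsupported on this OS" : Integer.toString(value);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: MonitorProbe <host> <jmx-port>");
            return;
        }
        // Assumption: the Monitor MBean is registered under this object name.
        ObjectName monitor = new ObjectName("org.apache.iotdb.service:type=Monitor");
        JMXServiceURL url = new JMXServiceURL(jmxUrl(args[0], Integer.parseInt(args[1])));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            Object total = conn.getAttribute(monitor, "TotalOpenFileNum");
            Object size = conn.getAttribute(monitor, "DataSizeInByte");
            System.out.println("TotalOpenFileNum = "
                    + describeOpenFileCount(((Number) total).intValue()));
            System.out.println("DataSizeInByte   = " + size);
        }
    }
}
```

Run without arguments, the program only prints its usage line; pass a host and the JMX port of your server to query a live instance.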

Data Status Monitoring

This module is the statistical monitoring method that IoTDB provides for information about stored data. The system records statistics and stores them in the database. The current 0.8.0 version of IoTDB provides statistics for writing data.

The user can enable or disable data statistics monitoring by setting the enable_stat_monitor item in the configuration file.

Writing Data Monitor

The statistics the system currently keeps for writing data fall into two major modules: Global Writing Data Statistics and Storage Group Writing Data Statistics. Global Writing Data Statistics records the number of points written by the user and the number of requests. Storage Group Writing Data Statistics records the same data for a specific storage group.

By default, the system collects data every 5 seconds, writes the statistics to IoTDB, and stores them in a system-specified location. (To change the collection frequency, set the back_loop_period_in_second entry in the configuration file; see section Engine Layer for details.) After the system is refreshed or restarted, IoTDB does not recover the statistics, and they restart from zero.

To keep statistical information from taking up excessive space, IoTDB periodically clears invalid statistics data. The user can set how often the clear operation is triggered (stat_monitor_detect_freq_in_second, default 600s, see section Engine Layer for details) and how long data remains valid (stat_monitor_retain_interval_in_second, default 600s, see section Engine Layer for details). Data recorded within stat_monitor_retain_interval_in_second before the trigger time of the clear operation is valid data. To keep the system stable, statistics may not be deleted too frequently. Therefore, if a configured value is less than the default value (600s), the system ignores it and uses the default.
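
The fallback rule described above can be summarized in a few lines. This is an illustrative sketch of the documented behavior only, not IoTDB's actual configuration code; the class and method names are hypothetical.

```java
final class StatMonitorIntervals {
    // Default value in seconds; values below it are not accepted.
    static final int DEFAULT_SECONDS = 600;

    // Returns the value the system would use for a configured interval:
    // anything below the default falls back to the default.
    static int effectiveSeconds(int configuredSeconds) {
        return configuredSeconds < DEFAULT_SECONDS ? DEFAULT_SECONDS : configuredSeconds;
    }

    public static void main(String[] args) {
        System.out.println(effectiveSeconds(60));   // below the default, falls back
        System.out.println(effectiveSeconds(1800)); // accepted as configured
    }
}
```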

You can use a select clause to read the writing data statistics, just like any other timeseries.

Here are the writing data statistics:

  • TOTAL_POINTS (GLOBAL)
Name: TOTAL_POINTS
Description: The global number of written points.
Type: Writing data statistics
Timeseries Name: root.stats.write.global.TOTAL_POINTS
Reset After Restarting System: yes
Example: select TOTAL_POINTS from root.stats.write.global
  • TOTAL_REQ_SUCCESS (GLOBAL)
Name: TOTAL_REQ_SUCCESS
Description: The global number of successful requests.
Type: Writing data statistics
Timeseries Name: root.stats.write.global.TOTAL_REQ_SUCCESS
Reset After Restarting System: yes
Example: select TOTAL_REQ_SUCCESS from root.stats.write.global
  • TOTAL_REQ_FAIL (GLOBAL)
Name: TOTAL_REQ_FAIL
Description: The global number of failed requests.
Type: Writing data statistics
Timeseries Name: root.stats.write.global.TOTAL_REQ_FAIL
Reset After Restarting System: yes
Example: select TOTAL_REQ_FAIL from root.stats.write.global
  • TOTAL_POINTS_FAIL (GLOBAL)
Name: TOTAL_POINTS_FAIL
Description: The global number of points that failed to be written.
Type: Writing data statistics
Timeseries Name: root.stats.write.global.TOTAL_POINTS_FAIL
Reset After Restarting System: yes
Example: select TOTAL_POINTS_FAIL from root.stats.write.global
  • TOTAL_POINTS_SUCCESS (GLOBAL)
Name: TOTAL_POINTS_SUCCESS
Description: The global number of successfully written points.
Type: Writing data statistics
Timeseries Name: root.stats.write.global.TOTAL_POINTS_SUCCESS
Reset After Restarting System: yes
Example: select TOTAL_POINTS_SUCCESS from root.stats.write.global
  • TOTAL_REQ_SUCCESS (STORAGE GROUP)
Name: TOTAL_REQ_SUCCESS
Description: The number of successful requests for a specific storage group.
Type: Writing data statistics
Timeseries Name: root.stats.write.<storage_group_name>.TOTAL_REQ_SUCCESS
Reset After Restarting System: yes
Example: select TOTAL_REQ_SUCCESS from root.stats.write.<storage_group_name>
  • TOTAL_REQ_FAIL (STORAGE GROUP)
Name: TOTAL_REQ_FAIL
Description: The number of failed requests for a specific storage group.
Type: Writing data statistics
Timeseries Name: root.stats.write.<storage_group_name>.TOTAL_REQ_FAIL
Reset After Restarting System: yes
Example: select TOTAL_REQ_FAIL from root.stats.write.<storage_group_name>
  • TOTAL_POINTS_SUCCESS (STORAGE GROUP)
Name: TOTAL_POINTS_SUCCESS
Description: The number of successfully written points for a specific storage group.
Type: Writing data statistics
Timeseries Name: root.stats.write.<storage_group_name>.TOTAL_POINTS_SUCCESS
Reset After Restarting System: yes
Example: select TOTAL_POINTS_SUCCESS from root.stats.write.<storage_group_name>
  • TOTAL_POINTS_FAIL (STORAGE GROUP)
Name: TOTAL_POINTS_FAIL
Description: The number of points that failed to be written for a specific storage group.
Type: Writing data statistics
Timeseries Name: root.stats.write.<storage_group_name>.TOTAL_POINTS_FAIL
Reset After Restarting System: yes
Example: select TOTAL_POINTS_FAIL from root.stats.write.<storage_group_name>

Note:

<storage_group_name> must be replaced by the real storage group name, with every ‘.’ in the name replaced by ‘_’. For example, the storage group ‘root.a.b’ becomes ‘root_a_b’ when used in the statistics.
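
The naming rule in the note is easy to get wrong by hand, so here is a small sketch of it. The class StatPaths and its methods are hypothetical helpers written for illustration; only the root.stats.write prefix and the ‘.’ to ‘_’ substitution come from the text above.

```java
final class StatPaths {
    // Prefix under which the writing data statistics are stored.
    static final String WRITE_PREFIX = "root.stats.write.";

    // Maps a storage group name to the path that holds its statistics,
    // replacing every '.' in the storage group name with '_'.
    static String statPath(String storageGroup) {
        return WRITE_PREFIX + storageGroup.replace('.', '_');
    }

    // Builds the select statement for one statistic of one storage group.
    static String selectStatement(String statistic, String storageGroup) {
        return "select " + statistic + " from " + statPath(storageGroup);
    }

    public static void main(String[] args) {
        System.out.println(statPath("root.a.b"));
        System.out.println(selectStatement("TOTAL_REQ_FAIL", "root.ln"));
    }
}
```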

Example

Here are some examples of using the writing data statistics.

If you want to know the global number of successfully written points, you can use a select clause to query its value. The query statement is:

    select TOTAL_POINTS_SUCCESS from root.stats.write.global

If you want to know the number of successfully written points of root.ln (a storage group), the query statement is:

    select TOTAL_POINTS_SUCCESS from root.stats.write.root_ln

If you want to know the current number of timeseries points in the system, you can use the MAX_VALUE function. The query statement is:

    select MAX_VALUE(TOTAL_POINTS_SUCCESS) from root.stats.write.root_ln
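
The same queries can be issued from a program. The sketch below uses only the standard java.sql API and makes several assumptions that are not stated in this section: that the IoTDB JDBC driver class is org.apache.iotdb.jdbc.IoTDBDriver and is on the classpath, that the JDBC URL you pass has the form jdbc:iotdb://host:port/, and that root/root are valid credentials. The class name WriteStatQuery is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

final class WriteStatQuery {
    // Builds the MAX_VALUE query shown in the example above.
    static String maxValueQuery(String statistic, String statPath) {
        return "select MAX_VALUE(" + statistic + ") from " + statPath;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: WriteStatQuery <jdbc-url> [user] [password]");
            return;
        }
        // Assumption: the IoTDB JDBC driver jar is on the classpath.
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
        String user = args.length > 1 ? args[1] : "root";      // assumed default
        String password = args.length > 2 ? args[2] : "root";  // assumed default
        String sql = maxValueQuery("TOTAL_POINTS_SUCCESS", "root.stats.write.root_ln");
        try (Connection conn = DriverManager.getConnection(args[0], user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int columns = rs.getMetaData().getColumnCount();
            // Print every row with tab-separated columns.
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    if (i > 1) {
                        row.append('\t');
                    }
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
        }
    }
}
```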

File Size Monitor

Sometimes we are concerned about how the data file size of IoTDB is changing, maybe to help calculate how much disk space is left or the data ingestion speed. The File Size Monitor provides several statistics to show how different types of file-sizes change.

By default, the File Size Monitor collects file size data every 5 seconds, using the same shared parameter back_loop_period_in_second.

Unlike the Writing Data Monitor, the File Size Monitor currently does not delete statistics data at regular intervals.

You can also use a select clause to get the file size statistics, like other timeseries.

Here are the file size statistics:

  • DATA
Name: DATA
Description: The sum of the sizes of all files under the data directory (data/data by default), in bytes.
Type: File size statistics
Timeseries Name: root.stats.file_size.DATA
Reset After Restarting System: No
Example: select DATA from root.stats.file_size.DATA
  • SETTLED
Name: SETTLED
Description: The sum of the sizes of all TsFiles (under data/data/settled by default), in bytes. If there are multiple TsFile directories like {data/data/settled1, data/data/settled2}, this statistic is the sum of their sizes.
Type: File size statistics
Timeseries Name: root.stats.file_size.SETTLED
Reset After Restarting System: No
Example: select SETTLED from root.stats.file_size.SETTLED
  • OVERFLOW
Name: OVERFLOW
Description: The sum of the sizes of all out-of-order data files (under data/data/unsequence by default), in bytes.
Type: File size statistics
Timeseries Name: root.stats.file_size.OVERFLOW
Reset After Restarting System: No
Example: select OVERFLOW from root.stats.file_size.OVERFLOW
  • WAL
Name: WAL
Description: The sum of the sizes of all write-ahead-log files (under data/wal by default), in bytes.
Type: File size statistics
Timeseries Name: root.stats.file_size.WAL
Reset After Restarting System: No
Example: select WAL from root.stats.file_size.WAL
  • INFO
Name: INFO
Description: The sum of the sizes of all .restore and similar files (under data/system/info), in bytes.
Type: File size statistics
Timeseries Name: root.stats.file_size.INFO
Reset After Restarting System: No
Example: select INFO from root.stats.file_size.INFO
  • SCHEMA
Name: SCHEMA
Description: The sum of the sizes of all metadata files (under data/system/metadata), in bytes.
Type: File size statistics
Timeseries Name: root.stats.file_size.SCHEMA
Reset After Restarting System: No
Example: select SCHEMA from root.stats.file_size.SCHEMA

Performance Monitor

Introduction

To help understand the performance of IoTDB, this module measures the time consumed by each operation. It computes the average time consumed by each operation and, for each operation, the proportion of calls whose time consumption falls into each time range. The output is written to the log_measure.log file. An example of the output is shown below.

[Figure: example output in log_measure.log]

Configuration parameter

Location: conf/iotdb-engine.properties

Table: parameters and descriptions

Parameter | Default Value | Description
enable_performance_stat | false | Whether performance statistics of sub-modules are enabled.
performance_stat_display_interval | 60000 | The interval at which statistics results are displayed, in ms.
performance_stat_memory_in_kb | 20 | The memory used for performance statistics, in KB.

JMX MBean

Connect with JConsole on port 31999, and choose ‘MBean’ in the menu bar. Expand the sidebar and choose ‘org.apache.iotdb.db.cost.statistic’. You will find:

[Figure: the org.apache.iotdb.db.cost.statistic MBean in JConsole]

Attribute

  1. EnableStat: Whether statistics are enabled. If true, the module records the time consumed by each operation and prints the results. It is not editable directly but can be changed by the operations below.

  2. DisplayIntervalInMs: The interval between printed results. Changes do not take effect immediately; to apply them, call startContinuousStatistics() or startOneTimeStatistics().

  3. OperationSwitch: A map indicating whether statistics are computed for each kind of operation. The key is the operation name; a value of true means statistics for that operation are enabled, otherwise they are disabled. This parameter cannot be changed directly; it is changed through the operation ‘changeOperationSwitch()’.

Operation

  1. startContinuousStatistics: Start the statistics and output results at intervals of ‘DisplayIntervalInMs’.
  2. startOneTimeStatistics: Start the statistics and output results once, after a delay of ‘DisplayIntervalInMs’.
  3. stopStatistic: Stop the statistics.
  4. clearStatisticalState(): Clear the current results and reset the statistics.
  5. changeOperationSwitch(String operationName, Boolean operationState): Set whether a kind of operation is monitored. The parameter ‘operationName’ is the name of the operation, as defined in the attribute OperationSwitch. The parameter ‘operationState’ states whether to enable statistics for it. The function returns true if the state is switched successfully, and false otherwise.

Adding Custom Monitoring Items for Contributors of IoTDB

Add Operation

Add an enumeration in org.apache.iotdb.db.cost.statistic.Operation.

Add Timing Code in Monitoring Area

Add timing code in the monitoring start area:

    long t0 = System.currentTimeMillis();

Add timing code in the monitoring stop area:

    Measurement.INSTANCE.addOperationLatency(Operation, t0);
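
To show how the two snippets above fit around a monitored region, here is a self-contained sketch. DemoOperation and DemoMeasurement are hypothetical stand-ins written for this example; they are not IoTDB's Operation enum or Measurement class. The sketch assumes that the recorder computes the elapsed time from the start timestamp it is given, which is what passing t0 suggests.

```java
import java.util.EnumMap;
import java.util.Map;

final class TimingSketch {
    // Hypothetical operation names; the real ones live in the Operation enum.
    enum DemoOperation { EXECUTE_QUERY, EXECUTE_BATCH }

    // Hypothetical stand-in for Measurement: keeps call count and total latency.
    static final class DemoMeasurement {
        private final Map<DemoOperation, long[]> stats = new EnumMap<>(DemoOperation.class);

        // Records one call; elapsed time is measured from the given start time.
        synchronized void addOperationLatency(DemoOperation op, long startTimeMs) {
            long elapsed = System.currentTimeMillis() - startTimeMs;
            long[] s = stats.computeIfAbsent(op, k -> new long[2]);
            s[0] += 1;        // number of calls
            s[1] += elapsed;  // total elapsed milliseconds
        }

        synchronized long count(DemoOperation op) {
            long[] s = stats.get(op);
            return s == null ? 0 : s[0];
        }

        synchronized long totalMs(DemoOperation op) {
            long[] s = stats.get(op);
            return s == null ? 0 : s[1];
        }
    }

    static final DemoMeasurement MEASUREMENT = new DemoMeasurement();

    // Sums 1..n; the whole method body is the monitored region.
    static int timedWork(int n) {
        long t0 = System.currentTimeMillis();  // monitoring start area
        try {
            int sum = 0;
            for (int i = 1; i <= n; i++) {
                sum += i;
            }
            return sum;
        } finally {
            // monitoring stop area: runs even if the work throws
            MEASUREMENT.addOperationLatency(DemoOperation.EXECUTE_QUERY, t0);
        }
    }

    public static void main(String[] args) {
        System.out.println(timedWork(1000));
        System.out.println(MEASUREMENT.count(DemoOperation.EXECUTE_QUERY) + " call(s) recorded");
    }
}
```

Placing the stop call in a finally block is a design choice of this sketch, so that failed operations are also counted; whether that is desirable depends on what the statistic should mean.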

Cache Hit Ratio Statistics

Overview

To improve query performance, IoTDB caches ChunkMetaData and TsFileMetaData. Users can view the cache hit ratio through debug-level logs and MXBean, and adjust the memory occupied by the cache according to the cache hit ratio and the system memory. To view the cache hit ratio with MXBean:

  1. Connect with JConsole on port 31999 and select ‘MBean’ in the menu bar.
  2. Expand the sidebar and select ‘org.apache.iotdb.db.service’. You will get the results shown in the following figure:

[Figure: cache hit ratio under org.apache.iotdb.db.service in JConsole]

System Log

IoTDB allows users to configure the IoTDB system logs (such as the log output level) by modifying the log configuration file. By default, the system log configuration file is located in the $IOTDB_HOME/conf folder.

The default log configuration file is named logback.xml. The user can change the configuration of the system running log by adding or changing XML tree node parameters. Note that changes made in the log configuration file do not take effect immediately; they take effect after the system is restarted. logback.xml is used in the usual way.

To make debugging easier for developers and DBAs, IoTDB also provides several JMX interfaces for modifying the log configuration dynamically, so the log module can be configured in real time without restarting the system.

Dynamic System Log Configuration

Connect JMX

Here we use JConsole to connect with JMX.

Start JConsole and establish a new JMX connection to the IoTDB server (you can select the local process, or enter the IP and port for a remote connection; the default port of the IoTDB JMX service is 31999). Fig 4.1 shows the connection GUI of JConsole.

[Figure 4.1: JConsole connection GUI]

After connecting, click MBean and find ch.qos.logback.classic.default.ch.qos.logback.classic.jmx.JMXConfigurator (as shown in fig 4.2).

[Figure 4.2: JMXConfigurator in the MBean tab]

In the JMXConfigurator window, 6 operations are provided, as shown in fig 4.3. You can use these interfaces to perform operations.

[Figure 4.3: JMXConfigurator operations]

Interface Instruction

  • reloadDefaultConfiguration

This method is to reload the default logback configuration file. The user can modify the default configuration file first, and then call this method to reload the modified configuration file into the system to take effect.

  • reloadByFileName

This method loads a logback configuration file with the specified path and name, and then makes it take effect. This method accepts a parameter of type String named p1, which is the path to the configuration file that needs to be specified for loading.

  • getLoggerEffectiveLevel

This method is to obtain the current log level of the specified Logger. This method accepts a String type parameter named p1, which is the name of the specified Logger. This method returns the log level currently in effect for the specified Logger.

  • getLoggerLevel

This method obtains the log level of the specified Logger. It accepts a String parameter named p1, which is the name of the specified Logger, and returns that Logger's log level. The difference from getLoggerEffectiveLevel is that this method returns the log level set for the specified Logger in the configuration file. If the user has not set a log level for the Logger, it returns empty. According to the Logger log-level inheritance mechanism, if a Logger's level is not explicitly set, it inherits the log level from its nearest ancestor. In that case, getLoggerEffectiveLevel returns the log level that is in effect for the Logger, while getLoggerLevel returns empty.
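
The difference between the two methods can be illustrated with a small model of level inheritance. This is not logback code and does not call JMXConfigurator; LoggerLevelModel is a hypothetical class that only mimics the behavior described above, using an empty string for "no level set" and the name ROOT for the root logger.

```java
import java.util.HashMap;
import java.util.Map;

final class LoggerLevelModel {
    // Levels that were set explicitly, keyed by logger name.
    private final Map<String, String> explicit = new HashMap<>();

    LoggerLevelModel(String rootLevel) {
        explicit.put("ROOT", rootLevel);
    }

    void setLevel(String logger, String level) {
        explicit.put(logger, level);
    }

    // Like getLoggerLevel: only the explicitly set level, empty otherwise.
    String getLoggerLevel(String logger) {
        return explicit.getOrDefault(logger, "");
    }

    // Like getLoggerEffectiveLevel: the level of the nearest ancestor
    // (by dotted name) that has an explicit level, ending at the root.
    String getLoggerEffectiveLevel(String logger) {
        String name = logger;
        while (true) {
            String level = explicit.get(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                return explicit.get("ROOT");
            }
            name = name.substring(0, dot);
        }
    }

    public static void main(String[] args) {
        LoggerLevelModel model = new LoggerLevelModel("INFO");
        model.setLevel("org.apache.iotdb", "DEBUG");
        System.out.println("[" + model.getLoggerLevel("org.apache.iotdb.db") + "]");
        System.out.println(model.getLoggerEffectiveLevel("org.apache.iotdb.db"));
    }
}
```

In this model, a logger such as org.apache.iotdb.db has no level of its own, so getLoggerLevel yields an empty string while getLoggerEffectiveLevel yields the level inherited from org.apache.iotdb.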