Monitoring ArangoDB using collectd

Problem

The ArangoDB web interface shows a nice summary of the current state. I want to see similar numbers in my monitoring system so I can analyze the system usage post mortem or send alarms on failure.

Solution

Collectd is an excellent tool to gather all kinds of metrics from a system and deliver them to a central monitoring system such as Graphite and/or Nagios.

Ingredients

For this recipe you need to install the following tools:

  • collectd, the daemon that gathers the metrics
  • kcollectd, for inspecting the collected data

Configuring collectd

For aggregating the values we will use the cURL-JSON plug-in. We will store the values using the Round-Robin-Database (RRD) writer, whose files kcollectd can later present to you.

We assume your collectd comes from your distribution and reads its config from /etc/collectd/collectd.conf. Since this file tends to become pretty unreadable quickly, we use the include mechanism:

  <Include "/etc/collectd/collectd.conf.d">
    Filter "*.conf"
  </Include>

This way we can keep each metric group in its own compact set of config files. Each set consists of three components:

  • loading the plug-in
  • adding metrics to the TypesDB
  • the configuration for the plug-in itself

rrdtool

We will use the Round-Robin-Database as storage backend for now. It creates its own database files of fixed size for each specific time range. Later you may choose more advanced writer plug-ins, which can distribute your metrics over the network or integrate with the above-mentioned Graphite or your already established monitoring.

For the RRD we will go pretty much with defaults:

  # Load the plug-in:
  LoadPlugin rrdtool
  <Plugin rrdtool>
    DataDir "/var/lib/collectd/rrd"
    # CacheTimeout 120
    # CacheFlush 900
    # WritesPerSecond 30
    # CreateFilesAsync false
    # RandomTimeout 0
    #
    # The following settings are rather advanced
    # and should usually not be touched:
    # StepSize 10
    # HeartBeat 20
    # RRARows 1200
    # RRATimespan 158112000
    # XFF 0.1
  </Plugin>

cURL JSON

Collectd comes with a wide range of metric aggregation plug-ins. Many tools today use JSON as their data format; so does ArangoDB.

Therefore a plug-in that fetches JSON documents via HTTP is the perfect match for querying ArangoDB's administrative statistics interface:

  # Load the plug-in:
  LoadPlugin curl_json
  # we need to use our own types to generate individual names for our gauges:
  # TypesDB "/etc/collectd/arangodb_types.db"
  <Plugin curl_json>
    # Adjust the URL so collectd can reach your arangod:
    <URL "http://localhost:8529/_db/_system/_admin/statistics">
      # Set your authentication to Aardvark here:
      User "root"
      # Password "bar"
      <Key "http/requestsTotal">
        Type "gauge"
      </Key>
      <Key "http/requestsPatch">
        Type "gauge"
      </Key>
      <Key "http/requestsPut">
        Type "gauge"
      </Key>
      <Key "http/requestsOther">
        Type "gauge"
      </Key>
      <Key "http/requestsAsync">
        Type "gauge"
      </Key>
      <Key "http/requestsPost">
        Type "gauge"
      </Key>
      <Key "http/requestsOptions">
        Type "gauge"
      </Key>
      <Key "http/requestsHead">
        Type "gauge"
      </Key>
      <Key "http/requestsGet">
        Type "gauge"
      </Key>
      <Key "http/requestsDelete">
        Type "gauge"
      </Key>
      <Key "system/minorPageFaults">
        Type "gauge"
      </Key>
      <Key "system/majorPageFaults">
        Type "gauge"
      </Key>
      <Key "system/userTime">
        Type "gauge"
      </Key>
      <Key "system/systemTime">
        Type "gauge"
      </Key>
      <Key "system/numberOfThreads">
        Type "gauge"
      </Key>
      <Key "system/virtualSize">
        Type "gauge"
      </Key>
      <Key "system/residentSize">
        Type "gauge"
      </Key>
      <Key "system/residentSizePercent">
        Type "gauge"
      </Key>
      <Key "server/threads/running">
        Type "gauge"
      </Key>
      <Key "server/threads/queued">
        Type "gauge"
      </Key>
      <Key "server/threads/working">
        Type "gauge"
      </Key>
      <Key "server/threads/blocked">
        Type "gauge"
      </Key>
      <Key "server/uptime">
        Type "gauge"
      </Key>
      <Key "server/physicalMemory">
        Type "gauge"
      </Key>
      <Key "server/v8Context/available">
        Type "gauge"
      </Key>
      <Key "server/v8Context/max">
        Type "gauge"
      </Key>
      <Key "server/v8Context/busy">
        Type "gauge"
      </Key>
      <Key "server/v8Context/dirty">
        Type "gauge"
      </Key>
      <Key "server/v8Context/free">
        Type "gauge"
      </Key>
      <Key "client/totalTime/count">
        Type "client_totalTime_count"
      </Key>
      <Key "client/totalTime/sum">
        Type "client_totalTime_sum"
      </Key>
      <Key "client/totalTime/counts/0">
        Type "client_totalTime_counts0"
      </Key>
      <Key "client/bytesReceived/count">
        Type "client_bytesReceived_count"
      </Key>
      <Key "client/bytesReceived/sum">
        Type "client_bytesReceived_sum"
      </Key>
      <Key "client/bytesReceived/counts/0">
        Type "client_bytesReceived_counts0"
      </Key>
      <Key "client/requestTime/count">
        Type "client_requestTime_count"
      </Key>
      <Key "client/requestTime/sum">
        Type "client_requestTime_sum"
      </Key>
      <Key "client/requestTime/counts/0">
        Type "client_requestTime_counts0"
      </Key>
      <Key "client/connectionTime/count">
        Type "client_connectionTime_count"
      </Key>
      <Key "client/connectionTime/sum">
        Type "client_connectionTime_sum"
      </Key>
      <Key "client/connectionTime/counts/0">
        Type "client_connectionTime_counts0"
      </Key>
      <Key "client/queueTime/count">
        Type "client_queueTime_count"
      </Key>
      <Key "client/queueTime/sum">
        Type "client_queueTime_sum"
      </Key>
      <Key "client/queueTime/counts/0">
        Type "client_queueTime_counts0"
      </Key>
      <Key "client/bytesSent/count">
        Type "client_bytesSent_count"
      </Key>
      <Key "client/bytesSent/sum">
        Type "client_bytesSent_sum"
      </Key>
      <Key "client/bytesSent/counts/0">
        Type "client_bytesSent_counts0"
      </Key>
      <Key "client/ioTime/count">
        Type "client_ioTime_count"
      </Key>
      <Key "client/ioTime/sum">
        Type "client_ioTime_sum"
      </Key>
      <Key "client/ioTime/counts/0">
        Type "client_ioTime_counts0"
      </Key>
      <Key "client/httpConnections">
        Type "gauge"
      </Key>
    </URL>
  </Plugin>
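Since the Key blocks are highly repetitive, you may prefer to generate them instead of typing them by hand. The following is only a minimal sketch of that idea in Python; the helper names type_name and key_block are hypothetical and not part of collectd or ArangoDB. It assumes the naming scheme used in this recipe, where path elements are joined with underscores and a numeric array index is appended directly.

```python
def type_name(path: str) -> str:
    """Derive the custom type name this recipe uses for a JSON key path.

    Example: "client/totalTime/counts/0" becomes "client_totalTime_counts0".
    """
    parts = path.split("/")
    name = parts[0]
    for part in parts[1:]:
        # Array indices are appended without a separator, as in the recipe.
        name += part if part.isdigit() else "_" + part
    return name


def key_block(path: str, custom_type: bool = False) -> str:
    """Render one <Key> block; "gauge" is used unless a custom type is requested."""
    type_ = type_name(path) if custom_type else "gauge"
    return f'<Key "{path}">\n  Type "{type_}"\n</Key>'


if __name__ == "__main__":
    # Plain gauges, named after their last path element by the plug-in.
    for path in ("system/numberOfThreads", "server/uptime"):
        print(key_block(path))
    # Client statistics that need individual type names.
    for metric in ("totalTime", "requestTime"):
        for leaf in ("count", "sum", "counts/0"):
            print(key_block(f"client/{metric}/{leaf}", custom_type=True))
```

Paste the printed blocks inside the URL section of the plug-in configuration and adjust the list of paths to the metrics you care about.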

The curl_json plug-in only takes the last path element as the name of a metric, so keys such as client/totalTime/count and client/requestTime/count would end up with the same name. To work around this shortcoming, we give them individual names using our own types.db file in /etc/collectd/arangodb_types.db:

  client_totalTime_count value:GAUGE:0:9223372036854775807
  client_totalTime_sum value:GAUGE:U:U
  client_totalTime_counts0 value:GAUGE:U:U
  client_bytesReceived_count value:GAUGE:0:9223372036854775807
  client_bytesReceived_sum value:GAUGE:U:U
  client_bytesReceived_counts0 value:GAUGE:U:U
  client_requestTime_count value:GAUGE:0:9223372036854775807
  client_requestTime_sum value:GAUGE:U:U
  client_requestTime_counts0 value:GAUGE:U:U
  client_connectionTime_count value:GAUGE:0:9223372036854775807
  client_connectionTime_sum value:GAUGE:U:U
  client_connectionTime_counts0 value:GAUGE:U:U
  client_queueTime_count value:GAUGE:0:9223372036854775807
  client_queueTime_sum value:GAUGE:U:U
  client_queueTime_counts0 value:GAUGE:U:U
  client_bytesSent_count value:GAUGE:0:9223372036854775807
  client_bytesSent_sum value:GAUGE:U:U
  client_bytesSent_counts0 value:GAUGE:U:U
  client_ioTime_count value:GAUGE:0:9223372036854775807
  client_ioTime_sum value:GAUGE:U:U
  client_ioTime_counts0 value:GAUGE:U:U

Please note that you probably need to uncomment the following line in the main collectd.conf so that collectd still loads its main types definition file:

  # TypesDB "/usr/share/collectd/types.db" "/etc/collectd/my_types.db"
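The entries in arangodb_types.db follow a regular pattern: three types per client metric, with the count bounded between 0 and the largest signed 64-bit integer and the other two left unbounded (U:U). As a rough sketch, assuming you want to regenerate or extend that file, a few lines of Python can produce it. The names CLIENT_METRICS and types_db_lines are hypothetical helpers introduced only for this illustration.

```python
# Upper bound used for the *_count types: 9223372036854775807.
INT64_MAX = 2**63 - 1

# The client metrics configured in this recipe, in the same order.
CLIENT_METRICS = [
    "totalTime",
    "bytesReceived",
    "requestTime",
    "connectionTime",
    "queueTime",
    "bytesSent",
    "ioTime",
]


def types_db_lines(metrics):
    """Return the types.db lines for the given client metrics."""
    lines = []
    for metric in metrics:
        lines.append(f"client_{metric}_count value:GAUGE:0:{INT64_MAX}")
        lines.append(f"client_{metric}_sum value:GAUGE:U:U")
        lines.append(f"client_{metric}_counts0 value:GAUGE:U:U")
    return lines


if __name__ == "__main__":
    print("\n".join(types_db_lines(CLIENT_METRICS)))
```

Redirect the output into /etc/collectd/arangodb_types.db if you decide to use it; the type names have to match the Type values in the Key blocks exactly.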

Rolling your own

You may want to monitor your own metrics from ArangoDB. Here is a simple example of how to use the config. Assume your endpoint returns this JSON document:

  {
    "testArray":[1,2],
    "testArrayInbetween":[{"blarg":3},{"blub":4}],
    "testDirectHit":5,
    "testSubLevelHit":{"oneMoreLevel":6}
  }

This config snippet will parse the JSON above:

  <Key "testArray/0">
    Type "gauge"
    # Expect: 1
  </Key>
  <Key "testArray/1">
    Type "gauge"
    # Expect: 2
  </Key>
  <Key "testArrayInbetween/0/blarg">
    Type "gauge"
    # Expect: 3
  </Key>
  <Key "testArrayInbetween/1/blub">
    Type "gauge"
    # Expect: 4
  </Key>
  <Key "testDirectHit">
    Type "gauge"
    # Expect: 5
  </Key>
  <Key "testSubLevelHit/oneMoreLevel">
    Type "gauge"
    # Expect: 6
  </Key>
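Before handing new keys to collectd, it can help to check that each path really points at the value you expect. The sketch below mimics the path lookup in plain Python; it is not how the plug-in is implemented internally, and the function name resolve is a hypothetical helper. It assumes that path elements are separated by slashes and that numeric elements index into arrays, as in the keys above.

```python
import json

# The sample document from this recipe.
SAMPLE = json.loads("""
{
  "testArray": [1, 2],
  "testArrayInbetween": [{"blarg": 3}, {"blub": 4}],
  "testDirectHit": 5,
  "testSubLevelHit": {"oneMoreLevel": 6}
}
""")


def resolve(document, key_path):
    """Walk a slash-separated key path through nested objects and arrays."""
    node = document
    for part in key_path.split("/"):
        if isinstance(node, list):
            node = node[int(part)]  # numeric element selects an array entry
        else:
            node = node[part]  # any other element selects an object member
    return node


if __name__ == "__main__":
    for path in (
        "testArray/0",
        "testArray/1",
        "testArrayInbetween/0/blarg",
        "testArrayInbetween/1/blub",
        "testDirectHit",
        "testSubLevelHit/oneMoreLevel",
    ):
        print(path, "=", resolve(SAMPLE, path))
```

A KeyError or IndexError here is a hint that the corresponding Key block would not find a value either.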

Get it served

Now we will (re)start collectd so it picks up our configuration:

  /etc/init.d/collectd start

We will inspect the syslog to verify that nothing went wrong:

  Mar 3 13:59:52 localhost collectd[11276]: Starting statistics collection and monitoring daemon: collectd.
  Mar 3 13:59:52 localhost systemd[1]: Started LSB: manage the statistics collection daemon.
  Mar 3 13:59:52 localhost collectd[11283]: Initialization complete, entering read-loop.

Collectd adds the hostname to the directory path, so now we should have files like these:

  -rw-r--r-- 1 root root 154888 Mar 2 16:53 /var/lib/collectd/rrd/localhost/curl_json-default/gauge-numberOfThreads15M.rrd
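To find the file for a particular metric, it helps to know how the path is put together. The following sketch reconstructs it from the listing above; the layout (data directory, hostname, plug-in with instance, type with type instance) is inferred from that single example, and rrd_path is a hypothetical helper rather than anything collectd ships.

```python
from pathlib import PurePosixPath


def rrd_path(data_dir, host, plugin, plugin_instance, type_, type_instance):
    """Build the expected RRD file path for one metric.

    Assumed layout: <data_dir>/<host>/<plugin>-<instance>/<type>-<type_instance>.rrd
    """
    plugin_dir = f"{plugin}-{plugin_instance}" if plugin_instance else plugin
    file_name = f"{type_}-{type_instance}" if type_instance else type_
    return PurePosixPath(data_dir) / host / plugin_dir / f"{file_name}.rrd"


if __name__ == "__main__":
    # Values taken from the file listing in this recipe.
    print(rrd_path(
        "/var/lib/collectd/rrd",
        "localhost",
        "curl_json",
        "default",
        "gauge",
        "numberOfThreads15M",
    ))
```

If the file you expect is missing, compare the printed path with the contents of the DataDir configured for the rrdtool plug-in.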

Now we start kcollectd to view the values in the RRD file:

Kcollectd screenshot

Since we started putting values in just now, we need to choose ‘last hour’ and zoom in a little more to inspect the values.

This dish is finished; more metrics will follow in other recipes.