On POSIX systems, we can use existing tracing tools (strace, blktrace, etc.) to understand system calls and IO requests. On storage systems where these tools cannot be used, we added a mechanism that traces IO operations, to help understand how RocksDB behaves when it accesses data on the storage.

IO Trace Format

Each IO trace record contains the following information:

Required for all records:

| Column Name | Values | Comment |
|---|---|---|
| Access timestamp in microseconds | unsigned long | |
| File Operation | string | Type of operation (Append, Read, …). |
| Latency | unsigned long | |
| IO Status | | IO status returned by the file operation. |
| File Name | string | Only the file name is recorded, not the full file path. |

Additional columns, recorded depending on the file operation:

| Column Name | Values | Comment |
|---|---|---|
| Length | unsigned long | |
| Offset | unsigned long | |
| File Size | unsigned long | |

Usage

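An example of starting IO tracing:

```cpp
#include <memory>
#include <string>
#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/trace_reader_writer.h"

using namespace ROCKSDB_NAMESPACE;

int main() {
  Env* env = Env::Default();
  EnvOptions env_options;
  std::string trace_path = "/tmp/binary_trace_test_example";
  std::unique_ptr<TraceWriter> trace_writer;
  DB* db = nullptr;
  std::string db_name = "/tmp/rocksdb";
  Options options;
  options.create_if_missing = true;
  TraceOptions trace_opt;  // default trace options
  // Create the trace file writer
  Status s = NewFileTraceWriter(env, env_options, trace_path, &trace_writer);
  if (s.ok()) s = DB::Open(options, db_name, &db);
  // Start IO tracing
  if (s.ok()) s = db->StartIOTrace(env, trace_opt, std::move(trace_writer));
  // Your calls to RocksDB APIs
  if (s.ok()) s = db->Put(WriteOptions(), "key", "value");
  // End IO tracing
  if (db != nullptr) db->EndIOTrace();
  delete db;
  return s.ok() ? 0 : 1;
}
```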

In this example, the IO tracer records every FileSystem API call made during DB::Put.

Implementation

  • Added tracing wrappers, such as FileSystemTracingWrapper (extends FileSystemWrapper) and FSRandomRWFileTracingWrapper (extends FSRandomRWFileWrapper), that call the underlying FileSystem APIs and log each call to the trace.
  • Each tracing wrapper API (for example, FileSystemTracingWrapper::Close()) does the following:
    • Calls the underlying FileSystem::Close(),
    • Creates an IOTraceRecord,
    • Calls IOTracer::WriteIOOp to write the record to the trace file.
  • Added new classes, such as FileSystemPtr, that overload the -> operator. It returns the tracing wrapper when tracing is enabled and the underlying file system pointer when it is disabled, so that untraced operations avoid the tracing overhead.
  • Details can be found in:

IO Tracer Parser

The trace file generated by IO tracing is in a binary format, so use the parser to read it:

```shell
./io_tracer_parser -io_trace_file trace_file
```

Implementation details can be found in https://github.com/facebook/rocksdb/tree/main/tools/io_tracer_parser_tool.h

Planned Work

  • Trace DB::Open
  • Include more information in the trace format