概述

MyRocks提供了丰富的tools,如sst_dump, mysql_ldb等,这些工具对我们的运维和分析问题非常有用。

sst_dump 可以导出sst中的数据和属性信息。

  1. sst_dump --help
  2. sst_dump --file=<data_dir_OR_sst_file> [--command=check|scan|raw]
  3. --file=<data_dir_OR_sst_file>
  4. Path to SST file or directory containing SST files
  5. --command=check|scan|raw|verify
  6. check: Iterate over entries in files but dont print anything except if an error is encounterd (default command)
  7. scan: Iterate over entries in files and print them to screen
  8. raw: Dump all the table contents to <file_name>_dump.txt
  9. verify: Iterate all the blocks in files verifying checksum to detect possible coruption but dont print anything except if a corruption is encountered
  10. recompress: reports the SST file size if recompressed with different
  11. compression types
  12. --output_hex
  13. Can be combined with scan command to print the keys and values in Hex
  14. --from=<user_key>
  15. Key to start reading from when executing check|scan
  16. --to=<user_key>
  17. Key to stop reading at when executing check|scan
  18. --prefix=<user_key>
  19. Returns all keys with this prefix when executing check|scan
  20. Cannot be used in conjunction with --from
  21. --read_num=<num>
  22. Maximum number of entries to read when executing check|scan
  23. --verify_checksum
  24. Verify file checksum when executing check|scan
  25. --input_key_hex
  26. Can be combined with --from and --to to indicate that these values are encoded in Hex
  27. --show_properties
  28. Print table properties after iterating over the file when executing
  29. check|scan|raw
  30. --set_block_size=<block_size>
  31. Can be combined with --command=recompress to set the block size that will
  32. be used when trying different compression algorithms
  33. --compression_types=<comma-separated list of CompressionType members, e.g.,
  34. kSnappyCompression>
  35. Can be combined with --command=recompress to run recompression for this
  36. list of compression types
  37. --parse_internal_key=<0xKEY>
  38. Convenience option to parse an internal key on the command line. Dumps the
  39. internal key in hex format {'key' @ SN: type}

mysql_ldb 工具功能非常强大,可以解析rocksdb各类文件,sst, manifest, wal等; 可以写入和查询数据;还提供了一些维护功能,详见如下help信息

  1. bin/mysql_ldb --help
  2. ldb - RocksDB Tool
  3. commands MUST specify --db=<full_path_to_db_directory> when necessary
  4. The following optional parameters control if keys/values are input/output as hex or as plain strings:
  5. --key_hex : Keys are input/output as hex
  6. --value_hex : Values are input/output as hex
  7. --hex : Both keys and values are input/output as hex
  8. The following optional parameters control the database internals:
  9. --column_family=<string> : name of the column family to operate on. default: default column family
  10. --ttl with 'put','get','scan','dump','query','batchput' : DB supports ttl and value is internally timestamp-suffixed
  11. --try_load_options : Try to load option file from DB.
  12. --ignore_unknown_options : Ignore unknown options when loading option file.
  13. --bloom_bits=<int,e.g.:14>
  14. --fix_prefix_len=<int,e.g.:14>
  15. --compression_type=<no|snappy|zlib|bzip2|lz4|lz4hc|xpress|zstd>
  16. --compression_max_dict_bytes=<int,e.g.:16384>
  17. --block_size=<block_size_in_bytes>
  18. --auto_compaction=<true|false>
  19. --db_write_buffer_size=<int,e.g.:16777216>
  20. --write_buffer_size=<int,e.g.:4194304>
  21. --file_size=<int,e.g.:2097152>
  22. Data Access Commands:
  23. put <key> <value> [--ttl]
  24. get <key> [--ttl]
  25. batchput <key> <value> [<key> <value>] [..] [--ttl]
  26. scan [--from] [--to] [--ttl] [--timestamp] [--max_keys=<N>q] [--start_time=<N>:- is inclusive] [--end_time=<N>:- is exclusive] [--no_value]
  27. delete <key>
  28. deleterange <begin key> <end key>
  29. query [--ttl]
  30. Starts a REPL shell. Type help for list of available commands.
  31. approxsize [--from] [--to]
  32. checkconsistency
  33. Admin Commands:
  34. dump_wal --walfile=<write_ahead_log_file_path> [--header] [--print_value]
  35. compact [--from] [--to]
  36. reduce_levels --new_levels=<New number of levels> [--print_old_levels]
  37. change_compaction_style --old_compaction_style=<Old compaction style: 0 for level compaction, 1 for universal compaction> --new_compaction_style=<New compaction style: 0 for level compaction, 1 for universal compaction>
  38. dump [--from] [--to] [--ttl] [--max_keys=<N>] [--timestamp] [--count_only] [--count_delim=<char>] [--stats] [--bucket=<N>] [--start_time=<N>:- is inclusive] [--end_time=<N>:- is exclusive] [--path=<path_to_a_file>]
  39. load [--create_if_missing] [--disable_wal] [--bulk_load] [--compact]
  40. manifest_dump [--verbose] [--json] [--path=<path_to_manifest_file>]
  41. list_column_families full_path_to_db_directory
  42. dump_live_files
  43. idump [--from] [--to] [--input_key_hex] [--max_keys=<N>] [--count_only] [--count_delim=<char>] [--stats]
  44. repair
  45. backup [--backup_env_uri] [--backup_dir] [--num_threads] [--stderr_log_level=<int (InfoLogLevel)>]
  46. restore [--backup_env_uri] [--backup_dir] [--num_threads] [--stderr_log_level=<int (InfoLogLevel)>]
  47. checkpoint [--checkpoint_dir]

Note: ldb是rocksdb自带的工具,其功能和mysql_ldb一致。但mysql_ldb使用了MyRocks的comparactor, 因此推荐使用mysql_ldb

示例初始化

  1. create table t1( c1 int , c2 int, c3 varchar(10), primary key (c1), key
  2. idx1(c2)) engine=rocksdb;
  3. insert t1 values (1,101,'A');
  4. insert t1 values (2,102,'B');
  5. insert t1 values (3,103,'C');
  6. insert t1 values (4,104,'D');
  7. insert t1 values (5,105,'E');
  8. select * from t1;
  9. +----+------+------+
  10. | c1 | c2 | c3 |
  11. +----+------+------+
  12. | 1 | 101 | A |
  13. | 2 | 102 | B |
  14. | 3 | 103 | C |
  15. | 4 | 104 | D |
  16. | 5 | 105 | E |
  17. +----+------+------+
  18. SET GLOBAL rocksdb_force_flush_memtable_now=1;
  19. select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
  20. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  21. | table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
  22. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  23. | test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  24. | test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  25. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+

使用 sst_dump

查看test.t1数据分布情况

  1. select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
  2. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  3. | table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
  4. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  5. | test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  6. | test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  7. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  • 导出t1的数据
  1. bin/sst_dump --command=scan --file=data/.rocksdb/000030.sst --output_hex
  2. from [] to []
  3. Process data/.rocksdb/000030.sst
  4. Sst file format: block-based
  5. '0000010080000001' seq:13, type:1 => 00650000000141
  6. '0000010080000002' seq:16, type:1 => 00660000000142
  7. '0000010080000003' seq:19, type:1 => 00670000000143
  8. '0000010080000004' seq:22, type:1 => 00680000000144
  9. '0000010080000005' seq:25, type:1 => 00690000000145
  10. '00000101018000006580000001' seq:14, type:1 =>
  11. '00000101018000006680000002' seq:17, type:1 =>
  12. '00000101018000006780000003' seq:20, type:1 =>
  13. '00000101018000006880000004' seq:23, type:1 =>
  14. '00000101018000006980000005' seq:26, type:1 =>

前五行为主键primary数据

后五行为索引idx1的数据

MyRocks记录格式参考这里

以此行(1,101,’A’)数据为例子

主键为

  1. '0000010080000001' seq:13, type:1 => 00650000000141

key由indexid+c1+seq+type组成

  1. Primary index id: 256 ==> 100
  2. c1: 1 ==> 80000001 (符号反转)
  3. type:1 PUT

value 由c2+c3组成

  1. c2: 101 ==> 65
  2. c3: A ==> 141

二级索引idx1数据为

  1. '00000101018000006580000001' seq:14, type:1 =>

key由indexid+c2(主键)+c1(二级索引键)组成

  1. second index id: 257 ==> 101
  2. c2 : 101 ==> 80000065 主键
  3. c1: 1 ==> 80000001 (符号反转) 二级索引键

value为null

Note type值参考如下定义

  1. enum ValueType : unsigned char {
  2. kTypeDeletion = 0x0,
  3. kTypeValue = 0x1,
  4. kTypeMerge = 0x2,
  5. kTypeLogData = 0x3, // WAL only.
  6. kTypeColumnFamilyDeletion = 0x4, // WAL only.
  7. kTypeColumnFamilyValue = 0x5, // WAL only.
  8. kTypeColumnFamilyMerge = 0x6, // WAL only.
  9. kTypeSingleDeletion = 0x7,
  10. kTypeColumnFamilySingleDeletion = 0x8, // WAL only.
  11. kTypeBeginPrepareXID = 0x9, // WAL only.
  12. kTypeEndPrepareXID = 0xA, // WAL only.
  13. kTypeCommitXID = 0xB, // WAL only.
  14. kTypeRollbackXID = 0xC, // WAL only.
  15. kTypeNoop = 0xD, // WAL only.
  16. kTypeColumnFamilyRangeDeletion = 0xE, // WAL only.
  17. kTypeRangeDeletion = 0xF, // meta block
  18. kTypeColumnFamilyBlobIndex = 0x10, // Blob DB only
  19. kTypeBlobIndex = 0x11, // Blob DB only
  20. // When the prepared record is also persisted in db, we use a different
  21. // record. This is to ensure that the WAL that is generated by a WritePolicy
  22. // is not mistakenly read by another, which would result into data
  23. // inconsistency.
  24. kTypeBeginPersistedPrepareXID = 0x12, // WAL only.
  25. kMaxValue = 0x7F // Not used for storing records.
  26. };
  • 按指定范围导出数据
  1. bin/sst_dump --command=scan --file=data/.rocksdb/000030.sst --from='0x0000010080000002' --to='0x0000010080000005' --input_key_hex --output_hex
  2. from [0000010080000002] to [0000010080000005]
  3. Process data/.rocksdb/000030.sst
  4. Sst file format: block-based
  5. '0000010080000002' seq:16, type:1 => 00660000000142
  6. '0000010080000003' seq:19, type:1 => 00670000000143
  7. '0000010080000004' seq:22, type:1 => 00680000000144
  • 查看sst属性信息

command=raw可以将数据和属性信息都写到*_dump.txt文件中

  1. bin/sst_dump --command=raw --file=data/.rocksdb/000030.sst --output_hex
  2. from [] to []
  3. Process data/.rocksdb/000030.sst
  4. Sst file format: block-based
  5. raw dump written to file data/.rocksdb/000030_dump.txt

另外选项–show_properties也可以展示属性信息

mysql_ldb 使用

  • 查看sst数据
  1. bin/mysql_ldb --db=data/.rocksdb scan --hex
  2. 0x0000010080000001 : 0x00650000000141
  3. 0x0000010080000002 : 0x00660000000142
  4. 0x0000010080000003 : 0x00670000000143
  5. 0x0000010080000004 : 0x00680000000144
  6. 0x0000010080000005 : 0x00690000000145
  7. 0x00000101018000006580000001 : 0x
  8. 0x00000101018000006680000002 : 0x
  9. 0x00000101018000006780000003 : 0x
  10. 0x00000101018000006880000004 : 0x
  11. 0x00000101018000006980000005 : 0x
  • 写入数据

MyRocks在运行过程中,mysql_ldb是不能写入数据的

  1. bin/mysql_ldb --db=data/.rocksdb put --hex 0x0000010080000006 0x006A0000000146
  2. Failed: IO error: While lock file: data/.rocksdb/LOCK: Resource temporarily unavailable

shutdown myrocks实例后,再写入成功

  1. bin/mysql_ldb --db=data/.rocksdb put --hex 0x0000010080000006 0x006A0000000146
  2. ok

可以看到数据写到了新的sst(000041.sst)中

  1. select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
  2. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  3. | table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
  4. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  5. | test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  6. | test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  7. | test | t1 | PRIMARY | 256 | 000041.sst | 1 | 0 | 0 | 0 | 0 |
  8. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+

同时可以看到新的数据(6,106,’F’);

  1. select * from t1;
  2. +----+------+------+
  3. | c1 | c2 | c3 |
  4. +----+------+------+
  5. | 1 | 101 | A |
  6. | 2 | 102 | B |
  7. | 3 | 103 | C |
  8. | 4 | 104 | D |
  9. | 5 | 105 | E |
  10. | 6 | 106 | F |
  11. +----+------+------+

二级索引没有写数据,所以查不到

  1. select * from t1 where c2=106;
  2. Empty set (0.00 sec)
  3. explain select * from t1 where c2=106;
  4. +----+-------------+-------+------+---------------+------+---------+-------+------+-------+
  5. | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
  6. +----+-------------+-------+------+---------------+------+---------+-------+------+-------+
  7. | 1 | SIMPLE | t1 | ref | idx1 | idx1 | 5 | const | 4 | NULL |
  8. +----+-------------+-------+------+---------------+------+---------+-------+------+-------+

接着二级索引插入数据

  1. bin/mysql_ldb --db=data/.rocksdb put --hex 0x00000101018000006A80000006 0x
  2. OK

此时可以从二级索引查出数据了

  1. select * from t1 where c2=106;
  2. +----+------+------+
  3. | c1 | c2 | c3 |
  4. +----+------+------+
  5. | 6 | 106 | F |
  6. +----+------+------+
  7. select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
  8. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  9. | table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
  10. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  11. | test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  12. | test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
  13. | test | t1 | PRIMARY | 256 | 000041.sst | 1 | 0 | 0 | 0 | 0 |
  14. | test | t1 | idx1 | 257 | 000058.sst | 1 | 0 | 0 | 0 | 0 |
  15. +--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
  • mysql_ldb 查看MANIFEST文件

MANIFEST文件格式参考这里

  1. bin/mysql_ldb manifest_dump --path=data/.rocksdb/MANIFEST-000059
  2. --------------- Column family "default" (ID 0) --------------
  3. log number: 58
  4. comparator: RocksDB_SE_v3.10
  5. --- level 0 --- version# 2 ---
  6. 58:740['
  7. --------------- Column family "__system__" (ID 1) --------------
  8. log number: 63
  9. comparator: RocksDB_SE_v3.10
  10. --- level 0 --- version# 3 ---
  11. 64:773['
  12. next_file_number 66 last_sequence 36 prev_log_number 0 max_column_family 1
  • mysql_ldb repair

人为将MANIFEST-000059删除,rocksdb数据将无法查看

  1. bin/mysql_ldb --db=data/.rocksdb scan --hex
  2. Failed: IO error: While opening a file for sequentially reading: data/.rocksdb/MANIFEST-000059

mysql_ldb 通过 repair 选项可以恢复出MANIFEST

  1. bin/mysql_ldb repair --db=data/.rocksdb scan
  2. RocksDB version: 5.9.0
  3. Git sha rocksdb_build_git_sha:7c8c83458365f8b359cae13785d15b0bdc9df380
  4. Compile date Dec 16 2017
  5. DB SUMMARY
  6. CURRENT file: CURRENT
  7. IDENTITY file: IDENTITY
  8. SST files in data/.rocksdb dir, Total Num: 7, files: 000030.sst 000039.sst 000041.sst 000047.sst 000054.sst 000058.sst 000064.sst
  9. Write Ahead Log file in data/.rocksdb: 000063.log size: 19 ;
  10. .......... //省略部分配置信息
  11. [WARN] [/home/zhangyuan.zy/git/rds6/rocksdb/db/repair.cc:209] **** Repaired rocksdb data/.rocksdb; recovered 7 files; 5847bytes. Some data may have been lost. ****
  12. OK

repair 完成后数据可以查看

  1. bin/mysql_ldb --db=data/.rocksdb scan --hex
  2. 0x0000010080000001 : 0x00650000000141
  3. 0x0000010080000002 : 0x00660000000142
  4. 0x0000010080000003 : 0x00670000000143
  5. 0x0000010080000004 : 0x00680000000144
  6. 0x0000010080000005 : 0x00690000000145
  7. 0x0000010080000006 : 0x006A0000000146
  8. 0x00000101018000006580000001 : 0x
  9. 0x00000101018000006680000002 : 0x
  10. 0x00000101018000006780000003 : 0x
  11. 0x00000101018000006880000004 : 0x
  12. 0x00000101018000006980000005 : 0x
  13. 0x00000101018000006A80000006 : 0x

同时生成了新的MANIFEST文件

  1. bin/mysql_ldb manifest_dump --path=data/.rocksdb/MANIFEST-000003
  2. --------------- Column family "default" (ID 0) --------------
  3. log number: 0
  4. comparator: RocksDB_SE_v3.10
  5. --- level 0 --- version# 2 ---
  6. 58:740['
  7. --------------- Column family "__system__" (ID 1) --------------
  8. log number: 0
  9. comparator: RocksDB_SE_v3.10
  10. --- level 0 --- version# 3 ---
  11. 64:773['
  12. next_file_number 66 last_sequence 36 prev_log_number 0 max_column_family 1

最后

本文简单介绍了MyRocks相关工具sst_dump和mysql_ldb的基本用法,希望对大家有所帮助。