Background and motivations
SST integrity: RocksDB currently calculates a checksum for each block (e.g., a data block) before it is flushed to the file system and stores the checksum in the block trailer. The checksum is verified when the block is read, which ensures the correctness of that block. However, to better protect the data in RocksDB, a checksum for each whole SST file is also needed, especially when SST files are stored remotely, moved, or copied. A file might be corrupted in transit or while it sits in storage.
SST identity: If the wrong SST file is transferred to a RocksDB SST file directory, all block checksums will match, but the file does not contain the data we want. This can usually be caught by a file name or file size mismatch, because the chance that two different SST files share the same size is very small, but that may not be a good assumption to rely on. A full file checksum
The SST file checksum can be used when: 1) SST files are copied to other places (e.g., backup, move, or replicate); 2) SST files are stored remotely; 3) external SST files are ingested into RocksDB; 4) an SST file is verified while the whole file is read in the DB (e.g., during compaction).
Design
- Where to generate: the SST file checksum is generated when an SST file is created in RocksDB (1. memtable flush, 2. compaction) via WritableFileWriter.
- Flexibility
  - options.file_checksum_gen_factory lets upper-layer applications plug in a specific file checksum generator factory implementation. FileChecksumGenFactory creates one FileChecksumGenerator object per SST file, and that object generates the checksum for that file only. The object IS NOT shared, so a FileChecksumGenerator can keep intermediate state in the object while the checksum is being generated, and the implementation does not need to be thread safe.
  - A default checksum generator (FileChecksumGenCrc32c) and factory (FileChecksumGenCrc32cFactory), based on CRC32C, are provided for SST files, so users without special requirements can simply use them.
  - The checksum value is a std::string. Any other checksum value type, such as uint32, int, or uint64, can easily be converted to a string. The checksum function name is also a string.
- What should be stored
  - The checksum value itself.
  - The name of the checksum function: there are many different checksum functions, so the checksum value should be paired with its function name. Otherwise, neither RocksDB nor the application can make a meaningful checksum check.
- Where to store the checksums
  - The checksum function name and checksum value are stored in vstorage as part of FileMetaData.
  - The checksum function name and checksum value are stored in the MANIFEST for persistence.
- Tools: dump the checksums of all SST files from the MANIFEST as a map (in ldb).
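The reason for storing the function name next to the value can be illustrated with a small, self-contained sketch. The names FileChecksumInfo, ChecksumMatch, and CompareChecksums are hypothetical and are not part of RocksDB; the sketch only assumes that both the name and the value are strings, as described above.

```cpp
#include <string>

// Hypothetical record mirroring the (function name, value) pair that is
// stored for each SST file. Not a RocksDB type.
struct FileChecksumInfo {
  std::string func_name;  // e.g. "FileChecksumCrc32c"
  std::string value;      // checksum value, stored as a string
};

enum class ChecksumMatch {
  kMatch,          // same function, same value
  kMismatch,       // same function, different value: the file differs
  kNotComparable,  // different functions: the values say nothing
};

// Compares two checksum records. Values are compared only when they were
// produced by the same checksum function.
ChecksumMatch CompareChecksums(const FileChecksumInfo& expected,
                               const FileChecksumInfo& actual) {
  if (expected.func_name != actual.func_name) {
    return ChecksumMatch::kNotComparable;
  }
  return expected.value == actual.value ? ChecksumMatch::kMatch
                                        : ChecksumMatch::kMismatch;
}
```

Without the function name, the third case could not be told apart from a real mismatch, or from an accidental match between two unrelated functions.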
How to use
To enable the full file checksum, the user needs to initialize Options.file_checksum_gen_factory. For example:
Options options;
// Install the default CRC32C-based factory; reset() hands ownership to the option.
FileChecksumGenCrc32cFactory* file_checksum_gen_factory = new FileChecksumGenCrc32cFactory();
options.file_checksum_gen_factory.reset(file_checksum_gen_factory);
ImmutableCFOptions ioptions(options);
......
To implement a customized checksum generator factory, the application first needs to implement a checksum generator. For example:
class FileChecksumGenCrc32c : public FileChecksumGenerator {
 public:
  FileChecksumGenCrc32c(const FileChecksumGenContext& /*context*/) {
    checksum_ = 0;
  }

  // Called with each piece of file data as it is written.
  void Update(const char* data, size_t n) override {
    checksum_ = crc32c::Extend(checksum_, data, n);
  }

  // Called once all data has been passed to Update().
  void Finalize() override { checksum_str_ = Uint32ToString(checksum_); }

  // Returns the checksum value as a string.
  std::string GetChecksum() const override { return checksum_str_; }

  // Name of the checksum function, stored together with the value.
  const char* Name() const override { return "FileChecksumCrc32c"; }

 private:
  uint32_t checksum_;
  std::string checksum_str_;
};
The application also needs to implement the checksum generator factory. For example:
class FileChecksumGenCrc32cFactory : public FileChecksumGenFactory {
 public:
  // Creates a new, unshared generator for each file.
  std::unique_ptr<FileChecksumGenerator> CreateFileChecksumGenerator(
      const FileChecksumGenContext& context) override {
    return std::unique_ptr<FileChecksumGenerator>(
        new FileChecksumGenCrc32c(context));
  }

  const char* Name() const override { return "FileChecksumGenCrc32cFactory"; }
};
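The Update / Finalize / GetChecksum lifecycle can be tried out without RocksDB headers using the sketch below. It is only an illustration under stated assumptions: ChecksumGeneratorSketch is a local stand-in that copies just the method names from the generator shown above, Fnv1aChecksumSketch swaps CRC32C for 32-bit FNV-1a so the example has no dependencies, and ChecksumInChunks is a hypothetical helper that plays the role of the file writer feeding data piece by piece.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Local stand-in for the generator interface, so the sketch compiles on its
// own. Only the method names are taken from the example above.
class ChecksumGeneratorSketch {
 public:
  virtual ~ChecksumGeneratorSketch() = default;
  virtual void Update(const char* data, std::size_t n) = 0;
  virtual void Finalize() = 0;
  virtual std::string GetChecksum() const = 0;
  virtual const char* Name() const = 0;
};

// Illustrative generator using 32-bit FNV-1a instead of CRC32C.
class Fnv1aChecksumSketch : public ChecksumGeneratorSketch {
 public:
  void Update(const char* data, std::size_t n) override {
    for (std::size_t i = 0; i < n; ++i) {
      hash_ ^= static_cast<std::uint8_t>(data[i]);
      hash_ *= 16777619u;  // 32-bit FNV prime
    }
  }
  // Converts the running hash to its decimal string form.
  void Finalize() override { checksum_str_ = std::to_string(hash_); }
  std::string GetChecksum() const override { return checksum_str_; }
  const char* Name() const override { return "Fnv1aChecksumSketch"; }

 private:
  std::uint32_t hash_ = 2166136261u;  // 32-bit FNV offset basis
  std::string checksum_str_;
};

// Feeds `contents` to a fresh generator in pieces of at most `chunk` bytes,
// then finalizes it and returns the checksum string.
std::string ChecksumInChunks(const std::string& contents, std::size_t chunk) {
  if (chunk == 0) chunk = 1;  // avoid a zero-length step
  Fnv1aChecksumSketch gen;    // one generator per file, never shared
  for (std::size_t off = 0; off < contents.size(); off += chunk) {
    const std::size_t n = std::min(chunk, contents.size() - off);
    gen.Update(contents.data() + off, n);
  }
  gen.Finalize();
  return gen.GetChecksum();
}
```

Because the generator keeps its running state in the object, the result does not depend on how the data is split across Update() calls: ChecksumInChunks(data, 1) and ChecksumInChunks(data, 4096) return the same string. This is also why one object per file is sufficient and no locking is needed.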
When file_checksum_gen_factory is initialized (!= nullptr), RocksDB generates the checksum value when it creates the SST file.
At the current stage, we do not provide a public DB interface to list or get the checksum value and checksum function name. However, there are two ways a user can get the checksum:
- By calling db->GetLiveFilesMetaData(std::vector<LiveFileMetaData>*). The checksum value and checksum function name are included in each LiveFileMetaData. The checksum information comes from vstorage in memory.
- If the DB is not running, or if the user only has the MANIFEST file, the ldb tool can print the checksums together with the file names. It prints a list of SST files with checksum information in the following format: [file_number, checksum_function_name, checksum value]
./ldb --db=<db path> file_checksum_dump
The Next Step
We plan to work on the following:
- Take advantage of SST file checksum with backup engine.
- Work with some use cases to apply the full file checksum.
- Implement WAL file checksums and store them in the MANIFEST too.