A snapshot captures a point-in-time view of the DB at the time it’s created. Snapshots do not persist across DB restarts.
- Create a snapshot with the
- Read from a snapshot by setting
- When finished, release resources associated with the snapshot by calling ReleaseSnapshot()
A snapshot is represented by a small object of the SnapshotImpl class. It holds only a few primitive fields, such as the sequence number (seqnum) at which the snapshot was taken.
Snapshots are stored in a linked list owned by DBImpl. One benefit is that we can allocate the list node before acquiring the DB mutex; then, while holding the mutex, we only need to update list pointers. Additionally, ReleaseSnapshot() can be called on snapshots in an arbitrary order. With a linked list, we can remove a node from the middle without shifting any other elements.
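The two properties above can be illustrated with a minimal sketch. This is not RocksDB's code: `SnapshotNode`, `SnapshotListSketch`, and all of their members are hypothetical names, and a plain `std::mutex` stands in for the DB mutex. It only shows the shape of the idea: allocate the node outside the lock, then do constant-time pointer updates inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a snapshot object: a sequence number plus links.
struct SnapshotNode {
  SnapshotNode() : seq(0), prev(this), next(this) {}
  uint64_t seq;
  SnapshotNode* prev;
  SnapshotNode* next;
};

// Hypothetical circular doubly linked list with a sentinel head,
// kept in ascending sequence-number order (oldest snapshot first).
class SnapshotListSketch {
 public:
  // Simulates a write by advancing the latest sequence number.
  void Write() {
    std::lock_guard<std::mutex> lock(mu_);
    ++last_seq_;
  }

  SnapshotNode* Acquire() {
    SnapshotNode* node = new SnapshotNode;  // allocated before taking the mutex
    std::lock_guard<std::mutex> lock(mu_);
    node->seq = last_seq_;
    // Append at the tail: only pointer updates happen under the mutex.
    node->prev = head_.prev;
    node->next = &head_;
    head_.prev->next = node;
    head_.prev = node;
    ++count_;
    return node;
  }

  void Release(SnapshotNode* node) {
    assert(node != nullptr && node != &head_);
    {
      std::lock_guard<std::mutex> lock(mu_);
      // Unlink from any position in O(1); no other element moves.
      node->prev->next = node->next;
      node->next->prev = node->prev;
      --count_;
    }
    delete node;  // freed after the mutex is dropped
  }

  std::size_t Count() {
    std::lock_guard<std::mutex> lock(mu_);
    return count_;
  }

  // Sequence number of the oldest live snapshot; the list must be non-empty.
  uint64_t OldestSeq() {
    std::lock_guard<std::mutex> lock(mu_);
    assert(count_ > 0);
    return head_.next->seq;
  }

 private:
  std::mutex mu_;
  SnapshotNode head_;  // sentinel, never returned to callers
  uint64_t last_seq_ = 0;
  std::size_t count_ = 0;
};
```

In this sketch, releasing a snapshot in the middle of the list touches only its two neighbors, which is the property that makes arbitrary release order cheap.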
The main downside of a linked list is that it cannot be binary searched, even though it is ordered. During flush/compaction, we have to scan the snapshot list whenever we need to find the earliest snapshot to which a key is visible. When there are many snapshots, this scan can slow down flush/compaction significantly, to the point of causing write stalls. We’ve noticed problems when the snapshot count is in the hundreds of thousands.
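To make the cost concrete, here is a small sketch of the lookup. It is an illustration rather than RocksDB's implementation. The function names and the `kNoSnapshot` sentinel are hypothetical, and it assumes a key written at sequence number `s` is visible to a snapshot taken at sequence number `t` whenever `s <= t`. The linear version mirrors a scan over an ordered linked list. The binary-search version is included only for contrast, to show what a random-access container would allow.

```cpp
#include <algorithm>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical sentinel meaning "no snapshot can see this key version".
constexpr uint64_t kNoSnapshot = UINT64_MAX;

// O(n): walk the ordered list until the first snapshot whose sequence
// number is at least the key's sequence number.
uint64_t EarliestVisibleLinear(const std::list<uint64_t>& snapshots,
                               uint64_t key_seq) {
  for (uint64_t snap_seq : snapshots) {
    if (snap_seq >= key_seq) {
      return snap_seq;
    }
  }
  return kNoSnapshot;
}

// O(log n): the same query on a sorted vector, which supports random access.
uint64_t EarliestVisibleBinary(const std::vector<uint64_t>& snapshots,
                               uint64_t key_seq) {
  auto it = std::lower_bound(snapshots.begin(), snapshots.end(), key_seq);
  return it == snapshots.end() ? kNoSnapshot : *it;
}
```

Both functions return the same answer for the same ordered input. The difference is only in how many elements are examined per key, and that difference grows with the number of live snapshots.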