A snapshot captures a point-in-time view of the DB at the time it’s created. Snapshots do not persist across DB restarts.

API Usage

  • Create a snapshot with the GetSnapshot() API.
  • Read from a snapshot by setting ReadOptions::snapshot.
  • When finished, release resources associated with the snapshot by calling ReleaseSnapshot().

Implementation

Flush/compaction

Representation

A snapshot is represented by a small object of SnapshotImpl class. It holds only a few primitive fields, like the seqnum at which the snapshot was taken.

Snapshots are stored in a linked list owned by DBImpl. One benefit is we can allocate the list node before acquiring the DB mutex. Then while holding the mutex, we only need to update list pointers. Additionally, ReleaseSnapshot() can be called on the snapshots in an arbitrary order. With linked list, we can remove a node from the middle without shifting.

Scalability

The main downside of linked list is it cannot be binary searched despite its ordering. During flush/compaction, we have to scan the snapshot list when we need to find out the earliest snapshot to which a key is visible. When there are many snapshots, this scan can significantly slow down flush/compaction to the point of causing write stalls. We’ve noticed problems when snapshot count is in the hundreds of thousands.