Garbage Collection

Badger values need to be garbage collected for two reasons:

  • Badger keeps values separately from the LSM tree. This means that the compaction operations that clean up the LSM tree do not touch the values at all. Values need to be cleaned up separately.

  • Concurrent read/write transactions can leave behind multiple values for a single key, because they are stored with different versions. These can accumulate and keep taking up space after the older versions are no longer needed.

Badger relies on the client to perform garbage collection at a time of their choosing. It provides the following method, which can be invoked at an appropriate time:

  • DB.RunValueLogGC(): This method is designed to do garbage collection while Badger is online. Along with randomly picking a file, it uses statistics generated by the LSM-tree compactions to pick files that are likely to lead to maximum space reclamation. It is recommended that you call it during periods of low activity in your system, or periodically. One call removes at most one log file. As an optimization, you can immediately re-run it whenever it returns a nil error (indicating a successful value log GC), as shown below.

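    // Attempt value log GC every five minutes.
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()
    for range ticker.C {
    again:
        // 0.7 is the discard ratio: a file is rewritten only if at
        // least that fraction of its space can be reclaimed.
        err := db.RunValueLogGC(0.7)
        if err == nil {
            // A file was collected; try again right away.
            goto again
        }
    }
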
  • DB.PurgeOlderVersions(): This method has been DEPRECATED since v1.5.0. Badger’s LSM tree now discards older/invalid versions of keys automatically.

Note: The RunValueLogGC method does not garbage collect the latest value log.
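
The re-run pattern described above can also be written without goto and with a way to stop the loop. The following is a minimal sketch, not Badger's own code. The valueLogGCer interface, drainValueLogGC, runPeriodicGC, fakeDB, and errNothingToCollect are hypothetical names introduced for illustration. The only assumption about Badger is that *badger.DB provides RunValueLogGC(discardRatio float64) error. Because each call removes at most one file, the helper keeps calling until the first non-nil error.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// valueLogGCer is a hypothetical interface covering the single method used
// here. It is assumed that *badger.DB has a method with this signature.
type valueLogGCer interface {
	RunValueLogGC(discardRatio float64) error
}

// drainValueLogGC calls RunValueLogGC until it returns a non-nil error.
// It returns the number of successful calls and the error that ended the loop.
func drainValueLogGC(db valueLogGCer, discardRatio float64) (int, error) {
	collected := 0
	for {
		if err := db.RunValueLogGC(discardRatio); err != nil {
			return collected, err
		}
		collected++
	}
}

// runPeriodicGC drains the value log GC on every tick until ctx is cancelled.
func runPeriodicGC(ctx context.Context, db valueLogGCer, every time.Duration, discardRatio float64) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// The final error is ignored in this sketch; real code
			// would likely inspect or log it.
			_, _ = drainValueLogGC(db, discardRatio)
		}
	}
}

// errNothingToCollect is a hypothetical sentinel used only by fakeDB.
var errNothingToCollect = errors.New("nothing to collect")

// fakeDB is a stand-in used to exercise the loop without a real database.
// It succeeds a fixed number of times and then returns an error.
type fakeDB struct{ reclaimable int }

func (f *fakeDB) RunValueLogGC(discardRatio float64) error {
	if f.reclaimable == 0 {
		return errNothingToCollect
	}
	f.reclaimable--
	return nil
}
```

With a real database, a call such as go runPeriodicGC(ctx, db, 5*time.Minute, 0.7) would mirror the ticker loop shown earlier, and cancelling ctx stops it. Accepting an interface rather than a concrete type keeps the loop testable with a fake like fakeDB.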