These benchmarks measure RocksDB performance when data resides on flash storage. (The benchmarks on this page were generated in June 2020 with RocksDB 6.10.0 unless otherwise noted)

Setup

All of the benchmarks are run on the same AWS instance. Here are the details of the test setup:

  • Instance type: m5d.2xlarge 8 CPU, 32 GB Memory, 1 x 300 GB NVMe SSD.
  • Kernel version: Linux 4.14.177-139.253.amzn2.x86_64
  • File System: XFS with discard enabled

To understand the performance of the SSD card, we ran an fio test and observed 117K IOPS of 4KB random reads (see the fio test results in the Appendix for the full output).

All tests were executed by running benchmark.sh with the following parameters (unless otherwise specified): NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944. Long-running tests were executed with a duration of 5400 seconds (DURATION=5400).

All other parameters used their default values unless explicitly mentioned here. Tests were executed sequentially against the same database instance. The db_bench tool was built via “make release”.
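Concretely, the environment was prepared along the following lines before running the test sequence below; the repository checkout and the DB_DIR/WAL_DIR paths are illustrative, not the exact values used for these runs.

  # Build db_bench and set the common parameters used by tools/benchmark.sh.
  # DB_DIR/WAL_DIR are illustrative paths on the NVMe SSD mount.
  git clone https://github.com/facebook/rocksdb.git && cd rocksdb
  make release

  export DB_DIR=/data/rocksdb-bench WAL_DIR=/data/rocksdb-bench-wal
  export NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944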

The following test sequence was executed:

Test 1. Bulk Load of keys in Random Order (benchmark.sh bulkload)

NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 benchmark.sh bulkload

Measure the performance of loading 900 million keys into the database. The keys are inserted in random order. The database is empty at the beginning of this benchmark run and gradually fills up. No data is read while the load is in progress.

Test 2. Random Write (benchmark.sh overwrite)

NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh overwrite

Measure the performance of randomly overwriting keys in the database. The database was first created by the previous benchmark.

Test 3. Multi-threaded read and single-threaded write (benchmark.sh readwhilewriting)

NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh readwhilewriting

Measure the performance of random reads while a single writer makes ongoing updates to existing keys. The database from Test #2 was used as the starting point.

Test 4. Random Read (benchmark.sh readrandom)

NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh readrandom

Measure random read performance of a database.

The following shows results of these tests using various releases and parameters.

Scenario 1: RocksDB 6.10, Different Block Sizes

The test cases were executed with various block sizes. The Direct I/O (DIO) test was executed with an 8K block size. In the “RL” tests, a timed rate-limited operation was placed before the reported operation. For example, between the “bulkload” and “overwrite” operations, a 30-minute rate-limited overwrite (limited to 2MB/sec) was conducted. This timed operation was meant to help guarantee that any flush or other background operation happened before the timed, reported operation, thereby creating more predictability in the percentile performance numbers.
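The block-size, DIO, and rate-limiting variations map onto db_bench flags roughly as follows. This is an illustrative sketch of the relevant flags, not the exact command lines emitted by benchmark.sh:

  # Block-size variants: set the SST data block size (8K shown; 4K and 16K are analogous).
  ./db_bench --benchmarks=overwrite --num=900000000 --threads=32 \
      --cache_size=6442450944 --duration=5400 --block_size=8192

  # DIO variant: direct I/O for reads and for flush/compaction writes, with an 8K block size.
  ./db_bench --benchmarks=overwrite --block_size=8192 \
      --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true

  # "RL" warm-up: a 30-minute overwrite capped at 2MB/sec before the reported operation,
  # so pending flushes and compactions complete before measurement starts.
  ./db_bench --benchmarks=overwrite --duration=1800 \
      --benchmark_write_rate_limit=$((2*1024*1024))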

Test Case 1 : benchmark.sh bulkload

  • 8K: Complete bulkload in 4560 seconds
  • 4K: Complete bulkload in 5215 seconds
  • 16K: Complete bulkload in 3996 seconds
  • DIO: Complete bulkload in 4547 seconds
  • 8K RL: Complete bulkload in 4388 seconds

Block | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Uptime | Stall-time | Stall% | du -s -k
8K924468370.30.2157.1157.11.0167.51.10.50.824111996000:03:45.19323.5101411592
4K853217341.80.2165.3165.31.0165.91.20.50.8241159102000:04:41.46527.6108748512
16K1027567411.60.1149.0149.01.0181.61.00.50.823102184000:02:23.60017.199070240
DIO921342369.00.2156.6156.61.0167.01.10.50.824110496000:03:27.28021.6101412440
8K RL989786396.50.2159.4159.41.0179.51.00.50.824104390900:02:41.51417.8101406496

Test Case 2 : benchmark.sh overwrite

Block | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Stall-time | Stall% | du -s -k
8K8575634.30.1161.4739.94.5142.2373.19.7274.15613256204772600:20:18.38822.9159903832
4K7985632.00.2166.0716.94.3136.3400.79.7268.95914253944729600:25:37.18328.5168094916
16K9367837.50.1174.4825.04.7156.8341.69.4279.24453247964703800:16:24.87818.3155953232
DIO8565534.30.1163.9734.94.4140.7373.69.7263.16250258074767800:18:51.14521.2159470752
8K RL8554234.30.1161.2757.84.7143.6748.1340.5735.8118523085159137540100:08:18.3599.2

Test Case 3 : benchmark.sh readwhilewriting

Block | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Stall-time | Stall% | du -s -k
8K8928528.00.14.2199.647.537.9358.4281.1427.9293575871902900:13:7.32514.6139287936
4K11675936.20.13.6203.856.638.9274.1224.4328.0253461311367800:20:58.78923.5147504716
16K6439320.40.14.1194.047.336.8496.9402.3642.734887251888000:10:58.90612.2138132068
DIO9869830.90.13.9197.450.637.6324.2257.7353.7276465831374200:16:47.97918.8139319040
8K RL10159831.90.13.297.230.318.4629.9587.5805.93922688119699540200:00:0.0540.0

Test Case 4 : benchmark.sh readrandom

Block | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | du -s -k
8K10164732.00.10.03.90.7314.8410.7498.876112473092139119060
4K13084640.70.10.01.00.1244.6291.7347.56638652626147417776
16K7088422.60.10.01.30.2451.4547.5715.0103913972598138040824
DIO14473745.50.10.10.77.0.1221.1239.8320.95788662133139239620
8K RL10579033.40.10.00.00605.0683.0807.91579313361525403139681920

Scenario 2: RocksDB 6.10, 2K Value Size, 100M Keys

The test cases were executed with the default block size and a value size of 2K. Only 100M keys were written to the database.

Complete bulkload in 2018 seconds
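For reference, the Scenario 2 parameters map onto a benchmark.sh invocation roughly as follows; VALUE_SIZE is assumed here to be passed through to db_bench's --value_size flag:

  # Hypothetical invocation for Scenario 2: 100M keys with 2KB values, default block size.
  # VALUE_SIZE is an assumed pass-through to db_bench --value_size.
  NUM_KEYS=100000000 VALUE_SIZE=2048 NUM_THREADS=32 CACHE_SIZE=6442450944 \
      tools/benchmark.sh bulkload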

Test | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Uptime | Stall-time | Stall% | du -s -k
bulkload272448537.30.185.385.31.0242.63.70.71.131105128536000:03:52.67964.657285876
overwrite2294045.20.1229.3879.43.8169.01394.9212.9350.7760326151160352532801:06:21.97774.7110458852
readwhilewriting87093154.20.15.4162.630.131.0367.4369.2491.92209630213544536000:00:1.1600.092081776
readrandom95666169.90.10.00.000334.5411.1498.774212142789535800:00:0.0000.092092164

Scenario 3: Different Versions of RocksDB

These tests were executed against different versions of RocksDB by checking out the corresponding branch and doing a “make release”.
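A minimal sketch of that process, assuming the usual upstream v<major>.<minor>.<patch> release tags:

  # Rebuild db_bench once per version under test and keep a separate binary for each.
  for v in 6.10.0 6.3.6 6.0.2; do
      git checkout "v${v}"
      make clean && make release
      cp db_bench "db_bench-${v}"   # illustrative naming
  done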

Test Case 1 : NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 benchmark.sh bulkload

  • 6.10.0: Complete bulkload in 4560 seconds
  • 6.3.6: Complete bulkload in 4584 seconds
  • 6.0.2: Complete bulkload in 4668 seconds

Version | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Uptime | Stall-time | Stall% | du -s -k
6.10.0924468370.30.2157.1157.11.0167.51.10.50.824111996000:03:45.19323.5101411592
6.3.6921714369.20.2156.7156.71.0167.11.10.50.824113396000:04:2.07025.2101437836
6.0.2933665374.00.2158.7158.71.0169.21.10.50.824110596000:03:31.62722.0101434096

Test Case 2 : NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh overwrite

Version | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Stall-time | Stall% | du -s -k
6.10.08575634.30.1161.4739.94.5142.2373.19.7274.15613256204772600:20:18.38822.9159903832
6.3.69232837.00.2174.0818.44.7155.4346.68.9263.84432245814675300:20:24.69722.7162288400
6.0.28676734.80.2164.8740.44.4141.4368.89.8294.75900.256234775500:17:6.88719.2162797372

Test Case 3 : NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh readwhilewriting

Version | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Stall-time | Stall% | du -s -k
6.10.08928528.00.14.2199.647.537.9358.4281.1427.9293575871902900:13:7.32514.6139287936
6.3.69018928.60.14.1213.151.940.6354.8288.1430.2278163571526800:13:58.83515.6141082740
6.0.29014028.30.14.1209.851.139.9355.0290.1445.1278963541595100:12:13.38413.6139700676

Test Case 4 : NUM_KEYS=900000000 NUM_THREADS=32 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh readrandom

Version | ops/sec | mb/sec | Size-GB | L0_GB | Sum_GB | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | du -s -k
6.10.010164732.00.10.03.90.7314.8410.7498.876112473092139119060
6.3.610016831.80.10.00.90.1319.5411.3499.276912482787140911608
6.0.210102331.80.10.06.001.1316.8412.5499.776312393900139423196

Appendix

fio test results

  ]$ fio --randrepeat=1 --ioengine=sync --direct=1 --gtod_reduce=1 --name=test --filename=/data/test_file --bs=4k --iodepth=64 --size=4G --readwrite=randread --numjobs=32 --group_reporting
  test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=64
  ...
  fio-2.14
  Starting 32 processes
  Jobs: 3 (f=3): [_(3),r(1),_(1),E(1),_(10),r(1),_(13),r(1),E(1)] [100.0% done] [445.3MB/0KB/0KB /s] [114K/0/0 iops] [eta 00m:00s]
  test: (groupid=0, jobs=32): err= 0: pid=28042: Fri Jul 24 01:36:19 2020
  read : io=131072MB, bw=469326KB/s, iops=117331, runt=285980msec
  cpu : usr=1.29%, sys=3.26%, ctx=33585114, majf=0, minf=297
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  issued : total=r=33554432/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
  latency : target=0, window=0, percentile=100.00%, depth=64
  Run status group 0 (all jobs):
  READ: io=131072MB, aggrb=469325KB/s, minb=469325KB/s, maxb=469325KB/s, mint=285980msec, maxt=285980msec
  Disk stats (read/write):
  nvme1n1: ios=33654742/61713, merge=0/40, ticks=8723764/89064, in_queue=8788592, util=100.00%

  ]$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/data/test_file --bs=4k --iodepth=64 --size=4G --readwrite=randread
  test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
  fio-2.14
  Starting 1 process
  Jobs: 1 (f=1): [r(1)] [100.0% done] [456.3MB/0KB/0KB /s] [117K/0/0 iops] [eta 00m:00s]
  test: (groupid=0, jobs=1): err= 0: pid=28385: Fri Jul 24 01:36:56 2020
  read : io=4096.0MB, bw=547416KB/s, iops=136854, runt= 7662msec
  cpu : usr=22.20%, sys=48.81%, ctx=144112, majf=0, minf=73
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
  submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
  issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
  latency : target=0, window=0, percentile=100.00%, depth=64
  Run status group 0 (all jobs):
  READ: io=4096.0MB, aggrb=547416KB/s, minb=547416KB/s, maxb=547416KB/s, mint=7662msec, maxt=7662msec
  Disk stats (read/write):
  nvme1n1: ios=1050868/1904, merge=0/1, ticks=374836/2900, in_queue=370532, util=98.70%

Previous Results