These benchmarks measure RocksDB performance when data resides on flash storage.

Setup

All of the benchmarks are run on the same AWS instance. Here are the details of the test setup:

  • Instance type: m5d.2xlarge (8 vCPUs, 32 GB memory, 1 x 300 GB NVMe SSD)
  • Kernel version: Linux 4.14.177-139.253.amzn2.x86_64
  • File System: XFS with discard enabled

To understand the performance of the SSD, we ran an fio test and observed 117K IOPS for 4KB random reads (see "fio test results" in the Appendix for the full output).
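As a quick sanity check on the fio figure, IOPS multiplied by the block size should land close to the bandwidth that fio reports. A minimal sketch in shell arithmetic; the helper name iops_to_kbps is illustrative and not part of fio or RocksDB.

```shell
# Convert an IOPS figure at a given block size (in KB) to bandwidth in KB/s.
# iops_to_kbps is a hypothetical helper name used only for this example.
iops_to_kbps() {
  local iops=$1 block_kb=$2
  echo $(( iops * block_kb ))
}

# 117331 IOPS at 4 KB per read; fio reported bw=469326KB/s for the same run.
iops_to_kbps 117331 4
```

The computed value (469324 KB/s) differs from the reported bandwidth only by rounding in the IOPS figure.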

All tests were run by executing benchmark.sh with the following parameters (unless otherwise specified):

  • NUM_KEYS=900000000
  • CACHE_SIZE=6442450944
  • For long-running tests, the tests were executed with a duration of 5400 seconds (DURATION=5400)
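The raw parameter values are easier to read in larger units. A small sketch in shell arithmetic, with hypothetical helper names, that converts the cache size to GiB and the duration to minutes:

```shell
# bytes_to_gib and seconds_to_minutes are illustrative helper names.
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

seconds_to_minutes() {
  echo $(( $1 / 60 ))
}

bytes_to_gib 6442450944    # CACHE_SIZE expressed in GiB
seconds_to_minutes 5400    # DURATION expressed in minutes
```

With the values above, CACHE_SIZE works out to 6 GiB and DURATION to 90 minutes.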

DIO tests were executed with the options --use_direct_io_for_flush_and_compaction --use_direct_reads.

All other parameters used their default values unless explicitly mentioned here. Tests were executed sequentially against the same database instance. The db_bench tool was built with make release.
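The run sequence can be sketched as a small wrapper script. This is an illustration under assumptions, not the script used to produce these results: the BENCH path, the DRY_RUN switch, and the function names are hypothetical, and any other environment that benchmark.sh needs (for example, where the database lives) is not given here and is left out. By default the sketch only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the test sequence. Prints the commands by default (DRY_RUN=1);
# set DRY_RUN=0 to actually run them.
# BENCH is an assumed path to benchmark.sh; adjust it for your checkout.
BENCH=${BENCH:-./benchmark.sh}
DRY_RUN=${DRY_RUN:-1}

# run_test NAME [VAR=value ...]: invoke benchmark.sh NAME with the common
# parameters plus any per-test settings.
run_test() {
  local name=$1
  shift
  set -- env NUM_KEYS=900000000 CACHE_SIZE=6442450944 "$@" "$BENCH" "$name"
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_sequence() {
  local t
  run_test bulkload
  # The multireadrandom run in this document also passes a multiread_batched
  # flag on the command line; it is omitted here for brevity.
  for t in readrandom multireadrandom fwdrange revrange overwrite; do
    run_test "$t" DURATION=5400
  done
  for t in readwhilewriting fwdrangewhilewriting revrangewhilewriting; do
    run_test "$t" DURATION=5400 MB_WRITE_PER_SEC=2
  done
}

run_sequence
```

Keeping the common parameters in one function makes it harder for a single run to drift from the shared NUM_KEYS and CACHE_SIZE settings.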

The following tests were executed in sequence:

Test 1. Bulk Load of keys in Random Order (benchmark.sh bulkload)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 benchmark.sh bulkload

Measure the performance of loading 900 million keys into the database. The keys are inserted in random order. The database is empty at the beginning of this benchmark run and gradually fills up. No data is read while the load is in progress.

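Version | Opts | Time | ops/sec | mb/sec | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Stall-time | Stall% | du -s -k
7.2.2 | None | 4021 | 1003732 | 402.0 | 1.0 | 0.5 | 0.8 | 2 | 7 | 22 | 00:00:52.558 | 6.3 | 101406408
7.2.2 | DIO | 3976 | 1021386 | 409.1 | 1.0 | 0.5 | 0.8 | 2 | 3 | 32 | 00:00:41.215 | 4.9 | 101404476
7.1.1 | None | 3951 | 1028135 | 411.8 | 1.0 | 0.5 | 0.8 | 2 | 3 | 21 | 00:00:42.580 | 5.1 | 101407124
7.1.1 | DIO | 3920 | 1046129 | 419.0 | 1.0 | 0.5 | 0.8 | 2 | 3 | 20 | 00:00:33.023 | 3.9 | 101407876
7.0.3 | None | 3934 | 1040089 | 416.6 | 1.0 | 0.5 | 0.8 | 2 | 3 | 22 | 00:01:02.307 | 7.4 | 101406288
7.0.3 | DIO | 3879 | 1060242 | 424.7 | 0.9 | 0.5 | 0.8 | 2 | 3 | 21 | 00:00:50.523 | 6.0 | 101405820
6.29.1 | None | 3898 | 1045486 | 418.8 | 1.0 | 0.5 | 0.8 | 2 | 3 | 55 | 00:01:17.876 | 9.3 | 101405948
6.29.1 | DIO | 3819 | 1065706 | 426.9 | 0.9 | 0.5 | 0.8 | 2 | 3 | 25 | 00:01:09.405 | 8.3 | 101404236
6.29.0 | None | 3899 | 1047693 | 419.6 | 1.0 | 0.5 | 0.8 | 2 | 3 | 108 | 00:01:25.637 | 10.2 | 101407032
6.29.0 | DIO | 3828 | 1061703 | 425.3 | 0.9 | 0.5 | 0.8 | 2 | 3 | 21 | 00:00:56.298 | 6.7 | 101405356
6.28.0 | None | 3924 | 1050028 | 420.6 | 1.0 | 0.5 | 0.8 | 2 | 3 | 60 | 00:01:17.288 | 9.2 | 101406260
6.28.0 | DIO | 3819 | 1072892 | 429.7 | 0.9 | 0.5 | 0.8 | 2 | 3 | 29 | 00:01:01.648 | 7.9 | 101405916
6.27.0 | None | 3898 | 1052489 | 421.6 | 0.9 | 0.5 | 0.8 | 2 | 3 | 22 | 00:01:07.776 | 8.1 | 101406796
6.27.0 | DIO | 3826 | 1066941 | 427.4 | 0.9 | 0.5 | 0.8 | 2 | 3 | 21 | 00:00:58.306 | 6.9 | 101405580
6.26.0 | None | 3892 | 1043630 | 418.0 | 1.0 | 0.5 | 0.8 | 2 | 3 | 54 | 00:01:17.288 | 9.2 | 101407528
6.26.0 | DIO | 3899 | 1060561 | 424.8 | 0.9 | 0.5 | 0.8 | 2 | 3 | 22 | 00:01:04.536 | 7.7 | 101402764
6.25.0 | None | 3989 | 1032155 | 413.4 | 1.0 | 0.5 | 0.8 | 2 | 3 | 102 | 00:01:23.783 | 10.0 | 101407140
6.25.0 | DIO | 3899 | 1048824 | 420.1 | 1.0 | 0.5 | 0.8 | 2 | 3 | 22 | 00:01:04.747 | 7.7 | 101402764
6.24.0 | None | 3983 | 1025562 | 410.8 | 1.0 | 0.5 | 0.8 | 2 | 3 | 32 | 00:01:12.296 | 8.6 | 101406524
6.24.0 | DIO | 3880 | 1052049 | 421.4 | 1.0 | 0.5 | 0.8 | 2 | 3 | 22 | 00:01:05.862 | 7.8 | 101405064
6.23.0 | None | 4175 | 1015722 | 406.8 | 1.0 | 0.5 | 0.8 | 2 | 3 | 69 | 00:01:17.541 | 9.2 | 101405292
6.23.0 | DIO | 3885 | 1055232 | 422.7 | 0.9 | 0.5 | 0.8 | 2 | 3 | 21 | 00:00:52.360 | 6.2 | 101402116
6.22.1 | None | 4143 | 1013002 | 405.8 | 1.0 | 0.5 | 0.8 | 2 | 3 | 224 | 00:01:26.032 | 10.2 | 101405804
6.22.1 | DIO | 4058 | 1031703 | 413.2 | 1.0 | 0.5 | 0.8 | 2 | 3 | 125 | 00:01:23.019 | 9.9 | 101403424
6.21.2 | None | 4141 | 1017259 | 407.5 | 0.9 | 0.5 | 0.8 | 2 | 3 | 556 | 00:01:32.279 | 11.0 | 101406320
6.15.5 | None | 4068 | 1045195 | 418.6 | 1.0 | 0.5 | 0.8 | 1 | 3 | 980 | 00:02:08.223 | 15.3 | 101401808
6.10.4 | None | 4002 | 1062310 | 425.5 | 0.9 | 0.5 | 0.8 | 1 | 3 | 1013 | 00:02:24.652 | 17.2 | 101402936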
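As a rough reading of the du -s -k column, dividing the on-disk size by the number of keys gives the average footprint per key after the load. A minimal sketch in shell arithmetic, assuming du -k reports 1024-byte units; the helper name bytes_per_key is illustrative only.

```shell
# bytes_per_key DU_KB NUM_KEYS: average on-disk bytes per key (integer division).
# bytes_per_key is a hypothetical helper name used only for this example.
bytes_per_key() {
  local du_kb=$1 num_keys=$2
  echo $(( du_kb * 1024 / num_keys ))
}

# du -s -k value for 7.2.2 (no DIO) and NUM_KEYS from the setup section.
bytes_per_key 101406408 900000000
```

For the 7.2.2 row this comes to about 115 bytes per key on disk.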

Test 2. Random Read (benchmark.sh readrandom)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh readrandom

Measure performance to randomly read existing keys. The database after bulkload was used as the starting point.

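Version | Opts | ops/sec | mb/sec | usec/op | p50 | p75 | p99 | p99.9 | p99.99
7.2.2 | None | 136915 | 34.7 | 467.4 | 615.5 | 772.8 | 1270 | 1801 | 2840
7.2.2 | DIO | 189236 | 47.9 | 338.2 | 419.6 | 539.1 | 1022 | 1693 | 2297
7.1.1 | None | 145490 | 36.8 | 439.9 | 599.7 | 753.7 | 1252 | 1809 | 2813
7.1.1 | DIO | 189242 | 47.9 | 338.2 | 419.0 | 539.1 | 1037 | 1696 | 2294
7.0.3 | None | 145540 | 36.8 | 439.7 | 599.8 | 753.3 | 1251 | 1803 | 2803
7.0.3 | DIO | 189243 | 47.9 | 338.2 | 419.2 | 539.2 | 1029 | 1691 | 2246
6.29.1 | None | 145577 | 36.9 | 439.6 | 606.3 | 751.0 | 1204 | 1292 | 2091
6.29.1 | DIO | 189243 | 47.9 | 338.2 | 430.0 | 540.9 | 854 | 969 | 1291
6.29.0 | None | 145590 | 36.9 | 439.6 | 606.2 | 751.0 | 1204 | 1292 | 1936
6.29.0 | DIO | 189241 | 47.9 | 338.2 | 430.0 | 540.8 | 854 | 932 | 1289
6.28.0 | None | 146980 | 37.2 | 435.4 | 604.3 | 748.9 | 1195 | 1291 | 1984
6.28.0 | DIO | 189232 | 47.9 | 338.2 | 430.0 | 540.9 | 854 | 991 | 1293
6.27.0 | None | 146921 | 37.2 | 435.6 | 604.4 | 748.8 | 1194 | 1291 | 1980
6.27.0 | DIO | 189250 | 47.9 | 338.2 | 430.1 | 540.8 | 854 | 902 | 1287
6.26.0 | None | 128341 | 32.5 | 498.7 | 639.6 | 805.7 | 1272 | 1298 | 2156
6.26.0 | DIO | 189244 | 47.9 | 338.2 | 430.1 | 540.8 | 854 | 894 | 1287
6.25.0 | None | 128517 | 32.5 | 498.0 | 639.0 | 804.6 | 1272 | 1298 | 2220
6.25.0 | DIO | 189245 | 47.9 | 338.2 | 430.1 | 540.8 | 854 | 897 | 1289
6.24.0 | None | 130852 | 33.1 | 489.1 | 632.6 | 791.4 | 1266 | 1297 | 2152
6.24.0 | DIO | 189240 | 47.9 | 338.2 | 430.0 | 540.7 | 854 | 930 | 1292
6.23.0 | None | 137664 | 34.9 | 464.9 | 618.4 | 766.5 | 1244 | 1295 | 2557
6.23.0 | DIO | 189252 | 47.9 | 338.2 | 430.0 | 540.7 | 854 | 926 | 1296
6.22.1 | None | 138623 | 35.1 | 461.7 | 616.8 | 763.9 | 1239 | 1295 | 2663
6.22.1 | DIO | 189237 | 47.9 | 338.2 | 430.0 | 540.7 | 854 | 960 | 1291
6.21.2 | None | 138633 | 35.1 | 461.6 | 616.8 | 764.1 | 1240 | 1295 | 2461
6.15.5 | None | 138513 | 35.1 | 462.0 | 616.9 | 764.2 | 1240 | 1295 | 3083
6.10.4 | None | 138496 | 35.1 | 462.1 | 617.1 | 764.3 | 1240 | 1295 | 2484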

Test 3. Multi-Random Read (benchmark.sh multireadrandom)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh multireadrandom --multiread_batched

Measure performance to randomly multi-get existing keys. The database after bulkload was used as the starting point.

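Version | Opts | ops/sec | usec/op | p50 | p75 | p99 | p99.9 | p99.99
7.2.2 | None | 136928 | - | 4657.7 | 5774.9 | 9416 | 9873 | 18001
7.2.2 | DIO | 189216 | - | 3415.8 | 4064.2 | 6422 | 6586 | 8602
7.1.1 | None | 145548 | - | 4387.8 | 5568.6 | 8899 | 9831 | 16943
7.1.1 | DIO | 189213 | - | 3413.7 | 4064.2 | 6422 | 6586 | 8630
7.0.3 | None | 145587 | - | 4386.7 | 5567.1 | 8886 | 9829 | 16789
7.0.3 | DIO | 189230 | - | 3413.5 | 4063.8 | 6422 | 6586 | 8590
6.29.1 | None | 145652 | - | 4376.9 | 5549.2 | 8702 | 9813 | 15243
6.29.1 | DIO | 189233 | - | 3410.8 | 4048.6 | 6406 | 6583 | 7498
6.29.0 | None | 145660 | - | 4376.7 | 5549.1 | 8701 | 9811 | 15305
6.29.0 | DIO | 189231 | - | 3410.3 | 4048.0 | 6406 | 6583 | 7310
6.28.0 | None | 147022 | - | 4345.6 | 5523.9 | 8584 | 9804 | 14594
6.28.0 | DIO | 189228 | - | 3410.5 | 4048.4 | 6406 | 6583 | 7679
6.27.0 | None | 146989 | - | 4346.2 | 5524.0 | 8579 | 9806 | 15833
6.27.0 | DIO | 189227 | - | 3409.6 | 4047.2 | 6405 | 6583 | 7332
6.26.0 | None | 128366 | - | 4933.4 | 6093.6 | 9672 | 9884 | 13845
6.26.0 | DIO | 189229 | - | 3409.0 | 4046.8 | 6405 | 6583 | 7282
6.25.0 | None | 128523 | - | 4927.2 | 6087.8 | 9670 | 9883 | 13727
6.25.0 | DIO | 189241 | - | 3408.7 | 4046.6 | 6404 | 6583 | 7525
6.24.0 | None | 130859 | - | 4836.9 | 5995.1 | 9630 | 9880 | 14169
6.24.0 | DIO | 189234 | - | 3409.0 | 4047.2 | 6406 | 6584 | 7996
6.23.0 | None | 137638 | - | 4607.1 | 5736.4 | 9360 | 9869 | 17172
6.23.0 | DIO | 189237 | - | 3409.0 | 4047.0 | 6406 | 6584 | 8125
6.22.1 | None | 138660 | 461.6 | 4576.1 | 5706.3 | 9294 | 9867 | -
6.22.1 | DIO | 189235 | 338.2 | 3410.7 | 4047.7 | 6406 | 6583 | -
6.21.2 | None | 138623 | 461.7 | 4577.4 | 5707.2 | 9294 | 9866 | -
6.15.5 | None | 138507 | 462.0 | 4582.3 | 5710.2 | 9299 | 9867 | -
6.10.4 | None | 138476 | 462.1 | 4583.0 | 5710.9 | 9298 | 9864 | -

Rows for 6.22.1 and earlier appear to report usec/op in place of p99.99; "-" marks a value that is not available.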

Test 4. Range Scan (benchmark.sh fwdrange)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh fwdrange

Measure performance to randomly iterate over keys. The database after bulkload was used as the starting point.

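Version | Opts | ops/sec | mb/sec | usec/op | p50 | p75 | p99 | p99.9 | p99.99
7.2.2 | None | 70097 | 280.8 | 913.0 | 791.9 | 1435.5 | 1892 | 2811 | 10210
7.2.2 | DIO | 78828 | 315.7 | 811.9 | 836.9 | 1093.2 | 1771 | 2601 | 2894
7.1.1 | None | 74491 | 298.4 | 859.1 | 775.3 | 1380.6 | 1889 | 1899 | 8592
7.1.1 | DIO | 78831 | 315.8 | 811.8 | 836.7 | 1093.1 | 1771 | 2598 | 2982
7.0.3 | None | 74510 | 298.4 | 858.9 | 775.2 | 1380.9 | 1889 | 2786 | 8384
7.0.3 | DIO | 78832 | 315.8 | 811.8 | 836.8 | 1093.1 | 1771 | 2603 | 2895
6.29.1 | None | 74530 | 298.5 | 858.7 | 775.8 | 1392.9 | 1881 | 1899 | 7807
6.29.1 | DIO | 78830 | 315.7 | 811.8 | 870.2 | 1090.8 | 1434 | 1862 | 2668
6.29.0 | None | 74535 | 298.5 | 858.6 | 775.7 | 1393.3 | 1881 | 1899 | 7553
6.29.0 | DIO | 78832 | 315.8 | 811.8 | 870.3 | 1090.3 | 1388 | 1858 | 2620
6.28.0 | None | 75231 | 301.3 | 850.7 | 773.7 | 1381.2 | 1880 | 1899 | 8224
6.28.0 | DIO | 78828 | 315.7 | 811.9 | 870.2 | 1090.8 | 1438 | 1862 | 2655
6.27.0 | None | 75246 | 301.4 | 850.5 | 773.2 | 1384.2 | 1880 | 1899 | 8360
6.27.0 | DIO | 78829 | 315.7 | 811.9 | 870.5 | 1090.2 | 1373 | 1855 | 2513
6.26.0 | None | 65717 | 263.2 | 973.8 | 808.2 | 1492.5 | 1884 | 1899 | 7217
6.26.0 | DIO | 78831 | 315.8 | 811.8 | 870.5 | 1090.2 | 1370 | 1855 | 2512
6.25.0 | None | 65813 | 263.6 | 972.4 | 807.7 | 1491.5 | 1884 | 1899 | 7338
6.25.0 | DIO | 78833 | 315.8 | 811.8 | 870.4 | 1090.2 | 1369 | 1856 | 2610
6.24.0 | None | 67004 | 268.4 | 955.1 | 802.8 | 1480.1 | 1884 | 1899 | 7216
6.24.0 | DIO | 78832 | 315.8 | 811.8 | 870.4 | 1090.2 | 1376 | 1856 | 2582
6.23.0 | None | 70459 | 282.2 | 908.3 | 789.6 | 1443.0 | 1883 | 1899 | 10273
6.23.0 | DIO | 78829 | 315.7 | 811.9 | 870.5 | 1090.1 | 1356 | 1855 | 2596
6.22.1 | None | 70971 | 284.3 | 901.7 | 787.8 | 1437.2 | 1882 | 1899 | 10274
6.22.1 | DIO | 78829 | 315.7 | 811.9 | 870.3 | 1090.5 | 1411 | 1859 | 2618
6.21.2 | None | 70967 | 284.3 | 901.8 | 787.8 | 1437.3 | 1882 | 1899 | 10253
6.15.5 | None | 70978 | 284.3 | 901.7 | 787.7 | 1437.5 | 1882 | 1899 | 9890
6.10.4 | None | 70973 | 284.3 | 901.7 | 787.6 | 1438.0 | 1882 | 1899 | 9945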

Test 4b. Reverse Range Scan (benchmark.sh revrange)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh revrange

Measure performance to randomly iterate over keys in reverse order. The database after bulkload was used as the starting point.

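Version | Opts | ops/sec | mb/sec | usec/op | p50 | p75 | p99 | p99.9 | p99.99
7.2.2 | None | 68785 | 275.5 | 930.4 | 806.0 | 1467.2 | 1892 | 2859 | 12052
7.2.2 | DIO | 76200 | 305.2 | 839.9 | 897.5 | 1114.2 | 1776 | 2617 | 2898
7.1.1 | None | 73116 | 292.9 | 875.3 | 788.1 | 1399.9 | 1889 | 2853 | 13338
7.1.1 | DIO | 76202 | 305.2 | 839.8 | 897.1 | 1114.1 | 1778 | 2631 | 3022
7.0.3 | None | 73149 | 293.0 | 874.9 | 788.0 | 1399.4 | 1889 | 2853 | 13524
7.0.3 | DIO | 76202 | 305.2 | 839.8 | 897.4 | 1114.0 | 1776 | 2632 | 3173
6.29.1 | None | 73167 | 293.1 | 874.7 | 788.9 | 1406.9 | 1882 | 1900 | 12818
6.29.1 | DIO | 76204 | 305.2 | 839.8 | 910.2 | 1112.1 | 1562 | 1874 | 2764
6.29.0 | None | 73170 | 293.1 | 874.6 | 788.6 | 1409.1 | 1882 | 1899 | 12688
6.29.0 | DIO | 76202 | 305.2 | 839.8 | 910.2 | 1111.5 | 1524 | 1870 | 2722
6.28.0 | None | 73839 | 295.8 | 866.7 | 786.5 | 1391.8 | 1881 | 1900 | 13492
6.28.0 | DIO | 76205 | 305.2 | 839.8 | 910.0 | 1112.1 | 1560 | 1873 | 2715
6.27.0 | None | 73861 | 295.8 | 866.5 | 786.0 | 1396.4 | 1881 | 1899 | 13488
6.27.0 | DIO | 76204 | 305.2 | 839.8 | 910.3 | 1111.3 | 1510 | 1869 | 2718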

Test 5. Overwrite (benchmark.sh overwrite)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 benchmark.sh overwrite

Measure performance to randomly overwrite keys in the database. The database was first created by the previous benchmark.

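Version | Opts | ops/sec | mb/sec | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | Stall-time | Stall% | du -s -k
7.2.2 | None | 86617 | 34.7 | 9.5 | 149.7 | 738.9 | 449.7 | 777.6 | 10479 | 30005 | 58328 | 00:04:43.188 | 5.3 | 158540048
7.2.2 | DIO | 86839 | 34.8 | 9.4 | 154.6 | 737.0 | 460.7 | 775.4 | 9534 | 29149 | 54278 | 00:03:00.102 | 3.4 | 159135832
7.1.1 | None | 90203 | 36.1 | 9.4 | 155.4 | 709.5 | 418.5 | 746.0 | 10469 | 30015 | 58380 | 00:04:45.549 | 5.3 | 160992944
7.1.1 | DIO | 88590 | 35.5 | 9.6 | 154.6 | 722.4 | 440.2 | 754.0 | 9538 | 29313 | 55494 | 00:04:15.453 | 4.8 | 158372164
7.0.3 | None | 90985 | 36.4 | 9.4 | 155.8 | 703.4 | 418.2 | 743.9 | 10156 | 29987 | 57788 | 00:04:48.049 | 5.3 | 161110716
7.0.3 | DIO | 89686 | 35.9 | 9.5 | 154.1 | 713.6 | 439.3 | 752.3 | 9377 | 20921 | 53505 | 00:03:28.796 | 3.9 | 160356720
6.29.1 | None | 90711 | 36.3 | 9.4 | 155.1 | 705.5 | 418.2 | 740.0 | 10213 | 29779 | 57100 | 00:05:28.848 | 6.1 | 161099792
6.29.1 | DIO | 89469 | 35.8 | 9.5 | 154.4 | 715.3 | 431.6 | 748.0 | 9568 | 29143 | 54324 | 00:04:06.106 | 4.6 | 159661172
6.29.0 | None | 89373 | 35.8 | 9.6 | 155.3 | 716.1 | 434.1 | 756.1 | 10447 | 29891 | 57952 | 00:04:21.622 | 4.9 | 158912856
6.29.0 | DIO | 88517 | 35.5 | 9.5 | 152.6 | 723.0 | 455.1 | 759.1 | 9219 | 28694 | 51750 | 00:03:31.235 | 3.9 | 160258772
6.28.0 | None | 89791 | 36.0 | 9.4 | 153.7 | 712.4 | 430.9 | 751.9 | 10292 | 29839 | 58737 | 00:04:15.276 | 4.8 | 161859856
6.28.0 | DIO | 88108 | 35.3 | 9.5 | 152.4 | 726.4 | 449.4 | 763.5 | 9508 | 28917 | 54122 | 00:03:25.719 | 3.8 | 159865440
6.27.0 | None | 89815 | 36.0 | 9.5 | 154.1 | 712.6 | 427.7 | 749.4 | 10533 | 29660 | 57615 | 00:04:30.399 | 5.0 | 160273772
6.27.0 | DIO | 88440 | 35.4 | 9.4 | 151.6 | 723.6 | 455.0 | 761.2 | 9383 | 28764 | 52844 | 00:03:20.977 | 3.7 | 159572484
6.26.0 | None | 90340 | 36.2 | 9.4 | 153.6 | 708.4 | 430.2 | 742.5 | 10198 | 29692 | 55635 | 00:04:54.193 | 5.5 | 161202432
6.26.0 | DIO | 88401 | 35.4 | 9.6 | 154.5 | 724.0 | 446.0 | 754.6 | 9418 | 28911 | 52526 | 00:03:50.428 | 4.3 | 158469672
6.25.0 | None | 89567 | 35.9 | 9.4 | 155.2 | 714.5 | 419.4 | 742.7 | 10327 | 29952 | 59957 | 00:05:52.335 | 6.5 | 160392244
6.25.0 | DIO | 88549 | 35.5 | 9.5 | 153.6 | 722.7 | 433.9 | 743.6 | 9483 | 29064 | 54109 | 00:05:00.728 | 5.6 | 158500488
6.24.0 | None | 90829 | 36.4 | 4.7 | 155.2 | 704.6 | 397.1 | 726.4 | 10359 | 29968 | 58160 | 00:07:01.849 | 7.9 | 160757048
6.24.0 | DIO | 90105 | 36.1 | 4.8 | 153.7 | 710.3 | 421.8 | 736.9 | 9344 | 28869 | 52676 | 00:05:22.128 | 6.0 | 160833572
6.23.0 | None | 89052 | 35.7 | 4.7 | 151.3 | 718.7 | 442.5 | 758.8 | 10263 | 29763 | 53874 | 00:04:40.429 | 5.2 | 160633196
6.23.0 | DIO | 88624 | 35.5 | 4.9 | 152.4 | 722.1 | 441.5 | 749.0 | 9319 | 28887 | 53792 | 00:04:53.783 | 5.5 | 158994508
6.22.1 | None | 91586 | 36.7 | 4.7 | 155.0 | 698.8 | 380.5 | 709.4 | 10140 | 29887 | 58244 | 00:08:29.153 | 9.5 | 161321740
6.22.1 | DIO | 90310 | 36.2 | 4.8 | 154.7 | 708.7 | 419.0 | 730.1 | 9227 | 28816 | 55513 | 00:06:22.790 | 7.1 | 160400436
6.21.2 | None | 91776 | 36.8 | 4.7 | 155.6 | 697.3 | 379.9 | 708.7 | 10055 | 29782 | 55942 | 00:08:24.882 | 9.4 | 162082088
6.15.5 | None | 92911 | 37.2 | 4.7 | 158.4 | 688.8 | 351.9 | 697.7 | 10031 | 29894 | 58333 | 00:08:43.334 | 9.7 | 161156844
6.10.4 | None | 94539 | 37.9 | 4.7 | 161.9 | 676.9 | 328.4 | 700.4 | 10022 | 29843 | 56548 | 00:07:11.226 | 8.0 | 162965216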

Test 6. Multi-threaded read and single-threaded write (benchmark.sh readwhilewriting)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 MB_WRITE_PER_SEC=2 benchmark.sh readwhilewriting

Measure performance with one writer and multiple reader threads. The writes are rate limited.

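Version | Opts | ops/sec | mb/sec | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | du -s -k
7.2.2 | None | 98240 | 31.1 | 18.1 | 11.4 | 651.4 | 600.6 | 829.8 | 3963 | 6041 | 10139 | 140646588
7.2.2 | DIO | 143283 | 45.3 | 17.1 | 7.3 | 446.7 | 394.8 | 539.8 | 2820 | 4315 | 6393 | 140470436
7.1.1 | None | 102056 | 32.5 | 16.9 | 10.6 | 627.1 | 584.8 | 803.2 | 3931 | 6031 | 9844 | 141627716
7.1.1 | DIO | 142958 | 45.3 | 17.9 | 7.6 | 447.7 | 395.6 | 540.1 | 2819 | 4316 | 6405 | 140849884
7.0.3 | None | 101948 | 32.5 | 17.0 | 10.7 | 627.7 | 585.6 | 803.6 | 3931 | 6028 | 9824 | 141767112
7.0.3 | DIO | 142923 | 45.4 | 18.2 | 7.8 | 447.8 | 393.0 | 539.4 | 2825 | 4322 | 6414 | 141164436
6.29.1 | None | 100445 | 31.8 | 28.3 | 18.2 | 637.1 | 593.0 | 810.3 | 3906 | 6524 | 18544 | 140795968
6.29.1 | DIO | 141799 | 44.8 | 31.5 | 14.2 | 451.3 | 397.1 | 541.4 | 2792 | 4744 | 9095 | 140017864
6.29.0 | None | 100853 | 31.9 | 27.7 | 17.7 | 634.6 | 592.9 | 810.8 | 3893 | 6480 | 17827 | 140416272
6.29.0 | DIO | 141947 | 44.8 | 32.6 | 14.4 | 450.9 | 397.4 | 542.1 | 2786 | 4791 | 9273 | 139972676
6.28.0 | None | 101233 | 32.0 | 28.3 | 18.3 | 632.2 | 591.6 | 807.0 | 3892 | 6530 | 19073 | 140616192
6.28.0 | DIO | 141854 | 44.7 | 35.0 | 15.2 | 451.2 | 394.1 | 541.4 | 2796 | 5000 | 9696 | 139803484
6.27.0 | None | 101375 | 32.1 | 27.7 | 17.8 | 631.3 | 587.9 | 805.0 | 3893 | 6477 | 18216 | 140673616
6.27.0 | DIO | 142460 | 44.9 | 31.2 | 14.1 | 449.2 | 394.7 | 539.6 | 2789 | 4685 | 9127 | 139867840
6.26.0 | None | 91879 | 29.1 | 27.7 | 19.0 | 696.5 | 630.8 | 904.6 | 4010 | 6424 | 13872 | 140615968
6.26.0 | DIO | 142148 | 44.8 | 31.8 | 14.4 | 450.2 | 394.7 | 540.0 | 2793 | 4697 | 8939 | 139826380
6.25.0 | None | 91736 | 29.0 | 28.7 | 20.0 | 697.6 | 630.8 | 906.2 | 4019 | 6418 | 13775 | 140615968
6.25.0 | DIO | 141618 | 44.7 | 33.0 | 14.9 | 451.9 | 394.4 | 540.8 | 2800 | 4825 | 9113 | 140031428
6.24.0 | None | 92974 | 29.5 | 27.6 | 19.0 | 688.3 | 624.8 | 869.7 | 4010 | 6436 | 14558 | 140384360
6.24.0 | DIO | 141491 | 44.7 | 32.7 | 15.0 | 452.3 | 395.8 | 540.8 | 2802 | 4867 | 9311 | 140255568
6.23.0 | None | 96811 | 30.6 | 29.1 | 18.9 | 661.1 | 607.3 | 835.3 | 3966 | 6433 | 13513 | 140384360
6.23.0 | DIO | 142410 | 44.9 | 29.6 | 13.5 | 449.4 | 394.0 | 539.3 | 2789 | 4598 | 8989 | 139961824
6.22.1 | None | 96812 | 30.7 | 28.4 | 18.5 | 661.1 | 606.5 | 832.8 | 3958 | 6500 | 15777 | 140972560
6.22.1 | DIO | 140635 | 44.5 | 32.5 | 14.9 | 455.1 | 400.4 | 543.4 | 2804 | 5051 | 9348 | 140465744
6.21.2 | None | 96891 | 30.7 | 29.1 | 18.9 | 660.5 | 607.1 | 833.4 | 3961 | 6465 | 13669 | 141208940
6.15.5 | None | 96223 | 30.6 | 28.0 | 18.7 | 665.1 | 609.4 | 835.1 | 3965 | 6475 | 15613 | 141339712
6.10.4 | None | 95649 | 30.5 | 30.2 | 19.7 | 669.1 | 608.1 | 834.5 | 3999 | 6597 | 17861 | 141636760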

Test 7. Multi-threaded scan and single-threaded write (benchmark.sh fwdrangewhilewriting)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 MB_WRITE_PER_SEC=2 benchmark.sh fwdrangewhilewriting

Measure performance with one writer and multiple iterator threads. The writes are rate limited.

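Version | Opts | ops/sec | mb/sec | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | du -s -k
7.2.2 | None | 40675 | 162.9 | 17.4 | 7.4 | 1573.4 | 1374.6 | 1855.1 | 6293 | 13434 | 24996 | 141346104
7.2.2 | DIO | 35619 | 142.7 | 18.3 | 7.5 | 1796.6 | 1540.7 | 2171.8 | 6533 | 9698 | 13325 | 140957044
7.1.1 | None | 42202 | 169.0 | 16.5 | 7.3 | 1516.4 | 1322.2 | 1821.4 | 6168 | 13098 | 25099 | 142336676
7.1.1 | DIO | 35535 | 142.3 | 17.9 | 7.3 | 1800.8 | 1544.5 | 2175.7 | 6527 | 9691 | 13298 | 141591172
7.0.3 | None | 42436 | 170.0 | 17.3 | 7.8 | 1508.0 | 1310.0 | 1815.5 | 6198 | 13226 | 24937 | 142579812
7.0.3 | DIO | 35702 | 143.0 | 18.9 | 7.8 | 1792.5 | 1535.0 | 2165.4 | 6531 | 9702 | 13343 | 141636716
6.29.1 | None | 43138 | 172.8 | 17.6 | 7.8 | 1483.5 | 1294.3 | 1804.6 | 6089 | 13026 | 25065 | 141561940
6.29.1 | DIO | 36460 | 146.0 | 16.5 | 7.1 | 1755.2 | 1517.8 | 2128.9 | 6381 | 9572 | 12979 | 140761644
6.29.0 | None | 42806 | 171.5 | 17.0 | 7.6 | 1495.0 | 1311.0 | 1813.2 | 6101 | 13108 | 25308 | 140416272
6.29.0 | DIO | 36418 | 145.9 | 17.4 | 7.4 | 1757.2 | 1522.1 | 2124.5 | 6404 | 9624 | 13210 | 140619752
6.28.0 | None | 43564 | 174.5 | 17.3 | 7.7 | 1469.0 | 1282.2 | 1794.6 | 6055 | 12926 | 24865 | 141241492
6.28.0 | DIO | 36230 | 145.1 | 17.7 | 7.5 | 1766.3 | 1527.9 | 2142.4 | 6439 | 9653 | 13251 | 140537532
6.27.0 | None | 43229 | 173.2 | 18.3 | 8.3 | 1480.4 | 1290.3 | 1802.6 | 6123 | 13140 | 24987 | 141261580
6.27.0 | DIO | 35860 | 143.6 | 16.9 | 7.4 | 1784.5 | 1540.3 | 2181.7 | 6422 | 9603 | 13041 | 140542060
6.26.0 | None | 36960 | 148.0 | 17.2 | 8.2 | 1731.4 | 1534.2 | 2217.9 | 6477 | 13239 | 21533 | 141557396
6.26.0 | DIO | 35961 | 144.0 | 16.5 | 7.3 | 1779.5 | 1536.8 | 2174.2 | 6415 | 9603 | 13055 | 140627180
6.25.0 | None | 37344 | 149.6 | 17.9 | 8.4 | 1713.7 | 1513.5 | 2164.7 | 6489 | 13409 | 21654 | 141269488
6.25.0 | DIO | 36023 | 144.3 | 17.9 | 7.8 | 1776.5 | 1532.5 | 2162.0 | 6458 | 9672 | 13308 | 140685060
6.24.0 | None | 38940 | 156.0 | 17.7 | 8.1 | 1643.4 | 1445.8 | 1970.8 | 6411 | 13480 | 21757 | 141724476
6.24.0 | DIO | 35955 | 144.0 | 17.1 | 7.5 | 1779.8 | 1534.8 | 2173.0 | 6427 | 9615 | 13093 | 140989196
6.23.0 | None | 41322 | 165.5 | 16.7 | 7.6 | 1584.7 | 1359.9 | 1838.4 | 6225 | 13285 | 24338 | 141101776
6.23.0 | DIO | 35968 | 144.1 | 17.0 | 7.4 | 1779.2 | 1536.8 | 2167.6 | 6446 | 9648 | 13218 | 140731428
6.22.1 | None | 41244 | 165.2 | 16.7 | 7.6 | 1551.6 | 1362.3 | 1845.8 | 6234 | 13346 | 24988 | 141716340
6.22.1 | DIO | 35962 | 144.0 | 17.0 | 7.4 | 1779.5 | 1538.8 | 2150.6 | 6455 | 9661 | 13264 | 141008104
6.21.2 | None | 41360 | 165.7 | 18.2 | 8.0 | 1547.2 | 1354.1 | 1840.5 | 6257 | 13434 | 25280 | 141820100
6.15.5 | None | 42197 | 169.0 | 17.5 | 8.0 | 1516.6 | 1315.5 | 1817.8 | 6185 | 13018 | 23081 | 142157224
6.10.4 | None | 41827 | 167.5 | 17.6 | 8.0 | 1530.0 | 1329.2 | 1826.2 | 6212 | 13129 | 23244 | 142497356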

Test 7b. Multi-threaded reverse scan and single-threaded write (benchmark.sh revrangewhilewriting)

NUM_KEYS=900000000 CACHE_SIZE=6442450944 DURATION=5400 MB_WRITE_PER_SEC=2 benchmark.sh revrangewhilewriting

Measure performance with one writer and multiple iterator threads scanning in reverse. The writes are rate limited.

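Version | Opts | ops/sec | mb/sec | W-Amp | W-MB/s | usec/op | p50 | p75 | p99 | p99.9 | p99.99 | du -s -k
7.2.2 | None | 33680 | 134.9 | 17.3 | 7.5 | 1900.1 | 1668.2 | 2417.1 | 7207 | 16605 | 29880 | 142066536
7.2.2 | DIO | 31215 | 125.0 | 16.4 | 6.9 | 2050.0 | 1755.3 | 2528.0 | 7637 | 10178 | 13981 | 141817860
7.1.1 | None | 34825 | 139.5 | 17.4 | 7.7 | 1837.6 | 1623.9 | 2360.8 | 6742 | 16569 | 30135 | 142975980
7.1.1 | DIO | 31259 | 125.2 | 17.4 | 7.3 | 2047.2 | 1744.1 | 2520.7 | 7673 | 10205 | 13980 | 142349268
7.0.3 | None | 35015 | 140.3 | 17.5 | 7.8 | 1827.6 | 1614.4 | 2345.4 | 6695 | 16540 | 29947 | 143211480
7.0.3 | DIO | 31155 | 124.8 | 15.9 | 7.0 | 2054.0 | 1753.5 | 2529.0 | 7627 | 9970 | 13948 | 142568752
6.29.1 | None | 35535 | 142.3 | 17.6 | 7.8 | 1800.8 | 1598.9 | 2320.1 | 6572 | 16547 | 30132 | 142191812
6.29.1 | DIO | 31839 | 127.5 | 16.8 | 7.3 | 2009.9 | 1731.9 | 2489.0 | 7184 | 9882 | 13917 | 141520096
6.29.0 | None | 35676 | 142.9 | 17.3 | 7.8 | 1793.8 | 1592.1 | 2307.6 | 6569 | 16711 | 30812 | 141867556
6.29.0 | DIO | 31896 | 127.8 | 17.8 | 7.8 | 2006.3 | 1728.1 | 2483.7 | 7320 | 10017 | 13938 | 141162940
6.28.0 | None | 35882 | 143.7 | 16.6 | 7.5 | 1783.4 | 1588.0 | 2292.4 | 6534 | 16405 | 30385 | 142078076
6.28.0 | DIO | 31855 | 127.6 | 16.5 | 7.3 | 2008.9 | 1727.1 | 2485.2 | 7287 | 10012 | 14000 | 141298660
6.27.0 | None | 35594 | 142.6 | 17.0 | 7.7 | 1797.9 | 1596.8 | 2315.0 | 6566 | 16418 | 29917 | 142075232
6.27.0 | DIO | 32086 | 128.5 | 17.0 | 7.4 | 1994.4 | 1717.5 | 2470.2 | 7116 | 9865 | 13867 | 141261580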

Appendix

fio test results

$ fio --randrepeat=1 --ioengine=sync --direct=1 --gtod_reduce=1 --name=test --filename=/data/test_file --bs=4k --iodepth=64 --size=4G --readwrite=randread --numjobs=32 --group_reporting
test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=64
...
fio-2.14
Starting 32 processes
Jobs: 3 (f=3): [_(3),r(1),_(1),E(1),_(10),r(1),_(13),r(1),E(1)] [100.0% done] [445.3MB/0KB/0KB /s] [114K/0/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=32): err= 0: pid=28042: Fri Jul 24 01:36:19 2020
read : io=131072MB, bw=469326KB/s, iops=117331, runt=285980msec
cpu : usr=1.29%, sys=3.26%, ctx=33585114, majf=0, minf=297
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=33554432/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=131072MB, aggrb=469325KB/s, minb=469325KB/s, maxb=469325KB/s, mint=285980msec, maxt=285980msec
Disk stats (read/write):
nvme1n1: ios=33654742/61713, merge=0/40, ticks=8723764/89064, in_queue=8788592, util=100.00%

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/data/test_file --bs=4k --iodepth=64 --size=4G --readwrite=randread
test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.14
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [456.3MB/0KB/0KB /s] [117K/0/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=28385: Fri Jul 24 01:36:56 2020
read : io=4096.0MB, bw=547416KB/s, iops=136854, runt= 7662msec
cpu : usr=22.20%, sys=48.81%, ctx=144112, majf=0, minf=73
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=547416KB/s, minb=547416KB/s, maxb=547416KB/s, mint=7662msec, maxt=7662msec
Disk stats (read/write):
nvme1n1: ios=1050868/1904, merge=0/1, ticks=374836/2900, in_queue=370532, util=98.70%
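When comparing several fio runs, it can help to pull the IOPS figure out of the summary line mechanically. A minimal sketch using sed, assuming the fio-2.x text format shown above (a summary line containing iops=NUMBER); the function name fio_iops is illustrative, and other fio versions may format this line differently.

```shell
# fio_iops: read fio text output on stdin and print each iops=NUMBER value.
# Lines without "iops=" (such as the progress line) produce no output.
fio_iops() {
  sed -n 's/.*iops=\([0-9][0-9]*\).*/\1/p'
}

printf '%s\n' 'read : io=131072MB, bw=469326KB/s, iops=117331, runt=285980msec' | fio_iops
printf '%s\n' 'read : io=4096.0MB, bw=547416KB/s, iops=136854, runt= 7662msec' | fio_iops
```

For the two runs above this yields 117331 (sync engine, 32 jobs) and 136854 (libaio, single job).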

Previous Results