6.2. Performance

With up to tens of thousands of documents you will generally find CouchDB to perform well no matter how you write your code. Once you start getting into the millions of documents you need to be a lot more careful.

6.2.1. Disk I/O

6.2.1.1. File Size

The smaller your file size, the fewer I/O operations there will be, the more of the file can be cached by CouchDB and the operating system, and the quicker it is to replicate, back up, etc. Consequently you should carefully examine the data you are storing. For example, it would be silly to use keys that are hundreds of characters long, but your program would be hard to maintain if you only used single-character keys. Carefully consider data that is duplicated by putting it in views.

6.2.1.2. Disk and File System Performance

Using faster disks, striped RAID arrays and modern file systems can all speed up your CouchDB deployment. However, there is one option that can increase the responsiveness of your CouchDB server when disk performance is a bottleneck. From the Erlang documentation for the file module:

On operating systems with thread support, it is possible to let file operations be performed in threads of their own, allowing other Erlang processes to continue executing in parallel with the file operations. See the command line flag +A in erl(1).

Setting this argument to a number greater than zero can keep your CouchDB installation responsive even during periods of heavy disk utilization. The easiest way to set this option is through the ERL_FLAGS environment variable. For example, to give Erlang four threads with which to perform I/O operations add the following to (prefix)/etc/defaults/couchdb (or equivalent):

    export ERL_FLAGS="+A 4"

6.2.2. System Resource Limits

One of the problems that administrators run into as their deployments become large is resource limits imposed by the system and by the application configuration. Raising these limits can allow your deployment to grow beyond what the default configuration will support.

6.2.2.1. CouchDB Configuration Options

6.2.2.1.1. delayed_commits

Delayed commits allow CouchDB to achieve better write performance for some workloads while sacrificing a small amount of durability. The setting causes CouchDB to wait up to a full second before committing new data after an update. If the server crashes before the header is written, any writes since the last commit are lost. Keep this option enabled at your own risk.
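
As a sketch, on a release that still supports this option you would set it in the [couchdb] section of local.ini (shown here enabled; verify that your CouchDB version ships delayed_commits before relying on it):

    [couchdb]
    delayed_commits = true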

6.2.2.1.2. max_dbs_open

In your configuration (local.ini or similar) familiarize yourself with the couchdb/max_dbs_open option:

    [couchdb]
    max_dbs_open = 100

This option places an upper bound on the number of databases that can be open at one time. CouchDB reference counts database accesses internally and will close idle databases when it must. Sometimes it is necessary to keep more than the default open at once, such as in deployments where many databases will be continuously replicating.

6.2.2.2. Erlang

Even if you’ve increased the maximum connections CouchDB will allow, the Erlang runtime system will not allow more than 1024 connections by default. Adding the following directive to (prefix)/etc/default/couchdb (or equivalent) will increase this limit (in this case to 4096):

    export ERL_MAX_PORTS=4096

CouchDB versions up to 1.1.x also create Erlang Term Storage (ETS) tables for each replication. If you are using a version of CouchDB older than 1.2 and must support many replications, also set the ERL_MAX_ETS_TABLES variable. The default is approximately 1400 tables.
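
For example, raising both limits in the same file might look like this (the value of 10000 is an arbitrary illustration; size it to your replication count):

    export ERL_MAX_PORTS=4096
    export ERL_MAX_ETS_TABLES=10000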

Note that on Mac OS X, Erlang will not actually increase the file descriptor limit past 1024 (i.e. the system header-defined value of FD_SETSIZE). See this tip for a possible workaround and this thread for a deeper explanation.

6.2.2.3. Maximum open file descriptors (ulimit)

Most *nix operating systems impose various resource limits on every process. The method of increasing these limits varies, depending on your init system and particular OS release. The default value for many OSes is 1024 or 4096. On a system with many databases or many views, CouchDB can very rapidly hit this limit.

If your system is set up to use the Pluggable Authentication Modules (PAM) system (as is the case with nearly all modern Linuxes), increasing this limit is straightforward. For example, creating a file named /etc/security/limits.d/100-couchdb.conf with the following contents will ensure that CouchDB can open up to 10000 file descriptors at once:

    #<domain>    <type>    <item>    <value>
    couchdb      hard      nofile    10000
    couchdb      soft      nofile    10000

If you are using our Debian/Ubuntu sysvinit script (/etc/init.d/couchdb), you also need to raise the limits for the root user:

    #<domain>    <type>    <item>    <value>
    root         hard      nofile    10000
    root         soft      nofile    10000

You may also have to edit the /etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive files to add the line:

    session required pam_limits.so

if it is not already present.

For systemd-based Linuxes (such as CentOS/RHEL 7, Ubuntu 16.04+, Debian 8 or newer), assuming you are launching CouchDB from systemd, you must also override the upper limit by creating the file /etc/systemd/system/<servicename>.d/override.conf with the following content:

    [Service]
    LimitNOFILE=#######

and replacing the ####### with the upper limit of file descriptors CouchDB is allowed to hold open at once.

If your system does not use PAM, a ulimit command is usually available for use in a custom script to launch CouchDB with increased resource limits. Typical syntax would be something like ulimit -n 10000.
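
A minimal wrapper script sketch (the path to the couchdb binary is an assumption; substitute your own installation's path):

    #!/bin/sh
    # Raise the per-process file descriptor limit, then start CouchDB.
    ulimit -n 10000
    exec /opt/couchdb/bin/couchdb    # hypothetical install path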

In general, modern UNIX-like systems can handle very large numbers of file handles per process (e.g. 100000) without problem. Don’t be afraid to increase this limit on your system.

6.2.3. Network

There is latency overhead in making and receiving each request/response. In general you should do your requests in batches. Most APIs have some mechanism to do batches, usually by supplying lists of documents or keys in the request body. Be careful what size you pick for the batches. The larger the batch, the more time your client has to spend encoding the items into JSON and the more time is spent decoding that number of responses. Do some benchmarking with your own configuration and typical data to find the sweet spot. It is likely to be between one and ten thousand documents.
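
For example, several documents can be inserted with a single _bulk_docs request instead of one PUT per document (a sketch, assuming a local CouchDB on port 5984 and a database named mydb):

    curl -X POST http://localhost:5984/mydb/_bulk_docs \
         -H "Content-Type: application/json" \
         -d '{"docs": [{"_id": "doc1", "value": 1}, {"_id": "doc2", "value": 2}]}'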

If you have a fast I/O system then you can also use concurrency - have multiple requests/responses at the same time. This mitigates the latency involved in assembling JSON, doing the networking and decoding JSON.
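
One way to sketch this from the shell is to run several batch uploads in parallel with xargs (the batches/*.json files are hypothetical, each containing a {"docs": [...]} body):

    ls batches/*.json | xargs -P 4 -I{} curl -X POST http://localhost:5984/mydb/_bulk_docs \
         -H "Content-Type: application/json" -d @{}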

As of CouchDB 1.1.0, users often report lower write performance of documents compared to older releases. The main reason is that this release ships with the more recent version of the HTTP server library MochiWeb, which by default sets the TCP socket option SO_NODELAY to false. This means that small data sent to the TCP socket, like the reply to a document write request (or reading a very small document), will not be sent immediately to the network - TCP will buffer it for a while hoping that it will be asked to send more data through the same socket and then send all the data at once for increased performance. This TCP buffering behaviour can be disabled via httpd/socket_options:

    [httpd]
    socket_options = [{nodelay, true}]

See also

Bulk load and store API.

6.2.3.1. Connection limit

MochiWeb handles CouchDB requests. The default maximum number of connections is 2048. To change this limit, use the server_options configuration variable. max indicates the maximum number of connections.

    [chttpd]
    server_options = [{backlog, 128}, {acceptor_pool_size, 16}, {max, 4096}]

6.2.4. CouchDB

6.2.4.1. DELETE operation

When you DELETE a document the database will create a new revision which contains the _id and _rev fields as well as the _deleted flag. This revision will remain even after a database compaction so that the deletion can be replicated. Deleted documents, like non-deleted documents, can affect view build times, PUT and DELETE request times, and the size of the database since they increase the size of the B+Tree. You can see the number of deleted documents in the database information. If your use case creates lots of deleted documents (for example, if you are storing short-term data like log entries, message queues, etc), you might want to periodically switch to a new database and delete the old one (once the entries in it have all expired).
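
For example, the database information document reports this count in its doc_del_count field (a sketch against a local server; the numbers shown are illustrative):

    curl http://localhost:5984/mydb
    # {"db_name":"mydb","doc_count":1000,"doc_del_count":50000,...}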

6.2.4.2. Document’s ID

The db file size is derived from your document and view sizes, but also from a multiple of your _id sizes. Not only is the _id present in the document, but it and parts of it are duplicated in the binary tree structure CouchDB uses to navigate the file to find the document in the first place. As a real world example, one user switching from 16 byte ids to 4 byte ids saw a database go from 21GB to 4GB with 10 million documents (the raw JSON text went from 2.5GB to 2GB).

Inserting with sequential (or at least sorted) ids is faster than random ids. Consequently you should consider generating ids yourself, allocating them sequentially and using an encoding scheme that consumes fewer bytes. For example, a value that takes 16 hex digits (64 bits) to represent fits in 11 base-62 digits (10 numerals, 26 lower case and 26 upper case letters), since 62^11 > 2^64.

6.2.5. Views

6.2.5.1. Views Generation

Views with the JavaScript query server are extremely slow to generate when there are a non-trivial number of documents to process. The generation process won’t even saturate a single CPU, let alone your I/O. The cause is the latency involved between the CouchDB server and the separate couchjs query server, dramatically illustrating how important it is to take latency out of your implementation.

You can let view access be “stale”, but it isn’t practical to predict when that will give you a quick response and when the view will be updated instead, which can take a long time. (A 10 million document database took about 10 minutes to load into CouchDB but about 4 hours to do view generation.)

In a cluster, “stale” requests are serviced by a fixed set of shards in order to present users with consistent results between requests. This comes with an availability trade-off - the fixed set of shards might not be the most responsive / available within the cluster. If you don’t need this kind of consistency (e.g. your indexes are relatively static), you can tell CouchDB to use any available replica by specifying stable=false&update=false instead of stale=ok, or stable=false&update=lazy instead of stale=update_after.
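
For example, these query parameters go straight onto the view URL (a sketch; the design document and view names are placeholders):

    # Accept a possibly stale index from any replica (replaces stale=ok):
    curl "http://localhost:5984/mydb/_design/foo/_view/bar?stable=false&update=false"
    # Return the current index now and refresh it afterwards (replaces stale=update_after):
    curl "http://localhost:5984/mydb/_design/foo/_view/bar?stable=false&update=lazy"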

View information isn’t replicated - it is rebuilt on each database so you can’t do the view generation on a separate server.

6.2.5.2. Built-In Reduce Functions

If you’re using a very simple view function that only performs a sum or count reduction, you can call native Erlang implementations of them by simply writing _sum or _count in place of your function declaration. This will speed things up dramatically, as it cuts down on IO between CouchDB and the JavaScript query server. For example, as mentioned on the mailing list, the time for outputting an (already indexed and cached) view with about 78,000 items went down from 60 seconds to 4 seconds.

Before:

    {
        "_id": "_design/foo",
        "views": {
            "bar": {
                "map": "function (doc) { emit(doc.author, 1); }",
                "reduce": "function (keys, values, rereduce) { return sum(values); }"
            }
        }
    }

After:

    {
        "_id": "_design/foo",
        "views": {
            "bar": {
                "map": "function (doc) { emit(doc.author, 1); }",
                "reduce": "_sum"
            }
        }
    }

See also

Built-in Reduce Functions

Source: http://docs.couchdb.org/en/stable/maintenance/performance.html