Basics and Terminology

Documents in ArangoDB are JSON objects. These objects can be nested (to any depth) and may contain lists. Each document has a unique primary key which identifies it within its collection. Furthermore, each document is uniquely identified by its document handle across all collections in the same database. Different revisions of the same document (identified by its handle) can be distinguished by their document revision. Any transaction only ever sees a single revision of a document. For example:

  {
    "_id" : "myusers/3456789",
    "_key" : "3456789",
    "_rev" : "14253647",
    "firstName" : "John",
    "lastName" : "Doe",
    "address" : {
      "street" : "Road To Nowhere 1",
      "city" : "Gotham"
    },
    "hobbies" : [
      {"name": "swimming", "howFavorite": 10},
      {"name": "biking", "howFavorite": 6},
      {"name": "programming", "howFavorite": 4}
    ]
  }

All documents contain special attributes: the document handle is stored as a string in _id, the document’s primary key in _key and the document revision in _rev. The value of the _key attribute can be specified by the user when creating a document. _id and _key values are immutable once the document has been created. The _rev value is maintained by ArangoDB automatically.

Document Handle

A document handle uniquely identifies a document in the database. It is a string and consists of the collection’s name and the document key (_key attribute) separated by /.
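The structure of a handle can be illustrated with a small Python sketch. The helper names split_handle and make_handle are hypothetical and not part of any ArangoDB driver; the sketch only assumes the collection-name/key layout described above.

```python
from typing import Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Split a document handle into (collection name, document key)."""
    collection, sep, key = handle.partition("/")
    if not sep or not collection or not key:
        raise ValueError(f"not a document handle: {handle!r}")
    return collection, key


def make_handle(collection: str, key: str) -> str:
    """Build a document handle from a collection name and a document key."""
    return f"{collection}/{key}"


# The handle from the example document above.
collection, key = split_handle("myusers/3456789")
print(collection, key)
```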

Document Key

A document key uniquely identifies a document in the collection it is stored in. It can and should be used by clients when specific documents are queried. The document key is stored in the _key attribute of each document. The key values are automatically indexed by ArangoDB in a collection’s primary index. Thus looking up a document by its key is a fast operation. The _key value of a document is immutable once the document has been created. By default, ArangoDB will auto-generate a document key if no _key attribute is specified, and use the user-specified _key otherwise. The generated _key is guaranteed to be unique in the collection it was generated for. This also applies to sharded collections in a cluster. However, the _key is not guaranteed to be unique within a database or across a whole node or instance.

This behavior can be changed on a per-collection level by creating collections with the keyOptions attribute.

Using keyOptions it is possible to disallow user-specified keys completely, or to force a specific regime for auto-generating the _key values.
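As a rough illustration, the body of a collection-creation request with keyOptions might be built as follows. The sub-attributes type and allowUserKeys and the value "autoincrement" are not named in the text above; they are assumptions based on the collection API and should be checked against the API reference for your ArangoDB version. The collection name is reused from the example document.

```python
import json

# Sketch of a collection-creation body. The keyOptions sub-attributes are
# assumptions to be verified against the collection API reference.
create_collection_body = {
    "name": "myusers",
    "keyOptions": {
        "type": "autoincrement",  # assumed name of a key-generation regime
        "allowUserKeys": False,   # assumed flag: reject user-specified _key
    },
}

print(json.dumps(create_collection_body, indent=2))
```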

Document Revision

As ArangoDB supports MVCC (Multi-Version Concurrency Control), documents can exist in more than one revision. The document revision is the MVCC token used to specify a particular revision of a document (identified by its _id). It is a string value that, up to ArangoDB 3.0, contained an integer number and is unique within the list of document revisions for a single document. In ArangoDB >= 3.1 the _rev strings are in fact timestamps. They use the local clock of the DBserver that actually writes the document and have millisecond accuracy. More precisely, a “Hybrid Logical Clock” is used (for this concept see this paper).

Within one shard it is guaranteed that two different document revisions have a different _rev string, even if they are written in the same millisecond, and that these stamps are ascending.

Note however that different servers in your cluster might have a clock skew, and therefore between different shards or even between different collections the timestamps are not guaranteed to be comparable.

The Hybrid Logical Clock feature does one thing to address this issue: whenever a message is sent from some server A in your cluster to another one B, it is ensured that any timestamp taken on B after the message has arrived is greater than any timestamp taken on A before the message was sent. This ensures that if there is some “causality” between events on different servers, timestamps increase from cause to effect. A direct consequence of this is that sometimes a server has to take timestamps that seem to come from the future of its own clock. It will however still produce ever increasing timestamps. If the clock skew is small, then your timestamps will relatively accurately describe the time when the document revision was actually written.
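The send/receive rule can be sketched in a few lines of Python. This is a deliberately simplified toy and not ArangoDB's implementation: a real Hybrid Logical Clock keeps a physical part and a logical counter, whereas the hypothetical SketchClock below folds both into a single integer. It only demonstrates the two properties described above: stamps on one server always increase, and a stamp taken after receiving a message is greater than the stamp the message carried.

```python
import time


class SketchClock:
    """Toy clock: stamps strictly increase and respect message causality."""

    def __init__(self, physical_clock=None):
        # physical_clock returns the local time in milliseconds.
        self._physical = physical_clock or (lambda: int(time.time() * 1000))
        self._last = 0

    def now(self) -> int:
        """Take a timestamp strictly greater than any earlier one."""
        self._last = max(self._physical(), self._last + 1)
        return self._last

    def observe(self, remote_stamp: int) -> None:
        """Record the timestamp carried by an incoming message."""
        self._last = max(self._last, remote_stamp)


# Server B's physical clock lags five seconds behind server A's.
a = SketchClock(lambda: 10_000)
b = SketchClock(lambda: 5_000)

sent = a.now()     # A stamps an event and sends a message carrying the stamp
b.observe(sent)    # B receives the message
after = b.now()    # B's next stamp lies in the "future" of its own clock
print(sent, after)
```

Although B's physical clock reads 5000, the stamp it takes after the message is larger than the one A sent, so the order of cause and effect is preserved.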

ArangoDB uses 64-bit unsigned integer values to maintain document revisions internally. At this stage we intentionally do not document the exact format of the revision values. When returning document revisions to clients, ArangoDB will put them into a string to ensure the revision is not clipped by clients that do not support big integers. Clients should treat the revision returned by ArangoDB as an opaque string when they store or use it locally. This will allow ArangoDB to change the format of revisions later if this should be required (as has happened with 3.1 with the Hybrid Logical Clock). Clients can use revisions to perform simple equality/non-equality comparisons (e.g. to check whether a document has changed or not), but they should not use revision ids to perform greater/less than comparisons to check whether one document revision is older than another, even if this might work for some cases.
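The following Python sketch illustrates both points: why a large integer can be clipped by a client that stores numbers as double-precision floats, and how a client can restrict itself to equality checks on revision strings. The function name has_changed and the second revision value are made up for illustration.

```python
# An integer just above 2**53 fits easily into 64 bits, but a client that
# stores numbers as IEEE 754 doubles cannot represent it exactly.
big = 2**53 + 1
clipped = int(float(big))
print(big, clipped, big == clipped)


def has_changed(known_rev: str, current_rev: str) -> bool:
    """Compare revisions as opaque strings: equal or not equal, nothing more."""
    return known_rev != current_rev


# "14253647" is the _rev of the example document; "14253999" is made up.
print(has_changed("14253647", "14253647"))
print(has_changed("14253647", "14253999"))
```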

Document revisions can be used to conditionally query, update, replace or delete documents in the database. In order to find a particular revision of a document, you need the document handle or key, and the document revision.
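The idea of a revision precondition can be sketched with an in-memory stand-in. ToyCollection and RevisionMismatch are hypothetical names, the revision strings are simple counters, and nothing here talks to ArangoDB; the sketch only shows the semantics of "replace this document only if its revision is still the one I know".

```python
from typing import Optional


class RevisionMismatch(Exception):
    """Raised when the expected revision is not the stored one."""


class ToyCollection:
    """In-memory stand-in for a collection; not an ArangoDB client."""

    def __init__(self):
        self._docs = {}
        self._counter = 0

    def _next_rev(self) -> str:
        self._counter += 1
        return str(self._counter)

    def insert(self, key: str, data: dict) -> str:
        rev = self._next_rev()
        self._docs[key] = {**data, "_key": key, "_rev": rev}
        return rev

    def replace(self, key: str, data: dict,
                expected_rev: Optional[str] = None) -> str:
        """Replace a document, optionally only if its revision still matches."""
        current = self._docs[key]
        if expected_rev is not None and current["_rev"] != expected_rev:
            raise RevisionMismatch(key)
        rev = self._next_rev()
        self._docs[key] = {**data, "_key": key, "_rev": rev}
        return rev


users = ToyCollection()
rev1 = users.insert("3456789", {"firstName": "John"})
rev2 = users.replace("3456789", {"firstName": "Jon"}, expected_rev=rev1)

try:
    # rev1 is stale now, so the precondition fails and nothing is written.
    users.replace("3456789", {"firstName": "Jack"}, expected_rev=rev1)
    conflict = False
except RevisionMismatch:
    conflict = True

print(rev1 != rev2, conflict)
```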

Multiple Documents in a single Command

Beginning with ArangoDB 3.0 the basic document API has been extended to handle not only single documents but multiple documents in a single command. This is crucial for performance, in particular in the cluster situation, in which a single request can involve multiple network hops within the cluster. Another advantage is that it reduces the overhead of individual network round trips between the client and the server. The general idea for performing multiple document operations in a single command is to use JSON arrays of objects in place of a single document. As a consequence, document keys, handles and revisions for preconditions have to be supplied embedded in the individual documents given. Multiple document operations are restricted to a single document or edge collection. See the API descriptions for collection objects for details. Note that the API for database objects does not offer these operations.
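A minimal Python sketch of such a payload is shown below. The helper batch_update_body is hypothetical, as are the second key and the changed attribute values; the sketch only shows the shape described above, namely a JSON array in which every object carries its own _key and, where a precondition is wanted, its own _rev.

```python
import json


def batch_update_body(documents: list) -> str:
    """Serialize several documents as one JSON array for a single command."""
    for doc in documents:
        if "_key" not in doc:
            raise ValueError("each document must carry its own _key")
    return json.dumps(documents)


updates = [
    # Key and revision are embedded in the document itself.
    {"_key": "3456789", "_rev": "14253647", "lastName": "Doe-Smith"},
    # No _rev here, so this entry carries no revision precondition.
    {"_key": "3456790", "lastName": "Roe"},
]

body = batch_update_body(updates)
print(body)
```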