Getting started

Overview

This beginner’s guide will familiarize you with ArangoDB. We will cover how to

  • install and run a local ArangoDB server
  • use the web interface to interact with it
  • store example data in the database
  • query the database to retrieve the data again
  • edit and remove existing data

Installation

Head to arangodb.com/download, select your operating system and download ArangoDB. You may also follow the instructions on how to install with a package manager, if available.

If you installed a binary package under Linux, the server is started automatically.

If you installed ArangoDB using Homebrew under macOS, start the server by running /usr/local/sbin/arangod.

If you installed ArangoDB under Windows as a service, the server is started automatically. Otherwise, run arangod.exe, located in the installation folder’s bin directory. You may have to run it as administrator to grant it write permissions to C:\Program Files.

For more in-depth information on how to install ArangoDB, as well as available startup parameters, installation in a cluster and so on, see Installing.

ArangoDB offers two storage engines: MMFiles and RocksDB. Choose the one that best suits your needs during the installation process or on first startup.

Securing the installation

The default installation contains one database, _system, and a user named root.

Debian-based packages and the Windows installer will ask for a password during the installation process. Red Hat-based packages will set a random password. For all other installation packages you need to execute:

    shell> arango-secure-installation

This will ask for a root password and set it.

Web interface

The server itself (arangod) speaks HTTP / REST, but you can use the graphical web interface to keep it simple. There’s also arangosh, a synchronous shell for interaction with the server. If you’re a developer, you might prefer the shell over the GUI. It does not provide features like syntax highlighting, however.

When you start using ArangoDB in your project, you will likely use an official or community-made driver written in the same language as your project. Drivers implement a programming interface that should feel natural for that programming language, and do all the talking to the server. Therefore, you can most likely ignore the HTTP API unless you want to write a driver yourself or explicitly want to use the raw interface.

To get familiar with the database system you can even put drivers aside and use the web interface (code name Aardvark) for basic interaction. The web interface will become available shortly after you start arangod. You can access it in your browser at http://localhost:8529. If that does not work, please see Troubleshooting.

By default, authentication is enabled. The default user is root. Depending on the installation method used, the installation process either prompted for the root password or the default root password is empty (see above).

[Screenshot: Aardvark login form]

Next you will be asked which database to use. Every server instance comes with a _system database. Select this database to continue.

[Screenshot: select database]

You should then be presented with the dashboard with server statistics like this:

[Screenshot: Aardvark dashboard request statistics]

For a more detailed description of the interface, see Web Interface.

Databases, collections and documents

Databases are sets of collections. Collections store records, which are referred to as documents. Collections are the equivalent of tables in an RDBMS, and documents can be thought of as rows in a table. The difference is that you don’t define in advance what columns (or rather attributes) there will be. Every document in any collection can have arbitrary attribute keys and values. In practice, documents in a single collection will likely have a similar structure, but the database system itself does not impose it and will operate stably and fast no matter what your data looks like.

Read more in the data-model concepts chapter.

For now, you can stick with the default _system database and use the web interface to create collections and documents. Start by clicking the COLLECTIONS menu entry, then the Add Collection tile. Give it a name, e.g. users, leave the other settings unchanged (we want it to be a document collection) and Save it. A new tile labeled users should show up, which you can click to open.

There will be No documents yet. Click the green circle with the white plus on the right-hand side to create a first document in this collection. A dialog will ask you for a key. You can leave the field blank and click Create to let the database system assign an automatically generated (unique) key. Note that the _key property is immutable, which means you cannot change it once the document is created. What you can use as document key is described in the naming conventions.

An automatically generated key could be "9883" (the key is always a string!), and the document _id would be "users/9883" in that case. Aside from a few system attributes, there is nothing in this document yet. Let’s add a custom attribute by clicking the icon to the left of (empty object), then Append. Two input fields will become available, FIELD (attribute key) and VALUE (attribute value). Type name as key and your name as value. Append another attribute, name it age and set it to your age. Click Save to persist the changes. If you click on Collection: users at the top, to the right of the ArangoDB logo, the document browser will show the documents in the users collection and you will see the document you just created in the list.
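The relationship between the collection name, the _key and the _id described above can be sketched in plain Python. This is only an illustrative model: the helper name document_id is hypothetical and not part of ArangoDB or any of its drivers.

```python
def document_id(collection: str, key: str) -> str:
    """Build an ID of the form "<collection>/<key>" (hypothetical helper)."""
    if not isinstance(key, str):
        # Mirrors the note above: a document key is always a string.
        raise TypeError("document keys are always strings")
    return f"{collection}/{key}"


print(document_id("users", "9883"))  # users/9883
```

Passing the number 9883 instead of the string "9883" raises a TypeError in this sketch, which mirrors the point that keys are strings even when they look numeric.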

Querying the database

Time to retrieve our document using AQL, ArangoDB’s query language. We can directly look up the document we created via its _id, but there are also other options. Click the QUERIES menu entry to bring up the query editor and type the following (adjust the document ID to match your document):

    RETURN DOCUMENT("users/9883")

Then click Execute to run the query. The result appears below the query editor:

    [
      {
        "_key": "9883",
        "_id": "users/9883",
        "_rev": "9883",
        "age": 32,
        "name": "John Smith"
      }
    ]

As you can see, the entire document including the system attributes is returned. DOCUMENT() is a function to retrieve a single document or a list of documents of which you know the _keys or _ids. We return the result of the function call as our query result, which is our document inside of the result array (we could have returned more than one result with a different query, but even for a single document as result, we still get an array at the top level).

This type of query is called a data access query. No data is created, changed or deleted. There is another type of query called a data modification query. Let’s insert a second document using a modification query:

    INSERT { name: "Katie Foster", age: 27 } INTO users

The query is pretty self-explanatory: the INSERT keyword tells ArangoDB that we want to insert something. What to insert, a document with two attributes in this case, follows next. The curly braces { } signify documents, or objects. When talking about records in a collection, we call them documents. Encoded as JSON, we call them objects. Objects can also be nested. Here’s an example:

    {
      "name": {
        "first": "Katie",
        "last": "Foster"
      }
    }

INTO is a mandatory part of every INSERT operation and is followed by the name of the collection that we want to store the document in. Note that there are no quote marks around the collection name.

If you run the above query, the result will be an empty array because we did not specify what to return using a RETURN keyword. It is optional in modification queries, but mandatory in data access queries. Even with RETURN, the return value can still be an empty array, e.g. if the specified document was not found. Despite the empty result, the above query still created a new user document. You can verify this with the document browser.

Let’s add another user, but return the newly created document this time:

    INSERT { name: "James Hendrix", age: 69 } INTO users
    RETURN NEW

NEW is a pseudo-variable, which refers to the document created by INSERT. The result of the query will look like this:

    [
      {
        "_key": "10074",
        "_id": "users/10074",
        "_rev": "10074",
        "age": 69,
        "name": "James Hendrix"
      }
    ]

Now that we have 3 users in our collection, how do we retrieve them all with a single query? The following does not work:

    RETURN DOCUMENT("users/9883")
    RETURN DOCUMENT("users/9915")
    RETURN DOCUMENT("users/10074")

There can only be a single RETURN statement here and a syntax error is raised if you try to execute it. The DOCUMENT() function offers a secondary signature to specify multiple document handles, so we could do:

    RETURN DOCUMENT( ["users/9883", "users/9915", "users/10074"] )

An array with the _ids of all 3 documents is passed to the function. Arrays are denoted by square brackets [ ] and their elements are separated by commas.

But what if we add more users? We would have to change the query to retrieve the newly added users as well. All we want to say with our query is: “For every user in the collection users, return the user document”. We can formulate this with a FOR loop:

    FOR user IN users
      RETURN user

It iterates over every document in users and uses user as the variable name, which we can use to refer to the current user document. It could also be called doc, u or ahuacatlguacamole; this is up to you. It is advisable to use a short and self-descriptive name, however.

The loop body tells the system to return the value of the variable user, which is a single user document. All user documents are returned this way:

    [
      {
        "_key": "9915",
        "_id": "users/9915",
        "_rev": "9915",
        "age": 27,
        "name": "Katie Foster"
      },
      {
        "_key": "9883",
        "_id": "users/9883",
        "_rev": "9883",
        "age": 32,
        "name": "John Smith"
      },
      {
        "_key": "10074",
        "_id": "users/10074",
        "_rev": "10074",
        "age": 69,
        "name": "James Hendrix"
      }
    ]

You may have noticed that the order of the returned documents is not necessarily the order in which they were inserted. No order is guaranteed unless you explicitly sort the documents. We can add a SORT operation very easily:

    FOR user IN users
      SORT user._key
      RETURN user

This still does not return the desired result: James (10074) is returned before John (9883) and Katie (9915). The reason is that the _key attribute is a string in ArangoDB, and not a number. The individual characters of the strings are compared. 1 is lower than 9 and the result is therefore “correct”. If we wanted to use the numerical value of the _key attributes instead, we could convert the string to a number and use it for sorting. There are some implications, however. We are better off sorting by something else. How about the age, in descending order?

    FOR user IN users
      SORT user.age DESC
      RETURN user

The users will be returned in the following order: James (69), John (32), Katie (27). Instead of DESC for descending order, ASC can be used for ascending order. ASC is the default though and can be omitted.
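The difference between sorting keys as strings and as numbers, and the descending sort by age, can be reproduced with a small plain-Python model. This is a sketch for illustration only, not ArangoDB driver code. It assumes that Python's character-by-character string comparison behaves like the comparison described above for these digit-only keys.

```python
# In-memory stand-in for the three documents in the users collection.
users = [
    {"_key": "9915", "name": "Katie Foster", "age": 27},
    {"_key": "9883", "name": "John Smith", "age": 32},
    {"_key": "10074", "name": "James Hendrix", "age": 69},
]

# Keys are strings, so they compare character by character: "1" < "9".
by_key_string = [u["_key"] for u in sorted(users, key=lambda u: u["_key"])]

# Converting each key to a number yields the numeric order instead.
by_key_number = [u["_key"] for u in sorted(users, key=lambda u: int(u["_key"]))]

# Counterpart of SORT user.age DESC.
by_age_desc = [u["name"] for u in sorted(users, key=lambda u: u["age"], reverse=True)]

print(by_key_string)  # ['10074', '9883', '9915']
print(by_key_number)  # ['9883', '9915', '10074']
print(by_age_desc)    # ['James Hendrix', 'John Smith', 'Katie Foster']
```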

We might want to limit the result set to a subset of users, based on the age attribute for example. Let’s return users older than 30 only:

    FOR user IN users
      FILTER user.age > 30
      SORT user.age
      RETURN user

This will return John and James (in this order). Katie’s age attribute does not fulfill the criterion (greater than 30): she is only 27 and therefore not part of the result set. We can change her age so that her user document is returned again, using a modification query:

    UPDATE "9915" WITH { age: 40 } IN users
    RETURN NEW

UPDATE lets you partially edit an existing document. There is also REPLACE, which would remove all attributes (except for _key and _id, which remain the same) and only add the specified ones. UPDATE on the other hand only replaces the specified attributes and keeps everything else as-is.

The UPDATE keyword is followed by the document key (or a document / object with a _key attribute) to identify what to modify. The attributes to update are written as an object after the WITH keyword. IN denotes the collection in which to perform this operation, just like INTO (both keywords are actually interchangeable here). The full document with the changes applied is returned if we use the NEW pseudo-variable:

    [
      {
        "_key": "9915",
        "_id": "users/9915",
        "_rev": "12864",
        "age": 40,
        "name": "Katie Foster"
      }
    ]

If we used REPLACE instead, the name attribute would be gone. With UPDATE, the attribute is kept (the same would apply to additional attributes if we had them).
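The difference between UPDATE and REPLACE can be modelled with plain Python dictionaries. This is a simplified sketch, not ArangoDB code: the function names update and replace are hypothetical, only top-level attributes are considered, and the _rev attribute is left out.

```python
SYSTEM_ATTRIBUTES = ("_key", "_id")  # kept by both operations in this model


def update(document: dict, changes: dict) -> dict:
    """Model of UPDATE: overwrite the given attributes, keep all others."""
    return {**document, **changes}


def replace(document: dict, new_content: dict) -> dict:
    """Model of REPLACE: keep only _key and _id, then add the new content."""
    kept = {k: document[k] for k in SYSTEM_ATTRIBUTES if k in document}
    return {**kept, **new_content}


katie = {"_key": "9915", "_id": "users/9915", "name": "Katie Foster", "age": 27}

print(update(katie, {"age": 40}))   # name is still present
print(replace(katie, {"age": 40}))  # name is gone
```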

Let us run our FILTER query again, but only return the user names this time:

    FOR user IN users
      FILTER user.age > 30
      SORT user.age
      RETURN user.name

This will return the names of all 3 users:

    [
      "John Smith",
      "Katie Foster",
      "James Hendrix"
    ]
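For readers who think in terms of a general-purpose language, the FOR / FILTER / SORT / RETURN steps of this query map roughly onto the following plain-Python sketch. It operates on an in-memory list that stands in for the collection and is not ArangoDB driver code.

```python
users = [
    {"name": "James Hendrix", "age": 69},
    {"name": "John Smith", "age": 32},
    {"name": "Katie Foster", "age": 40},  # age after the UPDATE query
]

older_than_30 = [u for u in users if u["age"] > 30]      # FILTER user.age > 30
ordered = sorted(older_than_30, key=lambda u: u["age"])  # SORT user.age
names = [u["name"] for u in ordered]                     # RETURN user.name

print(names)  # ['John Smith', 'Katie Foster', 'James Hendrix']
```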

It is called a projection if only a subset of attributes is returned. Another kind of projection is to change the structure of the results:

    FOR user IN users
      RETURN { userName: user.name, age: user.age }

The query defines the output format for every user document. The user name is returned as userName instead of name, while the age keeps its attribute key in this example:

    [
      {
        "userName": "James Hendrix",
        "age": 69
      },
      {
        "userName": "John Smith",
        "age": 32
      },
      {
        "userName": "Katie Foster",
        "age": 40
      }
    ]

It is also possible to compute new values:

    FOR user IN users
      RETURN CONCAT(user.name, "'s age is ", user.age)

CONCAT() is a function that can join elements together into a string. We use it here to return a statement for every user. As you can see, the result set does not always have to be an array of objects:

    [
      "James Hendrix's age is 69",
      "John Smith's age is 32",
      "Katie Foster's age is 40"
    ]
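Both kinds of result shaping shown above, renaming attributes and computing new string values, can be sketched in plain Python with list comprehensions. This is an in-memory model for illustration only, not a driver call.

```python
users = [
    {"name": "James Hendrix", "age": 69},
    {"name": "John Smith", "age": 32},
    {"name": "Katie Foster", "age": 40},
]

# Counterpart of RETURN { userName: user.name, age: user.age }
reshaped = [{"userName": u["name"], "age": u["age"]} for u in users]

# Counterpart of RETURN CONCAT(user.name, "'s age is ", user.age);
# the f-string converts the numeric age to text, as CONCAT() does in the query.
sentences = [f"{u['name']}'s age is {u['age']}" for u in users]

print(reshaped[0])   # {'userName': 'James Hendrix', 'age': 69}
print(sentences[0])  # James Hendrix's age is 69
```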

Now let’s do something crazy: for every document in the users collection, iterate over all user documents again and return user pairs, e.g. John and Katie. We can use a loop inside a loop for this to get the cross product (every possible combination of all user records, 3 * 3 = 9). We don’t want pairings like John + John however, so let’s eliminate them with a filter condition:

    FOR user1 IN users
      FOR user2 IN users
        FILTER user1 != user2
        RETURN [user1.name, user2.name]

We get 6 pairings. Pairs like James + John and John + James are basically redundant, but fair enough:

    [
      [ "James Hendrix", "John Smith" ],
      [ "James Hendrix", "Katie Foster" ],
      [ "John Smith", "James Hendrix" ],
      [ "John Smith", "Katie Foster" ],
      [ "Katie Foster", "James Hendrix" ],
      [ "Katie Foster", "John Smith" ]
    ]
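The arithmetic behind the nested loop (9 combinations, minus the 3 self-pairings, leaves 6) can be checked with a short plain-Python sketch using the standard itertools module. It compares names rather than whole documents, which is an assumption that only holds because the names are unique here.

```python
from itertools import combinations, product

names = ["James Hendrix", "John Smith", "Katie Foster"]

# Cross product of the list with itself: 3 * 3 = 9 combinations.
all_combinations = list(product(names, repeat=2))

# Dropping pairings of a user with themselves leaves 6 ordered pairs.
pairs = [[a, b] for a, b in all_combinations if a != b]

# If mirrored pairs are considered redundant, only 3 unordered pairs remain.
unordered_pairs = list(combinations(names, 2))

print(len(all_combinations), len(pairs), len(unordered_pairs))  # 9 6 3
```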

We could calculate the sum of both ages and compute something new this way:

    FOR user1 IN users
      FOR user2 IN users
        FILTER user1 != user2
        RETURN {
          pair: [user1.name, user2.name],
          sumOfAges: user1.age + user2.age
        }

We introduce a new attribute sumOfAges and add up both ages for the value:

    [
      {
        "pair": [ "James Hendrix", "John Smith" ],
        "sumOfAges": 101
      },
      {
        "pair": [ "James Hendrix", "Katie Foster" ],
        "sumOfAges": 109
      },
      {
        "pair": [ "John Smith", "James Hendrix" ],
        "sumOfAges": 101
      },
      {
        "pair": [ "John Smith", "Katie Foster" ],
        "sumOfAges": 72
      },
      {
        "pair": [ "Katie Foster", "James Hendrix" ],
        "sumOfAges": 109
      },
      {
        "pair": [ "Katie Foster", "John Smith" ],
        "sumOfAges": 72
      }
    ]

If we wanted to post-filter on the new attribute to only return pairs with a sum less than 100, we should define a variable to temporarily store the sum, so that we can use it in a FILTER statement as well as in the RETURN statement:

    FOR user1 IN users
      FOR user2 IN users
        FILTER user1 != user2
        LET sumOfAges = user1.age + user2.age
        FILTER sumOfAges < 100
        RETURN {
          pair: [user1.name, user2.name],
          sumOfAges: sumOfAges
        }

The LET keyword is followed by the designated variable name (sumOfAges), then there’s a = symbol and the value or an expression to define what value the variable is supposed to have. We re-use our expression to calculate the sum here. We then have another FILTER to skip the unwanted pairings and make use of the variable we declared before. We return a projection with an array of the user names and the calculated sum of ages, for which we use the variable again:

    [
      {
        "pair": [ "John Smith", "Katie Foster" ],
        "sumOfAges": 72
      },
      {
        "pair": [ "Katie Foster", "John Smith" ],
        "sumOfAges": 72
      }
    ]

Pro tip: when defining objects, if the desired attribute key and the variable to use for the attribute value are the same, you can use a shorthand notation: { sumOfAges } instead of { sumOfAges: sumOfAges }.
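The role of LET, computing a value once and then using it in both a later FILTER and the RETURN, corresponds to an ordinary local variable. The following plain-Python sketch models the query on an in-memory list. It is an illustration under that assumption, not ArangoDB driver code.

```python
from itertools import product

users = [
    {"name": "James Hendrix", "age": 69},
    {"name": "John Smith", "age": 32},
    {"name": "Katie Foster", "age": 40},
]

result = []
for user1, user2 in product(users, repeat=2):    # the two nested FOR loops
    if user1 == user2:                           # FILTER user1 != user2
        continue
    sum_of_ages = user1["age"] + user2["age"]    # LET sumOfAges = ...
    if sum_of_ages < 100:                        # FILTER sumOfAges < 100
        result.append({
            "pair": [user1["name"], user2["name"]],
            "sumOfAges": sum_of_ages,            # the variable is reused here
        })

print(result)
```

Only the two pairings of John and Katie remain, each with a sum of 72, matching the query result above.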

Finally, let’s delete one of the user documents:

    REMOVE "9883" IN users

It deletes the user John (_key: "9883"). We could also remove documents in a loop (the same goes for INSERT, UPDATE and REPLACE):

    FOR user IN users
      FILTER user.age >= 30
      REMOVE user IN users

The query deletes all users whose age is greater than or equal to 30.
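The effect of the two removal queries on the example data can be traced with a plain-Python model, in which a dictionary keyed by _key stands in for the collection. This is a sketch for illustration only. Note that with the example data, after Katie's age was changed to 40, every remaining user matches the filter, so the collection ends up empty.

```python
users = {
    "9915": {"name": "Katie Foster", "age": 40},
    "9883": {"name": "John Smith", "age": 32},
    "10074": {"name": "James Hendrix", "age": 69},
}

# Counterpart of REMOVE "9883" IN users
del users["9883"]

# Counterpart of the FOR / FILTER / REMOVE loop. The matching keys are
# collected first so the dictionary is not modified while iterating over it.
keys_to_remove = [key for key, user in users.items() if user["age"] >= 30]
for key in keys_to_remove:
    del users[key]

print(keys_to_remove)  # ['9915', '10074']
print(users)           # {}
```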

How to continue

There is a lot more to discover in AQL and much more functionality that ArangoDB offers. Continue reading the other chapters and experiment with a test database to foster your knowledge.

If you want to write more AQL queries right now, have a look here:

ArangoDB programs

The ArangoDB package comes with the following programs:

  • arangod: The ArangoDB database daemon. This server program is intended to run as a daemon process and to serve the various client connections to the server via TCP / HTTP.

  • arangosh: The ArangoDB shell. A client that implements a read-eval-print loop (REPL) and provides functions to access and administer the ArangoDB server.

  • arangoimp: A bulk importer for the ArangoDB server. It supports JSON and CSV.

  • arangodump: A tool to create backups of an ArangoDB database in JSON format.

  • arangorestore: A tool to load data of a backup back into an ArangoDB database.

  • arango-dfdb: A datafile debugger for ArangoDB. It is primarily intended to be used during development of ArangoDB.

  • arangobench: A benchmark and test tool. It can be used for performance and server function testing.