Arangoimport Examples: CSV / TSV

Importing CSV Data

arangoimport offers the possibility to import data from CSV files. Thiscomes handy when the data at hand is in CSV format already and you don’t want tospend time converting them to JSON for the import.

To import data from a CSV file, make sure your file contains the attribute namesin the first row. All the following lines in the file will be interpreted asdata records and will be imported.

The CSV import requires the data to have a homogeneous structure. All recordsmust have exactly the same amount of columns as there are headers. By default,lines with a different number of values will not be imported and there will be warnings for them. To still import lines with less values than in the header,there is the —ignore-missing option. If set to true, lines that have adifferent amount of fields will be imported. In this case only those attributeswill be populated for which there are values. Attributes for which there areno values present will silently be discarded.

Example:

  1. "first","last","age","active","dob"
  2. "John","Connor",25,true
  3. "Jim","O'Brady"

With —ignore-missing this will produce the following documents:

  1. { "first" : "John", "last" : "Connor", "active" : true, "age" : 25 }
  2. { "first" : "Jim", "last" : "O'Brady" }

The cell values can have different data types though. If a cell does not haveany value, it can be left empty in the file. These values will not be importedso the attributes will not “be there” in document created. Values enclosed inquotes will be imported as strings, so to import numeric values, boolean valuesor the null value, don’t enclose the value in quotes in your file.

We’ll be using the following import for the CSV import:

  1. "first","last","age","active","dob"
  2. "John","Connor",25,true,
  3. "Jim","O'Brady",19,,
  4. "Lisa","Jones",,,"1981-04-09"
  5. Hans,dos Santos,0123,,
  6. Wayne,Brewer,,false,

The command line to execute the import is:

  1. arangoimport --file "data.csv" --type csv --collection "users"

The above data will be imported into 5 documents which will look as follows:

  1. { "first" : "John", "last" : "Connor", "active" : true, "age" : 25 }
  2. { "first" : "Jim", "last" : "O'Brady", "age" : 19 }
  3. { "first" : "Lisa", "last" : "Jones", "dob" : "1981-04-09" }
  4. { "first" : "Hans", "last" : "dos Santos", "age" : 123 }
  5. { "first" : "Wayne", "last" : "Brewer", "active" : false }

As can be seen, values left completely empty in the input file will be treatedas absent. Numeric values not enclosed in quotes will be treated as numbers.Note that leading zeros in numeric values will be removed. To import numberswith leading zeros, please use strings.The literals true and false will be treated as booleans if they are notenclosed in quotes. Other values not enclosed in quotes will be treated asstrings.Any values enclosed in quotes will be treated as strings, too.

String values containing the quote character or the separator must be enclosedwith quote characters. Within a string, the quote character itself must beescaped with another quote character (or with a backslash if the _—backslash-escape_option is used).

Note that the quote and separator characters can be adjusted via the—quote and —separator arguments when invoking arangoimport. The quotecharacter defaults to the double quote (). To use a literal quote in astring, you can use two quote characters.To use backslash for escaping quote characters, please set the option—backslash-escape to true.

The importer supports Windows (CRLF) and Unix (LF) line breaks. Line breaks mightalso occur inside values that are enclosed with the quote character.

Here’s an example for using literal quotes and newlines inside values:

  1. "name","password"
  2. "Foo","r4ndom""123!"
  3. "Bar","wow!
  4. this is a
  5. multine password!"
  6. "Bartholomew ""Bart"" Simpson","Milhouse"

Extra whitespace at the end of each line will be ignored. Whitespace at thestart of lines or between field values will not be ignored, so please make surethat there is no extra whitespace in front of values or between them.

Importing TSV Data

You may also import tab-separated values (TSV) from a file. This format is verysimple: every line in the file represents a data record. There is no quoting orescaping. That also means that the separator character (which defaults to thetabstop symbol) must not be used anywhere in the actual data.

As with CSV, the first line in the TSV file must contain the attribute names,and all lines must have an identical number of values.

If a different separator character or string should be used, it can be specifiedwith the —separator argument.

An example command line to execute the TSV import is:

  1. arangoimport --file "data.tsv" --type tsv --collection "users"

Attribute Name Translation

For the CSV and TSV input formats, attribute names can be translated automatically.This is useful in case the import file has different attribute names than thosethat should be used in ArangoDB.

A common use case is to rename an “id” column from the input file into “_key” asit is expected by ArangoDB. To do this, specify the following translation wheninvoking arangoimport:

  1. arangoimport --file "data.csv" --type csv --translate "id=_key"

Other common cases are to rename columns in the input file to from_ and to_:

  1. arangoimport --file "data.csv" --type csv --translate "from=_from" --translate "to=_to"

The translate option can be specified multiple types. The source attribute nameand the target attribute must be separated with a =.

Ignoring Attributes

For the CSV and TSV input formats, certain attribute names can be ignored on imports.In an ArangoDB cluster there are cases where this can come in handy,when your documents already contain a key attributeand your collection has a sharding attribute other than _key: In the cluster thisconfiguration is not supported, because ArangoDB needs to guarantee the uniqueness of the _keyattribute in _all shards of the collection.

  1. arangoimport --file "data.csv" --type csv --remove-attribute "_key"

The same thing would apply if your data contains an _id attribute:

  1. arangoimport --file "data.csv" --type csv --remove-attribute "_id"