Import Database of Beers in OrientDB

Import the database of Beers - 图1

First, create a new folder somewhere on your hard drive. For this test we’ll assume /temp/openbeer.

  1. $ mkdir /temp/openbeer

Download Beers Database in CSV format

  1. $ curl http://openbeerdb.com/data_files/openbeerdb_csv.zip > openbeerdb_csv.zip
  2. $ unzip openbeerdb_csv.zip

Install OrientDB

  1. $ curl "http://orientdb.com/download.php?email=unknown@unknown.com&file=orientdb-community-2.0.9.zip&os=multi" > orientdb-community-2.0.9.zip
  2. $ unzip orientdb-community-2.0.9.zip

Import Beer Categories

These are the first 2 lines of categories.csv file:

  1. "id","cat_name","last_mod"
  2. "1","British Ale","2010-10-24 13:50:10"

In order to import this file in OrientDB, we have to create the following file as categories.json:

  1. {
  2. "source": { "file": { "path": "/temp/openbeer/openbeerdb_csv/categories.csv" } },
  3. "extractor": { "csv": {} },
  4. "transformers": [
  5. { "vertex": { "class": "Category" } }
  6. ],
  7. "loader": {
  8. "orientdb": {
  9. "dbURL": "plocal:../databases/openbeerdb",
  10. "dbType": "graph",
  11. "classes": [
  12. {"name": "Category", "extends": "V"}
  13. ], "indexes": [
  14. {"class":"Category", "fields":["id:integer"], "type":"UNIQUE" }
  15. ]
  16. }
  17. }
  18. }

Now to import it into OrientDB, move into the “bin” directory of OrientDB distribution.

  1. $ cd orientdb-community-2.0.9/bin

And run OrientDB ETL.

  1. $ ./oetl.sh /temp/openbeer/categories.json
  2. OrientDB etl v.2.0.9 (build @BUILD@) www.orientechnologies.com
  3. BEGIN ETL PROCESSOR
  4. END ETL PROCESSOR
  5. + extracted 12 rows (0 rows/sec) - 12 rows -> loaded 11 vertices (0 vertices/sec) Total time: 77ms [0 warnings, 0 errors]

Import Beer Styles

Now let’s import the Beer Styles. These are the first 2 lines of the styles.csv file.

  1. "id","cat_id","style_name","last_mod"
  2. "1","1","Classic English-Style Pale Ale","2010-10-24 13:53:31"

In this case, we’ll correlate the Style with the Category created earlier. This is the styles.json to use with OrientDB ETL for the next step.

  1. {
  2. "source": { "file": { "path": "/temp/openbeer/openbeerdb_csv/styles.csv" } },
  3. "extractor": { "csv": {} },
  4. "transformers": [
  5. { "vertex": { "class": "Style" } },
  6. { "edge": { "class": "HasCategory", "joinFieldName": "cat_id", "lookup": "Category.id" } }
  7. ],
  8. "loader": {
  9. "orientdb": {
  10. "dbURL": "plocal:../databases/openbeerdb",
  11. "dbType": "graph",
  12. "classes": [
  13. {"name": "Style", "extends": "V"},
  14. {"name": "HasCategory", "extends": "E"}
  15. ], "indexes": [
  16. {"class":"Style", "fields":["id:integer"], "type":"UNIQUE" }
  17. ]
  18. }
  19. }
  20. }

Now import the styles.

  1. $ ./oetl.sh /temp/openbeer/styles.json
  2. OrientDB etl v.2.0.9 (build @BUILD@) www.orientechnologies.com
  3. BEGIN ETL PROCESSOR
  4. END ETL PROCESSOR
  5. + extracted 142 rows (0 rows/sec) - 142 rows -> loaded 141 vertices (0 vertices/sec) Total time: 498ms [0 warnings, 0 errors]

Import Breweries

Now it’s time for the Breweries. These are the first 2 lines of the breweries.csv file.

  1. "id","name","address1","address2","city","state","code","country","phone","website","filepath","descript","last_mod"
  2. "1","(512) Brewing Company","407 Radam, F200",,"Austin","Texas","78745","United States","512.707.2337","http://512brewing.com/",,"(512) Brewing Company is a microbrewery located in the heart of Austin that brews for the community using as many local, domestic and organic ingredients as possible.","2010-07-22 20:00:20"

Breweries have no outgoing relations with other entities, so this is a plain import similar to categories. This is the breweries.json to use with OrientDB ETL for the next step.

  1. {
  2. "source": { "file": { "path": "/temp/openbeer/openbeerdb_csv/breweries.csv" } },
  3. "extractor": { "csv": {} },
  4. "transformers": [
  5. { "vertex": { "class": "Brewery" } }
  6. ],
  7. "loader": {
  8. "orientdb": {
  9. "dbURL": "plocal:../databases/openbeerdb",
  10. "dbType": "graph",
  11. "classes": [
  12. {"name": "Brewery", "extends": "V"}
  13. ], "indexes": [
  14. {"class":"Brewery", "fields":["id:integer"], "type":"UNIQUE" }
  15. ]
  16. }
  17. }
  18. }

Run the import for breweries.

  1. $ ./oetl.sh /temp/openbeer/breweries.json
  2. OrientDB etl v.2.0.9 (build @BUILD@) www.orientechnologies.com
  3. BEGIN ETL PROCESSOR
  4. END ETL PROCESSOR
  5. + extracted 1.395 rows (0 rows/sec) - 1.395 rows -> loaded 1.394 vertices (0 vertices/sec) Total time: 830ms [0 warnings, 0 errors]

Import Beers

Now it’s time for the last and most important file: the Beers! These are the first 2 lines of the beers.csv file.

  1. "id","brewery_id","name","cat_id","style_id","abv","ibu","srm","upc","filepath","descript","last_mod",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
  2. "1","812","Hocus Pocus","11","116","4.5","0","0","0",,"Our take on a classic summer ale. A toast to weeds, rays, and summer haze. A light, crisp ale for mowing lawns, hitting lazy fly balls, and communing with nature, Hocus Pocus is offered up as a summer sacrifice to clodless days.

As you can see each beer is connected to other entities through the following fields:

  • brewery_id -> Brewery
  • cat_id -> Category
  • style_id -> Style

This is the breweries.json to use with OrientDB ETL for the next step.

  1. {
  2. "config" : { "haltOnError": false },
  3. "source": { "file": { "path": "/temp/openbeer/openbeerdb_csv/beers.csv" } },
  4. "extractor": { "csv": { "columns": ["id","brewery_id","name","cat_id","style_id","abv","ibu","srm","upc","filepath","descript","last_mod"],
  5. "columnsOnFirstLine": true } },
  6. "transformers": [
  7. { "vertex": { "class": "Beer" } },
  8. { "edge": { "class": "HasCategory", "joinFieldName": "cat_id", "lookup": "Category.id" } },
  9. { "edge": { "class": "HasBrewery", "joinFieldName": "brewery_id", "lookup": "Brewery.id" } },
  10. { "edge": { "class": "HasStyle", "joinFieldName": "style_id", "lookup": "Style.id" } }
  11. ],
  12. "loader": {
  13. "orientdb": {
  14. "dbURL": "plocal:../databases/openbeerdb",
  15. "dbType": "graph",
  16. "classes": [
  17. {"name": "Beer", "extends": "V"},
  18. {"name": "HasCategory", "extends": "E"},
  19. {"name": "HasStyle", "extends": "E"},
  20. {"name": "HasBrewery", "extends": "E"}
  21. ], "indexes": [
  22. {"class":"Beer", "fields":["id:integer"], "type":"UNIQUE" }
  23. ]
  24. }
  25. }
  26. }

Run the final import for beers.

  1. $ ./oetl.sh /temp/openbeer/beers.json
  2. OrientDB etl v.2.0.9 (build @BUILD@) www.orientechnologies.com
  3. BEGIN ETL PROCESSOR
  4. ...
  5. + extracted 5.862 rows (1.041 rows/sec) - 5.862 rows -> loaded 4.332 vertices (929 vertices/sec) Total time: 10801ms [0 warnings, 27 errors]
  6. END ETL PROCESSOR

_Note: 27 errors are due to the 27 wrong content lines that have no id.

This database is available online. Install it with:

  • Studio: in the login page press the “Cloud” button, put server’s credential and press on download button on “OpenBeer” line
  • Download it manually from http://orientdb.com/public-databases/OpenBeer.zip and unzip it in a OpenBeer folder inside OrientDB’s server “databases” directory