Satellite Collections

This feature is only available in the Enterprise Edition

When doing joins in an ArangoDB cluster, data has to be exchanged between different servers.

Joins will be executed on a coordinator. It will prepare an execution plan and execute it. When executing, the coordinator will contact all shards of the starting point of the join and ask for their data. The database servers carrying out this operation will load all their local data and then ask the cluster for the other part of the join. This request will again be distributed to all shards involved in that join part.
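As a concrete illustration, such a distributed join could be expressed as an AQL query like the following (the collection names orders and customers and the attribute customerId are only placeholders for this sketch):

  FOR order IN orders
    FOR customer IN customers
      FILTER order.customerId == customer._key
      RETURN { order: order, customer: customer }

With both collections sharded, each shard of orders has to ask for the matching customers data that lives on other servers.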

In sum this results in a lot of network traffic and slow query results, depending on the amount of data that has to be sent throughout the cluster.

Satellite collections are collections that are intended to address this issue.

They use synchronous replication to replicate all of their data to every database server that is part of the cluster.

This enables the database servers to execute that part of any join locally.

This greatly improves the performance of such joins, at the cost of increased storage requirements and poorer write performance on this data.

To create a satellite collection, set the replicationFactor of the collection to "satellite".

Using arangosh:

  1. arangosh> db._create("satellite", {"replicationFactor": "satellite"});

A full example

  arangosh> var explain = require("@arangodb/aql/explainer").explain
  arangosh> db._create("satellite", {"replicationFactor": "satellite"})
  arangosh> db._create("nonsatellite", {numberOfShards: 8})
  arangosh> db._create("nonsatellite2", {numberOfShards: 8})

Let’s analyse a normal join not involving satellite collections:

  arangosh> explain("FOR doc in nonsatellite FOR doc2 in nonsatellite2 RETURN 1")

  Query string:
   FOR doc in nonsatellite FOR doc2 in nonsatellite2 RETURN 1

  Execution plan:
   Id   NodeType                  Site   Est.   Comment
    1   SingletonNode             DBS       1   * ROOT
    4   CalculationNode           DBS       1     - LET #2 = 1   /* json expression */   /* const assignment */
    2   EnumerateCollectionNode   DBS       0     - FOR doc IN nonsatellite   /* full collection scan */
   12   RemoteNode                COOR      0     - REMOTE
   13   GatherNode                COOR      0     - GATHER
    6   ScatterNode               COOR      0     - SCATTER
    7   RemoteNode                DBS       0     - REMOTE
    3   EnumerateCollectionNode   DBS       0     - FOR doc2 IN nonsatellite2   /* full collection scan */
    8   RemoteNode                COOR      0     - REMOTE
    9   GatherNode                COOR      0     - GATHER
    5   ReturnNode                COOR      0     - RETURN #2

  Indexes used:
   none

  Optimization rules applied:
   Id   RuleName
    1   move-calculations-up
    2   scatter-in-cluster
    3   remove-unnecessary-remote-scatter

All shards involved in querying the nonsatellite collection will fan out via the coordinator to the shards of nonsatellite2. In sum, 8 shards will open 8 connections to the coordinator, asking for the results of the nonsatellite2 join. The coordinator will then fan out to the 8 shards of nonsatellite2. So there will be quite some network traffic.

Let’s now have a look at the same using satellite collections:

  1. arangosh> db._query("FOR doc in nonsatellite FOR doc2 in satellite RETURN 1")
  2. Query string:
  3. FOR doc in nonsatellite FOR doc2 in satellite RETURN 1
  4. Execution plan:
  5. Id NodeType Site Est. Comment
  6. 1 SingletonNode DBS 1 * ROOT
  7. 4 CalculationNode DBS 1 - LET #2 = 1 /* json expression */ /* const assignment */
  8. 2 EnumerateCollectionNode DBS 0 - FOR doc IN nonsatellite /* full collection scan */
  9. 3 EnumerateCollectionNode DBS 0 - FOR doc2 IN satellite /* full collection scan, satellite */
  10. 8 RemoteNode COOR 0 - REMOTE
  11. 9 GatherNode COOR 0 - GATHER
  12. 5 ReturnNode COOR 0 - RETURN #2
  13. Indexes used:
  14. none
  15. Optimization rules applied:
  16. Id RuleName
  17. 1 move-calculations-up
  18. 2 scatter-in-cluster
  19. 3 remove-unnecessary-remote-scatter
  20. 4 remove-satellite-joins

In this scenario all shards of nonsatellite will be contacted. However, as the join is a satellite join, all shards can do the join locally because the data is replicated to all servers, reducing the network overhead dramatically.
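The same applies to more realistic satellite joins that restrict the inner loop with a filter; the optimizer should still be able to remove the satellite join and let each DBServer work on its local copy (the attribute name ref below is only a placeholder):

  arangosh> explain("FOR doc IN nonsatellite FOR doc2 IN satellite FILTER doc.ref == doc2._key RETURN doc2")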

Caveats

The cluster will automatically keep all satellite collections on all servers in sync by using synchronous replication. This means that writes will be executed on the leader only, and this server will coordinate replication to the followers. If a follower doesn't answer in time (due to network problems, temporary shutdown etc.) it may be removed as a follower. This is reported to the Agency.

The follower (once back in business) will then periodically check the Agency and know that it is out of sync. It will then automatically catch up. This may take a while depending on how much data has to be synced. When doing a join involving the satellite collection, you can specify how long the DBServer is allowed to wait for the sync until the query is aborted.

Check Accessing Cursors for details.
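As a minimal sketch, assuming the satelliteSyncWait query option (a value in seconds) can be passed via the options argument of db._query, this limit could be set per query like this:

  arangosh> // assumption: satelliteSyncWait is accepted as a query option here
  arangosh> db._query("FOR doc in nonsatellite FOR doc2 in satellite RETURN 1", {}, { satelliteSyncWait: 60 })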

During a network failure there is also a minimal chance that a query was properly distributed to the DBServers, but that a previous satellite write could not be replicated to a follower and the leader dropped the follower. The follower, however, only checks every few seconds whether it is really in sync, so it might indeed deliver stale results.