Adjacency Matrix Aggregation

A bucket aggregation returning a form of adjacency matrix. The request provides a collection of named filter expressions, similar to the filters aggregation request. Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

Given filters named A, B and C the response would return buckets with the following names:

        A      B      C
  A     A      A&B    A&C
  B            B      B&C
  C                   C

The intersecting buckets, e.g. A&C, are labelled using a combination of the two filter names separated by the ampersand character. Note that the response does not also include a “C&A” bucket, as this would be the same set of documents as “A&C”. The matrix is symmetric, so we only return half of it. To do this we sort the filter name strings and always use the lower of each pair as the value to the left of the “&” separator.

An alternative separator parameter can be passed in the request if clients wish to use a separator string other than the default of the ampersand.
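The naming rule and the separator option described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the server's implementation; the function name `bucket_keys` is hypothetical, and the ordering of the returned list is arbitrary. It lists every candidate key, whereas a real response contains only the non-empty ones.

```python
from itertools import combinations


def bucket_keys(filter_names, separator="&"):
    """Return candidate bucket keys: each filter alone, then each pair once.

    Hypothetical helper for illustration; it mirrors the naming rule only.
    """
    names = sorted(filter_names)
    keys = list(names)
    # combinations() over a sorted list yields each pair exactly once,
    # with the lower name on the left, so "C&A" is never produced.
    keys += [f"{a}{separator}{b}" for a, b in combinations(names, 2)]
    return keys


print(bucket_keys(["C", "A", "B"]))
print(bucket_keys(["C", "A", "B"], separator="|"))
```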

Example:

  PUT /emails/_bulk?refresh
  { "index" : { "_id" : 1 } }
  { "accounts" : ["hillary", "sidney"]}
  { "index" : { "_id" : 2 } }
  { "accounts" : ["hillary", "donald"]}
  { "index" : { "_id" : 3 } }
  { "accounts" : ["vladimir", "donald"]}

  GET emails/_search
  {
    "size": 0,
    "aggs" : {
      "interactions" : {
        "adjacency_matrix" : {
          "filters" : {
            "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
            "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
            "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
          }
        }
      }
    }
  }

In the above example, we analyse email messages to see which groups of individuals have exchanged messages. We will get counts for each group individually and also a count of messages for pairs of groups that have recorded interactions.

Response:

  {
    "took": 9,
    "timed_out": false,
    "_shards": ...,
    "hits": ...,
    "aggregations": {
      "interactions": {
        "buckets": [
          {
            "key":"grpA",
            "doc_count": 2
          },
          {
            "key":"grpA&grpB",
            "doc_count": 1
          },
          {
            "key":"grpB",
            "doc_count": 2
          },
          {
            "key":"grpB&grpC",
            "doc_count": 1
          },
          {
            "key":"grpC",
            "doc_count": 1
          }
        ]
      }
    }
  }

Usage

On its own this aggregation can provide all of the data required to create an undirected weighted graph. However, when used with child aggregations such as a date_histogram the results can provide the additional levels of data required to perform dynamic network analysis where examining interactions over time becomes important.
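As a rough illustration of the graph interpretation, the sketch below turns the buckets from the example response into node weights and weighted undirected edges. The function name `buckets_to_graph` is hypothetical, and the sketch assumes that no filter name contains the separator string.

```python
def buckets_to_graph(buckets, separator="&"):
    """Split buckets into node weights and undirected edge weights.

    Assumes filter names never contain the separator themselves.
    """
    nodes, edges = {}, {}
    for bucket in buckets:
        key, weight = bucket["key"], bucket["doc_count"]
        if separator in key:
            a, b = key.split(separator, 1)
            # The pair is already in sorted order, so (a, b) is a
            # canonical identifier for the undirected edge.
            edges[(a, b)] = weight
        else:
            nodes[key] = weight
    return nodes, edges


# Buckets copied from the example response.
buckets = [
    {"key": "grpA", "doc_count": 2},
    {"key": "grpA&grpB", "doc_count": 1},
    {"key": "grpB", "doc_count": 2},
    {"key": "grpB&grpC", "doc_count": 1},
    {"key": "grpC", "doc_count": 1},
]
nodes, edges = buckets_to_graph(buckets)
print(nodes)
print(edges)
```

With a child aggregation such as a date_histogram, the same conversion could be applied per time bucket to obtain one graph per interval.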

Limitations

For N filters, the matrix of buckets produced can contain roughly N²/2 entries, so a default maximum of 100 filters is imposed. This setting can be changed using the index.max_adjacency_matrix_filters index-level setting (note that this setting is deprecated and will be replaced with indices.query.bool.max_clause_count in 8.0+).
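The growth in bucket count can be checked with a little arithmetic. Counting the N single-filter buckets plus one bucket per unordered pair gives N(N+1)/2 possible buckets, which is the exact figure behind the N²/2 approximation. The helper name `max_buckets` below is hypothetical, and the result is an upper bound, since empty cells are not returned.

```python
def max_buckets(n_filters):
    # N single-filter buckets plus N*(N-1)/2 pairwise intersections.
    return n_filters * (n_filters + 1) // 2


print(max_buckets(3))    # 6 possible buckets for filters A, B and C
print(max_buckets(100))  # 5050 possible buckets at the default limit
```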