JSON Expressions

Sample data

  1. {
  2. "name": {"first": "Tom", "last": "Anderson"},
  3. "age":37,
  4. "children": ["Sara","Alex","Jack"],
  5. "fav.movie": "Deer Hunter",
  6. "friends": [
  7. {"first": "Dale", "last": "Murphy", "age": 44},
  8. {"first": "Roger", "last": "Craig", "age": 68},
  9. {"first": "Jane", "last": "Murphy", "age": 47}
  10. ],
  11. "followers": {
  12. "Group1": [
  13. {"first": "John", "last": "Shavor", "age": 22},
  14. {"first": "Ken", "last": "Miller", "age": 33}
  15. ],
  16. "Group2": [
  17. {"first": "Alice", "last": "Murphy", "age": 33},
  18. {"first": "Brian", "last": "Craig", "age": 44}
  19. ]
  20. }
  21. "ops": {
  22. "functionA": {"numArgs": 2},
  23. "functionB": {"numArgs": 3},
  24. "functionC": {"variadic": true}
  25. }
  26. }

Basic expressions

Identifier

Source Dereference (.) The source dereference operator can be used to specify columns by dereferencing the source stream or table. The -> dereference selects a key in a nested JSON object.

  1. SELECT demo.age FROM demo
  2. {"age" : 37}
  1. SELECT demo.name->first FROM demo
  2. {"first" : "Tom"}
  1. SELECT name->first AS fname FROM demo
  2. {"fname": "Tom"}

Index expression

Index Expressions allow you to select a specific element in a list. It should look similar to array access in common programming languages.The index value starts with 0, -1 is the starting position from the end, and so on.

  1. SELECT children FROM demo
  2. {
  3. "children": ["Sara","Alex","Jack"]
  4. }
  1. SELECT children[0] FROM demo
  2. {
  3. "children": "Sara"
  4. }
  5. SELECT children[1] FROM demo
  6. {
  7. "children": "Alex"
  8. }
  9. SELECT children[-1] FROM demo
  10. {
  11. "children": "Jack"
  12. }
  13. SELECT children[-2] FROM demo
  14. {
  15. "children": "Alex"
  16. }
  17. SELECT d.friends[0]->last FROM demo AS d
  18. {
  19. "last" : "Murphy"
  20. }

Slicing

Slices allow you to select a contiguous subset of an array.

field[from:to)is the interval before closing and opening, excluding to. If from is not specified, then it means start from the 1st element of array; If to is not specified, then it means end with the last element of array.

  1. SELECT children[0:1] FROM demo
  2. {
  3. "children": ["Sara"]
  4. }
  5. SELECT children[1:-1] FROM demo
  6. {
  7. "children": ["Alex"]
  8. }
  9. SELECT children[0:-1] FROM demo
  10. {
  11. "children": ["Sara","Alex"]
  12. }
  1. SELECT children[:] FROM demo == SELECT children FROM demo
  2. {
  3. "children": ["Sara","Alex","Jack"]
  4. }
  1. SELECT children[:2] FROM demo
  2. {
  3. "children": ["Sara","Alex"]
  4. }
  1. SELECT followers->Group1[:1]->first FROM demo
  2. {
  3. "first": ["John"]
  4. }

Json Path functions

Kuiper provides a list of functions to allow to execute json path over struct or array columns or values. The functions are:

  1. json_path_exists(col, jsonpath)
  2. json_path_query(col, jsonpath)
  3. json_path_query_first(col, jsonpath)

Please refer to json functions for detail.

All these functions share the same parameter signatures, among which, the second parameter is a jsonpath string. The jsonpath grammer used by Kuiper is based on JsonPathJSON Expressions - 图1 (opens new window).

The basic grammar of those expressions is to use the keys part of the JSON objects combined with some elements:

  • Dots . to move into a tree
  • Brackets [] for access to a given array member coupled with a position. It can also access to a map field.
  • Variables, with $ representing a JSON text and @ for result path evaluations.

So for example, when applied to the previous JSON data sample we can reach the following parts of the tree with these expressions:

  • $.age refers to 37.
  • $.friends.first refers to “dale”.
  • $.friends refers to the full array of friends.
  • $.friends[0] refers to the first friend listed in the previous array (contrary to arrays members are zero-based).
  • $.friends[0][lastname] refers to the lastname of the first friend listed. Use bracket if there are reserved words or special characters (such as space ‘ ‘, ‘.’ and Chinese etc) in the field key.
  • $.friends[? @.age>60].first or $.friends[? (@.age>60)].first refers to the first name of the friends whose age is bigger than 60. Notice that the space between ? and the condition is required even the condition is with braces.

Developers can use the json functions in the SQL statement. Here are some examples.

  • Select the lastname of group1 followers
  1. SELECT json_path_query(followers, "$.Group1[*].last") FROM demo
  2. ["Shavor","Miller"]
  • Select the lastname if any of the group1 followers is older than 60
  1. SELECT name->last FROM demo where json_path_exists(followers, "$.Group1[? @.age>30]")
  2. "Anderson"
  • Select the follower’s lastname from group1 whose age is bigger than 30
  1. SELECT json_path_exists(followers, "$.Group1[? @.age>30].last") FROM demo
  2. ["Miller"]
  • Assume there is a field in follows with reserved words or chars like dot my.follower, use bracket to access it.
  1. SELECT json_path_exists(followers, "$[\"my.follower\"]") FROM demo
  2. ["Miller"]

Projections - NOT SUPPORT YET

List & Slice projections

A wildcard expression creates a list projection, which is a projection over a JSON array.

  1. SELECT demo.friends[*]->first FROM demo
  2. {
  3. "first": ["Dale", "Roger", "Jane"]
  4. }
  1. SELECT friends[:1]->first FROM demo
  2. {
  3. "first": ["Dale", "Roger"]
  4. }

Object projections

  1. SELECT ops->*->numArgs FROM demo
  2. { "numArgs" : [2, 3] }