Geo search

Table of contents

Introduction

CrateDB can be used to store and query geographical information of many kinds using the geo_point and geo_shape types. With these it is possible to store geographical locations, ways, shapes, areas and other entities. These can be queried for distance, containment, intersection and so on, making it possible to create apps and services with rich geographical features.

Geographic shapes are stored using special indices. Geographic points are represented by their coordinates. They are represented as columns of the respective datatypes.

Geographic indices for geo_shape columns are used in order to speed up geographic searches even on complex shapes. This indexing process results in a representation that is not exact (See geo_shape for details). CrateDB does not operate on vector shapes but on a kind of a grid with the given precision as resolution.

Creating tables containing geographic information is straightforward:

  1. cr> CREATE TABLE country (
  2. ... name text,
  3. ... country_code text primary key,
  4. ... shape geo_shape INDEX USING "geohash" WITH (precision='100m'),
  5. ... capital text,
  6. ... capital_location geo_point
  7. ... ) WITH (number_of_replicas=0);
  8. CREATE OK, 1 row affected (... sec)

This table will contain the shape of a country and the location of its capital alongside with other metadata. The shape is indexed with a maximum precision of 100 meters using a geohash index (For more information, see Geo shape index structure).

Let’s insert Austria:

  1. cr> INSERT INTO country (name, country_code, shape, capital, capital_location)
  2. ... VALUES (
  3. ... 'Austria',
  4. ... 'at',
  5. ... {type='Polygon', coordinates=[
  6. ... [[16.979667, 48.123497], [16.903754, 47.714866],
  7. ... [16.340584, 47.712902], [16.534268, 47.496171],
  8. ... [16.202298, 46.852386], [16.011664, 46.683611],
  9. ... [15.137092, 46.658703], [14.632472, 46.431817],
  10. ... [13.806475, 46.509306], [12.376485, 46.767559],
  11. ... [12.153088, 47.115393], [11.164828, 46.941579],
  12. ... [11.048556, 46.751359], [10.442701, 46.893546],
  13. ... [9.932448, 46.920728], [9.47997, 47.10281],
  14. ... [9.632932, 47.347601], [9.594226, 47.525058],
  15. ... [9.896068, 47.580197], [10.402084, 47.302488],
  16. ... [10.544504, 47.566399], [11.426414, 47.523766],
  17. ... [12.141357, 47.703083], [12.62076, 47.672388],
  18. ... [12.932627, 47.467646], [13.025851, 47.637584],
  19. ... [12.884103, 48.289146], [13.243357, 48.416115],
  20. ... [13.595946, 48.877172], [14.338898, 48.555305],
  21. ... [14.901447, 48.964402], [15.253416, 49.039074],
  22. ... [16.029647, 48.733899], [16.499283, 48.785808],
  23. ... [16.960288, 48.596982], [16.879983, 48.470013],
  24. ... [16.979667, 48.123497]]
  25. ... ]},
  26. ... 'Vienna',
  27. ... [16.372778, 48.209206]
  28. ... );
  29. INSERT OK, 1 row affected (... sec)

Caution

Geoshapes has to be fully valid by ISO 19107. If you have problems importing geo data, they may not be fully valid. In most cases they could be repaired using this tool: https://github.com/tudelft3d/prepair

Note

When using a polygon shape that resembles a rectangle, and that rectangle is wider than 180 degrees, the CrateDB geoshape validator will convert it into a multipolygon consisting of 2 rectangular shapes covering the narrower area between the 4 original points split by the dateline (+/- 180deg).

This is due to CrateDB operating in the geospatial context of the earth.

Geographic points can be inserted as a double precision array with longitude and latitude values as seen above or by using a WKT string.

Geographic shapes can be inserted as GeoJSON object literal or parameter as seen above and as WKT string.

When it comes to get some meaningful insights into your geographical data CrateDB supports different kinds of geographic queries.

Fast queries that leverage the geographic index are done using the MATCH predicate:

MATCH predicate

The MATCH predicate can be used to perform multiple kinds of searches on indices or indexed columns. While it can be used to perform fulltext searches on analyzed indices of type text, it is also handy for operating on geographic indices, querying for relations between geographical shapes and points.

  1. MATCH (column_ident, query_term) [ using match_type ]

The MATCH predicate for geographical search supports a single column_ident of a geo_shape indexed column as first argument.

The second argument, the query_term is taken to match against the indexed geo_shape.

The matching operation is determined by the match_type which determines the spatial relation we want to match. Available match_types are:

intersects

(Default) If the two shapes share some points and/or area, they are intersecting and considered matching using this match_type. This also precludes containment or complete equality.

disjoint

If the two shapes share no single point or area, they are disjoint. This is the opposite of intersects.

within

If the indexed column_ident shape is completely inside the query_term shape, they are considered matching using this match_type.

Note

The MATCH predicate can only be used in the WHERE clause and on user-created tables. Using the MATCH predicate on system tables is not supported.

One MATCH predicate cannot combine columns of both relations of a join.

Additionally, MATCH predicates cannot be used on columns of both relations of a join if they cannot be logically applied to each of them separately. For example:

This is allowed:

  1. FROM t1, t2
  2. WHERE match(t1.shape, 'POINT(1.1 2.2)')
  3. AND match(t2.shape, 'POINT(3.3 4.4)')

But this is not:

  1. FROM t1, t2
  2. WHERE match(t1.shape, 'POINT(1.1 2.2)')
  3. OR match(t2.shape, 'POINT(3.3 4.4)')``

Having a table countries with a GEO_SHAPE column geo, indexed using geohash, you can query that column using the MATCH predicate with different match types as described above:

  1. cr> SELECT name from countries
  2. ... WHERE match("geo",
  3. ... 'LINESTRING (13.3813 52.5229, 11.1840 51.5497, 8.6132 50.0782, 8.3715 47.9457, 8.5034 47.3685)'
  4. ... );
  5. +---------+
  6. | name |
  7. +---------+
  8. | Germany |
  9. +---------+
  10. SELECT 1 row in set (... sec)
  1. cr> SELECT name from countries
  2. ... WHERE match("geo",
  3. ... 'LINESTRING (13.3813 52.5229, 11.1840 51.5497, 8.6132 50.0782, 8.3715 47.9457, 8.5034 47.3685)'
  4. ... ) USING disjoint
  5. ... ORDER BY name;
  6. +--------------+
  7. | name |
  8. +--------------+
  9. | Austria |
  10. | France |
  11. | South Africa |
  12. | Turkey |
  13. +--------------+
  14. SELECT 4 rows in set (... sec)

Exact queries

Exact queries are done using the following scalar functions:

They are exact, but this comes at the price of performance.

They do not make use of the index but work on the GeoJSON that was inserted to compute the shape vector. This access is quite expensive and may significantly slow down your queries.

For fast querying, use the MATCH predicate.

But executed on a limited result set, they will help you get precise insights into your geographic data:

  1. cr> SELECT within(capital_location, shape) AS capital_in_country
  2. ... FROM country;
  3. +--------------------+
  4. | capital_in_country |
  5. +--------------------+
  6. | TRUE |
  7. +--------------------+
  8. SELECT 1 row in set (... sec)
  1. cr> SELECT distance(capital_location, 'POINT(0.0 90.0)') as from_northpole
  2. ... FROM country ORDER BY country_code;
  3. +-------------------+
  4. | from_northpole |
  5. +-------------------+
  6. | 4646930.675034644 |
  7. +-------------------+
  8. SELECT 1 row in set (... sec)
  1. cr> SELECT intersects(
  2. ... {type='LineString', coordinates=[[13.3813, 52.5229],
  3. ... [11.1840, 51.5497],
  4. ... [8.6132, 50.0782],
  5. ... [8.3715, 47.9457],
  6. ... [8.5034, 47.3685]]},
  7. ... shape) as berlin_zurich_intersects
  8. ... FROM country ORDER BY country_code;
  9. +--------------------------+
  10. | berlin_zurich_intersects |
  11. +--------------------------+
  12. | FALSE |
  13. +--------------------------+
  14. SELECT 1 row in set (... sec)

Nonetheless these scalars can be used everywhere in a SQL query where scalar functions are allowed.