7.6 Data input (I)

Executing commands such as sf::st_read() (the main function we use for loading vector data) or raster::raster() (the main function used for loading raster data) silently sets off a chain of events that reads data from files.Moreover, there are many R packages containing a wide range of geographic data or providing simple access to different data sources.All of them load the data into R or, more precisely, assign objects to your workspace, stored in RAM accessible from the .GlobalEnv of the R session.

7.6.1 Vector data

Spatial vector data comes in a wide variety of file formats, most of which can be read-in via the sf function st_read().Behind the scenes this calls GDAL.To find out which data formats sf supports, run st_drivers().Here, we show only the first five drivers (see Table 7.3):

  1. sf_drivers = st_drivers()
  2. head(sf_drivers, n = 5)

Table 7.3: Sample of available drivers for reading/writing vector data (it could vary between different GDAL versions).
namelong_namewritecopyis_rasteris_vectorvsi
ESRI ShapefileESRI ShapefileTRUEFALSEFALSETRUETRUE
GPXGPXTRUEFALSEFALSETRUETRUE
KMLKeyhole Markup Language (KML)TRUEFALSEFALSETRUETRUE
GeoJSONGeoJSONTRUEFALSEFALSETRUETRUE
GPKGGeoPackageTRUETRUETRUETRUETRUE


The first argument of st_read() is dsn, which should be a text string or an object containing a single text string.The content of a text string could vary between different drivers.In most cases, as with the ESRI Shapefile (.shp) or the GeoPackage format (.gpkg), the dsn would be a file name.st_read() guesses the driver based on the file extension, as illustrated for a .gpkg file below:

  1. vector_filepath = system.file("shapes/world.gpkg", package = "spData")
  2. world = st_read(vector_filepath)
  3. #> Reading layer `world' from data source `.../world.gpkg' using driver `GPKG'
  4. #> Simple feature collection with 177 features and 10 fields
  5. #> geometry type: MULTIPOLYGON
  6. #> dimension: XY
  7. #> bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
  8. #> epsg (SRID): 4326
  9. #> proj4string: +proj=longlat +datum=WGS84 +no_defs

For some drivers, dsn could be provided as a folder name, access credentials for a database, or a GeoJSON string representation (see the examples of the st_read() help page for more details).

Some vector driver formats can store multiple data layers.By default, st_read() automatically reads the first layer of the file specified in dsn; however, using the layer argument you can specify any other layer.

Naturally, some options are specific to certain drivers.34For example, think of coordinates stored in a spreadsheet format (.csv).To read in such files as spatial objects, we naturally have to specify the names of the columns (X and Y in our example below) representing the coordinates.We can do this with the help of the options parameter.To find out about possible options, please refer to the ‘Open Options’ section of the corresponding GDAL driver description.For the comma-separated value (csv) format, visit http://www.gdal.org/drv_csv.html.

  1. cycle_hire_txt = system.file("misc/cycle_hire_xy.csv", package = "spData")
  2. cycle_hire_xy = st_read(cycle_hire_txt, options = c("X_POSSIBLE_NAMES=X",
  3. "Y_POSSIBLE_NAMES=Y"))

Instead of columns describing xy-coordinates, a single column can also contain the geometry information.Well-known text (WKT), well-known binary (WKB), and the GeoJSON formats are examples of this.For instance, the world_wkt.csv file has a column named WKT representing polygons of the world’s countries.We will again use the options parameter to indicate this.Here, we will use read_sf() which does exactly the same as st_read() except it does not print the driver name to the console and stores strings as characters instead of factors.

  1. world_txt = system.file("misc/world_wkt.csv", package = "spData")
  2. world_wkt = read_sf(world_txt, options = "GEOM_POSSIBLE_NAMES=WKT")
  3. # the same as
  4. world_wkt = st_read(world_txt, options = "GEOM_POSSIBLE_NAMES=WKT",
  5. quiet = TRUE, stringsAsFactors = FALSE)

Not all of the supported vector file formats store information about their coordinate reference system.In these situations, it is possible to add the missing information using the st_set_crs() function.Please refer also to Section 2.4 for more information.

As a final example, we will show how st_read() also reads KML files.A KML file stores geographic information in XML format - a data format for the creation of web pages and the transfer of data in an application-independent way (Nolan and Lang 2014).Here, we access a KML file from the web.This file contains more than one layer.st_layers() lists all available layers.We choose the first layer Placemarks and say so with the help of the layer parameter in read_sf().

  1. u = "https://developers.google.com/kml/documentation/KML_Samples.kml"
  2. download.file(u, "KML_Samples.kml")
  3. st_layers("KML_Samples.kml")
  4. #> Driver: LIBKML
  5. #> Available layers:
  6. #> layer_name geometry_type features fields
  7. #> 1 Placemarks 3 11
  8. #> 2 Styles and Markup 1 11
  9. #> 3 Highlighted Icon 1 11
  10. #> 4 Ground Overlays 1 11
  11. #> 5 Screen Overlays 0 11
  12. #> 6 Paths 6 11
  13. #> 7 Polygons 0 11
  14. #> 8 Google Campus 4 11
  15. #> 9 Extruded Polygon 1 11
  16. #> 10 Absolute and Relative 4 11
  17. kml = read_sf("KML_Samples.kml", layer = "Placemarks")

All the examples presented in this section so far have used the sf package for geographic data import.It is fast and flexible but it may be worth looking at other packages for specific file formats.An example is the geojsonsf package.A benchmark suggests it is around 10 times faster than the sf package for reading .geojson.

7.6.2 Raster data

Similar to vector data, raster data comes in many file formats with some of them supporting even multilayer files.raster’s raster() command reads in a single layer.

  1. raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
  2. single_layer = raster(raster_filepath)

In case you want to read in a single band from a multilayer file, use the band parameter to indicate a specific layer.

  1. multilayer_filepath = system.file("raster/landsat.tif", package = "spDataLarge")
  2. band3 = raster(multilayer_filepath, band = 3)

If you want to read in all bands, use brick() or stack().

  1. multilayer_brick = brick(multilayer_filepath)
  2. multilayer_stack = stack(multilayer_filepath)

Please refer to Section 2.3.3 for information on the difference between raster stacks and bricks.