14.2 Data and data preparation

All the data needed for the subsequent analyses is available via the RQGIS package.

  1. data("study_area", "random_points", "comm", "dem", "ndvi", package = "RQGIS")

study_area is an sf polygon representing the outlines of the study area.random_points is an sf object, and contains the 100 randomly chosen sites.comm is a community matrix of the wide data format (Wickham 2014b) where the rows represent the visited sites in the field and the columns the observed species.76

  1. # sites 35 to 40 and corresponding occurrences of the first five species in the
  2. # community matrix
  3. comm[35:40, 1:5]
  4. #> Alon_meri Alst_line Alte_hali Alte_porr Anth_eccr
  5. #> 35 0 0 0 0.0 1.000
  6. #> 36 0 0 1 0.0 0.500
  7. #> 37 0 0 0 0.0 0.125
  8. #> 38 0 0 0 0.0 3.000
  9. #> 39 0 0 0 0.0 2.000
  10. #> 40 0 0 0 0.2 0.125

The values represent species cover per site, and were recorded as the area covered by a species in proportion to the site area in percentage points (%; please note that one site can have >100% due to overlapping cover between individual plants).The rownames of comm correspond to the id column of random_points.dem is the digital elevation model (DEM) for the study area, and ndvi is the Normalized Difference Vegetation Index (NDVI) computed from the red and near-infrared channels of a Landsat scene (see Section 4.3.3 and ?ndvi).Visualizing the data helps to get more familiar with it, as shown in Figure 14.2 where the dem is overplotted by the random_points and the study_area.

Study mask (polygon), location of the sampling sites (black points) and DEM in the background.
Figure 14.2: Study mask (polygon), location of the sampling sites (black points) and DEM in the background.

The next step is to compute variables which we will not only need for the modeling and predictive mapping (see Section 14.4.2) but also for aligning the Non-metric multidimensional scaling (NMDS) axes with the main gradient in the study area, altitude and humidity, respectively (see Section 14.3).

Specifically, we will compute catchment slope and catchment area from a digital elevation model using R-GIS bridges (see Chapter 9).Curvatures might also represent valuable predictors, in the Exercise section you can find out how they would change the modeling result.

To compute catchment area and catchment slope, we will make use of the saga:sagawetnessindex function.77get_usage() returns all function parameters and default values of a specific geoalgorithm.Here, we present only a selection of the complete output.

  1. get_usage("saga:sagawetnessindex")
  2. #>ALGORITHM: Saga wetness index
  3. #> DEM <ParameterRaster>
  4. #> ...
  5. #> SLOPE_TYPE <ParameterSelection>
  6. #> ...
  7. #> AREA <OutputRaster>
  8. #> SLOPE <OutputRaster>
  9. #> AREA_MOD <OutputRaster>
  10. #> TWI <OutputRaster>
  11. #> ...
  12. #>SLOPE_TYPE(Type of Slope)
  13. #> 0 - [0] local slope
  14. #> 1 - [1] catchment slope
  15. #> ...

Subsequently, we can specify the needed parameters using R named arguments (see Section 9.2).Remember that we can use a RasterLayer living in R’s global environment to specify the input raster DEM (see Section 9.2).Specifying 1 as the SLOPE_TYPE makes sure that the algorithm will return the catchment slope.The resulting output rasters should be saved to temporary files with an .sdat extension which is a SAGA raster format.Setting load_output to TRUE ensures that the resulting rasters will be imported into R.

  1. # environmental predictors: catchment slope and catchment area
  2. ep = run_qgis(alg = "saga:sagawetnessindex",
  3. DEM = dem,
  4. SLOPE_TYPE = 1,
  5. SLOPE = tempfile(fileext = ".sdat"),
  6. AREA = tempfile(fileext = ".sdat"),
  7. load_output = TRUE,
  8. show_output_paths = FALSE)

This returns a list named ep consisting of two elements: AREA and SLOPE.Let us add two more raster objects to the list, namely dem and ndvi, and convert it into a raster stack (see Section 2.3.3).

  1. ep = stack(c(dem, ndvi, ep))
  2. names(ep) = c("dem", "ndvi", "carea", "cslope")

Additionally, the catchment area values are highly skewed to the right (hist(ep$carea)).A log10-transformation makes the distribution more normal.

  1. ep$carea = log10(ep$carea)

As a convenience to the reader, we have added ep to spDataLarge:

  1. data("ep", package = "spDataLarge")

Finally, we can extract the terrain attributes to our field observations (see also Section 5.4.2).

  1. random_points[, names(ep)] = raster::extract(ep, random_points)