11.7 Exercises

  • Compute the following terrain attributes from the dem dataset loaded with data("landslides", package = "RSAGA") with the help of R-GIS bridges (see Chapter 9):
    • Slope
    • Plan curvature
    • Profile curvature
    • Catchment area
  • Extract the values from the corresponding output rasters to the landslides data frame (data("landslides", package = "RSAGA")) by adding new variables called slope, cplan, cprof, elev and log_carea. Keep all landslide initiation points and 175 randomly selected non-landslide points (see Section 11.2 for details).
  • Use the derived terrain attribute rasters in combination with a GLM to make a spatial prediction map similar to that shown in Figure 11.2. Running data("study_mask", package = "spDataLarge") attaches a mask of the study area.
  • Compute a 100-repeated 5-fold non-spatial cross-validation and spatial CV based on the GLM learner and compare the AUROC values from both resampling strategies with the help of boxplots (see Figure 11.5). Hint: You need to specify a non-spatial task and a non-spatial resampling strategy.
  • Model landslide susceptibility using a quadratic discriminant analysis (QDA, James et al. 2013). Assess the predictive performance (AUROC) of the QDA. What is the difference between the spatially cross-validated mean AUROC value of the QDA and the GLM? Hint: Before running the spatial cross-validation for both learners, set a seed to make sure that both use the same spatial partitions, which in turn guarantees comparability.
  • Run the SVM without tuning the hyperparameters. Use the rbfdot kernel with σ = 1 and C = 1. Leaving the hyperparameters unspecified in kernlab’s ksvm() would otherwise initialize an automatic non-spatial hyperparameter tuning. For a discussion on the need for (spatial) tuning of hyperparameters, please refer to Schratz et al. (2018).
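
Two steps in the exercises above can be sketched in base R: keeping all landslide points plus an equally sized random sample of non-landslide points, and fixing a seed so that two learners are evaluated on identical spatial partitions. This is a minimal sketch, not a solution. The data frame pts is a synthetic stand-in for the real landslides data (only the class counts of 175 and 1360 follow the text), spatial_folds() is a hypothetical helper, and k-means clustering of the coordinates is assumed here as a stand-in for the spatial partitioning that a resampling framework would perform.

```r
# Synthetic stand-in for the landslides data frame (hypothetical coordinates;
# only the class counts, 175 presences and 1360 absences, follow the text).
set.seed(42)
n_lsl = 175
n_non = 1360
pts = data.frame(
  x = runif(n_lsl + n_non, 0, 1000),
  y = runif(n_lsl + n_non, 0, 1000),
  lslpts = rep(c(TRUE, FALSE), times = c(n_lsl, n_non))
)

# Keep all landslide points and draw the same number of non-landslide points.
lsl_pts = pts[pts$lslpts, ]
non_pts = pts[!pts$lslpts, ]
non_sub = non_pts[sample(nrow(non_pts), size = nrow(lsl_pts)), ]
lsl = rbind(lsl_pts, non_sub)
table(lsl$lslpts)  # both classes now have 175 rows

# Hypothetical helper: assign each point to one of k spatial folds by
# k-means clustering of the coordinates, after fixing the seed.
spatial_folds = function(coords, k = 5, seed = 1) {
  set.seed(seed)
  kmeans(coords, centers = k, iter.max = 50)$cluster
}

# Using the same seed before partitioning gives both learners the same folds.
folds_glm = spatial_folds(lsl[, c("x", "y")], seed = 123)
folds_qda = spatial_folds(lsl[, c("x", "y")], seed = 123)
identical(folds_glm, folds_qda)
```

With identical partitions, any difference in the mean AUROC of the two learners reflects the models rather than the random assignment of points to folds.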

References

Zuur, Alain, Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Statistics for Biology and Health. New York: Springer-Verlag.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani, eds. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics 103. New York: Springer.

Krainski, Elias, Virgilio Gómez Rubio, Haakon Bakka, Amanda Lenzi, Daniela Castro-Camilo, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2018. Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA.

Muenchow, Jannes, Alexander Brenning, and Michael Richter. 2012. “Geomorphic Process Rates of Landslides Along a Humidity Gradient in the Tropical Andes.” Geomorphology 139-140 (February): 271–84. https://doi.org/10.1016/j.geomorph.2011.10.029.

Zuur, Alain F., Elena N. Ieno, and Anatoly A. Saveliev. 2017. Beginner’s Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA. Vol. 1. Newburgh, United Kingdom: Highland Statistics Ltd.

Blangiardo, Marta, and Michela Cameletti. 2015. Spatial and Spatio-Temporal Bayesian Models with R-INLA. Chichester, UK: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118950203.

Goovaerts, Pierre. 1997. Geostatistics for Natural Resources Evaluation. Applied Geostatistics Series. New York: Oxford University Press.

Hengl, Tomislav. 2007. A Practical Guide to Geostatistical Mapping of Environmental Variables. Luxembourg: Publications Office.

Bivand, Roger, Edzer J. Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. Springer.

Miller, Harvey J. 2004. “Tobler’s First Law and Spatial Analysis.” Annals of the Association of American Geographers 94 (2).

Brenning, Alexander. 2012b. “Spatial Cross-Validation and Bootstrap for the Assessment of Prediction Rules in Remote Sensing: The R Package Sperrorest.” In, 5372–5. IEEE. https://doi.org/10.1109/IGARSS.2012.6352393.

Bischl, Bernd, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary M. Jones. 2016. “Mlr: Machine Learning in R.” Journal of Machine Learning Research 17 (170): 1–5.

Probst, Philipp, Marvin Wright, and Anne-Laure Boulesteix. 2018. “Hyperparameters and Tuning Strategies for Random Forest.” arXiv:1804.03515 [Cs, Stat], April. http://arxiv.org/abs/1804.03515.

Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Achim Zeileis. 2004. “Kernlab - an S4 Package for Kernel Methods in R.” Journal of Statistical Software 11 (9). https://doi.org/10.18637/jss.v011.i09.

Cawley, Gavin C., and Nicola LC Talbot. 2010. “On over-Fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation.” Journal of Machine Learning Research 11 (Jul): 2079–2107.

Schratz, Patrick, Jannes Muenchow, Eugenia Iturritxa, Jakob Richter, and Alexander Brenning. 2018. “Performance Evaluation and Hyperparameter Tuning of Statistical and Machine-Learning Models Using Spatial Data.” arXiv:1803.11266 [Cs, Stat], March. http://arxiv.org/abs/1803.11266.

Meyer, Hanna, Christoph Reudenbach, Tomislav Hengl, Marwan Katurji, and Thomas Nauss. 2018. “Improving Performance of Spatio-Temporal Machine Learning Models Using Forward Feature Selection and Target-Oriented Validation.” Environmental Modelling & Software 101 (March): 1–9. https://doi.org/10.1016/j.envsoft.2017.12.001.


  • Packages kernlab, pROC, RSAGA and spDataLarge must also be installed, although they do not need to be attached.

  • Applying statistical techniques to geographic data has been an active topic of research for many decades in the fields of Geostatistics, Spatial Statistics and point pattern analysis (Diggle and Ribeiro 2007; Gelfand et al. 2010; Baddeley, Rubak, and Turner 2015).

  • The landslide initiation point is located in the scarp of a landslide polygon. See Muenchow, Brenning, and Richter (2012) for further details.

  • The landslides dataset has been used in classes and summer schools. To show how the predictive performance of different algorithms changes with an unbalanced and highly spatially autocorrelated response variable, 1360 non-landslide points were randomly selected, i.e., many more absences than presences. However, a logistic regression with a logit link, as used in this chapter, expects roughly the same number of presences and absences in the response.

  • Note that package sperrorest initially implemented spatial cross-validation in R (Brenning 2012b). In the meantime, its functionality was integrated into the mlr package, which is the reason why we are using mlr (Schratz et al. 2018). The caret package is another umbrella-package (Kuhn and Johnson 2013) for streamlined modeling in R; however, so far it does not provide spatial CV, which is why we refrain from using it for spatial data.

  • For a detailed description of the difference between coefficients and hyperparameters, see the ‘machine mastery’ blog post on the subject.

  • See ?parallelStart for further modes and github.com/berndbischl/parallelMap for more on the unified interface to popular parallelization back-ends.