11.6 Conclusions

Resampling methods are an important part of a data scientist’s toolbox (James et al. 2013). This chapter used cross-validation (CV) to assess the predictive performance of various models. As described in Section 11.4, observations with spatial coordinates may not be statistically independent due to spatial autocorrelation, which violates a fundamental assumption of cross-validation. Spatial CV addresses this issue by partitioning the data spatially, which reduces the bias that spatial autocorrelation introduces into performance estimates.
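
The difference between conventional and spatial partitioning can be illustrated with a minimal base R sketch. It is not the chapter's mlr workflow: the coordinates are simulated, the helper `nearest_train_dist()` is hypothetical, and k-means clustering of the coordinates is used here as one assumed way of forming spatially contiguous folds.

```r
# Simulated point coordinates (hypothetical data, not from the chapter)
set.seed(42)
n <- 200
coords <- data.frame(x = runif(n), y = runif(n))
k <- 5  # number of folds

# Conventional CV: observations are assigned to folds at random,
# so test points usually have training points close by
random_fold <- sample(rep(seq_len(k), length.out = n))

# Spatial CV (assumed approach): cluster the coordinates with k-means
# so that each fold covers a contiguous region
spatial_fold <- kmeans(coords, centers = k)$cluster

# Hypothetical helper: mean distance from each observation to the
# nearest observation that belongs to a different fold
nearest_train_dist <- function(fold) {
  d <- as.matrix(dist(coords))
  mean(sapply(seq_len(n), function(i) min(d[i, fold != fold[i]])))
}

nearest_train_dist(random_fold)
nearest_train_dist(spatial_fold)
```

With spatial folds, test observations tend to lie further from the training data, so the model is less able to benefit from nearby, autocorrelated neighbors when it is evaluated.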

The mlr package facilitates (spatial) resampling techniques in combination with the most popular statistical learning techniques, including linear regression, semi-parametric models such as generalized additive models, and machine learning techniques such as random forests, SVMs, and boosted regression trees (Bischl et al. 2016; Schratz et al. 2018). Machine learning algorithms often require hyperparameter inputs, the optimal ‘tuning’ of which can require thousands of model runs. These runs demand large computational resources in terms of time, RAM and/or cores. mlr tackles this issue by enabling parallelization.
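
To see how quickly the number of model runs grows, consider a small worked calculation in R. All settings below are illustrative assumptions rather than values stated in this section: a repeated spatial CV for performance estimation, with a random search tuned in an inner spatial CV for each outer iteration.

```r
# Assumed (hypothetical) resampling and tuning settings
outer_folds  <- 5    # folds of the performance-estimation CV
outer_reps   <- 100  # repetitions of the performance-estimation CV
inner_folds  <- 5    # folds of the tuning CV
tuning_iters <- 50   # hyperparameter settings tried by random search

# Each outer iteration triggers a full tuning run
outer_iterations <- outer_folds * outer_reps

# Models fitted during tuning across all outer iterations
tuning_models <- outer_iterations * inner_folds * tuning_iters

# One further model per outer iteration, fitted with the best setting
final_models <- outer_iterations

total_models <- tuning_models + final_models
total_models

# Model fits per core if the work were spread evenly over
# an assumed number of cores
cores <- 8
ceiling(total_models / cores)
```

Under these assumptions the workload exceeds one hundred thousand model fits, which is why distributing the tuning over several cores can make a substantial difference.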

Machine learning overall, and its use to understand spatial data, is a large field. This chapter has provided the basics, but there is more to learn. We recommend the following resources in this direction:

The mlr tutorials on Machine Learning in R and Handling of spatial Data.
An academic paper on hyperparameter tuning (Schratz et al. 2018).
For spatio-temporal data, CV should account for both spatial and temporal autocorrelation (Meyer et al. 2018).