The importance of spatial cross-validation in predictive modeling

In the first part of this tutorial we will demonstrate the importance of spatial cross-validation and introduce the building blocks of the mlr package. We will assess the predictive performance of a random forest model that predicts the floristic composition of Mt. Mongón in northern Peru. In the second part, we will show one way of tuning the model's hyperparameters; the tuned values are then used for the predictive mapping of the floristic composition.
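To give a first impression of the mlr building blocks mentioned above (task, learner, resampling strategy), here is a minimal sketch. It does not use the tutorial data: the coordinates, the predictors `elev` and `slope`, and the response `score` are synthetic and purely hypothetical. It also assumes that the ranger package is installed in addition to the packages listed below, since mlr's `regr.ranger` learner depends on it.

```r
# Sketch only: synthetic data stand in for real field observations.
library(mlr)  # the "regr.ranger" learner additionally requires the ranger package

set.seed(1)
n = 200
# hypothetical point coordinates (projected, in metres)
coords = data.frame(x = runif(n, 0, 1000), y = runif(n, 0, 1000))
# hypothetical predictors and a response with a simple spatial trend
d = data.frame(elev = coords$x / 10 + rnorm(n), slope = runif(n, 0, 30))
d$score = 0.05 * d$elev + 0.001 * coords$y + rnorm(n, sd = 0.5)

# building block 1: the task (data, target, and coordinates for spatial partitioning)
task = makeRegrTask(data = d, target = "score", coordinates = coords)
# building block 2: the learner (random forest via ranger)
lrn = makeLearner("regr.ranger", predict.type = "response")
# building block 3: the resampling strategy
spatial = makeResampleDesc("SpCV", iters = 5)  # folds built from clustered coordinates
random = makeResampleDesc("CV", iters = 5)     # conventional random folds

# part 1: estimate predictive performance with both strategies
res_sp = mlr::resample(lrn, task, spatial, measures = mlr::rmse, show.info = FALSE)
res_cv = mlr::resample(lrn, task, random, measures = mlr::rmse, show.info = FALSE)
res_sp$aggr
res_cv$aggr

# part 2: tune hyperparameters with a short random search, again on spatial folds
ps = makeParamSet(
  makeIntegerParam("mtry", lower = 1, upper = 2),
  makeIntegerParam("min.node.size", lower = 1, upper = 10)
)
ctrl = makeTuneControlRandom(maxit = 5L)
tuned = tuneParams(lrn, task, resampling = spatial, par.set = ps,
                   control = ctrl, measures = mlr::rmse, show.info = FALSE)

# fit a final model with the tuned values, ready for predictive mapping
model = train(setHyperPars(lrn, par.vals = tuned$x), task)
```

With spatially autocorrelated data, the random-fold estimate is often more optimistic than the spatial one; whether this shows up in a small synthetic example like this is not guaranteed. The search budget (`maxit = 5L`) and the parameter ranges are kept small here only so that the sketch runs quickly.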

Software requirements: R, RStudio, and the R packages mlr, parallelMap, sf, raster, dplyr, mapview, and vegan

Materials: The code and presentations can be found in the spatial_cv folders in our GeoStats geocompr repository (https://github.com/geocompr/geostats_18).

References:

  • Lovelace, R., Nowosad, J., & Muenchow, J. (forthcoming). Geocomputation with R. Chapter 11: Statistical Learning for geographic data. CRC Press. https://bookdown.org/robinlovelace/geocompr/spatial-cv.html
  • Lovelace, R., Nowosad, J., & Muenchow, J. (forthcoming). Geocomputation with R. Chapter 14: Ecological use case. CRC Press. https://bookdown.org/robinlovelace/geocompr/eco.html
  • Probst, P., Wright, M., & Boulesteix, A.-L. (2018). Hyperparameters and Tuning Strategies for Random Forest. arXiv:1804.03515 [cs, stat]. http://arxiv.org/abs/1804.03515
  • Schratz, P., Muenchow, J., Iturritxa, E., Richter, J., & Brenning, A. (2018). Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data. arXiv:1803.11266 [cs, stat]. http://arxiv.org/abs/1803.11266