Machine-learning based modelling of spatial and spatio-temporal data (introduction)

Remote sensing is a key method for bridging the gap between local observations and spatially comprehensive estimates of environmental variables. For such spatial or spatio-temporal predictions, machine-learning algorithms have proven to be a promising tool for identifying nonlinear relationships between locally measured and remotely sensed variables. While easy access to user-friendly machine-learning libraries fosters their use in the environmental sciences, applying these methods is far from trivial. This holds especially true for spatio-temporal data, since their dependencies in space and time bear the risk of overfitting and of considerable misinterpretation of model performance. In this introductory lecture I will introduce the idea of using machine-learning methods for the (remote-sensing-based) monitoring of the environment and show how they can be applied in R via the caret package. In this context error assessment is a crucial topic, and I will show the importance of "target-oriented" spatial cross-validation strategies when working with spatio-temporal data to avoid an overoptimistic view of model performance. As spatio-temporal machine-learning models are highly prone to overfitting caused by misleading predictor variables, I will introduce a forward feature selection method from the CAST package that works in conjunction with target-oriented cross-validation. In summary, this talk aims to show how "basic" spatial machine-learning tasks can be performed in R, but also what needs to be considered for more complex spatio-temporal prediction tasks in order to produce scientifically valuable results. Based on this talk, we will go into a practical session on Tuesday, where machine-learning algorithms will be applied to two different spatial and spatio-temporal prediction tasks.
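
To make the idea of target-oriented cross-validation concrete, here is a minimal sketch of a leave-location-out split, in which all samples from a given measurement location are held out together so that the model is always evaluated on locations it has not seen. It is written in plain Python for illustration only and is not the implementation used by caret or CAST (in R, the CAST function CreateSpacetimeFolds serves this purpose). The function name `leave_location_out_folds` and the station data are hypothetical.

```python
import random
from collections import defaultdict


def leave_location_out_folds(locations, k, seed=0):
    """Return k (train, test) index pairs; each location is held out in exactly one fold."""
    unique_locations = sorted(set(locations))
    if k > len(unique_locations):
        raise ValueError("k must not exceed the number of distinct locations")
    rng = random.Random(seed)
    rng.shuffle(unique_locations)
    # Assign whole locations (not single samples) to folds, round-robin.
    fold_of = {loc: i % k for i, loc in enumerate(unique_locations)}
    held_out = defaultdict(list)
    for idx, loc in enumerate(locations):
        held_out[fold_of[loc]].append(idx)
    all_indices = set(range(len(locations)))
    folds = []
    for fold in range(k):
        test = sorted(held_out[fold])
        train = sorted(all_indices - set(test))
        folds.append((train, test))
    return folds


# Hypothetical example: 6 stations, each measured on 4 dates.
stations = [f"station_{s}" for s in range(6) for _ in range(4)]
folds = leave_location_out_folds(stations, k=3)
for train, test in folds:
    test_stations = sorted({stations[i] for i in test})
    print(test_stations, "held out;", len(train), "training samples")
```

With a purely random split, samples from the same station would typically end up in both the training and the test set, which is the situation that can make performance estimates look too optimistic for spatio-temporal data.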