OpenLandMap: using Machine Learning for global good

At OpenGeoHub and EnvirometriX, we recognize that machine learning, combined with AI, is a game-changer for both science and business. It is having a major impact on our daily lives in areas such as healthcare, security, web technology, self-driving vehicles, and many other fields. But it is still relatively under-used in areas such as landscape planning, food production and land restoration. Even the gaming industry has made more progress in machine learning than natural resource conservation has. There is a common hope that machine learning will be used more for civil applications and for reaching the UN Sustainable Development Goals, and less for purposes such as military ones, but this remains an open challenge.

To bridge this gap, OpenGeoHub and EnvirometriX, partnering with and, have started a global predictive vegetation and soil mapping system available via We use state-of-the-art machine learning to enable automated mapping applications, so that complete and consistent global data can be used by researchers and businesses. We aim not only at increasing the usability of national and global data sets, but also at opening the floor for collaboration and data sharing. We have recently opened most of the code we produced to generate training points and covariate layers, which you can follow from:


Under the folder “Compiled ESS point data sets”, you can find up-to-date, complete and consistent training point data sets that have been fully documented using R Markdown notebooks. Once the points are filtered for artifacts and bound to a common format, we overlay them with some 450 global layers at 250 m and 100 m spatial resolution and provide modeling-ready data sets (classification or regression matrices). In these matrices, target variables such as physical or chemical soil properties, native vegetation observations and meteorological variables can be correlated with climatic, terrain, landform, lithological and vegetation-based indices derived primarily from global remote sensing data products. The classification and regression matrices can be accessed publicly via the GitLab group above in RDS format (R Data Serialized, a serialized version of the data set with internal gzip compression) and can be loaded directly into R, the software environment for statistical computing.
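
To make the overlay step more concrete, here is a minimal sketch in plain Python of how a regression matrix can be assembled from points and gridded layers. It is an illustration only and not the code used in the project, which is written in R and works with hundreds of real raster layers. The `GridLayer` class, the `overlay` function, the layer names (`elevation`, `ndvi`), the target column `soc` and all values are hypothetical and made up for this example. Each point simply receives the value of the grid cell it falls into, and points that fall outside any layer are dropped.

```python
from typing import Dict, List, Optional


class GridLayer:
    """Toy north-up raster (hypothetical helper): values[row][col], row 0 is the northern edge."""

    def __init__(self, name: str, x_min: float, y_max: float,
                 cell_size: float, values: List[List[float]]):
        self.name = name
        self.x_min = x_min
        self.y_max = y_max
        self.cell_size = cell_size
        self.values = values

    def sample(self, x: float, y: float) -> Optional[float]:
        """Return the value of the cell containing (x, y), or None if outside the grid."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return self.values[row][col]
        return None


def overlay(points: List[Dict], layers: List[GridLayer]) -> List[Dict]:
    """Attach one covariate column per layer to each point; drop points missing any layer."""
    matrix = []
    for point in points:
        row = dict(point)  # copy so the input points stay untouched
        complete = True
        for layer in layers:
            value = layer.sample(point["x"], point["y"])
            if value is None:
                complete = False
                break
            row[layer.name] = value
        if complete:
            matrix.append(row)
    return matrix


# Two made-up 2 x 2 covariate layers covering x in [0, 2) and y in (0, 2].
layers = [
    GridLayer("elevation", 0.0, 2.0, 1.0, [[10.0, 20.0], [30.0, 40.0]]),
    GridLayer("ndvi", 0.0, 2.0, 1.0, [[0.1, 0.2], [0.3, 0.4]]),
]

# Made-up training points; "soc" stands in for a target variable.
points = [
    {"id": "p1", "x": 0.5, "y": 1.5, "soc": 12.0},
    {"id": "p2", "x": 1.5, "y": 0.5, "soc": 7.5},
    {"id": "p3", "x": 5.0, "y": 5.0, "soc": 3.0},  # outside the grids, dropped
]

regression_matrix = overlay(points, layers)
for row in regression_matrix:
    print(row)
```

Each row of the result holds the target variable next to the covariates sampled at that location, which is the shape a regression or classification algorithm expects. Real rasters also require handling of coordinate reference systems and missing-value codes, which this sketch leaves out.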

Global compilation of soil chemical and physical properties (soil profiles / soil samples).

Are you keen on testing your own machine learning or statistical learning algorithms to solve global environmental and ecological problems? Use some of the global data sets and regression and classification problems published on and help us make the most accurate and most detailed world maps of soil, vegetation, climate, land potential and land degradation trends. Are you aware of any additional data sets that could be added to this compilation? Would you like to contribute to this initiative more systematically? Please open an issue or contact the main developers via:

Note: OpenGeoHub is currently involved in multiple international projects including the “Geo-harmonizer: EU-wide automated mapping system for harmonization of Open Data based on FOSS4G and Machine Learning” (EU-wide data sets), “Soil Spectroscopy for global good” (global data sets) and the H2020 project “AgriCapture — Developing EO-powered services to promote soil carbon sequestration through regenerative agriculture” (EU-wide data sets).
