Hours: Full time (38 hours per week)
Location: On-site in Doorwerth
Working Hours: to be set between 09:00 and 18:30
Internship allowance: 500 EUR/month (with an additional travel allowance if this is not already covered by the student card) for a full-time position
Employer: Stichting OpenGeoHub
We are looking for an intern to assist the Digital Soil Mapping (DSM) team at OpenGeoHub Foundation in exploring the role of multivariate machine learning (ML) approaches in soil prediction modeling. Are you passionate about geospatial data science and machine learning applications for environmental research? If your answer is yes, this might be the internship for you!
The internship can start any time between November 2025 and January 2026, with a flexible duration of 4 to 6 months. The internship allowance is 500 EUR/month (with an additional travel allowance if this is not already covered by the student card) for a full-time position (38 hours/week).
Project Background
OpenGeoHub Foundation is a non-profit organization that promotes free and open geodata and facilitates open science development. OpenGeoHub hosts one of the few complete, cloud-free, open-access Landsat archives in Europe and provides open geospatial products that support global initiatives such as the European Green Deal, UNCCD, and Land and Carbon Lab.
This internship focuses on testing, comparing, and developing ML models that could contribute to the next generation of pan-European high-resolution soil maps. You will have access to OpenGeoHub’s extensive geodata archive and receive technical support from our DSM experts.
Digital Soil Mapping (DSM) uses remote sensing, terrain modeling, and machine learning to predict soil properties across large regions. Traditional DSM models typically predict each soil property independently. However, multivariate machine learning can capture correlations between soil attributes (e.g., organic carbon, pH, clay content), potentially improving spatial consistency and predictive accuracy.
At OpenGeoHub, one flagship DSM product is the SoilHealthDataCube (SHDC), an EU-wide, 30 m resolution data stack covering major soil properties from 2000 to 2024+, as illustrated below. (Curious? Visit EcoDataCube.eu)
About your role
Although SHDC is already highly advanced, there’s room for innovation. Currently, each soil property is modeled separately (univariate approach), which can lead to mismatches among properties in the resulting maps — partly due to differing data availability, but also because independent models cannot exploit inter-property relationships.
We would therefore like to explore whether multivariate Random Forest (RF) models can help overcome these inconsistencies. Your main tasks would be:
- Task 1: Investigate which soil properties benefit most from multivariate modeling — identify promising combinations that improve both predictive accuracy and map coherence.
- Task 2: Compare the performance of multivariate vs. univariate Random Forests in predicting soil properties.
- Task 3 (Optional, if time allows and you are interested): Apply the models to selected test regions across Europe to assess spatial differences in resulting soil maps.
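To give a flavor of Task 2, here is a minimal sketch of what a multivariate vs. univariate comparison could look like. It assumes scikit-learn's multi-output `RandomForestRegressor` as one possible stand-in for a multivariate RF; the data are purely synthetic, and the target names (`soc`, `ph`, `clay`) and covariates are hypothetical placeholders, not SHDC data or the team's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic covariates standing in for terrain / remote sensing layers (assumption).
n_samples, n_covariates = 500, 8
X = rng.normal(size=(n_samples, n_covariates))

# Three synthetic, correlated "soil properties" that share a latent signal.
targets = ["soc", "ph", "clay"]  # hypothetical names for illustration only
latent = X[:, 0] + 0.5 * X[:, 1] ** 2
Y = np.column_stack([
    latent + rng.normal(scale=0.3, size=n_samples),
    -0.8 * latent + X[:, 2] + rng.normal(scale=0.3, size=n_samples),
    0.5 * latent + X[:, 3] + rng.normal(scale=0.3, size=n_samples),
])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# Multivariate: a single forest fitted on all targets at once.
multi_rf = RandomForestRegressor(n_estimators=100, random_state=0)
multi_rf.fit(X_train, Y_train)
Y_pred_multi = multi_rf.predict(X_test)

# Univariate: one independent forest per target.
Y_pred_uni = np.column_stack([
    RandomForestRegressor(n_estimators=100, random_state=0)
    .fit(X_train, Y_train[:, j])
    .predict(X_test)
    for j in range(Y.shape[1])
])

# Per-target R^2 for both approaches.
r2_multi = r2_score(Y_test, Y_pred_multi, multioutput="raw_values")
r2_uni = r2_score(Y_test, Y_pred_uni, multioutput="raw_values")
for name, r_m, r_u in zip(targets, r2_multi, r2_uni):
    print(f"{name}: multivariate R2={r_m:.3f}, univariate R2={r_u:.3f}")
```

On real data, the comparison would also need spatial cross-validation and a measure of map coherence between properties, which this sketch leaves out.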
What we expect from you
Must-have:
- Good command of English (OpenGeoHub is an international, collaborative organization 🙂)
- Enrolled at a university (in geoinformation, soil science, environmental science, data science, geoscience, remote sensing, or related fields)
- Some programming experience in Python
- Basic understanding of GIS and remote sensing, and familiarity with QGIS (basic operations)
- Availability for full-time work (38 hours/week) in our Doorwerth office (on-site 4 days a week)
Nice-to-have:
- Basic knowledge of machine learning methods and concepts
- Familiarity with Python libraries such as scikit-learn, pandas, and numpy
- Some soil science knowledge (would be a great plus)
- Enthusiasm for open science and environmental data applications
What we can give you
- Hands-on experience with large-scale environmental ML workflows (mainly Python).
- Working with continental-scale geospatial datasets (e.g., SHDC, remote sensing, terrain data).
- Insights into modern DSM pipelines and reproducible research practices at OpenGeoHub.
- Many parties with food cooked by BBQ masters (at least one, hopefully).
Application
Interested? Send an email to the contact person, Xuemeng Tian (xuemeng.tian@opengeohub.org / xuemeng.tian@wur.nl), including:
- A curriculum vitae, including references with contact details and your availability;
- Proof of enrollment valid for the entire duration of the internship;
- Contact details of your institutional internship liaison and a brief description of the procedure.
This work will be mainly supervised by Xuemeng, with support from the DSM team at OpenGeoHub Foundation.