Hengl, Tomislav; Mendes de Jesus, Jorge; Heuvelink, Gerard B. M.; et al.
SoilGrids250m: Global gridded soil information based on machine learning Journal Article
In: PLOS ONE, vol. 12, no. 2, pp. 1–40, 2017.
Tags: Soil mapping
@article{10.1371/journal.pone.0169748,
title = {SoilGrids250m: Global gridded soil information based on machine learning},
author = {Tomislav Hengl and Jorge Mendes de Jesus and Gerard B. M. Heuvelink and Maria Ruiperez Gonzalez and Milan Kilibarda and Aleksandar Blagotić and Wei Shangguan and Marvin N. Wright and Xiaoyuan Geng and Bernhard Bauer-Marschallinger and Mario Antonio Guevara and Rodrigo Vargas and Robert A. MacMillan and Niels H. Batjes and Johan G. B. Leenaars and Eloi Ribeiro and Ichsani Wheeler and Stephan Mantel and Bas Kempen},
url = {https://doi.org/10.1371/journal.pone.0169748},
doi = {10.1371/journal.pone.0169748},
year = {2017},
date = {2017-01-01},
urldate = {2017-01-01},
journal = {PLOS ONE},
volume = {12},
number = {2},
pages = {1--40},
publisher = {Public Library of Science},
abstract = {This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods—random forest and gradient boosting and/or multinomial logistic regression—as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10–fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation with an overall average of 61%. Improvements in the relative accuracy considering the amount of variation explained, in comparison to the previous version of SoilGrids at 1 km spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use of machine learning instead of linear regression, (2) to considerable investments in preparing finer resolution covariate layers and (3) to insertion of additional soil profiles. Further development of SoilGrids could include refinement of methods to incorporate input uncertainties and derivation of posterior probability distributions (per pixel), and further automation of spatial modeling so that soil maps can be generated for potentially hundreds of soil variables. 
Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly more accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Data Base License.},
keywords = {Soil mapping},
pubstate = {published},
tppubtype = {article}
}
Hengl, Tomislav; Heuvelink, Gerard B. M.; Kempen, Bas; et al.
Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions Journal Article
In: PLOS ONE, vol. 10, no. 6, pp. 1–26, 2015.
Tags: Soil mapping
@article{10.1371/journal.pone.0125814,
title = {Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions},
author = {Tomislav Hengl and Gerard B. M. Heuvelink and Bas Kempen and Johan G. B. Leenaars and Markus G. Walsh and Keith D. Shepherd and Andrew Sila and Robert A. MacMillan and Jorge Mendes de Jesus and Lulseged Tamene and Jérôme E. Tondoh},
url = {https://doi.org/10.1371/journal.pone.0125814},
doi = {10.1371/journal.pone.0125814},
year = {2015},
date = {2015-01-01},
urldate = {2015-01-01},
journal = {PLOS ONE},
volume = {10},
number = {6},
pages = {1--26},
publisher = {Public Library of Science},
abstract = {80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. 
This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data.},
keywords = {Soil mapping},
pubstate = {published},
tppubtype = {article}
}
Hengl, Tomislav; Rossiter, David G; Stein, Alfred
Soil sampling strategies for spatial prediction by correlation with auxiliary maps Journal Article
In: Soil Research, vol. 41, no. 8, pp. 1403–1422, 2003.
Tags: Soil mapping
@article{hengl2003soil,
title = {Soil sampling strategies for spatial prediction by correlation with auxiliary maps},
author = {Tomislav Hengl and David G Rossiter and Alfred Stein},
url = {https://www.publish.csiro.au/sr/SR03005},
doi = {10.1071/SR03005},
year = {2003},
date = {2003-01-01},
urldate = {2003-01-01},
journal = {Soil Research},
volume = {41},
number = {8},
pages = {1403--1422},
publisher = {CSIRO Publishing},
abstract = {The paper evaluates spreading of observations in feature and geographical spaces as a key to sampling optimisation for spatial prediction by correlation with auxiliary maps. Although auxiliary data are commonly used for mapping soil variables, problems associated with the design of sampling strategies are rarely examined. When generalised least-squares estimation is used, the overall prediction error depends upon spreading of points in both feature and geographical space. Allocation of points uniformly over the feature space range proportionally to the distribution of predictor (equal range stratification, or ER design) is suggested as a prudent sampling strategy when the regression model between the soil and auxiliary variables is unknown. An existing 100-observation sample from a 50 by 50 km soil survey in central Croatia was used to illustrate these concepts. It was re-sampled to 25-point datasets using different experimental designs: ER and 2 response surface designs. The designs were compared for their performance in predicting soil organic matter from elevation (univariate example) using the overall prediction error as an evaluation criterion. The ER design gave overall prediction error similar to the minmax design, suggesting that it is a good compromise between accurate model estimation and minimisation of spatial autocorrelation of residuals. In addition, the ER design was extended to the multivariate case. Four predictors (elevation, temperature, wetness index, and NDVI) were transformed to standardised principal components. The sampling points were then assigned to the components in proportion to the variance explained by a principal component analysis and following the ER design. Since stratification of the feature space results in a large number of possible points in each cluster, the spreading in geographical space can also be maximised by selecting the best of several realisations.},
keywords = {Soil mapping},
pubstate = {published},
tppubtype = {article}
}
Hengl, Tomislav; Rossiter, David G.
Supervised Landform Classification to Enhance and Replace Photo-Interpretation in Semi-Detailed Soil Survey Journal Article
In: Soil Science Society of America Journal, vol. 67, no. 6, pp. 1810–1822, 2003.
Tags: Soil mapping
@article{https://doi.org/10.2136/sssaj2003.1810,
title = {Supervised Landform Classification to Enhance and Replace Photo-Interpretation in Semi-Detailed Soil Survey},
author = {Tomislav Hengl and David G. Rossiter},
url = {https://acsess.onlinelibrary.wiley.com/doi/abs/10.2136/sssaj2003.1810},
doi = {10.2136/sssaj2003.1810},
year = {2003},
date = {2003-01-01},
urldate = {2003-01-01},
journal = {Soil Science Society of America Journal},
volume = {67},
number = {6},
pages = {1810--1822},
abstract = {A method to enhance manual landform delineation using photo-interpretation to map a larger area is described. Conventional aerial photo-interpretation (API) maps using a geo-pedological legend of 21 classes were prepared for six sample areas totaling 111 km² in the Baranja region, eastern Croatia. Nine terrain parameters extracted from a digital elevation model (DEM) (ground water depth, slope, plan curvature, profile curvature, viewshed, accumulation flow, wetness index, sediment transport index, and the distance to nearest watercourse) were used to extrapolate photo-interpretation over the entire survey area (1062 km²). The classification accuracy was assessed using the error matrix, calculated by comparing both the whole API maps and point samples, with the results of classification. The first results, using a maximum-likelihood classifier, were 58.2% (hill land), 39.1% (plain), and 45.3% (entire area) reproducibility of the training set. Six classes in the plain were responsible for a large proportion of the misclassifications, due to an insufficiently detailed DEM and the complex nature of landforms (point bar complexes, levees, active channel banks), which cannot be explained with the terrain parameters only. Reproducibility for a simplified legend of 15 classes over the study area was improved to 65.8% (plain), 58.2% (hill land), and 63.4% (entire area) using the whole-API training set. After the simplification of legend (15) and with the iterative (3) selection of point-sample training set, classification was able to reproduce 97.6% (hill land), 86.7% (plain), and 90.2% (entire area) of the training set. The supervised classification showed fine details not achieved by photo-interpretation. The number of manual photo-interpretations that had to be prepared was reduced from 84 to 6. The methodology can be applied by soil survey teams to edit and update current maps and to enhance or replace API for new surveys.},
keywords = {Soil mapping},
pubstate = {published},
tppubtype = {article}
}