Aug 26 – 30, 2024
The Couvent des Jacobins
Europe/Paris timezone

Predicting and interpreting field-scale potato yield for a whole country using a Random Forest model

Not scheduled
15m
Les Dortoirs (1st floor) (The Couvent des Jacobins)

Rennes, France
Poster, Synergies of technologies, Poster session #1

Speaker

Qiuhong Huang

Description

Introduction:
The random forest (RF) machine learning model has been applied successfully to crop yield prediction in recent years (Silva et al., 2023). However, extrapolating predictions to other areas or years whose feature space or time period differs considerably from that of the training data is likely to produce unreliable results (Wadoux et al., 2021). Yield measurement error can also be an important source of uncertainty and needs to be quantified. Furthermore, prediction uncertainty arising from limited knowledge of the nutrient supply (soil and fertilizer)-uptake-yield relationship (de Wit, 1953) may affect model performance and interpretation: if yield is predicted directly from soil nutrient contents, fertilizer application and other environmental variables, the contribution of uptake to yield is masked. To the best of our knowledge, prediction uncertainties due to the lack of understanding of the nutrient-uptake-yield relationship have not been clearly addressed. The objective of this study is to evaluate the performance of trained RF models and to quantify the contributions of different sources of uncertainty, using potato cultivation in China as a case study.

Materials and Methods:
A dataset of 2183 field-year observations was collected from 491 fertilizer experiments in nine Chinese provinces from 2017 to 2019. The RF model was trained on 38 explanatory variables using default hyperparameters. The importance of selected variables was assessed using variable importance plots and partial dependence plots. The explanatory variables were then classified into five groups (weather, management, soil, topography and fertilizer). The relative importance of each group was calculated by dividing the model efficiency coefficient (MEC) of an RF model using only that variable group by the sum of the MECs of the five single-group RF models (Torres-Matallana et al., 2021). Because sampling locations in the study area were spatially and temporally clustered, model performance was evaluated not only by 10-fold cross-validation (CV) but also by leave-block-out (LBOCV), leave-site-out (LSOCV) and leave-year-out (LYOCV) cross-validation.
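The cross-validation and group-importance calculations above can be sketched in a few lines of Python. This is a minimal, standard-library sketch under stated assumptions, not the authors' code: the MEC is assumed to follow the usual definition 1 - SSE/SST, all function names are hypothetical, the toy data are invented, and a predictor that returns the mean training yield stands in for the Random Forest (in practice each fold would refit an RF, for example scikit-learn's `RandomForestRegressor`). Passing site IDs, years or spatial block IDs as group labels gives LSOCV, LYOCV or LBOCV respectively; 10-fold CV corresponds to randomly assigned fold labels.

```python
# Minimal stdlib sketch of the evaluation logic (hypothetical names, toy data).
# Assumptions: MEC = 1 - SSE/SST; a mean-of-training-yields predictor stands in for the RF.
from statistics import fmean


def mec(observed, predicted):
    """Model efficiency coefficient: 1 - SSE / SST (1 = perfect, < 0 = worse than the mean)."""
    mean_obs = fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def leave_group_out_folds(groups):
    """Yield (label, train_idx, test_idx), holding out one group label per fold.

    Site IDs give LSOCV, years give LYOCV, spatial block IDs give LBOCV.
    """
    for label in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == label]
        train = [i for i, g in enumerate(groups) if g != label]
        yield label, train, test


def out_of_fold_predictions(y, groups, fit_predict):
    """Predict every observation with a model trained without its group."""
    preds = [0.0] * len(y)
    for _, train, test in leave_group_out_folds(groups):
        for i, p in zip(test, fit_predict(train, test)):
            preds[i] = p
    return preds


def relative_group_importance(group_mecs):
    """Single-group MEC divided by the sum over all groups (assumes all MECs > 0)."""
    total = sum(group_mecs.values())
    return {name: value / total for name, value in group_mecs.items()}


# Hypothetical toy data: fresh yields (t/ha) observed in three years.
yields = [30.0, 34.0, 41.0, 45.0, 52.0, 58.0]
years = [2017, 2017, 2018, 2018, 2019, 2019]


def mean_predictor(train, test):
    """Stand-in for an RF: predict the mean training yield for every test row."""
    m = fmean(yields[i] for i in train)
    return [m] * len(test)


lyocv_preds = out_of_fold_predictions(yields, years, mean_predictor)
print("LYOCV MEC:", round(mec(yields, lyocv_preds), 3))

# Hypothetical single-group MECs, normalized to relative importances.
print(relative_group_importance({"weather": 0.5, "management": 0.3, "soil": 0.2}))
```

On the toy data, a predictor trained on the other two years cannot follow the upward trend across years, so the LYOCV MEC comes out negative: a small-scale illustration of why extrapolating to unseen years degrades performance.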

Results:
Model performance decreased considerably when extrapolating over space and time. Moving from 10-fold CV to LBOCV, LSOCV and LYOCV, the root mean square error (RMSE) increased from 3.4 to 8.3, 9.1 and 10 t/ha, respectively, while the MEC decreased from 0.92 to 0.64, 0.53 and 0.43. Cumulative sunshine duration and the topographic position index were the most important individual explanatory variables; at the group level, the weather and management groups were more important for yield prediction than the soil, topography and fertilizer groups. Actual fresh potato yield ranged from 7.2 to 76 t/ha. The standard deviation of the yield measurement error was estimated at 3.1 t/ha, which equals 31% of the RMSE for LSOCV. For LSOCV, incorporating uptake without fertilization, uptake, and yield without fertilization as covariates reduced the RMSE by 5.6% to 50%, increased the MEC by 9.6% to 64%, and decreased the bias by 6.3% to 65%.
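The changes reported above are unitless percentages of a baseline score. A hedged sketch of that arithmetic follows, assuming each change is computed relative to the LSOCV model without the extra covariates and that bias is compared in absolute value; the helper names are hypothetical and the example values are invented for illustration, not the study's results.

```python
# Hypothetical helpers for relative changes in cross-validation scores.
# Example values below are invented; they are not the study's results.


def percent_reduction(baseline, new):
    """Reduction of an error metric (RMSE or |bias|) in percent of the baseline."""
    return (abs(baseline) - abs(new)) / abs(baseline) * 100.0


def percent_increase(baseline, new):
    """Increase of a skill score such as MEC in percent of the baseline (baseline > 0)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical LSOCV scores without and with an extra covariate.
rmse_base, rmse_new = 10.0, 8.0   # t/ha
mec_base, mec_new = 0.50, 0.60    # unitless
bias_base, bias_new = -2.0, -0.5  # t/ha

print(f"RMSE reduced by {percent_reduction(rmse_base, rmse_new):.1f}%")
print(f"MEC increased by {percent_increase(mec_base, mec_new):.1f}%")
print(f"|bias| reduced by {percent_reduction(bias_base, bias_new):.1f}%")
```

Because the result is a ratio of two quantities in the same unit, the t/ha cancels, which is why these changes are stated in percent only.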

Conclusion:
The fitted RF models explained a substantial part of the potato yield variability in China, although considerable residual error remained when extrapolating predictions to other areas or years. Yield measurement error accounted for about one-third of the residual error, while incorporating uptake without fertilization, uptake, and yield without fertilization as covariates substantially improved prediction performance.

References:
de Wit, C. T., 1953. A physical theory on placement of fertilizers. Wageningen University and Research, ProQuest Dissertations Publishing.
Silva, J. V., van Heerwaarden, J., Reidsma, P., Laborte, A. G., Tesfaye, K., and van Ittersum, M. K., 2023. Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy. Field Crops Res. 302, 109063.
Torres-Matallana, J. A., Leopold, U., and Heuvelink, G. B. M., 2021. Multivariate autoregressive modelling and conditional simulation for temporal uncertainty analysis of an urban water system in Luxembourg. Hydrol. Earth Syst. Sci. 25 (1), 193-216.
Wadoux, A. M. J. C., Heuvelink, G. B. M., de Bruin, S., and Brus, D. J., 2021. Spatial cross-validation is not the right way to evaluate map accuracy. Ecol. Modell. 457, 109692.

Keywords: Random Forest; Extrapolation; Yield; Uptake; Uncertainty

Primary authors

Qiuhong Huang, Prof. Gerard Heuvelink (Wageningen University; ISRIC - World Soil Information), Tom Schut (Wageningen University), Johan Leenaars (ISRIC - World Soil Information)