A Hybrid Deep Ensemble and Evidential Regression Framework for Dual Uncertainty Quantification in Agricultural Yield Prediction (2024)

Agriculture, Artificial Intelligence, Data Science, Deep Learning

Author:
Mohammad Motaghianfar


Abstract

Predicting crop yields accurately is vital for ensuring food security worldwide, but most machine learning models only give a single number without showing how certain they are. This makes it hard for farmers and decision-makers to manage risks. Uncertainty in yield predictions comes from two sources: the model’s limited knowledge (epistemic uncertainty, which can shrink with more data) and natural variability in the data (aleatoric uncertainty, which cannot). Current methods often conflate the two, so users cannot tell whether collecting more data would make a prediction more reliable.

We developed a new method that combines Deep Ensembles and Evidential Regression to measure both types of uncertainty separately. We tested it on a synthetic dataset with 16,000 training samples and 4,000 test samples, including factors like soil quality, weather, and farming practices. Our model reached an RMSE of 194.27 kg/hectare and an R² of 0.85. Its nominal 95% prediction intervals covered 84.2% of actual outcomes, which is below the nominal level and indicates that the intervals are somewhat too narrow. Almost all of the estimated uncertainty (99.97%) was aleatoric, suggesting that variability in the data, rather than model uncertainty, dominates.

This approach offers yield predictions together with separate estimates of each uncertainty type, helping farmers make informed, risk-aware decisions. Future work should test this on real-world data and incorporate expert farming knowledge to improve results.

Keywords: uncertainty quantification, evidential deep learning, crop yield prediction, deep ensembles, precision agriculture


1. Introduction

Predicting crop yields is tricky because of unpredictable factors like weather, soil differences, and farming choices. Most machine learning models just give a single estimate without indicating how reliable it is, which isn’t very helpful for farmers or policymakers who need to weigh risks.

Recent advances in uncertainty quantification, like Bayesian Neural Networks and Deep Ensembles, try to address this but often lump all uncertainties together. A newer technique, evidential deep learning, shows promise in separating model-related and data-related uncertainties, but it hasn’t been widely used in agriculture.

Our research tackles this gap by creating a hybrid method that blends Deep Ensembles with Evidential Regression. Our goals were to: (1) build an efficient way to measure both types of uncertainty, (2) test it on agricultural yield data, and (3) provide clear uncertainty estimates to support real-world decisions.


2. Methods

Data and Materials

We used a synthetic dataset called the Synthetic Agricultural Yield Prediction Dataset, which includes 16,000 training samples and 4,000 test samples. Each sample has six features: soil quality, seed variety, fertilizer amount, sunny days, rainfall, and irrigation schedule, with crop yield (in kg/hectare) as the target.

Preprocessing and Procedures

We standardized the numeric features so that they share a common scale and encoded categorical data (like seed variety) numerically so the model could use it. The 16,000 training samples were split 80-20 into training and validation sets (12,800 and 3,200 samples), and we scaled the yield values to stabilize training.
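The preprocessing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code. Several details are assumptions: one-hot encoding for the categorical feature, z-score scaling for both the features and the yield, statistics fitted on the training split only, and all function names, variable names, and toy values.

```python
import random
import statistics


def one_hot(values):
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = [[1.0 if index[v] == j else 0.0 for j in range(len(categories))]
               for v in values]
    return encoded, categories


def fit_standardizer(column):
    """Return (mean, std) of a numeric column; std falls back to 1.0 if constant."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0
    return mean, std


def standardize(column, mean, std):
    """Apply z-score scaling with previously fitted statistics."""
    return [(x - mean) / std for x in column]


def train_val_split(n_rows, val_fraction=0.2, seed=0):
    """Shuffle row indices and split them into training and validation lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_rows * val_fraction))
    return indices[n_val:], indices[:n_val]


# Toy data (made-up values, for illustration only).
rainfall = [100.0, 200.0, 300.0, 400.0, 500.0]
seed_variety = ["A", "B", "A", "B", "A"]
yield_kg_ha = [300.0, 500.0, 700.0, 900.0, 1100.0]

train_idx, val_idx = train_val_split(len(rainfall), val_fraction=0.2, seed=0)

# Fit scaling statistics on the training rows only (an assumption), then apply to all rows.
rain_mean, rain_std = fit_standardizer([rainfall[i] for i in train_idx])
rain_scaled = standardize(rainfall, rain_mean, rain_std)

yield_mean, yield_std = fit_standardizer([yield_kg_ha[i] for i in train_idx])
yield_scaled = standardize(yield_kg_ha, yield_mean, yield_std)

seed_encoded, seed_categories = one_hot(seed_variety)
```

Predictions made on the scaled target can be mapped back to kg/hectare with `prediction * yield_std + yield_mean`, which matters when reporting RMSE and interval widths in physical units.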

Models and Algorithms

Our method uses an ensemble of five neural networks with the same structure (64-32-4 nodes) but different random initializations, which gives the ensemble its diversity. Each network ends in an evidential output layer that predicts the four parameters of a Normal-Inverse-Gamma distribution, from which both types of uncertainty can be derived. We trained each network with a custom loss function based on the Student’s t-distribution, which is the predictive distribution implied by a Normal-Inverse-Gamma prior.
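To make the evidential output layer concrete, here is a minimal sketch of how four raw network outputs might be mapped to Normal-Inverse-Gamma parameters and scored with a Student's t negative log-likelihood. It follows the formulation in Amini et al. (2020) and is an assumption about the setup, not the authors' exact loss: the symbols `gamma`, `nu`, `alpha`, `beta`, the softplus constraints, the `EPS` constant, and the function names are illustrative, and the evidence regularizer from that paper is omitted because the text does not mention it.

```python
import math

EPS = 1e-6  # keeps nu and beta strictly positive (illustrative choice)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def nig_parameters(raw_outputs):
    """Map four unconstrained outputs to valid Normal-Inverse-Gamma parameters."""
    raw_gamma, raw_nu, raw_alpha, raw_beta = raw_outputs
    gamma = raw_gamma                         # predicted mean, unconstrained
    nu = softplus(raw_nu) + EPS               # nu > 0
    alpha = softplus(raw_alpha) + 1.0 + EPS   # alpha > 1 keeps the variances finite
    beta = softplus(raw_beta) + EPS           # beta > 0
    return gamma, nu, alpha, beta


def evidential_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of y under the Student's t marginal of the NIG prior."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))


# Example: the loss grows as the target moves away from the predicted mean.
gamma, nu, alpha, beta = nig_parameters([0.3, 0.0, 0.0, 0.0])
loss_near = evidential_nll(0.3, gamma, nu, alpha, beta)
loss_far = evidential_nll(3.0, gamma, nu, alpha, beta)
```

In an ensemble, each of the five networks would minimize this loss independently from its own random initialization; the sketch works on scalars, whereas a training loop would average it over a batch.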

Evaluation Metrics

We measured the model’s accuracy with RMSE (root mean square error), MAE (mean absolute error), and R². For uncertainty, we checked how often the actual yields fell within the predicted 95% prediction intervals and broke down the total uncertainty into its aleatoric and epistemic parts.
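The evaluation metrics can be sketched as below. The accuracy metrics are standard, but the uncertainty part rests on assumptions the text does not spell out: per-network aleatoric and epistemic variances follow the Normal-Inverse-Gamma formulas from deep evidential regression, ensemble members are combined by averaging with the variance of the member means added to the epistemic term, and the 95% interval uses a normal approximation. All function names and toy numbers are hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # approximate two-sided 95% quantile of the standard normal


def rmse(y_true, y_pred):
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def r2(y_true, y_pred):
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def combine_members(members):
    """Combine per-network (gamma, nu, alpha, beta) tuples for one sample.

    Returns (mean, aleatoric variance, epistemic variance). Requires alpha > 1.
    """
    gammas = [g for g, _, _, _ in members]
    aleatoric = statistics.fmean([b / (a - 1.0) for _, _, a, b in members])
    within = statistics.fmean([b / (v * (a - 1.0)) for _, v, a, b in members])
    between = statistics.pvariance(gammas)  # disagreement among ensemble members
    return statistics.fmean(gammas), aleatoric, within + between


def coverage_95(y_true, means, total_variances):
    """Fraction of targets inside mean +/- 1.96 * sqrt(total variance)."""
    hits = [abs(t - m) <= Z_95 * math.sqrt(v)
            for t, m, v in zip(y_true, means, total_variances)]
    return sum(hits) / len(hits)


# Toy example with two ensemble members for a single sample.
mean, aleatoric, epistemic = combine_members([(10.0, 1.0, 2.0, 1.0),
                                              (12.0, 1.0, 2.0, 1.0)])
aleatoric_share = aleatoric / (aleatoric + epistemic)
covered = coverage_95([11.5, 20.0], [mean, mean], [aleatoric + epistemic] * 2)
```

A coverage value below 0.95 on held-out data means the intervals are too narrow on average, and the aleatoric share corresponds to the kind of percentage breakdown reported in the results.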


3. Results

Our model achieved an RMSE of 194.27 kg/hectare and an MAE of 155.36 kg/hectare, with an R² of 0.85, showing strong predictive accuracy. The 95% prediction intervals covered 84.2% of actual outcomes. Because this is below the nominal 95%, the intervals are somewhat too narrow, meaning the model is moderately overconfident rather than cautious.

Most of the estimated uncertainty (99.97%) was aleatoric, the kind that arises from sources like weather variability, with only a tiny fraction (0.03%) attributed to the model itself. This suggests the model captured the data’s patterns well. The average width of the 95% prediction interval was 876.4 kg/hectare, corresponding to 71.3% of the data range (57.5–1385.1 kg/hectare).


4. Discussion

Our hybrid approach successfully balances accurate predictions with clear uncertainty estimates. The dominance of aleatoric uncertainty makes sense, as farming is heavily influenced by unpredictable factors like weather. The low epistemic uncertainty suggests our ensemble of models worked well together, though it might also mean they were too similar or that the dataset was comprehensive enough.

Compared to older methods like Bayesian Neural Networks, which typically rely on sampling or variational approximations, our approach needs only a single forward pass through each of the five networks at prediction time, making it practical for real-time farming decisions. Farmers could use these uncertainty estimates to make smarter choices, like adjusting insurance or resources based on how confident the predictions are.

However, the model’s prediction intervals under-covered the data (84.2% instead of the nominal 95%), which means the model was overconfident. This is possibly due to how we set up the evidential layer or the limited variety in our ensemble. Also, since we used synthetic data, real-world results might differ, and the low model uncertainty might not hold for more complex farming scenarios.


5. Conclusion

We developed a new method that combines Deep Ensembles and Evidential Regression to predict crop yields while clearly separating model and data uncertainties. This approach delivers accurate predictions and actionable uncertainty estimates, paving the way for smarter, risk-aware farming decisions.

Moving forward, we plan to: (1) test this method on real-world datasets, (2) add time-based patterns using advanced network designs, (3) improve the variety in our ensemble to better capture model uncertainty, and (4) integrate expert farming knowledge to fine-tune uncertainty estimates.


References

[1] Khaki, S., & Wang, L. (2019). Crop yield prediction using deep neural networks. Frontiers in Plant Science, 10, 621.
[2] Van Klompenburg, T., Kassahun, A., & Catal, C. (2020). Crop yield prediction using machine learning: A systematic literature review. Computers and Electronics in Agriculture, 177, 105709.
[3] Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems, 30.
[4] Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30.
[5] Amini, A., Schwarting, W., Soleimany, A., & Rus, D. (2020). Deep evidential regression. Advances in Neural Information Processing Systems, 33.
[6] Synthetic Agricultural Yield Prediction Dataset. (n.d.). Retrieved from https://www.kaggle.com/datasets/blueloki/synthetic-agricultural-yield-prediction-dataset
[7] Lobell, D. B., & Field, C. B. (2007). Global scale climate–crop yield relationships and the impacts of recent warming. Environmental Research Letters, 2(1), 014002.