My Research Projects

Precision Crop Recommendation Using Multi-Feature Soil and Climatic Analysis for Sustainable Agriculture (2021)

Agriculture, Artificial Intelligence, Data Science, Machine Learning

Author:
Mohammad Motaghianfar


Abstract

Sustainable farming is key to feeding a growing world population amid climate shifts and soil decline. Traditional, experience-based crop choices often waste resources and lower yields. This study uses a machine learning ensemble (Random Forest, XGBoost, Decision Tree) to recommend one of 22 crops from a dataset of 2,200 samples that combines soil (nitrogen, phosphorus, potassium, pH) and climate (temperature, humidity, rainfall) data. We capped outliers, normalized features, and used Random Forest importance to select the top five features: rainfall, humidity, K, P, and N. The model reached 99.55% test accuracy and weighted F1-score, with 99.03% (±0.69%) accuracy in 5-fold cross-validation. Rainfall (22.7%) and humidity (21.4%) contributed most to predictions, per SHAP and Random Forest analysis. This interpretable, farmer-friendly model supports more efficient crop selection and aligns with UN SDG 2 (Zero Hunger). Future steps include real-time IoT integration for dynamic advice.

Keywords: Precision Agriculture, Machine Learning, Crop Recommendation, Ensemble Learning, SHAP Interpretability, Sustainable Farming


1. Introduction

By 2050, global food production must rise by 70% to feed a growing population, but climate change and soil depletion make traditional crop selection less effective. Machine learning (ML) has improved farming decisions, with studies on yield prediction and crop classification. Yet most studies focus on either soil or weather data alone and offer farmers little explanation of their recommendations.

Farmers need tools that blend soil (N, P, K, pH) and climate (temperature, humidity, rainfall) data to pick crops while staying easy to understand. This study builds an interpretable ML ensemble to recommend one of 22 crops, aiming for over 95% accuracy with clear insights via SHAP analysis.


2. Methods

Data/Materials

We used the Crop Recommendation Dataset (Ingle, 2021), which contains 2,200 balanced samples (100 per crop) covering 22 crops such as rice and maize. Features include nitrogen (0–230), phosphorus (5–145), potassium (5–210), pH (3.4–9.9), temperature (8.0–43.7°C), humidity (14.3–99.9%), and rainfall (20.2–298.0 mm). The dataset is publicly available under a CC0 license.

Preprocessing/Procedures

Outliers were capped using the 1.5×IQR rule, and numerical features were scaled to [0, 1] with MinMaxScaler. Crop labels were integer-encoded (0–21). Random Forest feature importance selected the top five features: rainfall, humidity, K, P, and N. We used an 80/20 stratified train-test split (1,760 training and 440 test samples).
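The capping and scaling steps can be illustrated with a minimal, standard-library sketch. It assumes the usual reading of the 1.5×IQR rule (values outside Q1 − 1.5·IQR and Q3 + 1.5·IQR are clipped to those fences) and reproduces by hand the per-feature rescaling that MinMaxScaler performs. The function names and the rainfall readings are hypothetical and are not taken from the study's code or data.

```python
import statistics


def cap_outliers_iqr(values, k=1.5):
    """Clip each value to the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" quartiles use linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Rescale one feature to [0, 1], mirroring MinMaxScaler's formula."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical rainfall readings in mm; 900.0 is a planted outlier.
rainfall = [60.0, 80.0, 100.0, 120.0, 140.0, 160.0, 180.0, 200.0, 900.0]

capped = cap_outliers_iqr(rainfall)   # Q1 = 100, Q3 = 180, upper fence = 300
scaled = min_max_scale(capped)        # smallest value -> 0.0, largest -> 1.0
```

In this toy series only the planted outlier moves: it is pulled down to the upper fence before scaling, so it no longer compresses the remaining readings toward zero.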

Models/Algorithms/Tools

Our ensemble, a scikit-learn VotingClassifier with soft voting, combined Random Forest (100 trees), XGBoost (mlogloss evaluation metric), and a Decision Tree (max depth 10). SHAP KernelExplainer provided interpretability. We used Python 3.12 (scikit-learn 1.3, xgboost 2.0, shap 0.42) on Google Colab.
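A soft-voting ensemble of this shape might be assembled as in the sketch below. It is not the study's code: the data are synthetic blobs standing in for the crop dataset, the random seeds are arbitrary, and scikit-learn's GradientBoostingClassifier is swapped in for XGBoost so that the example depends on scikit-learn only.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crop data: 5 classes x 100 samples, 5 features.
X, y = make_blobs(
    n_samples=500, centers=5, n_features=5, cluster_std=0.5, random_state=0
)

# 80/20 stratified split, as described in the preprocessing step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        # Assumption: stand-in for XGBoost. With xgboost installed,
        # xgboost.XGBClassifier(eval_metric="mlogloss") could be used here.
        ("gb", GradientBoostingClassifier(random_state=42)),
        ("dt", DecisionTreeClassifier(max_depth=10, random_state=42)),
    ],
    voting="soft",  # average the predicted class probabilities
)
ensemble.fit(X_train, y_train)

proba = ensemble.predict_proba(X_test)  # shape: (n_test, n_classes)
test_accuracy = accuracy_score(y_test, ensemble.predict(X_test))
```

With soft voting, each base model contributes its class-probability vector and the ensemble predicts the class with the highest average probability, which is why every base estimator must support predict_proba.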

Evaluation Metrics/Statistical Tests

We report test accuracy, weighted F1-score, and 5-fold cross-validation accuracy (mean ± std). A confusion matrix was used to assess per-crop performance, and SHAP values were used to assess interpretability. No statistical inference (p-values or confidence intervals) was performed, as the study focuses on classification performance.
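To make the two summary statistics concrete, the following sketch computes a support-weighted F1-score by hand and summarizes a set of fold accuracies as mean ± std. The labels and fold scores are invented for illustration and are not results from the study. The population standard deviation is used on the assumption that the spread was computed the way NumPy's default std does; the source does not say which variant was used.

```python
from collections import Counter
from statistics import mean, pstdev


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


# Hypothetical labels: one rice sample is misclassified as maize.
y_true = ["rice", "rice", "maize", "maize", "maize", "lentil"]
y_pred = ["rice", "maize", "maize", "maize", "maize", "lentil"]
f1 = weighted_f1(y_true, y_pred)

# Hypothetical accuracies from five cross-validation folds.
fold_scores = [0.98, 0.99, 1.00, 0.99, 0.99]
cv_mean = mean(fold_scores)
cv_std = pstdev(fold_scores)  # population std (divides by n, not n - 1)
```

On a balanced dataset such as this one (100 samples per crop), the weighted F1-score and the unweighted (macro) F1-score coincide on the full data, and they stay close on a stratified test split.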

Ethical Approval

No approval was needed; the dataset is public with no human or animal subjects.


3. Results

The ensemble model achieved 99.03% (±0.69%) accuracy in 5-fold cross-validation, 99.55% test accuracy (438 of the 440 held-out samples), and a 99.55% weighted F1-score. The confusion matrix showed minimal errors across the 22 crops. Rainfall (0.227) and humidity (0.214) were the top predictors, followed by K (0.169), P (0.152), and N (0.110), per SHAP and Random Forest analysis.
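A ranking like the one above can be obtained from a fitted Random Forest, as in the sketch below. It uses synthetic data whose labels depend only on rainfall and humidity, so it illustrates the mechanism rather than reproducing the reported numbers, and it does not cover the SHAP analysis. The five reported values sum to 0.872, which would be consistent with importances normalized over all seven features, although the source does not state how they were normalized.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["N", "P", "K", "pH", "temperature", "humidity", "rainfall"]

# Synthetic data: 600 samples, 7 features drawn uniformly from [0, 1).
rng = np.random.default_rng(0)
X = rng.random((600, len(feature_names)))

# Toy labels (4 classes) depend only on rainfall and humidity,
# so those two features should receive the highest importance.
rain_high = (X[:, 6] > 0.5).astype(int)
humid_high = (X[:, 5] > 0.5).astype(int)
y = 2 * rain_high + humid_high

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances; scikit-learn normalizes them to sum to 1.
importances = forest.feature_importances_
ranked = sorted(
    zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
)
top_five = [name for name, _ in ranked[:5]]
```

Keeping the first five entries of such a ranking corresponds to the top-five feature selection described in the Methods.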


4. Discussion

Our model's 99.55% accuracy exceeds the accuracies reported in prior studies (e.g., Kumar et al., 2019: 92%; Smith et al., 2020: 88%), supporting the value of combining soil and climate data. SHAP highlights rainfall and humidity, which matches FAO findings on the role of water in crops like rice. Unlike less transparent models (e.g., CNNs), our SHAP-based approach makes predictions easier for farmers to understand.

This scalable framework can power mobile apps for small farmers, cutting fertilizer waste and boosting yields in support of SDG 2. Limitations include the dataset's lack of real-world factors such as pests, and hyperparameter tuning that was kept minimal for simplicity. Future work could add IoT sensors for live data and test the model in real fields.


5. Conclusion

This interpretable ML ensemble achieves 99.55% accuracy in recommending 22 crops, using soil and climate data to promote sustainable farming. Rainfall and humidity drive decisions, and SHAP ensures clarity for farmers. Future steps include IoT integration and field trials for dynamic, real-world use.


References

[1] Chen, Y., et al. (2020). Gradient boosting for crop yield prediction. Journal of Agricultural Informatics, 12(3), 45–56.
[2] FAO. (2017). The future of food and agriculture: Trends and challenges. Food and Agriculture Organization of the United Nations.
[3] Ingle, A. (2021). Crop Recommendation Dataset. Kaggle. https://www.kaggle.com/datasets/atharvaingle/crop-recommendation-dataset
[4] Kumar, R., et al. (2019). Weather-based crop recommendation using random forests. Precision Agriculture, 20(4), 789–802.
[5] Li, X., et al. (2021). Deep learning for crop classification: A review. Computers and Electronics in Agriculture, 182, 105–120.
[6] Smith, J., et al. (2020). Soil nutrient analysis for crop selection using decision trees. Agricultural Systems, 175, 22–34.