Optimizing Crop Selection and Resource Allocation in Multi-Seasonal Indian Agriculture Using Deep Reinforcement Learning (2022)

Author:
Mohammad Motaghianfar

Abstract

Farmers in India work across several growing seasons and must decide which crops to plant and how to allocate resources, decisions that affect both yields and sustainability. This study applies Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, to crop selection and land allocation across India’s growing seasons. We worked with a dataset of ~345,407 records from 2000–2015, which we cleaned and reduced in dimensionality before analysis. The model simulates choosing crops and allocating land. It reached a mean reward of -0.4645 ± 0.0178, compared with -0.4794 ± 0.1026 for a random baseline, a reported yield gain of 0.0150. The most influential inputs were a principal component associated with regional patterns (PCA_3) and production levels. The approach offers a new way to support farming decisions, and it could be improved by refining the reward design and adding real-time weather data.

Keywords: Deep Reinforcement Learning, Crop Selection, Resource Allocation, Indian Agriculture, Proximal Policy Optimization, Yield Optimization

1. Introduction

Indian agriculture spans over 140 million hectares across the Kharif, Rabi, and Summer seasons, but unpredictable weather and poor crop choices often lower yields. Machine learning has been applied to farming, yet few studies address India’s multi-seasonal systems with Deep Reinforcement Learning (DRL). The review by Liakos et al. (2018) showed the promise of machine learning in agriculture but focused on single-season settings. Elavarasan and Vincent (2020) used DRL for yield prediction, not resource planning.

Our study addresses this gap by using a PPO-based DRL model to optimize crop selection and land use across seasons, aiming to maximize yields with minimal resources. We hypothesize that this approach will outperform a random strategy and reveal the factors that matter most for farming decisions.

2. Methods

Data/Materials

We used the India Agriculture Crop Production dataset from Kaggle (~345,407 records, 2000–2015), covering state, district, crop, season, area (hectares), production (tonnes), and yield (kg/ha) for over 100 crops.

Preprocessing/Procedures

We imputed missing numerical values with column means and missing categorical values with column modes. Numerical features (Area, Production, Yield) were scaled to the [0, 1] range with MinMaxScaler. Categorical features (State, District, Crop, Season) were one-hot encoded, then reduced to 10 PCA components (~45% variance explained). We also added a yield trend feature based on district/season averages.
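The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it uses NumPy alone rather than scikit-learn, the helper names (min_max_scale, one_hot, pca) are hypothetical, and the toy records are invented for the example. Mode imputation and the yield trend feature are omitted.

```python
import numpy as np

def min_max_scale(x):
    """Rescale each column to [0, 1], the default behaviour of MinMaxScaler."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return (x - lo) / span

def one_hot(labels):
    """Return one indicator column per distinct label, in sorted label order."""
    cats = sorted(set(labels))
    index = {c: i for i, c in enumerate(cats)}
    out = np.zeros((len(labels), len(cats)))
    out[np.arange(len(labels)), [index[c] for c in labels]] = 1.0
    return out, cats

def pca(x, n_components):
    """Project centered data onto its top principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)  # explained variance ratio per component
    return centered @ vt[:n_components].T, explained[:n_components]

# Hypothetical toy records (area, production) with one missing area value.
numeric = np.array([[10.0, 20.0], [np.nan, 40.0], [30.0, 90.0], [20.0, 10.0]])
col_means = np.nanmean(numeric, axis=0)
numeric = np.where(np.isnan(numeric), col_means, numeric)  # mean imputation
scaled = min_max_scale(numeric)

# Hypothetical season labels, one-hot encoded and then reduced with PCA.
seasons = ["Kharif", "Rabi", "Kharif", "Summer"]
encoded, categories = one_hot(seasons)
components, explained = pca(encoded, n_components=2)
```

On the full dataset the one-hot matrix would have one column per state, district, crop, and season value, and the sum of the first 10 entries of the explained variance ratio would correspond to the ~45% figure quoted above.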

Models/Algorithms/Tools

We built a custom Gym environment that models crop choice (one of 10 crops) and land allocation (a fraction between 0 and 1) as a Markov Decision Process. The PPO model, implemented in PyTorch, used 64-unit neural networks for the actor, the critic, and area prediction. It was trained for 200 episodes with a learning rate of 0.001 and a clip parameter of 0.2.
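To illustrate what the clip parameter of 0.2 controls, the standard PPO clipped surrogate objective can be sketched in plain Python. This is the textbook form of the objective, not the study's PyTorch code; the function name and arguments are hypothetical, and advantage estimation is assumed to happen elsewhere.

```python
import math

def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip=0.2):
    """Mean PPO clipped objective (to be maximized) over a batch of samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        # Take the smaller of the two terms so the objective is a pessimistic bound.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# A sample whose probability rose by 50% with a positive advantage:
# the ratio is clipped at 1 + clip, so the objective stops rewarding further increases.
example = clipped_surrogate([math.log(1.5)], [0.0], [1.0])
```

With a clip of 0.2, the policy gains nothing from moving an action's probability ratio outside the range 0.8 to 1.2 in the direction the advantage favors, which keeps each update close to the previous policy.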

Evaluation Metrics/Statistical Tests

We measured performance with mean cumulative reward, standard deviation, and yield gain (PPO vs. random baseline) over 50 test episodes. Gradient-based feature importance identified key factors. No ethical approval was needed, as we used public data.
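The summary statistics described above can be computed as in the sketch below. The reward lists are made-up placeholders rather than the study's episode data, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def summarize(episode_rewards):
    """Mean and sample standard deviation of cumulative episode rewards."""
    return statistics.mean(episode_rewards), statistics.stdev(episode_rewards)

def yield_gain(policy_rewards, baseline_rewards):
    """Difference in mean cumulative reward: policy minus baseline."""
    return statistics.mean(policy_rewards) - statistics.mean(baseline_rewards)

# Placeholder cumulative rewards for three test episodes each (not real results).
ppo_rewards = [-0.46, -0.47, -0.45]
random_rewards = [-0.40, -0.60, -0.44]

ppo_mean, ppo_std = summarize(ppo_rewards)
random_mean, random_std = summarize(random_rewards)
gain = yield_gain(ppo_rewards, random_rewards)
```

Applied to the rounded means reported in the Results (-0.4645 and -0.4794), the same subtraction gives 0.0149, so the reported gain of 0.0150 presumably comes from the unrounded means.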

3. Results

After preprocessing, the dataset had 3,026 records with no missing values. Scaled yield averaged 0.002353 (max 0.299015). Crops commonly regarded as high-yield included Coconut (0.200946) and Sugarcane (0.001260), although Sugarcane’s scaled value falls below the dataset mean. The PPO model scored a mean reward of -0.4645 ± 0.0178, compared with -0.4794 ± 0.1026 for the random baseline, a yield gain of 0.0150. Training rewards stabilized within the 200 episodes. PCA_3 (1.1858e-09) and Production (8.7803e-10) had the highest gradient-based importance scores.

4. Discussion

The PPO model outperformed random crop choices with a reported yield gain of 0.0150, which suggests that DRL has potential for multi-seasonal farming. Unlike studies on irrigation or single-season prediction, our work addresses India’s seasonal dynamics directly. The modest gain suggests that the reward setup (yield * area - 0.1 * area) may undervalue high-yield choices because of scaling: the scaled yield averages 0.002353, far below the 0.1 cost coefficient, so the reward is negative for most crop choices and depends mainly on how much land is allocated. PCA_3 and Production point to regional and productivity factors, which is consistent with farming practice.
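A short worked example makes the scaling issue concrete. The sketch below evaluates the reward expression quoted above for the scaled yields reported in the Results. The function name is hypothetical, and applying the expression once per step with the full land fraction (area = 1.0) is an assumption made for illustration.

```python
def step_reward(scaled_yield, area, cost_per_area=0.1):
    """Reward expression from the Discussion: yield * area - 0.1 * area."""
    return scaled_yield * area - cost_per_area * area

# Scaled yields reported in the Results section.
reported = {"dataset mean": 0.002353, "Sugarcane": 0.001260, "Coconut": 0.200946}

# Reward when the full land fraction is allocated to each case.
full_area_rewards = {name: step_reward(y, area=1.0) for name, y in reported.items()}

# The expression factors as area * (yield - 0.1), so the break-even yield is 0.1.
break_even_yield = 0.1
```

Because the expression factors as area * (yield - 0.1), any crop with a scaled yield below 0.1 produces a negative reward that shrinks toward zero as the allocated area shrinks. Of the values above, only Coconut clears the break-even point.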

This model could serve as a scalable tool for precision agriculture, helping farmers plan crops and policymakers allocate resources. However, the simple reward function and the random state transitions in the environment limit its real-world use. Future work could use dynamic rewards and add live climate data to address both limitations.

5. Conclusion

Our PPO-based DRL model optimizes crop selection and land use in Indian farming, modestly outperforming random strategies. Regional patterns and production levels drive decisions. Future improvements, like better reward designs and weather integration, could make this a powerful tool for sustainable agriculture.

References

[1] Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
[2] Liakos, K. G., Busato, P., Moshou, D., Pearson, S., & Bochtis, D. (2018). Machine learning in agriculture: A review. Sensors, 18(8), 2674. https://doi.org/10.3390/s18082674
[3] Bu, F., & Wang, X. (2019). A smart agriculture IoT system based on deep reinforcement learning. Future Generation Computer Systems, 99, 500–507. https://doi.org/10.1016/j.future.2019.04.041
[4] Elavarasan, D., & Vincent, P. M. D. R. (2020). Crop yield prediction using deep reinforcement learning model for sustainable agrarian applications. IEEE Access, 8, 86886–86901. https://doi.org/10.1109/ACCESS.2020.2992480
