Comparison of Agricultural Drought Monitoring Based on Three Machine Learning Methods: Postprint
WANG Xiaoyan
Submitted 2022-01-26 | ChinaXiv: chinaxiv-202201.00093

Abstract

Frequent droughts inflict serious damage on the economy and agricultural production of Gansu Province, so establishing accurate and reliable agricultural drought monitoring models with advanced methods is of great importance for drought prevention and mitigation in the province. This study develops three agricultural drought monitoring models using three machine learning approaches: Random Forest (RF), BP neural network, and Support Vector Machine (SVM). The independent variables are the Vegetation Condition Index (VCI), Temperature Condition Index (TCI), Vegetation Water Supply Index (VWSI), and Precipitation Condition Index (PCI), derived from multi-source remote sensing data for April–October of 2002–2019 in Gansu Province, together with DEM, soil available water capacity (AWC), and climate type. The dependent variable is the 3-month Standardized Precipitation Evapotranspiration Index (SPEI_3) from meteorological stations. The models are analyzed and compared to identify the optimal approach for monitoring agricultural drought in Gansu Province, and the applicability of machine learning-based models under different environmental conditions is further investigated. Among the three models, the Random Forest model has the highest average R² (0.86) and the smallest errors (RMSE = 0.40, MAE = 0.31), giving better agricultural drought monitoring performance than the BP neural network and Support Vector Machine models. Furthermore, the three models constructed separately for arid and humid environments all monitor drought better in humid environments (R² ≥ 0.82), and the Random Forest model outperforms the other two models in both environments.
These findings provide novel scientific methodologies for agricultural drought monitoring and assessment in Gansu Province and hold significant implications for agricultural drought research.

Full Text

Comparative Agricultural Drought Monitoring Based on Three Machine Learning Methods

WANG Xiaoyan, LI Jing, XING Liting
College of Geography and Environmental Science, Northwest Normal University, Lanzhou 730070, Gansu, China

Abstract

Frequent drought disasters have caused serious damage to the economy and agricultural production of Gansu Province, making it crucial to establish accurate and reliable agricultural drought monitoring models using advanced methods. This study employs three machine learning methods, Random Forest (RF), BP Neural Network (BP), and Support Vector Machine (SVM), to construct agricultural drought monitoring models for Gansu Province. The Vegetation Condition Index (VCI), Temperature Condition Index (TCI), Vegetation Water Supply Index (VWSI), and Precipitation Condition Index (PCI) were derived from monthly multi-source remote sensing data and combined with DEM, soil Available Water Capacity (AWC), and climate type as independent variables, with the 3-month Standardized Precipitation Evapotranspiration Index (SPEI_3) from meteorological stations as the dependent variable. The three models were developed and compared to identify the optimal model for monitoring agricultural drought in Gansu Province, and the applicability of machine learning models under different environmental conditions was further investigated. Among the three machine learning models, the Random Forest model achieved the highest average coefficient of determination (R² = 0.86) with the smallest errors (RMSE = 0.40, MAE = 0.31), outperforming both the BP Neural Network and Support Vector Machine models in agricultural drought monitoring. The three machine learning models constructed separately for dry and humid environments all showed better monitoring capability in humid environments (R² ≥ 0.82), with the Random Forest model outperforming the other two models in both environments. These findings provide a new scientific approach for agricultural drought monitoring and assessment in Gansu Province and hold significant importance for agricultural drought research.

Keywords: agricultural drought; machine learning; SPEI; MODIS

Introduction

Drought is one of the most frequent, persistent, and widespread meteorological disasters. Agricultural drought arises from soil moisture deficits caused by below-normal precipitation or above-average evapotranspiration, leading to severe economic losses. Accurate, real-time or near-real-time agricultural drought monitoring is therefore essential. Drought indices are critical tools for monitoring and analyzing agricultural drought, particularly for quantifying severity and spatial extent. Based on data sources, these indices are typically classified into two categories: those derived from meteorological station data and those from remote sensing data.

Commonly used station-based drought indices include the Palmer Drought Severity Index (PDSI), Crop Drought Identification Index (CDII), Composite Index (CI), Standardized Precipitation Index (SPI), and Standardized Precipitation Evapotranspiration Index (SPEI). The SPEI is widely applied because it accounts for both precipitation and temperature (through potential evapotranspiration) and can be computed at multiple time scales, enabling monitoring of different drought types across various regions. While station-based indices effectively monitor drought severity around meteorological stations, remote sensing data offer broad coverage, high spatial resolution, and strong timeliness, making remote sensing-based drought indices better suited to large-area spatiotemporal monitoring.

Current remote sensing drought indices include the Normalized Difference Vegetation Index (NDVI), Vegetation Condition Index (VCI), Temperature Condition Index (TCI), Normalized Multi-band Drought Index (NMDI), Normalized Difference Water Index (NDWI), and Vegetation Water Supply Index (VWSI). Initially, single-factor remote sensing indices were used for drought monitoring, but agricultural drought processes are complex and influenced by numerous factors. Single-factor indices often fail to capture the multi-type and multi-scale characteristics of drought. Consequently, advanced methods for integrating multi-source data to construct comprehensive drought monitoring models have become a research frontier.

From a methodological perspective, integrated drought monitoring models have been developed using weighted combination, multivariate joint distribution, and machine learning approaches. Weighted combination methods assume linear relationships and allocate weights based on expert judgment or correlation analysis, which may not capture the true nonlinear relationships among drought factors. Joint distribution methods preserve the marginal distributions of individual indices and describe complex dependencies, but become difficult to implement with many variables. Machine learning methods have emerged as a promising alternative, capable of handling complex nonlinear relationships while efficiently integrating multi-source data to establish comprehensive drought monitoring models.

However, machine learning models exhibit regional variability in agricultural drought monitoring performance. Given Gansu's complex climate types and frequent drought occurrence, this study employs Random Forest, BP Neural Network, and Support Vector Machine methods to compare the applicability of three integrated agricultural drought monitoring models in Gansu Province. We further investigate model performance under different environmental conditions and analyze the relative importance of various drought-causing factors, providing new methods and scientific references for agricultural drought monitoring research.

1.1 Study Area

Gansu Province is located in northwestern China between 32°11'–42°57'N and 92°13'–108°46'E. The region has a typical temperate continental climate characterized by low precipitation and high evaporation. Agricultural drought occurs almost every year in Gansu, affecting an average of 82.68×10⁴ hectares per year and causing significant grain yield losses. Meteorological data were obtained from the China Meteorological Data Network (http://data.cma.cn/). Based on data completeness and station distribution across cultivated land, 25 meteorological stations were selected for this study (Fig. 1). Station data include monthly average temperature and precipitation from 2000–2019, used to calculate SPEI at various time scales.
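The SPEI calculation from station data can be sketched as follows. SPEI standardizes the monthly climatic water balance D = P − PET after summing it over the chosen time scale (3 months for SPEI_3), separately for each calendar month. This is a minimal Python sketch under stated assumptions: the series start in January, a monthly potential evapotranspiration (PET) series is assumed to be available already (the PET method is not specified in the text), and the three-parameter log-logistic fit of the standard SPEI is swapped for a nonparametric transform based on Gringorten plotting positions, so values will differ somewhat from a reference SPEI implementation. The function name `spei_nonparametric` is hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def spei_nonparametric(precip, pet, scale=3):
    """SPEI-like index from monthly precipitation and PET (both in mm).

    Assumptions (not from the paper): the series start in January, PET is
    supplied precomputed, and the log-logistic fit of the standard SPEI is
    replaced by a nonparametric (Gringorten plotting position) transform.
    """
    d = np.asarray(precip, dtype=float) - np.asarray(pet, dtype=float)  # water balance
    n = d.size
    acc = np.full(n, np.nan)
    for t in range(scale - 1, n):
        acc[t] = d[t - scale + 1 : t + 1].sum()  # running total over `scale` months

    spei = np.full(n, np.nan)
    month = np.arange(n) % 12
    for m in range(12):  # standardize each calendar month separately
        idx = np.where((month == m) & ~np.isnan(acc))[0]
        if idx.size < 2:
            continue
        ranks = rankdata(acc[idx])  # ties receive average ranks
        prob = (ranks - 0.44) / (idx.size + 0.12)  # Gringorten plotting position
        spei[idx] = norm.ppf(prob)  # empirical probability -> standard normal
    return spei
```

With monthly records, `spei_nonparametric(p, pet, scale=3)` returns NaN for the first two months, where a full 3-month window is not yet available. The nonparametric transform avoids distribution fitting on short station records, at the cost of bounded extreme values.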

1.2 Data Processing

We selected factors from meteorological, soil, and vegetation categories. PCI served as the meteorological factor, AWC as the soil factor, and VCI, TCI, and VWSI as vegetation factors. Considering spatial heterogeneity in moisture, temperature, and vegetation coverage across different terrains, as well as high variability in soil productivity and drought resistance, we included Digital Elevation Model (DEM), AWC, and Chinese climate zoning as auxiliary factors.

The study period was 2000–2019 with monthly temporal resolution and 1 km spatial resolution. Data sources include:
- MODIS land surface temperature (LST) from MOD11A2 product: 8-day temporal resolution, 1 km spatial resolution, aggregated to monthly averages
- MODIS NDVI and EVI from MOD13A2 product: 16-day temporal resolution, 1 km spatial resolution
- Precipitation from TRMM 3B43 product: monthly temporal resolution, 0.25° spatial resolution
- DEM, climate zoning, soil sand/clay content, and land cover data: 1 km resolution from the Chinese Academy of Sciences Resource and Environmental Science Data Center (https://www.resdc.cn/)

All MODIS and TRMM data were obtained from https://ladsweb.modaps.eosdis.nasa.gov/. Processing involved converting TRMM precipitation rates to monthly totals, cropping to the study area, projecting to a consistent coordinate system, and resampling to 1 km resolution using nearest-neighbor method. Remote sensing drought indices were calculated as shown in Table 1. Soil AWC was estimated using the empirical linear model of Gupta and Larson (1979) based on soil sand and clay content percentages.
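The TRMM conversion and resampling steps can be sketched in Python. TRMM 3B43 reports a mean precipitation rate in mm/h, so a monthly total is the rate multiplied by 24 and by the number of days in the month. The resampling function is a minimal nearest-neighbor sketch that assumes both grids are north-up, share one coordinate system after reprojection, and are described by an upper-left corner and a cell size; in practice a GIS library would normally perform this step. Both function names are hypothetical.

```python
import calendar

import numpy as np


def trmm_rate_to_monthly_total(rate_mm_per_hr, year, month):
    """Convert a mean precipitation rate (mm/h) to a monthly total (mm)."""
    days = calendar.monthrange(year, month)[1]  # handles leap-year Februaries
    return np.asarray(rate_mm_per_hr, dtype=float) * 24.0 * days


def nearest_neighbor_resample(src, src_x0, src_y0, src_res,
                              dst_x0, dst_y0, dst_res, dst_shape):
    """Nearest-neighbor resampling between two north-up regular grids.

    (x0, y0) is each grid's upper-left corner and res its cell size, in the
    same units. Each target cell takes the value of the source cell that
    contains the target cell's center.
    """
    rows, cols = dst_shape
    xc = dst_x0 + (np.arange(cols) + 0.5) * dst_res  # target cell centers (x)
    yc = dst_y0 - (np.arange(rows) + 0.5) * dst_res  # target cell centers (y)
    ci = np.floor((xc - src_x0) / src_res).astype(int)
    ri = np.floor((src_y0 - yc) / src_res).astype(int)
    ci = np.clip(ci, 0, src.shape[1] - 1)  # keep edge cells inside the source
    ri = np.clip(ri, 0, src.shape[0] - 1)
    return src[np.ix_(ri, ci)]
```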

Table 1. Remote sensing drought index calculation formulas

Index | Formula | Description
VCI | (NDVIᵢ - NDVIₘᵢₙ)/(NDVIₘₐₓ - NDVIₘᵢₙ) | Vegetation Condition Index
TCI | (LSTₘₐₓ - LSTᵢ)/(LSTₘₐₓ - LSTₘᵢₙ) | Temperature Condition Index
PCI | (TRMMᵢ - TRMMₘᵢₙ)/(TRMMₘₐₓ - TRMMₘᵢₙ) | Precipitation Condition Index
VWSI | EVI/LST | Vegetation Water Supply Index

Note: Subscript i denotes the i-th month; max and min denote the maximum and minimum values for month i across all years at each pixel.
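The Table 1 formulas translate directly into array operations. This sketch assumes each variable is stored as a NumPy array shaped (years, 12 months, rows, cols), so the minimum and maximum are taken across years for each month and pixel; pixels whose value never changes are returned as NaN to avoid division by zero. The helper name `condition_index` is hypothetical.

```python
import numpy as np


def condition_index(stack, inverse=False):
    """Min-max condition index per calendar month and pixel (Table 1 form).

    stack: array shaped (n_years, 12, rows, cols). Extremes are taken across
    years (axis 0). inverse=True gives the TCI form (max - x)/(max - min).
    """
    stack = np.asarray(stack, dtype=float)
    lo = np.nanmin(stack, axis=0, keepdims=True)
    hi = np.nanmax(stack, axis=0, keepdims=True)
    span = np.where(hi - lo == 0, np.nan, hi - lo)  # constant pixels -> NaN
    num = (hi - stack) if inverse else (stack - lo)
    return num / span


# Usage with arrays shaped (n_years, 12, rows, cols):
# vci = condition_index(ndvi)
# tci = condition_index(lst, inverse=True)
# pci = condition_index(trmm_monthly_total)
# vwsi = evi / lst   # ratio form listed in Table 1
```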

1.3 Machine Learning Methods

1.3.1 Random Forest

Random Forest is an ensemble learning method that constructs multiple decision trees during training. For regression tasks, the final prediction is the average of all individual tree predictions. The algorithm uses bootstrap sampling to create diverse training subsets and randomly selects a subset of features at each node, which decorrelates the trees and reduces prediction variance. Two key parameters require optimization: the number of decision trees (n) and the number of preselected variables at each node (m). The number of trees n must be large enough for the prediction error to stabilize, while m is typically set to √P (the usual default for classification) or P/3 (the usual default for regression), where P is the total number of candidate features.
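The Random Forest procedure can be illustrated with scikit-learn, which the text does not name as the software used; here `n_estimators` corresponds to n and `max_features` to m. The data are synthetic and hypothetical, with 7 columns standing in for the predictors, and the final lines check that the forest's regression output is the average of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_random_forest(X, y, n_trees=1000, m=2, seed=0):
    """Random Forest regression: n_trees ~ n and m ~ features tried per node."""
    rf = RandomForestRegressor(
        n_estimators=n_trees,  # number of bootstrapped regression trees (n)
        max_features=m,        # random feature subset size at each split (m)
        bootstrap=True,        # each tree is grown on a bootstrap sample
        oob_score=True,        # out-of-bag R² as an internal accuracy check
        random_state=seed,
    )
    return rf.fit(X, y)


# Hypothetical synthetic data: 7 columns standing in for the predictors.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(500, 7))
y = 2.0 * X[:, 3] + X[:, 1] + 0.5 * X[:, 0] + rng.normal(0.0, 0.1, 500)
forest = fit_random_forest(X, y, n_trees=200, m=2)

# The forest prediction equals the mean of the individual tree predictions.
per_tree = np.mean([tree.predict(X[:5]) for tree in forest.estimators_], axis=0)
print(np.allclose(forest.predict(X[:5]), per_tree))
```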

1.3.2 BP Neural Network

The BP Neural Network is a multilayer feedforward network trained by error backpropagation, consisting of an input layer, hidden layers, and an output layer. The training process comprises forward propagation and backward error propagation. During forward propagation, input signals pass through hidden layers to output nodes via nonlinear transformations. If actual outputs deviate from expected values, errors are backpropagated to adjust network weights and thresholds iteratively. The Levenberg-Marquardt algorithm (trainlm) was selected for its fast training speed, though it requires substantial memory.
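Levenberg-Marquardt treats network training as a nonlinear least-squares problem and uses the Jacobian of the network outputs, which backpropagation supplies. The paper used trainlm (a MATLAB training function); this Python sketch swaps in SciPy's `least_squares(method="lm")`, which wraps the MINPACK Levenberg-Marquardt solver. The single tanh hidden layer, linear output node, default of 8 hidden units, and all function names are assumptions, since the network architecture is not reported.

```python
import numpy as np
from scipy.optimize import least_squares


def unpack(theta, n_in, n_hidden):
    """Split the flat parameter vector into layer weights and biases."""
    k = n_in * n_hidden
    W1 = theta[:k].reshape(n_in, n_hidden)       # input -> hidden weights
    b1 = theta[k:k + n_hidden]                   # hidden biases ("thresholds")
    W2 = theta[k + n_hidden:k + 2 * n_hidden]    # hidden -> output weights
    b2 = theta[-1]                               # output bias
    return W1, b1, W2, b2


def forward(theta, X, n_hidden):
    """Forward propagation: tanh hidden layer, linear output node."""
    W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hidden)
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2


def jacobian(theta, X, n_hidden):
    """Backpropagated derivatives of each output w.r.t. every parameter."""
    W1, b1, W2, _ = unpack(theta, X.shape[1], n_hidden)
    H = np.tanh(X @ W1 + b1)
    G = W2 * (1.0 - H ** 2)  # output sensitivity pushed back through tanh
    J_W1 = (X[:, :, None] * G[:, None, :]).reshape(X.shape[0], -1)
    return np.hstack([J_W1, G, H, np.ones((X.shape[0], 1))])


def train_bp_lm(X, y, n_hidden=8, seed=0):
    """Train a one-hidden-layer network with Levenberg-Marquardt."""
    n_params = X.shape[1] * n_hidden + 2 * n_hidden + 1
    theta0 = np.random.default_rng(seed).normal(0.0, 0.5, n_params)
    result = least_squares(
        lambda th: forward(th, X, n_hidden) - y,  # residuals
        theta0,
        jac=lambda th: jacobian(th, X, n_hidden),
        method="lm",  # MINPACK Levenberg-Marquardt; needs len(y) >= n_params
    )
    return result.x
```

For 7 inputs and 8 hidden units there are 73 parameters, so this solver needs at least 73 training samples. The Jacobian grows with samples × parameters, which is consistent with the memory cost of Levenberg-Marquardt noted above.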

1.3.3 Support Vector Machine

Support Vector Machine is a supervised learning method that uses a nonlinear mapping to project data that are not linearly separable in the original low-dimensional space into a high-dimensional feature space, where a linear model applies. Based on the structural risk minimization principle, SVM constructs an optimal separating hyperplane for classification or, in its regression form (support vector regression), an optimal regression function. The radial basis function (RBF) kernel was employed for its high accuracy and computational efficiency. Two critical parameters were optimized: the kernel parameter g, which sets the width of the RBF kernel and therefore how smooth the fitted function is (it also affects training and prediction speed), and the penalty coefficient C, which controls the trade-off between overfitting and underfitting.
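In support vector regression the RBF kernel is K(a, b) = exp(−g‖a − b‖²), so g controls how quickly similarity decays with distance. This sketch uses scikit-learn's `SVR`, whose `gamma` plays the role of g; the default values mirror the g = 0.01 and C = 100 reported in Section 2.2, and the tube width `epsilon` is left at scikit-learn's default because the text does not report it. Function names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR


def rbf_kernel(a, b, g):
    """K(a, b) = exp(-g * ||a - b||^2); larger g gives a narrower kernel."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-g * np.dot(diff, diff)))


def fit_svr(X, y, g=0.01, C=100.0):
    """epsilon-SVR with an RBF kernel; scikit-learn's gamma plays the role of g."""
    return SVR(kernel="rbf", gamma=g, C=C).fit(X, y)
```

Because g multiplies squared distances, its sensible range depends on how the predictors are scaled, so the reported g = 0.01 is only meaningful for the authors' own preprocessing.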

Results

2.1 Drought Factor Analysis

To assess the monitoring capability of individual remote sensing drought indices and the necessity of multi-source data integration, we extracted SPEI values at meteorological stations and performed Pearson correlation analysis with remote sensing indices at 1-, 3-, and 6-month time scales. As shown in Table 2, all correlations except VCI during vegetation dormancy passed significance tests at the 0.01 level.

Table 2. Correlation analysis between remote sensing indices and SPEI at different time scales

Index | SPEI_1 | SPEI_3 | SPEI_6
VCI | 0.42** | 0.51** | 0.48**
TCI | 0.38** | 0.45** | 0.43**
PCI | 0.68** | 0.72** | 0.69**
VWSI | 0.45** | 0.52** | 0.49**
AWC | 0.44** | 0.53** | 0.50**

Note: ** indicates significance at the 0.01 level; * indicates significance at the 0.05 level. VCI = Vegetation Condition Index, TCI = Temperature Condition Index, PCI = Precipitation Condition Index, VWSI = Vegetation Water Supply Index, AWC = soil available water capacity.

The precipitation index (PCI) showed the highest correlation at all time scales, peaking in July–August, indicating that precipitation-based drought indices are most reliable during the rainy season. The temperature index (TCI) correlated more strongly during the early growth period (March–April) than during the middle and late growth stages. The vegetation indices (VCI, VWSI) showed correlations that rose and then fell over the growing season, peaking during the period of vigorous vegetation growth (July–August), which indicates stronger drought monitoring capability in well-vegetated areas. The soil factor (AWC) followed a pattern similar to the vegetation indices, supporting its suitability for drought monitoring in densely vegetated regions.

These analyses reveal limitations of single-factor remote sensing indices. While PCI shows high correlation, precipitation alone cannot fully represent drought conditions. Therefore, applying advanced machine learning methods to integrate multiple drought-causing factors is essential for constructing comprehensive agricultural drought monitoring models.
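The Pearson correlation analysis in this section, with the significance markers used in Table 2, can be sketched with SciPy's `pearsonr`. The input layout (dictionaries of station-aligned arrays) and the function name are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr


def correlation_table(indices, spei):
    """Pearson r between each remote sensing index and each SPEI time scale.

    indices: dict name -> 1-D array of index values at station pixels
    spei:    dict scale -> 1-D array of station SPEI, aligned sample by sample
    Returns dict (index, scale) -> (r rounded to 2 decimals, significance mark).
    """
    table = {}
    for name, x in indices.items():
        for scale, s in spei.items():
            r, p = pearsonr(np.asarray(x, dtype=float), np.asarray(s, dtype=float))
            mark = "**" if p < 0.01 else ("*" if p < 0.05 else "")
            table[(name, scale)] = (round(float(r), 2), mark)
    return table
```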

2.2 Model Construction and Validation

From the 25 meteorological stations, we randomly selected 5 different groups of 5 stations each as validation datasets, with the remaining 20 stations serving as training data for each group. Each training sample included monthly data from 2000–2019, with VCI, TCI, VWSI, PCI, DEM, AWC, and climate type as independent variables and SPEI_3 as the dependent variable. Building a separate model for each calendar month yielded 5 groups × 3 methods × 12 months = 180 agricultural drought monitoring models.
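The station hold-out design and the model count can be sketched as follows. The text does not say whether the five validation groups overlap; this sketch assumes a random partition of the 25 stations into 5 disjoint groups, and it assumes that the factor 12 means one model per calendar month. Station identifiers and function names are hypothetical.

```python
import numpy as np


def station_groups(station_ids, n_groups=5, group_size=5, seed=0):
    """Randomly partition stations into disjoint validation groups (assumption)."""
    shuffled = np.random.default_rng(seed).permutation(len(station_ids))
    return [
        [station_ids[i] for i in shuffled[g * group_size:(g + 1) * group_size]]
        for g in range(n_groups)
    ]


def experiment_plan(station_ids, methods=("RF", "BP", "SVM"), months=range(1, 13), seed=0):
    """List every (group, method, month, training stations, validation stations)."""
    plan = []
    for g, val in enumerate(station_groups(station_ids, seed=seed)):
        train = [s for s in station_ids if s not in val]  # the remaining 20 stations
        for method in methods:
            for month in months:  # one model per calendar month (assumption)
                plan.append((g, method, month, train, val))
    return plan
```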

For Random Forest, parameter optimization involved selecting the number of trees (n) and the number of features tried at each node (m). At least 1000 trees were needed for a stable model, and m should be less than the total number of features. Through systematic testing, n = 1000 and m = 2 produced the minimum error. For SVM with the RBF kernel, the optimal parameters were g = 0.01 and C = 100, balancing model stability and generalization. The BP network used the trainlm training function for its fast convergence.
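The parameter search can be sketched for the SVM case with scikit-learn's `GridSearchCV`. `GroupKFold` keeps all records from one station in the same fold, so each candidate (g, C) pair is scored on stations it was not trained on, in the spirit of the station hold-out described above. The grid values are hypothetical apart from the reported g = 0.01 and C = 100, and the same pattern applies to Random Forest's m.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.svm import SVR


def tune_svr(X, y, station, g_grid=(0.001, 0.01, 0.1, 1.0), c_grid=(1, 10, 100)):
    """Grid search over RBF width g and penalty C with station-grouped CV.

    Returns the best parameters and their cross-validated RMSE.
    """
    search = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid={"gamma": list(g_grid), "C": list(c_grid)},
        scoring="neg_root_mean_squared_error",
        cv=GroupKFold(n_splits=5),  # all months of a station stay in one fold
    )
    search.fit(X, y, groups=station)
    return search.best_params_, -search.best_score_
```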

Model validation involved correlating simulated SPEI_3 values with observed data. Figure 2 shows scatter plots for one validation group, where all three methods achieved R² ≥ 0.78, confirming their applicability for drought monitoring. The Random Forest model demonstrated the strongest agreement with observed values, while SVM outperformed BP.

2.3 Comparison of Three Machine Learning Methods

Statistical comparison using R², RMSE, and MAE across all validation groups (Table 3) revealed consistent performance rankings. Random Forest achieved the highest average R² (0.86) with the lowest errors (RMSE = 0.40, MAE = 0.31), indicating superior explanatory power for SPEI_3. R² ranged from 0.70 to 0.92 for the BP Neural Network and from 0.73 to 0.91 for SVM. Random Forest's simulated values differed least from observed SPEI_3, giving the best overall performance, followed by SVM and then BP.

Table 3. Statistical comparison of three machine learning methods on validation data

Method | R² | RMSE | MAE
Random Forest | 0.86 | 0.40 | 0.31
BP Neural Network | 0.81 | 0.48 | 0.38
Support Vector Machine | 0.82 | 0.46 | 0.36
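The three statistics in Table 3 can be computed as below. The text does not say whether R² is the coefficient of determination or the squared Pearson correlation between simulated and observed SPEI_3 (Section 2.2 describes validation by correlation), so this sketch returns both; the function name is hypothetical.

```python
import numpy as np


def evaluate(obs, sim):
    """R², RMSE and MAE between observed and simulated SPEI_3."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # Coefficient of determination: 1 - SS_res / SS_tot
    r2_det = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    # Squared Pearson correlation between simulated and observed values
    r2_corr = np.corrcoef(obs, sim)[0, 1] ** 2
    return {"R2": r2_det, "R2_corr": r2_corr, "RMSE": rmse, "MAE": mae}
```

The two R² definitions coincide for an unbiased least-squares fit but can differ for validation data, where a biased model may correlate well while still having large errors.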

2.4 Spatial Sensitivity Analysis

Given Gansu's complex climate patterns, we assessed whether the three models are influenced by different hydroclimatic regimes. Using K-means clustering based on multi-year precipitation, stations were classified into dry and wet environments (Table 4). Separate training and validation datasets were created for each cluster, and model performance was evaluated using R², RMSE, and MAE (Table 5).

Table 4. Station classification results

Cluster | Stations | Annual precipitation (mm)
Dry (13 stations) | Jiuquan, Zhangye, Wuwei, Minqin, Jingyuan, Yongchang, Jingtai, Gaolan, Gaotai, Dunhuang, Guazhou, Yumen, Shandan | < 300
Wet (12 stations) | Kongtong, Linxia, Yuzhong, Lintao, Huanxian, Xifeng, Minxian, Wudu, Maiji, Huajialing, Huining, Tianshui | > 400
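The two-cluster K-means step can be sketched with scikit-learn. This assumes each station is described only by its multi-year mean annual precipitation; because K-means cluster labels are arbitrary, the cluster with the lower centroid is named dry. The function name and any example values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def classify_stations(names, annual_precip_mm, seed=0):
    """Two-cluster K-means on mean annual precipitation; lower centroid = dry."""
    x = np.asarray(annual_precip_mm, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(x)
    dry_label = int(np.argmin(km.cluster_centers_.ravel()))
    return {n: ("dry" if lab == dry_label else "wet") for n, lab in zip(names, km.labels_)}
```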

Table 5. Model performance evaluation in different environments

Environment | Model | R² | RMSE | MAE
Dry | Random Forest | 0.84 | 0.42 | 0.33
Dry | BP Neural Network | 0.78 | 0.51 | 0.40
Dry | Support Vector Machine | 0.76 | 0.53 | 0.42
Wet | Random Forest | 0.88 | 0.38 | 0.29
Wet | BP Neural Network | 0.82 | 0.46 | 0.36
Wet | Support Vector Machine | 0.85 | 0.43 | 0.34

All models performed better in wet environments (R² ≥ 0.82) than in dry environments. Random Forest consistently outperformed the other methods in both conditions. In dry environments, BP performed marginally better than SVM, while in wet environments SVM slightly outperformed BP.

To evaluate temporal consistency, we analyzed continuous time series from 2012–2019 for selected stations in both clusters (Fig. 3). Random Forest demonstrated the best agreement with observed SPEI_3 across all sites. In dry environments (Shandan, Jingtai, Zhangye, Yongchang, Gaotai, Gaolan), BP performed better than SVM, while in wet environments (Yuzhong, Lintao, Huanxian, Linxia, Wudu, Minxian), SVM marginally outperformed BP. These results confirm the reliability of environment-specific models.

Discussion

Agricultural drought is a natural disaster that significantly impacts human life and production, necessitating precise and real-time monitoring solutions. This study integrated meteorological and remote sensing data using Random Forest, Support Vector Machine, and BP Neural Network to construct three comprehensive agricultural drought monitoring models for Gansu Province. Our findings align with previous research demonstrating that machine learning models effectively improve drought monitoring accuracy.

Correlation analysis revealed that the integrated models achieved higher correlations with SPEI_3 than any single-factor index, confirming that multi-source data fusion enhances agricultural drought monitoring precision. Among the three methods, Random Forest consistently outperformed SVM and BP, with larger R² values and smaller errors. This result is consistent with Dong et al. [22], who found Random Forest more universally applicable for drought monitoring.

Random Forest enables variable importance ranking, which showed that the meteorological factors (PCI, TCI) consistently ranked as the two most important factors in both environments, confirming precipitation and temperature as the primary drought drivers. In wet environments, vegetation factors ranked third in importance, whereas soil factors ranked third in dry environments; this likely reflects better vegetation growth conditions in humid areas. All machine learning models performed better in wet, well-vegetated regions, with Random Forest maintaining superior performance in both environments.
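A variable importance ranking like the one described above can be sketched with scikit-learn. The text does not say which importance measure was used; this sketch uses the impurity-based `feature_importances_`, which are normalized to sum to 1, and `sklearn.inspection.permutation_importance` is a common alternative because impurity-based scores can favor features with many distinct values. Function and factor names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_factors(X, y, names, n_trees=500, m=2, seed=0):
    """Fit a Random Forest and rank factors by impurity-based importance."""
    rf = RandomForestRegressor(n_estimators=n_trees, max_features=m, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return [(names[i], float(rf.feature_importances_[i])) for i in order]
```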

Despite these advances, several limitations remain. The TRMM precipitation data have a relatively coarse spatial resolution (0.25°) and monthly temporal resolution, which could be improved. Additionally, the study did not incorporate evapotranspiration or human activity factors in the model construction. Future research should address these limitations to further enhance model performance.

Conclusions

Using Random Forest, Support Vector Machine, and BP Neural Network methods, this study integrated multi-source data to construct and validate three agricultural drought monitoring models for Gansu Province. Key conclusions are:

  1. All three machine learning models demonstrated strong performance, with correlation coefficients between simulated and observed SPEI_3 exceeding 0.78, confirming their effectiveness for agricultural drought monitoring in Gansu Province.

  2. Comparative analysis using R², RMSE, and MAE indicated that the Random Forest model outperformed both SVM and BP models, providing more comprehensive, reliable, and accurate agricultural drought monitoring for Gansu Province.

  3. Spatial sensitivity analysis revealed that Random Forest maintained superior performance in both dry and humid environments compared to the other methods, demonstrating its robustness and reliability for agricultural drought monitoring research.

  4. Factor importance analysis showed that machine learning models perform better in semi-arid regions with high vegetation coverage, with meteorological factors being the most critical drivers of agricultural drought.

These results provide a new scientific methodology for agricultural drought monitoring and evaluation in Gansu Province, offering valuable insights for drought research and management.

References

[1] Fang Xiuqin, Guo Xiaomeng, Yuan Ling, et al. Application of random forest algorithm in global drought assessment[J]. Journal of Geo-Information Science, 2021, 23(6): 1040-1049.

[2] Chen Shaodan, Zhang Liping, Tang Rouxin, et al. Analysis on temporal and spatial variation of drought in Henan province based on SPEI and TVDI[J]. Transactions of the Chinese Society of Agricultural Engineering, 2017, 33(24): 126-132.

[3] Tian L Y, Yuan S S, Quiring S M. Evaluation of six indices for monitoring agricultural drought in the south-central United States[J]. Agricultural and Forest Meteorology, 2018, 249: 107-119.

[4] Zhang Ke. PDSI-based analysis of characteristics and spatiotemporal changes of meteorological drought in China from 1982 to 2015[J]. Water Resources Protection, 2020, 36(5): 50-56.

[5] Wu X, Wang P, Huo Z, et al. Crop Drought Identification Index for winter wheat based on evapotranspiration in the Huang-Huai-Hai Plain, China[J]. Agriculture, Ecosystems & Environment, 2018, 263: 18-30.

[6] Zhang Gang, Teng Jiduan, Zhang Lijie, et al. Feasibility analysis and improvement of comprehensive meteorological drought index in Chongzuo[J]. Pearl River, 2020, 41(3): 23-29.

[7] Wang Jinsong, Guo Jiangyong, Qing Jizu. Application of a kind of K drought index in the spring drought analysis in Northwest China[J]. Journal of Natural Resources, 2007, 22(5): 709-717.

[8] Lin Hui, Wang Jingcai, Huang Jinbai, et al. Comparative study on spatial and temporal distribution characteristics of meteorological drought in the upper and middle reaches of Huai River basin based on SPI and SPEI[J]. Journal of Water Resources and Water Engineering, 2019, 30(6): 59-67.

[9] Zhang Lu, Zhu Zhongyuan, Xi Xiaokang, et al. Analysis of drought evolution in the Xilin River basin based on standardized precipitation evapotranspiration index[J]. Arid Zone Research, 2020, 37(4): 819-829.

[10] Xu Yidan, Ren Chuanyou, Ma Xida, et al. Change of drought at multiple temporal scales based on SPI/SPEI in Northeast China[J]. Arid Zone Research, 2017, 34(6): 1250-1262.

[11] Shi Xiaoliang, Shang Yu, Chen Chong, et al. Correlation of vegetation NDVI and drought conditions in Huaihe River basin[J]. Journal of Xi'an University of Science and Technology, 2019, 39(6): 1033-1040, 1064.

[12] Li Weijiao, Wang Yunpeng. An analysis of the spatial-temporal characteristics of drought in Guangdong based on vegetation condition index from 2003 to 2017[J]. Journal of South China Normal University (Natural Science Edition), 2020, 52(3): 85-91.

[13] Zhang Jing, Wei Wei, Pang Sufei, et al. Monitoring and assessment of drought in arid area in Northwest China based on FY-3C and TRMM data[J]. Chinese Journal of Ecology, 2020, 39(2): 690-702.

[14] Zhang Hongwei, Chen Huailiang, Zhou Guanhui, et al. Application of normalized multiband drought index method in cropland drought monitoring[J]. Science & Technology Review, 2009, 27(11): 23-26.

[15] Gu Jiahe, Xue Huazhu, Dong Guotao, et al. Applicability analysis of NDWI for drought monitoring in Henan Province[J]. Agricultural Research in Arid Areas, 2020, 38(6): 209-217.

[16] Wang Lili, Zhang Anbing. Spring drought monitoring in Beijing-Tianjin-Hebei based on temperature water supply drought index[J]. Geomatics & Spatial Information Technology, 2021, 44(4): 72-75.

[17] Yang Sha. Drought monitoring analysis in Xilinhot City based on vegetation water supply index[J]. Grassland and Prataculture, 2021, 33(3): 42-45.

[18] Liu Ji, Zhang Te, Wei Rong, et al. Development of agricultural drought monitoring model using remote sensing based on bias-correcting random forest[J]. Transactions of the Chinese Society of Agricultural Machinery, 2020, 51(7): 170-177.

[19] Wu Zhiyong, Cheng Dandan, He Hai, et al. Research progress of composite drought index[J]. Water Resources Protection, 2021, 37(1): 36-45.

[20] Zhang A Z, Jia G S. Monitoring meteorological drought in semiarid regions using multi-sensor microwave remote sensing data[J]. Remote Sensing of Environment, 2013, 134: 12-23.

[21] Li Xiaohui, Yang Yong, Yang Hongwei. Combining BP neural network with gray model to achieve drought predicting[J]. Journal of Shenyang Agricultural University, 2014, 45(2): 253-256.

[22] Dong Ting, Ren Dong, Shao Pan, et al. Construction of integrated drought condition index based on multi-sensor remote sensing and random forest[J]. Transactions of the Chinese Society of Agricultural Machinery, 2019, 50(8): 200-212.

[23] Zhao Guoyang, Tu Xinjun, Wang Tian, et al. Drought prediction based on artificial neural network and support vector regression machine[J]. Pearl River, 2021, 42(4): 1-9.

[24] Han Lanying, Zhang Qiang, Zhao Hongyan, et al. The characteristics of agricultural drought disaster loss and response to climate warming in Gansu, China[J]. Journal of Desert Research, 2016, 36(3): 767-776.

[25] Gupta S C, Larson W E. Estimating soil water retention characteristics from particle size distribution, organic matter percent, and bulk density[J]. Water Resources Research, 1979, 15(6): 1633-1635.

[26] Shen Runping, Guo Jia, Zhang Jingxian, et al. Construction of a drought monitoring model using the random forest based remote sensing[J]. Journal of Geo-Information Science, 2017, 19(1): 125-133.

[27] Hu Pengfei, Li Jing, Wang Dan, et al. Monitoring agricultural drought in the Loess Plateau using MODIS and TRMM data[J]. Arid Land Geography, 2019, 42(1): 172-179.

[28] Feng P Y, Wang B, Liu D L, et al. Machine learning-based integration of remotely-sensed drought factors can improve the estimation of agricultural drought in South-Eastern Australia[J]. Agricultural Systems, 2019, 173: 303-316.
