Applicability Analysis of ERA5 Reanalysis Precipitation Data in China (Postprint)
Liu Tingting
Submitted 2022-01-21 | ChinaXiv: chinaxiv-202201.00074

Abstract

To investigate the applicability of ERA5 reanalysis precipitation data in China, daily precipitation data from 728 meteorological stations nationwide were used as the reference. The accuracy of ERA5 precipitation was analyzed across time scales (monthly, seasonal), climate zones, and elevation gradients using the Pearson correlation coefficient (r), root mean square error (RMSE), mean absolute error (MAE), probability of detection (POD), false alarm rate (FAR), and equitable threat score (ETS). The capability of ERA5 to characterize heavy rain and drought events was also evaluated. The results show that the ability of ERA5 to identify daily precipitation events varies in both space and time. Among the climate zones, the north temperate zone has the highest accuracy. Accuracy is lower in summer and autumn than in winter and spring, and lower above 500 m elevation than at or below 500 m. For heavy rain events, ERA5 deviates substantially from station observations, and the deviation grows as the threshold increases (i.e., as the rain becomes more intense). The accuracy of the standardized precipitation index (SPI) calculated from ERA5 differs among time scales, with the 3-month SPI being the most accurate. For drought events, lower thresholds (i.e., more severe drought) yield larger errors. This study provides a reference for the application scope and methods of ERA5 precipitation data and helps analyze uncertainties in related research.

Keywords: ERA5 data; precipitation; heavy rain; drought

Introduction

Precipitation is a crucial climate variable closely related to water resources, agricultural production, and economic development. Understanding precipitation quantity, frequency, spatial distribution, and trends is essential for rational water resource utilization and agricultural strategy formulation. However, many remote or economically underdeveloped regions lack meteorological stations [1-2]. Spatial interpolation of limited station precipitation data often yields large errors. Consequently, many scholars use atmospheric reanalysis precipitation data for related studies, such as analyzing precipitation spatiotemporal characteristics, extracting rainstorm and flood events, and driving crop models [3-5]. Nevertheless, atmospheric reanalysis data is a product of merging numerical forecast products with observations, and errors in forecast products, observation data, and assimilation methods all affect reanalysis climate data quality [6]. The accuracy of reanalysis data products directly influences the uncertainty of related studies. Against the backdrop of climate change, studies using reanalysis data to drive various models for simulating climate change impacts are increasing, making verification and evaluation of reanalysis data accuracy both necessary and urgent [7].

ERA5 is the fifth-generation reanalysis product released by the European Centre for Medium-Range Weather Forecasts (ECMWF) as the successor to ERA-Interim. The data cover the globe on a 0.25°×0.25° grid, and the dataset contains 240 parameters that provide numerous hourly atmospheric, land, and ocean climate variables. Produced with an improved four-dimensional variational (4D-Var) data assimilation scheme, ERA5 features high spatiotemporal resolution, rapid updates, and a large number of parameters, and has attracted widespread attention. Studies indicate that ERA5 improves significantly on ERA-Interim [8-9]. For example, Graham et al. [10] found that ERA5 temperature and snowfall data in the Arctic improve on ERA-Interim. Betts et al. [11] evaluated ERA5 against hourly station observations of temperature, wind speed, precipitation, and downward longwave and shortwave radiation flux over the Canadian Prairies, and found a substantial improvement in temperature but poor wind speed quality, with precipitation and radiation flux similar to ERA-Interim. Hénin et al. [12] assessed ERA5 daily precipitation at 0.25° resolution and found major improvements over ERA-Interim, although significant errors remain in convection-dominated regions.

Many researchers have evaluated the suitability of this dataset in different regions. For instance, Nogueira [13] used the Global Precipitation Climatology Project (GPCP) dataset to validate monthly ERA5 precipitation, finding that ERA5 generally overestimates precipitation in most tropical regions, with biases typically lower than ERA-Interim except over tropical oceans and the Himalayas. Xu et al. [14] evaluated multiple precipitation datasets including ERA5 in the Assiniboine River Basin, finding that ERA5 substantially overestimates spring precipitation in the region. Amjad et al. [15] assessed ERA5 precipitation accuracy using 70 ground meteorological stations, revealing that ERA5 overestimates precipitation in relatively humid and complex terrain regions, with wet biases reaching 0.5 mm·d⁻¹.

Overall, existing suitability analyses of ERA5 precipitation data often focus on overall correlation and deviation between reanalysis products and actual observations [16,22,24,26], with few analyses of simulation accuracy for extreme climate events (e.g., heavy rain). Among the limited analyses targeting extreme events, typical meteorological disaster events are often selected for evaluation, which is not comprehensive enough. For example, Jiang et al. [27] analyzed the accuracy of ERA5 precipitation data in simulating extreme precipitation during 22 typhoon events using China Meteorological Administration station data.

Data and Methods

Data Sources

This study utilized three datasets: (1) ERA5 reanalysis precipitation data from the European Centre for Medium-Range Weather Forecasts (ECMWF); (2) the China Meteorological Administration's national ground meteorological station daily precipitation dataset (V3.0) from the Meteorological Information Center; and (3) China's climate zoning map from the Resource and Environmental Science and Data Center. Daily precipitation data from 2008-2017 were used. No stations are available in the mid-tropical and south-tropical zones, so these two regions were excluded from the climate zone analysis.

Evaluation Methods

Based on the latitude and longitude of each meteorological station, the corresponding ERA5 grid cell was identified, and its daily precipitation values were extracted to calculate monthly total precipitation and the standardized precipitation index (SPI). The evaluation metrics fall into two categories. (1) Indices measuring the correlation and difference between ERA5 and observed precipitation: the Pearson correlation coefficient (r), RMSE, and MAE. The r reflects the consistency between station measurements and ERA5 data; RMSE measures the overall error level and its fluctuation in the ERA5 precipitation series; MAE is the average absolute deviation between ERA5 and station data. (2) Indices measuring the capability to capture precipitation, heavy rain, and drought events: POD, FAR, and ETS. POD is the fraction of observed events that ERA5 correctly detects; FAR is the fraction of ERA5-detected events that were not observed (a quantity often called the false alarm ratio); ETS summarizes overall detection skill after discounting the hits expected by chance. Calculation formulas and optimal values for these metrics are shown in Table 1.
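The six metrics described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard textbook definitions (hits, misses, and false alarms from a 2×2 contingency table); it is not taken from Table 1, and the function names and sample values are hypothetical.

```python
import math


def continuous_scores(obs, est):
    """Return (r, RMSE, MAE) for paired observed and estimated daily series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    r = cov / math.sqrt(var_o * var_e)
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / n)
    mae = sum(abs(e - o) for o, e in zip(obs, est)) / n
    return r, rmse, mae


def categorical_scores(obs, est, threshold=0.1):
    """Return (POD, FAR, ETS) for events defined as value >= threshold."""
    hits = misses = false_alarms = 0
    for o, e in zip(obs, est):
        observed, detected = o >= threshold, e >= threshold
        if observed and detected:
            hits += 1
        elif observed:
            misses += 1
        elif detected:
            false_alarms += 1
    n = len(obs)
    nan = float("nan")
    # POD: share of observed events that were detected.
    pod = hits / (hits + misses) if hits + misses else nan
    # FAR: share of detected events that were not observed.
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    # ETS: threat score corrected for hits expected by chance.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    ets = (hits - hits_random) / denom if denom else nan
    return pod, far, ets


# Hypothetical eight-day sample (mm per day), for illustration only.
station = [0.0, 5.0, 0.0, 12.0, 0.2, 0.0, 60.0, 0.0]
era5 = [0.3, 4.0, 0.0, 9.0, 0.0, 0.0, 35.0, 1.0]
print(continuous_scores(station, era5))
print(categorical_scores(station, era5, threshold=0.1))
```

Raising `threshold` to 50 or 100 mm·d⁻¹ reuses the same contingency logic for heavy rain events; when ERA5 detects no event at all, FAR is undefined and the sketch returns NaN rather than a number.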

The specific analysis process is as follows:

  1. Using the above metrics, the overall accuracy of ERA5 daily precipitation data across China was first evaluated. Here r, RMSE, and MAE reflect the overall consistency and difference between the datasets, while POD, FAR, and ETS reflect the reliability of the precipitation events identified by ERA5. A precipitation event is defined as a day with precipitation ≥0.1 mm·d⁻¹.

  2. Next, the r, RMSE, MAE, POD, FAR, and ETS of ERA5 and observed precipitation data were analyzed across different months, seasons, climate zones, and elevations. Seasons were defined as spring (March-May), summer (June-August), autumn (September-November), and winter (December-February). Elevation gradients were set as 0-200 m, 200-500 m, and >500 m, with station counts of 157, 271, and 300, respectively.

  3. The ability of ERA5 precipitation data to identify heavy rain events was then assessed. Following the China Meteorological Administration's definition, heavy rain was classified into two categories: heavy rain (50-100 mm·d⁻¹) and torrential rain (≥100 mm·d⁻¹).

  4. Finally, the accuracy of SPI calculated from ERA5 precipitation data at different time scales (1, 3, 6, and 12 months) was evaluated. Using the 3-month SPI as an example and referencing the "Meteorological Drought Grade GB/T 20481-2017" standard, different drought grades were identified using various thresholds (Table 2), and the capability of ERA5 to characterize drought events was assessed using POD, FAR, and ETS.
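The text does not spell out how the SPI was computed. The sketch below assumes the commonly used formulation: accumulate precipitation over k months, fit a gamma distribution to the non-zero totals, and map the cumulative probability to a standard normal deviate. The shape parameter uses Thom's approximation to the maximum-likelihood estimate. All function names are hypothetical, and the usual step of fitting each calendar month separately is left to the caller.

```python
import math
from statistics import NormalDist


def rolling_sum(monthly, k):
    """k-month accumulated totals; the first k-1 positions are None."""
    return [None if i < k - 1 else sum(monthly[i - k + 1:i + 1])
            for i in range(len(monthly))]


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-14 * total and n < 10000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def spi(totals):
    """SPI for accumulated totals that share one calendar position.

    Assumes at least two distinct positive totals; otherwise the
    gamma fit below is undefined.
    """
    wet = [v for v in totals if v > 0]
    q_zero = 1.0 - len(wet) / len(totals)      # probability of a zero total
    mean = sum(wet) / len(wet)
    a = math.log(mean) - sum(math.log(v) for v in wet) / len(wet)
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)  # Thom (approx.)
    scale = mean / shape
    normal = NormalDist()
    result = []
    for v in totals:
        p = q_zero + (1.0 - q_zero) * gamma_cdf(v, shape, scale)
        p = min(max(p, 1e-6), 1.0 - 1e-6)      # keep inv_cdf finite
        result.append(normal.inv_cdf(p))
    return result


# Hypothetical 3-month totals (mm) for the same season in ten years.
totals = [30.0, 55.0, 80.0, 120.0, 45.0, 95.0, 60.0, 150.0, 20.0, 70.0]
values = spi(totals)
drought = [z <= -1.0 for z in values]          # example threshold of -1.0
print(values)
print(drought)
```

Applying `spi` separately to the station series and to the ERA5 series, then flagging months at or below a chosen threshold, yields the paired event series on which POD, FAR, and ETS can be computed.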

Results

Overall Accuracy of ERA5 Precipitation Data

Figure 2 shows the spatial distribution of the evaluation metrics for ERA5 precipitation data at the 728 stations. Accuracy varies spatially; larger r and smaller RMSE and MAE indicate higher accuracy. Nationally, 83.6% of stations have r>0.5, 53.3% have POD>0.80, 51.9% have FAR<0.50, 46.9% have ETS>0.25, and 39.6% have MAE<2 mm·d⁻¹. RMSE ranges from 2.466 to 8.071 mm·d⁻¹, and MAE ranges from 1.013 to 3.140 mm·d⁻¹. Overall, ERA5 can effectively detect precipitation across China, with better detection capability in eastern regions than in western regions.

Accuracy Across Months and Seasons

Figure 3 presents the accuracy of ERA5 daily precipitation across different months and seasons. Monthly precipitation r ranges from 0.482 to 0.542, RMSE from 2.232 to 8.530 mm·d⁻¹, and MAE from 0.945 to 3.438 mm·d⁻¹. Seasonal precipitation r ranges from 0.488 to 0.525, RMSE from 1.825 to 11.746 mm·d⁻¹, and MAE from 0.619 to 4.913 mm·d⁻¹. Substantial seasonal differences exist, with lower accuracy in summer and autumn compared to winter and spring.

Accuracy Across Climate Zones and Elevation Gradients

Figure 4 shows validation results across different climate zones and elevation gradients. For daily precipitation across climate zones, r ranges from 0.437 to 0.587, RMSE from 1.825 to 11.746 mm·d⁻¹, and MAE from 0.619 to 4.913 mm·d⁻¹. The north temperate zone shows the highest accuracy (r=0.587, RMSE=4.040 mm·d⁻¹, MAE=1.472 mm·d⁻¹). The south subtropical and north subtropical zones show the smallest r range. Across elevation gradients, the lowest accuracy occurs at elevations >500 m (r=0.487, RMSE=5.157 mm·d⁻¹, MAE=2.126 mm·d⁻¹), while the 0-200 m elevation band shows the highest accuracy (mean RMSE=3.188 mm·d⁻¹).

Accuracy for Heavy Rain Event Identification

Figure 5 shows the accuracy of ERA5 for heavy rain identification. For the 50 mm·d⁻¹ threshold, 51.9% of stations have POD>0.1, 53.7% have FAR>0.75, and 46.9% have ETS>0.1. High POD values are mainly concentrated in eastern China (southern mid-temperate zone, south temperate zone, north subtropical zone, eastern mid-subtropical zone, and eastern south subtropical zone). High FAR values are concentrated in the eastern mid-temperate zone, western south temperate zone, and western mid-subtropical zone, indicating high false alarm rates in these regions. Overall, ERA5 captures heavy rain events best in southeastern China (north subtropical, eastern mid-subtropical, and eastern south subtropical zones). For the 100 mm·d⁻¹ threshold, only 23.6% of stations have POD>0.1 and 72.9% have FAR>0.75, indicating poorer identification capability for more extreme heavy rain events, though the spatial patterns are similar.

Accuracy for Drought Event Identification

Figures 6-9 show validation results for SPI calculated from ERA5 precipitation data. Different SPI time scales represent droughts of different durations: SPI1 indicates monthly drought, SPI3 quarterly drought, SPI6 half-year drought, and SPI12 annual drought. Accuracy varies by time scale, with SPI3 showing the highest precision overall. The spatial distribution patterns of r, RMSE, and MAE are similar across time scales, with r values higher in eastern regions and lower in the Qinghai-Tibet region.

When identifying drought events using different thresholds, lower thresholds (i.e., more severe drought) yield poorer identification results. For threshold=-0.5, the mean POD, FAR, and ETS values across climate zones are 0.61, 0.32, and 0.29 respectively. For threshold=-1.0, the values are 0.51, 0.40, and 0.20; for threshold=-1.5, 0.44, 0.46, and 0.15; and for threshold=-2.0, 0.38, 0.50, and 0.11. ERA5 shows better drought capture capability in the north temperate, south temperate, and north subtropical zones, with higher FAR values in the plateau climate zone, mid-subtropical zone, and north tropical zone indicating more false alarms.

Discussion

ERA5 is ECMWF's latest reanalysis product, and its precipitation data can serve as input for analyzing the spatiotemporal characteristics of precipitation, extracting rainstorm and flood events, and driving crop, hydrological, and land surface models. Using daily precipitation data from the National Meteorological Information Center as reference, this study is the first comprehensive analysis of ERA5 accuracy in China that focuses on extreme climate events (heavy rain and drought). Climate change is increasing the frequency of extreme events, which bring numerous negative impacts and economic losses, so reliable data products that reflect extreme conditions are essential for impact assessment and for developing adaptation strategies. However, previous accuracy analyses of climate data products have largely ignored their ability to capture extreme events. This study fills this gap, broadens the perspective of climate dataset validation, and provides a methodological reference for similar research.

The study reveals that ERA5 accuracy varies across climate zones, seasons, and elevation gradients. ERA5 can effectively characterize 3-month SPI drought but shows poorer performance for more severe droughts. Overall, ERA5 simulates precipitation better in eastern than western China, consistent with Cheng et al.'s findings [28]. Accuracy is lower in summer and autumn than in winter and spring, aligning with previous reanalysis precipitation studies [16,28-30]. ERA5 shows strongest capability for depicting heavy rain in southeastern China.

Several factors explain regional accuracy differences. First, as reanalysis data, ERA5's accuracy is affected by input data quality and assimilation algorithms [6]; regions with sparse observations or low-quality input data produce lower-quality reanalysis products. Second, precipitation has strong local characteristics, and ERA5's 0.25°×0.25° grid represents area-averaged precipitation while station data represent point measurements, creating scale mismatches. Direct comparison introduces errors, particularly in areas with complex terrain where precipitation varies greatly over small distances.
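To make the point-to-grid mismatch concrete, the following minimal sketch pairs a station with its nearest node on a regular 0.25° grid and reports how far the station lies from that node. The grid layout (latitude descending from 90°N, longitude ascending from 0°E) and the function name are assumptions for illustration, not details given in the text.

```python
def nearest_cell(lat, lon, res=0.25, lat_north=90.0, lon_west=0.0):
    """Return (row, col, cell_lat, cell_lon) of the nearest regular-grid node.

    Assumes rows run north to south starting at lat_north and columns run
    eastward starting at lon_west, wrapping at 360 degrees.
    """
    n_lat = int(round(180.0 / res)) + 1
    n_lon = int(round(360.0 / res))
    row = int(round((lat_north - lat) / res))
    row = min(max(row, 0), n_lat - 1)          # clamp at the poles
    col = int(round(((lon - lon_west) % 360.0) / res)) % n_lon
    return row, col, lat_north - row * res, lon_west + col * res


# Hypothetical station coordinates, for illustration only.
row, col, cell_lat, cell_lon = nearest_cell(39.93, 116.28)
offset_lat = abs(cell_lat - 39.93)   # degrees between station and node
offset_lon = abs(cell_lon - 116.28)
print(row, col, cell_lat, cell_lon, offset_lat, offset_lon)
```

Even with perfect pairing, the station can sit more than a tenth of a degree from the node, and the grid value represents an area of roughly 0.25° on a side, which is why the comparison error grows where precipitation changes sharply over short distances.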

This study has limitations. First, stations are unevenly distributed nationwide, so station density varies across climate zones and elevation gradients; most stations lie in eastern and low-elevation regions, which may bias the validation results in data-sparse areas. Second, the grid-to-point scale mismatch described above affects every comparison: a 0.25° ERA5 cell reports an area average, whereas a station records local conditions, and the resulting error is largest in complex terrain, where precipitation can vary substantially over small areas.

Conclusions

Using station observations as reference, this study analyzed ERA5 reanalysis precipitation data accuracy across different time scales (monthly, seasonal), climate zones, and elevation gradients, and evaluated its capability to characterize heavy rain and drought events. The main conclusions are:

  1. ERA5 precipitation data shows spatial and temporal differences in identifying daily precipitation events. Among eight climate zones, the north temperate zone has the highest accuracy, with r, RMSE, and MAE values of 0.587, 4.040 mm·d⁻¹, and 1.472 mm·d⁻¹ respectively. Accuracy is lower in areas above 500 m elevation than below 500 m. Seasonal accuracy is higher in winter and spring than in summer and autumn.

  2. When identifying heavy rain events, ERA5 data deviates substantially from station observations, with larger thresholds (i.e., stronger heavy rain) showing greater deviations. Overall, ERA5 captures heavy rain events best in southeastern China (north subtropical, eastern mid-subtropical, and eastern south subtropical zones).

  3. The accuracy of SPI calculated from ERA5 precipitation data varies by time scale, with 3-month SPI showing the highest precision. When identifying drought events, lower thresholds (i.e., more severe drought) yield larger errors. Overall, ERA5 shows better drought capture capability in the north temperate, south temperate, and north subtropical zones.

These findings provide references for researchers considering whether to use ERA5 data, which regions to use it in, and what analyses to perform, thereby helping evaluate uncertainties in related studies.

References

[1] Xia Jun, Tan Ge. Hydrological science towards global change: Progress and challenge[J]. Resources Science, 2002, 24(3): 1-7.

[2] Tan Ge, Xia Jun, Li Xin. Hydrological prediction in ungauged basins[J]. Journal of Glaciology and Geocryology, 2004, 26(2): 192-196.

[3] Adler R F, Huffman G J, Chang A, et al. The version-2 global precipitation climatology project (GPCP) monthly precipitation analysis (1979-present)[J]. Journal of Hydrometeorology, 2003, 4(6): 1147-1167.

[4] Su F, Hong Y, Lettenmaier D P. Evaluation of TRMM multisatellite precipitation analysis (TMPA) and its utility in hydrologic prediction in the La Plata Basin[J]. Journal of Hydrometeorology, 2008, 9(4): 622-640.

[5] Zhang Xiaoli, Peng Yong, Wang Bende, et al. Suitability evaluation of precipitation data using SWAT model[J]. Transactions of the Chinese Society of Agricultural Engineering, 2014, 30(19): 88-96.

[6] Albergel C, Dutra E, Munier S, et al. ERA-5 and ERA Interim driven ISBA land surface model simulations: Which one performs better?[J]. Hydrology and Earth System Sciences, 2018, 22(6): 3515-3532.

[7] Huang Jianping, Zhang Guolong, Yu Haipeng, et al. Characteristics of climate change in the Yellow River Basin during recent 40 years[J]. Journal of Hydraulic Engineering, 2020, 51(9): 1048-1058.

[8] Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis[J]. Quarterly Journal of the Royal Meteorological Society, 2020, 146(730): 1999-2049.

[9] Graham R M, Hudson S R, Maturilli M. Improved performance of ERA5 in Arctic gateway relative to four global atmospheric reanalyses[J]. Geophysical Research Letters, 2019, 46(11): 6138-6147.

[10] Betts A K, Chan D Z, Desjardins R L. Near surface biases in ERA5 over the Canadian prairies[J]. Frontiers in Environmental Science, 2019, 7: 129.

[11] Hénin R, Liberato M L, Ramos A M, et al. Assessing the use of satellite-based estimates and high resolution precipitation datasets for the study of extreme precipitation events over the Iberian Peninsula[J]. Water, 2018, 10(11): 1688.

[12] Wang C, Graham R M, Wang K, et al. Comparison of ERA5 and ERA-Interim near surface air temperature, snowfall and precipitation over Arctic sea ice: Effects on sea ice thermodynamics and evolution[J]. The Cryosphere, 2019, 13(6): 1661-1679.

[13] Nogueira M. Inter-comparison of ERA-5, ERA-Interim and GPCP rainfall over the last 40 years: Process-based analysis of systematic and random differences[J]. Journal of Hydrology, 2020, 583: 124632.

[14] Xu X, Frey S K, Boluwade A, et al. Evaluation of variability among different precipitation products in the Northern Great Plains[J]. Journal of Hydrology: Regional Studies, 2019, 24: 100608.

[15] Amjad M, Yilmaz M T, Yucel I, et al. Performance evaluation of satellite and model-based precipitation products over varying climate and complex topography[J]. Journal of Hydrology, 2020, 584: 124707.

[16] Cheng Xiaoyu, Wang Yanhua, Li Guochun, et al. Evaluation of three reanalysis precipitation datasets in China[J]. Climate Change Research, 2013, 9(4): 258-265.

[17] Jiang Q, Li W, Fan Z, et al. Evaluation of the ERA5 reanalysis precipitation dataset over Chinese Mainland[J]. Journal of Hydrology, 2020: 125660.

[18] Tarek M, Brissette F P, Arsenault R. Evaluation of the ERA5 reanalysis as a potential reference dataset for hydrological modelling over North America[J]. Hydrology and Earth System Sciences, 2020, 24(5): 2527-2544.

[19] Beck H E, Pan M, Roy T, et al. Daily evaluation of 26 precipitation datasets using Stage IV gauge-radar data for the CONUS[J]. Hydrology and Earth System Sciences, 2019, 23(1): 207-224.

[20] Fallah A, Rakhshandehroo G R, Berg P, et al. Evaluation of precipitation datasets against local observations in southwestern Iran[J]. International Journal of Climatology, 2020, 40(9): 4102-4116.

[21] Xu Kun, Zhu Xiufang, Liu Ying, et al. Vulnerability of drought disaster of maize in China based on AquaCrop model[J]. Transactions of the Chinese Society of Agricultural Engineering, 2020, 36(1): 154-161.

[22] Xu Kun, Zhu Xiufang, Liu Ying, et al. Effects of drought on maize yield under climate change in China[J]. Transactions of the Chinese Society of Agricultural Engineering, 2020, 36(11): 149-158.

[23] Feng Kepeng, Hong Yang, Tian Juncang, et al. Evaluating runoff simulation of multi-source precipitation data in small watersheds[J]. Arid Land Geography, 2020, 43(5): 1179-1191.

[24] Liu Pengfei, Liu Dandan, Liang Feng, et al. Comparison of the adaptability of CFSR, MERRA, NCEP reanalysis precipitation data and observation in northeast China[J]. Research of Soil and Water Conservation, 2018, 25(4): 215-221.

[25] Sun Jia, Zhang Xinping, Huang Yimin. Evaluation of precipitation from ERA-Interim, CRU, GPCP and TRMM reanalysis data in the Dongting Lake Basin[J]. Resources and Environment in the Yangtze Basin, 2015, 24(11): 1850-1859.

[26] Huang Ying, Mao Wenqian, Wang Xiaoya, et al. Temporal and spatial distribution of precipitation in the Qilian Mountain and its surrounding areas in recent 39 years[J]. Journal of Arid Meteorology, 2020, 38(4): 527-534.

[27] Zhao Jianting, Wang Yanjun, Su Buda, et al. Spatiotemporal distributions of temperature, precipitation, evapotranspiration, and drought in the Indus River Basin[J]. Arid Land Geography, 2020, 43(2): 72-82.

[28] Wei Fenfen, Tang Jianping, Wang Shuyu. A reliability assessment of upper level reanalysis datasets over China[J]. Chinese Journal of Geophysics, 2015, 58(2): 383-397.

[29] Hu Zengyun, Ni Yongyong, Shao Hua, et al. Applicability study of CFSR, ERA-Interim and MERRA precipitation estimates in Central Asia[J]. Arid Land Geography, 2013, 36(4): 700-708.

[30] Jiao Zhenhang, Shu Hong, Wu Kai, et al. The rainfall calibration methods impact on VIC soil moisture simulation[J]. Urban Surveying, 2017(4): 37-41.
