-
ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate.Reanalysis combines modeling data with observations from across the world into a globally complete and consistent dataset that adheres to the laws of physics (Copernicus Climate Change Service, 2017). ERA5 provides hourly estimates for many atmospheric, ocean-wave, and land-surface quantities. The raw spatial resolution for surface and high-level areas is 0.125° × 0.125° and 0.250° × 0.250°, respectively. In the present work, the spatial resolution is unified to 0.250° × 0.250°, consistent with the QPFNet’s forecasts. In this work, the ERA5 data from June 2016 to June 2019 was used to build the training set.
Some predictors from ERA5 reanalysis data are listed in Table 1. Terrain altitude was included alongside the basic atmospheric variables of standard pressure levels and near the ground. The experimental forecast area spanned 18°–54°N, 72°–135°E (The topographic distribution can be seen in Fig. 1). In performing rainfall prediction, basic deterministic forecast variables were fed into QPFNet from ECMWF's highest-resolution model (HRES) with a spatiotemporal resolution of 3 h and 0.125° × 0.125°. The HRES data from July to September of 2019 was used to build the test set.
Category Predictors Pressure Layer (hPa) Upper air Temperature, Geopotential height, Relative humidity, U (zonal wind), V (Meridional wind), w (vertical speed) 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 850, 900, 925, 950, 1000 Surface 2-m temperature, Ground pressure, 2-m dew point temperature, 10-m U component, 10-m V component − Terrain Altitude − Table 1. Predictors used in the deep learning model.
-
Observational precipitation data was collected from the Multi-source merged Precipitation Analysis System (CMPAS-V2.1) by the China Meteorological Administration (CMA). The data sources used by CMPAS include observational precipitation from more than 40 000 gauges, satellite-derived rainfall from FY2, CMORPH (CPC MORPHing technique), and radar-derived precipitation. The CMPAS-V2.1 data from June 2016 to September 2019 labeled the samples. An independent verification showed that its accuracy was higher than that of any of the three sources of precipitation (Xie and Xiong, 2011; Pan et al., 2015). The spatiotemporal resolution of the data was 0.01° × 0.01° and 1 h.
In consideration of the requirements of operational weather forecasting, the accumulated precipitation of three hours (R3h) was classified into 102 classes, corresponding to precipitation depths of {0, 0.1, 1, 2, 3……99, ≥100} mm. Within the proposed network architecture, a softmax classifier (Table A1 in Appendix) is used to obtain the probability for each category.
-
Consistent with the physical laws that govern precipitation, the QPFNet DL forecasting model comprised three-dimensional (3D) convolutional layers, residual connection layers, pooling layers, upsampling layers, and attention layers, among others, as shown in Fig. 2. The predictors of QPFNet are listed in Table 1, while the labels were taken from observed precipitation information from CMPAS-V2.1. Based on the abovementioned research on weather forecasting tasks and DL methods, we explain the role of the proposed neural network architecture in performing QPF.
-
Synoptic systems present a typical 3D structure. The different vertical distributions of variables, such as forward sloping troughs and temperature inversion layers, etc., can be useful in assessing weather system development from a meteorological perspective (Sun and Tao, 2012). Therefore, further development of 3D DL models is considered necessary to extract generation, development, and dissipation features of synoptic systems.
ERA5 reanalysis data provides 3D atmospheric structures with various fundamental elements from high altitudes to the surface. Based on ERA5 reanalysis data, QPFNet was designed to extract 3D features of atmospheric evolution and thus consists of 3D convolutional layers, 3D max-pooling layers, 3D upsampling layers, etc.
-
Meteorologists must analyze various sources of information to predict the evolution of weather systems and often seek to extract key information from various dynamical, thermodynamical and environmental features to support weather prediction. Similarly, an encoder-decoder architecture was adopted to construct an overall U-shape network architecture. During the encoding process, the encoder continuously extracts useful precipitation information while the feature maps (Table A1 in Appendix) are continuously compressed. Finally, the feature map size is reduced to 1/16 of the input size.
The detailed characteristics of the feature map are partly lost by the compression encoding, so the decoding process is necessary to restore the lost information. During the decoding process, the QPFNet model deduces the forecasting results in each grid. The decoder has the same number of convolutions and the same number of blocks. Instead of pooling, the decoder performs upsampling using upsampling layers that replace the pooling layers in the network architecture. A mirrored decoder in the decoder network connects their corresponding encoder feature map(s) by skip connections (copy and crop, black arrows in Fig. 2).
-
Residual connections are a type of skip connection that learns residual functions by referencing layer inputs instead of learning unreferenced functions, which were first proposed by He et al. (2016). The depth of a DL network is considered crucial (Simonyan and Zisserman, 2015; Szegedy et al., 2015). However, increasing network depth is associated with many challenges, such as gradient explosion and vanishing. Due to the increase in the number of layers, the gradient may become unstable during the backpropagation process. Residual connections can automatically learn identity mappings to accelerate training. As QPFNet consists of 124 layers, we apply residual connection in the proposed model.
-
To investigate precipitation processes, meteorologists analyze synoptic-scale systems such as cold vortices, upper-level troughs, typhoons, etc., as well as mesoscale systems such as lower-level convergence lines and cold pools to perform a comprehensive analysis. Hence, a good QPF must comprehensively extract global and local information from weather conditions.
Convolutional Block Attention Modules (CBAM) were proposed by Woo et al. (2018). In the proposed approach, CBAM was used to calculate correlation coefficients between feature maps and a label matrix so that a DL network could better extract global features. As a result, the DL network was able to acquire a better understanding of weather systems of various scales and then forecast occurrences of convective weather.
Average and maximum pooling (Table A1 in Appendix) aggregate the spatial information of the feature map. Then, it is sent to a shared multilayer perceptron network (MLP) to compress the spatial dimensions of the input feature map. The channel’s attention map Mch was generated by summing pixel by pixel.
where F is the feature map,
${F}_{\mathrm{a}\mathrm{v}\mathrm{g},\mathrm{ch}}$ and${F}_{\mathrm{m}\mathrm{a}\mathrm{x},\mathrm{ch}}$ are feature maps with channel information after global average pooling and maximum global pool. W0 and W1 are the weight parameters of two layers in MLP, and$ \sigma $ is the sigmoid activation function (Table A1 in Appendix).The average pooling and maximum pooling processes were also applied to compress the input feature map at the channels. We connect these layers with 7 × 7 convolutions (
$ {\mathrm{C}\mathrm{o}\mathrm{n}\mathrm{v}}^{7\times 7} $ ) to generate a spatial attention map Ms as given below.$ {F}_{\mathrm{a}\mathrm{v}\mathrm{g},\mathrm{s}} $ and$ {F}_{\mathrm{m}\mathrm{a}\mathrm{x},\mathrm{s}} $ are the feature maps with spatial information after global average pooling and maximum global pool. -
In general, QPF tasks involve continuous value prediction, which falls under the category of regression models. Instead of modeling the continuous values of the targets y, we discretize the variables into many small intervals that continuously cover a range of interest (Allwein et al., 2000; Sønderb et al., 2020).
The probability of all categories is the output of the softmax classifier. Suppose we assign the category with the highest probability to be the predicted category. In that case, the model is inclined to predict light rain because the larger is the rain rate, the smaller its climate probability.
Therefore, a new strategy was designed. Assuming that the probability of each category output by softmax is
$ {p}_{c} $ and a probability threshold τc is set for each precipitation level c, the value of τc decreases as c increases, consistent with the climate probability. For each c, the algorithm calculates the cumulative probability${p}'_{c}$ of all categories that exceed c. If${p}'_{c}$ is greater than τc, then c is taken as the predicted category. If multiple values of${p}'_{c}$ are greater than τc, the largest c is selected.where argmax is the operation that finds the argument that gives the maximum value from the target function,
and max is the maximum function, yi is the predicted category value, m is the total number of categories. τc is calculated according to the evaluation results from the validation set. For each c, the cumulative probability
${p}'_{c}$ is assigned the highest threat score (TS; see section 4.2) so long as the BIAS remains less than VBIAS, and then${p}'_{c}$ was confirmed to be τc. VBIAS could be set as required, and we set it to be 1.5 in this study. -
QPFNet is designed to pay more attention to heavy rainfall areas, which improves the QPF, especially for heavy precipitation events. Nonetheless, QPFNet is also designed to avoid paying excessive attention to the heavy rainfall grid in the training process. Doing so may lead to failure to predict occurrences of weak rainfall events. Moreover, QPFNet is designed to focus on difficult samples to enhance its forecast capability in such cases.
Therefore, the multiclass focal loss [Eq. (4), Lin et al., 2017] is applied as given below.
where we apply
$ \mathrm{\alpha }\;\mathrm{a}\mathrm{n}\mathrm{d}\;\mathrm{\gamma } $ as weights whose default configurations are 0.25 and 2.0, respectively. The parameter$ {y}_{i,c} $ is the label for sample i, c is the category,$ {p}_{i,c} $ is the prediction probability of sample i for category c, and m is the total number of categories. -
The predictors used by QPFNet are listed in Table 1, labeled by the observational precipitation data from CMPAS2.1. Predictors of time T were labeled by the precipitation of [T–(T + 3)] h. Finally, 6542 samples were constructed, spanning June 2016 to June 2019. A total of 1308 samples (about 20%) were randomly selected to generate the validation set, and the remaining 5234 samples were selected as the training set. The test set contained 1200 samples from July to September of 2019.
The forecasting area was 18°–54°N, 72°–135°E. Consistent with a spatial resolution of 0.25°, the width and length were 145 and 253 grids, respectively. The size of the samples was Nlayer × 145 × 253 × Npredictor, where Nlayer is the number of layers (from the surface to 100 hPa, a total of 16 layers), and Npredictor is the number of predictors, including air temperature, humidity, pressure or geopotential height, U wind, V wind, and vertical velocity (w). Because there was no win the surface layer, we replaced it with topographic altitude.
-
The Adam algorithm (Kingma and Ba, 2014) was utilized as an optimizer, and the learning rate was set to 10−4. Other parameters were held at their default values (Perol et al., 2018). The model was trained for 100 epochs, with a batch size of 2.
An early stopping strategy was used during the training to avoid overfitting. When the loss on the validation set was no longer reduced in 10 epochs, the training process was terminated, and the model weight with the minimum loss on the validation set was saved.
The NVidia CUDA (Compute Unified Device Architecture) library and an NVidia Tesla graphic processing unit (GPU) were used to perform the training and forecasting processes of QPFNet.
Using the GPU, the training time was roughly 27 hours. Moreover, 0–72 h forecasts (at 3-h intervals) at a 0.25° × 0.25° resolution over a study area of (18°–54°N, 72°–135°E) could be completed in 5 min, which is suitable for practical forecasting operations.
-
After training, the optimal weights of QPFNet were obtained. The basic forecast variables from HRES were fed into QPFNet to obtain the QPF of corresponding forecast times. A comprehensive evaluation and a case evaluation of QPFNet and HRES are given below.
-
The evaluation results of the QPF performed by QPFNet and HRES are listed in Table 2. Classical evaluation measures, including the probability of detection (POD), false alarm ratio (FAR), threat score (TS), bias, F1-score, and equitable threat score (ETS), were used to evaluate the forecasts generated by the two models.
Forecast result Precipitation intensity
mm (3 h)–1POD
$\dfrac{\mathit{h} }{\mathit{h}+\rm{m}\rm{i}\rm{s}\rm{s} }$FAR
$ \dfrac{\mathit{f}}{\mathit{h}+\mathit{f}} $Bias
$\dfrac{\mathit{h}+\mathit{f} }{\mathit{h}+\rm{m}\rm{i}\rm{s}\rm{s} }$TS
$\dfrac{\mathit{h} }{\mathit{h}+\rm{m}\rm{i}\rm{s}\rm{s}+\mathit{f} }$ETS
$\dfrac{\mathit{h}-{\mathit{h} }_{\rm{r}\rm{a}\rm{n}\rm{d}\rm{o}\rm{m} } }{\mathit{h}+\rm{m}\rm{i}\rm{s}\rm{s}+\mathit{f}-{\mathit{h} }_{\rm{r}\rm{a}\rm{n}\rm{d}\rm{o}\rm{m} } }$F1
$\dfrac{2\mathit{h} }{2\mathit{h}+\rm{m}\rm{i}\rm{s}\rm{s}+\mathit{f} }$ECMWF
HRES≥0.1 0.718 0.662 2.123 0.299 0.181 0.460 ≥3 0.347 0.733 1.301 0.178 0.152 0.302 ≥10 0.132 0.827 0.759 0.081 0.076 0.150 ≥20 0.051 0.925 0.676 0.031 0.030 0.061 QPFNet ≥0.1 0.735 0.653 1.268 0.358 0.267 0.527 ≥3 0.402 0.706 1.365 0.205 0.179 0.340 ≥10 0.264 0.830 1.549 0.116 0.108 0.207 ≥20 0.159 0.916 1.897 0.058 0.056 0.110 Table 2. The evaluation results of ECMWF HRES and QPFNet for R3h prediction for next 0–72 h (3-h interval, July to September of 2019); h is the number of hits, f is the number of false forecasts, cn is the number of correct negatives, and miss is the number of missed forecasts.
$ {h}_{\mathrm{r}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{o}\mathrm{m}}=(h+f)\times (h+m)/(h+m+f+\mathrm{c}\mathrm{n} $ ). POD (probability of detection), FAR (false alarm ratio), TS (threat score), bias, F1-score, and ETS (equitable threat score) are listed.The overall evaluation scores from July to September 2019, with all leading times from 0 to 72 hours, are listed in Table 2. It was observed that the performance of QPFNet’s QPF was better than that of HRES in the following three aspects.
(1) In predicting drizzle (R3h ≥ 0.1 mm), QPFNet improved the POD of drizzle forecasts while reducing the FAR, resulting in an obvious improvement in ETS and TS. The TS of QPFNet improved by 19.7% compared to the HRES, and the BIAS was relatively closer to 1.
(2) In predicting heavy rain (R3h ≥10 mm and R3h ≥ 20 mm), the PODs of QPFNet obviously increased, while the FARs decreased slightly compared to the HRES, while the POD of QPF of 10 mm of precipitation increased from 0.132 to 0.264, while that of 20 mm depth increased from 0.051 to 0.159. As a result, TS and ETS were obviously improved. The TS of 10 mm (3 h)–1 and 20 mm (3 h)–1 precipitation increased by 43.2% and 87.1 %, respectively.
(3) In predicting heavy rain (R3h≥ 10 mm and R3h≥20 mm), the QPFs produced by HRES were relatively conservative predictions. HRES’s BIAS was less than 0.76, indicating the forecasting area was sub-optimally small. In contrast, the forecast area of QPFNet was relatively large, with a BIAS larger than 1.5.
Some valuable information may be extracted from Fig. 3. First, for varying precipitation intensities and forecasting lead times, the TS values of QPFNet were better than those of HRES. The greater the precipitation intensity, the larger the improvement in TS. Second, with increasing forecast lead time, the performance of the QPFNet and HRES showed a clear downward trend, consistent with the conventional understanding of weather predictability.
-
Precipitation was recorded over large areas in China from 17–20 July 2019. The precipitation intensity ranged from 0 to more than 50 mm (3 h)–1, providing a good case to evaluate the performance of the QPFNet. The QPFs of both QPFNet and HRES are shown in Fig. 4. The initial time was 0000 UTC on 17 July 2019.
Figure 4. Precipitation forecasts of QPFNet and HRES initiated at 0000 UTC on 17 July 2019 for the next 12–72 h (the tables list the corresponding TS of QPFNet and HRES of each forecast).
The following facts may be observed from Fig. 4.
(1) For light rain (0.1 mm ≤ R3h < 3 mm) events, the QPF of QPFNet was more conservative than that of HRES. The BIAS of QPFNet was closer to 1, consistent with the results in Table 2. In the experiments conducted, QPFNet demonstrated the capability to provide more accurate predictions for rainfall occurrences. The TS of QPFNet forecasts of varying lead times was also better than that of HRES.
(2) For moderate or heavier precipitation (R3h ≥ 3 mm), the QPFNet's forecast area was less conservative. While HRES missed many heavy precipitation grids, QPFNet hit an obviously larger number of heavy precipitation grids, although QPFNet also gave more false alarms. This is also consistent with the evaluation results in Table 2. Obviously, the forecasts of QPFNet would be more informative in practice.
(3) The precipitation area produced by QPFNet was smoother than that of HRES, indicating that there would be less information in the forecasts produced QPFNet. This may result from the QPFNet seeking the global minimum loss during training, which would lead to some details being smoothed.
In general, QPFNet showed better QPF capability than HRES, both for drizzle and heavy precipitation. However, the distribution of QPFNet’s QPF was smoother, and some detailed information may have been lost, which may confuse meteorologists in operational applications.
-
Based on basic meteorological variables from ERA5, as well as terrain data and the observational precipitation data from CMPAS-V2.1, a semantic segmentation DL model named QPFNet has been proposed, incorporating basic meteorological variables including air temperature, geopotential height or sea-level pressure, humidity, and wind in 16 pressure layers. A residual mechanism, an attention mechanism, and a multi-classification method were applied in QPFNet to fit the non-linear relationship between basic meteorological variables and precipitation. The trained model was employed to perform QPF with inputs from the basic variables forecast of the HRES. Forecast evaluation results show that QPFNet outperformed HRES on various precipitation intensities.
(1) The QPFNet achieved a better precipitation forecast than HRES, showing obvious improvement in the various evaluation indices. For the 0–72 h forecast of precipitation accumulated over 3 h, the TS at 0.1 mm, 3 mm, 10 mm, and 20 mm precipitation depths improved by 19.7 %, 15.2 %, 43.2 %, and 87.1 %, respectively.
(2) In forecasting drizzle (0.1 mm), QPFNet was more precise than HRES in terms of higher POD, lower FAR, and having a BIAS closer to 1. For convective precipitation [the intensity≥ 20 mm (3 h)–1], QPFNet could better predict the central location of heavy precipitation and obviously improve TS.
The above results show that QPFNet can extract precipitation characteristics from three-dimensional basic meteorological variables to realize an effective QPF. Especially for intensive precipitation, the forecast performance of QPFNet obviously improved upon that of the conventional model. Therefore, using DL to extract precipitation features from basic meteorological variables can provide an important reference for QPF, and avoid the uncertainties of PSs.
Based on the proposed DL model, the predictability of precipitation has been discussed. QPFNet’s performance on precipitation simulations on the ERA5 analysis and HRES forecast fields has been compared. Furthermore, as physical laws of cumulus clouds, lightning, heavy fog, and other weather phenomena are not fully understood, the present work can provide insight and a feasible solution for forecasting these complex weather phenomena.
The experimental results show that the forecast errors of the basic variables from HRES have less impact on forecasting drizzle than intensive rainfall. The greater the precipitation intensity, the more obvious the impact is. However, the error effect induced by basic variables was tiny on the convective precipitation process [≥30 mm (3 h)–1]. This indicates that the basic variables, either from reanalysis or forecast, could not effectively capture the relevant information on violent convection.
Machine Learning (ML) is often criticized by forecasters as being a “black box” because of a perceived inability to understand on what physical basis ML makes its predictions (Mcgovern et al., 2019), when in reality it may reveal further insights. In this study, the permutation importance is used to analyze the sensitivity of the different pressure levels and meteorological variables to precipitation forecasting. This method is beneficial in investigating the complex mechanisms relevant to different rainfall intensities. Some results are consistent with conventional knowledge, such as the variables on 200 hPa playing an important role in various rainfall intensities. However, some results differed from conventional understanding and may be valuable for understanding rainfall processes and forecasting errors. Some examples are given below.
(1) Out of the entire troposphere, the evolution of the basic variables at 400 hPa, instead of 500 hPa, is most important in rainfall forecasting. More specifically, the vertical velocity (w) in 400 hPa level was the most sensitive factor among all variables on forecasting precipitation intensity, except for extremely heavy rain. If the precipitation was led by violent convective activity, the vertical movements derived from the grid-scale diagnostic and grid PSs were unreliable. As a result, the greater the convective precipitation intensity, the lower the relative sensitivities to w from NWP. The relative sensitivity to air temperature at 400 hPa is beneficial to help understand the complex joint impacts of cold/warm air activity, frontal action, and evolution of stratification instability in the mid-troposphere on precipitation led by different dynamical processes.
(2) The 900 hPa level also had an important influence on precipitation. Located near the top of the boundary layer, the 900 hPa level is an exchange layer for heat, momentum, and water vapor between the free atmosphere and the boundary layer. Therefore, the atmospheric conditions in this layer exert an important influence on precipitation. In contrast, the influence of the boundary layer on precipitation is "relatively unimportant", which may be caused by the fact that the simulation capability of NWP in the boundary layer was much lower than that of the free atmosphere.
The model interpretation and visualization (MIV) of DL provides a new perspective to understand precipitation mechanisms. Specifically, some of the analytical results, such as the critical role of the 400 hPa layer on precipitation, can guide us to better analyze mechanisms related to precipitation. Other analytical results, such as the "relatively unimportant" effects of boundary layer conditions on precipitation, can guide us to better understand the limitations of NWP as well.
Future research should be conducted to improve QPFNet to allow it to become operationally feasible. For example, QPFNet's forecast is smoother than that of ECMWF, indicating a loss of detail. This may be caused by our training strategy and loss function, which sought the global minimum loss value. Moreover, the QPFNet’s forecast map is not a physical field like ECMWF forecast, and it is not drawn from the multivariate distribution of rain maps, leading to difficulties in obtaining realistic maps.
Therefore, a customized loss function and generative adversarial network may be used to enrich the details of the precipitation forecast. In addition, state-of-the-art operational approaches such as non-homogeneous regression or quantile random forests, etc., should be also be evaluated and compared under operational conditions.
Acknowledgements. The authors would like to acknowledge the financial support of the National Key Research and Development Program (Grant No. 2017YFC1502000) and the National Natural Science Foundation of China (Key Program, 91937301), and thank the Copernicus Climate Change Service (C3S) for providing ERA5 reanalysis data, and National Meteorological Information Center for providing the CMPAS-V2.1 product.
APPENDIX
Jargons Explanation Deep Learning A class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input. Feature maps Generated by applying Filters or Feature detectors to the input image or the feature map output of the prior layers. Softmaxclassifier A mathematical function that converts a vector of numbers into a vector of probabilities, where the probabilities of each value are proportional to the relative scale of each value in the vector. Average pooling A layer downsamples feature maps by averaging the presence of features in patches of the feature map. Maximum pooling A layer downsamples feature maps by maximizing the presence of features in patches of the feature map. Semantic segmentation The process of classifying each pixel belonging to a particular label. Sigmoid activation function A neuron activation function based on a sigmoid function $ f\left(x\right)={(1+{e}^{-x})}^{-1} $. Table A1. Jargon explanation
Category | Predictors | Pressure Layer (hPa) |
Upper air | Temperature, Geopotential height, Relative humidity, U (zonal wind), V (Meridional wind), w (vertical speed) | 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 850, 900, 925, 950, 1000 |
Surface | 2-m temperature, Ground pressure, 2-m dew point temperature, 10-m U component, 10-m V component | − |
Terrain | Altitude | − |