-
In this study, we used data from the ECMWF-IFS (grid resolution: 0.125°) from 2005 to 2018. The forecasts are issued twice a day, at 0000 UTC and 1200 UTC, with lead times from 24 h to 240 h (10 days). The ground truth used by the correction methods is the ERA5 dataset at the same grid resolution; this dataset has widely replaced the previous reanalysis dataset, ERA-Interim (Hersbach et al., 2020), and is often used as the observation data in studies of numerical model bias correction (He et al., 2019; Hersbach et al., 2020). It should be noted that ECMWF-IFS is a forecast model, whereas ERA5 is a reanalysis.
The study domain spans 35.125°–47°N and 103°–126.875°E, roughly covering northeast China, with a grid size of 96 × 192 (lat × lon). The study domain and terrain features are shown in Fig. 1. In this paper, the ECMWF-IFS grid forecast data are also referred to as the "forecast data", which provide the predictor variables (inputs to CU-net); ERA5 provides the target variables (the correct answers for CU-net outputs).
This study uses 14 years (2005–18) of ECMWF-IFS forecast and ERA5 data. The 2005–16 data were used as the training dataset, and the 2017 and 2018 data were used as the validation and test datasets, respectively. As the correction was performed at 24 h intervals from 24 h to 240 h, we trained 10 models, one per lead time (24 h, 48 h, 72 h, and so on, up to 240 h). For example, the input of the 24 h correction model included the observation data (ERA5) and the 24 h ECMWF-IFS forecast at issue time t, whereas the label, or ground truth, was the observation at t + 24 h. Table 1 shows the training, validation, and testing sample statistics for the 10 (24–240 h) models.
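As a minimal sketch of the pairing of inputs and labels described above (the array names and the assumption of one forecast every 12 h are ours, not the paper's actual pipeline), each forecast field is matched with the analysis valid at its issue time plus the lead time. Note that every additional 24 h of lead time drops two issue times from the usable record, consistent with the decrease of two training examples per step in Table 1.

```python
import numpy as np

def build_training_pairs(forecast, analysis, lead_hours, step_hours=12):
    """Align each forecast field with the analysis valid at t + lead.

    forecast : (T, H, W) array; forecast[k] is the lead_hours forecast
               issued at time index k (one issue every step_hours).
    analysis : (T, H, W) array of ERA5 fields at the same issue times.
    Returns (inputs, targets): inputs[k] stacks the analysis at issue
    time k with the forecast issued at k; targets[k] is the analysis
    valid lead_hours later. Both shrink by lead_hours // step_hours.
    """
    offset = lead_hours // step_hours
    # channel 0: observation at issue time, channel 1: forecast field
    inputs = np.stack([analysis[:-offset], forecast[:-offset]], axis=1)
    targets = analysis[offset:]
    return inputs, targets
```

For a 24 h lead with 12-hourly issues, `offset` is 2, so two samples are lost from the record; for 240 h, twenty are lost.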
Lead time | Number of training examples |
24 h | 8760 |
48 h | 8758 |
72 h | 8756 |
96 h | 8754 |
120 h | 8752 |
144 h | 8750 |
168 h | 8748 |
192 h | 8746 |
216 h | 8744 |
240 h | 8742 |

Table 1. Statistics of the training, validation, and testing datasets for the 10 models. There are 730 validation and 730 testing examples for each lead time.
-
CU-net and ANO were used to correct ECMWF-IFS forecasts of four weather variables: 2m-T, 2m-RH, 10m-WS, and 10m-WD. For each weather variable, the correction performance is discussed by season. In this study, spring includes March, April, and May; summer corresponds to June, July, and August; autumn consists of September, October, and November; and winter includes December, January, and February.
We use the root mean square error (RMSE) to evaluate the correction performance, which is defined as:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{T\times M\times N}\sum_{t=1}^{T}\sum_{i=1}^{M}\sum_{j=1}^{N}\left({p}_{t,i,j}-{y}_{t,i,j}\right)^{2}} $$

where T represents the number of samples in the testing dataset; M represents the length of the study domain on the x-axis; and N represents the width of the study domain on the y-axis. $ {p}_{t,i,j} $ represents the forecast or corrected value at grid point (i, j) for the forecast issued at time t, and $ {y}_{t,i,j} $ represents the corresponding observation. In this study, T is 730, M is 192, and N is 96. According to the China Meteorological Administration's standard "QX/T 229–2014 Verification Method for Wind Forecast," the wind direction error at each grid point is the smaller angle between the forecast and observed directions, so the RMSE (in degrees) of wind direction is defined as follows:

$$ \mathrm{RMSE_{wd}}=\sqrt{\frac{1}{T\times M\times N}\sum_{t=1}^{T}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[\min\left(\left|{p}_{t,i,j}-{y}_{t,i,j}\right|,\ 360°-\left|{p}_{t,i,j}-{y}_{t,i,j}\right|\right)\right]^{2}} $$
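A minimal numerical sketch of these two metrics (the function names are ours):

```python
import numpy as np

def rmse(pred, obs):
    """Standard RMSE over all samples and grid points."""
    return np.sqrt(np.mean((pred - obs) ** 2))

def rmse_wind_direction(pred, obs):
    """RMSE of wind direction in degrees: the error at each point is
    the smaller angle between forecast and observed direction."""
    diff = np.abs(pred - obs) % 360.0
    diff = np.minimum(diff, 360.0 - diff)  # wrap around 360 degrees
    return np.sqrt(np.mean(diff ** 2))
```

For example, a forecast of 350° against an observation of 10° contributes an error of 20°, not 340°.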
A bootstrap method was used for significance testing in this study. The null hypothesis is that the correction method provides no improvement over ECMWF-IFS. First, with R as the number of bootstrap replicates (1000 in our case) and N as the number of examples in the testing data, R bootstrap replicates of the testing data were created by resampling N examples with replacement. Then, for each replicate, the difference between the models in the score of interest (e.g., RMSE for CU-net minus RMSE for ECMWF-IFS) was calculated. Finally, these R differences were used to perform significance testing at the 95% level. All confidence intervals at the 95% level in this study were also created with bootstrapping.
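The paired bootstrap described above can be sketched as follows; the function name and the per-example score layout are our assumptions, not the paper's code.

```python
import numpy as np

def bootstrap_rmse_diff(sqerr_model, sqerr_baseline, n_boot=1000, seed=0):
    """Paired bootstrap over testing examples.

    sqerr_model, sqerr_baseline : (N,) arrays of per-example mean squared
    errors for the two models, paired by example. For each replicate,
    resample N examples with replacement and compute the RMSE difference
    (model minus baseline). The improvement is significant at the 95%
    level if the 95% confidence interval of the differences lies
    entirely below zero.
    """
    rng = np.random.default_rng(seed)
    n = sqerr_model.shape[0]
    diffs = np.empty(n_boot)
    for r in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        diffs[r] = np.sqrt(sqerr_model[idx].mean()) - np.sqrt(sqerr_baseline[idx].mean())
    return diffs
```

The resampling is paired (the same example indices are drawn for both models), which is what makes the per-replicate difference meaningful.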
-
Figure 5 shows the RMSE spatial distribution of the corrected 24 h forecast for 2m-T in all seasons in 2018. Significance tests were conducted on the data for the whole year, and significant grid points (at the 95% confidence level) are represented in Fig. 5 with stippling.
Figure 5. Root mean square error (RMSE) distributions of the corrected 24 h forecast of 2m-T in different seasons in 2018. The left column shows the forecast errors of ECMWF, whereas the middle and right columns show the corrected products based on ANO and CU-net, respectively. In each panel, stippled points denote places where differences with respect to ECMWF-IFS are statistically significant at the 95% level. The number on the right is the percentage of stippled points among all grid points, i.e., the fraction of grid points on which the correction method significantly improves over ECMWF-IFS.
The forecast RMSE of ECMWF-IFS (left column of Fig. 5) is relatively large in spring and winter and smaller in summer and autumn; the error over the ocean is very small, whereas the error over complex terrain is relatively large. Both ANO (middle column of Fig. 5) and CU-net (right column of Fig. 5) had smaller RMSEs than the raw ECMWF-IFS output; however, CU-net outperformed ANO in every season, as well as for the whole year. Over areas with complex terrain, the RMSE of the ANO method exceeded 2.0°C, whereas CU-net reduced it to about 1.5°C.
Figure 6a shows the RMSE values for temperature for each model in different seasons. In all seasons, CU-net had smaller RMSEs than ANO, which in turn outperformed ECMWF-IFS. Table 2 shows the bias, mean absolute error (MAE), correlation coefficient, and RMSE values of the corrected 24 h forecast for 2m-T in 2018 using ECMWF, ANO, and CU-net, with confidence intervals at the 95% confidence level. ANO achieved better performance than ECMWF-IFS, but CU-net performed best in terms of all evaluation metrics.
Figure 6. RMSE of the corrected 24 h forecast in all seasons in 2018 for 2m-T (a), 2m-RH (b), 10m-WS (c), and 10m-WD (d). The confidence intervals at the 95% confidence level are shown with black error bars.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (1.68, 1.75) | (1.47, 1.52) | (1.21, 1.25) |
Bias | (0.27, 0.40) | (−0.17, −0.10) | (0.07, 0.14) |
MAE | (1.26, 1.31) | (1.11, 1.15) | (0.91, 0.94) |
CC | (0.95, 0.96) | (0.95, 0.96) | (0.96, 0.97) |

Table 2. Bias, MAE, correlation coefficient (CC), and RMSE of the corrected 24 h forecast for 2m-T. The confidence intervals are at the 95% confidence level.
Figure 7 shows an example 24 h forecast case at 1200 UTC on 11 January 2018. The corrected result using CU-net is clearly more consistent with the observation (ERA5). It should be noted that ERA5 is reanalysis data and is therefore smoother than the ECMWF and ANO fields; as CU-net uses ERA5 as the ground truth for correction, its output also appears smooth.
Figure 7. Illustration of 24 h 2m-T forecast at 1200 UTC on 11 January 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5.
For longer-term forecasts of 2m-T, Fig. 8 shows the RMSE of CU-net and ANO as a function of forecast lead time (24–240 h). CU-net achieved the smallest RMSE at all lead times. Even for the 240 h forecast, CU-net reduced the RMSE by 10.75% relative to ECMWF-IFS, compared to almost 0% for ANO.
-
Figure 9 shows the RMSE spatial distribution of the corrected 24 h forecast for 2m-RH. The same significance tests as in Fig. 5 were conducted for the data from different seasons. Compared to the ECMWF results in the left column of Fig. 9, both ANO and CU-net improved the forecast accuracy; however, CU-net was superior to ANO in every season, as well as for the entire year. In Fig. 9, the areas marked in red have a large RMSE of about 0.14. Over these areas in winter, ANO and CU-net reduced the RMSE to 0.12 and <0.1, respectively.
Figure 9. Same as Fig. 5, but for 2m-RH.
Figure 10. Illustration of 24 h 2m-RH forecast at 1200 UTC on 19 October 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5.
Figure 6b shows the RMSE values for each model in different seasons for 2m-RH. The confidence intervals are at the 95% confidence level. ANO achieved positive correction performance in spring, autumn, and winter, but had negative performance during the summer. By contrast, CU-net achieved better performance in all seasons than ANO and ECMWF-IFS. Table 3 shows the bias, MAE, correlation coefficient, and RMSE values of the corrected 24 h forecast for 2m-RH. CU-net achieved the best performance for all four evaluation metrics.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (8.80, 9.11) | (8.32, 8.53) | (6.83, 7.03) |
Bias | (1.38, 1.92) | (0.08, 0.48) | (−0.22, 0.09) |
MAE | (6.47, 6.69) | (6.19, 6.36) | (5.09, 5.23) |
CC | (0.88, 0.89) | (0.88, 0.89) | (0.91, 0.92) |

Table 3. Same as Table 2 but for 2m-RH.
Figure 10 shows an example 24 h forecast case at 1200 UTC on 19 October 2018, illustrating that the corrected result using CU-net is more consistent with the observation (ERA5).
For longer-term forecasts of 2m-RH, Fig. 11 shows the RMSE of CU-net and ANO as a function of forecast lead time (24–240 h). Similar to the 2m-T results discussed above, CU-net achieved the smallest RMSE at all lead times. For the 240 h forecast correction, ANO slightly degraded the forecast (a change of −2.26%), whereas CU-net reduced the RMSE by 20.14%.
Figure 11. Same as Fig. 8, but for 2m-RH.
-
Figure 12 shows the RMSE spatial distribution of the corrected 24 h forecast for 10m-WS. Significance tests were also conducted in the same way as in Fig. 5. Compared to the results of ECMWF in the left column of Fig. 12, ANO showed no improvement, whereas CU-net showed obvious improvements in all seasons.
Figure 12. Same as Fig. 5, but for 10m-WS.
Figure 6c shows the RMSE values for each model in different seasons for 10m-WS. ANO improved over ECMWF-IFS only in autumn and degraded the forecast in the other seasons, whereas CU-net achieved improvements in all seasons. Table 4 shows the bias, MAE, correlation coefficient, and RMSE values of the corrected 24 h forecasts for 10m-WS using ECMWF, ANO, and CU-net, with confidence intervals at the 95% confidence level. Again, CU-net has the best correction performance in terms of all evaluation metrics.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (1.05, 1.08) | (1.06, 1.09) | (0.76, 0.79) |
Bias | (−0.22, −0.20) | (0.13, 0.15) | (−0.01, 0.02) |
MAE | (0.76, 0.79) | (0.80, 0.82) | (0.55, 0.57) |
CC | (0.84, 0.86) | (0.84, 0.85) | (0.89, 0.90) |

Table 4. Same as Table 2 but for 10m-WS.
Figure 13 shows a 24 h forecast case at 1200 UTC on 15 March 2018; the red ellipse highlights a region where ECMWF and ANO show obvious errors, whereas the CU-net correction is more consistent with the observation.
Figure 13. Illustration of 24 h 10m-WS forecast at 1200 UTC on 15 March 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5. The red ellipse indicates that ECMWF and ANO have obvious error while CU-net’s correction is more consistent with the observations.
For longer-term forecasts of 10m-WS, Fig. 14 shows the RMSEs of CU-net and ANO as a function of forecast lead time. Similar to the results for 2m-T and 2m-RH, CU-net achieved the smallest RMSE at all lead times. In general, ANO did not perform well for 10m-WS forecast correction, showing no positive correction effect at any lead time, whereas CU-net continued to provide positive results as the lead time approached 240 h.
Figure 14. Same as Fig. 8, but for 10m-WS.
-
Figure 15 shows the RMSE spatial distribution of the corrected 24 h forecast for 10m-WD. Significance tests were also conducted in the same way as in Fig. 5. Similar to the results for 10m-WS correction, ANO showed no improvement and in some cases showed worse results. CU-net showed improvements in all seasons. Notably, the correction of wind direction has been a challenging issue, as described in previous studies (Bao et al., 2010).
Figure 15. Same as Fig. 5, but for 10m-WD.
Figure 6d shows the RMSE values for each model in different seasons for 10m-WD. ANO did not show a positive performance in any season. By contrast, CU-net achieved a positive performance in all seasons. Table 5 shows the bias, MAE, correlation coefficient, and RMSE values of the corrected 24 h forecast for 10m-WD. The confidence intervals are at the 95% confidence level. Figure 16 shows the forecast results on 18 April 2018. Similar to previous experiments, CU-net’s correction is more consistent with the observation although it has a smoothing effect.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (38.22, 39.41) | (38.84, 40.09) | (30.98, 32.24) |
Bias | (23.11, 24.05) | (23.82, 24.79) | (17.77, 18.67) |
MAE | (23.11, 24.05) | (23.81, 24.79) | (17.77, 18.67) |
CC | (0.60, 0.62) | (0.59, 0.61) | (0.68, 0.70) |

Table 5. Same as Table 2 but for 10m-WD.
Figure 16. Illustration of 24 h 10m-WD forecast at 1200 UTC on 18 April 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5.
For longer-term forecasts of 10m-WD, Fig. 17 shows the RMSE of CU-net and ANO as a function of forecast lead time. Again, CU-net achieved the smallest RMSE at all lead times. Similar to the 10m-WS correction, ANO did not perform well for 10m-WD and had no positive correction effect at any lead time. Although CU-net continued to provide positive results as the lead time increased, its percentage improvement declined from 18.57% at 24 h to 3.7% at 240 h.
Figure 17. Same as Fig. 8, but for 10m-WD.
-
To identify whether CU-net has conditional bias, we use reliability curves to further evaluate its performance for each forecast value; the 24 h forecast results are used for this analysis. As shown in Fig. 18, a reliability curve plots forecast values on the x-axis against the mean observation (ERA5, the correct answers) on the y-axis. The black line is the perfect-reliability line (y = x), on which the forecast matches the observation on average.
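A reliability curve of this kind can be computed by binning the forecast values and averaging the observations within each bin; the sketch below (function and variable names are ours) illustrates the idea.

```python
import numpy as np

def reliability_curve(pred, obs, n_bins=20):
    """Bin the forecasts and compute the mean observation per bin.

    pred, obs : flattened arrays over all times and grid points.
    Returns (bin_centers, mean_obs) for bins containing at least one
    forecast; a perfectly reliable forecast satisfies
    mean_obs == bin_centers.
    """
    edges = np.linspace(pred.min(), pred.max(), n_bins + 1)
    # assign each forecast to a bin; the maximum value goes in the last bin
    which = np.clip(np.digitize(pred, edges) - 1, 0, n_bins - 1)
    centers, means = [], []
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            centers.append(0.5 * (edges[b] + edges[b + 1]))
            means.append(obs[mask].mean())
    return np.array(centers), np.array(means)
```

A point below the diagonal means the model overforecasts at that value; a point above it means it underforecasts.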
Figure 18. Reliability curves of the corrected 24 h forecast in 2018 for 2m-T (a), 2m-RH (b), 10m-WS (c), and 10m-WD (d).
Figures 18a and b show that for 2m-T and 2m-RH, CU-net performs better than the other methods. For 10m-WS, as shown in Fig. 18c, CU-net achieves good overall performance, though it slightly underestimates wind speeds when it predicts values above about 6.5 m s−1. For 10m-WD, Fig. 18d shows that CU-net fits the diagonal line distinctly better than the other models. In addition, all panels show that each model has some conditional bias, with CU-net having the smallest.
-
This section describes an additional experiment that was conducted to test whether including terrain information in the proposed CU-net model could further improve the correction (Steinacker et al., 2006). The terrain data Q (a grid of orographic height), along with $ {p}_{t+\Delta t} $ and $ {y}_{t} $, were input into the CU-net model, as shown in Fig. 3; the new model with terrain data is referred to as TCU-net. The experimental results of the 24 h forecast correction are shown in Table 6, with confidence intervals at the 95% confidence level. It can be seen that TCU-net improved the performance, as all four weather variables showed smaller RMSEs after including the terrain information.
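One common way to feed a static terrain grid to a convolutional network is to append it as an extra input channel; the sketch below illustrates this idea and is our assumption, not the paper's actual implementation.

```python
import numpy as np

def add_terrain_channel(inputs, terrain):
    """Append a static orographic-height grid as an extra input channel.

    inputs  : (batch, channels, H, W) array of forecast/analysis fields.
    terrain : (H, W) array of orographic height, broadcast to every
              example in the batch.
    Returns an array of shape (batch, channels + 1, H, W).
    """
    b = inputs.shape[0]
    # replicate the terrain grid across the batch as one extra channel
    q = np.broadcast_to(terrain, (b, 1) + terrain.shape)
    return np.concatenate([inputs, q], axis=1)
```

Because the terrain channel is identical for every sample, it costs almost nothing at training time while giving the network direct access to orographic context at each grid point.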
Variables | ECMWF | CU-net | TCU-net |
2m-T (°C) | (1.68, 1.75) | (1.21, 1.25) | (1.17, 1.21) |
2m-RH (%) | (8.80, 9.11) | (6.83, 7.03) | (6.60, 6.77) |
10m-WS (m s−1) | (1.05, 1.08) | (0.76, 0.79) | (0.76, 0.79) |
10m-WD (°) | (38.22, 39.41) | (30.98, 32.24) | (30.78, 32.05) |

Table 6. RMSE of the corrected 24 h forecast in 2018 for four weather variables using ECMWF, CU-net, and TCU-net. The confidence intervals are at the 95% confidence level.
-
Some studies have shown that the ANO method has stringent requirements on the length of the historical data record (Chang et al., 2015). In this study, we used 14 years of data; using a longer record may improve the correction performance of ANO. It should also be noted that, in the ANO method, each grid point is corrected independently. However, weather phenomena are continuous not only in time but also in space (i.e., each grid point is influenced by its neighboring grid points). It is therefore necessary to take these spatial relations into account, which is precisely the strength of a CNN, since it inherently learns spatial information through its convolution operations. Together, these factors explain the limited performance of ANO compared to CU-net.
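As a toy illustration of this point, even a single fixed 3 × 3 convolution makes every output value depend on its eight spatial neighbors, whereas ANO corrects each grid point in isolation.

```python
import numpy as np

def conv2d_mean3x3(field):
    """Average each grid point with its 8 neighbors (a fixed 3x3
    convolution with uniform weights, valid padding). Each output
    point depends on a 3x3 spatial neighborhood of the input."""
    h, w = field.shape
    out = np.zeros((h - 2, w - 2))
    for di in range(3):
        for dj in range(3):
            out += field[di:di + h - 2, dj:dj + w - 2]
    return out / 9.0
```

In a CNN such as CU-net the kernel weights are learned rather than fixed, and many such kernels are stacked, so deeper layers see progressively larger spatial contexts.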