-
To evaluate the improved performance of the NEPS and the extent to which it adds application value to weather forecast services relative to the NPS and REPS, we carried out a more comprehensive evaluation over a nearly one-month period from 1 July 2021 to 25 July 2021. The forecasts from the NEPS and REPS have different temporal resolutions and spatial grid spacings. The REPS runs start at 0000 and 1200 UTC and generate forecasts with 1-hourly output at a grid spacing of 3 km. The NEPS and NPS, which have a grid spacing of 500 m, are run once an hour and generate forecasts with 1-hourly output. For a proper comparison, the REPS forecast was bilinearly interpolated to the NEPS domain and 500-m grid spacing. Since all hourly initializations (0000, 0100, …, 2300 UTC) of the NEPS draw on the two REPS cycles at 0000 and 1200 UTC, our primary interest is to evaluate the added value brought by the NPS. Therefore, all hourly initializations of the NEPS and NPS were combined for each hourly lead time out to 6 h to compare the NEPS with the REPS and NPS.
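The bilinear interpolation step can be illustrated with a minimal Python sketch. It assumes that the coarse and fine grids are regular and aligned, so that the 500-m grid is exactly six times finer than the 3-km grid; the function names `bilinear` and `regrid` and the toy field are hypothetical and are not taken from the NEPS software.

```python
def bilinear(field, x, y):
    """Bilinearly interpolate a 2-D grid (list of rows) at a fractional index.

    x is the column coordinate and y the row coordinate, in coarse-grid units.
    """
    ny, nx = len(field), len(field[0])
    # Clamp the cell origin so that points on the upper edges stay in a valid cell.
    x0 = min(max(int(x), 0), nx - 2)
    y0 = min(max(int(y), 0), ny - 2)
    tx, ty = x - x0, y - y0
    top = (1 - tx) * field[y0][x0] + tx * field[y0][x0 + 1]
    bottom = (1 - tx) * field[y0 + 1][x0] + tx * field[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom


def regrid(field, factor):
    """Resample a coarse field onto a grid `factor` times finer.

    Assumption: going from 3 km to 500 m corresponds to factor = 6.
    """
    ny, nx = len(field), len(field[0])
    fine_ny, fine_nx = (ny - 1) * factor + 1, (nx - 1) * factor + 1
    return [[bilinear(field, i / factor, j / factor) for i in range(fine_nx)]
            for j in range(fine_ny)]


coarse = [[0.0, 6.0], [12.0, 18.0]]  # toy 2 x 2 field on the coarse grid
fine = regrid(coarse, 6)             # 7 x 7 field on the fine grid
```

In practice the two grids are defined in geographic coordinates, so an operational implementation would interpolate in latitude and longitude rather than in index space.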
Since the NEPS is mainly concerned with forecasting surface weather variables, we verified the 2-m temperature, 10-m wind, and precipitation in both deterministic and probabilistic comparative tests. There are about 4000 automatic stations in the verification domain, including national and regional automatic stations, marked with black dots in Fig. 2. The stations cover most of the topographic elevation range from 0 to 2194 m. Each observation is matched to the nearest grid point, and observation uncertainties are not considered.
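The nearest-grid-point matching can be sketched as follows. The sketch assumes a regular projected grid with coordinates in metres and a 500-m spacing; the station dictionary layout and the names `nearest_index` and `match_station` are hypothetical.

```python
import math


def nearest_index(coord, origin, spacing, n):
    """Index of the grid point nearest to `coord` along one axis, clamped to the grid."""
    idx = math.floor((coord - origin) / spacing + 0.5)
    return min(max(idx, 0), n - 1)


def match_station(station, forecast, origin=(0.0, 0.0), spacing=500.0):
    """Pair one station observation with the forecast at its nearest grid point."""
    ny, nx = len(forecast), len(forecast[0])
    i = nearest_index(station["x"], origin[0], spacing, nx)
    j = nearest_index(station["y"], origin[1], spacing, ny)
    return forecast[j][i], station["obs"]


# Toy 2 x 3 forecast field of 2-m temperature and one station.
forecast = [[20.0, 20.5, 21.0],
            [19.5, 20.0, 20.5]]
station = {"x": 740.0, "y": 180.0, "obs": 20.8}
pair = match_station(station, forecast)  # (forecast value, observed value)
```

No height correction or representativeness error is applied here, in line with the statement that observation uncertainties are not considered.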
For the deterministic forecast of the 2-m temperature and 10-m wind, we compared the ensemble mean product of the NEPS with the NPS. The root-mean-square error (RMSE, Yang et al., 2019) was calculated for 2-m temperature and 10-m wind. Deterministic forecasts of precipitation in the NEPS were obtained by using the PM technique (Clark, 2017). To test the forecast skill of precipitation and how well the areal coverage of precipitation matched the observations, the neighborhood-based equitable threat score (ETS, Wang and Yan, 2007; Clark et al., 2010) and bias score (BIAS, Wilks, 2006) were calculated for precipitation using accumulation thresholds of 0.1 mm h–1, 1 mm h–1, 5 mm h–1 and 10 mm h–1.
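As an illustration of the deterministic comparison, the following minimal Python sketch computes the ensemble mean at each station and its RMSE against the observations. The member values and observations are invented toy numbers, and the function names are hypothetical.

```python
import math


def ensemble_mean(members):
    """Average the member forecasts station by station.

    `members` is a list of member forecasts, each a list over stations.
    """
    return [sum(values) / len(values) for values in zip(*members)]


def rmse(forecasts, observations):
    """Root-mean-square error over matched forecast/observation pairs."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need equally sized, non-empty samples")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


members = [[20.0, 22.0], [22.0, 24.0], [24.0, 26.0]]  # 3 members, 2 stations
observations = [21.0, 25.0]
mean_forecast = ensemble_mean(members)
score = rmse(mean_forecast, observations)
```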
The formulation of the neighborhood-based ETS is described by Clark et al. (2010). In this study, we set the neighborhood radius (r) to 5 km. If a given precipitation threshold (q) is met in the observation at a grid point, it is counted as a hit if the event is forecast at any grid point within the radius r. If an event is observed (forecast) at a grid point but no grid point within the radius r forecasts (observes) the event, it is counted as a miss (false alarm). Correct negatives are counted when an event is neither observed nor forecast at a grid point. With the numbers of hits (H), misses (M), false alarms (F), and correct negatives (C), the neighborhood-based ETS takes the standard form:

$$\mathrm{ETS} = \frac{H - H_{r}}{H + M + F - H_{r}}, \qquad H_{r} = \frac{(H + M)(H + F)}{H + M + F + C},$$

where $H_{r}$ is the number of hits expected by chance.
Similarly, BIAS can be calculated as follows:

$$\mathrm{BIAS} = \frac{H + F}{H + M}.$$
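The counting rules described above can be sketched in Python as follows. This is a minimal illustration rather than the verification code used in the study. It assumes gridded observations and a radius given in grid cells (5 km on a 500-m grid would be 10 cells). It also assumes that a forecast event with an observed event inside the radius counts as a hit, a case the text does not spell out. All function names are hypothetical.

```python
def neighborhood_counts(fcst, obs, q, r_cells):
    """Return (hits, misses, false_alarms, correct_negatives) for threshold q."""
    ny, nx = len(obs), len(obs[0])
    # Offsets of all grid cells inside a circle of radius r_cells.
    offsets = [(dj, di)
               for dj in range(-r_cells, r_cells + 1)
               for di in range(-r_cells, r_cells + 1)
               if dj * dj + di * di <= r_cells * r_cells]

    def event_nearby(field, j, i):
        return any(0 <= j + dj < ny and 0 <= i + di < nx
                   and field[j + dj][i + di] >= q
                   for dj, di in offsets)

    hits = misses = false_alarms = correct_negatives = 0
    for j in range(ny):
        for i in range(nx):
            if obs[j][i] >= q:
                # Observed event: hit if forecast anywhere in the neighborhood.
                if event_nearby(fcst, j, i):
                    hits += 1
                else:
                    misses += 1
            elif fcst[j][i] >= q:
                # Forecast-only event: assumed hit if observed nearby.
                if event_nearby(obs, j, i):
                    hits += 1
                else:
                    false_alarms += 1
            else:
                correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def ets(h, m, fa, cn):
    """Equitable threat score from the contingency counts."""
    chance = (h + m) * (h + fa) / (h + m + fa + cn)
    denom = h + m + fa - chance
    return (h - chance) / denom if denom else float("nan")


def bias(h, m, fa):
    """Frequency bias: forecast event count over observed event count."""
    return (h + fa) / (h + m) if (h + m) else float("nan")


obs = [[0.0, 2.0, 0.0, 0.0, 0.0, 0.0]]   # toy one-row fields, mm per hour
fcst = [[0.0, 0.0, 2.0, 0.0, 0.0, 2.0]]
counts = neighborhood_counts(fcst, obs, q=1.0, r_cells=1)
ets_value = ets(*counts)
bias_value = bias(*counts[:3])
```

In the toy case the forecast is displaced by one cell from the observation, which a point-to-point score would penalize twice, whereas the neighborhood rule counts both points as hits.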
The spatial pattern of the PM product is given by the ensemble mean, and the rainfall amounts are given by the 90th percentile of the distribution of ensemble member quantitative precipitation forecasts (QPFs). We chose the 90th percentile because it gave the best frequency distribution of rainfall amounts according to the neighborhood-based ETS and BIAS scores.
To verify the probabilistic forecast, we compared the probabilistic forecast results of the NEPS and REPS. The Relative Operating Characteristic (ROC) measures the combined effect of the Probability Of Detection (POD) and the False Alarm Rate (FAR). The area under the ROC curve (AROC, Zhong et al., 2017) is often calculated to determine whether the forecast is skillful, and forecasting systems with a ROC area greater than 0.7 are considered useful (Stensrud and Yussouf, 2007).
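A minimal Python sketch of the ROC area is given below. It assumes that the ensemble probability of exceeding a precipitation threshold is compared with a set of probability thresholds, that the false alarm rate is computed as false alarms divided by all non-events, and that the area is integrated with the trapezoidal rule. The function name `roc_area` and the toy data are hypothetical.

```python
def roc_area(probs, events, thresholds=None):
    """Area under the ROC curve from forecast probabilities and 0/1 outcomes."""
    if thresholds is None:
        thresholds = [k / 10 for k in range(11)]
    n_events = sum(events)
    n_non_events = len(events) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")

    points = {(0.0, 0.0), (1.0, 1.0)}
    for t in thresholds:
        hits = sum(1 for p, e in zip(probs, events) if p >= t and e)
        false_alarms = sum(1 for p, e in zip(probs, events) if p >= t and not e)
        pod = hits / n_events                # probability of detection
        far = false_alarms / n_non_events    # false alarm rate
        points.add((far, pod))

    curve = sorted(points)
    # Trapezoidal integration of POD over the false alarm rate.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


perfect = roc_area([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
no_skill = roc_area([0.5, 0.5], [1, 0])
```

A forecast that separates events from non-events perfectly gives an area of 1, and a forecast with no discrimination gives 0.5, which is why 0.7 can serve as a usefulness threshold between the two.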
The RMSE, the proper continuous ranked probability score (CRPS, Gneiting and Raftery, 2007), the percentage of outliers (OUTLIERS, Suklitsch et al., 2015), and the Talagrand histogram (Talagrand et al., 1997; Hamill, 2001) were used to assess the probabilistic 2-m temperature and 10-m wind products.
The CRPS measures the overall probabilistic skill of the ensemble forecasting system relative to the observations (Hersbach, 2000). A CRPS of zero corresponds to a perfect forecast; the higher the CRPS, the worse the performance of the ensemble forecasting system. The Talagrand histogram is a tool for testing the reliability of the ensemble forecast system. If the ensemble forecast is reliable, the member forecasts and the observation at a given point can be regarded as random samples from the same probability distribution. Furthermore, the Talagrand histogram indicates bias, with an L-shaped (U-shaped) rank histogram indicating a tendency for members to over-forecast (under-forecast) the variable being examined (Hamill, 2001). The percentage of outliers is the sum of the relative frequencies in the two end bins of the Talagrand histogram; it shows how often the observed value lies outside the full forecast range. The smaller the percentage of outliers, the better the reliability of the ensemble forecasting system.
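The three probabilistic measures can be illustrated with a short Python sketch. It assumes the kernel form of the ensemble CRPS, in which the mean absolute error of the members is reduced by half the mean absolute difference between member pairs, and it ignores ties between members and observations when ranking. The function names and toy values are hypothetical.

```python
def crps_ensemble(members, obs):
    """Ensemble CRPS for one forecast case (kernel form)."""
    n = len(members)
    error_term = sum(abs(x - obs) for x in members) / n
    spread_term = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return error_term - spread_term


def rank_histogram(ensembles, observations):
    """Relative frequency of the observation rank among the members."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        rank = sum(1 for x in members if x < obs)  # members below the observation
        counts[rank] += 1
    total = sum(counts)
    return [c / total for c in counts]


def outlier_fraction(frequencies):
    """Share of cases in which the observation lies outside the ensemble range."""
    return frequencies[0] + frequencies[-1]


ensembles = [[1.0, 2.0, 3.0]] * 4          # four cases, three members each
observations = [0.5, 1.5, 2.5, 3.5]
frequencies = rank_histogram(ensembles, observations)
outliers = outlier_fraction(frequencies)
crps_value = crps_ensemble([1.0, 2.0, 3.0], 2.0)
```

In the toy case each of the four rank bins is hit once, so the histogram is flat and half of the observations fall in the two end bins.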
-
Table 1 shows the 1–6 h aggregate neighborhood-based ETS and BIAS scores of the NEPS and NPS at accumulation thresholds of 0.1 mm h–1, 1 mm h–1, 5 mm h–1, and 10 mm h–1, together with the improvement rates of the ETS scores of the NEPS relative to the NPS, for the period of 1 to 25 July 2021. In addition, Fig. 5 presents the neighborhood-based ETS and BIAS scores of the NEPS and NPS for the same hourly precipitation thresholds as a function of lead time.
Precipitation threshold | ETS (NEPS) | ETS (NPS) | Improvement rate of ETS | BIAS (NEPS) | BIAS (NPS)
0.1 mm h–1 | 0.70 | 0.32 | 123% | 0.96 | 1.79
1 mm h–1 | 0.48 | 0.31 | 55% | 1.2 | 1.60
5 mm h–1 | 0.32 | 0.14 | 122% | 1.53 | 1.42
10 mm h–1 | 0.09 | 0.10 | –13% | 1.60 | 1.04
Table 1. 1–6 h aggregate neighborhood-based ETS and BIAS scores at different precipitation thresholds for the NEPS and NPS and the improvement rates of ETS scores of the NEPS compared with the NPS for the period of 1 to 25 July 2021. Deterministic forecasts of precipitation in the NEPS were obtained from the PM technique.
Figure 5. Neighborhood-based ETS and BIAS scores of the NEPS and NPS, plotted as a function of lead time for the different accumulation thresholds of 0.1 mm h–1, 1 mm h–1, 5 mm h–1, and 10 mm h–1, aggregated for the period of 1 to 25 July 2021. The columns and the lines denote the corresponding ETS and BIAS of the NEPS and NPS, respectively (refer to the legend at the top of the figure). Deterministic forecasts of precipitation in the NEPS were obtained from the PM technique.
Table 1 and Fig. 5 show that the NEPS produces more skillful forecasts at the 0.1 mm h–1, 1 mm h–1, and 5 mm h–1 thresholds and that the improvements persist through the 6-h validation period. The improvement rates of the ETS scores were 123%, 55%, and 122%, respectively. The ETS scores of the NEPS PM product at the 0.1 mm h–1 and 1 mm h–1 thresholds were about 0.7 and 0.45, respectively, within the 6-h forecast lead time, better than those of the NPS (about 0.3). The ETS score of the NEPS at the 5 mm h–1 threshold was about 0.27 within the 6-h forecast lead time, better than that of the NPS (about 0.12), and there was not much difference in the BIAS score. At the 10 mm h–1 threshold, the NPS performed better and had higher ETS values. The BIAS score of the NEPS was close to 1 at the 0.1 mm h–1 and 1 mm h–1 thresholds through the 6-h forecast period, but it increased with increasing precipitation threshold. The BIAS was about 2 at the 5 mm h–1 and 10 mm h–1 thresholds for forecast lead times of 2–6 h. The NEPS overpredicted moderate and heavy precipitation amounts, especially at the 10 mm h–1 threshold.
The NPS had lower ETS scores than the NEPS, indicating that most of the skill comes from the ensemble variance information of the REPS. The PM product represents an improvement over the deterministic forecast of the NPS. The problem with the NPS is that light precipitation amounts are overpredicted, whereas the BIAS of the NEPS is closer to 1 and its areal coverage of precipitation at the 0.1 mm h–1 and 1 mm h–1 thresholds matches the observations better. However, the NEPS shows very little skill at the 10 mm h–1 accumulation threshold: its ETS scores are higher than those of the NPS in the first two hours but lower in the following forecast hours.
Table 2 presents the 1–6 h aggregate RMSE of the 2-m temperature and 10-m wind field for the NEPS and NPS and the RMSE reduction rate of the NEPS compared with the NPS. In addition, Fig. 6 shows the RMSE and the reduction rate of the NEPS compared with the NPS for different lead times. Table 2 and Fig. 6 show that the deterministic forecasts of 2-m temperature and 10-m wind speed computed from the ensemble mean of the NEPS perform better than those of the NPS, especially for the 10-m wind speed. The NEPS forecasts of 2-m temperature (Fig. 6a) are more skillful than those of the NPS, whose forecast errors are slightly larger than those of the NEPS. The NPS showed an RMSE of around 1.7°C, which was reduced to about 1.62°C in the NEPS over the first six hours. The RMSE of the NEPS for the 10-m wind field (Fig. 6b) was also lower than that of the NPS, falling from about 1.2 m s–1 to about 1.0 m s–1.
Verification variable | RMSE (NEPS) | RMSE (NPS) | RMSE reduction rate
2-m temperature (°C) | 1.62 | 1.70 | 4.25%
10-m wind field (m s–1) | 0.99 | 1.24 | 19.02%
Table 2. 1–6 h aggregate RMSE of the 2-m temperature and 10-m wind field for the NEPS and NPS and the RMSE reduction rate of the NEPS compared with NPS for the period of 1 to 25 July 2021. Deterministic forecasts of 2-m temperature and 10-m wind field in the NEPS were obtained from the ensemble mean product.
Figure 6. RMSE of the (a) 2-m temperature and (b) 10-m wind field for 6-h forecasts of the NEPS (dashed line) and NPS (solid line), aggregated for 1–25 July 2021. The grey histogram represents the RMSE reduction rate of the NEPS compared with the NPS. Deterministic forecasts of the 2-m temperature and 10-m wind field in the NEPS were obtained from the ensemble mean product.
The RMSE of the ensemble mean describes the accuracy of the average estimate from the ensemble. The RMSE reduction rates for the 2-m temperature and 10-m wind speed were 4.25% and 19.02%, respectively. The NEPS and NPS use the same underlying topography and observational data. The underperformance of the NPS in estimating the 2-m temperature and 10-m wind field may be largely due to the improper distribution of initial conditions and physical parameterizations from a single numerical model. The added value of the deterministic products therefore comes from the ensemble mean product of the REPS, which describes the central estimate produced by the ensemble; computing the ensemble mean filters out the most uncertain aspects of the individual member forecasts (Leith, 1974; Holton, 2004).
-
Figure 7 compares the 1–6 h aggregate AROC of the REPS and NEPS for various hourly precipitation thresholds. The REPS generally has lower AROC scores than the NEPS at each precipitation threshold. Thus, most of the increase in the AROC comes from the NPS. Using a ROC area of 0.7 as the threshold for forecast skill, the REPS could not produce useful forecasts at the 5 mm h–1, 10 mm h–1, or 25 mm h–1 thresholds. However, the NEPS provided useful information at all thresholds except 25 mm h–1. This finding indicates that the NEPS can improve the skill of probabilistic precipitation forecasts and yield added value over the REPS forecasts in predicting hourly rainfall.
Figure 7. The area under the ROC curve (AROC) for the REPS and NEPS at accumulation thresholds of 0.1 mm h–1, 1 mm h–1, 5 mm h–1, 10 mm h–1, and 25 mm h–1, using data aggregated from 1 to 25 July 2021.
For the 2-m temperature (Fig. 8a), the REPS and NEPS had very similar U-shaped rank histograms, with both ensembles exhibiting a lack of variability. However, the probability of the observation falling outside the range spanned by the ensemble members was slightly lower for the NEPS than for the REPS, as indicated by the flatter rank histogram of the NEPS. For the 10-m wind, a comparison of the Talagrand histograms showed that the REPS overestimated wind speed more strongly than the NEPS. The REPS results showed an L-shaped distribution (Fig. 8b), indicating that the ensemble forecasts of wind speed were systematically too large. The distribution of the NEPS was relatively flat (Fig. 8b), indicating that the systematic bias was corrected to some extent by the NEPS, thus achieving improved probabilistic forecasts. The added value of the probabilistic forecasts comes from the increased horizontal resolution of the underlying topography, the blending of multi-source observation data, and the integrated nowcasting methods. However, the probability of the observation falling in the last bin of the NEPS histogram is slightly higher than in the other bins, giving a slightly inverse L-shaped distribution. This information may be used in the future to recalibrate the ensemble forecasts through ensemble post-processing methods.
Figure 8. Talagrand diagrams using data aggregated for 1 to 25 July 2021 for the (a) 2-m temperature and (b) 10-m wind for the REPS and the NEPS.
The lines in Fig. 9 show the CRPS values, and the columns represent the percentage of outliers for the 2-m temperature and 10-m wind. The additional skill of the NEPS relative to the REPS in 2-m temperature and 10-m wind is reflected in lower CRPS scores. For temperature, the CRPS decreased from 1.2°C to 1.0°C in the nowcasting range of up to 6 h. For the 10-m wind, the NEPS brought a considerable improvement, as the CRPS was reduced from 1.1 m s–1 to 0.7 m s–1. In addition, the NEPS had a smaller percentage of outliers for both variables during the first six hours. The percentage of outliers was reduced from 0.47 (REPS) to about 0.3 (NEPS) for temperature; for the 10-m wind, it was reduced from 0.5 (REPS) to 0.38 (NEPS).
Figure 9. Continuous Ranked Probability Score (CRPS; lines) and percentage of outliers (columns) for the (a) 2-m temperature and (b) 10-m wind of the REPS (solid line) and the NEPS (dashed line), using data aggregated from 1 to 25 July 2021.
This result shows that the integrated probabilistic nowcasting system of the NEPS can improve the skill of ensemble forecasts, providing probabilistic nowcasting with high spatial and temporal resolution. Deterministic ensemble mean values over long periods of time could also be improved and add value to the NPS and the coarser REPS forecasts.