In what follows, the evaluation results for each of the eight regions are described. Because of the large number of results (prediction evaluation at 13 stations from 2008 to 2016), the evaluation of 24-h precipitation at all stations is presented first, and the forecasts are then evaluated for different lead times at the end of the section. Since precipitation is low in most parts of Iran during the dry seasons, the evaluations were carried out and reported for the wet seasons only. The wet season in Iran generally extends from November to April.
As previously noted, the IDW and Kriging methods were used for spatial interpolation of precipitation forecasts. However, the results of the two methods showed no significant difference. Hence, in what follows, only the IDW results are presented.
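As an illustration of the IDW interpolation mentioned above, the following is a minimal sketch that estimates a forecast value at a station from surrounding grid points. The function name, the planar coordinates, and the distance exponent of 2 are assumptions made for illustration; the configuration actually used in the study is not stated in this section.

```python
import math


def idw_interpolate(points, values, target, power=2.0):
    """Inverse distance weighting (IDW) estimate at `target`.

    points: list of (x, y) grid-point coordinates (planar units assumed)
    values: forecast precipitation at those grid points
    target: (x, y) coordinates of the station
    power:  distance exponent; 2 is a common choice and an assumption here
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(points, values):
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The station coincides with a grid point: use its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical example: four grid points around a station at the cell centre.
grid_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
grid_values = [1.0, 2.0, 3.0, 4.0]
station_estimate = idw_interpolate(grid_points, grid_values, (0.5, 0.5))
```

Because the station in this example is equidistant from all four grid points, the estimate reduces to their simple average.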
3.1. Total annual QPF evaluation
According to Modarres (2006), the G1 region is the dominant precipitation regime in Iran and has a high coefficient of variation, with low precipitation in a predominantly arid and semi-arid climate. Due to the extent of this region, three stations (Esfahan, Semnan, and Zahedan) were selected. Figure 3 presents the total annual precipitation associated with this region. In most years, all centers overestimated the annual precipitation, while ECMWF offered better precipitation predictions at Semnan and Esfahan than at Zahedan. In contrast, NCEP performed better in predicting the annual precipitation at Zahedan but comparatively poorly at Esfahan and Semnan.
In the G2 region, which essentially constitutes mountainous areas upstream of the G1 region, three stations were selected: Mashhad, Shahrekord, and Tehran. Similar to G1, all centers overestimated the annual precipitation for most years at Mashhad and Tehran. At Shahrekord, which receives higher precipitation than the other two stations, UKMO underestimated the precipitation, whereas the other two centers generally overestimated it.
Some centers performed differently in predicting precipitation in the wet seasons than over the whole year. For example, UKMO, which performed better than the other two models at Shahrekord over the whole year, was the weakest for the wet season. At Tehran, the total precipitation predicted by NCEP over the study period was significantly different from the total observed precipitation.
In the G3 region, which encompasses cold regions in northwestern Iran, the station at Tabriz was studied. According to Fig. 3, NCEP's predictions were the poorest of the three centers in all years except 2010 and 2011, while ECMWF's predictions were better than those of UKMO and NCEP.
In the G4 region, the stations at Ahvaz and Bandar Abbas were selected. Based on Fig. 3, although all three centers overestimated the annual precipitation, UKMO performed particularly poorly. At Sanandaj station in the G5 region, as in other regions, all centers overestimated the annual precipitation. Predictions made by UKMO were better than those of ECMWF. Moreover, ECMWF's predictions were poorer in 2008 and 2009.
In the rainy climate of the G6 region, the station at Babolsar was selected. Based on Fig. 4, the centers overestimated precipitation in some years and underestimated it in others. At Ilam in G7, which generally receives more precipitation than G5, NCEP was the poorest of all the centers, whereas ECMWF's predictions were better than those of UKMO in most years. As shown in Fig. 4, in the G8 region, which receives higher precipitation than the G6 region, ECMWF offered better predictions than the other centers, while NCEP's predictions were the poorest, underestimating the precipitation in all years.
Overall, the products of all the centers underestimated the precipitation in the relatively wetter climate regions but overestimated it in drier climate areas. This implies a systematic bias in the forecasts and calls for the application of bias correction techniques, such as quantile mapping.
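To make the idea of quantile mapping concrete, the following minimal sketch maps a raw forecast value onto the observed value that has the same empirical non-exceedance probability in a historical sample. It is an illustration only: the function name and sample data are hypothetical, the empirical (nearest-rank) variant is an assumption, and no bias correction was applied to the results reported in this section.

```python
from bisect import bisect_right


def empirical_quantile_map(value, hist_forecasts, hist_obs):
    """Bias-correct `value` by empirical quantile mapping.

    hist_forecasts: historical forecasts at a station (hypothetical sample)
    hist_obs:       historical observations at the same station
    """
    f_sorted = sorted(hist_forecasts)
    o_sorted = sorted(hist_obs)
    # Number of historical forecasts not exceeding `value`.
    rank = bisect_right(f_sorted, value)
    # Same non-exceedance probability in the observed sample
    # (nearest-rank quantile, computed with integer ceiling division).
    idx = (rank * len(o_sorted) + len(f_sorted) - 1) // len(f_sorted) - 1
    idx = min(len(o_sorted) - 1, max(0, idx))
    return o_sorted[idx]


# Hypothetical case in which forecasts are systematically twice the observations.
past_forecasts = [2.0, 4.0, 6.0, 8.0, 10.0]
past_observations = [1.0, 2.0, 3.0, 4.0, 5.0]
corrected = empirical_quantile_map(6.0, past_forecasts, past_observations)
```

In practice the mapping is often fitted separately by season or lead time, and a parametric distribution can replace the empirical one; those choices are outside the scope of this sketch.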
3.2. QPF deterministic evaluation
For the deterministic evaluation, this study adopted four criteria: the correlation coefficient (r), MAE, RMSE, and RRMSE, whose formulations are presented in Table 3. The results are shown in Fig. 5. Because not all examined cases can be displayed, the average performance of the stations in each cluster is presented. The results for each individual station are presented in Table 5.
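The four deterministic criteria can be sketched as follows. This is a minimal illustration rather than the exact formulations of Table 3; in particular, normalising RMSE by the mean observation to obtain RRMSE is an assumption, and the function name and sample values are hypothetical.

```python
import math


def deterministic_scores(forecast, observed):
    """Return r, MAE, RMSE and RRMSE for paired forecasts and observations."""
    n = len(observed)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    errors = [f - o for f, o in zip(forecast, observed)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: RRMSE is RMSE relative to the mean observed value.
    rrmse = rmse / mean_o

    # Pearson correlation coefficient.
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_f * var_o)

    return {"r": r, "MAE": mae, "RMSE": rmse, "RRMSE": rrmse}


# Hypothetical 24-h totals (mm): forecasts are exactly twice the observations.
scores = deterministic_scores([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

The example shows why the criteria are reported together: a forecast can be perfectly correlated with the observations (r = 1) and still carry large MAE and RMSE when it is systematically biased.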
At Esfahan and Semnan in the G1 region, ECMWF and NCEP yielded the best and poorest scores, respectively. In contrast, at Zahedan, ECMWF and NCEP were the poorest and the best predicting centers, respectively. Overall, in this region, ECMWF was the best and NCEP was the poorest.
In the G2 region, based on the correlation coefficient, ECMWF produced the best scores at all three selected stations, while NCEP was the poorest. UKMO performed well at Shahrekord but was the poorest at Mashhad.
In the cold climate of the G3 region, based on all three indicators, ECMWF was the best and NCEP was the poorest of all. In the hot and dry G4 region, NCEP yielded smaller prediction errors compared to those of the other centers, while UKMO performed comparatively poorly in terms of the deterministic evaluation scores.
In the G5 region, UKMO produced the smallest prediction error of the three centers, whereas NCEP performed the poorest. In the rainy G6 region, ECMWF and NCEP had the best and poorest scores, respectively. However, because precipitation in this region is higher than in other areas of Iran, all three centers produced large prediction errors.
At Ilam in the G7 region, ECMWF's predictions were slightly better than those of UKMO; NCEP was the poorest of all. In G8, based on the correlation coefficient and RMSE, ECMWF was the best and UKMO was the poorest.
In general, based on the deterministic evaluation, ECMWF provided better results than the other centers in most regions of Iran, UKMO in mountainous regions, and NCEP in southern Iran. In addition, TIGGE numerical precipitation predictions at Ilam within the G7 region performed best among all examined stations in terms of annual precipitation.
3.3. QPF dichotomous (yes/no) evaluation
This study used four indicators (POD, FAR, ETS, and BIAS) for the dichotomous evaluation. The evaluation results are shown in Fig. 5. According to the BIAS criterion, which is the ratio of the number of predicted precipitation events to the number of observed precipitation events, NCEP and UKMO offered the best and poorest predictions of the number of precipitation days, respectively. In the G3 region, ECMWF showed a smaller BIAS than NCEP. All centers overestimated the number of precipitation days.
Based on ETS, which measures the fraction of observed and/or forecast events that were correctly predicted after adjusting for hits expected by chance, NCEP achieved comparatively better scores at all stations, except in the G3 region. In addition, the prediction quality of UKMO was poor. However, the very low ETS values at most stations indicate poor accuracy in predicting precipitation events.
According to Fig. 5d, POD values are high, which is attributable to the high BIAS at most stations. Of all centers, UKMO yielded the best POD because of its higher BIAS values, while NCEP had the lowest scores. Based on FAR, which represents the fraction of predicted precipitation events that were not observed, UKMO was the poorest and NCEP, in most regions, was better than the other centers. The number of false alarms was quite high in the G1 and G4 regions, most likely due to the rarity of precipitation events in these regions. In conclusion, the number of precipitation events predicted by all three centers was higher than observed, while NCEP had better scores in most regions.
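The four dichotomous indicators follow from a 2×2 contingency table of predicted and observed precipitation days. The sketch below uses the standard definitions of these scores; the function name, the sample data, and the event threshold are hypothetical, since the threshold used to define a precipitation day is not given in this section.

```python
def dichotomous_scores(forecast, observed, threshold):
    """POD, FAR, BIAS and ETS for yes/no precipitation events.

    An event is a day with precipitation >= `threshold` (hypothetical value).
    Assumes at least one observed and one predicted event, otherwise the
    ratios below are undefined.
    """
    hits = false_alarms = misses = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif f_yes:
            false_alarms += 1
        elif o_yes:
            misses += 1
        else:
            correct_negatives += 1

    n = hits + false_alarms + misses + correct_negatives
    # Hits expected by chance, used by ETS.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "BIAS": (hits + false_alarms) / (hits + misses),
        "ETS": (hits - hits_random)
        / (hits + false_alarms + misses - hits_random),
    }


# Hypothetical ten-day sample (1 = precipitation day, 0 = dry day).
predicted = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
observed_days = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
scores = dichotomous_scores(predicted, observed_days, threshold=0.5)
```

In this sample the centre predicts five events against four observed (BIAS = 1.25), which yields a POD of 0.75 but also a FAR of 0.4 and an ETS of only 0.25, mirroring the pattern described above in which over-forecasting raises POD without improving overall skill.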
3.4. QPF probabilistic evaluation
In this section, the gamma PDF was used to represent the QPF distribution. Four common measures (ROC.Area, CRPS, BS, and BSS) were used for the probabilistic evaluation and the results are presented in Fig. 6. BS, which is a function of resolution, uncertainty, and reliability, measures the mean squared probability error. BSS expresses the skill of the forecast BS relative to a reference BS, which is usually determined from climatological predictions. CRPS evaluates the accuracy of the probabilistic forecast distribution. The ROC curve measures the ability of the predictions to discriminate between the occurrence and non-occurrence of precipitation, and the area under the curve is also used as an evaluation criterion; values closer to 1.0 indicate better discrimination skill.
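Three of these measures can be sketched for a series of forecast probabilities of precipitation and binary outcomes. This is a minimal illustration under stated assumptions: the function names and sample values are hypothetical, the reference forecast for BSS is taken to be the sample climatological frequency, and the ROC area is computed through its rank-based equivalent (the probability that an event day receives a higher forecast probability than a non-event day, with ties counted as one half). CRPS and the gamma fit are not covered here.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(probs, outcomes):
    """Skill relative to a constant climatological forecast.

    Assumption: the climatological probability is the sample event frequency.
    """
    base_rate = sum(outcomes) / len(outcomes)
    bs_reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_reference


def roc_area(probs, outcomes):
    """Area under the ROC curve via pairwise comparison of event and non-event days."""
    event = [p for p, o in zip(probs, outcomes) if o == 1]
    non_event = [p for p, o in zip(probs, outcomes) if o == 0]
    wins = 0.0
    for pe in event:
        for pn in non_event:
            if pe > pn:
                wins += 1.0
            elif pe == pn:
                wins += 0.5
    return wins / (len(event) * len(non_event))


# Hypothetical four-day sample.
forecast_probs = [0.9, 0.8, 0.3, 0.1]
occurred = [1, 0, 1, 0]
bs = brier_score(forecast_probs, occurred)
bss = brier_skill_score(forecast_probs, occurred)
area = roc_area(forecast_probs, occurred)
```

A negative BSS, as in this sample, means the forecast is less accurate than climatology even though its ROC area exceeds 0.5, which is why the two measures can rank the centers differently.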
Figure 7 shows the average probabilistic evaluations over the eight study years. Based on BS, precipitation at stations in the G4 region was better predicted than at the other selected stations. However, based on BSS, predictions there were poor because of the rarity of precipitation events, as previously mentioned. In all regions except G1 and G3, NCEP showed better prediction capability than ECMWF based on BSS, whereas UKMO was the poorest based on both BS and BSS. Moreover, based on CRPS, UKMO and ECMWF had higher scores in some regions while NCEP did poorly compared to the other models. Based on ROC.Area, ECMWF and NCEP yielded the highest and lowest scores, respectively.
As a whole, according to the probabilistic evaluations in Table 5, precipitation at Semnan and Zahedan in the G1 region, as well as at Bandar Abbas in G4, was poorly predicted. Mashhad, Zahedan, and Ilam had better scores than the other stations. ECMWF and NCEP performed almost the same, while UKMO performed worse on the criteria for the probability of precipitation occurrence/non-occurrence.
Summary results are presented in Table 5, showing that ECMWF performed better in all regions. UKMO performed slightly better than NCEP in precipitation prediction. However, according to the dichotomous evaluation, NCEP performed better in almost all regions and could predict precipitation occurrence/non-occurrence better than the other centers. Figure 8 presents the evaluation results for lead times of one to three days. The results clearly illustrate that the precipitation prediction skill decreases as lead time increases. This reduction is particularly evident in CRPS. According to Fig. 8, region G7 had the best scores, while the poorest precipitation prediction performance occurred in G1 and G4.
In addition, Fig. 9 compares the performance of the models in the dry and wet seasons. Only the results for the rainy regions G6 and G8 are presented because the other regions receive very little precipitation in the dry season. Based on Fig. 9, all models performed better in the wet season than in the dry season, with UKMO failing in the G8 region in the dry season.
Overall, the results indicate that better numerical prediction performance is expected in regions with high precipitation.