The Q-Q plots for the daily mean air temperature are given in Fig. 3. Shown are the results for the hybrid downscaling technique (WRF[3 km]+CDF-t) in red and for the higher-resolution dynamic-only downscaling approach (WRF[1 km]) in blue. The numbers at the top of each panel show the MAE between each pair of CDFs, with the panels for a given season sorted out in ascending order of MAE. For all stations and seasons, roughly 1.3% of the total number of points were subjected to the Déqué (2007) out of bounds correction, i.e., less than one point per panel.
Figure 3. Q-Q plots for the daily mean air temperature (K), for the central month of each season (April, July, October 2016 and January 2017), and for the seven stations shown in Fig. 2. The red dots represent the 100 quantiles of the dynamically and statistically downscaled (WRF[3km]+CDF-t) data against that observed (FMI). The blue dots are the same but for the higher-resolution dynamically downscaled (WRF[1km]) data. The main diagonal, which indicates a perfect agreement between each pair of CDFs, is drawn as a black line. The numbers at the top of each panel show the MAE (K) between each set of distributions, with the panels for a given season sorted in ascending order of the hybrid method’s MAE.
Figure 3 shows that the air temperature in the Honkajoki region exhibits a considerable seasonal variability, as expected of a site located at high latitudes. During winter, it hovers around freezing, being warmer than that recorded further east in eastern Finland and neighboring Russia due to the moderating influence of the prevailing midlatitude westerlies, with extreme cold events occurring when the flow is reversed (e.g., Linderson, 2001; Trigo et al., 2002). In the summer, it is lower than that at the referred places, with daily mean values around 290 K, due to the presence of the nearby Gulf of Bothnia and associated mesoscale circulations (e.g., Rummukainen et al., 2001). A visual inspection of Fig. 3 and of the MAE scores reveals that, for the vast majority of the stations and seasons, the hybrid downscaling outperforms the dynamical-only approach. The only stations for which the latter gives the best results are stations 5 and 6 in spring and stations 2, 3 and 5 in winter. While in the spring, summer and autumn seasons the model-predicted temperatures are in general agreement with those observed, in winter the WRF-predictions at all stations exhibit a warm bias, which can exceed 5 K, and has been reported in other studies (e.g., García-Díez et al., 2013). It likely arises from an incorrect simulation of the near-surface atmospheric conditions in stably stratified environments that are ubiquitous in this region and season (Pepin et al., 2009; Steeneveld, 2014). The use of a statistical downscaling technique (CDF-t) does not seem to fix this problem. In spring, on the other hand, in the three coastal stations (2, 3 and 4) WRF has a cold bias, in the three inland stations (5, 6 and 7) it has a warm bias, while at station 1 there is a good agreement between the observed and modelled temperature range. The warm bias in the inland stations may be related to the tendency of the Noah LSM to underestimate the snow cover extent and give a shorter snow season compared to observations (e.g., Sheffield et al., 2003). In fact, and in comparison with the FMI daily data, for the month of April WRF has a tendency to underpredict the snow depth at the location of these stations (not shown). The cold bias in the coastal stations may be due to an incorrect simulation of the observed sea-ice cover/depth over the Gulf of Bothnia, both fields read in from the 6-hourly 0.5° × 0.5° CFSR dataset (Saha et al., 2010), as the sea-ice in the region usually melts in April (Leppäranta and Seinä, 1985). In addition, in the default Noah LSM configuration in WRF there are four soil layers with a maximum depth of 2 m. A lower soil depth may be needed for the model to successfully simulate the freeze–thaw cycles that are important to the Arctic system (Barlage et al., 2008). For the month of January, the snow depth biases have both positive and negative signs, so they cannot fully explain the warm temperature bias shown in the Q-Q plots in the last column of Fig. 3. For nearly all stations and seasons, the temperature extremes (i.e., the first and last quantiles) are not well captured by both methods. These are harder to simulate, and probably require higher spatial resolutions than those used in this work (3 or 1 km).
The model performance is quantified using the CvM diagnostic, with the scores given in Table 1. As discussed in section 2.4, the smaller the CvM score, the more skillful the model forecast. The results in Table 1 show that, in line with the MAE scores given in Fig. 3, for nearly all stations and seasons the hybrid downscaling approach (WRF[3 km]+CDF-t) gives the smallest CvM values, most of the time by an order of magnitude, compared to those obtained with the dynamical-only downscaling approach (WRF[1 km]). The only exceptions are station 6 in spring and station 2 in winter, for which the hybrid approach gives higher MAEs. Besides an individual inspection of the scores, the Wilcoxon rank-sum test is performed to quantify whether the two distributions of the CvM scores are statistically different. As the CvM scores from the hybrid downscaling approach are generally smaller than the ones from the dynamical-only downscaling approach, a one-sided test is performed, i.e., the alternative hypothesis is that the distribution for the dynamical-only downscaling approach is shifted to the right of the one for the hybrid downscaling approach. The p-value is found to be smaller than 0.05, indicating that there is less than 5% probability that the CvM scores obtained with the two approaches are drawn from the same distribution. In other words, the distributions of the CvM scores are statistically different, with the hybrid approach being closer to the observed data for the daily mean temperature. The CvM scores for the two methods averaged over all stations and seasons are 0.15 and 0.84, respectively. When averaged over all stations and seasons, the RMSE and MAE for the WRF[3 km]+CDF-t are 1.5 K and 1.1 K, whereas for WRF[1 km] they are 1.9 K and 1.6 K, respectively. In other words, all skill scores show that the hybrid method outperforms the dynamical-only downscaling approach, giving predictions generally within 2 K of the observed value. Hence, a WRF[3 km]+CDF-t downscaling technique is not only computationally cheaper than a WRF[1 km] downscaling method, it also gives more skillful predictions of the daily mean air temperature, at least for the Honkajoki region in Finland.
Season Method Station 1 2 3 4 5 6 7 Spring: April 2016 WRF[3 km]+CDF-t 0.06 0.19 0.54 0.27 0.41 0.48 0.42 WRF[1 km] 0.72 2.63 3.31 2.38 0.43 0.17 0.56 Summer: July 2016 WRF[3 km]+CDF-t 0.71 0.24 0.29 0.17 0.06 0.05 0.02 WRF[1 km] 0.91 2.72 2.33 1.40 0.43 0.33 0.95 Autumn: Octorber 2016 WRF[3 km]+CDF-t 0.18 0.08 0.04 0.07 0.05 0.05 0.04 WRF[1 km] 0.62 0.13 0.20 0.15 0.33 0.22 0.48 Winter: January 2017 WRF[3 km]+CDF-t 0.03 0.07 0.06 0.06 0.07 0.06 0.03 WRF[1 km] 0.41 0.03 0.30 0.27 0.30 0.37 0.35
Table 1. CvM scores for the daily mean air temperature obtained with the hybrid product (WRF[3 km]+CDF-t) and the higher-resolution dynamically-only downscaling (WRF[1 km]) with respect to the FMI data, for the central month of each season (April, July, October 2016 and January 2017) and for the seven stations shown in Fig. 2. The CvM scores for the stations/seasons for which (WRF[1 km]) outperforms (WRF[3 km]+CDF-t) are highlighted in bold.
In addition to the daily mean temperature, the FMI dataset also provides daily temperature extremes. As they may occur more frequently in a hypothetical warmer world (e.g., Kjellström et al., 2007; Koenigk et al., 2013), it is also of interest to consider them. Figure 4 and Table 2 are similar to Fig. 3 and Table 1 but for the daily maximum air temperature. For this field, and for all stations and seasons, roughly 1.5% of the total number of points are subjected to the Déqué (2007) out of bounds correction, i.e., less than one point per panel.
Figure 4. As in Fig. 3 but for the daily maximum air temperature (K).
Season Method Station 1 2 3 4 5 6 7 Spring: April 2016 WRF[3 km]+CDF-t 0.13 0.27 0.49 0.16 0.17 0.21 0.31 WRF[1 km] 0.73 3.68 3.28 2.64 0.50 0.25 0.37 Summer: July 2016 WRF[3 km]+CDF-t 0.07 0.16 0.30 0.10 0.06 0.04 0.05 WRF[1 km] 0.40 3.63 2.43 1.61 0.45 0.22 0.78 Autumn: Octorber 2016 WRF[3 km]+CDF-t 0.04 0.17 0.07 0.11 0.04 0.07 0.06 WRF[1 km] 0.12 0.25 0.20 0.15 0.08 0.11 0.14 Winter: January 2017 WRF[3 km]+CDF-t 0.14 0.21 0.10 0.13 0.16 0.09 0.09 WRF[1 km] 0.63 0.09 0.31 0.67 0.51 0.69 0.42
Table 2. As Table 1 but for the daily maximum air temperature.
As is the case for the daily mean temperature, the hybrid method gives the lowest MAEs for nearly all the stations and seasons, except for station 6 in spring and 2 in winter. In addition, some of the findings reached in the analysis of Fig. 3 hold for the daily maximum temperature, such as (i) warm bias in the model at the location of all stations in winter, at times with a magnitude larger than 5 K; (ii) warm bias in the inland stations (5, 6 and 7) and cold bias in the coastal stations (2, 3 and 4) in spring; (iii) smaller-magnitude biases in summer and autumn, particularly for the WRF[3 km]+CDF-t downscaling; (iv) not so skillful simulation of the extremes of the daily maximum temperature distribution (i.e., the tails of the CDF distributions).
Except for station 2 in winter, the CvM scores for the WRF[3 km]+CDF-t approach are smaller than those obtained with the WRF[1 km] downscaling. For this station, the MAE given by the hybrid approach is also higher. The CvM scores for the two methods averaged over all stations and seasons are 0.14 and 0.91, with RMSE values of 1.8 K and 2.4 K, and MAE values of 1.4 K and 2.1 K, respectively. The p-value obtained with the Wilcoxon rank-sum test with the same alternative hypothesis as for the daily mean temperature is 0.018. Hence, and as is the case for the daily mean temperature, the two distributions of CvM scores are statistically different from each other, with the hybrid approach giving more accurate predictions according to all the verification diagnostics considered. Despite the slightly higher magnitude of the RMSEs and MAEs compared to those of the daily mean temperature, the model-predicted daily maximum temperatures are generally within 2 K of those observed, in particular for the WRF[3 km]+CDF-t downscaling approach. As temperature extremes are likely to change more than the mean values in a hypothetical warmer world (e.g., Seneviratne et al., 2014), it is of interest to look into how the model performs in the warm (summer) and cold (winter) seasons separately. The averaged CvM scores for the hybrid, WRF[3 km]+CDF-t, and dynamical-only, WRF[1 km], downscaling are 0.11 and 1.36 for the summer and 0.13 and 0.47 for the winter season, respectively. The corresponding RMSE (MAE) scores for the hybrid and dynamical-only simulations are 1.8 K (1.4 K) and 2.9 K (2.6 K) for the summer, and 1.6 K (1.2 K) and 1.9 K (1.5 K) for the winter season, respectively. Hence, while WRF[3 km]+CDF-t clearly outperforms WRF[1 km] in the summer season, in winter the scores are more comparable, even though the hybrid approach still has the edge.
Figure 5 is similar to Fig. 3 and Table 3 is similar to Table 1 but for the daily minimum air temperature. For this field, and for all stations and seasons, roughly 0.6% of the total number of points are subjected to the Déqué (2007) out of bounds correction, i.e., less than one point per panel.
Figure 5. As in Fig. 3 but for the daily minimum air temperature (K).
Season Method Station 1 2 3 4 5 6 7 Spring: April 2016 WRF[3 km]+CDF-t 0.53 0.06 0.47 0.18 0.89 0.98 0.66 WRF[1 km] 0.24 1.10 2.44 1.56 0.25 0.10 0.18 Summer: July 2016 WRF[3 km]+CDF-t 1.43 0.12 0.25 0.07 0.09 0.08 0.12 WRF[1 km] 1.56 1.14 2.14 1.10 0.49 0.59 1.04 Autumn: Octorber 2016 WRF[3 km]+CDF-t 1.14 0.03 0.21 0.04 0.32 0.58 0.56 WRF[1 km] 0.64 0.05 0.13 0.04 0.80 0.10 0.74 Winter: January 2017 WRF[3 km]+CDF-t 0.18 0.04 0.08 0.02 0.05 0.06 0.10 WRF[1 km] 0.26 0.03 0.17 0.09 0.33 0.26 0.37
Table 3. As Table 1 but for the daily minimum air temperature.
While for the daily mean and maximum air temperatures the hybrid method clearly outperforms the dynamical-only downscaling approach, for the daily minimum air temperature the performance of the two techniques is more comparable, with generally higher MAEs. For this field, both methods show a clear warm bias not just in winter but in all seasons, even though it has a larger magnitude in the cold season. These warmer nighttime temperatures may arise from excessive cloud cover and/or deficiencies in the Noah LSM (e.g. Katragkou et al., 2015; Bastin et al., 2018). For summer and autumn, and except for stations 1 and 2 in the latter, the hybrid approach gives the lowest MAEs. In winter and spring, however, the results are mixed, even though in the former the main difference between the two methods is in the tail of the distribution. The magnitude of the MAEs is also higher for this field: while for the daily mean and maximum air temperatures the hybrid method always gives MAEs lower than 2 K, for this method and for the minimum air temperature the MAEs exceed 2 K at more than 40% of the stations. Recent work has shown that the daily minimum temperature is expected to change more significantly than the daily maximum and mean air temperatures in a hypothetical warmer world (e.g., Nikulin et al., 2011). It is possible then that the poorer performance of the hybrid method for this field may also arise from the violation of the stationary assumption made in the development of the statistical technique.
In line with the Q-Q plots shown in Fig. 5, no method is found to outperform another when inspecting the CvM scores given in Table 3. A comparison of Table 3 with Tables 1 and 2 reveals that, while for the daily mean and maximum temperatures WRF[1 km] only outperforms WRF[3 km]+CDF-t in two cases and one case, respectively, for the daily minimum temperature, the dynamical-only downscaling outperforms the hybrid approach in a total of eight cases. Looking at each season separately, four out of the eight cases occur in spring, three in autumn, and one in winter. The poorer performance of WRF[3 km]+CDF-t seems to be almost exclusively in the transition seasons. As stated before, the WRF forecasts at this time of the year, and particularly in spring, are less skillful, possibly because of deficiencies inherent to the Noah LSM that lead to discrepancies between the observed and modelled snow cover and extent of the snow season (Sheffield et al., 2003). This is confirmed when the observed and modelled snow depths are compared (not shown). The fact that the two methods give more comparable results may suggest that the CDF-t does not work so well in the transition seasons. One possible explanation is that, while in both April and October the daylight period is still relatively long, allowing for a more well-mixed atmosphere and therefore more predictable daytime temperatures, at night the better representation of the static fields and local-scale dynamics in the WRF[1 km] run may give it the edge at the location of some of the stations. Despite this, however, the p-value obtained with the Wilcoxon rank-sum test with the same alternative hypothesis as for daily mean and maximum temperatures is 0.018, indicating that at the 95% confidence level the hybrid downscaling still gives the best results. The averaged CvM scores over all stations and seasons for the hybrid and dynamical-only downscaling methods are 0.33 and 0.64, respectively. For the summer season, the scores for the WRF[3 km]+CDF-t and WRF[1 km] are 0.30 and 1.15, and for the winter season they are 0.08 and 0.22. The corresponding RMSEs (MAEs) values are 2.3 K (1.7 K) and 2.2 K (1.9 K) for all seasons, 1.8 K (1.5 K) and 2.5 K (2.2 K) for the summer season, and 2.9 K (2 K) and 2.1 K (1.7 K) for the winter season, respectively. It is interesting to note that, while for the summer season WRF[3 km]+CDF-t gives the smallest RMSE and MAE, for winter WRF[1 km] performs the best according to those two scores, even though its CvM score is higher. This apparent contradiction can be explained by looking at the Q-Q plots in Fig. 5. The main difference between the two modelled CDFs is in the tail of the distribution, with the CDF of the hybrid method showing larger discrepancies with respect to that observed. However, overall the hybrid method’s CDF is closer to the main diagonal compared to that given by the WRF[1 km] simulation, and hence it has a lower CvM but a higher RMSE and MAE. As highlighted before, the model-predicted daily minimum air temperatures forecasts are not as skillful as those for the daily mean and maximum air temperatures but are still generally within 3 K of those observed.
In the discussion so far the temperature distributions predicted by WRF[3 km]+CDF-t and WRF[1 km] are evaluated against those observed through the analysis of the correspondent Q-Q plots and the MAE and CvM scores. In order to allow for a more direct evaluation of the three temperature distributions, in Fig. 6 they are shown at the location of station 2 for the summer season. Similar results are obtained at the other stations for this season (not shown). The curves plotted in Fig. 6 are in line with the findings highlighted before: WRF[3 km]+CDF-t gives more skillful air temperature predictions compared to WRF[1 km], particularly for the daily-mean and maximum air temperatures. For these two variables, the three temperature distributions are bimodal, with the WRF[1 km] distribution exhibiting a clear cold bias, more significant for the maximum temperature. The two model-predicted distributions are more similar for the daily minimum temperature, being close to a Gaussian shape, as is the case for the observed data, which is consistent with the more comparable MAE and CvM scores given in Fig. 5 and Table 3.