The performance of each model in simulating the climatology of the indices over 1986-2005 is summarized in a "portrait" diagram, following Gleckler et al. (2008). First, the multi-model median of each index is calculated, and the relative RMSE is then obtained. The performance of each model is assessed with respect to CN05.1. Table 4 summarizes the relative errors of the models as a "portrait" diagram, in which colors indicate the magnitude of the relative RMSE: warmer colors indicate models that perform worse, and cooler colors indicate models that perform better (Gleckler et al., 2008). The columns of the portrait are labeled by model name and the rows by extreme index name.
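As a hedged illustration of the relative-error metric behind the portrait diagram, the sketch below assumes the convention of Gleckler et al. (2008), in which each model's RMSE is normalized by the median RMSE across all models, so that negative values mark models that do better than the typical model. The function names (`rmse`, `relative_rmse`, `portrait_matrix`) are hypothetical, the fields are assumed to be on the same grid as the CN05.1 reference, and area weighting is omitted for brevity.

```python
import numpy as np


def rmse(model_field, obs_field):
    """Unweighted spatial RMSE between a model climatology and the reference; NaN cells are ignored."""
    diff = np.asarray(model_field, dtype=float) - np.asarray(obs_field, dtype=float)
    return float(np.sqrt(np.nanmean(diff ** 2)))


def relative_rmse(model_fields, obs_field):
    """Relative error (E_m - E_median) / E_median for each model (assumed Gleckler et al. convention)."""
    errors = np.array([rmse(f, obs_field) for f in model_fields])
    typical = np.median(errors)  # median RMSE across all models
    return (errors - typical) / typical


def portrait_matrix(index_fields, obs_fields):
    """Rows = indices, columns = models, matching the layout described for the portrait diagram."""
    return np.vstack([relative_rmse(index_fields[name], obs_fields[name]) for name in index_fields])


# Toy example: three models with uniform errors of 1, 2 and 3 degC against a zero reference.
obs = np.zeros((2, 2))
models = [np.full((2, 2), v) for v in (1.0, 2.0, 3.0)]
rel = relative_rmse(models, obs)  # relative errors: -0.5, 0.0, 0.5
```

Plotting `portrait_matrix(...)` with a diverging colormap centred on zero reproduces the warm/cool reading of the diagram.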
Consistent with the results of other multi-model studies (Flato et al., 2013; Gleckler et al., 2008; Sillmann et al., 2013; Sheffield et al., 2013), the ensemble mean generally outperforms the individual models because some of the systematic errors of individual models cancel out in the multi-model mean. Most temperature indices are also captured reasonably well by most models, particularly IPSL-CM5A-MR, MPI-ESM-LR, and CMCC-CMS. Models with higher resolution often do not simulate the extreme temperature indices better than those with lower resolution (e.g., INMCM4, whose resolution is higher than that of NorESM1-M).
For the percentile indices (TX90p, TX10p, TN90p, and TN10p), the models generally perform well over China. A comparison based on RMSE' between the global results (Sillmann et al., 2013) and the results for China further indicates that the temperature-based percentile indices are generally better captured over China. Over China, the multi-model mean error measured by RMSE' is generally larger for the threshold indices than for the duration and absolute indices.
4.2.1. Intensity extremes (threshold indices and absolute indices)
The spatial patterns of the threshold indices simulated by the CMIP5 ensemble are shown in Fig. 2. They are similar to the observed patterns of the indices derived from daily minimum or maximum temperature (e.g., FD and SU). Some topographically induced variations in the simulated indices are reproduced well, for example the area of low values within the high-temperature zone of the low-elevation region in the northwest, which shows that the simulation captures some small-scale features. However, the simulated values are too low in the low-elevation region of the southwest.
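The threshold indices count days beyond fixed temperature thresholds. The thresholds are not restated in this section, so the sketch below assumes the standard ETCCDI definitions (FD: TN < 0°C; SU: TX > 25°C; TR: TN > 20°C) applied to one year of daily data. `threshold_indices` is a hypothetical helper, not code from the study.

```python
import numpy as np


def threshold_indices(tmax, tmin):
    """Annual counts of threshold exceedances.

    tmax, tmin: daily maximum/minimum temperature (degC) for one year,
    with time on axis 0; any extra axes (e.g. lat, lon) are kept.
    Thresholds follow the standard ETCCDI definitions (assumed here).
    NaN (missing) days compare as False and are therefore not counted.
    """
    tmax = np.asarray(tmax, dtype=float)
    tmin = np.asarray(tmin, dtype=float)
    return {
        "FD": np.sum(tmin < 0.0, axis=0),   # frost days: TN < 0 degC
        "SU": np.sum(tmax > 25.0, axis=0),  # summer days: TX > 25 degC
        "TR": np.sum(tmin > 20.0, axis=0),  # tropical nights: TN > 20 degC
    }


# Four toy days at a single grid point.
counts = threshold_indices(tmax=[30.0, 20.0, 26.0, -1.0], tmin=[21.0, -2.0, 15.0, -5.0])
# counts: FD = 2, SU = 2, TR = 1
```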
The simulated spatial distribution of the absolute indices is similar to the observed distribution. However, some discrepancies remain, particularly over high-elevation terrain such as the Tibetan Plateau, where CN05.1 shows higher TXx and TXn values than the CMIP5 ensemble mean. The spatial agreement with CN05.1 is slightly better for the absolute indices of minimum temperature [TNn (0.95) and TNx (0.95)] than for those of maximum temperature [TXx (0.94) and TXn (0.94)] (Fig. 3).
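The absolute indices are annual extremes of daily temperature. A minimal sketch, assuming the usual ETCCDI definitions (TXx/TXn: annual maximum/minimum of daily maximum temperature; TNx/TNn: the same for daily minimum temperature) and data arranged as (years, days, ...) with missing days stored as NaN. The helper names are hypothetical. Averaging the per-year values gives a multi-year climatology of the kind compared with CN05.1 for 1986-2005.

```python
import numpy as np


def absolute_indices(tmax, tmin):
    """Per-year extremes from arrays shaped (n_years, n_days, ...) in degC; NaN days are skipped."""
    tmax = np.asarray(tmax, dtype=float)
    tmin = np.asarray(tmin, dtype=float)
    return {
        "TXx": np.nanmax(tmax, axis=1),  # warmest day of each year
        "TXn": np.nanmin(tmax, axis=1),  # coldest day of each year
        "TNx": np.nanmax(tmin, axis=1),  # warmest night of each year
        "TNn": np.nanmin(tmin, axis=1),  # coldest night of each year
    }


def climatology(per_year):
    """Multi-year mean of each index (averages over the year axis)."""
    return {name: np.mean(values, axis=0) for name, values in per_year.items()}


# Two toy "years" of three days each at a single point.
tx = [[10.0, 30.0, 20.0], [12.0, 28.0, 5.0]]
tn = [[0.0, -5.0, 3.0], [1.0, 2.0, -1.0]]
clim = climatology(absolute_indices(tx, tn))
# clim: TXx 29.0, TXn 7.5, TNx 2.5, TNn -3.0
```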
For a more detailed evaluation of model performance, the extreme temperature indices are analyzed using Taylor diagrams. Figures 4 and 5 show the Taylor diagrams for the extreme temperature indices over China during 1986-2005. The radial and angular coordinates indicate the normalized standard deviation and the correlation, respectively. The standard deviation of each field is normalized by that of the reference data (hereafter NSD; Gleckler et al., 2008), which allows the ensemble means for the different subregions to be shown in the same panel. Each numbered dot represents the multi-model ensemble mean for one of the seven subregions shown in Fig. 1. Among the extreme temperature indices, SU has the NSD values closest to 1. Compared with the other indices, the Taylor diagram for SU indicates relatively good performance in most subregions, with the points lying close to the reference point (Fig. 4).
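The quantities plotted in a Taylor diagram can be computed directly from a simulated field and its reference. The sketch below (a hypothetical `taylor_stats` helper that treats all grid cells with equal weight, a simplification) returns the NSD, the pattern correlation R, and the normalized centered RMS difference E'. By construction these satisfy E'^2 = 1 + NSD^2 - 2 NSD R, the geometric relation that lets all three appear on one diagram, with NSD as the radius and arccos(R) as the angle.

```python
import numpy as np


def taylor_stats(model, ref):
    """Return (NSD, R, normalized centered RMS difference) for two fields on the same grid.

    Cells that are NaN in either field are dropped; all cells are weighted equally.
    """
    m = np.asarray(model, dtype=float).ravel()
    r = np.asarray(ref, dtype=float).ravel()
    ok = ~(np.isnan(m) | np.isnan(r))
    m, r = m[ok], r[ok]

    sigma_ref = r.std()              # population standard deviation of the reference
    nsd = m.std() / sigma_ref        # normalized standard deviation (radial coordinate)
    corr = np.corrcoef(m, r)[0, 1]   # pattern correlation (cosine of the angle)
    # Centered RMS difference: means removed, so only pattern and amplitude errors remain.
    crmsd = np.sqrt(np.mean(((m - m.mean()) - (r - r.mean())) ** 2)) / sigma_ref
    return nsd, corr, crmsd


def polar_position(nsd, corr):
    """Polar coordinates (angle in radians, radius) of a point on the Taylor diagram."""
    return np.arccos(np.clip(corr, -1.0, 1.0)), nsd


# A field with twice the reference amplitude plus an offset: NSD = 2, R = 1, E' = 1.
ref = np.array([1.0, 2.0, 3.0, 4.0])
stats = taylor_stats(2.0 * ref + 5.0, ref)
```

Because the means are removed, a uniform offset (such as the +5 above) does not affect any of the three statistics; mean biases are instead visible in the box-and-whisker comparison.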
It is also clear that the accuracy of the simulations depends on both the index and the subregion. In general, the spread across indices within the subregions is large (Fig. 4). For FD, the correlation with the reference data exceeds 0.9 in some subregions [e.g., NEC (Northeast China), NC (North China), and SWC (Southwest China)], whereas it is much lower in others [e.g., NWC (Northwest China) and EC (East China)]. The absolute indices are simulated very well over China, with the best performance in the SWC subregion (Fig. 5).
The multi-year means of the extreme temperature indices over each region of China during 1986-2005 were calculated for the individual models and the observations. These temporal and spatial averages are summarized in box-and-whisker plots (Fig. 6). The colored solid round mark within each box is the multi-model median, and the blue dot is the observed value. The box spans the interquartile model range, that is, the range between the 25th and 75th percentiles of the full model ensemble, and the whiskers show the total inter-model range. Figure 6 shows that the models generally compare well with CN05.1. In particular, the CMIP5 median agrees well with CN05.1 for FD, SU, and TR, with TR showing the closest agreement. The CMIP5 median of the absolute indices also corresponds reasonably well with CN05.1, with differences typically within several degrees over most subregions of China. However, the CMIP5 median of TXn is lower than CN05.1 in all regions, especially in the SWC subregion.
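The box-and-whisker summary described above reduces to a few order statistics of the per-model regional means. The sketch below assumes cos(latitude) area weighting for the regional average, which the section does not specify, and uses NumPy's default linear interpolation for the percentiles; `regional_mean` and `ensemble_summary` are hypothetical names. As in the text, the whiskers span the full inter-model range.

```python
import numpy as np


def regional_mean(field, lat, mask):
    """Area-weighted mean over a subregion.

    field, mask: arrays shaped (nlat, nlon); lat: 1-D latitudes in degrees.
    Weights are cos(latitude) (an assumption); NaN cells get zero weight.
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(lat))[:, None] * np.asarray(mask, dtype=float)
    weights = np.where(np.isnan(field), 0.0, weights)
    return float(np.nansum(field * weights) / np.sum(weights))


def ensemble_summary(model_values, obs_value):
    """Box-plot statistics of one index in one subregion across models, plus the observed value."""
    v = np.asarray(model_values, dtype=float)
    q25, median, q75 = np.percentile(v, [25, 50, 75])
    return {
        "median": median,
        "box": (q25, q75),                # interquartile model range
        "whiskers": (v.min(), v.max()),   # total inter-model range
        "median_minus_obs": median - obs_value,
        "obs_outside_range": bool(obs_value < v.min() or obs_value > v.max()),
    }


# Toy example: five model values and an observed value above the whole model range.
summary = ensemble_summary([1.0, 2.0, 3.0, 4.0, 5.0], obs_value=10.0)
# median 3.0, box (2.0, 4.0), whiskers (1.0, 5.0), median_minus_obs -7.0, obs_outside_range True
```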
4.2.2. Duration extremes (duration indices)
The spatial pattern of the CMIP5 ensemble mean HWDI is close to the observed pattern, although the simulated HWDI in northern China is lower than observed (Fig. 7). For CWDI, the simulated values are higher than observed in northern, northeastern, and northwestern China, as well as over the Tibetan Plateau. This may be due to differences between the model and actual topography, especially over the Tibetan Plateau. In addition, observation stations are relatively sparse in the west, for example from the northern Tibetan Plateau to the northern foot of the Kunlun Mountains and in the hinterland of the Taklimakan Desert in Xinjiang, which may also have affected the interpolated data.
Figure 8 shows which subregions exaggerate the amplitude of the extreme temperature indices (e.g., NWC) and that the model ensemble grossly underestimates NSD in most subregions. Overall, HWDI has the largest NSD and the lowest correlation among the temperature indices over China (Fig. 8).
The box-and-whisker plots (Fig. 6) show that the CMIP5 median of CWDI is larger than CN05.1 across all regions; that is, the models simulate more cold waves than CN05.1. CWDI also shows the largest outliers (biases) relative to CN05.1, including significant outliers in the SWC subregion, while HWDI shows significant outliers in the SC subregion.
4.2.3. Frequency extremes (percentile indices)
The models also disagree with CN05.1 in the annual mean values over China: the simulated percentile indices are much larger than CN05.1 for TN10p and TX10p but much lower for TN90p and TX90p (not shown). The Taylor diagrams for the percentile indices show low correlations and low NSD relative to the reference data for some fields in most regions (not shown).
The box-and-whisker plots (Fig. 6) show that the CMIP5 median underestimates TX90p and TN10p relative to CN05.1 in all regions and overestimates TX10p in most regions. The disagreement is particularly large for TX90p, for which the simulated values are much smaller than CN05.1. The discrepancy is most prominent for TN10p in the NEC subregion and for TX90p in the CC (central China) subregion, where the CN05.1 values lie far above the CMIP5 model range.
A comparison of the spatial structure between the CMIP5 models and the observations shows that the model ensemble captures the main features of the spatial distribution of the percentile indices well. As a whole, the CMIP5 models compare well with CN05.1, and the spatial structure of the ensemble result is better for indices of threshold extremes than for indices of intensity extremes (Alexander et al., 2006). Observation stations are relatively sparse in parts of the west, such as from the northern Tibetan Plateau to the northern foot of the Kunlun Mountains and in the hinterland of the Taklimakan Desert in Xinjiang; because the station distribution determines the interpolated data, the observations in these areas carry relatively large uncertainty. Warm extremes (SU, TR, TNx, and HWDI, but not TXx) are underestimated over high northern latitudes, particularly in the northwest, whereas cold extremes (FD, TXn, TNn, and CWDI) are overestimated, in accordance with previous findings based on IPCC AR4 models (Wang et al., 2008).
Sillmann et al. (2013) suggested that there is no clear relationship between a model's spatial resolution and its representation of the temperature indices; therefore, the results from individual CMIP5 models are not shown. For historical trends, the CMIP5 ensemble generally captures the observed trends in the indices of temperature extremes during 1961-2005 well. The long-term trends are evident in the simulated historical evolution of the index anomalies (Figs. 9 and 10).
4.3.1. Intensity extremes (threshold indices and absolute indices)
The modeled trends in the threshold indices are consistent with CN05.1 (figures not shown). The ensemble mean shows corresponding trends, with increases in TR [1.31 d (10 yr)⁻¹] and SU [1.64 d (10 yr)⁻¹] and a decrease in FD [-2.02 d (10 yr)⁻¹]. Compared with CN05.1, the models' ensemble mean shows similar increasing trends for SU [1.44 d (10 yr)⁻¹] and TR [1.00 d (10 yr)⁻¹] during 1961-2005.
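Trends such as those quoted above are slopes expressed per decade. The estimator is not stated in this section, so the sketch below assumes an ordinary least-squares fit to annual index values (a Theil-Sen slope is a common, more outlier-resistant alternative). `decadal_trend` is a hypothetical helper, and the series is synthetic, not the study's data.

```python
import numpy as np


def decadal_trend(years, values):
    """Least-squares linear trend in units of the index per decade; NaN years are skipped."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    ok = ~np.isnan(values)
    # np.polyfit returns the highest-order coefficient first, i.e. the slope per year.
    slope_per_year = np.polyfit(years[ok], values[ok], deg=1)[0]
    return 10.0 * slope_per_year


# Synthetic series for 1961-2005 rising by 0.131 days per year, i.e. 1.31 d per decade.
years = np.arange(1961, 2006)
series = 40.0 + 0.131 * (years - 1961)
print(round(decadal_trend(years, series), 2))  # 1.31
```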
Like CN05.1, the multi-model ensemble mean shows warming trends in the absolute indices since the 1960s (Fig. 9), with general increases in TNn, TNx, TXx, and TXn. Over China, each index based on minimum temperature increases faster than its counterpart based on maximum temperature: TNx [0.23°C (10 yr)⁻¹] increases faster than TXx [0.21°C (10 yr)⁻¹], and TNn [0.29°C (10 yr)⁻¹] increases faster than TXn [0.24°C (10 yr)⁻¹].
4.3.2. Duration extremes (duration indices)
During 1961-2005, the CMIP5 ensemble mean simulates a decrease in CWDI [-0.61 d (10 yr)⁻¹] and a strong increase in HWDI [1.38 d (10 yr)⁻¹] over China (not shown).
4.3.3. Frequency extremes (percentile indices)
The changes are much more pronounced in the percentile indices than in the absolute indices, which are derived from annual extremes (Fig. 10). Owing to the construction of the percentile indices, the temporal evolution simulated by the CMIP5 models correlates reasonably well with the observations (0.67-0.91). Averaged over China, cold nights (TN10p) and cold days (TX10p) decrease, whereas warm nights (TN90p) and warm days (TX90p) increase.
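The percentile indices count days beyond a calendar-day threshold derived from each dataset's own base-period climatology rather than beyond a fixed temperature. The sketch below follows the general ETCCDI approach under stated simplifications: each threshold is taken from a 5-day window centred on the calendar day across the base-period years, the window wraps within a 365-day year instead of borrowing days from adjacent years, and the bootstrap that ETCCDI applies to years inside the base period is omitted. The helper names are hypothetical and the data are synthetic.

```python
import numpy as np


def calendar_day_thresholds(base, q, half_window=2):
    """Percentile threshold for each calendar day.

    base: daily values shaped (n_base_years, 365). For day d, the q-th percentile is
    taken over all base years and days d-half_window..d+half_window. The window wraps
    within the year (a simplification of the ETCCDI procedure).
    """
    base = np.asarray(base, dtype=float)
    n_days = base.shape[1]
    thresholds = np.empty(n_days)
    for d in range(n_days):
        window = np.arange(d - half_window, d + half_window + 1) % n_days
        thresholds[d] = np.nanpercentile(base[:, window], q)
    return thresholds


def percent_days_beyond(data, thresholds, above=True):
    """Percentage of days per year above (e.g. TX90p) or below (e.g. TX10p) the thresholds.

    data: (n_years, 365). Missing (NaN) days count as non-exceedances in this sketch.
    """
    data = np.asarray(data, dtype=float)
    hits = data > thresholds if above else data < thresholds
    return 100.0 * hits.mean(axis=1)


# Synthetic "daily maximum temperature" for 30 base years (not real data).
rng = np.random.default_rng(0)
tx_base = rng.normal(25.0, 3.0, size=(30, 365))
tx90p = percent_days_beyond(tx_base, calendar_day_thresholds(tx_base, q=90))  # averages near 10%
tx10p = percent_days_beyond(tx_base, calendar_day_thresholds(tx_base, q=10), above=False)
```

Because a constant warm or cold bias shifts both a model's thresholds and its daily values, it largely drops out of these indices, which is consistent with the comparatively good temporal agreement reported above.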
Differences between the percentile index curves are especially prominent during the two most recent decades and for the indices derived from minimum temperature. The CMIP5 models show increasing trends in warm days (TX90p) [1.21% (10 yr)⁻¹] and warm nights (TN90p) [1.70% (10 yr)⁻¹] and decreasing trends in cold days (TX10p) [-0.68% (10 yr)⁻¹] and cold nights (TN10p) [-0.96% (10 yr)⁻¹], consistent with globally observed changes (Alexander et al., 2006; Donat et al., 2013) and global CMIP5 model results (Sillmann et al., 2013). This is also consistent with observed changes in extremes over China, which show a decreasing frequency of extreme cold temperatures and cold events (Ding et al., 2002; Ding et al., 2009).