Data used in this paper were collected from the database of the KNMI HYDRA Project, created by the Royal Netherlands Meteorological Institute. Nine hourly wind speed time series, each longer than 40 years, were chosen for the following analysis (see Table 1). Wind speed values have been corrected for differences in measuring height and local roughness in the upstream sector (Wever and Groen, 2009). Missing data in the records were either filled in by cubic spline interpolation or simply ignored if they appeared at the beginning of a record.
Station | ${T_0}$ (day/month/year) | ${N_{\rm{m}}}$ |
Schiphol | 1/3/1950 | 0 |
De Bilt | 1/1/1961 | 1 |
Soesterberg | 1/3/1958 | 4 |
Leeuwarden | 1/4/1961 | 0 |
Eelde | 1/1/1961 | 1 |
Vlissingen | 2/1/1961 | 0 |
Zestienhoven | 1/10/1961 | 4 |
Eindhoven | 1/1/1960 | 1 |
Beek | 1/1/1962 | 1 |

Table 1. The KNMI HYDRA project data used in this paper. ${T_0}$ is the start time of the data records and ${N_{\rm{m}}}$ is the number of missing data. All the data end on 31/12/2006.
A segment of the wind speed time series measured at Schiphol is plotted in Fig. 4a. The time series is then randomly shuffled (see Fig. 4b). The shuffled time series has no correlations but has the same distribution as the raw time series (Santhanam and Kantz, 2005; Liu et al., 2014). Comparing Fig. 4a and Fig. 4b, one can see that extremes in the long-term correlated wind speed time series appear in clusters, as stated by Bunde et al. (2005). If we set the same threshold in Fig. 4a and Fig. 4b (shown by dashed lines), the mean return period of POT extremes (i.e. peaks or values over a threshold) in the raw data is larger than that in the shuffled data. Consequently, for the same return period, the extreme wind speeds in the raw data are smaller than those in the shuffled data. That is to say, the classical method, which assumes series without correlations, would overestimate extreme wind speeds when long-term correlations are present. In this section, we propose a very simple method to improve extreme estimations in long-term correlated wind speed time series.
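The effect of shuffling on clustering can be illustrated with a minimal sketch. It does not use the KNMI data: the series below is a hypothetical AR(1) surrogate, which is only short-term correlated and serves as a stand-in for a correlated record. The helper names `ar1_series` and `count_clusters`, and the choice of counting one peak per run of consecutive exceedances, are assumptions made for illustration.

```python
import random

def ar1_series(n, phi, seed):
    """Hypothetical stand-in for a correlated record: AR(1) noise, NOT real wind data."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def count_clusters(series, threshold):
    """Count runs of consecutive values above the threshold (one peak per run)."""
    clusters, above = 0, False
    for value in series:
        if value > threshold and not above:
            clusters += 1
        above = value > threshold
    return clusters

raw = ar1_series(n=20000, phi=0.9, seed=1)

# Shuffling destroys the correlations but keeps exactly the same set of values.
shuffled = raw[:]
random.Random(2).shuffle(shuffled)

# Same threshold for both series: roughly the 90th percentile of the values.
threshold = sorted(raw)[int(0.9 * len(raw))]
n_raw = count_clusters(raw, threshold)
n_shuffled = count_clusters(shuffled, threshold)

# Mean return period of cluster peaks, measured in samples.
mean_period_raw = len(raw) / n_raw
mean_period_shuffled = len(shuffled) / n_shuffled
print(mean_period_raw, mean_period_shuffled)
```

In this surrogate the exceedances of the raw series merge into fewer, longer clusters, so its peaks are separated by a longer mean return period than those of the shuffled series, which is the qualitative behaviour described for Fig. 4.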
The classical method for series without correlations is briefly introduced here; more details can be found in the book by Coles (2001). The limiting conditional probability of POT extremes as the threshold $v$ increases is described by the generalized Pareto distribution (GPD),

$${\rm{Pr}}\{ V > v + y\,|\,V > v\} = {\left( {1 + \frac{{\xi y}}{\sigma }} \right)^{ - 1/\xi }}, \tag{11}$$

where ${\rm{Pr}}\{ p|q\} $ denotes the probability of $p$ under the condition $q$, and the parameters satisfy $y \geqslant 0$, $\sigma > 0$ and $\xi \in \left( { - \infty ,\infty } \right)$. According to Eqs. (5) and (11), one can obtain the T-year return level, that is, the value expected to be exceeded once on average every T years:

$${z_T} = v + \frac{\sigma }{\xi }\left[ {{{\left( {Tl{P_v}} \right)}^\xi } - 1} \right], \tag{12}$$

where ${P_v} \equiv {\rm{Pr}}\{ V > v\} $ and $l$ is the length of one year measured in units of the sampling time $\Delta t$. For example, if $\Delta t = 1\;{\rm{h}}$, $l = 365 \,\times \, 24 = 8760$.
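The step from the GPD to the return level can be spelled out as a short worked derivation. It is a sketch under two assumptions: the GPD is written in exceedance form, following Coles (2001), and Eq. (5), which is not reproduced in this section, amounts to the statement that the T-year return level $z_T$ is exceeded on average once in $Tl$ samples.

```latex
\begin{align*}
\Pr\{V > z\} &= \Pr\{V > v\}\,\Pr\{V > z \mid V > v\}
  = P_v\left(1 + \frac{\xi\,(z - v)}{\sigma}\right)^{-1/\xi}, \qquad z \ge v,\\
\Pr\{V > z_T\} &= \frac{1}{T\,l}
  \;\Longrightarrow\;
  \left(1 + \frac{\xi\,(z_T - v)}{\sigma}\right)^{-1/\xi} = \frac{1}{T\,l\,P_v},\\
z_T &= v + \frac{\sigma}{\xi}\left[\left(T\,l\,P_v\right)^{\xi} - 1\right], \qquad \xi \neq 0.
\end{align*}
```

The product $T l P_v$ is the expected number of threshold exceedances in $T$ years, so the return level depends on the threshold only through this count and the fitted GPD parameters.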
As discussed in section 1, the limiting distribution of POT extremes as the threshold increases is the same whether or not the series is long-term correlated. Parameters in the GPD can be estimated by the maximum likelihood method (Coles, 2001). We compare the empirical conditional probabilities of extreme wind speeds with the GPD and find that the former are well described by the latter except for very large values (see Fig. 5). Deviations at large values are likely caused by the unreliable statistics of limited data. In practice, the threshold $v$ is selected as a balance between bias and variance. If $v$ is too low, the conditional probability cannot be well approximated by the GPD. If $v$ is too high, the variance of the parameter estimates is large because few data exceed the threshold. As far as we know, there is no well-established method for threshold selection (Scarrott and MacDonald, 2012). The commonly used upper 10% rule is adopted here (DuMouchel, 1983). According to this rule, the threshold is defined as the 90th percentile of the samples. Table 2 lists the thresholds, the maximum likelihood estimates of the GPD parameters and the corresponding confidence intervals.
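The upper 10% rule and the comparison between empirical and GPD conditional probabilities can be sketched as follows. The sketch uses synthetic values drawn from a GPD by inverse-transform sampling rather than the KNMI records, and it takes the GPD parameters as known instead of estimating them by maximum likelihood. The function names and the sample size are assumptions made for illustration.

```python
import random
import statistics

def gpd_survival(y, sigma, xi):
    """Pr{V > v + y | V > v} for the GPD; valid for xi != 0 and 1 + xi*y/sigma > 0."""
    return (1.0 + xi * y / sigma) ** (-1.0 / xi)

def empirical_conditional_survival(samples, threshold, y):
    """Fraction of exceedances of `threshold` that also exceed `threshold + y`."""
    exceedances = [s for s in samples if s > threshold]
    return sum(s > threshold + y for s in exceedances) / len(exceedances)

# Synthetic data (an assumption, not KNMI data): GPD values obtained by inverting
# the survival function, using the Schiphol parameter values listed in Table 2.
sigma, xi = 2.4566, -0.0893
rng = random.Random(0)
samples = [sigma / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(50000)]

# Upper 10% rule: the threshold is the 90th percentile of the samples.
threshold = statistics.quantiles(samples, n=10)[-1]
fraction_above = sum(s > threshold for s in samples) / len(samples)

# Threshold stability of the GPD: exceedances of a GPD(sigma, xi) sample over u
# follow a GPD with the same shape xi and scale sigma + xi * u.
sigma_u = sigma + xi * threshold
y = 1.0
empirical = empirical_conditional_survival(samples, threshold, y)
model = gpd_survival(y, sigma_u, xi)
print(threshold, fraction_above, empirical, model)
```

With real records the scale and shape would be fitted to the exceedances, and the same comparison repeated over a range of `y` gives a plot of the kind shown in Fig. 5.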
Station | $v\;\left( {{\rm{m}}\;{{\rm{s}}^{ - 1}}} \right)$ | $\xi$ | ${\rm{CI}}\left( \xi \right)$ | $\sigma $ | ${\rm{CI}}\left( \sigma \right)$ |
Schiphol | 9.5 | −0.0893 | (−0.0967, −0.0819) | 2.4566 | (2.4282, 2.4854) |
De Bilt | 7.2 | −0.0799 | (−0.0876, −0.0722) | 1.8141 | (1.7913, 1.8372) |
Soesterberg | 7.6 | −0.0406 | (−0.0491, −0.0320) | 1.7858 | (1.7630, 1.8089) |
Leeuwarden | 9.1 | −0.0921 | (−0.0994, −0.0848) | 2.2890 | (2.2609, 2.3175) |
Eelde | 8.3 | −0.0831 | (−0.0914, −0.0748) | 2.0926 | (2.0658, 2.1198) |
Vlissingen | 9.5 | −0.1142 | (−0.1219, −0.1066) | 2.3182 | (2.2893, 2.3474) |
Zestienhoven | 9.3 | −0.1195 | (−0.1249, −0.1141) | 2.3018 | (2.2759, 2.3281) |
Eindhoven | 8.0 | −0.0876 | (−0.0957, −0.0796) | 2.1138 | (2.0869, 2.1410) |
Beek | 8.1 | −0.1065 | (−0.1143, −0.0987) | 2.0573 | (2.0314, 2.0836) |

Table 2. Thresholds $v$ and maximum likelihood estimations of the GPD parameters $\xi $ and $\sigma $. CI denotes the 95% confidence interval for the parameter estimated.

According to Eqs. (10) and (11), the T-year return level of long-term correlated wind speeds is calculated by
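$$z_T^* = v + \frac{\sigma }{\xi }\left[ {{{\left( {\frac{{Tl{P_v}}}{{{C_\kappa }}}} \right)}^\xi } - 1} \right]. \tag{13}$$

Comparing Eqs. (12) and (13), we have

$$z_T^* = {z_{T/{C_\kappa }}}. \tag{14}$$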
The above equation states that the T-year return level of long-term correlated wind speeds can be obtained simply by rescaling the value of T in the classical method for series without correlations. It means that the classical method, already implemented in many commercial and open-source software packages, does not need to be discarded in cases with long-term correlations.
The procedure of the extreme wind speed estimation is illustrated in Fig. 6. For wind speed time series, $\alpha \approx 0.7$ (see Fig. 1). Thus, $\kappa \approx 2\left( {1 - \alpha } \right) \approx 0.6$ and $C_\kappa \approx 3.1$. The value of $C_\kappa $ is greater than 1, which means that the classical method gives a larger T-year return level than our method. This conclusion is consistent with the statement at the beginning of this section that the classical method would overestimate extremes in long-term correlated series.

Figure 6. The maximum likelihood estimator of the T-year return level ${\hat z_T}$ as a function of the mean return period T. Lines show the estimated return levels and dashed-dotted lines show the 95% confidence intervals. For an illustration of our method, the 50-year return level ${\hat z_{50}}$ without correlations and the corresponding 50-year return level $\hat z_{50}^*$ with long-term correlations are marked by circles in the plot.
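The rescaling of T can be illustrated numerically with the Schiphol values from Table 2. The sketch rests on several assumptions that are not stated in this form in the text: the classical return level takes the GPD form of Coles (2001), $P_v = 0.1$ because the threshold is the 90th percentile, and "scaling the value of T" means replacing T by $T/C_\kappa$. The value $C_\kappa \approx 3.1$ is taken as quoted, since its formula is not reproduced in this section, and `return_level` is a hypothetical helper name.

```python
def return_level(T, v, sigma, xi, p_v, l=8760):
    """Classical T-year return level from the GPD (xi != 0), in the form of Coles (2001).

    T is in years, l is the number of samples per year (8760 for hourly data),
    and p_v is the probability of exceeding the threshold v.
    """
    return v + sigma / xi * ((T * l * p_v) ** xi - 1.0)

# Schiphol values from Table 2; p_v = 0.1 is an assumption based on the upper 10% rule.
v, sigma, xi, p_v = 9.5, 2.4566, -0.0893, 0.1

alpha = 0.7                    # value quoted in the text
kappa = 2.0 * (1.0 - alpha)    # approximately 0.6
c_kappa = 3.1                  # value quoted in the text; its formula is not shown here

# Classical 50-year return level (no correlations).
z50_classical = return_level(50.0, v, sigma, xi, p_v)

# Long-term correlated case under the assumed scaling T -> T / C_kappa.
z50_correlated = return_level(50.0 / c_kappa, v, sigma, xi, p_v)

print(round(z50_classical, 2), round(z50_correlated, 2))
```

Because the return level increases with T and $C_\kappa > 1$, the rescaled estimate is lower than the classical one, in line with the two circles marked in Fig. 6.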