Bohn T. J., M. Y. Sonessa, and D. P. Lettenmaier, 2010: Seasonal hydrologic forecasting: Do multimodel ensemble averages always yield improvements in forecast skill? J. Hydrometeorol., 11( 4), 1358- 1372.10.1175/2010JHM1267.1a7bdae61715e1cb21fd1bd6ee1a65edfhttp%3A%2F%2Fwww.cabdirect.org%2Fabstracts%2F20113069370.htmlhttp://www.cabdirect.org/abstracts/20113069370.htmlMultimodel techniques have proven useful in improving forecast skill in many applications, including hydrology. Seasonal hydrologic forecasting in large basins represents a special case of hydrologic modeling, in which postprocessing techniques such as temporal aggregation and time-varying bias correction are often employed to improve forecast skill. To investigate the effects that these techniques have on the performance of multimodel averaging, the performance of three hydrological models [[Variable Infiltration Capacity, Sacramento/Snow-17, and the Noah land surface model]] and two multimodel averages [[simple model average (SMA) and multiple linear regression (MLR) with monthly varying model weights]] are examined in three snowmelt-dominated basins in the western United States. These evaluations were performed for both simulating and forecasting [[using the Ensemble Streamflow Prediction (ESP) method]] monthly discharge, with and without monthly bias corrections. The single best bias-corrected model outperformed the multimodel averages of raw models in both retrospective simulations and ensemble mean forecasts in terms of RMSE. Forming an MLR multimodel average from bias-corrected models added only slight improvements over the best bias-corrected model. Differences in performance among all bias-corrected models and multimodel averages were small. For ESP forecasts, both bias correction and multimodel averaging generally reduced the RMSE of the ESP ensemble means at lead times of up to 6 months in months when flow is dominated by snowmelt, with the reduction increasing as lead time decreased. The primary reason for this is that aggregating simulated streamflows from daily to monthly time scales increases model cross correlation, which in turn reduces the effectiveness of multimodel averaging in reducing those components of model error that bias correction cannot address. This effect may be stronger in snowmelt-dominated basins because the interannual variability of winter precipitation is a common input to all models. It was also found that both bias correcting and multimodel averaging using monthly varying parameters yielded much greater error reductions than methods using time-invariant parameters.
Bougeault, P., Coauthors, 2010: The THORPEX interactive grand global ensemble. Bull. Amer. Meteor. Soc., 91, 1059- 1072.10.1175/2010BAMS2853.10a6b8138bcf57d302cceee1df50a2a3ehttp%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F2010BAMS...91.1059Bhttp://adsabs.harvard.edu/abs/2010BAMS...91.1059BAbstract Ensemble forecasting is increasingly accepted as a powerful tool to improve early warnings for high-impact weather. Recently, ensembles combining forecasts from different systems have attracted a considerable level of interest. The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) project, a prominent contribution to THORPEX, has been initiated to enable advanced research and demonstration of the multimodel ensemble concept and to pave the way toward operational implementation of such a system at the international level. The objectives of TIGGE are 1) to facilitate closer cooperation between the academic and operational meteorological communities by expanding the availability of operational products for research, and 2) to facilitate exploring the concept and benefits of multimodel probabilistic weather forecasts, with a particular focus on high-impact weather prediction. Ten operational weather forecasting centers producing daily global ensemble forecasts to 1-2 weeks ahead have agreed to deliver in neareal time a selection of forecast data to the TIGGE data archives at the China Meteorological Agency, the European Centre for Medium-Range Weather Forecasts, and the National Center for Atmospheric Research. The volume of data accumulated daily is 245 GB (1.6 million global fields). This is offered to the scientific community as a new resource for research and education. The TIGGE data policy is to make each forecast accessible via the Internet 48 h after it was initially issued by each originating center. Quicker access can also be granted for field experiments or projects of particular interest to the World Weather Research Programme and THORPEX. A few examples of initial results based on TIGGE data are discussed in this paper, and the case is made for additional research in several directions.
Buizza R., T. N. Palmer, 1998: Impact of ensemble size on ensemble prediction. Mon. Wea. Rev., 126, 2503- 2518.10.1175/1520-0493(1998)126<2503:IOESOE>2.0.CO;29496501b55a5609142fd4e7e92b1663bhttp%3A%2F%2Fheapol.oxfordjournals.org%2Fcontent%2F13%2F3%2F323.abstracthttp://heapol.oxfordjournals.org/content/13/3/323.abstractNot Available
Chen Q. Y., M. M. Yao, and Y. Wang, 2004: A new generation of operational medium-range weather forecast model T213L31 in National Meteorological Center. Meteorological Monthly, 30( 8), 16- 21. (in Chinese)10.1117/12.528072b3731a70701ebd50db4db4a6a56702d7http%3A%2F%2Fen.cnki.com.cn%2FArticle_en%2FCJFDTOTAL-QXXX200410003.htmhttp://en.cnki.com.cn/Article_en/CJFDTOTAL-QXXX200410003.htmThe medium-range numerical weather forecast system T213 became an operational one on 1 September 2002 in the National Meteorological Center. As the core of the new system, the global model T213L31 uses some new numerical techniques and time integration scheme, which include the introduction of the semi-Lagrangian treatment of advection, the use of a reduced Gaussian grid, improvements to the model's basic architecture, the application of distributed memory and shared memory parallelization, realizing the running of high resolution model on the computers now available in National Meteorological Center. It is even more important that, T213L31 uses some new physical parametrization schemes with more realistic physical concept, for example, the schemes for radiation, subgrid-scale orographic drag, convection, clouds and land surface parametrization, therefore overcomes a lot of problems that T106L19 suffers and enhances the forecast skill obviously.
Clark, A. J., Coauthors, 2011: Probabilistic precipitation forecast skill as a function of ensemble size and spatial scale in a convection-allowing ensemble. Mon. Wea. Rev., 139, 1410- 1418.10.1175/2010MWR3624.1093d3cf8049a4ce808b0dd110f1b486dhttp%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F2011MWRv..139.1410Chttp://adsabs.harvard.edu/abs/2011MWRv..139.1410CAbstractProbabilistic quantitative precipitation forecasts (PQPFs) from the storm-scale ensemble forecast system run by the Center for Analysis and Prediction of Storms during the spring of 2009 are evaluated using area under the relative operating characteristic curve (ROC area). ROC area, which measures discriminating ability, is examined for ensemble size n from 1 to 17 members and for spatial scales ranging from 4 to 200 km.Expectedly, incremental gains in skill decrease with increasing n. Significance tests comparing ROC areas for each n to those of the full 17-member ensemble revealed that more members are required to reach statistically indistinguishable PQPF skill relative to the full ensemble as forecast lead time increases and spatial scale decreases. These results appear to reflect the broadening of the forecast probability distribution function (PDF) of future atmospheric states associated with decreasing spatial scale and increasing forecast lead time. They also illustrate that efficient allo...
Deque M., 1997: Ensemble size for numerical seasonal forecasts. Tellus A, 49, 74- 86.10.1034/j.1600-0870.1997.00005.xdd61c92fdec81380102cd1edd7b522fahttp%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1034%2Fj.1600-0870.1997.00005.x%2Fcitedbyhttp://onlinelibrary.wiley.com/doi/10.1034/j.1600-0870.1997.00005.x/citedbyABSTRACT The predictability of 500 hPa height, 850 hPa temperature and precipitation is studied using the “perfect model” approach with ensemble numerical forecasts. The sea surface temperature interannual variability is introduced to provide some source of seasonal predictability. The mean scores of 16 winter forecasts show a large potential in the tropics, a weak one in the midlatitudes, in particular over Europe. The weakness of the scores in the midlatitudes may be partly explained by the underestimation of the amplitude of the seasonal anomalies by the model. The seasonal forecasts are based on 9 individual model integrations starting at slightly different initial conditions. The variation of the scores with the size of the ensemble is estimated empirically. It is shown that, for a perfect seasonal forecast, the size necessary to approach the saturation score is about 3 for the tropical precipitation, 20 for midlatitude height, and 40 for temperature over Europe.
Du J., S. L. Mullen, and F. Sanders, 1997: Short-range ensemble forecasting of quantitative precipitation. Mon. Wea. Rev., 125, 2427- 2459.10.1175/1520-0493(1997)1252.0.CO;29995d13df2e0b9394aa74d0ec7176fa1http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F1997MWRv..125.2427Dhttp://adsabs.harvard.edu/abs/1997MWRv..125.2427DThe impact of initial condition uncertainty (ICU) on quantitative precipitation forecasts (QPFs) is examined for a case of explosive cyclogenesis that occurred over the contiguous United States and produced widespread, substantial rainfall. The Pennsylvania State University--National Center for Atmospheric Research (NCAR) Mesoscale Model Version 4 (MM4), a limited-area model, is run at 80-km horizontal resolution and 15 layers to produce a 25-member, 36-h forecast ensemble. Lateral boundary conditions for MM4 are provided by ensemble forecasts from a global spectral model, the NCAR Community Climate Model Version 1 (CCM1). The initial perturbations of the ensemble members possess a magnitude and spatial decomposition that closely match estimates of global analysis error, but they are not dynamically conditioned. Results for the 80-km ensemble forecast are compared to forecasts from the then operational Nested Grid Model (NGM), a single 40-km/15- layer MM4 forecast, a single 80-km/29-layer MM4 forecast, and a second 25-member MM4 ensemble based on a different cumulus parameterization and slightly different unperturbed initial conditions. Large sensitivity to ICU marks ensemble QPF. Extrema in 6-h accumulations at individual grid points vary by as much as 3.000. Ensemble averaging reduces the root-mean-square error (rmse) for QPF. Nearly 90% of the improvement is obtainable using ensemble sizes as small as 8--10. Ensemble averaging can adversely affect the bias and equitable threat scores, however, because of its smoothing nature. Probabilistic forecasts for five mutually exclusive, completely exhaustive categories are found to be skillful relative to a climatological forecast. Ensemble sizes of approximately 10 can account for 90% of improvement in categorical forecasts relative to that for the average of individual forecasts. The improvements due to short-range ensemble forecasting (SREF) techniques exceed any due to doubling the resolution...
Epstein E. S., 1969: Stochastic dynamic prediction. Tellus, 21, 739- 759.10.1111/j.2153-3490.1969.tb00483.x16756145-c1da-4a52-82d5-fe4a66dd8048ba157471f442e2a9550ded8e2d6be2abhttp%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1111%2Fj.2153-3490.1969.tb00483.x%2Fcitedbyrefpaperuri:(3d8ed93bde3e1d2d9af26426cda0bef0)http://onlinelibrary.wiley.com/doi/10.1111/j.2153-3490.1969.tb00483.x/citedbyCiteSeerX - Scientific documents that cite the following paper: Stochastic dynamic prediction
Fritsch J. M., J. Hilliker, J. Ross, and R. L. Vislocky, 2000: Model consensus. Wea.Forecasting, 15, 571- 582.10.1175/1520-0434(2000)015<0571:MC>2.0.CO;20b04c0d0-3ac1-4082-8efe-17db5ec790b22bd6372086e4d229a1fb56af87412925http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F2000WtFor..15..571Frefpaperuri:(1e0b0708c3872103b51d4b7c3679ab9d)http://adsabs.harvard.edu/abs/2000WtFor..15..571FModel consensus FRITSCH J. M. Wea. Forecasting 15, 571-582, 2000
Hagedorn R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting——I. Basic concept. Tellus A, 57( 3), 219- 233.10.1111/j.1600-0870.2005.00103.x624f89c446d3172b89fc36e8b56bfd44http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1111%2Fj.1600-0870.2005.00103.x%2Fpdfhttp://onlinelibrary.wiley.com/doi/10.1111/j.1600-0870.2005.00103.x/pdfABSTRACT The DEMETER multi-model ensemble system is used to investigate the rationale behind the multi-model concept. A comprehensive documentation of the differences in the single and multi-model performance in the DEMETER hindcast data set is given. Both deterministic and probabilistic diagnostics are used and a variety of analyses demonstrate the improvements achieved by using multi-model instead of single-model ensembles. In order to understand the reason behind the multi-model superiority, basic scenarios describing how the multi-model approach can improve over single-model skill are discussed. It is demonstrated that multi-model superiority is caused not only by error compensation but in particular by its greater consistency and reliability.
Hagedorn R., R. Buizza, T. M. Hamill, M. Leutbecher, and T. N. Palmer, 2012: Comparing TIGGE multimodel forecasts with reforecast-calibrated ECMWF ensemble forecasts. Quart. J. Roy. Meteor. Soc., 138, 1814- 1827.10.1002/qj.18955e481fdbe644d3c01e846ca8cd0a23b3http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1002%2Fqj.1895%2Fpdfhttp://onlinelibrary.wiley.com/doi/10.1002/qj.1895/pdfForecasts provided by the THORPEX Interactive Grand Global Ensemble (TIGGE) project were compared with reforecast‐calibrated ensemble predictions from the European Centre for Medium‐Range Weather Forecasts (ECMWF) in extratropical regions. Considering the statistical performance of global probabilistic forecasts of 850 hPa and 2 m temperatures, a multimodel ensemble containing nine ensemble prediction systems (EPS) from the TIGGE archive did not improve on the performance of the best single‐model, the ECMWF EPS. However, a reduced multimodel system, consisting of only the four best ensemble systems, provided by Canada, the USA, the United Kingdom and ECMWF, showed an improved performance. The multimodel ensemble provides a benchmark for the single‐model systems contributing to the multimodel. However, reforecast‐calibrated ECMWF EPS forecasts were of comparable or superior quality to the multimodel predictions, when verified against two different reanalyses or observations. This improved performance was achieved by using the ECMWF reforecast dataset to correct for systematic errors and spread deficiencies. The ECMWF EPS was the main contributor for the improved performance of the multimodel ensemble; that is, if the multimodel system did not include the ECMWF contribution, it was not able to improve on the performance of the ECMWF EPS alone. These results were shown to be only marginally sensitive to the choice of verification dataset. Copyright 08 2012 Royal Meteorological Society
Hamill T. M., R. Hagedorn, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620- 2632.10.1175/2007MWR2411.1017772092db591d8758a7535aac330f6http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F2008MWRv..136.2620Hhttp://adsabs.harvard.edu/abs/2008MWRv..136.2620HAs a companion to Part I, which discussed the calibration of probabilistic 2-m temperature forecasts using large training datasets, Part II discusses the calibration of probabilistic forecasts of 12-hourly precipitation amounts. Again, large ensemble reforecast datasets from the European Centre for Medium-Range Weather Forecasts (ECMWF) and the Global Forecast System (GFS) were used for testing and calibration. North American Regional Reanalysis (NARR) 12-hourly precipitation analysis data were used for verification and training. Logistic regression was used to perform the calibration, with power-transformed ensemble means and spreads as predictors. Forecasts were produced and validated for every NARR grid point in the conterminous United States (CONUS). Training sample sizes were increased by including data from 10 nearby grid points with similar analyzed climatologies. "Raw" probabilistic forecasts from each system were considered, in which probabilities were set according to ensemble relative frequency. Calibrated forecasts were also considered based on three amounts of training data: the last 30 days of forecasts (available for 2005 only), weekly reforecasts during 1982-2001, and daily reforecasts during 1979-2003 (GFS only). Several main results were found. (i) Raw probabilistic forecasts from the ensemble prediction systems' relative frequency possessed little or negative skill when skill was computed with a version of the Brier skill score (BSS) that does not award skill solely on the basis of differences in climatological probabilities among samples. ECMWF raw forecasts had larger skills than GFS raw forecasts. (ii) After calibration with weekly reforecasts, ECMWF forecasts were much improved in reliability and were moderately skillful. Similarly, GFS-calibrated forecasts were much more reliable, albeit somewhat less skillful. Nonetheless, GFS-calibrated forecasts were much more skillful than ECMWF raw forecasts. (iii) The last 30 days of training data produced calibrated forecasts of light-precipitation events that were nearly as skillful as those with weekly reforecast data. However, for higher precipitation thresholds, calibrated forecasts using the weekly reforecast datasets were much more skillful, indicating the importance of large sample size for the calibration of unusual and rare events. (iv) Training with daily GFS reforecast data provided calibrated forecasts with a skill only slightly improved relative to that from the weekly data.
Hashino T., A. A. Bradley, and S. S. Schwartz, 2007: Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrology and Earth System Sciences, 11( 2), 939- 950.10.5194/hess-11-939-200768ad2453e2aa9b2ecc83d395c82c2e74http%3A%2F%2Feuropepmc.org%2Fabstract%2FAGR%2FIND43988437http://europepmc.org/abstract/AGR/IND43988437Ensemble prediction systems are used operationally to make probabilistic streamflow forecasts for seasonal time scales. However, hydrological models used for ensemble streamflow prediction often have simulation biases that degrade forecast quality and limit the operational usefulness of the forecasts. This study evaluates three bias-correction methods for ensemble streamflow volume forecasts. All three adjust the ensemble traces using a transformation derived with simulated and observed flows from a historical simulation. The quality of probabilistic forecasts issued when using the three bias-correction methods is evaluated using a distributions-oriented verification approach. Comparisons are made of retrospective forecasts of monthly flow volumes for the Des Moines River, issued sequentially for each month over a 48-year record. The results show that all three bias-correction methods significantly improve forecast quality by eliminating unconditional biases and enhancing the potential skill. Still, subtle differences in the attributes of the bias-corrected forecasts have important implications for their use in operational decision-making. Diagnostic verification distinguishes these attributes in a context meaningful for decision-making, providing criteria to choose among bias-correction methods with comparable skill.
Houtekamer P. L., J. Derome, 1995: Methods for ensemble prediction. Mon. Wea. Rev., 123, 2181- 2196.10.1175/1520-0493(1995)1232.0.CO;28d3d492f-fe63-4c31-a4eb-7a8ec7a5abcad3cb0b6f8bfb538ce2cb82d66674dce0http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F1995MWRv..123.2181Hrefpaperuri:(562a29ee564ff44a61ca9c5bd4182945)http://adsabs.harvard.edu/abs/1995MWRv..123.2181HNot Available
Jeong D., Y. O. Kim, 2009: Combining single-value streamflow forecasts——A review and guidelines for selecting techniques. J. Hydrol., 377( 3-4), 284- 299.10.1016/j.jhydrol.2009.08.028c64892c31ff99aa4983fd07e882ac7e4http%3A%2F%2Fwww.sciencedirect.com%2Fscience%2Farticle%2Fpii%2FS0022169409005319http://www.sciencedirect.com/science/article/pii/S0022169409005319Choosing an appropriate method for combining single-value forecasts should depend on characteristics of the individual forecasts being combined and their relationships with each other. This study attempts to develop a guideline to choose effective combining techniques by using analytical derivations and/or hydrological experiments. The two most popular combining techniques, Simple Average (SA) ...
Krishnamurti T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285( 5433), 1548- 1550.10.1126/science.285.5433.1548104775153d609d968a30935c40a287e820af3899http%3A%2F%2Fwww.jstor.org%2Fstable%2F2898111http://www.jstor.org/stable/2898111A method for improving weather and climate forecast skill has been developed. It is called a superensemble, and it arose from a study of the statistical properties of a low-order spectral model. Multiple regression was used to determine coefficients from multimodel forecasts and observations. The coefficients were then used in the superensemble technique. The superensemble was shown to outperform all model forecasts for multiseasonal, medium-range weather and hurricane forecasts. In addition, the superensemble was shown to have higher skill than forecasts based solely on ensemble averaging.
Krishnamurti T. N., C. M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. J.Climate, 13( 23), 4196- 4216.10.1175/1520-0442(2000)013<4196:MEFFWA>2.0.CO;200173e7721ec1e4081fae3f60077793fhttp%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F2000JCli...13.4196Khttp://adsabs.harvard.edu/abs/2000JCli...13.4196KIn this paper the performance of a multimodel ensemble forecast analysis that shows superior forecast skills is illustrated and compared to all individual models used. The model comparisons include global weather, hurricane track and intensity forecasts, and seasonal climate simulations. The performance improvements are completely attributed to the collective information of all models used in the statistical algorithm.The proposed concept is first illustrated for a low-order spectral model from which the multimodels and a `nature run' were constructed. Two hundred time units are divided into a training period (70 time units) and a forecast period (130 time units). The multimodel forecasts and the observed fields (the nature run) during the training period are subjected to a simple linear multiple regression to derive the statistical weights for the member models. The multimodel forecasts, generated for the next 130 forecast units, outperform all the individual models. This procedure was deployed for the multimodel forecasts of global weather, multiseasonal climate simulations, and hurricane track and intensity forecasts. For each type an improvement of the multimodel analysis is demonstrated and compared to the performance of the individual models. Seasonal and multiseasonal simulations demonstrate a major success of this approach for the atmospheric general circulation models where the sea surface temperatures and the sea ice are prescribed. In many instances, a major improvement in skill over the best models is noted.
Leith C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409- 418.10.1175/1520-0493(1974)1022.0.CO;286f6231bdfd8ee5b602d3893586fda21http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F1974MWRv..102..409Lhttp://adsabs.harvard.edu/abs/1974MWRv..102..409LNot Available
Ma J. H., Y. J. Zhu, R. Wobus, and P. X. Wang, 2012: An effective configuration of ensemble size and horizontal resolution for the NCEP GEFS. Adv. Atmos. Sci., 29, 782-794, doi: 10.1007/s00376-012-1249-y.10.1007/s00376-012-1249-y90e6e8ce4c094b5ec4e37be3d4d16bcbhttp%3A%2F%2Fd.wanfangdata.com.cn%2FPeriodical_dqkxjz-e201204012.aspxhttp://d.wanfangdata.com.cn/Periodical_dqkxjz-e201204012.aspxTwo important questions are addressed in this paper using the Global Ensemble Forecast System(GEFS) from the National Centers for Environmental Prediction(NCEP):(1) How many ensemble members are needed to better represent forecast uncertainties with limited computational resources?(2) What is the relative impact on forecast skill of increasing model resolution and ensemble size? Two-month experiments at T126L28 resolution were used to test the impact of varying the ensemble size from 5 to 80 members at the 500hPa geopotential height.Results indicate that increasing the ensemble size leads to significant improvements in the performance for all forecast ranges when measured by probabilistic metrics,but these improvements are not significant beyond 20 members for long forecast ranges when measured by deterministic metrics.An ensemble of 20 to 30 members is the most effective configuration of ensemble sizes by quantifying the tradeoff between ensemble performance and the cost of computational resources.Two representative configurations of the GEFS-the T126L28 model with 70 members and the T190L28 model with 20 members,which have equivalent computing costs-were compared.Results confirm that,for the NCEP GEFS,increasing the model resolution is more(less) beneficial than increasing the ensemble size for a short(long) forecast range.
Najafi M. R., H. Moradkhani, 2016: Ensemble combination of seasonal streamflow forecasts. Journal of Hydrologic Engineering, 21( 2), 04015043.10.1061/(ASCE)HE.1943-5584.0001250448e9f2132691e5b8676954973e950fbhttp%3A%2F%2Fwww.researchgate.net%2Fpublication%2F279205724_Towards_Ensemble_Combination_of_Seasonal_Streamflow_Forecastshttp://www.researchgate.net/publication/279205724_Towards_Ensemble_Combination_of_Seasonal_Streamflow_ForecastsABSTRACT Various hydrologic models with different complexities have been developed to represent the characteristics of river basins, improve streamflow forecasts such as seasonal volumetric flow predictions, and meet other demands from different stakeholders. Because no single hydrologic model is able to perfectly simulate the observed flow, multimodel combination techniques are developed to combine forecasts obtained from different models and to quantify the uncertainties with the goal of improving upon single-model performance. In this study, a comprehensive set of multimodel ensemble averaging techniques with varying complexities are investigated for operational forecasting over four river basins in the Western United States. Ensemble merging models are divided into three categories of simple, intermediate, and complex, and comparison is made between each class by using a bootstrap approach. Analysis suggests that model combination effectively improves most of the individual seasonal forecasts and can outperform the best forecast model. Simple average, median, Bates-Granger, constrained linear regression, and Bayesian model averaging optimized by expectation maximization showed better results compared with other methods over three basins. For the Rogue River basin, the intermediate and complex models outperformed most of the individual forecasts and the simple methods. Multimodeling techniques based on information criteria showed similar performances.
Raftery A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133( 3), 1155- 1174.9f24c031776f68071d0f8a656a48c7f9http%3A%2F%2Ficesjms.oxfordjournals.org%2Fexternal-ref%3Faccess_num%3D10.1175%2FMWR2906.1%26link_type%3DDOIhttp://icesjms.oxfordjournals.org/external-ref?access_num=10.1175/MWR2906.1&amp;link_type=DOI
Reifen C., R. Toumi, 2009: Climate projections: Past performance no guarantee of future skill? Geophys. Res. Lett., 36,L13704, doi: 10.1029/2009GL038082.10.1029/2009GL038082db33176e1af55dd397773a0d5a855effhttp%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1029%2F2009GL038082%2Ffullhttp://onlinelibrary.wiley.com/doi/10.1029/2009GL038082/fullThe principle of selecting climate models based on their agreement with observations has been tested for surface temperature using 17 of the IPCC AR4 models. Those models simulating global mean, Siberian and European 20th Century surface temperature with a lower error than the total ensemble for one period on average do not do so for a subsequent period. Error in the ensemble mean decreases sys...
Richardson D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., 127, 2473- 2489.10.1002/qj.497127577157d2cc277d0c6953b9f3b6b06e9a2cb01http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1002%2Fqj.49712757715%2Ffullhttp://onlinelibrary.wiley.com/doi/10.1002/qj.49712757715/fullEnsemble forecasts provide probabilistic predictions for the future state of the atmosphere. Usually the probability of a given event E is determined from the fraction of ensemble members which predict the event. Hence there is a degree of sampling error inherent in the predictions. In this paper a theoretical study is made of the effect of ensemble size on forecast performance as measured by a reliability diagram and Brier (skill) score, and on users by using a simple cost-loss decision model. The relationship between skill and value, and a generalized skill score, dependent on the distribution of users, are discussed. The Brier skill score is reduced from its potential level for all finite-sized ensembles. The impact is most significant for small ensembles, especially when the variance of forecast probabilities is also small. The Brier score for a set of deterministic forecasts is a measure of potential predictability, assuming the forecasts are representative selections from a reliable ensemble prediction system (EPS). There is a consistent effect of finite ensemble size on the reliability diagram. Even if the underlying distribution is perfectly reliable, sampling this using only a small number of ensemble members introduces considerable unreliability. There is a consistent over-forecasting which appears as a clockwise tilt of the reliability diagram. It is important to be aware of the expected effect of ensemble size to avoid misinterpreting results. An ensemble of ten or so members should not be expected to provide reliable probability forecasts. Equally, when comparing the performance of different ensemble systems, any difference in ensemble size should be considered before attributing performance differences to other differences between the systems. The usefulness of an EPS to individual users cannot be deduced from the Brier skill score (nor even directly from the reliability diagram). An EPS with minimal Brier skill may nevertheless be of substantial value to some users, while small differences in skill may hide substantial variation in value. Using a simple cost-loss decision model, the sensitivity of users to differences in ensemble size is shown to depend on the predictability and frequency of the event and on the cost-loss ratio of the user. For an extreme event with low predictability, users with low cost-loss ratio will gain significant benefits from increasing ensemble size from 50 to 100 members, with potential for substantial additional value from further increases in number of members. This sensitivity to large ensemble size is not evident in the Brier skill score. A generalized skill score, dependent on the distribution of users, allows a summary performance measure to be tuned to a particular aspect of EPS performance.
Sanders F., 1963: On subjective probability forecasting. J. Appl. Meteor., 2, 191- 201.10.1175/1520-0450(1963)0022.0.CO;2f8111ded-a254-46c4-b40c-1e5494dc75ab0ad6b07b49dda463f10ed995ab17f923http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F1963JApMe...2..191Srefpaperuri:(d236e86874e84325447c3db6ac19516f)http://adsabs.harvard.edu/abs/1963JApMe...2..191SAbstract The subjective process of probability forecasting is analyzed. It is found to contain a sorling aspect, in which the forecaster distributes all instances into an ordered set of categories of likelihood of occurrence, and a laboling aspect, in which the forecaster assigns an anticipated relative frequency, or probability, of occurrence for each category. These two aspects are identified with the concepts of sharpness and validity, which have been introduced by other writers. The verification score proposed by Brier is shown to consist of the sum of measures of these two qualities. A satisfactory measure of synoptic skill is obtained by applying the Brier score to the synoptic probability forecast and to a control forecast of the climatological probability, and by expressing the difference as a percentage of the control score. In an analysis of a large number of short-range probability forecasts made by instructors and students in the synoptic laboratory of the Massachusetts Institute of Technology...
Su X., H. L. Yuan, Y. J. Zhu, Y. Luo, and Y. Wang, 2014: Evaluation of TIGGE ensemble predictions of Northern Hemisphere summer precipitation during 2008-2012. J. Geophys. Res. Atmos., 119, 7292- 7310.10.1002/2014JD021733840bd534c42f9554ef788f5baad97a70http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1002%2F2014JD021733%2Ffullhttp://onlinelibrary.wiley.com/doi/10.1002/2014JD021733/fullThe ensemble mean quantitative precipitation forecasts (QPFs) and probabilistic QPFs (PQPFs) from six operational global ensemble prediction systems (EPSs) in The Observing System Research and Predictability Experiment Interactive Grand Global Ensemble (TIGGE) data set are evaluated against the Tropical Rainfall Measuring Mission observations using a series of area‐weighted verification metrics during June to August 2008–2012 in the Northern Hemisphere (NH) midlatitude and tropics. Results indicate that generally the European Centre for Medium‐Range Weather Forecasts performs best while the Canadian Meteorological Centre (CMC) is relatively good for short‐range QPFs and PQPFs at light precipitation thresholds. The overall forecast skill is better in the NH midlatitude than in the NH tropics. QPFs and PQPFs from China Meteorological Administration (CMA) have very little discrimination ability of different observed rain events in the NH tropics. The day +1 QPFs from Japan Meteorological Agency have remarkably large moist biases in the NH tropics, which leads to the discontinuity of forecast performance with the lead times. Performance changes due to the major EPS upgrades during the five summers are also examined using the forecasts from CMA as the reference to eliminate the interannual variation. After the EPS upgrade, CMC improves the PQPF skill at light precipitation threshold while its excessively enlarged ensemble spread increases the overall QPF and PQPF errors.
Vislocky R. L., J. M. Fritsch, 1995: Improved model output statistics forecasts through model consensus. Bull. Amer. Meteor. Soc., 76( 5), 1157- 1164.10.1175/1520-0477(1995)0762.0.CO;29dcda0b5780ca697fee45e7ac545f686http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F1995BAMS...76.1157Vhttp://adsabs.harvard.edu/abs/1995BAMS...76.1157VConsensus forecasts are computed by averaging model output statistics (MOS) forecasts based on the limited-area fine-mesh (LFM) model and the nested grid model (NGM) for the three-year period 1990-92. The test consists of four weather elements (max/ min temperature, wind speed, probability of cloud amount, and 12-h probability of precipitation) at four projection times from each initialization (0000 and 1200 UTC) for roughly 250-350 stations. Verification results clearly indicate a substantial improvement forthe consensus MOS over both the LFM and NGM MOS forecasts for all variables and all lead times. The accuracy increase is on par with a 2-8-yr scientific advancement and a 4-12-h lead time improvement. Moreover, performance of the consensus MOS forecasts is similar to subjective forecasts issued by the National Weather Service. These results are illustrative of the broad need to adopt a strategy of statistically combining available forecast products rather than relying upon the single most superior product (such as the newest numerical model). Furthermore, there appears to be strong justification to continue support for the entire LFM MOS product both in terms of its full availability and its equation upgrade.
Vrugt J. A., M. P. Clark, C. G. H. Diks, Q. Y. Duan, and B. A. Robinson, 2006: Multi-objective calibration of forecast ensembles using Bayesian model averaging. Geophys. Res. Lett., 33(17),L19817, doi: 10.1029/2006GL027126.10.1029/2006GL0271263388e120000ad10bd988a1f366dd5af3http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1029%2F2006GL027126%2Fpdfhttp://onlinelibrary.wiley.com/doi/10.1029/2006GL027126/pdfBayesian Model Averaging (BMA) has recently been proposed as a method for statistical postprocessing of forecast ensembles from numerical weather prediction models. The BMA predictive probability density function (PDF) of any weather quantity of interest is a weighted average of PDFs centered on the bias-corrected forecasts from a set of different models. However, current applications of BMA calibrate the forecast specific PDFs by optimizing a single measure of predictive skill. Here we propose a multi-criteria formulation for postprocessing of forecast ensembles. Our multi-criteria framework implements different diagnostic measures to reflect different but complementary metrics of forecast skill, and uses a numerical algorithm to solve for the Pareto set of parameters that have consistently good performance across multiple performance metrics. Two illustrative case studies using 48-hour ensemble data of surface temperature and sea level pressure, and multi-model seasonal forecasts of temperature, show that a multi-criteria formulation provides a more appealing basis for selecting the appropriate BMA model.
Wang Y., H. Qian, J.-J. Song, and M.-Y. Jiao, 2008: Verification of the T213 global spectral model of China National Meteorology Center over the East-Asia area. J. Geophys. Res., 113,D10110, doi: 10.1029/2007JD008750.10.1029/2007JD00875081144827c1a3dbba62164220d48dd863http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1029%2F2007JD008750%2Ffullhttp://onlinelibrary.wiley.com/doi/10.1029/2007JD008750/fullFrom September 2002, the global spectral model T213L31 has been put into operational use at the National Meteorology Center of China. To acquire a comprehensive assessment of T213's performance, four verifications have been implemented, (1) temporal analysis of its forecast accuracy series; (2) spatial analysis and lag correlation analysis of the forecast accuracy; (3) precipitation verification and (4) comparison between the models of T213, T106 (prior version of T213) and ECMWF. The verification illustrates that, after adopting a finer grid and improving many physical schemes, T213 has largely enhanced its forecast accuracy over its prior version. However, its forecast is still poorer than the ECMWF model, and T213 needs to especially improve its 3-5 days' forecast performance. The precipitation verification indicates that T213's forecast for light rain is up-to-standard (0.561 for 24 h forecast), but the forecast accuracy for the larger precipitation drops rapidly. The time series verification shows that the T213's daily forecast exhibits a seasonal trend: the forecast for summer is worse than other seasons and the forecast accuracy decreases to the minimum at July, which suggests the possible impact of the moisture forecast error on the decreases of weather forecast accuracy. This impact is confirmed by the spatial analysis and lag correlation analysis, which show that the specific humidity's forecast has a lagged influence on the accuracies of both the temperature forecast and the geo-potential height forecast, and therefore indicates that the further improvement on specific humidity forecasting and the related moisture parameterization schemes are the crucial points in the future development of the T213 model.
Weigel A. P., M. A. Liniger, and C. Appenzeller, 2008: Can multi-model combination really enhance the prediction skill of probabilistic ensemble forecasts? Quart. J. Roy. Meteor. Soc., 134( 630), 241- 260.10.1002/qj.210c85d983fcffe9b1f69e9c6d1b3642e00http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1002%2Fqj.210%2Ffullhttp://onlinelibrary.wiley.com/doi/10.1002/qj.210/fullNot Available
Weisheimer A., Coauthors, 2009: ENSEMBLES: A new multi-model ensemble for seasonal-to-annual predictions——Skill and progress beyond DEMETER in forecasting tropical Pacific SSTs. Geophys. Res. Lett., 36, L21711, doi: 10.1029/2009GL040896.10.1029/2009GL0408961054de9ff812d9ba96afc3e0129d2887http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1029%2F2009GL040896%2Fpdfhttp://onlinelibrary.wiley.com/doi/10.1029/2009GL040896/pdfA new 46-year hindcast dataset for seasonal-to-annual ensemble predictions has been created using a multi-model ensemble of 5 state-of-the-art coupled atmosphere-ocean circulation models. The multi-model outperforms any of the single-models in forecasting tropical Pacific SSTs because of reduced RMS errors and enhanced ensemble dispersion at all lead-times. Systematic errors are considerably reduced over the previous generation (DEMETER). Probabilistic skill scores show higher skill for the new multi-model ensemble than for DEMETER in the 4-6 month forecast range. However, substantially improved models would be required to achieve strongly statistical significant skill increases. The combination of ENSEMBLES and DEMETER into a grand multi-model ensemble does not improve the forecast skill further. Annual-range hindcasts show anomaly correlation skill of 0.5 up to 14 months ahead. A wide range of output from the multi-model simulations is becoming publicly available and the international community is invited to explore the full scientific potential of these data. Copyright 2009 by the American Geophysical Union.
Winter C. L., D. Nychka, 2010: Forecasting skill of model averages. Stochastic Environmental Research and Risk Assessment, 24( 3), 633- 638.10.1007/s00477-009-0350-ybca12e77239c59de3d916d001a390bb8http%3A%2F%2Flink.springer.com%2F10.1007%2Fs00477-009-0350-yhttp://link.springer.com/10.1007/s00477-009-0350-yGiven a collection of science-based computational models that all estimate states of the same environmental system, we compare the forecast skill of the average of the collection to the skills of the individual members. We illustrate our results through an analysis of regional climate model data and give general criteria for the average to perform more or less skillfully than the most skillful individual model, the “best” model. The average will only be more skillful than the best model if the individual models in the collection produce very different forecasts; if the individual forecasts generally agree, the average will not be as skillful as the best model.
Yoo J. H., I. S. Kang, 2005: Theoretical examination of a multi-model composite for seasonal prediction. Geophys. Res. Lett., 32(16), L18707, doi: 10.1029/2005GL023513.10.1029/2005GL023513317ae8f134f03af925014ce6ce640945http%3A%2F%2Fonlinelibrary.wiley.com%2Fdoi%2F10.1029%2F2005GL023513%2Fpdfhttp://onlinelibrary.wiley.com/doi/10.1029/2005GL023513/pdfThe performance of a multi-model composite for seasonal prediction is theoretically examined in terms of a correlation skill. On the basis of theoretical analysis, we discuss the improvement of skill in the multi-model composite using the APCN multi-model seasonal prediction dataset. Although the skill of multi-model composite is generally increased by increasing the number of models, the highest skill can be obtained by selecting several skillful models which are less dependent each other.
Yuan H. L., X. G. Gao, S. L. Mullen, S. Sorooshian, J. Du, and H. M. H. Juang, 2007: Calibration of probabilistic quantitative precipitation forecasts with an artificial neural network. Wea. Forecasting, 22, 1287- 1303.10.1175/2007WAF2006114.1d2155f224d760562e562cae65592b199http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F2007WtFor..22.1287Yhttp://adsabs.harvard.edu/abs/2007WtFor..22.1287YA feed-forward neural network is configured to calibrate the bias of a high-resolution probabilistic quantitative precipitation forecast (PQPF) produced by a 12-km version of the NCEP Regional Spectral Model (RSM) ensemble forecast system. Twice-daily forecasts during the 2002–2003 cool season (1 November–31 March, inclusive) are run over four U.S. Geological Survey (USGS) hydrologic unit regions of the southwest United States. Calibration is performed via a cross-validation procedure, where four months are used for training and the excluded month is used for testing. The PQPFs before and after the calibration over a hydrological unit region are evaluated by comparing the joint probability distribution of forecasts and observations. Verification is performed on the 4-km stage IV grid, which is used as “truth.” The calibration procedure improves the Brier score (BrS), conditional bias (reliability) and forecast skill, such as the Brier skill score (BrSS) and the ranked probability skill score (RPSS), relative to the sample frequency for all geographic regions and most precipitation thresholds. However, the procedure degrades the resolution of the PQPFs by systematically producing more forecasts with low nonzero forecast probabilities that drive the forecast distribution closer to the climatology of the training sample. The problem of degrading the resolution is most severe over the Colorado River basin and the Great Basin for relatively high precipitation thresholds where the sample of observed events is relatively small.