A Model Output Machine Learning Method for Grid Temperature Forecasts in the Beijing Area

Fund Project:

National Key Research and Development Program of China (Grant Nos. 2018YFF0300104 and 2017YFC0209804), the National Natural Science Foundation of China (Grant No. 11421101), and the Beijing Academy of Artificial Intelligence (BAAI)


doi: 10.1007/s00376-019-9023-z

Abstract: In this paper, the model output machine learning (MOML) method is proposed to simulate weather consultation and thereby improve the forecast results of numerical weather prediction (NWP). During weather consultation, forecasters obtain the final results by combining the observations with the NWP output and applying their experience. Developing a suitable post-processing algorithm to simulate this consultation process is therefore an important topic. MOML is a machine-learning-based post-processing method that matches NWP forecasts against observations through a regression function. By adopting different feature engineering of the datasets and training periods, the observational and model data can be processed into corresponding training and test sets. The MOML regression function applies an existing machine learning algorithm to the processed dataset to revise the output of NWP models in combination with the observations, so as to improve the weather forecasts. To test the new approach for grid temperature forecasts, the 2-m surface air temperature in the Beijing area from the ECMWF model is used. MOML with different feature engineering is compared against the ECMWF model and a modified model output statistics (MOS) method. MOML shows better numerical performance than the ECMWF model and MOS, especially in winter. MOML with a linear algorithm, a running training period, and a dataset based on spatial interpolation ideas performs best when the forecast lead time is within a few days; MOML with the Random Forest algorithm, a year-round training period, and a dataset containing surrounding gridpoint information performs best at longer lead times.
  • Figure 1.  Diagram of a regression tree generation algorithm, where $x_j$ is the optimal splitting feature and $a_j$ is the optimal splitting point.
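The split-selection step sketched in Figure 1 can be illustrated with an exhaustive CART-style search over candidate features and thresholds. This is an illustrative sketch, not the authors' implementation; the function and variable names are hypothetical.

```python
import numpy as np

def best_split(X, y):
    """Search for the splitting feature x_j and split point a_j that
    minimize the summed squared error of the two child nodes, as in the
    regression-tree generation of Figure 1."""
    best = (None, None, np.inf)  # (feature index j, split point a_j, loss)
    for j in range(X.shape[1]):
        for a in np.unique(X[:, j]):
            left, right = y[X[:, j] <= a], y[X[:, j] > a]
            if len(left) == 0 or len(right) == 0:
                continue
            # squared-error loss of predicting each child's mean
            loss = ((left - left.mean()) ** 2).sum() \
                 + ((right - right.mean()) ** 2).sum()
            if loss < best[2]:
                best = (j, a, loss)
    return best

# toy data: the target depends on the first feature only
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]])
y = np.array([0.0, 0.0, 10.0, 10.0])
j, a, _ = best_split(X, y)  # splits on feature 0 at threshold 2.0
```

A full tree-growing algorithm applies this search recursively to each child node until a stopping criterion (depth, node size) is met.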

    Figure 2.  Flow diagram of the MOML method. The blue cuboids are the original data in the Beijing area, and the green cuboids are the dataset with proper feature engineering. The yellow cuboid represents the process of machine learning, and the orange rectangle represents the output.

    Figure 3.  Diagram of datasets 1−3. Dataset 1 focuses on the fixed spatial point, and dataset 2 adds the surrounding eight grid points. Dataset 3 takes all the 30 spatial points of the Beijing area into account in a unified way.
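The three dataset layouts of Figure 3 can be illustrated with array slicing. The grid dimensions and array names below are hypothetical; the code only shows how the feature matrices differ in shape.

```python
import numpy as np

# hypothetical model output: (n_days, n_lat, n_lon, n_predictors)
n_days, n_lat, n_lon, n_pred = 10, 5, 6, 3
model = np.random.rand(n_days, n_lat, n_lon, n_pred)

# Dataset 1: predictors at the fixed grid point (i, j) only
i, j = 2, 3
ds1 = model[:, i, j, :]                  # shape (n_days, n_pred)

# Dataset 2: the point plus its eight surrounding grid points
patch = model[:, i - 1:i + 2, j - 1:j + 2, :]  # 3x3 neighbourhood
ds2 = patch.reshape(n_days, -1)          # shape (n_days, 9 * n_pred)

# Dataset 3: all grid points stacked in a unified way, so a single
# regression function is shared across the whole area
ds3 = model.reshape(n_days * n_lat * n_lon, n_pred)
```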

    Figure 4.  Results of the ${\rm{lr}}\_3\_{\rm{r}}$, ${\rm{rf}}\_2\_{\rm{y}}$ and ${\rm{mos}}\_{\rm{r}}$ models, using one-year temperature grid data in the Beijing area as the test set. Left: TRMSE (RMSE; units: °C). Right: Fa (forecast accuracy; units: %). (a) shows ${\rm{lr}}\_3\_{\rm{r}}$ has obvious advantages when the forecast time is 1−9 days, and (b) shows ${\rm{rf}}\_2\_{\rm{y}}$ is superior to other models in the whole forecast period, especially in the longer period.
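The two scores shown in Figure 4 can be sketched as follows. Here Fa is assumed to be the percentage of forecasts whose absolute error is within 2°C of the observation, a common operational convention; the paper's exact threshold is an assumption in this sketch.

```python
import numpy as np

def rmse(forecast, obs):
    """Root-mean-square error (°C)."""
    return float(np.sqrt(np.mean((forecast - obs) ** 2)))

def forecast_accuracy(forecast, obs, tol=2.0):
    """Percentage of forecasts within `tol` °C of the observation
    (the 2 °C threshold is an assumed convention)."""
    return float(100.0 * np.mean(np.abs(forecast - obs) <= tol))

obs = np.array([10.0, 12.0, 8.0, 15.0])
fc = np.array([11.0, 12.5, 5.0, 15.0])
score_rmse = rmse(fc, obs)          # about 1.60 °C
score_fa = forecast_accuracy(fc, obs)  # 3 of 4 within 2 °C -> 75.0
```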

    Figure 5.  A feasible solution fMOML to the grid temperature correction in the Beijing area. fMOML uses the ${\rm{lr}}\_3\_{\rm{r}}$ method for days 1−6 of the forecast lead time and the ${\rm{rf}}\_2\_{\rm{y}}$ method for days 7−15, and it performs well over the whole forecast period.
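The lead-time switch behind fMOML can be written as a simple selector. The function and argument names are illustrative; in practice the two arguments would be the corrected forecasts produced by the two MOML configurations.

```python
def f_moml(lead_day, lr_3_r_forecast, rf_2_y_forecast):
    """Hybrid correction following Figure 5: the linear model for short
    leads (days 1-6), Random Forest for longer leads (days 7-15)."""
    if not 1 <= lead_day <= 15:
        raise ValueError("lead_day must be within 1-15")
    return lr_3_r_forecast if lead_day <= 6 else rf_2_y_forecast

short = f_moml(3, "lr_3_r value", "rf_2_y value")   # linear branch
long = f_moml(10, "lr_3_r value", "rf_2_y value")   # Random Forest branch
```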

    Figure 6.  Results of grid temperature forecasts in the Beijing area in November (a), December (b), January (c) and February (d). In these months, the forecast results of the ECMWF model do not work well, and the linear method ${\rm{lr}}\_3\_{\rm{r}}$ is better than the other methods.

    Figure 7.  Results of grid temperature forecasts in the Beijing area in March (a), June (b), July (c), August (d) and October (e). In these five months, the forecast results of the ECMWF model are better than those in winter months. The linear methods are better than other methods when the forecast lead time is short, and the Random Forest algorithm is better when the forecast lead time is relatively long.

    Figure 8.  Results of grid temperature forecasts in the Beijing area in April (a), May (b) and September (c). In these three months, the forecast results of the ECMWF model are better than those in the other months. The multiple linear regression algorithm is best in the first few days of the forecast period, and the Random Forest algorithm is better than the other methods in the following days.

    Table 1.  The predictors taken from the ECMWF model and their abbreviations.

    Predictor                                    Abbreviation
    10-m zonal wind component                    10U
    10-m meridional wind component               10V
    2-m dewpoint temperature                     2D
    2-m temperature                              2T
    Convective available potential energy        CAPE
    Maximum temperature at 2 m in the last 6 h   MX2T6
    Mean sea level pressure                      MSL
    Minimum temperature at 2 m in the last 6 h   MN2T6
    Skin temperature                             SKT
    Snow depth water equivalent                  SD
    Snowfall water equivalent                    SF
    Sunshine duration                            SUND
    Surface latent heat flux                     SLHF
    Surface net solar radiation                  SSR
    Surface net thermal radiation                STR
    Surface pressure                             SP
    Surface sensible heat flux                   SSHF
    Top net thermal radiation                    TTR
    Total cloud cover                            TCC
    Total column water                           TCW
    Total precipitation                          TP

    Table 2.  List of methods used and their notation.

    Method                  Dataset   Training period   Notation
    ECMWF                   −         −                 ECMWF
    Univariate linear MOS   −         Running           mos_r
    MOML (lr)               1         Year-round        lr_1_y
    MOML (lr)               3         Year-round        lr_3_y
    MOML (lr)               3         Running           lr_3_r
    MOML (rf)               2         Year-round        rf_2_y
    MOML (rf)               3         Year-round        rf_3_y
  • Alessandrini, S., L. D. Monache, S. Sperati, and G. Cervone, 2015: An analog ensemble for short-term probabilistic solar power forecast. Applied Energy, 157, 95−110, https://doi.org/10.1016/j.apenergy.2015.08.011.
    Alpaydin, E., 2014: Introduction to Machine Learning. 3rd ed., The MIT Press, 640 pp.
    Bishop, C. M., 2006: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, 738 pp.
    Bogoslovskiy, N. N., S. I. Erin, I. A. Borodina, L. I. Kizhner, and K. A. Alipova, 2016: Satellite data assimilation in global numerical weather prediction model using kalman filter. Proceedings of SPIE 10035, 22nd International Symposium Atmospheric and Ocean Optics: Atmospheric Physics, Tomsk, Russian Federation, SPIE, 100356Z, https://doi.org/10.1117/12.2249275.
    Breiman, L., 2001a: Random forests. Machine Learning, 45(1), 5−32, https://doi.org/10.1023/A:1010933404324.
    Breiman, L., 2001b: Statistical modeling: The two cultures. Statistical Science, 16(3), 199−215.
    Buehner, M., R. McTaggart-Cowan, and S. Heilliette, 2017: An ensemble Kalman filter for numerical weather prediction based on variational data assimilation: VarEnKF. Mon. Wea. Rev., 145(2), 617−635, https://doi.org/10.1175/MWR-D-16-0106.1.
    Cabos, R., P. Hecker, N. Kneuper, and J. Schiefele, 2017: Wind forecast uncertainty prediction using machine learning techniques on big weather data. Proceedings of the 17th AIAA Aviation Technology, Integration, and Operations Conference, Denver, Colorado, AIAA.
    Cassola, F., and M. Burlando, 2012: Wind speed and wind energy forecast through Kalman filtering of numerical weather prediction model output. Applied Energy, 99, 154−166, https://doi.org/10.1016/j.apenergy.2012.03.054.
    Chattopadhyay, R., A. Vintzileos, and C. D. Zhang, 2013: A description of the Madden-Julian oscillation based on a self-organizing map. J. Climate, 26(5), 1716−1732, https://doi.org/10.1175/JCLI-D-12-00123.1.
    Cheng, W. Y. Y., and W. J. Steenburgh, 2007: Strengths and weaknesses of MOS, running-mean bias removal, and Kalman filter techniques for improving model forecasts over the western United States. Wea. Forecasting, 22(6), 1304−1318, https://doi.org/10.1175/2007WAF2006084.1.
    Delle Monache, L., T. Nipen, Y. B. Liu, G. Roux, and R. Stull, 2011: Kalman filter and analog schemes to postprocess numerical weather predictions. Mon. Wea. Rev., 139(11), 3554−3570, https://doi.org/10.1175/2011MWR3653.1.
    Domingos, P., 2012: A few useful things to know about machine learning. Communications of the ACM, 55, 78−87.
    Glahn, B., 2014: Determining an optimal decay factor for bias-correcting MOS temperature and dewpoint forecasts. Wea. Forecasting, 29(4), 1076−1090, https://doi.org/10.1175/WAF-D-13-00123.1.
    Glahn, B., M. Peroutka, J. Wiedenfeld, J. Wagner, G. Zylstra, B. Schuknecht, and B. Jackson, 2009: MOS uncertainty estimates in an ensemble framework. Mon. Wea. Rev., 137(1), 246−268, https://doi.org/10.1175/2008MWR2569.1.
    Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11(8), 1203−1211, https://doi.org/10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.
    Hart, K. A., W. J. Steenburgh, D. J. Onton, and A. J. Siffert, 2003: An evaluation of mesoscale-model-based model output statistics (MOS) during the 2002 Olympic and Paralympic winter games. Wea. Forecasting, 19(2), 200−218, https://doi.org/10.1175/1520-0434(2004)019<0200:AEOMMO>2.0.CO;2.
    Haupt, S. E., and B. Kosovic, 2016: Big data and machine learning for applied weather forecasts: Forecasting solar power for utility operations. Proceedings of 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa, IEEE, 496−501, https://doi.org/10.1109/SSCI.2015.79.
    Haupt, S. E., A. Pasini, and C. Marzban, 2009: Artificial Intelligence Methods in the Environmental Sciences. Springer, https://doi.org/10.1007/978-1-4020-9119-3.
    Jacks, E., J. Brent Bower, V. J. Dagostaro, J. Paul Dallavalle, M. C. Erickson, and J. C. Su, 2009: New NGM-based MOS guidance for maximum/minimum temperature, probability of precipitation, cloud amount, and surface wind. Wea. Forecasting, 5(5), 128−138, https://doi.org/10.1175/1520-0434(1990)005<0128:NNBMGF>2.0.CO;2.
    Junk, C., L. Delle Monache, and S. Alessandrini, 2015: Analog-based ensemble model output statistics. Mon. Wea. Rev., 143(7), 2909−2917, https://doi.org/10.1175/MWR-D-15-0095.1.
    Lakshmanan, V., E. Gilleland, A. McGovern, and M. Tingley, 2015: Machine Learning and Data Mining Approaches to Climate Science. Springer International Publishing, https://doi.org/10.1007/978-3-319-17220-0.
    Marzban, C., S. Sandgathe, and E. Kalnay, 2006: MOS, perfect prog, and reanalysis. Mon. Wea. Rev., 134(2), 657−663, https://doi.org/10.1175/MWR3088.1.
    Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83(3), 407−430, https://doi.org/10.1175/1520-0477(2002)083<0407:DIHRPM>2.3.CO;2.
    Mirkin, B., 2011: Data analysis, mathematical statistics, machine learning, data mining: Similarities and differences. Proceedings of 2011 International Conference on Advanced Computer Science and Information Systems, Jakarta, Indonesia, IEEE, 1−8.
    Mjolsness, E., and D. Decoste, 2001: Machine learning for science: State of the art and future prospects. Science, 293(5537), 2051−2055, https://doi.org/10.1126/science.293.5537.2051.
    Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122(529), 73−119, https://doi.org/10.1002/qj.49712252905.
    Monache, L. D., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141(10), 3498−3516, https://doi.org/10.1175/MWR-D-12-00281.1.
    Paegle, J., Q. Yang, and M. Wang, 1997: Predictability in limited area and global models. Meteor. Atmos. Phys., 63(1−2), 53−69, https://doi.org/10.1007/BF01025364.
    Pelosi, A., H. Medina, J. Van Den Bergh, S. Vannitsem, and G. B. Chirico, 2017: Adaptive Kalman filtering for postprocessing ensemble numerical weather predictions. Mon. Wea. Rev., 145(12), 4837−4584, https://doi.org/10.1175/MWR-D-17-0084.1.
    Peng, X. D., Y. Z. Che, and J. Chang, 2013: A novel approach to improve numerical weather prediction skills by using anomaly integration and historical data. J. Geophys. Res., 118(16), 8814−8826, https://doi.org/10.1002/jgrd.50682.
    Peng, X. D., Y. Z. Che, and J. Chang, 2014: Observational calibration of numerical weather prediction with anomaly integration. Proceedings of the EGU General Assembly Conference Abstracts, Vienna, Austria, EGU.
    Plenković, I. O., L. D. Monache, K. Horvath, M. Hrastinski, and A. Bajić, 2016: Probabilistic wind speed predictions with an analog ensemble. Proceedings of the 6th EMS Annual Meeting & 11th European Conference on Applied Climatology, Trst, Italija, ECAC.
    Rudack, D. E., and J. E. Ghirardelli, 2010: A comparative verification of localized aviation model output statistics program (LAMP) and numerical weather prediction (NWP) model forecasts of ceiling height and visibility. Wea. Forecasting, 25(4), 1161−1178, https://doi.org/10.1175/2010WAF2222383.1.
    Schiller, H., and R. Doerffer, 1999: Neural network for emulation of an inverse model operational derivation of case II water properties from MERIS data. Int. J. Remote Sens., 20(9), 1735−1746, https://doi.org/10.1080/014311699212443.
    Sperati, S., S. Alessandrini, and L. Delle Monache, 2017: Gridded probabilistic weather forecasts with an analog ensemble. Quart. J. Roy. Meteor. Soc., 143, 2874−2885, https://doi.org/10.1002/qj.3137.
    Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125(12), 3297−3319, https://doi.org/10.1175/1520-0493(1997)125<3297:EFANAT>2.0.CO;2.
    Tsang, L., Z. Chen, S. Oh, R. J. Marks, and A. T. C. Chang, 1992: Inversion of snow parameters from passive microwave remote sensing measurements by a neural network trained with a multiple scattering model. IEEE Trans. Geosci. Remote Sens., 30, 1015−1024, https://doi.org/10.1109/36.175336.
    Veenhuis, B. A., 2013: Spread calibration of ensemble MOS forecasts. Mon. Wea. Rev., 141(7), 2467−2482, https://doi.org/10.1175/MWR-D-12-00191.1.
Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135(6), 2379−2390, https://doi.org/10.1175/MWR3402.1.
    Woo, W. C., and W. K. Wong, 2017: Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere, 8(3), 48, https://doi.org/10.3390/atmos8030048.
    Wu, J., H. Q. Pei, Y. Shi, J. Y. Zhang, and Q. H. Wang, 2007: The forecasting of surface air temperature using BP-MOS method based on the numerical forecasting results. Scientia Meteorologica Sinica, 27(4), 430−435, https://doi.org/10.3969/j.issn.1009-0827.2007.04.012. (in Chinese)
    Wu, Q., M. Han, H. Guo, and T. Su, 2016: The optimal training period scheme of MOS temperature forecast. Journal of Applied Meteorological Science, 27(4), 426−434, https://doi.org/10.11898/1001-7313.20160405. (in Chinese)
    Zhang, X. N., J. Cao, S. Y. Yang, and M. H. Qi, 2011: Multi-model compositive MOS method application of fine temperature forecast. Journal of Yunnan University, 33(1), 67−71. (in Chinese)


Manuscript History

Manuscript received: 02 February 2019
Manuscript revised: 15 April 2019
Manuscript accepted: 13 May 2019


    Corresponding author: Pingwen ZHANG, pzhang@pku.edu.cn
  • 1. School of Mathematical Sciences, Peking University, Beijing 100871, China
  • 2. Institute of Atmospheric Physics Chinese Academy of Sciences, Beijing 100029, China
  • 3. Beijing Meteorological Service, Beijing 100089, China
  • 4. School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China



    • Weather forecasting plays an important role in many fields, such as agriculture, transportation, industry, commerce, and atmospheric science research. In the past, people forecast the weather using meteorological knowledge, statistics, and the observational data collected at weather stations. Numerical weather prediction (NWP) has made great progress in the last 50 years with the development of computer technology, modeling techniques, and observations (Molteni et al., 1996; Toth and Kalnay, 1997). Nevertheless, NWP model forecasts contain systematic biases due to imperfect model physics, initial conditions, and boundary conditions (Paegle et al., 1997; Mass et al., 2002; Hart et al., 2003; Cheng and Steenburgh, 2007; Rudack and Ghirardelli, 2010).

      Because the output of NWP and observations have different systematic errors, the forecasting performance for various regions, seasons and weather processes differs. Before a weather forecast is released, a weather consultation is indispensable for further improving its accuracy. The forecasters obtain the final results during this weather consultation by combining observations with the NWP results and giving opinions based on their experience. Weather consultation is in effect a manual post-processing of the NWP results, so the professional knowledge and practical experience of individual forecasters have a crucial impact on the forecast results. With growing data volumes and increasingly demanding forecast requirements, manual weather consultation alone can no longer keep pace, and suitable post-processing algorithms are needed to assist it (Hart et al., 2003; Cheng and Steenburgh, 2007; Wilks and Hamill, 2007; Veenhuis, 2013).

      In order to remove systematic errors and improve the output from NWP models, a variety of post-processing methods have been developed for simulating weather consultation (Wilks and Hamill, 2007; Veenhuis, 2013)—for example, model output statistics (MOS) (Glahn and Lowry, 1972; Cheng and Steenburgh, 2007; Wu et al., 2007; Glahn et al., 2009; Jacks et al., 2009; Zhang et al., 2011; Glahn, 2014; Wu et al., 2016), the analog ensemble (Monache et al., 2013; Alessandrini et al., 2015; Junk et al., 2015; Plenković et al., 2016; Sperati et al., 2017), the Kalman filter (Delle Monache et al., 2011; Cassola and Burlando, 2012; Bogoslovskiy et al., 2016; Buehner et al., 2017; Pelosi et al., 2017), and anomaly numerical-correction with observations (Peng et al., 2013, 2014), among which MOS is one of the most commonly used to produce unbiased forecasts (Glahn et al., 2009). MOS uses multiple linear regression to produce an improved forecast at specific locations by using model forecast variables and prior observations as predictors (Marzban et al., 2006; Cheng and Steenburgh, 2007). MOS remains a useful tool: during the 2002 Winter Olympic Games, MM5-based MOS outperformed the native forecasts produced by MM5 and was equally or more skillful than human-generated forecasts by the Olympic Forecast Team (Hart et al., 2003). Glahn (2014) used MOS with a decay factor to predict temperature and dewpoint, and showed how different values of the decay factor affect the resulting forecasts.
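The core MOS idea of regressing observations on model forecast variables can be sketched with ordinary least squares. The predictors and coefficients below are synthetic, purely for illustration of the fit-then-correct workflow.

```python
import numpy as np

# hypothetical training data: NWP predictors (e.g. 2T, MSL, TCC) and
# station observations over a training period
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))                        # model predictors
y_train = 1.5 * X_train[:, 0] - 0.5 * X_train[:, 2] + 2.0  # "observed" temperature

# MOS step: fit a multiple linear regression of observations on model output
A = np.column_stack([X_train, np.ones(len(X_train))])  # append an intercept column
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# corrected forecast for a new model output vector
X_new = np.array([[1.0, 0.0, -1.0]])
y_corrected = np.column_stack([X_new, np.ones(1)]) @ coef
```

Because the synthetic observations are exactly linear in the predictors, the regression recovers the generating coefficients and the corrected forecast for `X_new` is 1.5 + 0.5 + 2.0 = 4.0.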

      Although machine learning and statistics both draw conclusions from data, they belong to two different modeling cultures. Statistics assumes that the data are generated by a given stochastic data model; statistical methods have few parameters, whose values are estimated from the data. Machine learning instead uses algorithmic models and treats the data-generating mechanism as unknown: it seeks an algorithm that can be fitted to the data, typically with many parameters (Breiman, 2001b). Machine learning has developed rapidly in fields outside statistics. It can be used both on large, complex datasets and as a more accurate and informative alternative to data modeling on smaller datasets (Mirkin, 2011), and it is becoming increasingly important to the development of science and technology (Mjolsness and Decoste, 2001). When applying machine learning to practical problems, one of the most important steps is designing suitable feature engineering and data structures (Domingos, 2012): the quality of the feature engineering directly affects the final result, and practical problems with special data structures require targeted feature engineering.

      Since weather forecasts depend highly on data information and technology, how to make better use of machine learning and big-data technology to improve weather forecasts has become a research hotspot. Machine learning has been used in meteorology for decades (Haupt et al., 2009; Lakshmanan et al., 2015; Haupt and Kosovic, 2016; Cabos et al., 2017). For instance, the neural network technique was applied to the inversion of a multiple scattering model to estimate snow parameters from passive microwave measurements (Tsang et al., 1992). Schiller and Doerffer (1999) used a neural network technique for inverting a radiative transfer forward model to estimate the concentration of phytoplankton pigment from Medium Resolution Imaging Spectrometer data