
Enhancing Deep Learning Soil Moisture Forecasting Models by Integrating Physics-based Models


doi:  10.1007/s00376-023-3181-8

  • Abstract: Accurate soil moisture (SM) prediction is critical for understanding hydrological processes. Physics-based (PB) models exhibit large uncertainties in SM predictions arising from uncertain parameterizations and insufficient representation of land-surface processes. Deep learning (DL) models have also been widely used for SM prediction in recent years. However, few pure DL models achieve notably high success rates because they lack physical information. We therefore developed hybrid models that integrate the outputs of PB models into DL models to improve SM predictions. To this end, we first developed a hybrid model based on the attention mechanism to take advantage of PB models at each forecast time scale (attention model). We then built an ensemble model that combines the advantages of different hybrid schemes (ensemble model). We used SM forecasts from the Global Forecast System to enhance the convolutional long short-term memory (ConvLSTM) model for 1–16-day SM predictions. The performance of the proposed hybrid models was evaluated and compared with that of two existing hybrid models. The results showed that the attention model could leverage the benefits of PB models and achieved the best predictability of drought events among the hybrid models. Moreover, the ensemble model performed best among all hybrid models at all forecast time scales and under different soil conditions. Notably, the ensemble model outperformed the pure DL model at 79.5% of in situ stations for 16-day predictions. These findings suggest that the proposed hybrid models can adequately exploit the benefits of PB model outputs to aid DL models in making SM predictions.
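    Chinese abstract (translated): Accurate soil moisture prediction is critically important. Physics-based soil moisture predictions show large uncertainties, owing in part to inaccurate representation of land-surface processes. Although deep learning models have recently been widely applied to soil moisture prediction, few of them deliver satisfactory results in medium-range forecasting because they lack physical information. We developed hybrid forecast models that effectively fuse the forecast information of physical models into deep learning models, thereby improving soil moisture prediction. First, based on the attention mechanism, we fully combined the respective strengths of deep learning and physical models at different spatiotemporal scales. We then combined the strengths of different hybrid schemes to build an ensemble hybrid forecast model. To validate the proposed models, we fused GFS soil moisture forecasts into the ConvLSTM model over China for 1–16-day soil moisture predictions. The results show that the proposed hybrid forecast models were the best models across forecast time scales, soil conditions, and extreme drought event forecasts. The proposed hybrid approach can effectively improve medium-range soil moisture forecasts and offers a reliable example of using physical-model information to improve deep learning forecasts.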
  • Figure 1.  Details of (a) the ConvLSTM-ED model and (b) the inner structure of ConvLSTM.

    Figure 2.  The model structures of (a) the condition model, (b) the attention model, and (c) the attention block. The dimension of each intermediate feature of the attention block is annotated. H, W and F are the sizes of the height, width and features dimensions, respectively. The abbreviations of intermediate features are the same as given in the main text. The colors in (c) indicate weights.

    Figure 3.  The mean (a) R and (b) ubRMSE of different forecast models at different forecast time scales. Dashed lines denote the performance of SMAP L4 data evaluated against in situ observations. The abbreviations of model names are the same as in section 3.

    Figure 4.  The spatial distribution of performance (R) in 1-, 7- and 16-day forecasts of different models. We used the average model as the baseline hybrid model to evaluate the performances of the different hybrid models. Panels (a–c) show the performance of the average model, while the remaining rows show the differences between the R of the target model and the R of the average model. Red points indicate that the model improved the performance compared to the average model, while blue points show a declined performance.

    Figure 5.  The R of the (a–c) GFS, (d–f) ConvLSTM-ED, and (g–i) attention models, and (j–l) the improvement of the attention model compared to the condition model.

    Figure 6.  TCA-based SNR of different models. The triplets of the TCA are [*, ERA5-Land, SoMo.ml], where * denotes the forecast models. Panels (a–c) show the average model. The remaining rows show the difference between the SNR of the target model and the average model.

    Figure 7.  TCA-based SNR of (a–c) ConvLSTM-ED, (d–f) ensemble model predictions and (h) the SMAP L4 datasets. The triplet of TCA is [*, ERA5-Land, SoMo.ml], where * denotes the forecast models and SMAP L4.

    Figure 8.  The kernel density curve of the SWDI of the in situ observations from different forecast models (lines with different colors) at the (a) week-1 and (b) week-2 forecast.

    Table 1.  The probability of an accurate drought event detection by different models over different climate regions based on in situ SM observations. The abbreviations of the model names are the same as in Fig. 1. The week 1 and week 2 columns represent the ability to forecast the 1-week and 2-week drought. n denotes the number of stations located over target climate regions.

    Model      Tropical (n=16)   Arid (n=91)       Temperate (n=642) Cold (n=350)      Polar (n=30)
               Week 1   Week 2   Week 1   Week 2   Week 1   Week 2   Week 1   Week 2   Week 1   Week 2
    GFS        0.578    0.493    0.511#   0.477#   0.665*   0.582    0.506#   0.469#   0.396#   0.370#
    ConvLSTM   0.720    0.661    0.573*   0.521    0.605#   0.560    0.575    0.532    0.656    0.637
    average    0.521    0.479    0.536    0.492    0.643    0.592    0.542    0.502    0.529    0.502
    condition  0.744*   0.693*   0.543    0.519    0.605#   0.532#   0.582    0.545    0.640    0.578
    attention  0.655    0.630    0.570    0.536*   0.629    0.598*   0.599*   0.550*   0.696*   0.644*
    ensemble   0.506#   0.474#   0.551    0.531    0.613    0.564    0.571    0.538    0.622    0.577
    *Best model to detect drought events over the target climate region.
    #Worst model to detect drought events over the target climate region.
  • Beck, H. E., and Coauthors, 2021: Evaluation of 18 satellite- and model-based soil moisture products using in situ measurements from 826 sensors. Hydrology and Earth System Sciences, 25, 17−40, https://doi.org/10.5194/hess-25-17-2021.
    Brooks, P. D., J. Chorover, Y. Fan, S. E. Godsey, R. M. Maxwell, J. P. McNamara, and C. Tague, 2015: Hydrological partitioning in the critical zone: Recent advances and opportunities for developing transferable understanding of water cycle dynamics. Water Resources Research, 51, 6973−6987, https://doi.org/10.1002/2015WR017039.
    Cai, Y. L., P. R. Fan, S. Lang, M. Y. Li, Y. Muhammad, and A. X. Liu, 2022: Downscaling of SMAP soil moisture data by using a deep belief network. Remote Sensing, 14, 5681, https://doi.org/10.3390/rs14225681.
    Cho, K., B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, 2014: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv: 1406.1078, https://doi.org/10.48550/arXiv.1406.1078.
    Crow, W. T., F. Chen, R. H. Reichle, Y. Xia, and Q. Liu, 2018: Exploiting soil moisture, precipitation, and streamflow observations to evaluate soil moisture/runoff coupling in land surface models. Geophysical Research Letters, 45, 4869−4878, https://doi.org/10.1029/2018GL077193.
    Cui, Z., Y. L. Zhou, S. L. Guo, J. Wang, and C. Y. Xu, 2022: Effective improvement of multi-step-ahead flood forecasting accuracy through encoder-decoder with an exogenous input structure. Journal of Hydrology, 609, 127764, https://doi.org/10.1016/j.jhydrol.2022.127764.
    Daw, A., A. Karpatne, W. D. Watkins, J. S. Read, and V. Kumar, 2022: Physics-guided neural networks (PGNN): An application in lake temperature modeling. Knowledge Guided Machine Learning, Chapman and Hall/CRC, 353−372.
    de Rosnay, P., J. Muñoz-Sabater, C. Albergel, L. Isaksen, S. English, M. Drusch, and J. P. Wigneron, 2020: SMOS brightness temperature forward modelling and long term monitoring at ECMWF. Remote Sensing of Environment, 237, 111424, https://doi.org/10.1016/j.rse.2019.111424.
    Dharssi, I., K. J. Bovis, B. Macpherson, and C. P. Jones, 2011: Operational assimilation of ASCAT surface soil wetness at the Met Office. Hydrology and Earth System Sciences, 15, 2729−2746, https://doi.org/10.5194/hess-15-2729-2011.
    Dorigo, W., and Coauthors, 2017: ESA CCI Soil Moisture for improved Earth system understanding: State-of-the art and future directions. Remote Sensing of Environment, 203, 185−215, https://doi.org/10.1016/j.rse.2017.07.001.
    Dorigo, W. A., and Coauthors, 2013: Global automated quality control of in situ soil moisture data from the international soil moisture network. Vadose Zone Journal, 12, 1−21, https://doi.org/10.2136/vzj2012.0097.
    ElSaadani, M., E. Habib, A. M. Abdelhameed, and M. Bayoumi, 2021: Assessment of a spatiotemporal deep learning approach for soil moisture prediction and filling the gaps in between soil moisture observations. Frontiers in Artificial Intelligence, 4, 636234, https://doi.org/10.3389/frai.2021.636234.
    Entekhabi, D., R. H. Reichle, R. D. Koster, and W. T. Crow, 2010: Performance metrics for soil moisture retrievals and application requirements. Journal of Hydrometeorology, 11, 832−840, https://doi.org/10.1175/2010JHM1223.1.
    Esit, M., S. Kumar, A. Pandey, D. M. Lawrence, I. Rangwala, and S. Yeager, 2021: Seasonal to multi-year soil moisture drought forecasting. npj Climate and Atmospheric Science, 4, 16, https://doi.org/10.1038/s41612-021-00172-z.
    Fan, Y., and H. van den Dool, 2011: Bias correction and forecast skill of NCEP GFS ensemble week-1 and week-2 precipitation, 2-m surface air temperature, and soil moisture forecasts. Weather and Forecasting, 26, 355−370, https://doi.org/10.1175/WAF-D-10-05028.1.
    Fang, K., and C. P. Shen, 2020: Near-real-time forecast of satellite-based soil moisture using long short-term memory with an adaptive data integration kernel. Journal of Hydrometeorology, 21, 399−413, https://doi.org/10.1175/JHM-D-19-0169.1.
    Fang, K., M. Pan, and C. P. Shen, 2019: The value of SMAP for long-term soil moisture estimation with the help of deep learning. IEEE Transactions on Geoscience and Remote Sensing, 57, 2221−2233, https://doi.org/10.1109/TGRS.2018.2872131.
    Fang, K., C. P. Shen, D. Kifer, and X. Yang, 2017: Prolongation of SMAP to spatiotemporally seamless coverage of continental U.S. using a deep learning neural network. Geophys. Res. Lett., 44, 11 030−11 039, https://doi.org/10.1002/2017GL075619.
    Feng, D. P., J. T. Liu, K. Lawson, and C. P. Shen, 2022: Differentiable, learnable, regionalized process-based models with multiphysical outputs can approach state-of-the-art hydrologic prediction accuracy. Water Resour. Res., 58, e2022WR032404, https://doi.org/10.1029/2022WR032404.
    Ford, T. W., and S. M. Quiring, 2019: Comparison of contemporary in situ, model, and satellite remote sensing soil moisture with a focus on drought monitoring. Water Resour. Res., 55, 1565−1582, https://doi.org/10.1029/2018WR024039.
    Gruber, A., C. H. Su, S. Zwieback, W. Crow, W. Dorigo, and W. Wagner, 2016: Recent advances in (soil moisture) triple collocation analysis. International Journal of Applied Earth Observation and Geoinformation, 45, 200−211, https://doi.org/10.1016/j.jag.2015.09.002.
    Heimhuber, V., M. G. Tulbure, and M. Broich, 2017: Modeling multidecadal surface water inundation dynamics and key drivers on large river basin scale using multiple time series of earth-observation and river flow data. Water Resour. Res., 53, 1251−1269, https://doi.org/10.1002/2016WR019858.
    Huang, F. N., W. Shangguan, Q. L. Li, L. Li, and Y. Zhang, 2023: Beyond prediction: An integrated post-hoc approach to interpret complex model in hydrometeorology. Environmental Modelling & Software, 167, 105762, https://doi.org/10.1016/j.envsoft.2023.105762.
    Kanamitsu, M., C.-H. Lu, J. Schemm, and W. Ebisuzaki, 2003: The predictability of soil moisture and near-surface temperature in Hindcasts of the NCEP seasonal forecast model. J. Climate, 16, 510−521, https://doi.org/10.1175/1520-0442(2003)016<0510:TPOSMA>2.0.CO;2.
    Kannan, A., G. Tsagkatakis, R. Akbar, D. Selva, V. Ravindra, R. Levinson, S. Nag, and M. Moghaddam, 2022: Forecasting soil moisture using a deep learning model integrated with passive microwave retrieval. Preprints, IGARSS 2022−2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, IEEE, 6112−6114, https://doi.org/10.1109/IGARSS46834.2022.9883245.
    Karthikeyan, L., and A. K. Mishra, 2021: Multi-layer high-resolution soil moisture estimation using machine learning over the United States. Remote Sensing of Environment, 266, 112706, https://doi.org/10.1016/j.rse.2021.112706.
    Kim, H., and Coauthors, 2020: Global scale error assessments of soil moisture estimates from microwave-based active and passive satellites and land surface models over forest and mixed irrigated/dryland agriculture regions. Remote Sensing of Environment, 251, 112052, https://doi.org/10.1016/j.rse.2020.112052.
    Kingma, D. P., and J. Ba, 2017: Adam: A method for stochastic optimization. arXiv:1412.6980, https://doi.org/10.48550/arXiv.1412.6980.
    Klocek, S., and Coauthors, 2022: MS-nowcasting: Operational precipitation nowcasting with convolutional LSTMs at Microsoft weather. arXiv:2111.09954, https://doi.org/10.48550/arXiv.2111.09954.
    Lawston, P. M., J. A. Santanello Jr., and S. V. Kumar, 2017: Irrigation signals detected from SMAP soil moisture retrievals. Geophys. Res. Lett., 44, 11 860−11 867, https://doi.org/10.1002/2017GL075733.
    Lee, J., S. Park, J. Im, C. Yoo, and E. Seo, 2022: Improved soil moisture estimation: Synergistic use of satellite observations and land surface models over CONUS based on machine learning. J. Hydrol., 609, 127749, https://doi.org/10.1016/j.jhydrol.2022.127749.
    Li, L., Y. J. Dai, W. Shangguan, N. Wei, Z. W. Wei, and S. Gupta, 2022a: Multistep forecasting of soil moisture using spatiotemporal deep encoder–decoder networks. Journal of Hydrometeorology, 23, 337−350, https://doi.org/10.1175/JHM-D-21-0131.1.
    Li, L., Y. J. Dai, W. Shangguan, Z. W. Wei, N. Wei, and Q. L. Li, 2022b: Causality-structured deep learning for soil moisture predictions. Journal of Hydrometeorology, 23, 1315−1331, https://doi.org/10.1175/JHM-D-21-0206.1.
    Li, Q. L., Z. Y. Wang, W. Shangguan, L. Li, Y. F. Yao, and F. H. Yu, 2021: Improved daily SMAP satellite soil moisture prediction over China using deep learning model with transfer learning. J. Hydrol., 600, 126698, https://doi.org/10.1016/j.jhydrol.2021.126698.
    Li, Y., S. Grimaldi, V. R. N. Pauwels, and J. P. Walker, 2018: Hydrologic model calibration using remotely sensed soil moisture and discharge measurements: The impact on predictions at gauged and ungauged locations. J. Hydrol., 557, 897−909, https://doi.org/10.1016/j.jhydrol.2018.01.013.
    Liu, L. C., and Coauthors, 2022: KGML-ag: A modeling framework of knowledge-guided machine learning to simulate agroecosystems: A case study of estimating N2O emission using data from mesocosm experiments. Geoscientific Model Development, 15, 2839−2858, https://doi.org/10.5194/gmd-15-2839-2022.
    Liu, W. B., T. Yang, F. B. Sun, H. Wang, Y. Feng, and M. Y. Du, 2021: Observation-constrained projection of global flood magnitudes with anthropogenic warming. Water Resour. Res., 57, e2020WR028830, https://doi.org/10.1029/2020WR028830.
    Luo, L. F., E. F. Wood, and M. Pan, 2007: Bayesian merging of multiple climate model forecasts for seasonal hydrological predictions. J. Geophys. Res.: Atmos., 112, D10102, https://doi.org/10.1029/2006JD007655.
    Maggioni, V., E. N. Anagnostou, and R. H. Reichle, 2012: The impact of model and rainfall forcing errors on characterizing soil moisture uncertainty in land surface modeling. Hydrology and Earth System Sciences, 16, 3499−3515, https://doi.org/10.5194/hess-16-3499-2012.
    Martínez-Fernández, J., A. González-Zamora, N. Sánchez, and A. Gumuzzio, 2015: A soil water based index as a suitable agricultural drought indicator. J. Hydrol., 522, 265−273, https://doi.org/10.1016/j.jhydrol.2014.12.051.
    Mishra, A., T. Vu, A. V. Veettil, and D. Entekhabi, 2017: Drought monitoring with soil moisture active passive (SMAP) measurements. J. Hydrol., 552, 620−632, https://doi.org/10.1016/j.jhydrol.2017.07.033.
    Muñoz-Sabater, J., H. Lawrence, C. Albergel, P. Rosnay, L. Isaksen, S. Mecklenburg, Y. Kerr, and M. Drusch, 2019: Assimilation of SMOS brightness temperatures in the ECMWF integrated forecasting system. Quart. J. Roy. Meteor. Soc., 145, 2524−2548, https://doi.org/10.1002/qj.3577.
    Muñoz-Sabater, J., and Coauthors, 2021: ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth System Science Data, 13, 4349−4383, https://doi.org/10.5194/essd-13-4349-2021.
    O, S., and R. Orth, 2021: Global soil moisture data derived through machine learning trained with in-situ measurements. Scientific Data, 8, 170, https://doi.org/10.1038/s41597-021-00964-1.
    Peng, J., and Coauthors, 2021: A roadmap for high-resolution satellite soil moisture applications–confronting product characteristics with user requirements. Remote Sensing of Environment, 252, 112162, https://doi.org/10.1016/j.rse.2020.112162.
    Read, J. S., and Coauthors, 2019: Process-guided deep learning predictions of lake water temperature. Water Resour. Res., 55, 9173−9190, https://doi.org/10.1029/2019WR024922.
    Reichle, R. H., and Coauthors, 2017: Assessment of the SMAP Level-4 surface and root-zone soil moisture product using in situ measurements. Journal of Hydrometeorology, 18, 2621−2645, https://doi.org/10.1175/JHM-D-17-0063.1.
    Santanello, J. A. Jr., P. Lawston, S. Kumar, and E. Dennis, 2019: Understanding the impacts of soil moisture initial conditions on NWP in the context of land–atmosphere coupling. Journal of Hydrometeorology, 20, 793−819, https://doi.org/10.1175/JHM-D-18-0186.1.
    Seneviratne, S. I., T. Corti, E. L. Davin, M. Hirschi, E. B. Jaeger, I. Lehner, B. Orlowsky, and A. J. Teuling, 2010: Investigating soil moisture–climate interactions in a changing climate: A review. Earth-Science Reviews, 99, 125−161, https://doi.org/10.1016/j.earscirev.2010.02.004.
    Slater, L. J., and Coauthors, 2023: Hybrid forecasting: Blending climate predictions with AI models. Hydrology and Earth System Sciences, 27, 1865−1889, https://doi.org/10.5194/hess-27-1865-2023.
    Speight, L. J., M. D. Cranston, C. J. White, and L. Kelly, 2021: Operational and emerging capabilities for surface water flood forecasting. WIREs Water, 8, e1517, https://doi.org/10.1002/wat2.1517.
    Stoffelen, A., 1998: Toward the true near-surface wind speed: Error modeling and calibration using triple collocation. J. Geophys. Res.: Oceans, 103, 7755−7766, https://doi.org/10.1029/97JC03180.
    Wigneron, J. P., and Coauthors, 2018: SMOS-IC: Current status and overview of soil moisture and VOD applications. Preprints, IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symp., Valencia, Spain, IEEE, 1451−1453, https://doi.org/10.1109/IGARSS.2018.8519382.
    Willard, J., X. W. Jia, S. M. Xu, M. Steinbach, and V. Kumar, 2022a: Integrating scientific knowledge with machine learning for engineering and environmental systems. ACM Computing Surveys, 55, 66, https://doi.org/10.1145/3514228.
    Wood, A. W., and D. P. Lettenmaier, 2006: A test bed for new seasonal hydrologic forecasting approaches in the western United States. Bull. Amer. Meteor. Soc., 87, 1699−1712, https://doi.org/10.1175/BAMS-87-12-1699.
    Xia, Y. L., J. Sheffield, M. B. Ek, J. R. Dong, N. Chaney, H. L. Wei, J. Meng, and E. F. Wood, 2014: Evaluation of multi-model simulated soil moisture in NLDAS-2. J. Hydrol., 512, 107−125, https://doi.org/10.1016/j.jhydrol.2014.02.027.
    Yamazaki, D., and Coauthors, 2017: A high-accuracy map of global terrain elevations. Geophys. Res. Lett., 44, 5844−5853, https://doi.org/10.1002/2017GL072874.
    Yang, H. C., H. X. Wang, G. B. Fu, H. M. Yan, P. P. Zhao, and M. H. Ma, 2017: A modified soil water deficit index (MSWDI) for agricultural drought monitoring: Case study of Songnen Plain, China. Agricultural Water Management, 194, 125−138, https://doi.org/10.1016/j.agwat.2017.07.022.
    Yin, J. F., C. R. Hain, X. W. Zhan, J. R. Dong, and M. Ek, 2019: Improvements in the forecasts of near-surface variables in the Global Forecast System (GFS) via assimilating ASCAT soil moisture retrievals. J. Hydrol., 578, 124018, https://doi.org/10.1016/j.jhydrol.2019.124018.
    Zhang, R. Q., and Coauthors, 2021: Assessment of agricultural drought using soil water deficit index based on ERA5-land soil moisture data in four southern provinces of China. Agriculture, 11, 411, https://doi.org/10.3390/agriculture11050411.
  • [1] Yunqing LIU, Lu YANG, Mingxuan CHEN, Linye SONG, Lei HAN, Jingfeng XU, 2024: A Deep Learning Approach for Forecasting Thunderstorm Gusts in the Beijing–Tianjin–Hebei Region, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-023-3255-7
    [2] Tingyu WANG, Ping HUANG, Xianke YANG, 2024: Understanding the Low Predictability of the 2015/16 El Niño Event Based on a Deep Learning Model, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-024-3238-3
    [3] Hanxiao Yuan, Yang Liu, Qiuhua TANG, Jie LI, Guanxu CHEN, Wuxu CAI, 2024: ST-LSTM-SA:A new ocean sound velocity fields prediction model based on deep learning, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-024-3219-6
    [4] Lei HAN, Mingxuan CHEN, Kangkai CHEN, Haonan CHEN, Yanbiao ZHANG, Bing LU, Linye SONG, Rui QIN, 2021: A Deep Learning Method for Bias Correction of ECMWF 24–240 h Forecasts, ADVANCES IN ATMOSPHERIC SCIENCES, 38, 1444-1459.  doi: 10.1007/s00376-021-0215-y
    [5] Jiang HUANGFU, Zhiqun HU, Jiafeng ZHENG, Lirong WANG, Yongjie ZHU, 2024: Study on Quantitative Precipitation Estimation by Polarimetric Radar Using Deep Learning, ADVANCES IN ATMOSPHERIC SCIENCES, 41, 1147-1160.  doi: 10.1007/s00376-023-3039-0
    [6] D. R. Johnson, Zhuojian Yuan, 1998: The Development and Initial Tests of an Atmospheric Model Based on a Vertical Coordinate with a Smooth Transition from Terrain Following to Isentropic Coordinates, ADVANCES IN ATMOSPHERIC SCIENCES, 15, 283-299.  doi: 10.1007/s00376-998-0001-0
    [7] Kanghui ZHOU, Jisong SUN, Yongguang ZHENG, Yutao ZHANG, 2022: Quantitative Precipitation Forecast Experiment Based on Basic NWP Variables Using Deep Learning, ADVANCES IN ATMOSPHERIC SCIENCES, 39, 1472-1486.  doi: 10.1007/s00376-021-1207-7
    [8] Xiaoran DONG, Yafei NIE, Jinfei WANG, Hao LUO, Yuchun GAO, Yun WANG, Jiping LIU, Dake CHEN, Qinghua YANG, 2024: Deep Learning Shows Promise for Seasonal Prediction of Antarctic Sea Ice in a Rapid Decline Scenario, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-024-3380-y
    [9] Pumeng LYU, Tao TANG, Fenghua LING, Jing-Jia LUO, Niklas BOERS, Wanli OUYANG, Lei BAI, 2024: ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-024-3316-6
    [10] Chentao SONG, Jiang ZHU, Xichen LI, 2024: Assessments of Data-Driven Deep Learning Models on One-Month Predictions of Pan-Arctic Sea Ice Thickness, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-023-3259-3
    [11] Tingyu WANG, Ping HUANG, 2024: Superiority of a Convolutional Neural Network Model over Dynamical Models in Predicting Central Pacific ENSO, ADVANCES IN ATMOSPHERIC SCIENCES, 41, 141-154.  doi: 10.1007/s00376-023-3001-1
    [12] Ya WANG, Gang HUANG, Baoxiang PAN, Pengfei LIN, Niklas BOERS, Weichen TAO, Yutong CHEN, BO LIU, Haijie LI, 2024: Correcting Climate Model Sea Surface Temperature Simulations with Generative Adversarial Networks: Climatology, Interannual Variability, and Extremes, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-024-3288-6
    [13] Gang HUANG, Ya WANG, Yoo-Geun HAM, Bin MU, Weichen TAO, Chaoyang XIE, 2024: Toward a Learnable Climate Model in the Artificial Intelligence Era, ADVANCES IN ATMOSPHERIC SCIENCES.  doi: 10.1007/s00376-024-3305-9
    [14] Temesgen Gebremariam ASFAW, Jing-Jia LUO, 2024: Downscaling Seasonal Precipitation Forecasts over East Africa with Deep Convolutional Neural Networks, ADVANCES IN ATMOSPHERIC SCIENCES, 41, 449-464.  doi: 10.1007/s00376-023-3029-2
    [15] Hui LIU, Bo HU, Yuesi WANG, Guangren LIU, Liqin TANG, Dongsheng JI, Yongfei BAI, Weikai BAO, Xin CHEN, Yunming CHEN, Weixin DING, Xiaozeng HAN, Fei HE, Hui HUANG, Zhenying HUANG, Xinrong LI, Yan LI, Wenzhao LIU, Luxiang LIN, Zhu OUYANG, Boqiang QIN, Weijun SHEN, Yanjun SHEN, Hongxin SU, Changchun SONG, Bo SUN, Song SUN, Anzhi WANG, Genxu WANG, Huimin WANG, Silong WANG, Youshao WANG, Wenxue WEI, Ping XIE, Zongqiang XIE, Xiaoyuan YAN, Fanjiang ZENG, Fawei ZHANG, Yangjian ZHANG, Yiping ZHANG, Chengyi ZHAO, Wenzhi ZHAO, Xueyong ZHAO, Guoyi ZHOU, Bo ZHU, 2017: Two Ultraviolet Radiation Datasets that Cover China, ADVANCES IN ATMOSPHERIC SCIENCES, 34, 805-815.  doi: 10.1007/s00376-017-6293-1
    [16] Ruian TIE, Chunxiang SHI, Gang WAN, Xingjie HU, Lihua KANG, Lingling GE, 2022: CLDASSD: Reconstructing Fine Textures of the Temperature Field Using Super-Resolution Technology, ADVANCES IN ATMOSPHERIC SCIENCES, 39, 117-130.  doi: 10.1007/s00376-021-0438-y
    [17] Jinhe YU, Lei BI, Wei HAN, Xiaoye ZHANG, 2022: Application of a Neural Network to Store and Compute the Optical Properties of Non-Spherical Particles, ADVANCES IN ATMOSPHERIC SCIENCES, 39, 2024-2039.  doi: 10.1007/s00376-021-1375-5
    [18] LIU Shikuo, LIU Shida, FU Zuntao, SUN Lan, 2005: A Nonlinear Coupled Soil Moisture-Vegetation Model, ADVANCES IN ATMOSPHERIC SCIENCES, 22, 337-342.  doi: 10.1007/BF02918747
    [19] DAN Li, JI Jinjun, ZHANG Peiqun, 2005: The Soil Moisture of China in a High Resolution Climate-Vegetation Model, ADVANCES IN ATMOSPHERIC SCIENCES, 22, 720-729.  doi: 10.1007/BF02918715
    [20] Binghao JIA, Longhuan WANG, Yan WANG, Ruichao LI, Xin LUO, Jinbo XIE, Zhenghui XIE, Si CHEN, Peihua QIN, Lijuan LI, Kangjun CHEN, 2021: CAS-LSM Datasets for the CMIP6 Land Surface Snow and Soil Moisture Model Intercomparison Project, ADVANCES IN ATMOSPHERIC SCIENCES, 38, 862-874.  doi: 10.1007/s00376-021-0293-x


Manuscript History

Manuscript received: 15 August 2023
Manuscript revised: 11 October 2023
Manuscript accepted: 16 November 2023


    Corresponding author: Yongjiu DAI, daiyj6@mail.sysu.edu.cn
  • 1. School of Atmospheric Sciences, Sun Yat-sen University, Guangzhou 510275, China
  • 2. Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou 510275, China
  • 3. Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, Guangzhou 510275, China
  • 4. Institute of Surface-Earth System Science, School of Earth System Science, Tianjin University, Tianjin 300072, China
  • 5. College of Computer Science and Technology, Changchun Normal University, Changchun 130123, China


    • Soil moisture (SM) plays an important role in climate and hydrological systems by balancing the interaction of water and energy exchange processes (Seneviratne et al., 2010; Crow et al., 2018). Thus, accurate predictions of SM could improve various crucial applications, e.g., drought monitoring and water resource management (Dorigo et al., 2017).

      Physics-based (PB) models have long been used to forecast SM (Kanamitsu et al., 2003; Wood and Lettenmaier, 2006), and show excellent potential in simulating SM dynamics (Xia et al., 2014; Esit et al., 2021). Therefore, PB models have been widely used for early warning of drought (Luo et al., 2007) and for identifying SM–precipitation feedback (Santanello et al., 2019). However, current PB models still have some flaws, such as uncertainties in model parameters and insufficient representation of land-surface processes (Brooks et al., 2015). On the other hand, deep learning (DL) models are well known for their ability to learn nonlinear mapping relationships and show a remarkable capability in short-term SM forecasting (Fang et al., 2017; Fang and Shen, 2020; Li et al., 2022a). For example, Fang et al. (2017) utilized the long short-term memory (LSTM) model to extrapolate Soil Moisture Active Passive (SMAP) SM data spatiotemporally, based on meteorological forcing, and concluded that LSTM outperformed traditional machine learning models (e.g., Random Forest). In addition, Li et al. (2021) utilized the convolutional LSTM (ConvLSTM) model, which can capture the information of both temporal and spatial dimensions, to forecast SM and showed that the ConvLSTM model outperformed the LSTM model in short-term SM forecasting.

      To take advantage of both PB and DL models, various studies have developed so-called hybrid methods that incorporate physical information into DL models. Hybrid models fall into many categories [see Slater et al. (2023) for a comprehensive study], and one popular and effective approach is to use the output of PB models to enhance DL models. There are two main methods for achieving this. (1) Averaging the outputs of the PB and DL models is the most commonly used hybrid method because of its simplicity. Fang et al. (2019) averaged the SM simulated by LSTM and the Noah land surface model (LSM) and confirmed that this model combination could effectively improve the long-term simulation of SM. However, simple averaging can transfer the systematic error of PB models into the hybrid prediction, so it is not the optimal way to exploit the value of both models. (2) Another straightforward way is to feed the output of PB models into DL models as input features. For example, Daw et al. (2022) enhanced the ability of a DL model to predict lake temperature by using the output of the general lake model as input features for the neural network. Cui et al. (2022) proposed an LSTM-based encoder–decoder model that improved the accuracy of multi-step-ahead flood forecasting by adding the output of the Xinanjiang hydrological model as input features to the decoder. Similarly, Klocek et al. (2022) used the output of the HRRR (high-resolution rapid refresh) model as an exogenous input to the decoder of a ConvLSTM-based encoder–decoder model, and significantly improved the long-term predictability of precipitation.
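
The two schemes described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the cited studies: the function names `average_hybrid` and `add_pb_feature` and all numbers are assumptions made for this example. It also shows why simple averaging passes on part of a systematic PB error: a constant PB bias of b leaves a bias of b/2 in the averaged forecast when the DL forecast is unbiased.

```python
def average_hybrid(dl_forecast, pb_forecast):
    """Scheme (1): element-wise mean of the DL and PB forecasts."""
    if len(dl_forecast) != len(pb_forecast):
        raise ValueError("forecasts must cover the same lead times")
    return [(d + p) / 2.0 for d, p in zip(dl_forecast, pb_forecast)]


def add_pb_feature(feature_rows, pb_forecast):
    """Scheme (2): append the PB forecast to each lead time's input features.

    feature_rows holds one list of predictors per lead time; the PB value
    becomes one extra input that a DL model (not shown) would consume.
    """
    if len(feature_rows) != len(pb_forecast):
        raise ValueError("one PB value is needed per lead time")
    return [list(row) + [p] for row, p in zip(feature_rows, pb_forecast)]


# Toy data (hypothetical values, volumetric SM in m3/m3) for three lead times.
truth = [0.20, 0.25, 0.30]
dl = [0.20, 0.25, 0.30]            # an idealized, unbiased DL forecast
pb = [t + 0.10 for t in truth]     # a PB forecast with a constant +0.10 bias

averaged = average_hybrid(dl, pb)
bias_after_averaging = sum(a - t for a, t in zip(averaged, truth)) / len(truth)

features = [[1.2, 0.4], [0.0, 0.5], [3.1, 0.6]]   # e.g., rain and a lagged SM
conditioned = add_pb_feature(features, pb)
```

In this toy case the averaged forecast keeps half of the PB bias, whereas in the second scheme the DL model receives the PB value as an input and is free to learn how much weight to give it.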

      However, previous hybrid models have predominantly depended on DL models to automatically extract significant features from PB model outputs without fully capturing the essential spatiotemporal features in different regions and forecast time scales. The attention mechanism is typically used to extract important spatiotemporal patterns from various features for SM predictions (Li et al., 2022a). However, to the best of our knowledge, the attention mechanism has not been used to incorporate physical information into DL models for improving SM forecasts. By relying on the attention mechanism, it is possible to adaptively learn important spatiotemporal patterns from PB model outputs to aid DL models in making predictions at each forecast time scale.

      Moreover, while various hybrid methods have been widely used and have demonstrated promising performance in hydrological forecasting (Slater et al., 2023), including flood (Speight et al., 2021), streamflow (Liu et al., 2021), and precipitation (Klocek et al., 2022) forecasting, no attempt has been made to develop a model that leverages the benefits of these hybrid methods specifically for SM predictions. Ensemble methods have been proven to combine the strengths of different DL models and to enhance predictability relative to each individual DL model (Lee et al., 2022). Therefore, ensembling the forecasts of different hybrid models can unify their predictions in a single model and reduce the errors of each individual model.

      In the present study, we aimed to effectively incorporate the outputs of PB models into DL models for accurate multi-step-ahead (1–16 days) SM predictions. To achieve this, we first developed an attention-based hybrid model to leverage the benefits of PB models at each forecast time scale. Then, we further developed an ensemble model that combined the advantages of different hybrid models to achieve exceptional multi-step-ahead SM predictions. More specifically, we (1) verified whether attention mechanisms can adaptively take advantage of PB models at different forecast time scales and regions to improve SM predictions spatiotemporally; (2) verified whether ensemble methods can utilize the advantages of different hybrid models to create an optimal model among all forecast models; and (3) thoroughly analyzed and discussed the pros and cons of different hybrid SM forecasting methods for multi-timescale predictions.

      The remainder of the paper is organized as follows. Section 2 describes the data used in the study and the preprocessing procedures. Section 3 describes the proposed hybrid models and existing practical hybrid models. The evaluation processes, based on conventional metrics, triple collocation, and extreme indices are also shown in this section. Section 4 presents the evaluation results with respect to in situ and gridded data, along with some further discussion of our findings. Lastly, section 5 summarizes our conclusions.

    2.   Data
    • The training target surface SM (0–5 cm) was obtained from the SMAP L4 dataset (Reichle et al., 2017), which is commonly used for SM simulation based on DL models (Karthikeyan and Mishra, 2021; Cai et al., 2022; Kannan et al., 2022; Li et al., 2022b). The SMAP satellite measures surface SM globally using a passive L-band radiometer (Entekhabi et al., 2010), and it has been the most promising satellite for SM monitoring due to its higher capacity for penetrating vegetation at the L-band compared to that at the C- or X-band (Wigneron et al., 2018). The SMAP L4 data are obtained by merging SMAP observations with the estimates from the Catchment LSM through the ensemble Kalman filter (Entekhabi et al., 2010), and provide surface SM globally at a 3-hourly, 9 km resolution. The SMAP L4 data were averaged to a daily time scale.

      Meteorological forcing (i.e., precipitation, longwave radiation, specific humidity, surface pressure, downward shortwave radiation, surface temperature, and wind speed) and static physiographic attributes (i.e., land cover, elevation) were used as input features fed into DL models. The meteorological forcing was derived from the SMAP L4 dataset [note that the forcing in this dataset is not from the SMAP satellite but is generated from NASA’s GEOS (Goddard Earth Observing System) model]; the land-cover data were obtained from the United States Geological Survey (USGS); and the elevation data were from MERIT DEM (Yamazaki et al., 2017). The physiographic attributes were spatially mapped at 9 km resolution for consistency with the SMAP data. We used input features from the preceding 1–7 days and predicted the SM for the next 1–16 days.

    • The PB-model SM forecasts were obtained from the Global Forecast System (GFS). The GFS forecast products are publicly distributed by NOAA (the National Oceanic and Atmospheric Administration). The core LSM in GFS is the Noah model, and the GFS surface SM (0–10 cm) forecast is widely adopted by different national organizations (e.g., the USGS and the National Meteorological Centers in Asia). GFS provides up to 16-day forecasts at a 0.25° spatial resolution with forecast time steps of 3 h from 0 to 240 h, and 12 h from 240 to 384 h. GFS SM forecasts were temporally averaged to a daily time scale and bilinearly interpolated to a 9 km resolution.
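      As a rough illustration of the temporal aggregation described above, the following sketch averages forecast fields into daily means from their lead hours. The binning rule (lead hours in the interval (24(d−1), 24d] belong to forecast day d), the function names, and the synthetic input are assumptions made for illustration; the text does not state how the daily windows were defined.

```python
import numpy as np

def gfs_lead_hours():
    """Lead hours of one forecast cycle: 3-hourly to 240 h, then 12-hourly to 384 h."""
    return np.concatenate([np.arange(0, 241, 3), np.arange(252, 385, 12)])

def daily_mean_forecast(fields, lead_hours, n_days=16):
    """Average forecast fields (time, H, W) into daily means (n_days, H, W).

    Assumed binning: lead hours in (24*(d-1), 24*d] are assigned to day d.
    Days without any field are left as NaN.
    """
    fields = np.asarray(fields, dtype=float)
    lead_hours = np.asarray(lead_hours)
    daily = np.full((n_days,) + fields.shape[1:], np.nan)
    for d in range(1, n_days + 1):
        mask = (lead_hours > 24 * (d - 1)) & (lead_hours <= 24 * d)
        if mask.any():
            daily[d - 1] = fields[mask].mean(axis=0)
    return daily

hours = gfs_lead_hours()
# Synthetic stand-in for soil moisture: every pixel of a field equals its lead hour.
fields = hours[:, None, None] * np.ones((1, 2, 2))
daily = daily_mean_forecast(fields, hours)
print(daily[:, 0, 0])
```

      Under this binning, days 1–10 each average eight 3-hourly fields, whereas days 11–16 each average only two 12-hourly fields, so the later daily means rest on much sparser sampling.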

    • In situ SM observations used for evaluation were obtained from the China Meteorological Administration (CMA). The observations contain 2966 stations over China at an hourly temporal resolution from 1 January 2010 to 31 December 2018. Unrealistic data (e.g., discontinuities and unchanged data) were first labeled (Dorigo et al., 2013), and then observational stations with more than 5% missing data were excluded from our analysis. These requirements led to a reduction in the number of sites from 2966 to 1164 for further analysis [Fig. S1 in the electronic supplementary material (ESM)]. The stations were densely distributed in southeastern China but sparsely distributed in western China. In total, the selected stations covered all five major climate zones (i.e., tropical, arid, temperate, cold, and polar).

      Figure 1.  The detail of the (a) ConvLSTM-ED model and (b) inner structure of ConvLSTM.

      Two gridded datasets used for triple collocation analysis (TCA) were obtained from ERA5-Land and SoMo.ml. We selected these two datasets because of their high quality and because their different types and sources make their errors more likely to be independent, which is the key assumption of TCA (Stoffelen, 1998). The hourly, 9 km surface SM (0–7 cm) obtained from ERA5-Land (Muñoz-Sabater et al., 2021) was averaged to a daily time scale for further analysis. Evaluation against in situ observations shows the value of ERA5-Land in the description of the hydrological cycle, particularly with respect to SM (Beck et al., 2021; Muñoz-Sabater et al., 2021). SoMo.ml data, on the other hand, were derived through a DL model trained on in situ observations (O and Orth, 2021) and provide daily surface SM (0–10 cm) at a 0.25° spatial resolution from 2000 to 2019. SoMo.ml data show excellent performance, particularly in terms of temporal dynamics, compared with existing SM data (e.g., ERA5).

    3.   Methods
    • The ConvLSTM-based encoder–decoder model (ConvLSTM-ED) was used to provide the next 1–16 days of SM predictions, and the hybrid models were then built to incorporate the SM forecasts of the GFS model into the ConvLSTM-ED model.

    • The encoder–decoder model consists of two neural networks designed for multi-step-ahead predictions. During the encoding process, critical information from the input features is compressed into an encoding vector. The decoding process then converts the encoding vector into the target forecasts. The encoder–decoder model can consider the temporal association among sequences (Cui et al., 2022), making it suitable and extensively used for multi-step-ahead SM forecasting (El Saadani et al., 2021; Li et al., 2021).

      We used ConvLSTM in a recursive encoder–decoder structure (Cho et al., 2014; Cui et al., 2022) to predict SM in the next 1–16 days (Fig. 1a). In the decoder, the input of each time step is the output from the previous time step. Through the recursive process, the decoder can receive not only the important information extracted from the encoder but also the changing information from the previous time step. Both the encoder and decoder adopted ConvLSTM, and the principle of the ConvLSTM cell (Fig. 1b) is shown as follows:

      where $ t $ represents the $ t $th step of the input timesteps; $ \mathbf{X}_{t} $ denotes the input features; $ \mathbf{C}_{t} $ is the cell state and $ \mathbf{H}_{t} $ is the hidden state; $ * $ denotes the convolution operator; and $ \cdot $ denotes the dot product. Notably, compared to the LSTM model, the ConvLSTM model adds the convolution operator into the state-to-state and input-to-state transitions to further encode spatial information. $ \sigma $ is the sigmoid function, and $ \tanh $ represents the hyperbolic tangent function. $ \mathbf{W}_{\mathrm{x}\sim} $, $ \mathbf{W}_{\mathrm{c}\sim} $ and $ \mathbf{W}_{\mathrm{h}\sim} $ are 2-dimensional convolution kernels, and $ \mathbf{b}_{\sim} $ is the bias.
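      The cell equations themselves are not reproduced in this text. Assuming the cell follows the standard ConvLSTM formulation, which matches the symbols defined above (input-to-state, state-to-state and cell-state weights, and biases), it can be written as below. The gate symbols $ \mathbf{i}_{t} $, $ \mathbf{f}_{t} $ and $ \mathbf{o}_{t} $ (input, forget and output gates) are notation introduced here for illustration and may differ from the original typesetting.

```latex
\begin{aligned}
\mathbf{i}_{t} &= \sigma\left(\mathbf{W}_{\mathrm{xi}} * \mathbf{X}_{t} + \mathbf{W}_{\mathrm{hi}} * \mathbf{H}_{t-1} + \mathbf{W}_{\mathrm{ci}} \cdot \mathbf{C}_{t-1} + \mathbf{b}_{\mathrm{i}}\right) \\
\mathbf{f}_{t} &= \sigma\left(\mathbf{W}_{\mathrm{xf}} * \mathbf{X}_{t} + \mathbf{W}_{\mathrm{hf}} * \mathbf{H}_{t-1} + \mathbf{W}_{\mathrm{cf}} \cdot \mathbf{C}_{t-1} + \mathbf{b}_{\mathrm{f}}\right) \\
\mathbf{C}_{t} &= \mathbf{f}_{t} \cdot \mathbf{C}_{t-1} + \mathbf{i}_{t} \cdot \tanh\left(\mathbf{W}_{\mathrm{xc}} * \mathbf{X}_{t} + \mathbf{W}_{\mathrm{hc}} * \mathbf{H}_{t-1} + \mathbf{b}_{\mathrm{c}}\right) \\
\mathbf{o}_{t} &= \sigma\left(\mathbf{W}_{\mathrm{xo}} * \mathbf{X}_{t} + \mathbf{W}_{\mathrm{ho}} * \mathbf{H}_{t-1} + \mathbf{W}_{\mathrm{co}} \cdot \mathbf{C}_{t} + \mathbf{b}_{\mathrm{o}}\right) \\
\mathbf{H}_{t} &= \mathbf{o}_{t} \cdot \tanh\left(\mathbf{C}_{t}\right)
\end{aligned}
```

      In this form, the convolutions act on the input and the previous hidden state, while the cell state enters the gates through element-wise products.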

    • Two widely used hybrid models were adopted for benchmarking our proposed hybrid models. Firstly, SM was forecasted by simply averaging the SM forecasts for the next 1–16 days of both GFS and ConvLSTM-ED [namely, the average model, from Fang et al. (2019)]. Secondly, the GFS SM forecasts were fed into the decoder of ConvLSTM-ED to make the next 1–16 days of forecasts [namely, the condition model, from Cui et al. (2022) and Klocek et al. (2022)] (Fig. 2a).

      Figure 2.  The model structures of (a) the condition model, (b) the attention model, and (c) the attention block. The dimension of each intermediate feature of the attention block is annotated. H, W and F are the sizes of the height, width and features dimensions, respectively. The abbreviations of intermediate features are the same as given in the main text. The colors in (c) indicate weights.

      Based on the condition model, we developed the attention model (Fig. 2b). We used an attention block (Fig. 2c) to adaptively extract the important patterns from GFS SM forecasts at different forecast time scales. The extracted patterns were then used as the input for the decoder to create SM forecasts. The attention block was calculated through the following four steps. (1) For each forecast timestep $ t $, the input tensors $ \mathbf{U}_{t} $ were first averaged over their spatial dimensions and reduced to a vector of features $ \mathbf{z}_{t} $ ($ \mathbf{F}_{\mathrm{sq}} $). (2) A fully connected layer was applied to obtain the vector of learned weights $ \mathbf{s}_{t} $ of the compressed tensor $ \mathbf{z}_{t} $, representing the reassigned importance of the original input features ($ \mathbf{F}_{\mathrm{ex}} $). (3) The learned weights were multiplied by the input features to obtain a final importance-reassigned tensor $ \tilde{\mathbf{U}}_{t} $ ($ \mathbf{F}_{\mathrm{scale}} $). (4) The final step is the output process, in which we send the input tensors $ \mathbf{U}_{t} $ and importance-reassigned tensor $ \tilde{\mathbf{U}}_{t} $ into a fully connected layer to obtain the final output ($ \mathbf{U}^{*}_{t} $) of the attention block.

      For the ensemble model, we ensemble-averaged the forecasts of the average, condition and attention models to utilize the advantages of different hybrid methods. Notably, the ensemble model was constructed by simple averaging rather than using weighted ensemble methods (e.g., Bayesian model average). The aim of our study is to emphasize the benefits of integrating different hybrid methods (even using the simplest ensemble methods), rather than exploring the advantages of using more advanced ensemble methods.
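      The four steps of the attention block can be sketched in NumPy for a single forecast timestep as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation in step (2), the concatenation along the feature dimension in step (4), the per-pixel application of the output layer, and all names (`attention_block`, `W_ex`, `W_out`, etc.) are hypothetical choices that the text does not specify.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_block(U, W_ex, b_ex, W_out, b_out):
    """Apply the four-step attention block to one timestep.

    U     : (H, W, F) input tensor (e.g., GFS SM forecast features)
    W_ex  : (F, F) weights of the assumed single excitation layer
    W_out : (F_out, 2F) weights of the assumed per-pixel output layer
    Returns the block output U_star (H, W, F_out) and the weights s (F,).
    """
    z = U.mean(axis=(0, 1))                          # (1) squeeze: spatial average
    s = sigmoid(W_ex @ z + b_ex)                     # (2) excitation: feature weights
    U_tilde = U * s                                  # (3) scale: reweight each feature
    stacked = np.concatenate([U, U_tilde], axis=-1)  # (4) combine U and U_tilde
    U_star = stacked @ W_out.T + b_out
    return U_star, s

# Usage with random, untrained weights (shapes only; values are meaningless).
rng = np.random.default_rng(0)
H, W, F = 4, 5, 3
U = rng.normal(size=(H, W, F))
U_star, s = attention_block(
    U,
    W_ex=rng.normal(size=(F, F)), b_ex=np.zeros(F),
    W_out=rng.normal(size=(F, 2 * F)), b_out=np.zeros(F),
)
print(U_star.shape, s.shape)
```

      Because the weights $ \mathbf{s}_{t} $ are recomputed at every forecast timestep, the block can in principle emphasize the PB forecasts differently at short and long lead times.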

    • The domain of this study was restricted to China (14.7°–53.5°N, 72.3°–135°E). We divided this region into 24 patches, each containing 112 × 112 pixels. A local DL model was trained for each patch. The training period was from 31 May 2015 to 31 December 2017 (80% and 20% of the dataset were used for training and validation, respectively), and the testing period was from 1 January 2018 to 31 December 2018. Before training, we normalized all input features to speed up convergence. The models were optimized by backpropagation with the Adam method (Kingma and Ba, 2017). The loss function was set as the mean squared error (MSE) between the observations and predictions.

      Two types of hyperparameters were tuned: those that determine the structure of the DL models (i.e., the number of hidden cells) and those that control the training process (i.e., batch size, epochs, learning rate) (details are given in Text S1 in the ESM). We tested hidden sizes of [8, 16, 32, 64] and batch sizes of [16, 32, 64], and selected the “best” model structure with the minimal MSE on the validation dataset. After tuning, the number of hidden cells of the ConvLSTM models was set to 16, and the batch size was set to 32. The learning rate was initially set to 0.001 and was reduced by a factor of 0.1 whenever the validation loss ceased improving. DL models were trained for 50 epochs, and the “best” model with the lowest validation loss was retained. We found that 50 epochs were sufficient for finding the optimal model, as the validation loss had stopped decreasing by then.
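      The learning-rate reduction and best-model selection described above can be illustrated with a small framework-free sketch that replays a sequence of per-epoch validation losses. The `patience` parameter (the number of non-improving epochs tolerated before the rate is reduced) and the function name are assumptions; the text only gives the initial rate (0.001) and the reducing factor (0.1).

```python
def replay_plateau_schedule(val_losses, lr=1e-3, factor=0.1, patience=3):
    """Replay validation losses and apply a reduce-on-plateau rule.

    Returns (best_epoch, best_loss, lr_history), where lr_history[e] is the
    learning rate used during epoch e and best_epoch is the 0-based index of
    the epoch with the lowest validation loss (the checkpoint to keep).
    """
    best_loss = float("inf")
    best_epoch = -1
    stale = 0  # consecutive epochs without improvement
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        lr_history.append(lr)
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # validation loss has plateaued: reduce the rate
                stale = 0
    return best_epoch, best_loss, lr_history

# Made-up validation losses for illustration only.
losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75]
best_epoch, best_loss, lr_history = replay_plateau_schedule(losses, patience=2)
print(best_epoch, best_loss, lr_history)
```

      In a real training loop the same bookkeeping would wrap the optimizer step, with the model weights saved whenever `best_loss` improves.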

      The DL models were trained on a single NVIDIA A100 GPU, and training each hybrid model took about 120 h (for nearly 3 years of training data). Although training is costly, it is a one-time effort, and inference is fast by comparison. For real-time applications, the model can generate 16-day SM forecasts over China in just a few minutes using one A100 GPU.

    • Two performance assessment criteria were used for in situ evaluation: the Pearson correlation coefficient (R) and unbiased root MSE (ubRMSE) (Text S2 in the ESM). R measures the correspondence between forecast and in situ time series in terms of temporal variability. ubRMSE is a bias-removed error criterion that describes the differences between observed and predicted time series. We evaluated the performance using the SM forecast time series of the grid cell nearest to each in situ station.
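      For concreteness, the two criteria can be computed as in the sketch below. The exact definitions used in the study are given in Text S2 of the ESM, which is not reproduced here, so the code assumes the common definitions: the sample Pearson correlation, and the RMSE of the anomalies after removing each series' mean. The function names and the NaN-pair filtering are illustrative choices.

```python
import numpy as np

def _paired(pred, obs):
    """Return both series as float arrays, keeping only time steps valid in both."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~(np.isnan(pred) | np.isnan(obs))
    return pred[valid], obs[valid]

def pearson_r(pred, obs):
    """Pearson correlation coefficient between forecast and observation."""
    pred, obs = _paired(pred, obs)
    pa, oa = pred - pred.mean(), obs - obs.mean()
    return float((pa * oa).sum() / np.sqrt((pa ** 2).sum() * (oa ** 2).sum()))

def ubrmse(pred, obs):
    """Unbiased RMSE: RMSE computed after removing the mean of each series."""
    pred, obs = _paired(pred, obs)
    anomaly_diff = (pred - pred.mean()) - (obs - obs.mean())
    return float(np.sqrt(np.mean(anomaly_diff ** 2)))

# Made-up station series (m3 m-3) for illustration only.
obs = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
pred = np.array([0.18, 0.22, 0.33, 0.35, 0.20])
print(pearson_r(pred, obs), ubrmse(pred, obs))
```

      With these definitions, ubRMSE² equals RMSE² minus the squared mean bias, so a forecast with a constant offset from the observations has an ubRMSE of zero while its RMSE does not vanish.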

      TCA (Gruber et al., 2016) was further used to evaluate the error of the forecast SM toward an unknown truth using collocated datasets (the detail is shown in Text S3 in the ESM). TCA can directly estimate the sensitivity of data using covariance and variance. Here, we used the TCA-based signal-to-noise ratio (SNR) to represent the error of SM forecast models. The TCA-based SNR is calculated using the following equation (Kim et al., 2020):

      where $ \sigma_{\Phi}^{2} $ is the variance of the unknown truth of SM, and $ \sigma_{\varepsilon_{i}}^{2} $ is the variance of the random error of dataset $ i $. The TCA-based SNR is symmetric around zero, and the zero value indicates that the SM signal ($ \beta_{i}^{2}\sigma_{\Phi}^{2} $) of forecast data is identical to the noise ($ \sigma_{\varepsilon_{i}}^{2} $).
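      The SNR equation is not reproduced in this text. Given the definitions above (a signal term $ \beta_{i}^{2}\sigma_{\Phi}^{2} $, a noise term $ \sigma_{\varepsilon_{i}}^{2} $, and symmetry around zero), it is presumably the decibel form $ \mathrm{SNR}_{i} = 10\log_{10}\left(\beta_{i}^{2}\sigma_{\Phi}^{2}/\sigma_{\varepsilon_{i}}^{2}\right) $. The sketch below estimates it with the covariance-based TCA estimator under the usual TCA assumptions (linear relation to the truth, mutually independent errors); the function name and the synthetic triplet are illustrative, and the procedure actually used is described in Text S3 of the ESM.

```python
import numpy as np

def tca_snr_db(x, y, z):
    """Covariance-based triple collocation: SNR (dB) of three collocated series.

    For dataset i with partners j and k:
      signal variance = Q_ij * Q_ik / Q_jk   (beta_i^2 * sigma_Phi^2)
      noise variance  = Q_ii - signal        (sigma_eps_i^2)
    where Q is the sample covariance matrix of the triplet.
    """
    q = np.cov(np.vstack([x, y, z]))
    snr = []
    for i, j, k in [(0, 1, 2), (1, 0, 2), (2, 0, 1)]:
        signal = q[i, j] * q[i, k] / q[j, k]
        noise = q[i, i] - signal
        snr.append(10.0 * np.log10(signal / noise))
    return np.array(snr)

# Synthetic triplet: a common "truth" plus independent errors of known size.
rng = np.random.default_rng(42)
n = 200_000
truth = rng.normal(size=n)
x = truth + rng.normal(scale=1.0, size=n)        # signal equals noise
y = 0.5 * truth + rng.normal(scale=0.5, size=n)  # rescaled, signal equals noise
z = truth + rng.normal(scale=0.5, size=n)        # signal is four times the noise
snr_db = tca_snr_db(x, y, z)
print(snr_db)
```

      With short or strongly correlated records, the estimated noise variance can become non-positive, in which case the logarithm is undefined and the grid cell would normally be masked.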

      The soil wetness deficit index (SWDI) is used to estimate the severity of drought (Text S4). The SWDI is commonly used in agricultural drought assessment (Martínez-Fernández et al., 2015; Yang et al., 2017; Li et al., 2022b) and is calculated as

      where $ \theta $ is the forecasted SM, and $ \theta_{\mathrm{FC}} $ and $ \theta_{\mathrm{WP}} $ are the field capacity and wilting point, respectively. In our study, the 5th and 95th percentiles of the historical SM time series were used to represent the wilting point and field capacity, respectively. We defined SWDI > 0 as no drought, −5 < SWDI < 0 as moderate drought, and SWDI < −5 as extreme drought, based on Zhang et al. (2021). The probability of detection (POD) metric (Ford and Quiring, 2019) was applied to quantify how well the models identify drought events. POD is calculated as follows:

      where “hits” represents the number of observed drought events that the SM forecast models correctly depicted (i.e., we observed extreme drought based on in situ observations and also predicted it in the forecast models) and “misses” represents the number of observed drought events that SM forecast models do not correctly depict (i.e., we observed extreme drought based on in situ observations but did not predict it, or predicted only moderate drought, in the forecast models). POD ranges from 0 to 1, with 1 representing a perfect score.
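      The SWDI and POD equations are not reproduced in this text. Assuming the standard SWDI definition, $ \mathrm{SWDI} = 10\,(\theta - \theta_{\mathrm{FC}})/(\theta_{\mathrm{FC}} - \theta_{\mathrm{WP}}) $, which is consistent with the thresholds given above, and $ \mathrm{POD} = \mathrm{hits}/(\mathrm{hits} + \mathrm{misses}) $, both can be sketched as follows. The function names and the sample values are hypothetical; the percentile-based wilting point and field capacity and the −5 threshold for extreme drought follow the text.

```python
import numpy as np

def swdi(theta, history):
    """Soil wetness deficit index from SM values and a historical SM record.

    The wilting point and field capacity are taken as the 5th and 95th
    percentiles of the historical series, as described in the text.
    """
    theta_wp, theta_fc = np.percentile(history, [5, 95])
    theta = np.asarray(theta, dtype=float)
    return 10.0 * (theta - theta_fc) / (theta_fc - theta_wp)

def pod_extreme(swdi_obs, swdi_fcst, threshold=-5.0):
    """Probability of detection for extreme drought (SWDI below the threshold)."""
    obs_event = np.asarray(swdi_obs) < threshold
    fcst_event = np.asarray(swdi_fcst) < threshold
    hits = int(np.sum(obs_event & fcst_event))
    misses = int(np.sum(obs_event & ~fcst_event))
    return hits / (hits + misses) if hits + misses else float("nan")

# Hypothetical historical record and SWDI series for illustration only.
history = np.linspace(0.10, 0.40, 101)
print(swdi([0.115, 0.25, 0.385], history))

swdi_obs = [-8.0, -6.0, -2.0, 1.0, -7.0]
swdi_fcst = [-6.0, -3.0, -6.0, 0.0, -9.0]
print(pod_extreme(swdi_obs, swdi_fcst))
```

      Note that a forecast of moderate drought during an observed extreme drought counts as a miss here, in line with the definition of “misses” above, and that forecast-only events (false alarms) do not enter the POD.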

    4.   Results and discussion
    • We first evaluated GFS, ConvLSTM-ED, and different hybrid models against in situ observations (Figs. 3, 4). The ConvLSTM-ED model performed reasonably well in short-term predictions, with comparable predictability to SMAP L4 data (i.e., training target SM). However, its performance degraded dramatically as the forecast time scale increased (Fig. 3), particularly in southeastern China (Figs. 4g–i). The GFS model showed inferior mean performance compared to the ConvLSTM-ED model at all forecast time scales, especially for short-term forecasting (Fig. 3). Although the ConvLSTM-ED model outperformed the GFS model on average at different forecast time scales, it showed degraded performance in wet regions (e.g., southeastern China) compared to the GFS model (Figs. 4d–f and 4g–i). Notably, we cannot conclusively determine which model (GFS or ConvLSTM-ED) was superior based on the aforementioned results. Moreover, this determination was beyond the scope of our paper (the reasons are given in Text S5).

      Figure 3.  The mean (a) R and (b) ubRMSE of different forecast models at different forecast time scales. Dashed lines denote the performance of SMAP L4 data evaluated against in situ observations. The abbreviations of model names are the same as in section 3.

      Figure 4.  The spatial distribution of performance (R) in 1-, 7- and 16-day forecasts of different models. We used the average model as the baseline hybrid model to evaluate the performances of the different hybrid models. Panels (a–c) show the performance of the average model, while the remaining rows show the differences between the R of the target model and the R of the average model. Red points indicate that the model improved the performance compared to the average model, while blue points show a declined performance.

      The average model dramatically improved on the ConvLSTM-ED model at nearly all forecast time scales, demonstrating the benefit of adding physical information into DL models. However, the performance of the average model still dropped dramatically for long-term forecasting (Fig. 3), particularly in southeastern China (Figs. 4a–c), consistent with the behavior of the ConvLSTM-ED model (i.e., decreasing performance in southeastern China as the forecast time scale increased). Moreover, the average model performed poorly in northern China because it introduced the bias of the GFS model in this region. These results indicate that the simple averaging method could not fully exploit the benefits of both the PB and DL models.

      The condition model greatly improved the long-term predictability compared to the average model over nearly all stations (Fig. 3, Figs. 4j–l), demonstrating that adding GFS SM forecasts as the exogenous inputs into the decoder of the ConvLSTM-ED model could significantly improve long-term predictions. However, the performance of the condition model decreased significantly in short-term predictions compared to the average model (Fig. 3). In short-term predictions, we observed that the underperforming regions in the condition model (blue dots in Fig. S3b) were consistent with those in the GFS model (red dots in Fig. S3a in the ESM) when compared to the ConvLSTM-ED model. This result highlighted a problem with the condition model, i.e., it introduced the biases from the short-term predictions provided by the GFS model into the DL models. Notably, although the condition model could propagate short-term forecast errors, incorporating the PB SM evolution (i.e., sharpened predictions from GFS) in the decoder could still improve the performance of the ConvLSTM-ED model significantly. This emphasizes the significance of integrating physically consistent predictions into pure DL models.

      The attention model further improved the short-term predictability and had equal long-term predictability when compared to the condition model (Fig. 3). Spatially, the attention model retained the high predictability of the ConvLSTM-ED model in northern China and effectively overcame its deficiencies in southeastern China (Figs. 5g–i). Moreover, the attention model outperformed the condition model in most regions (Figs. 5j–l). In particular, it showed significant improvement in short-term predictions where the condition model introduced the bias of the GFS model (Fig. 5). These findings indicate that the attention mechanism can adaptively learn to exploit the benefits of the ConvLSTM-ED and GFS models for different forecast time scales and soil conditions, thereby significantly improving the model performance spatiotemporally.

      Figure 5.  The R of the (a–c) GFS, (d–f) ConvLSTM-ED, and (g–i) attention models, and (j–l) the improvement of the attention model compared to the condition model.

      The ensemble model outperformed all other hybrid models (in terms of both R and ubRMSE) at all forecast time scales (Fig. 3), especially in long-term predictions; for example, the ensemble model improved the mean R by 65% (from 0.205 to 0.340) compared to the ConvLSTM-ED model for the 16-day predictions. Moreover, the ensemble model was able to further reduce the bias of the predictions in southeastern China (provided by the ConvLSTM-ED model) compared to the attention model (Figs. 4p–r), and the ensemble model outperformed the ConvLSTM-ED model over 79.5% of the in situ stations. These results underline the value of ensemble methods and emphasize the exceptional spatiotemporal predictability of the ensemble model.

    • We further evaluated the predictability using gridded datasets by TCA, which can evaluate the performance relative to an unknown truth. The spatial distribution of SNR is shown in Fig. 6. The results were similar to those of the in situ evaluation, and also demonstrated the superior performance of the proposed hybrid models (attention and ensemble models). For example, the condition model enhanced the long-term predictability but decreased the short-term predictability (Figs. 6j–l to 6p–r). In addition, the attention model was able to further correct the bias of the condition model in short-term predictions (Figs. 6j–l to 6m–o), whereas the ensemble model achieved the best performance among all hybrid models (Fig. 6). We further used other collocated datasets (i.e., SMOS data and in situ data from the CMA) and evaluated the predictability by TCA. The result was also consistent with that of the in situ evaluation (Text S6, Fig. S4 in the ESM).

      Figure 6.  TCA-based SNR of different models. The triplets of the TCA are [*, ERA5-Land, SoMo.ml], where * denotes the forecast models. Panels (a–c) show the average model. The remaining rows show the difference between the SNR of the target model and the average model.

      Figure 7 shows the SNR of SMAP L4 data (i.e., the training target of the DL models) and the ConvLSTM-ED and ensemble models. The ConvLSTM-ED model showed inferior performance compared with SMAP L4 data in most regions. Additionally, the regions where the ConvLSTM-ED model underperformed were consistent with those of SMAP L4 data. It was found that the performance of the ConvLSTM-ED model depended entirely upon the quality of the SMAP L4 data and was limited by the SMAP L4 data as a performance ceiling. Notably, the ensemble model outperformed the SMAP L4 data in most regions for short-term predictions (Fig. 7), particularly in drought-prone areas (e.g., the North China Plain), suggesting that the ensemble of different hybrid models could “break” the performance ceiling constrained by the training data in some areas. This is attributable to the introduction of physical information into the pure DL models. The in situ validation of the SMAP L4 data and the ensemble model further confirmed this result (Fig. S5 in the ESM). However, the long-term predictability of the ensemble model was still far inferior to the SMAP L4 data. Moreover, all forecast models still did not show satisfying performance (i.e., signal larger than noise, SNR > 0) in more than half of the regions (Fig. S6c in the ESM) in long-term predictions, indicating the challenge of long-term SM forecasting, which necessitates further investigation.

      Figure 7.  TCA-based SNR of (a–c) ConvLSTM-ED, (d–f) ensemble model predictions and (h) the SMAP L4 datasets. The triplet of TCA is [*, ERA5-Land, SoMo.ml], where * denotes the forecast models and SMAP L4.

    • We further evaluated the drought predictability of the different forecast models. Figure 8 illustrates the kernel density curves of the SWDI of the in situ observations and different models. Surprisingly, the SWDI of the in situ observations contained two peaks located at SWDI values of nearly −10 and −2. We further applied more stringent quality-control processes to the in situ observations (Dorigo et al., 2013) and found the same two-peak structure (Fig. S7 in the ESM). It is noted that these two peaks may be a unique property of the in situ observation datasets used in our study. We found that the GFS and average models tended to forecast the right peak of the in situ observations (i.e., they gave relatively stable predictions and were unaware of some extreme events such as SWDI < −10). On the contrary, the attention model tended to simulate the left peak (i.e., extreme drought events) better than the other hybrid models, showing the effectiveness of the attention mechanism for extreme drought forecasting. Furthermore, although the ensemble model provided the best general performance, it tended to forecast the mean SWDI of observations. This result emphasizes that ensemble methods could provide a more stable prediction by correcting the bias of each model but may also “remove” some extreme events, which may make them less suitable for drought forecasting.

      Figure 8.  The kernel density curves of the SWDI of the in situ observations and different forecast models (lines with different colors) at the (a) week-1 and (b) week-2 forecast.

      We further evaluated the fractions of observed drought events that were correctly depicted by the forecast models. Table 1 summarizes the POD values of the different models over different Köppen–Geiger major climate zones. Generally, the attention model was able to accurately detect 60.6% and 56.8% of drought events at 1- and 2-week forecasts and achieved the best detection over arid, temperate, cold, and polar regions. Moreover, the ensemble-average operation always yielded an average prediction of drought events among ensemble members (see average and ensemble models), reinforcing the prior results. It is noted that the GFS model excelled in temperate regions but performed the worst over arid, cold, and polar regions among all the forecast models, indicating a poor representation of SM dynamics over these regions.

      Model        Tropical (n=16)     Arid (n=91)         Temperate (n=642)   Cold (n=350)        Polar (n=30)
                   Week 1    Week 2    Week 1    Week 2    Week 1    Week 2    Week 1    Week 2    Week 1    Week 2
      GFS          0.578     0.493     0.511#    0.477#    0.665*    0.582     0.506#    0.469#    0.396#    0.370#
      ConvLSTM     0.720     0.661     0.573*    0.521     0.605#    0.560     0.575     0.532     0.656     0.637
      average      0.521     0.479     0.536     0.492     0.643     0.592     0.542     0.502     0.529     0.502
      condition    0.744*    0.693*    0.543     0.519     0.605#    0.532#    0.582     0.545     0.640     0.578
      attention    0.655     0.630     0.570     0.536*    0.629     0.598*    0.599*    0.550*    0.696*    0.644*
      ensemble     0.506#    0.474#    0.551     0.531     0.613     0.564     0.571     0.538     0.622     0.577
      *Best model to detect drought events over the target climate region.
      #Worst model to detect drought events over the target climate region.

      Table 1.  The probability of an accurate drought event detection by different models over different climate regions based on in situ SM observations. The abbreviations of the model names are the same as in Fig. 1. The week 1 and week 2 columns represent the ability to forecast the 1-week and 2-week drought. n denotes the number of stations located over target climate regions.

    • It was found in this study that embedding physical information in the DL models through suitable hybrid methods dramatically improved the SM predictability compared to using pure DL models, and this could be attributed to several possible reasons. Firstly, it is well known that pure DL models may produce unrealistic predictions because of a lack of physical consistency (e.g., mass and energy balance). For example, Fang et al. (2019) found that pure DL models provided non-physical, highly fluctuating simulations. Thus, physical information provided by PB models that obeys physical laws can be used to correct the non-physical predictions of pure DL models. Secondly, pure DL models can benefit from the assimilation of high-quality observations in PB models (Fang and Shen, 2020); for example, pure DL models cannot predict the corresponding SM variation if a rainfall event is missing in the forcing data. However, data assimilation can remedy the forcing errors with high-quality observations, resulting in better temporal representation of SM dynamics. One benefit of using the GFS (including data assimilation) forecasts in our study was to help the pure DL models in correcting the bias induced by the forcing errors. Thirdly, Daw et al. (2022) pointed out that pure DL models rely heavily on the training data quality and can only depict the evolution of existing SM (Klocek et al., 2022). This may lead to significant biases over regions with poor-quality data (e.g., wet regions in SMAP L4 data). On the contrary, PB models can depict the dynamics of SM over different soil conditions (e.g., precipitation infiltrates more easily in regions with high soil porosity), and can provide stable and realistic simulations with high-quality rainfall forcing [e.g., wet regions; see Maggioni et al. (2012)]. In addition, the GFS model can simulate SM in different water states (e.g., solid, liquid) through soil dynamics, which pure DL models struggle to do accurately because of the poor quality of the training datasets during the freezing period. Thus, incorporating physical information into pure DL models might help to overcome the deficiencies derived from data (Daw et al., 2022).

      Although we introduced PB features to improve the model, the proposed hybrid models still inherited the uncertainties from the supervised DL models, i.e., the uncertainty from the training data. In addition, another source of uncertainty came from the selection of hybrid schemes, as demonstrated in section 4. Furthermore, the quality of the PB models also contributed to the uncertainty. Parameterizations and inadequate representation of land processes can introduce uncertainties in hybrid models. However, when compared with the PB models, the hybrid models benefited from the fitting ability of the DL algorithm and the vast amount of data, which could partially correct systematic errors. Moreover, the introduction of PB features also alleviated the limitation of the training data when compared to the pure DL models. These findings suggest that hybrid models are a promising way of enhancing the prediction skill for meteorological and hydroclimatic variables (Slater et al., 2023).

      The potential applications of SM forecasting models have been comprehensively discussed in Peng et al. (2021), and we highlighted two important application directions. Firstly, the proposed model could provide accurate initializations of land-surface conditions for numerical weather prediction (NWP) systems. Indeed, the integration of SM into several NWP models has been found to improve forecasts of atmospheric variables (Dharssi et al., 2011; Muñoz-Sabater et al., 2019; De Rosnay et al., 2020). Secondly, accurate predictions of SM could be utilized for monitoring, analyzing and providing early warnings of hydrometeorological disasters, including agricultural drought (Mishra et al., 2017) and floods (Li et al., 2018). Additionally, these predictions could inform decision-making processes, such as in watershed management (Heimhuber et al., 2017) and irrigation water management (Lawston et al., 2017).

      In our study, we aimed to investigate the benefits of incorporating physical information into DL models; exploring the interpretability of the proposed models is beyond the scope of the present paper. However, these complex hybrid models may have low interpretability and should be used with caution in practical applications. Explainable artificial intelligence (XAI) provides tools to aid decision-making when applying DL models in real-world applications. Several studies have explored the interpretability of DL SM forecasting models using XAI tools. For example, Huang et al. (2023) adopted various post-hoc interpretation methods to assess the feature effects on SM predictions and showed that a comprehensive understanding of the relationship between input features and predicted SM could be achieved. The interpretation methods used in their study, such as “Shapley values” and “partial dependence plots”, could be used to investigate the contributions of different features (e.g., GFS forecasted values) to our proposed models, which deserves further exploration.
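
      As a rough illustration of one of the post-hoc methods mentioned above, the following Python sketch computes a one-dimensional partial dependence curve for a generic prediction function. It is not taken from this study or from Huang et al. (2023): the function name, the toy linear model, and the feature layout (a previous SM value and a GFS-forecast SM value) are hypothetical and serve only to show the idea of fixing one feature and averaging the predictions over the remaining data.

```python
def partial_dependence(predict, samples, feature_index, grid):
    """Average prediction when one feature is fixed to each grid value.

    predict:       callable mapping one feature vector (list) to a float
    samples:       list of feature vectors drawn from the data
    feature_index: position of the feature being examined
    grid:          values at which that feature is fixed
    """
    curve = []
    for value in grid:
        total = 0.0
        for sample in samples:
            modified = list(sample)          # copy so the data are not altered
            modified[feature_index] = value  # overwrite the examined feature
            total += predict(modified)
        curve.append(total / len(samples))
    return curve


# Hypothetical stand-in for a trained model:
# feature 0 = previous SM, feature 1 = GFS-forecast SM (illustrative only).
def toy_model(features):
    return 0.6 * features[0] + 0.4 * features[1]


toy_samples = [[0.20, 0.10], [0.40, 0.30]]
gfs_curve = partial_dependence(toy_model, toy_samples, feature_index=1,
                               grid=[0.0, 0.5, 1.0])
```

      For a trained hybrid model, `predict` would wrap the model's forward pass, and the slope of the resulting curve would indicate how strongly the prediction responds to the examined feature on average.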

      We end our discussion by pointing out some limitations of our study. Firstly, we did not identify the “best” hybrid scheme for achieving the “best” forecast (i.e., general performance and drought predictability) at different forecast time scales and in different spatial regions. For example, the ensemble model achieved the best general performance at all forecast time scales (section 4.1), but the ensemble method may “remove” some extreme drought events (section 4.3). Therefore, we highlight that the choice of hybrid method might depend on the application: the ensemble model is suited for long-term, stable predictions that mainly focus on the average state of SM, while the attention model is suited for forecasting extreme drought events. Secondly, we integrated GFS with the ConvLSTM-ED model because of their efficiency and widespread use (Fan and van den Dool, 2011; Yin et al., 2019). However, neither GFS nor ConvLSTM-ED is the “best” PB or DL model for SM prediction. Thus, the results of our study may not fully represent the properties of PB and DL models. Nonetheless, we showed improvements from the different hybrid methods based on these two widely used models. Thirdly, hybrid models have different frameworks, e.g., physically guided DL (Willard et al., 2022a) or differentiable programming (Feng et al., 2022). In this study, we only focused on using the PB model outputs and observational features in a hybrid modeling setup to generate strong-performing SM predictions. We did not introduce any physical laws or principles to guide the DL models. Several “deep” hybrid frameworks have been developed (Read et al., 2019; Liu et al., 2022), which can “force” DL models to forecast based on physical consistency, thereby possibly providing more realistic and stable predictions (Willard et al., 2022a). Moreover, pre-training DL models using PB model outputs and fine-tuning them on the target data (i.e., transfer learning) may also utilize the physical information.

    5.   Conclusion
    • In this paper, we first proposed an attention hybrid model based on condition hybrid schemes and an attention mechanism to utilize the advantages of both PB and DL models over different forecast timescales and regions. We then proposed an ensemble model that averages the outputs of two existing practical hybrid models (average and condition) and the proposed attention model. To the best of our knowledge, this is the first study to use both the attention mechanism and ensemble methods to integrate PB and DL models for SM forecasting. We thoroughly assessed the predictability of the two proposed hybrid models (i.e., attention and ensemble) and two existing hybrid models (i.e., average and condition) based on in situ and gridded data evaluation. Generally, the proposed hybrid models outperformed the two existing hybrid models and could greatly improve the long-term predictability, and the predictability of drought events, compared with pure DL models. The main conclusions were as follows:

      (1) The proposed ensemble hybrid model achieved the best general performance among all hybrid models under different soil conditions over all forecast timescales (from 1 to 16 days), especially for long-term forecasting. Notably, compared with the ConvLSTM-ED model, the ensemble hybrid model improved the mean R by 65% and the mean ubRMSE by 6% for the 16-day forecast, and it outperformed the ConvLSTM-ED model at 79.5% of the validation stations.

      (2) The proposed attention hybrid model achieved the best drought predictability among all hybrid models. This model could accurately detect 60.6% and 56.8% of drought events for 1- and 2-week forecasts, respectively, and had generally the best drought detection ability over arid, temperate, cold, and polar regions. The attention hybrid model was able to detect an additional 2.4% and 3% of drought events compared with the ConvLSTM-ED model for 1- and 2-week forecasts, respectively.

      (3) Different hybrid schemes had their pros and cons, and our proposed models solved the problems encountered by existing hybrid methods to some extent. For example, the average model is simple and effective, but its performance degraded significantly as the forecast time scale increased. The condition model could significantly improve the long-term predictability of SM but introduces the bias of the GFS model into short-term predictions. The proposed attention model solves this problem of the condition model and is suitable for forecasting extreme drought events. The proposed ensemble model performed best among all hybrid models and is suitable for long-term, stable predictions that mainly focus on the average state of SM.
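
      The ensemble averaging and the two skill scores referred to in the conclusions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an unweighted mean of the three hybrid forecasts, uses the common definition of ubRMSE as the RMSE after removing the mean bias, and all function names and numbers are hypothetical.

```python
import math


def ensemble_mean(*forecasts):
    """Element-wise unweighted mean of several equally long forecast series."""
    length = len(forecasts[0])
    if any(len(f) != length for f in forecasts):
        raise ValueError("all forecasts must have the same length")
    return [sum(values) / len(forecasts) for values in zip(*forecasts)]


def pearson_r(pred, obs):
    """Pearson correlation coefficient (R) between predictions and observations."""
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_p == 0.0 or var_o == 0.0:
        raise ValueError("R is undefined for a constant series")
    return cov / math.sqrt(var_p * var_o)


def ubrmse(pred, obs):
    """Unbiased RMSE: RMSE of the errors after subtracting their mean (the bias)."""
    errors = [p - o for p, o in zip(pred, obs)]
    bias = sum(errors) / len(errors)
    return math.sqrt(sum((e - bias) ** 2 for e in errors) / len(errors))


# Illustrative (made-up) SM values in m3/m3 at four time steps.
observed = [0.20, 0.25, 0.30, 0.22]
average_fc = [0.22, 0.24, 0.27, 0.23]
condition_fc = [0.19, 0.27, 0.31, 0.20]
attention_fc = [0.21, 0.25, 0.29, 0.24]

ensemble_fc = ensemble_mean(average_fc, condition_fc, attention_fc)
ensemble_r = pearson_r(ensemble_fc, observed)
ensemble_ubrmse = ubrmse(ensemble_fc, observed)
```

      Because ubRMSE removes the mean bias, a forecast that is offset from the observations by a constant still scores zero, which is why it is usually reported together with R or a bias measure.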

      Finally, we offer some ideas for improving SM forecasting with hybrid models that integrate PB and DL models. Firstly, DL and PB models should each be further improved for SM forecasting. For example, DL models should be trained with multi-source data based on multimodal learning to avoid reliance on a single dataset. Secondly, physical laws should ideally guide the design of the structure, the parameter initializations, and the loss functions of DL models, which could help provide physically consistent predictions. Thirdly, the attention mechanism shows an excellent ability to utilize the benefits of different input features, but the attention mechanism used in this study only focuses on the channel features. Therefore, axial attention modules should be developed to adaptively extract essential features of PB and DL models along different axes, which may further enhance the ability to identify valuable information.
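
      To make the notion of channel-only attention concrete, the following Python sketch applies a squeeze-and-excitation-style gate to a stack of feature channels. This is an assumed, generic form of channel attention and not necessarily the design used in this study; the single linear gating layer, the function name, and the parameter values are hypothetical. Each channel (e.g., one derived from a DL feature map and one from a PB forecast) receives a single weight between 0 and 1, so the spatial and temporal axes are not attended to separately.

```python
import math


def channel_attention(channels, weights, biases):
    """Rescale each channel by a gate in (0, 1) computed from channel means.

    channels: list of C grids, each a list of rows of floats
    weights:  C x C matrix of gating parameters (hypothetical, normally learned)
    biases:   length-C vector of gating parameters (hypothetical, normally learned)
    """
    # Squeeze: summarize each channel by its global average.
    squeezed = []
    for grid in channels:
        total = sum(sum(row) for row in grid)
        count = sum(len(row) for row in grid)
        squeezed.append(total / count)

    # Excite: a linear layer followed by a sigmoid gives one gate per channel.
    gates = []
    for weight_row, bias in zip(weights, biases):
        z = sum(w * s for w, s in zip(weight_row, squeezed)) + bias
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Scale: every grid cell of a channel is multiplied by that channel's gate.
    scaled = [[[gate * value for value in row] for row in grid]
              for gate, grid in zip(gates, channels)]
    return scaled, gates


# Two illustrative 2 x 2 channels with made-up values and gating parameters.
example_channels = [[[0.20, 0.22], [0.25, 0.21]],
                    [[0.30, 0.28], [0.27, 0.31]]]
example_weights = [[1.0, -1.0], [-1.0, 1.0]]
example_biases = [0.0, 0.0]
scaled_channels, channel_gates = channel_attention(
    example_channels, example_weights, example_biases)
```

      An axial variant would compute analogous gates along other axes (for example, along rows, columns, or forecast lead times) instead of producing one scalar per channel.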

      Acknowledgements. Lu LI was supported by the Natural Science Foundation of China (Grant Nos. 42088101 and 42205149); Zhongwang WEI was supported by the Natural Science Foundation of China (Grant No. 42075158); Wei SHANGGUAN was supported by the Natural Science Foundation of China (Grant No. 41975122); and Yonggen ZHANG was supported by the National Natural Science Foundation of Tianjin (Grant No. 20JCQNJC01660). All data, source codes and example codes are available at https://github.com/leelew/HybridHydro.

      Electronic supplementary material: Supplementary material is available in the online version of this article at https://doi.org/10.1007/s00376-023-3181-8.

