Predictor Selection for CNN-based Statistical Downscaling of Monthly Precipitation


doi: 10.1007/s00376-022-2119-x

  • Convolutional neural networks (CNNs) have been widely studied and found to obtain favorable results in statistical downscaling to derive high-resolution climate variables from large-scale coarse general circulation models (GCMs). However, there is a lack of research exploring the predictor selection for CNN modeling. This paper presents an effective and efficient greedy elimination algorithm to address this problem. The algorithm has three main steps: predictor importance attribution, predictor removal, and CNN retraining, which are performed sequentially and iteratively. The importance of individual predictors is measured by a gradient-based importance metric computed by a CNN backpropagation technique, which was initially proposed for CNN interpretation. The algorithm is tested on the CNN-based statistical downscaling of monthly precipitation with 20 candidate predictors and compared with a correlation analysis-based approach. Linear models are implemented as benchmarks. The experiments illustrate that the predictor selection solution can reduce the number of input predictors by more than half, improve the accuracy of both linear and CNN models, and outperform the correlation analysis method. Although the RMSE (root-mean-square error) is reduced by only 0.8%, only 9 out of 20 predictors are used to build the CNN, and the FLOPs (Floating Point Operations) decrease by 20.4%. The results imply that the algorithm can find subset predictors that correlate more to the monthly precipitation of the target area and seasons in a nonlinear way. It is worth mentioning that the algorithm is compatible with other CNN models with stacked variables as input and has the potential for nonlinear correlation predictor selection.
  • Figure 1.  The downscaling region and its average daily total precipitation (mm d−1) from 1981 to 2010. The black dots represent gridded predictand with a resolution of $ 0.5^\circ \times 0.5^\circ $ within the region, and gridded predictors with a resolution of $ 2.5^\circ \times 2.5^\circ $ are plotted in red dots.

    Figure 2.  The CNN10 architecture for statistical downscaling in South China, where the numbers give the sizes of tensors or vectors, and FC denotes a fully-connected layer.

    Figure 3.  The mean scores of CNN (convolutional neural network) and LR (linear regression) models throughout the predictor elimination procedures. The x-axis is the number of predictors. For each type of score, the CNNs with better predictions are highlighted with different markers, among which the BEST and LEAST models are specially marked.

    Figure 4.  Box plots of evaluation scores of reference, BEST, and LEAST CNN models. A six-number summary of the scores is displayed. Box and whiskers cover the 25−75th and 5−95th percentile ranges, respectively. Median and mean are plotted with an orange line and a green triangle, respectively. The mean value is shown on the top of the boxes.

    Figure 5.  The geographic distributions of ATCC of the reference CNN model (1st column) and ATCC bias between the BEST (LEAST) and reference CNN models [2nd (3rd) column].

    Figure 6.  Same as Fig. 3, except for the reverse predictor elimination procedures.

    Figure 7.  Same as Fig. 3, except for the predictor elimination procedures based on the correlation analysis.

    Figure 8.  (Left): Bar plot of the importance metrics (correlation coefficients) of predictors calculated using the correlation analysis method. The predictors of the same variable are rendered in the same color. The indices of predictors in the elimination sequences under the correlation-analysis-based method are labeled to the right of the bars. (Right): The square root of the contribution metrics of all predictors in the CNN models of different numbers of input predictors (x-axis) throughout the selection procedures.

    Figure 9.  Heatmaps of three predictors' scaled correlation coefficients (left) and gradients (right). The selected grid of interest is highlighted with cyan dots. Red and black dots are grids of predictors and predictand, respectively.

    Figure 10.  Comparisons of ATCC scores in warm and cold seasons.

    Table 1.  Greedy predictor elimination with predictor contribution calculation

    Algorithm 1 Calculation of predictor contributions
    1: procedure PREDICTORCONTRIBUTION ($ F $, $ X $) $\triangleright F$ and $ X $ are fitted model and validation set
    2: $ N \gets \text{Length}(X) $ $\triangleright $ Number of samples
    3: $ A \gets (0,0,\dots,0) $ $\triangleright $ of length $ C $
    4: for $ n=1,\dots,N $ do
    5: $ x_0 \gets X[n] $ $\triangleright $ The $ n $-th sample
    6: $ y \gets F(x_0) $ $\triangleright $ Forward pass of CNN
    7: $ \omega \gets \left(\left.\frac{\partial y_1}{\partial x} \right|_{x=x_0}, \left.\frac{\partial y_2}{\partial x} \right|_{x=x_0}, \cdots, \left.\frac{\partial y_D}{\partial x} \right|_{x=x_0} \right) $ $\triangleright $ Compute gradients with guided-backpropagation
    8: $ A^\prime \gets \left(\displaystyle\sum_{d=1}^D \displaystyle\sum_{p=1}^{P} \displaystyle\sum_{q=1}^{Q} \left|(\omega_{d})_{1,p,q}\right|, \cdots, \displaystyle\sum_{d=1}^D \displaystyle\sum_{p=1}^{P} \displaystyle\sum_{q=1}^{Q} \left|(\omega_{d})_{C,p,q}\right| \right) $
    9: $ A \gets A + A^\prime $ $\triangleright $ Accumulate the contribution metric
    10: end for
    11: return $ A/N $ $\triangleright $ Average and return
    12: end procedure
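Algorithm 1 can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the authors' code: the function name is mine, and plain backpropagation stands in for the guided backpropagation used in the paper; `model` is assumed to map a stacked-predictor tensor of shape (1, C, H, W) to a prediction vector of shape (1, D).

```python
import torch
import torch.nn as nn

def predictor_contribution(model, X):
    """Per-channel contribution of Algorithm 1: for each predictor channel c,
    average over samples of sum_d sum_p sum_q |d y_d / d x_{c,p,q}|."""
    N, C = X.shape[0], X.shape[1]
    A = torch.zeros(C)                       # one accumulator per predictor
    for n in range(N):
        x0 = X[n:n + 1].clone().requires_grad_(True)
        y = model(x0)                        # forward pass of the CNN, shape (1, D)
        for d in range(y.shape[1]):          # one backward pass per output y_d
            g, = torch.autograd.grad(y[0, d], x0, retain_graph=True)
            A += g.abs().sum(dim=(0, 2, 3))  # sum |gradient| over grid points p, q
    return A / N                             # average over samples
```

For a linear model, the contribution of channel c reduces to the summed absolute weights attached to that channel, which makes the metric easy to sanity-check.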
    Algorithm 2 Greedy predictor elimination algorithm
    1: Initialization: $S=\{1,2,\dots,C\},\; S^\prime=\varnothing$ $\triangleright $Sets of indices to candidate and eliminated predictors
    2: $ S^* \gets S \setminus S^\prime $ $\triangleright $Set of indices to remaining predictors
    3: while $ |S^*| \geq 1 $ do $\triangleright $$ |S^*| $ is the cardinality of set $ S^* $
    4: $ A \gets (0,0,\dots,0) $ $\triangleright $of length $ C $
    5: for $ k=1, 2, \dots, 6 $ do
    6: $ X^{k} \gets $ Validation set in fold $ k $
    7: $ F^{k} \gets $ Fitted model trained using predictors in $ S^* $ $\triangleright $Multiple-run
    8: $ A^\prime \gets \text{PREDICTORCONTRIBUTION}(F^{k},\; X^{k}) $ $\triangleright $ Multiple-run and average
    9: $ A \gets A + A^\prime $
    10: end for
    11: $ A \gets A / 6 $
    12: $ i \gets $ Index of predictor whose contribution is $ \min(A) $
    13: $ S^\prime \gets S^\prime \cup \{i\} $
    14: $ S^* \gets S \setminus S^\prime $
    15: end while
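The outer loop of Algorithm 2 can likewise be sketched, with model training and the contribution computation of Algorithm 1 passed in as functions. The names and calling conventions are assumptions for illustration: `train_model(keep, k)` is taken to return a model fitted on fold k using only the predictor indices in `keep`, and `contribution(F, X)` to return one importance score per remaining predictor.

```python
import torch

def greedy_predictor_elimination(train_model, contribution, folds):
    """Sketch of Algorithm 2: repeatedly retrain, score predictors across
    folds, and drop the least-contributing one until none remain."""
    C = folds[0].shape[1]
    remaining = list(range(C))    # S*: indices of remaining predictors
    order = []                    # S': eliminated, least important first
    while remaining:
        A = torch.zeros(len(remaining))
        for k, X_val in enumerate(folds):
            F = train_model(remaining, k)         # CNN retraining step
            A += contribution(F, X_val[:, remaining])
        A /= len(folds)                           # average over folds
        i = remaining[int(torch.argmin(A))]       # least-contributing predictor
        order.append(i)
        remaining.remove(i)
    return order
```

Returning the full elimination order lets the subset with the best validation score be picked afterwards, matching how the BEST and LEAST models are identified in the experiments.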

    Table 2.  Comparisons between the models constructed using nine (2nd row) and seven (3rd row) predictors to the reference model (1st row). The data in parentheses are differential percentages of the corresponding model compared to the reference one.

    Predictors RMSEs CCs ATCCs Parameters FLOPs
    20 1.793 0.633 0.577 98,102 1,163,677
    9 1.779(−0.8%) 0.641(+1.3%) 0.592(+1.7%) 93,152(−5.0%) 926,077(−20.4%)
    7 1.790(−0.2%) 0.637(+0.6%) 0.583(+0.7%) 92,257(−6.0%) 882,877(−24.1%)
  • Ancona, M., E. Ceolini, C. Öztireli, and M. Gross, 2018: Towards better understanding of gradient-based attribution methods for Deep Neural Networks. Proc. 6th International Conf. on Learning Representations, Vancouver, ICLR.
    Bach, S., A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, 2015: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One, 10(7), e0130140, https://doi.org/10.1371/journal.pone.0130140.
    Baño-Medina, J., R. Manzanas, and J. M. Gutiérrez, 2020: Configuration and intercomparison of deep learning neural models for statistical downscaling. Geoscientific Model Development, 13(4), 2109−2124, https://doi.org/10.5194/gmd-13-2109-2020.
    Battiti, R., 1994: Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4), 537−550, https://doi.org/10.1109/72.298224.
    Bukovsky, M. S., and D. J. Karoly, 2011: A regional modeling study of climate change impacts on warm-season precipitation in the central United States. J. Climate, 24(7), 1985−2002, https://doi.org/10.1175/2010JCLI3447.1.
    Chandrashekar, G., and F. Sahin, 2014: A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16−28, https://doi.org/10.1016/j.compeleceng.2013.11.024.
    Chen, H., C.-Y. Xu, and S. L. Guo, 2012: Comparison and evaluation of multiple GCMs, statistical downscaling and hydrological models in the study of climate change impacts on runoff. J. Hydrol., 434−435, 36−45.
    Chen, J., F. P. Brissette, and R. Leconte, 2011: Uncertainty of downscaling method in quantifying the impact of climate change on hydrology. J. Hydrol., 401(3−4), 190−202, https://doi.org/10.1016/j.jhydrol.2011.02.020.
    Chen, M. Y., W. Shi, P. P. Xie, V. B. S. Silva, V. E. Kousky, R. W. Higgins, and J. E. Janowiak, 2008: Assessing objective techniques for gauge-based analyses of global daily precipitation. J. Geophys. Res., 113(D4), D04110, https://doi.org/10.1029/2007JD009132.
    Davis, C. A., K. W. Manning, R. E. Carbone, S. B. Trier, and J. D. Tuttle, 2003: Coherence of warm-season continental rainfall in numerical weather prediction models. Mon. Wea. Rev., 131(11), 2667−2679, https://doi.org/10.1175/1520-0493(2003)131<2667:COWCRI>2.0.CO;2.
    Glorot, X., A. Bordes, and Y. Bengio, 2011: Deep sparse rectifier neural networks. Proc. Fourteenth International Conf. on Artificial Intelligence and Statistics, Fort Lauderdale, AISTATS, 315−323.
    Gutiérrez, J. M., and Coauthors, 2019: An intercomparison of a large ensemble of statistical downscaling methods over Europe: Results from the VALUE perfect predictor cross-validation experiment. International Journal of Climatology, 39(9), 3750−3785, https://doi.org/10.1002/joc.5462.
    Gutowski, W. J. Jr., F. O. Otieno, R. W. Arritt, E. S. Takle, and Z. T. Pan, 2004: Diagnosis and attribution of a seasonal precipitation deficit in a U.S. regional climate simulation. Journal of Hydrometeorology, 5(1), 230−242, https://doi.org/10.1175/1525-7541(2004)005<0230:DAAOAS>2.0.CO;2.
    Ham, Y.-G., J.-H. Kim, and J.-J. Luo, 2019: Deep learning for multi-year ENSO forecasts. Nature, 573(7775), 568−572, https://doi.org/10.1038/s41586-019-1559-7.
    Harpham, C., and R. L. Wilby, 2005: Multi-site downscaling of heavy daily precipitation occurrence and amounts. J. Hydrol., 312(1−4), 235−255, https://doi.org/10.1016/j.jhydrol.2005.02.020.
    He, S. J., X. Y. Li, T. DelSole, P. Ravikumar, and A. Banerjee, 2021: Sub-seasonal climate forecasting via machine learning: Challenges, analysis, and advances. Proceedings of the AAAI Conference on Artificial Intelligence, 35(1), 169−177, https://doi.org/10.1609/aaai.v35i1.16090.
    Hessami, M., P. Gachon, T. B. M. J. Ouarda, and A. St-Hilaire, 2008: Automated regression-based statistical downscaling tool. Environmental Modelling & Software, 23(6), 813−834, https://doi.org/10.1016/j.envsoft.2007.10.004.
    Hochba, D. S., 1997: Approximation algorithms for NP-hard problems. ACM SIGACT News, 28(2), 40−52, https://doi.org/10.1145/261342.571216.
    Hu, Y. M., D. Si, Y. J. Liu, and L. Zhao, 2016: Investigations on moisture transports, budgets and sources responsible for the decadal variability of precipitation in southern China. Journal of Tropical Meteorology, 22(3), 402−412, https://doi.org/10.16555/j.1006-8775.2016.03.014.
    Hughes, J. P., P. Guttorp, and S. P. Charles, 1999: A non-homogeneous hidden Markov model for precipitation occurrence. Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(1), 15−30.
    Jaagus, J., A. Briede, E. Rimkus, and K. Remm, 2010: Precipitation pattern in the Baltic countries under the influence of large-scale atmospheric circulation and local landscape factors. International Journal of Climatology, 30(5), 705−720, https://doi.org/10.1002/joc.1929.
    Jonah, K., and Coauthors, 2021: Spatiotemporal variability of rainfall trends and influencing factors in Rwanda. Journal of Atmospheric and Solar-Terrestrial Physics, 219, 105631, https://doi.org/10.1016/j.jastp.2021.105631.
    Joshi, D., A. St-Hilaire, T. Ouarda, and A. Daigle, 2015: Statistical downscaling of precipitation and temperature using sparse Bayesian learning, multiple linear regression and genetic programming frameworks. Canadian Water Resources Journal / Revue Canadienne Des Ressources Hydriques, 40(4), 392−408, https://doi.org/10.1080/07011784.2015.1089191.
    Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-year reanalysis project. Bull. Amer. Meteor. Soc., 77(3), 437−472, https://doi.org/10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.
    LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521(7553), 436−444, https://doi.org/10.1038/nature14539.
    Li, X. H., H. Y. Xiong, X. J. Li, X. Y. Wu, X. Zhang, J. Liu, J. Bian, and D. J. Dou, 2021: Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond. arXiv: 2103.10689, https://arxiv.org/abs/2103.10689; https://link.springer.com/article/10.1007/s10115-022-01756-8.
    Liu, Z. F., Z. X. Xu, S. P. Charles, G. B. Fu, and L. Liu, 2011: Evaluation of two statistical downscaling models for daily precipitation over an arid basin in China. International Journal of Climatology, 31(13), 2006−2020, https://doi.org/10.1002/joc.2211.
    Maier, H. R., G. C. Dandy, and M. D. Burch, 1998: Use of artificial neural networks for modelling cyanobacteria Anabaena spp. in the River Murray, South Australia. Ecological Modelling, 105(2−3), 257−272, https://doi.org/10.1016/S0304-3800(97)00161-0.
    Manzanas, R., A. Lucero, A. Weisheimer, and J. M. Gutiérrez, 2018: Can bias correction and statistical downscaling methods improve the skill of seasonal precipitation forecasts? Climate Dyn., 50(3), 1161−1176, https://doi.org/10.1007/s00382-017-3668-z.
    Maraun, D., M. Widmann, and J. M. Gutiérrez, 2019: Statistical downscaling skill under present climate conditions: A synthesis of the VALUE perfect predictor experiment. International Journal of Climatology, 39(9), 3692−3703, https://doi.org/10.1002/joc.5877.
    May, R., G. Dandy, and H. Maier, 2011: Review of input variable selection methods for artificial neural networks. Artificial Neural Networks-Methodological Advances and Biomedical Applications, K. Suzuki, Ed., InTech, 16004 pp.
    Najafi, M. R., H. Moradkhani, and S. A. Wherry, 2011: Statistical downscaling of precipitation using machine learning with optimal predictor selection. Journal of Hydrologic Engineering, 16(8), 650−664, https://doi.org/10.1061/(ASCE)HE.1943-5584.0000355.
    Nie, W. L., Y. Zhang, and A. Patel, 2018: A theoretical explanation for perplexing behaviors of backpropagation-based visualizations. Proc. 35th International Conf. on Machine Learning, Stockholm, ICML, 3806−3815.
    Pan, X., Y. H. Lu, K. Zhao, H. Huang, M. J. Wang, and H. N. Chen, 2021: Improving nowcasting of convective development by incorporating polarimetric radar variables into a deep-learning model. Geophys. Res. Lett., 48(21), e2021GL095302, https://doi.org/10.1029/2021GL095302.
    Paszke, A., and Coauthors, 2019: PyTorch: An imperative style, high-performance deep learning library. Proc. 33rd International Conf. on Neural Information Processing Systems, Vancouver, Curran Associates Inc., 8026−8037.
    Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in python. The Journal of Machine Learning Research, 12, 2825−2830.
    Ramseyer, C. A., and T. L. Mote, 2016: Atmospheric controls on Puerto Rico precipitation using artificial neural networks. Climate Dyn., 47(7), 2515−2526, https://doi.org/10.1007/s00382-016-2980-3.
    Retsch, M. H., C. Jakob, and M. S. Singh, 2022: Identifying relations between deep convection and the large-scale atmosphere using explainable artificial intelligence. J. Geophy. Res., 127(3), e2021JD035388, https://doi.org/10.1029/2021JD035388.
    Ribeiro, M. T., S. Singh, and C. Guestrin, 2016: “Why should i trust you?”: Explaining the predictions of any classifier. Proc. 22nd ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, California, Association for Computing Machinery, 1135−1144.
    Rodrigues, E. R., I. Oliveira, R. Cunha, and M. Netto, 2018: DeepDownscale: A deep learning strategy for high-resolution weather forecast. Proc. 2018 IEEE 14th International Conf. on E-Science (e-Science), Amsterdam, IEEE, 415−422.
    Sachindra, D. A., K. Ahmed, M. Rashid, S. Shahid, and B. J. C. Perera, 2018: Statistical downscaling of precipitation using machine learning techniques. Atmospheric Research, 212, 240−258, https://doi.org/10.1016/j.atmosres.2018.05.022.
    Shrikumar, A., P. Greenside, and A. Kundaje, 2017: Learning important features through propagating activation differences. Proc. 34th International Conf. on Machine Learning, Sydney, JMLR.org, 3145−3153.
    Simonyan, K., A. Vedaldi, and A. Zisserman, 2014: Deep inside convolutional networks: Visualising image classification models and saliency maps. Proc. 2nd International Conf. on Learning Representations, Banff, ICLR.
    Sivagaminathan, R. K., and S. Ramakrishnan, 2007: A hybrid approach for feature subset selection using neural networks and ant colony optimization. Expert Systems with Applications, 33(1), 49−60, https://doi.org/10.1016/j.eswa.2006.04.010.
    Song, Z. X., and J. Li, 2021: Variable selection with false discovery rate control in deep neural networks. Nature Machine Intelligence, 3(5), 426−433, https://doi.org/10.1038/s42256-021-00308-z.
    Springenberg, J. T., A. Dosovitskiy, T. Brox, and M. Riedmiller, 2015: Striving for simplicity: The all convolutional net. Proc. 3rd International Conf. on Learning Representations, San Diego, ICLR.
    Štrumbelj, E., and I. Kononenko, 2014: Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647−665, https://doi.org/10.1007/s10115-013-0679-x.
    Sun, L., and Y. F. Lan, 2021: Statistical downscaling of daily temperature and precipitation over China using deep learning neural models: Localization and comparison with other methods. International Journal of Climatology, 41(2), 1128−1147, https://doi.org/10.1002/joc.6769.
    Sundararajan, M., A. Taly, and Q. Q. Yan, 2017: Axiomatic attribution for deep networks. Proc. 34th International Conf. on Machine Learning, Sydney, JMLR.org, 3319−3328.
    Toğaçar, M., Z. Cömert, and B. Ergen, 2020: Classification of brain MRI using hyper column technique with convolutional neural network and feature selection method. Expert Systems with Applications, 149, 113274, https://doi.org/10.1016/j.eswa.2020.113274.
    Tong, D. L., and R. Mintram, 2010: Genetic Algorithm-Neural Network (GANN): A study of neural network activation functions and depth of genetic algorithm search applied to feature selection. International Journal of Machine Learning and Cybernetics, 1(1), 75−87, https://doi.org/10.1007/s13042-010-0004-x.
    Vandal, T., E. Kodra, and A. R. Ganguly, 2019: Intercomparison of machine learning methods for statistical downscaling: The case of daily and extreme precipitation. Theor. Appl. Climatol., 137(1−2), 557−570, https://doi.org/10.1007/s00704-018-2613-3.
    Vandal, T., E. Kodra, S. Ganguly, A. Michaelis, R. Nemani, and A. R. Ganguly, 2017: DeepSD: Generating high resolution climate change projections through single image super-resolution. Proc. 23rd ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, Halifax, Association for Computing Machinery, 1663−1672.
    Werner, A. T., and A. J. Cannon, 2016: Hydrologic extremes – an intercomparison of multiple gridded statistical downscaling methods. Hydrology and Earth System Sciences, 20(4), 1483−1508, https://doi.org/10.5194/hess-20-1483-2016.
    Wilks, D. S., and R. L. Wilby, 1999: The weather generation game: A review of stochastic weather models. Progress in Physical Geography: Earth and Environment, 23(3), 329−357, https://doi.org/10.1177/030913339902300302.
    Woo, S., J. Park, J.-Y. Lee, and I. S. Kweon, 2018: CBAM: Convolutional block attention module. Proc. 15th European Conf. on Computer Vision, Munich, Springer, 3−19.
    Xie, P., M. Chen, and W. Shi, 2010: CPC unified gauge-based analysis of global daily precipitation. Preprints, 24th Conf. on Hydrology, Atlanta, Amer. Meteor. Soc. https://ams.confex.com/ams/90annual/techprogram/paper_163676.htm.
    Xie, P. P., M. Y. Chen, S. Yang, A. Yatagai, T. Hayasaka, Y. Fukushima, and C. M. Liu, 2007: A gauge-based analysis of daily precipitation over east Asia. Journal of Hydrometeorology, 8(3), 607−626, https://doi.org/10.1175/JHM583.1.
    Ye, M., and Y. Sun, 2018: Variable selection via penalized neural network: A drop-out-one loss approach. Proc. 35th International Conf. on Machine Learning, Stockholm, PMLR, 5616−5625.

Manuscript History

Manuscript received: 17 May 2022
Manuscript revised: 28 September 2022
Manuscript accepted: 24 October 2022

Predictor Selection for CNN-based Statistical Downscaling of Monthly Precipitation

    Corresponding author: Yamin HU, huym@gd121.cn
    Corresponding author: Xinru LIU, liuxinru@csu.edu.cn
  • 1. Central South University, Changsha 410083, China
  • 2. Guangdong Climate Center, Guangzhou 510610, China
  • 3. Jieyang Meteorological Bureau, Jieyang 522031, China
  • 4. State Key Laboratory of Numerical Modeling for Atmosphere Sciences and Geophysical Fluid Dynamics (LASG), Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China


    • Statistical downscaling is a widely studied technique to derive finer-scale local or regional climate variables from low-resolution general circulation models (GCMs) to meet the requirements of decision-makers in various sectors (Hessami et al., 2008; Joshi et al., 2015; Sachindra et al., 2018). Traditionally, there are three main categories of statistical downscaling approaches (Gutiérrez et al., 2019; Sun and Lan, 2021): (a) perfect prognosis (PP) approaches, which establish statistical relationships between informative large-scale atmospheric variables (predictors) and local variables of interest (predictands) (Maraun et al., 2019); (b) model output statistics (MOS) approaches, which use statistical techniques to correct the systematic bias of the outputs from a GCM against observations (Werner and Cannon, 2016); and (c) weather generator (WG) approaches, which predict the local temporal and marginal climate properties from historical climatological statistics (Wilks and Wilby, 1999; Hughes et al., 1999) using stochastic models such as Markov chains. Under the PP methods, the models are learned between simultaneous observations of the “perfect” predictors (typically from a reanalysis) and predictands (historical local or gridded observations), and they can subsequently be transferred onto the simulated outputs of GCMs to obtain future predictions (Manzanas et al., 2018). Standard PP techniques are mainly regression models or analog methods (Maraun et al., 2019). See Gutiérrez et al. (2019) for an intercomparison of traditional PP methods in downscaling experiments over Europe. Recently, with the success of deep learning models in multiple fields (LeCun et al., 2015), convolutional neural networks (CNNs) have also been introduced to address the PP statistical downscaling problem and have obtained encouraging results (Vandal et al., 2017, 2019; Baño-Medina et al., 2020; Sun and Lan, 2021).

      Feature selection is the process of reducing the number of input variables when developing a predictive model, and it is considered a key step in the construction of deep neural networks, including CNNs (Toğaçar et al., 2020; Song and Li, 2021). In climate science, the availability of many climatological GCM variables over a large spatial-temporal domain leads to data of very high dimensionality, so plenty of feature selection methods have been presented in the prior literature (Harpham and Wilby, 2005; Hessami et al., 2008; Liu et al., 2011; Chen et al., 2011, 2012), including methods for statistical downscaling (Najafi et al., 2011). However, despite the popularity of CNN models in pertinent climate applications, few specialized studies have focused on the selection of predictors for CNNs, which usually take stacked climatological GCM variables as input. For instance, Baño-Medina et al. (2020) and Sun and Lan (2021) stack 20 predictors as input to conduct temperature and precipitation downscaling, where the 20 predictors are selected empirically based on prior literature.

      Deep neural networks, including CNNs, are a popular machine learning technique, and feature selection is thought to be a key step in their construction (Chandrashekar and Sahin, 2014). The existing feature selection methods for neural network models, like those for other machine learning techniques, can be broadly classified into three types: filters, embedded methods, and wrappers (May et al., 2011; Song and Li, 2021). Filters perform a preceding analysis (e.g., correlation analysis) and select features based on the relationship between the input and output data; they are applicable to all machine learning models and do not require the involvement of the model training process (Battiti, 1994; Retsch et al., 2022). Embedded methods embed a submodule in the neural network to weight the input features and suppress, instead of eliminate, unimportant ones (Woo et al., 2018; Song and Li, 2021; Pan et al., 2021). In contrast, wrappers select important features and remove unimportant ones with the participation of the training process of the neural network (Ye and Sun, 2018), which makes them the most appropriate methods for predictor selection of CNNs.

      A simple wrapper approach is to train the model with all available combinations of features and then select the feature subset that achieves the best accuracy. However, this process is computationally expensive, especially in the context of high dimensionality and deep models with large numbers of parameters. More efficient feature selection wrappers search for the optimal subset of features from the candidate feature set using heuristic algorithms, such as ant colony optimization (Sivagaminathan and Ramakrishnan, 2007) and genetic algorithms (Tong and Mintram, 2010). Still, such methods require training the neural network model a number of times well beyond a linear multiple of the number of candidate features, and the computational cost of the heuristic search itself is also high. By contrast, the greedy algorithm (Maier et al., 1998; Ye and Sun, 2018) only needs to train the deep model a number of times linear in the number of features, and no extra search algorithm is required.

      Considering computational efficiency and method effectiveness, this study uses the greedy version of wrappers to solve the predictor selection problem of CNNs for statistical downscaling tasks. Specifically, our method is inspired by Ye and Sun (2018) and combines features into feature groups. We treat the features of the same predictor (for instance, the 850-hPa zonal wind) as a group and aggregate the importance metrics of its features as the measure of the predictor's contribution to the model. Attributing importance metrics to input features is a popular research topic in interpretable deep learning (Li et al., 2021), represented by LIME (Ribeiro et al., 2016), perturbation (Štrumbelj and Kononenko, 2014), gradient (Simonyan et al., 2014), and layer-wise relevance propagation (Bach et al., 2015; Shrikumar et al., 2017). In this study, feature importance is calculated by the gradient-based guided-backpropagation method (Springenberg et al., 2015), chosen for its simplicity and comprehensibility; it was originally proposed for interpreting which parts of an input image contribute most to the output of a vision CNN model. The developed greedy predictor selection algorithm for CNNs is applied and tested on the statistical downscaling of monthly precipitation over South China using the CNN model comprehensively studied in Baño-Medina et al. (2020) and Sun and Lan (2021), which initially has 20 input predictors.

      To the best of our knowledge, this is the first investigation of predictor selection methods for CNN-based climate models. We tackle the problem by attributing a gradient-based contribution measure to each input predictor in the CNN model and greedily eliminating the predictor with the smallest contribution. The experiments demonstrate that the proposed method effectively reduces the input predictors without sacrificing model accuracy. Moreover, the contribution measures defined in this paper provide a better metric of the importance of the predictors to the output of the CNN model than traditional correlation coefficients. Notably, the presented solution to the predictor selection problem in CNN model construction is also ready to be applied in other CNN-based climate applications that input an indeterminate number of stacked predictors (Ham et al., 2019; He et al., 2021; Rocha Rodrigues et al., 2018). The code that reproduces all the experiments in this paper is publicly hosted on GitHub.

      The rest of the paper is structured as follows. Section 2 introduces the target monthly statistical downscaling application and models. In section 3, the greedy elimination algorithm for predictor selection of CNN models is presented. The experimental results and discussions are presented in section 4. Section 5 summarizes our work.

    2.   Statistical downscaling application
    • Before presenting the predictor selection solution, we need to introduce the application of statistical downscaling of monthly precipitation over South China and the model architecture that we used for performing the task.

    • This study uses the CPC Global Unified Gauge-Based Analysis of Precipitation dataset (Xie et al., 2010) provided by the NOAA Climate Prediction Center as the downscaling target (predictand). The dataset exploits a dense network of rain gauges to provide quality-controlled high-resolution ($ 0.5^\circ \times 0.5^\circ $) gridded daily precipitation data from 1979 to the present. Advanced interpolation (Xie et al., 2007) and evaluation (Chen et al., 2008) algorithms make the data more trustworthy than other high-resolution gridded precipitation datasets, which usually have high uncertainties due to a lack of gauge-based observations, poor interpolation, and quality assessment processes (Vandal et al., 2019). Meanwhile, the coarse input (predictors) is selected from the NCEP/NCAR Reanalysis 1 dataset with a resolution of $ 2.5^\circ \times 2.5^\circ $. The dataset employs a state-of-the-art analysis/forecast system to perform data assimilation using historical data from 1948 to the current date (Kalnay et al., 1996).

      Both predictors and predictand are acquired from the public PSL (Physical Sciences Laboratory) daily datasets. The monthly predictors and predictand are then calculated by averaging the daily data within the given months. Specifically, the candidate predictor set has five large-scale variables (zonal and meridional winds, geopotential height, air temperature, and specific humidity), all at four pressure layers (1000, 850, 700, and 500 hPa), as described in Baño-Medina et al. (2020). The NCEP/NCAR dataset denotes the five variables as uwnd, vwnd, hgt, air, and shum, respectively. We further combine the variable and pressure layer to distinguish individual predictors. For instance, air500 represents the predictor of air temperature at 500 hPa. The common date span of the predictors and predictand is from 1 January 1979 to 30 September 2021 (totalling 43 years/513 months/15 614 days).
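      As a small illustration, the predictor naming scheme described above can be generated programmatically (hgt, uwnd, vwnd, air, and shum are the NCEP/NCAR short names):

```python
# Build the 20 candidate predictor names by joining the NCEP/NCAR variable
# short names with the four pressure levels, e.g., "air500".
variables = ["uwnd", "vwnd", "hgt", "air", "shum"]  # winds, height, temperature, humidity
levels = [1000, 850, 700, 500]                      # pressure layers (hPa)
predictors = [f"{v}{lev}" for v in variables for lev in levels]
```

      This yields 20 names such as uwnd1000 and shum500, matching the convention used in the text.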

      The target downscaling region is chosen as South China, consisting of Guangdong, Hainan, and Guangxi, due to its relatively high rainfall. Figure 1 shows the average daily total precipitation (mm d−1) over the 30-yr period (1981−2010) and the gridded points of the predictors and predictand over South China. The input latitude and longitude intervals are specified as $15^\circ - 27.5^\circ\text{N}$ and $102.5^\circ-120^\circ\text{E}$, respectively. Note that we deliberately enlarge the input range to a multiple of $ 2.5^\circ $ relative to the output because the input predictors have a resolution of $ 2.5^\circ $. After the latitude and longitude ranges are specified, the input grid size is $ 6 \times 8 $, and the output is $ 25 \times 35 $, of which $ 157 $ grid points lie on land within the region border.

      Figure 1.  The downscaling region and its average daily total precipitation (mm d−1) from 1981 to 2010. The black dots represent gridded predictand with a resolution of $ 0.5^\circ \times 0.5^\circ $ within the region, and gridded predictors with a resolution of $ 2.5^\circ \times 2.5^\circ $ are plotted in red dots.

    • Recently, Baño-Medina et al. (2020) conducted a comprehensive study on the structural design of CNN models for the statistical downscaling problem. A series of CNN architectures, named CNN1, CNN10, CNNdense, etc., were designed and compared in the statistical downscaling of daily temperature and precipitation over Europe. For convenience, we directly use CNN10 as the primary CNN model in this study because it achieved the best accuracy among them in our experiments.

      Specifically, the selected CNN10 model (see Fig. 2) has three sequentially connected convolutional layers with depths 50, 25, and 10 for feature extraction, followed by a fully connected layer that maps the extracted features to the targets (gridded precipitation in this case). Moreover, each convolutional layer has a ReLU (Glorot et al., 2011) activation function. A more detailed introduction of the model can be found in Baño-Medina et al. (2020).
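      A minimal PyTorch sketch of this architecture follows; the 3×3 kernels with same-padding are our assumption (not stated in this section), chosen because they reproduce the parameter count of 98,102 reported in Table 2 for 20 input predictors:

```python
import torch
import torch.nn as nn

class CNN10(nn.Module):
    """Sketch of the CNN10 architecture: three conv layers (depths 50, 25, 10)
    with ReLU, then a fully connected layer to the 157 land grid points.
    The 3x3 kernels with same-padding are an assumption consistent with the
    parameter counts reported in Table 2."""
    def __init__(self, n_predictors=20, n_targets=157):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_predictors, 50, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(50, 25, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(25, 10, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(10 * 6 * 8, n_targets)  # input grid is 6 x 8

    def forward(self, x):  # x: (batch, n_predictors, 6, 8)
        return self.fc(self.features(x).flatten(1))
```

      A forward pass maps a batch of stacked predictor fields of shape (N, 20, 6, 8) to (N, 157).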

      Figure 2.  The CNN10 architecture for statistical downscaling in South China, where the numbers are the sizes of the tensors or vectors, and FC denotes a fully connected layer.

    • Similar to Baño-Medina et al. (2020), we also apply linear methods to the statistical downscaling tasks. Unlike the CNNs, which use a single model to predict the precipitation at all grid points, this approach requires fitting a linear model for each grid point, where the inputs to each model are the predictor values at the four coarse grid cells closest to that target location. These linear models serve, on the one hand, as a benchmark for the CNN models and, on the other hand, to examine the effect of the predictor selection results on linear models.
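      The per-grid linear approach can be sketched as follows with synthetic data; the fixed nearest-cell indices are purely illustrative placeholders for the real nearest-neighbor lookup:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
T, C = 120, 20                      # months, predictors (synthetic sizes)
X = rng.normal(size=(T, C, 6, 8))   # coarse predictor fields
y = rng.normal(size=(T, 157))       # precipitation at 157 land grid points

# Hypothetical nearest-cell indices: for each target point, the (row, col) of
# its four closest coarse cells (fixed here for illustration only).
nearest = [((2, 3), (2, 4), (3, 3), (3, 4))] * 157

models, preds = [], np.empty_like(y)
for g in range(157):
    # features: all predictors at the four closest coarse cells -> 4 * C inputs
    feats = np.stack([X[:, :, r, c] for r, c in nearest[g]], axis=2).reshape(T, -1)
    m = LinearRegression().fit(feats, y[:, g])
    models.append(m)
    preds[:, g] = m.predict(feats)
```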

    • A cross-validation strategy is applied to train and evaluate the models and to compute the predictor contribution measures. Specifically, the 43-yr dataset is divided into seven folds of six years each, except for the fourth fold, which has seven years. Then, for each $ i \in \{1, 2, \dots, 6\} $, the $ i $-th and $ (i+1) $-th folds are used as the validation and test sets, respectively, while the other five folds form the training set. Furthermore, for each fold, the input and output data are standardized by subtracting the means and dividing by the standard deviations, where the means and standard deviations are calculated grid-wise on the training set. In particular, when training the CNN models, we augment the training set by treating any 30 consecutive days as a month and constructing a sample by averaging the predictors and predictand within that period. This augmentation increases the amount of data from approximately the number of months to approximately the number of days within the training period, which effectively improves the performance of the constructed CNN models.
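      The fold construction can be sketched as follows; the exact fold boundaries are assumed to be contiguous blocks of years, since the text specifies only the fold sizes:

```python
# Split the 43 years (1979-2021) into 7 folds: six years each, except the
# fourth fold, which has seven years (contiguity of folds is an assumption).
years = list(range(1979, 2022))          # 43 years
sizes = [6, 6, 6, 7, 6, 6, 6]
folds, start = [], 0
for s in sizes:
    folds.append(years[start:start + s])
    start += s

def split(i):
    """For experiment i (1-based, 1..6): fold i is validation, fold i+1 is
    test, and the remaining five folds form the training set."""
    val, test = folds[i - 1], folds[i]
    train = [y for k, f in enumerate(folds) if k not in (i - 1, i) for y in f]
    return train, val, test
```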

      To overcome the randomness of the fitted CNN model due to random initialization parameters, we further adopt a multiple-run strategy. In brief, when constructing a CNN model, we train it multiple times with the same hyperparameters and training set but with random initial free parameters. Then, the predictor importance metric and model's evaluation scores are aggregated and averaged over all runs. Specifically, we performed 10 runs in all our experiments.

      Notably, the multiple-run strategy is only needed for the CNNs because the linear models are deterministic, while the k-fold cross-validation is used for both the CNN and linear models. Thanks to the adoption of cross-validation, the test dataset has 441 months, which provides a trustworthy evaluation of the constructed models. The multiple-run strategy further reduces the effect of randomness. Hyperparameters are set to be the same in all CNN models. Specifically, the batch size is set to 1024; sample shuffling is enabled, meaning that for each epoch the training dataset is randomly shuffled to draw batches; the learning rate is 0.001; the optimization algorithm is Adam; and the CNN models are trained for at most 200 epochs with an early-stopping strategy. The implementation is based on PyTorch (Paszke et al., 2019), and other settings are left at their defaults. Additionally, the implementation of the linear method is based on the library scikit-learn (Pedregosa et al., 2011), with default parameters.
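      A training-loop sketch with the stated hyperparameters is given below; the MSE loss and the early-stopping patience are assumptions, as the text does not specify them:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_model(model, train_x, train_y, val_x, val_y,
                lr=1e-3, batch_size=1024, max_epochs=200, patience=20):
    """Training setup sketch following the stated hyperparameters (Adam,
    lr = 0.001, batch size 1024, shuffling, at most 200 epochs with early
    stopping). The MSE loss and patience value are assumptions."""
    loader = DataLoader(TensorDataset(train_x, train_y),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    best, wait = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(val_x), val_y).item()
        if val_loss < best - 1e-6:
            best, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break  # early stop on stalled validation loss
    return model
```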

    3.   Predictor selection algorithm
    • Let $ S = \{1,2,\dots,C\} $ be the indices of the $ C $ candidate predictors, and let $ y \in\mathbb{R}^D $ be the target. The goal of predictor selection is to identify an index subset $ S^* \subset S $ of predictors that can train a model of better prediction from given samples $ (x^{(i)}, y^{(i)})_{i = 1,\dots,n} $, where a model is said to be of better prediction if it uses fewer input predictors and obtains better accuracy than the model that inputs all candidate predictors. We will refer to the model that inputs all candidate predictors as the reference model.

    • Our solution to the predictor selection problem is inspired by Ye and Sun (2018), in which feature selection for deep neural networks (not CNNs) is performed by iterative elimination. However, that procedure is computationally expensive because the contribution must be calculated for each feature (or group) separately using a modified model over the whole training dataset. Our solution instead computes the importance metrics of the input predictors using gradient-based feature attribution, which requires only one traversal of the validation dataset. In fact, the gradient is one of the simplest and most efficient measures of feature importance in differentiable models (Ancona et al., 2018), and the main reason we choose it over alternatives such as layer-wise relevance propagation is that the downscaling model used in this study has a simple architecture (containing only sequentially connected convolutional layers and ReLU activations), for which the gradient-based method is easier to understand and has a better theoretical grounding. The computed gradients are then aggregated to define the contribution of a single predictor. The procedures are detailed as follows.

      Let the function $ y = F(x)\in \mathbb{R}^D $ denote the (fitted) CNN model that maps a stack of predictors $ x\in \mathbb{R}^{C \times P \times Q} $ to the target $ y $, where $ C $ is the number of selected predictors. In our statistical downscaling application, $ P $, $ Q $, and $ D $ are 6, 8, and 157, respectively. Given an input $ x_0 $, we can approximate $ y_d $ with a linear function near $ x_0 $ by computing the first-order Taylor expansion:

      $ y_d(x) \approx y_d(x_0) + \displaystyle\sum {\boldsymbol{\omega}}_d \circ (x - x_0) , \qquad (1) $

      where $ \circ $ represents the Hadamard product operator, $ \sum $ means the sum of all elements in the matrix, and $ {\boldsymbol{\omega}}_d \in \mathbb{R}^{C \times P \times Q} $ is the gradient of $ y_d $ with respect to the input at the point $ x_0 $:

      $ {\boldsymbol{\omega}}_d = \left.\dfrac{\partial y_d}{\partial x} \right|_{x = x_0} . \qquad (2) $

      Equation 1 implies that $ y_d $ is approximately proportional to the entries in $ {\boldsymbol{\omega}}_d $ around $ x_0 $. Therefore, a reasonable definition of the $ c $-th predictor's contribution $ A_c $ at a given point $ x_0 $ is the sum of the absolute values of the gradients corresponding to that predictor over all entries of $ y $:

      $ A_c = \displaystyle\sum_{d=1}^{D} \sum_{p=1}^{P} \sum_{q=1}^{Q} \left|({\boldsymbol{\omega}}_d)_{c,p,q}\right| . \qquad (3) $

      The gradients $ {\boldsymbol{\omega}}_d $ of deep neural networks, including CNNs, can be efficiently computed using the backpropagation algorithm (LeCun et al., 2015). There are several variants of the backpropagation approach, such as guided backpropagation (Springenberg et al., 2015) and integrated gradients (Sundararajan et al., 2017). This study uses guided backpropagation because it is more robust to noise than standard backpropagation (Nie et al., 2018) and more efficient than integrated gradients. The contribution of each individual predictor to the CNN model is averaged over the whole validation set. See Algorithm 1 (Table 1) for an outline of the computation procedures, where lines 7 and 8 correspond to Eqs. 2 and 3, respectively.

      Algorithm 1 Calculation of predictor contributions
      1: procedure PREDICTORCONTRIBUTION ($ F $, $ X $) $\triangleright F$ and $ X $ are fitted model and validation set
      2: $ N \gets \text{Length}(X) $ $\triangleright $ Number of samples
      3: $ A \gets (0,0,\dots,0) $ $\triangleright $ of length $ C $
      4: for $ n=1,\dots,N $ do
      5: $ x_0 \gets X[n] $ $\triangleright $ The $ n $-th sample
      6: $ y = F(x_0) $ $\triangleright $ Forward pass of CNN
      7: $ \omega \gets \left(\left.\frac{\partial y_1}{\partial x} \right|_{x=x_0}, \left.\frac{\partial y_2}{\partial x} \right|_{x=x_0}, \cdots, \left.\frac{\partial y_D}{\partial x} \right|_{x=x_0} \right) $ $\triangleright $ Compute gradients with guided-backpropagation
      8: $ A^\prime \gets \left(\displaystyle\sum_{d=1}^D \sum_{p=1}^{P} \sum_{q=1}^{Q} \left|(\omega_{d})_{1,p,q}\right|, \cdots, \displaystyle\sum_{d=1}^D \sum_{p=1}^{P} \sum_{q=1}^{Q} \left|(\omega_{d})_{C,p,q}\right| \right) $
      9: $ A \gets A + A^\prime $ $\triangleright $ Accumulate the contribution metric
      10: end for
      11: return $ A/N $ $\triangleright $ Average and return
      12: end procedure
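      Algorithm 1 can be sketched in PyTorch as follows; the guided ReLU replaces the model's standard ReLU so that gradients are computed with guided backpropagation, and the toy model is for illustration only (not the authors' implementation):

```python
import torch
import torch.nn as nn

class GuidedReLU(torch.autograd.Function):
    """ReLU whose backward pass also zeroes negative incoming gradients
    (guided backpropagation, Springenberg et al., 2015)."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out.clamp(min=0) * (x > 0).to(grad_out.dtype)

def predictor_contribution(model, X):
    """Algorithm 1: average per-predictor contribution over a validation set X
    of shape (N, C, P, Q); returns a tensor of length C."""
    N, C = X.shape[0], X.shape[1]
    A = torch.zeros(C)
    for n in range(N):
        x0 = X[n:n + 1].clone().requires_grad_(True)
        y = model(x0)                          # forward pass (line 6)
        for d in range(y.shape[1]):            # one backward pass per output d
            (g,) = torch.autograd.grad(y[0, d], x0, retain_graph=True)
            A += g.abs().sum(dim=(2, 3))[0]    # lines 7-8: sum of |gradients|
    return A / N                               # line 11: average over samples

class TinyCNN(nn.Module):
    """Toy model using the guided ReLU, for illustration only."""
    def __init__(self, C=3, D=5):
        super().__init__()
        self.conv = nn.Conv2d(C, 4, 3, padding=1)
        self.fc = nn.Linear(4 * 6 * 8, D)

    def forward(self, x):
        return self.fc(GuidedReLU.apply(self.conv(x)).flatten(1))
```

      The inner loop performs one backward pass per output entry so that the absolute value can be applied per gradient, as Eq. 3 requires.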

      Table 1.  Greedy predictor elimination with predictor contribution calculation

      Based on the definition of the contribution metrics, the overall greedy elimination algorithm for the predictor selection problem is summarized in Algorithm 2 (Table 1). Notably, the predictor contributions are calculated and averaged over multiple-run models, and the evaluation scores, which will be presented next, are calculated in the same way. Furthermore, there is no termination condition, such as one that stops the iteration when the accuracy drops. Instead, the optimal subset of candidate predictors is determined after evaluating all the constructed CNN models. If multiple runs and cross-validation are not counted, the CNN models need to be trained and evaluated exactly as many times as there are candidate predictors, which is more efficient than the greedy algorithm presented in Ye and Sun (2018).

      Algorithm 2 Greedy predictor elimination algorithm
      1: Initialization: $S=\{1,2,\dots,C\},\; S^\prime=\varnothing$ $\triangleright $Sets of indices to candidate and eliminated predictors
      2: $ S^* \gets S \setminus S^\prime $ $\triangleright $Set of indices to remaining predictors
      3: while $ |S^*| \geq 1 $ do $\triangleright $$ |S^*| $ is the cardinality of set $ S^* $
      4: $ A \gets (0,0,\dots,0) $ $\triangleright $of length $ C $
      5: for $ k=1, 2, \dots, 6 $ do
      6: $ X^{k} \gets $ Validation set in fold $ k $
      7: $ F^{k} \gets $ Fitted model trained using predictors in $ S^* $ $\triangleright $Multiple-run
      8: $ A^\prime \gets \text{PREDICTORCONTRIBUTION}(F^{k},\; X^{k}) $ $\triangleright $ Multiple-run and average
      9: $ A \gets A + A^\prime $
      10: end for
      11: $ A \gets A / 6 $
      12: $ i \gets $ Index of predictor whose contribution is $ \min(A) $
      13: $ S^\prime \gets S^\prime \cup \{i\} $
      14: $ S^* \gets S \setminus S^\prime $
      15: end while
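      The overall loop of Algorithm 2 reduces to a short skeleton; train_fn and contribution_fn stand in for the CNN retraining and Algorithm 1, while cross-validation and multiple-run averaging are omitted for brevity:

```python
def greedy_elimination(candidates, train_fn, contribution_fn):
    """Algorithm 2 skeleton: repeatedly retrain on the remaining predictors,
    score them, and drop the least-contributing one. Returns the elimination
    order (first removed first)."""
    remaining = list(candidates)
    order = []
    while remaining:                                      # while |S*| >= 1
        model = train_fn(remaining)                       # retrain on S*
        A = contribution_fn(model, remaining)             # contributions of S*
        i = min(range(len(remaining)), key=A.__getitem__) # argmin (line 12)
        order.append(remaining.pop(i))                    # move predictor to S'
    return order
```

      With a mock contribution function, the elimination order is simply ascending importance, which also makes the skeleton easy to unit-test.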
    4.   Results and discussion
    • According to Algorithm 2, a succession of models with different numbers of input predictors can be constructed and evaluated. Three scores are used to measure the accuracy: RMSE (root-mean-square error), CC (correlation coefficient), and ATCC (anomaly temporal correlation coefficient). RMSE and CC are calculated by month and measure the spatial errors and Pearson correlations between the predictions and observations, while ATCCs are Pearson correlation coefficients calculated between the predicted and observed grid-wise time series. In particular, when calculating ATCCs, the seasonal oscillations are first removed from the time series by subtracting the climatological averages of the observations, calculated for the 12 calendar months over a 30-yr period (1981−2010). Note that, as mentioned before, the scores are computed and averaged over multiple runs. In addition to the quantities that measure model accuracy, we use FLOPs (floating point operations), which count the theoretical number of multiply-add operations in a CNN, to estimate the computational cost of the CNN models. FLOPs are a deterministic quantity, independent of the dataset.
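      The three scores can be sketched as follows for arrays of months by grid points; the real ATCC removes a 12-calendar-month climatology computed over 1981−2010, for which a single time mean stands in here:

```python
import numpy as np

def evaluate(pred, obs):
    """Sketch of the three scores for (T, G) arrays of T months x G grid
    points. RMSE and CC are computed per month across space; ATCC per grid
    point across time on anomalies (simplified climatology)."""
    rmse = np.sqrt(((pred - obs) ** 2).mean(axis=1))              # per month
    cc = np.array([np.corrcoef(pred[t], obs[t])[0, 1]
                   for t in range(obs.shape[0])])                 # per month
    clim = obs.mean(axis=0)          # placeholder for the monthly climatology
    pa, oa = pred - clim, obs - clim
    atcc = np.array([np.corrcoef(pa[:, g], oa[:, g])[0, 1]
                     for g in range(obs.shape[1])])               # per grid point
    return rmse, cc, atcc
```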

      Each experiment in our implementation takes about two and a half hours to run, including the training and evaluation of the CNN and linear models; without the multiple-run strategy, about 15 minutes would be needed. The program runs on one GPU (NVIDIA RTX 2080Ti 11G) and one CPU (i9-7920X, 2.90 GHz).

    • We expect that models of better prediction than the reference models exist if candidate-predictor redundancy exists and the defined contribution metric is sufficiently indicative. To verify this expectation, we applied the constructed CNN and linear models to the test set and compared the outputs with the observations. Note that all model outputs must be de-standardized by multiplying them by the standard deviations and adding back the means. Due to the use of cross-validation, the test set consists of 441 months, and there are 157 target grid points in the region; therefore, for each model to be evaluated, a total of 441 RMSE and CC scores and 157 ATCC scores are obtained.

      Figure 3 shows the mean evaluation scores of the CNN models during the whole predictor selection procedure, along with the scores of linear models constructed using the same selected predictors. Specifically, RMSE, CC, and ATCC are plotted in Figs. 3a, 3b, and 3c, respectively. For the CNN models, the accuracy remains stable at first as the predictors are removed one by one, and models of even better prediction exist. After seven predictors are left, further elimination leads to a significant degradation of the model performance (see Figs. 3a and 3c). Although the CC score starts to drop only after four predictors are left (see Fig. 3b), seven is the smallest acceptable number of input predictors under all three evaluation metrics. When 9 predictors remain, both RMSE and CC are at their best, and ATCC is also close to its optimum (reached with 11 predictors). In the following, we refer to the CNN models with nine and seven input predictors as BEST and LEAST, respectively. For the linear models, the effect of predictor removal is greater, especially according to RMSE and CC. Moreover, once 10 predictors are left, continued removal leads to a significant decrease in ATCC. Therefore, we assert that in this experiment, 10 predictors is the best choice for the linear models. The results are in line with our expectations: for both the CNN model and the linear method, better prediction models with fewer input predictors exist compared to the reference models.

      Figure 3.  The mean scores of CNN (convolutional neural network) and LR (linear regression) models throughout the predictor elimination procedures. The x-axis is the number of predictors. For each type of score, the CNNs of better predictions are highlighted with different markers, among which the BEST and LEAST are specialized.

      Comparing the results of the CNN and linear models, we find that the CNN model always performs better than the linear method, which shows the advantage of the CNN. The large difference in the effect of predictor elimination indicates that the CNN model is more robust to data redundancy, while the linear model may be more prone to overfitting when the input dimension is too high. Furthermore, the minimum number of input predictors for the CNN is seven, three fewer than for the linear method. It can be inferred from this fact that the CNN model is more capable of nonlinear feature extraction, which is consistent with the highly nonlinear characteristics of CNNs. In particular, redundancy exists mainly because the input predictors are not completely independent of each other. Therefore, as some predictors are removed, their contribution to the model can be replaced by others. As shown in Fig. 3a, the RMSE is stable at first as predictors are removed and increases significantly and rapidly once fewer than seven predictors remain: the remaining predictors become more independent of each other and thus cannot compensate for the loss in accuracy caused by removing any of them. Not quite as expected, the experimental results show that as the predictors are eliminated, the model accuracy does not first increase and then decrease in a strictly monotonic fashion. This is mainly because the evaluation results are influenced by the generalization ability of the models; after all, the models are built on the training and validation sets, while the evaluation is performed on the test set.

      In addition to showing the mean values of each evaluation metric in Fig. 3, we analyzed the distribution of each metric. Figure 4 shows the box plots of the RMSE (Fig. 4a), CC (Fig. 4b), and ATCC (Fig. 4c) scores of the reference, BEST, and LEAST CNN models. It can be seen that the distributions of each score for the three models are very close. Specifically, the BEST model outperforms the other two models in terms of the mean and the individual score quartiles. Note that the BEST model uses only nine predictors, one fewer than half of the reference model's twenty. The LEAST model was chosen so that its accuracy is closest to that of the reference model; it uses only 7 input predictors, 13 fewer than the reference model.

      Figure 4.  Box plots of evaluation scores of reference, BEST, and LEAST CNN models. A six-number summary of the scores is displayed. Box and whiskers cover the 25−75th and 5−95th percentile ranges, respectively. Median and mean are plotted with an orange line and a green triangle, respectively. The mean value is shown on the top of the boxes.

      Additionally, the geographical distributions of the ATCC score of the reference model and the scoring biases of the BEST and LEAST models relative to it are presented in Fig. 5. The deviation distributions show that, firstly, the variation of the scores of the BEST and LEAST models with respect to the reference model is tiny (specifically, between −0.04 and 0.04). Secondly, the regions where the scores increase account for the vast majority of the entire region, especially for the BEST model. These findings suggest that the greedy predictor elimination algorithm improves the accuracy of the model not only in the average statistical sense but also in the overall distributions. Although decreased ATCC occurs in small areas after predictor removal, the magnitude is within an acceptable range and is mainly caused by model randomness. Accordingly, the rationale for improving model performance through predictor removal is that eliminating highly correlated or redundant variables, which are considered detrimental to the CNN, provides the improvement. Exceptionally, despite the reduction being slight, the ATCC in Hainan (in the south of the region) becomes worse under both the BEST and LEAST models. This implies that some removed variables are not important enough for downscaling precipitation over all of South China but may have more impact on some local regions. Therefore, we believe that the predictor selection algorithm proposed in this paper should be used for relatively small regional downscaling tasks, since the factors affecting precipitation differ from region to region (Jaagus et al., 2010; Jonah et al., 2021).

      Figure 5.  The geographic distributions of ATCC of the reference CNN model (1st column) and ATCC bias between the BEST (LEAST) and reference CNN models [2nd (3rd) column].

      Although the improvement in model performance through predictor selection is not large, it can improve our understanding of the data. The above experiments showed that about half or fewer of the candidate variables are the most relevant to the regional monthly precipitation in the studied area. In addition, CNN models with fewer variables have fewer model parameters and lower FLOPs, and thus better computational efficiency. Specifically, Table 2 presents the changes in the average scores, the number of model parameters, and the FLOPs of the models constructed with 9 and 7 predictors relative to the reference model, which inputs 20 predictors. Compared with the reference model, although the RMSE is reduced by only 0.8%, only 9 out of 20 predictors are used to build the CNN, and the FLOPs decrease by 20.4%, with better model performance. When using seven predictors, the number of CNN parameters is reduced by 6.0%, and the FLOPs decrease by 24.1%, without accuracy loss.

      Predictors RMSEs CCs ATCCs Parameters FLOPs
      20 1.793 0.633 0.577 98,102 1,163,677
      9 1.779(−0.8%) 0.641(+1.3%) 0.592(+1.7%) 93,152(−5.0%) 926,077(−20.4%)
      7 1.790(−0.2%) 0.637(+0.6%) 0.583(+0.7%) 92,257(−6.0%) 882,877(−24.1%)

      Table 2.  Comparisons between the models constructed using nine (2nd row) and seven (3rd row) predictors to the reference model (1st row). The data in parentheses are differential percentages of the corresponding model compared to the reference one.
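      Under the assumption of three 3×3 same-padding convolutions (depths 50, 25, and 10) followed by the fully connected layer, the parameter counts in Table 2 for 20 and 9 predictors can be reproduced; note that only the first convolution depends on the number of input predictors:

```python
def cnn10_params(n_predictors, n_targets=157, grid=6 * 8):
    """Trainable parameters (weights + biases) of the assumed CNN10 layout."""
    conv1 = n_predictors * 50 * 3 * 3 + 50  # only layer affected by predictor count
    conv2 = 50 * 25 * 3 * 3 + 25
    conv3 = 25 * 10 * 3 * 3 + 10
    fc = 10 * grid * n_targets + n_targets
    return conv1 + conv2 + conv3 + fc
```

      This also explains why the parameter savings are modest: most parameters sit in the fully connected layer, which is unaffected by the elimination.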

    • Next, we performed a reverse predictor selection experiment to demonstrate that the performance variation of the model is not solely determined by the number of predictors. That is, in line 12 of Algorithm 2, the predictor to be eliminated is set to the one with the maximal rather than minimal contribution, and the rest of the procedure remains unchanged. Figure 6 shows the results of the reverse experiment in the same format as Fig. 3. As can be seen from the figure, the performance of both the CNN and linear models decreases significantly as the important predictors indicated by the metrics are removed. This suggests that the defined predictor importance metric is indicative. Since the predictors are not independent of each other, the effect of reverse removal on the model is relatively small at first and then becomes larger, but in general, removing significant predictors is detrimental to the model accuracy. Consequently, we can infer that the contribution of the removed predictor at each step is too significant to be entirely replaced by the remaining predictors. Contrary to the results of the normal greedy predictor elimination experiment, the model performance of the CNN in this reverse experiment tends to vary more significantly than that of the linear models. This may be because the generalization performance of the linear method improves as the input dimension decreases, leading to better accuracy; however, the accuracy of the linear approach also decreases with the removal of significant predictors, and the two effects cancel each other out, resulting in a less pronounced change for the linear model than for the CNN.

      Figure 6.  Same as Fig. 3, but for the reverse predictor elimination procedure.

    • Since deep neural networks can learn highly nonlinear and complex relationships between inputs and outputs, we believe that traditional variable selection methods, such as correlation analysis, are not adequate for deep neural network models. To prove this point, we conducted another greedy predictor elimination experiment for comparison. Specifically, the Pearson correlation coefficients of the individual predictors are calculated grid-wise with respect to the total regional precipitation. The importance metric of each predictor is then set to the average absolute value of the correlation coefficients associated with the predictor over the $ 6 \times 8 $ grids. The time range is set to 1981−2010, for a total of 360 months. The predictors and total regional precipitation are standardized month by month to remove the seasonal variation. All candidate predictors are sorted by the associated importance metrics in ascending order, which is also the order of their elimination. We used this new predictor elimination sequence and applied the same cross-validation and multiple-run strategies to train and evaluate the CNN models. The elimination order will be presented later.
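      The correlation-analysis importance metric described above can be sketched as follows (a simplified version that omits the month-by-month standardization):

```python
import numpy as np

def correlation_importance(X, total_precip):
    """Filter-style importance: mean |Pearson r| between each predictor's
    series at the 6 x 8 grid cells and the total regional precipitation.
    X: (T, C, P, Q) monthly predictors; total_precip: (T,)."""
    T, C, P, Q = X.shape
    imp = np.empty(C)
    for c in range(C):
        r = [np.corrcoef(X[:, c, p, q], total_precip)[0, 1]
             for p in range(P) for q in range(Q)]
        imp[c] = np.mean(np.abs(r))
    return imp  # predictors are eliminated in ascending order of imp
```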

      Score results are shown in Fig. 7. Note that the reference model is exactly the same as the ones in Figs. 3 and 6, so no new training of the reference model is needed. It can be observed that our method is advantageous in several aspects. First and foremost, the BEST (or LEAST) CNN models have five (four), four (three), and four (one) fewer input predictors according to RMSE, CC, and ATCC, respectively. Second, the linear models with the fewest input predictors that still outperform the linear reference model have three, three, and four fewer input predictors than those under the correlation-analysis-based method. Third, more CNN models along our elimination path outperform the reference model. Specifically, under RMSE (CC, or ATCC), our method finds 9 (16, or 12) CNN models with better predictions, while the correlation-analysis-based approach discovers only 6 (13, or 12). This comparison shows that, compared to the gradient-based importance metric defined in this paper, the correlation coefficient does not adequately represent the contributions of the input variables in a CNN model. After all, CNNs are considered complex black-box models, and the advantage of our metric is that it uses the backpropagation of the CNN model itself.
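
      How the BEST and LEAST models are read off an elimination path can be made concrete with a small helper (our illustrative code, with made-up scores; not the paper's evaluation script):

```python
import numpy as np

def best_and_least(scores, n_predictors, ref, lower_is_better=False):
    """scores[i] is the test score of the model built with
    n_predictors[i] inputs. BEST = model with the best score;
    LEAST = model with the fewest predictors that still beats
    the reference score `ref`. Returns their predictor counts."""
    s = np.asarray(scores, float)
    n = np.asarray(n_predictors)
    sign = -1.0 if lower_is_better else 1.0   # RMSE: lower is better
    best = int(np.argmax(sign * s))
    beating = np.where(sign * s > sign * ref)[0]
    least = int(beating[np.argmin(n[beating])]) if beating.size else None
    return n[best], (n[least] if least is not None else None)
```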

      Figure 7.  Same as Fig. 3, except for the predictor elimination procedure based on correlation analysis.

    • The above experiments have demonstrated that the defined gradient-based predictor importance metric is representative in measuring predictor contributions in a CNN model and instructive for predictor selection. Figure 8 (right) shows the square root of the contribution metric for each predictor in the different CNN models. The square root is taken for better visual contrast. The vertical coordinates from top to bottom in the figure follow the order of predictor elimination under our method. Note that the magnitude of each predictor's metric within the same model measures the importance of that predictor. Although the magnitudes of a given predictor's metric are not directly comparable across models, it is evident that as some predictors are eliminated, the remaining predictors provide increasingly important contributions to the model. Additionally, the left subplot of Fig. 8 displays a bar plot of the importance metrics (averaged correlation coefficients) of candidate predictors computed in the correlation analysis experiment. Their removal indices are labeled to the right of each bar (from 1 to 20). Comparing the two elimination sequences, it is hard to find a meaningful relationship between them, except that predictors of the same variable are removed in roughly the same order under both approaches. For example, shum500 is the last specific humidity predictor and vwnd850 is the last meridional wind component predictor to be eliminated under both approaches.

      Figure 8.  (Left): Bar plot of the importance metrics (averaged correlation coefficients) of predictors calculated using the correlation analysis method. The predictors of the same variable are rendered in the same color. The indices of predictors in the elimination sequence under the correlation-analysis-based method are labeled to the right of the bars. (Right): The square root of the contribution metrics of all predictors in the CNN models with different numbers of input predictors (x-axis) throughout the selection procedure.

      Specifically, the results under both predictor selection schemes suggest that the humidity and wind components are the variables most critical to precipitation. This is to some extent consistent with the investigation of Ramseyer and Mote (2016), which pointed out that lower-tropospheric humidity and wind are among the most important of 37 predictors for precipitation in a neural network model. A similar conclusion was drawn in Hu and Zhao (2016), which investigated the primary influence of moisture transport, driven by wind and humidity, on precipitation over South China. However, some aspects of the predictor removal process are difficult to fully explain, such as which predictors take over the contribution of the removed ones in the new model, and why the relative magnitudes of the contributions of some variables change across models. These questions are left for future work.

      Next, we select one grid of interest from the 157 grids in the downscaling region and compute the correlation coefficients between the precipitation of that grid and the gridded predictors, as well as the gradients of the grid with respect to the input predictors (the $ \omega_d $ values in Eq. (2) averaged over the validation set), calculated in the reference model using the guided-backpropagation technique. Note that both the correlation coefficients and the gradients are tensors of shape $ 6 \times 8 \times 20 $, where 20 is the number of candidate predictors. To obtain comparable visualizations, we scaled both tensors to lie between −1 and 1 by dividing the gradients and correlation coefficients by their respective absolute maximum values. The heatmaps of the scaled correlation coefficients and gradients of three predictors are shown in Fig. 9. The selected predictors are from three different circulation variables, all of which are among the input predictors of the LEAST model.
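
      The scaling step is simply a division by the tensor's absolute maximum, which maps both the correlation and gradient tensors into [−1, 1] while preserving signs (the function name is ours):

```python
import numpy as np

def scale_to_unit(t):
    """Scale a tensor into [-1, 1] by its absolute maximum, as done
    for the (6, 8, 20) correlation and gradient tensors before
    plotting. Signs and relative magnitudes are preserved."""
    m = np.abs(t).max()
    return t / m if m > 0 else t
```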

      Figure 9.  Heatmaps of three predictors' scaled correlation coefficients (left) and gradients (right). The selected grid of interest is highlighted with cyan dots. Red and black dots are grids of predictors and predictand, respectively.

      It can be seen that, overall, the two importance measures show similar spatial distribution patterns. This shows that although we use the backpropagation algorithm of a black-box model, the results are not incomprehensible. Specifically, the importance of shum500 (Fig. 9a) on the grid of interest decreases toward the southwest and northeast; uwnd500 and vwnd1000 (Figs. 9b and 9c) both show pronounced north−south and east−west differences. Of course, there are also differences between the correlation coefficients and gradients, such as the relative data magnitudes and the locations of the boundaries between positive and negative values. Moreover, in terms of importance, vwnd1000 has the largest correlation coefficients (darkest red shade) among the three predictors, while shum500 has the most significant gradient values. The similarity of the distribution patterns between the correlation coefficients and the gradients supports the definition of the gradient-based variable importance measure. The differences between them arise mainly because the correlation coefficients are linear and are calculated independently, without considering the interactions between predictors. In contrast, the gradients are calculated by the backpropagation algorithm in a CNN model built with all predictors, which captures highly nonlinear relationships and exploits the interactions between variables.
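
      The visual similarity between the two heatmaps can be quantified with a centered pattern correlation between the two spatial maps (an illustrative metric of our own choosing, not one used in the paper):

```python
import numpy as np

def pattern_correlation(a, b):
    """Centered (anomaly) pattern correlation between two spatial
    maps of equal shape: +1 for identical patterns up to a positive
    linear rescaling, -1 for sign-flipped patterns."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```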

    • The above experiments and discussion illustrate that the method proposed in this paper can find a better subset of the candidate variables for the statistical downscaling of monthly precipitation, where the models are built on a year-round dataset, so that the selected variables can be considered the most relevant factors affecting precipitation in a given region throughout the year. However, the physical factors affecting precipitation in the cold and warm seasons are often different (Gutowski et al., 2004). Therefore, we apply the greedy predictor selection algorithm to analyze the main contemporaneous factors affecting precipitation in South China during the cold and warm seasons, respectively. To this end, we divide the augmented dataset into two parts according to months, where April to September constitute the warm season and the remaining six months the cold season; generally, the warm season in southern China has more precipitation and higher temperatures than the cold season. The two sub-datasets are each about half of the original dataset. The greedy predictor selection algorithm was then applied to each of the two sub-datasets.
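
      The seasonal split is a simple boolean partition on the calendar month of each sample (a minimal sketch; array names are ours):

```python
import numpy as np

def split_by_season(fields, months):
    """Split a monthly dataset into warm (Apr-Sep) and cold (Oct-Mar)
    subsets. fields: array with samples on axis 0; months: calendar
    month numbers (1-12), one per sample."""
    months = np.asarray(months)
    warm = (months >= 4) & (months <= 9)
    return fields[warm], fields[~warm]
```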

      Figure 10 compares the ATCC scores of the resulting CNN models in the warm and cold seasons. Figure 10a shows the change in the ATCC scores of the constructed CNN models on the test set as predictors are removed. As the variables are removed, the model accuracy for both the cold and warm seasons follows a trend similar to the year-round results (see Fig. 3): it first gradually increases and then decreases. Note that although the same horizontal coordinate in Fig. 10a means that the same number of predictors is used as model input for both the cold and warm seasons, the combinations of predictors may differ. As seen in the results, the models perform better in the cold season than in the warm season, and the numbers of predictors used in the BEST and LEAST models are higher in the warm season (12 and 7) than in the cold season (9 and 5). This indicates that warm-season precipitation in South China is more difficult to simulate and has more influencing factors, consistent with the findings of past studies for other regions (Davis et al., 2003; Bukovsky and Karoly, 2011). The box plots of ATCC scores for the cold-season and warm-season reference, LEAST, and BEST models in Fig. 10b illustrate that although the cold-season models have higher mean scores, their scores are relatively less stable (with a wider interquartile range). This may be due to the low precipitation in the cold season, where small changes in prediction bias can cause large fluctuations in scores.

      Figure 10.  Comparisons of ATCC scores in warm and cold seasons.

      In particular, the main factors affecting precipitation in the cold and warm seasons are very similar to those for the full year. Taking the LEAST models in Fig. 10 as an example, the cold season has five main factors: shum1000, shum700, uwnd1000, uwnd500, and vwnd700; the warm season has seven: shum1000, shum500, uwnd500, uwnd700, vwnd1000, vwnd700, and vwnd850. The factors most relevant to precipitation in both the cold and warm seasons are thus still wind and humidity variables. Moreover, the sets of main factors for both seasons are subsets of the top 10 year-round main factors (see Fig. 8), but with some differences. For instance, shum700 and uwnd1000, which are main factors in the cold season, are not as important in the warm season. In addition, uwnd850, a top-10 year-round factor, was not selected in the LEAST model for either season, but it was one of the 12 predictors in the BEST model for the warm season. These results show that the wind components (uwnd and vwnd) that mainly affect precipitation need not come from the same pressure level, since uwnd and vwnd (i.e., the east−west and north−south winds) are treated as different predictors in this study. Moreover, the wind field at a particular pressure level may be important, yet only one of the wind directions (e.g., the north−south wind) may mainly drive the north−south water vapor transport that forms precipitation.
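
      The overlap between the two seasonal factor sets can be checked directly with set operations (the two lists are transcribed from the text above; the helper code itself is ours):

```python
# LEAST-model main factors per season, as reported in the text
cold = {"shum1000", "shum700", "uwnd1000", "uwnd500", "vwnd700"}
warm = {"shum1000", "shum500", "uwnd500", "uwnd700",
        "vwnd1000", "vwnd700", "vwnd850"}

shared = cold & warm     # factors important in both seasons
cold_only = cold - warm  # cold-season-specific factors
warm_only = warm - cold  # warm-season-specific factors
```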

      The experiments on the cold and warm seasons demonstrate that the predictor selection algorithm proposed in this paper can be applied to each season separately to identify, from the candidate variables, the main factors affecting precipitation in different seasons. These main factors can also be used to build more accurate statistical downscaling CNN models. Of course, the dataset could be further divided into four seasons (spring, summer, autumn, and winter) or 12 months to explore the principal factors affecting precipitation in each season or month. We have not explored this in the present study because too many divisions would reduce the sample size and affect the generalization ability of the established CNN models.

    5.   Conclusion
    • In the context of the popularity of CNN models in climate science applications and the lack of dedicated CNN predictor selection studies, this paper proposes a greedy predictor selection algorithm for CNN models and applies it to the CNN-based statistical downscaling of monthly precipitation over South China. Experiments demonstrated that the approach can reduce the number of predictors by more than half and reduce the parameters and computational cost of the CNN model without compromising accuracy.

      The analysis and comparison results show that the proposed method can find a subset of predictors nonlinearly correlated with the monthly precipitation and performs better than a linear correlation analysis approach. Specifically, the selection results suggest that the humidity and wind components are the variables most critical to precipitation in South China, which is consistent with conclusions in current research. The proposed method was also used to analyze the main factors affecting precipitation in South China during the cold and warm seasons, further demonstrating that the method can improve the accuracy of downscaling models and distinguish the different main factors at play in the two seasons. Additionally, because it exploits the ability of CNN models to establish complex nonlinear relationships, the proposed method has great potential for use in other CNN-based climate applications or as a tool to select correlated predictors for climate variables of interest.

      It is well known that choosing a subset of $ m $ variables from $ n $ $(m\leqslant n)$ variables to build an optimal model is an NP-hard problem (Hochba, 1997). For example, there are 125 970 possible combinations of 8 out of 20 predictors, not to mention all possibilities of selecting any number of predictors from 20. Therefore, it is practically impossible to build a CNN model for every combination and find the optimal subset. The greedy algorithm, adopted for computational efficiency, may only find a suboptimal subset of predictors and is not an optimal solution to the predictor selection problem of CNN models. Other search algorithms, such as heuristic algorithms (Sivagaminathan and Ramakrishnan, 2007; Tong and Mintram, 2010), are also suboptimal but are thought to be more powerful than greedy methods when computation cost is not a concern. Future studies may therefore obtain better predictor selection results using heuristic algorithms instead of greedy ones. Furthermore, although the result of the guided-backpropagation method has been shown to be more visually understandable to humans in the computer vision field, this study does not explore whether it is the best choice for calculating predictor contributions. Other attribution methods, such as layer-wise relevance propagation, should also be feasible. Apart from the search for better contribution measures, using the proposed method to study the differences in important factors across regions is also an interesting and meaningful topic. In addition, the relationships between precipitation and additional predictors, including the resultant wind, are worth studying in the future.
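
      The size of the search space quoted above follows directly from the binomial coefficient:

```python
from math import comb

# Exhaustive-search space for predictor selection over 20 candidates
n_choose_8 = comb(20, 8)   # ways to pick exactly 8 of 20 predictors
all_subsets = 2 ** 20 - 1  # every non-empty subset of the 20 predictors
```

      Exactly 125 970 models would be needed for the 8-predictor case alone, and over a million to cover every subset size, which is why an exhaustive search over retrained CNNs is infeasible.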

      To facilitate the application and development of related methods, we have made the code used in this study publicly available on GitHub. In addition, we show more experimental results in the code repository, and the code project provides flexible command-line scripts to support further experiments, such as changing the study region and the CNN model structure.

      Acknowledgements. This study is supported by the following grants: National Basic R&D Program of China (2018YFA0606203); Strategic Priority Research Program of Chinese Academy of Sciences (XDA23090102 and XDA20060501); Guangdong Major Project of Basic and Applied Basic Research (2020B0301030004); Special Fund of China Meteorological Administration for Innovation and Development (CXFZ2021J026); and Special Fund for Forecasters of China Meteorological Administration (CMAYBY2020-094).
