-
The backbone of our model is VDSD (Mao, 2019), a very deep statistical downscaling model that contains a large number of local skip connections. We improve it by adding a global skip connection and an attention mechanism, and call the result the CLDAS Statistical Downscaling Model (CLDASSD). The inputs of CLDASSD are a coarse-resolution temperature field (0.0625° × 0.0625°) and fine-resolution DEM data (0.01° × 0.01°). After training, the model outputs the temperature field magnified by a scale factor of 6.25 (0.01° × 0.01°). CLDASSD mainly consists of convolutional layers, pooling layers, rectified linear unit (ReLU) layers, and sigmoid layers. The overall design is shown in Fig. 4.
-
CLDASSD follows the pre-upsampling structure of SRCNN (Dong et al., 2016), which places the upsampling layer at the head of the model. Upsampling uses bicubic interpolation (Keys, 1981), which requires no manually set parameters for models with different scale factors and thus improves model reusability. Furthermore, this approach avoids the adverse effects of some learnable upsampling methods, such as the checkerboard artifacts of transposed convolution (Odena et al., 2016).
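As a minimal sketch of the parameter-free pre-upsampling step, cubic-spline interpolation via `scipy.ndimage.zoom` can stand in for the Keys (1981) bicubic kernel (the 16 × 16 toy grid size is an illustrative assumption, not the real domain):

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_bicubic(field, scale=6.25):
    """Parameter-free cubic upsampling of a 2-D field.

    order=3 gives cubic-spline interpolation, a close stand-in for the
    Keys (1981) bicubic kernel. No weights are learned, so the same
    code serves any scale factor without retuning.
    """
    return zoom(field, zoom=scale, order=3)

coarse = np.random.rand(16, 16)   # toy 0.0625-degree grid
fine = upsample_bicubic(coarse)   # 6.25x magnification: 16 -> 100 per axis
print(fine.shape)                 # (100, 100)
```

Because no parameters are learned, the same call covers the 0.0625° → 0.01° factor of 6.25 used in this paper or any other scale factor.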
-
Suppose the model is $\mathcal{F}\left(\cdot\right)$, the input low-resolution temperature field (after upsampling) is $T_{\mathrm{coarse}}$, the high-resolution DEM is $H$, and the output high-resolution temperature field is $T_{\mathrm{fine}}$; then,
$$T_{\mathrm{fine}} = \mathcal{F}\left(T_{\mathrm{coarse}}, H\right) = x + T_{\mathrm{coarse}}.$$
Here, $x = T_{\mathrm{fine}} - T_{\mathrm{coarse}}$ is the residual between the high- and low-resolution fields, which approximately follows a Gaussian distribution. The model thus fits a sparse, near-Gaussian residual, which is much easier than directly learning the mapping from the low-resolution field to the high-resolution field (Drozdzal et al., 2016). This connection is widely used in single-image super-resolution (Kim et al., 2016; Tai et al., 2017a, b; Ahn et al., 2018). Therefore, in CLDASSD we directly add the low-resolution field, taken after the upsampling layer, to the end of the model (see Fig. 4a), which avoids learning a full field-to-field mapping and dramatically reduces the difficulty of model learning.
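The global skip connection can be illustrated with a minimal Keras sketch. The filter count, block depth, and 100 × 100 grid size are placeholder assumptions for illustration, not the authors' actual CLDASSD configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_global_skip_model(h=100, w=100):
    """Toy pre-upsampled super-resolution net with a global skip.

    Inputs are assumed already bicubic-upsampled to the fine grid,
    plus a fine-resolution DEM channel. The convolutional stack only
    has to fit the sparse residual x; the upsampled temperature field
    is added back at the very end (the global skip connection).
    """
    t_coarse = layers.Input((h, w, 1), name="t_coarse_upsampled")
    dem = layers.Input((h, w, 1), name="dem")
    x = layers.Concatenate()([t_coarse, dem])
    for _ in range(3):  # stand-in for the stack of residual blocks
        x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    residual = layers.Conv2D(1, 3, padding="same")(x)  # predicted x
    t_fine = layers.Add(name="global_skip")([residual, t_coarse])
    return tf.keras.Model([t_coarse, dem], t_fine)

model = build_global_skip_model()
out = model([tf.zeros((1, 100, 100, 1)), tf.zeros((1, 100, 100, 1))])
print(tuple(out.shape))  # (1, 100, 100, 1)
```

With all-zero inputs the predicted residual is zero (zero bias, zero activations), so the output equals the upsampled input, which is exactly the identity behavior the global skip connection guarantees at initialization.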
-
Our motivation for using the attention mechanism is to enable the model to extract critical information effectively and suppress useless information during training. The design of CLDASSD is inspired by the channel-wise and spatial feature modulation network (CSFM; Hu et al., 2020), so we add an attention unit at the end of the standard residual structure.
The details of the attention unit are shown in Fig. 4b. We call the residual structure containing the attention unit ResAttentionBlock. The attention unit of each ResAttentionBlock contains two branches: a channel attention branch and a spatial attention branch. The spatial attention branch performs pixel-level processing of the feature maps: a two-dimensional weight map is obtained that suppresses the low-contribution areas on each feature map. The channel attention branch performs channel-level processing: a one-dimensional weight vector is obtained, and feature maps with low contributions are directly assigned lower weights. ResAttentionBlock then fuses the feature maps of the two branches after attention weighting.
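The two-branch attention unit can be sketched schematically in Keras. The layer sizes and the single-dense channel branch are illustrative assumptions; the paper's exact ResAttentionBlock configuration is in Fig. 4b:

```python
import tensorflow as tf
from tensorflow.keras import layers

def res_attention_block(x, filters=32):
    """Schematic ResAttentionBlock: a residual conv pair followed by
    a two-branch attention unit, then a local skip connection."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)

    # Channel attention branch: a 1-D weight vector (one weight per
    # feature map) that down-weights low-contribution channels.
    ca = layers.GlobalAveragePooling2D()(y)
    ca = layers.Dense(filters, activation="sigmoid")(ca)
    ca = layers.Reshape((1, 1, filters))(ca)
    channel_out = layers.Multiply()([y, ca])

    # Spatial attention branch: a 2-D weight map (one weight per
    # pixel) that suppresses low-contribution areas on each map.
    sa = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(y)
    spatial_out = layers.Multiply()([y, sa])

    # Fuse the two attention-weighted branches, then close the
    # local (residual) skip connection.
    fused = layers.Add()([channel_out, spatial_out])
    return layers.Add()([x, fused])

inp = layers.Input((100, 100, 32))
model = tf.keras.Model(inp, res_attention_block(inp))
print(model.output_shape)  # (None, 100, 100, 32)
```

The sigmoid activations keep both weight tensors in (0, 1), so each branch can only attenuate features, never amplify them, which matches the "suppress useless information" role described above.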
-
The formula for the vanilla L1 loss and its derivative is
$$\mathcal{L}_{\mathrm{L}1}=\frac{1}{n}\sum_{i=1}^{n}\left|O_{i}-G_{i}\right|, \qquad \frac{\partial \mathcal{L}_{\mathrm{L}1}}{\partial O_{i}}=\frac{1}{n}\,\mathrm{sign}\left(O_{i}-G_{i}\right).$$
$\mathcal{L}_{\mathrm{L}1}$ is not differentiable at 0. However, when the model fits the residual mentioned in section 3.2.1, the residual's spatial distribution contains many zero values, making the model's output unstable. Therefore, we use the Charbonnier loss (Lai et al., 2017), an improved form of the vanilla L1 loss, as the model's loss function:
$$\mathcal{L}_{\mathrm{Char}}=\frac{1}{n}\sum_{i=1}^{n}\sqrt{\left(O_{i}-G_{i}\right)^{2}+\epsilon^{2}},$$
where $O_{i}$ denotes grid point $i$ of the model output, $G_{i}$ denotes grid point $i$ of the HRCLDAS data, and $n$ denotes the number of grid points. $\epsilon$ is a small constant, generally set to $10^{-3}$. This improvement makes the loss function differentiable everywhere, and model training is more stable than with the vanilla L1 loss.
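The Charbonnier loss and its behavior at zero residual can be sketched in NumPy (the toy arrays are illustrative only):

```python
import numpy as np

def charbonnier_loss(output, target, eps=1e-3):
    """Charbonnier loss (Lai et al., 2017): a smooth variant of L1
    that is differentiable everywhere, including at zero residual."""
    return np.mean(np.sqrt((output - target) ** 2 + eps ** 2))

def l1_loss(output, target):
    """Vanilla L1 loss; its derivative is undefined at zero residual."""
    return np.mean(np.abs(output - target))

o = np.array([1.0, 2.0, 3.0])
g = np.array([1.0, 2.0, 3.0])   # zero residual everywhere
print(l1_loss(o, g))            # 0.0
print(charbonnier_loss(o, g))   # ~0.001 (= eps)
```

At zero residual the Charbonnier loss evaluates to $\epsilon$ rather than 0, and its gradient, $(O_i-G_i)/\sqrt{(O_i-G_i)^2+\epsilon^2}$, is smooth through the origin, which is the stability property the text relies on.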
CLDASSD is built on TensorFlow 1.4, and the entire model is trained on one NVIDIA RTX 2080 Ti GPU. All convolutional layers in the model use a 3 × 3 kernel with a stride of 1 and Gaussian initialization; padding is used so that all feature maps in the data stream keep their shape. The notation m× for the number of ResAttentionBlocks means that there are m blocks (see Fig. 4a). In this paper, m is 9 and the training batch size is 64 (these parameters were tuned by repeated adjustment). The Adam optimizer with a learning rate of 0.001 is used to optimize the network parameters.
-
To evaluate the results of CLDASSD in detail and comprehensively, we design a "double true values" evaluation. First, we use the observation data as the "true value" and take bias, root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (COR) as metrics to evaluate the model's reconstruction field. We also care about the spatial distribution of the reconstruction field and the accuracy of its texture details at high resolution. Therefore, we take HRCLDAS as the "true value" and use the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), which are often used in super-resolution, to evaluate the similarity between the reconstruction field and HRCLDAS.
Bias, RMSE, MAE, and COR are used mainly to evaluate the pixel-level error of the reconstruction field. Their formulas are as follows:
$$\mathrm{Bias}=\frac{1}{N}\sum_{i=1}^{N}\left(G_{i}-O_{i}\right),$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(G_{i}-O_{i}\right)^{2}},$$
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|G_{i}-O_{i}\right|,$$
$$\mathrm{COR}=\frac{\sum_{i=1}^{N}\left(G_{i}-\bar{G}\right)\left(O_{i}-\bar{O}\right)}{\sqrt{\sum_{i=1}^{N}\left(G_{i}-\bar{G}\right)^{2}}\sqrt{\sum_{i=1}^{N}\left(O_{i}-\bar{O}\right)^{2}}},$$
where $O_{i}$ denotes the observation at weather station $i$ (i.e., the true value), $G_{i}$ denotes the reconstruction field interpolated to the corresponding station $i$, $\bar{O}$ and $\bar{G}$ are their means, and $N$ denotes the number of stations. For the spatial distribution evaluation of the model's downscaling results, we use PSNR and SSIM (Wang et al., 2004). The formulas are as follows:
$$\mathrm{PSNR}=10\lg\frac{I_{\mathrm{max}}^{2}}{\mathrm{MSE}},$$
$$\mathrm{SSIM}=\frac{\left(2\mu_{G}\mu_{O}+c_{1}\right)\left(2\sigma_{GO}+c_{2}\right)}{\left(\mu_{G}^{2}+\mu_{O}^{2}+c_{1}\right)\left(\sigma_{G}^{2}+\sigma_{O}^{2}+c_{2}\right)},$$
where MSE is the mean squared error between the reconstruction field and the real high-resolution field. $I_{\mathrm{max}}$ refers to the bit depth of the data; for the natural-image uint8 data type, $I_{\mathrm{max}}$ is 255. Therefore, when calculating the PSNR of the temperature field, we first rescale the data to (0, 255) so that $I_{\mathrm{max}}$ can be taken as 255. In Eq. (10), $\mu_{G}$ is the mean of the reconstruction field, $\mu_{O}$ is the mean of the real high-resolution field (i.e., the second "true value," the data from HRCLDAS), $\sigma_{G}$ is the standard deviation of the reconstruction field, $\sigma_{O}$ is the standard deviation of the real high-resolution field, $\sigma_{GO}$ is the covariance between the two fields, and $c_{1}$ and $c_{2}$ are small constants that stabilize the division. To compare with the results of CLDASSD, we use bilinear interpolation as the baseline in the comparison experiments. The formula for bilinear interpolation is as follows:
$$Z\left(I_{1},J\right)=Z\left(I_{1},J_{1}\right)+\frac{J-J_{1}}{J_{2}-J_{1}}\left[Z\left(I_{1},J_{2}\right)-Z\left(I_{1},J_{1}\right)\right],$$
$$Z\left(I_{2},J\right)=Z\left(I_{2},J_{1}\right)+\frac{J-J_{1}}{J_{2}-J_{1}}\left[Z\left(I_{2},J_{2}\right)-Z\left(I_{2},J_{1}\right)\right],$$
$$Z\left(I,J\right)=Z\left(I_{1},J\right)+\frac{I-I_{1}}{I_{2}-I_{1}}\left[Z\left(I_{2},J\right)-Z\left(I_{1},J\right)\right],$$
where $Z\left(I_{1},J_{1}\right)$, $Z\left(I_{1},J_{2}\right)$, $Z\left(I_{2},J_{1}\right)$, and $Z\left(I_{2},J_{2}\right)$ are the variable values on the grid; $Z\left(I_{1},J\right)$ and $Z\left(I_{2},J\right)$ are the results of linear interpolation along the latitudes $I_{1}$ and $I_{2}$, respectively; and $Z\left(I,J\right)$ is the value at the target position after interpolation. To illustrate the contributions of the attention mechanism and the global skip connection proposed in this paper, we conduct a series of ablation experiments (sub-models), as shown in Table 1.
| Model | Structure |
|---|---|
| CLDASSD_w | The entire network is only a simple stack of residual structures. |
| CLDASSD_a | The attention unit we designed is added to the residual structure. |
| CLDASSD_g | A global skip connection is added to CLDASSD_w. |
| CLDASSD | This model has both the global skip connection and the attention mechanism. |

Table 1. Design framework of several ablation experiments (sub-models), used to evaluate the contributions of the global skip connection and the attention mechanism.
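For reference, the evaluation metrics defined in this section can be sketched in NumPy. This is a minimal, single-window implementation; the SSIM stabilizing constants are the conventional Wang et al. (2004) defaults, an assumption rather than the paper's stated values:

```python
import numpy as np

def bias(g, o):
    return np.mean(g - o)

def rmse(g, o):
    return np.sqrt(np.mean((g - o) ** 2))

def mae(g, o):
    return np.mean(np.abs(g - o))

def cor(g, o):
    # Pearson correlation between flattened fields.
    return np.corrcoef(g.ravel(), o.ravel())[0, 1]

def psnr(g, o, i_max=255.0):
    """Fields are assumed already rescaled to (0, 255) so i_max=255."""
    mse = np.mean((g - o) ** 2)
    return 10.0 * np.log10(i_max ** 2 / mse)

def ssim(g, o, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global (single-window) SSIM with conventional constants."""
    mu_g, mu_o = g.mean(), o.mean()
    cov = np.mean((g - mu_g) * (o - mu_o))
    return ((2 * mu_g * mu_o + c1) * (2 * cov + c2)) / (
        (mu_g ** 2 + mu_o ** 2 + c1) * (g.var() + o.var() + c2))

# Toy check: a reconstruction offset from "truth" by exactly 1.
rng = np.random.default_rng(0)
o = rng.uniform(0, 255, (8, 8))
g = o + 1.0
print(round(bias(g, o), 2), round(rmse(g, o), 2), round(psnr(g, o), 2))
# 1.0 1.0 48.13
```

A constant +1 offset gives bias = RMSE = MAE = 1, COR = 1, and PSNR = 10 lg(255²/1) ≈ 48.13 dB, which is a useful sanity check that pixel-level and structural metrics respond differently to systematic error.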
-
In the following results and discussion, evaluation metrics are computed on the test set unless otherwise stated. For PSNR and SSIM, we take the products from HRCLDAS as the "true value"; all other metrics take the station observation data as the "true value."
First, the averages of the evaluation metrics on the test set are given in Table 2. Compared with bilinear interpolation, our model improves every metric, especially SSIM, which increases by approximately 0.2. This clear improvement in the visual metric SSIM preliminarily illustrates the great potential of CLDASSD for estimating the structural features of high-resolution temperature fields.
| Metrics | Bilinear | CLDASSD_w | CLDASSD_a | CLDASSD_g | CLDASSD |
|---|---|---|---|---|---|
| RMSE | 1.37 | 1.34 | 1.34 | 1.31 | **1.30** |
| MAE | 0.97 | 0.95 | 0.95 | 0.94 | **0.93** |
| PSNR | 29.90 | 30.68 | 30.81 | 31.06 | **31.21** |
| SSIM | 0.35 | 0.57 | 0.59 | 0.59 | **0.60** |

Table 2. The average statistics for each metric on the test set (bold indicates the best). A clear improvement in SSIM can be seen.
To explore the model's ability to reconstruct the temperature field over different types of terrain, we divide the DEM map of the research area into an 8 × 8 chessboard, i.e., 64 small, uniform patches; the ID of each patch is shown in Fig. 5. We then classify each patch into one of four terrain types (plain, water body, mountain, and plateau). Table 3 lists the patches in each terrain set, and Table 4 shows the evaluation results for the four terrain types. Compared with bilinear interpolation, CLDASSD provides the most substantial improvement in mountain areas, where the RMSE is reduced by approximately 0.1°C, and a small improvement in plain, water body, and plateau areas. These results show that CLDASSD is particularly effective at reconstructing areas with complex terrain gradients. The remainder of this section evaluates the performance of CLDASSD in terms of daily and seasonal change.
Figure 5. DEM map of the Beijing-Tianjin-Hebei region. We divided it into an 8 × 8 chessboard and assigned an ID to each patch.
| Terrain type | Set |
|---|---|
| Mountain | {5, 12, 13, 18, 19, 20, 21, 22, 25, 26, 27, 32, 33, 34, 40, 41, 48, 49, 56, 57, 60, 61} |
| Plain | {6, 7, 14, 15, 23, 28, 29, 30, 35, 36, 42, 43, 44, 50, 51, 52, 53, 55, 58, 59} |
| Water body | {31, 37, 38, 39, 45, 46, 47, 54, 62, 63} |
| Plateau | {0, 1, 2, 3, 4, 8, 9, 10, 11, 16, 17, 24} |

Table 3. Classification of all the patches in Fig. 5 into four terrain types: mountain, plain, water body, and plateau.
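The chessboard division and per-terrain pooling described above can be sketched as follows. The 80 × 80 toy grid is an illustrative assumption (the real grid size depends on the study domain); the water-body patch IDs are taken from Table 3:

```python
import numpy as np

def split_into_patches(field, n=8):
    """Split a 2-D field into an n x n chessboard of equal patches.

    Returns a dict mapping a row-major patch ID (0 .. n*n-1) to the
    corresponding sub-array, mirroring the IDs in Fig. 5.
    """
    rows, cols = field.shape
    ph, pw = rows // n, cols // n
    return {r * n + c: field[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(n) for c in range(n)}

# Toy fine-resolution grid standing in for the DEM / temperature field.
dem = np.arange(80 * 80, dtype=float).reshape(80, 80)
patches = split_into_patches(dem)

# Per-terrain-type evaluation: pool the patch IDs of one type
# (here the water-body set from Table 3) and score them together.
water_ids = {31, 37, 38, 39, 45, 46, 47, 54, 62, 63}
water = np.concatenate([patches[i].ravel() for i in sorted(water_ids)])
print(len(patches), patches[0].shape, water.shape)  # 64 (10, 10) (1000,)
```

Pooling the patches of one terrain type before computing RMSE or MAE is what allows the per-terrain rows of Table 4 to be reported from a single reconstruction field.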
| Metrics | Type | Bilinear | CLDASSD_w | CLDASSD_a | CLDASSD_g | CLDASSD |
|---|---|---|---|---|---|---|
| RMSE | Plain | 1.01 | 1.01 | 1.01 | 1.00 | **0.99** |
| | Water body | 1.19 | 1.18 | 1.19 | 1.19 | **1.17** |
| | Mountain | 1.61 | 1.55 | 1.55 | 1.51 | **1.50** |
| | Plateau | 1.35 | 1.33 | 1.33 | 1.33 | **1.32** |
| MAE | Plain | 0.73 | 0.73 | 0.73 | 0.73 | **0.72** |
| | Water body | 0.87 | 0.86 | 0.86 | 0.87 | **0.85** |
| | Mountain | 1.16 | 1.13 | 1.13 | **1.10** | 1.11 |
| | Plateau | 1.00 | 0.99 | 0.99 | 0.99 | **0.98** |

Table 4. The reconstruction field at each time, divided into the four terrain types of Table 3 for evaluation (bold indicates the best). Although the RMSE is highest in the mountainous areas, the improvement there over bilinear interpolation is the largest.
-
We evaluate the model at different times of day (UTC), and Fig. 6 shows the results. Figure 6a shows that the RMSE at 0600 UTC is the lowest, reduced by 0.13°C compared with bilinear interpolation. Notably, 0600 UTC is local noon, when the spatial distribution of temperature correlates most strongly with terrain elevation; as an auxiliary input, the terrain elevation data can therefore provide the model with the most detailed information at that time.
Figure 6. Evaluation results of the different metrics for the different models at each daily time (all times are coordinated universal time, UTC). (a), (b), (c), and (d) show RMSE, MAE, PSNR, and SSIM, respectively. CLDASSD performs best at 0600 UTC, i.e., local noon.
To evaluate the ability of CLDASSD to capture the spatial features of the daily change of the temperature field, we take four times on 15 April 2019 as an example. Specific spatial details of the reconstruction are shown in Fig. 7. The spatial distribution of our model outputs (CLDASSD and all sub-models) is more similar to the ground truth than that of bilinear interpolation. Specifically, our models have obvious advantages in fine-scale reconstruction, whereas the output of bilinear interpolation lacks detail because of its averaging scheme. Our models also outperform bilinear interpolation in RMSE and SSIM; overall, CLDASSD is superior.
Figure 7. Analysis from 15 April 2019 used as an example to show the daily change of the spatial distribution of the temperature field in a fixed mountainous area (the area in the black box). (a), (b), (c), and (d) denote 0000, 0600, 1200, and 1800 UTC, respectively.
In summary, in terms of both the visualized spatial distribution and the evaluation metrics, the reconstruction field of CLDASSD at each daily time is close to the "double true values," which shows that CLDASSD is robust to daily change.
-
In this section, our models are re-evaluated by season; the evaluation results are shown in Fig. 8. Relative to bilinear interpolation, CLDASSD improves the RMSE by a similar amount in each season, approximately 0.07°C on average. Among the four seasons, the lowest RMSE is observed in summer, likely because the plains area is greatly affected by the summer monsoon while the plateau and mountainous areas are affected much less; the temperature differences are therefore most strongly controlled by terrain, and CLDASSD can make full use of the terrain data to reconstruct fine-scale details that are not observable in the coarse-scale temperature field. In winter (DJF), however, the model fares much worse than in summer. The main reason is that the latent and sensible heat fluxes in winter are weaker than in summer, so the spatial distribution of temperature is less affected by topography; because the only auxiliary input of our model is the DEM, the winter results are poorer. As noted in section 4.3, we will consider adding more factors that affect the spatial distribution of temperature in future work.
Figure 8. Evaluation results of the different metrics for the different models in each season. (a), (b), (c), and (d) show RMSE, MAE, PSNR, and SSIM, respectively. CLDASSD performs best in summer.
We select a representative day for each season (the first day of the representative month) at 0600 UTC and compare the outputs of CLDASSD and bilinear interpolation, as shown in Fig. 9. In mountain and plateau areas, CLDASSD estimates several subtle textures more accurately than bilinear interpolation. However, CLDASSD cannot reproduce the small disturbances in the plains area that appear in the HRCLDAS products. Ground truth, bilinear interpolation, and CLDASSD are consistent over the water body, reflecting an insufficient improvement in the representation of water bodies.
Figure 9. Analysis of the first day of each season's representative month at 0600 UTC, comparing the seasonal change in the reconstruction fields of bilinear interpolation and CLDASSD. The leftmost column is the HRCLDAS product, the middle column is the output of bilinear interpolation, and the rightmost column is the output of CLDASSD. (a), (b), (c), and (d) denote the first day of January, April, July, and October, respectively.
Reconstructing the subtle disturbances in the plains areas is not the primary task of our experiments; such disturbances may arise from various physical processes. Moreover, the spatial gradient of the temperature field over the plains is small, so ordinary interpolation methods can also reconstruct the fine-scale temperature field there with very small error. Temperature reconstruction over complex terrain, by contrast, requires a precise spatial distribution of temperature, and CLDASSD can estimate the fine texture of the temperature field under complex terrain gradients, similar to the HRCLDAS products. In terms of seasonal change, CLDASSD reconstructs detailed textures in complex mountainous areas in every season; this robustness parallels its robustness to daily change.
-
We have discussed the evaluation results according to daily times, seasons, and space. In terms of evaluation metrics and spatial distribution, CLDASSD has a lower RMSE than bilinear interpolation and can present finer-scale details.
To further compare the output quality of CLDASSD and bilinear interpolation on the test set, Fig. 10 shows that the bias of bilinear interpolation is basically between −0.1°C and −0.2°C, and CLDASSD improves on this by approximately 0.1°C. In addition, the bias in summer is stable and close to zero, which echoes the previous analysis. We then examine the frequency distributions of COR and RMSE for the outputs of CLDASSD and bilinear interpolation, as shown in Fig. 11. CLDASSD produces more reconstruction fields with a COR greater than 0.98 and fewer with an RMSE greater than 0.75°C, demonstrating an improvement.
Figure 10. The line chart of bias compares the bias of bilinear interpolation and CLDASSD on the test set. Because a different amount of data is discarded by quality control each month, the axis scale is not uniform.
Figure 11. The frequency of COR and RMSE for bilinear interpolation and CLDASSD. It is clear that the output quality of CLDASSD is better than that of bilinear interpolation.
However, CLDASSD still has shortcomings. As discussed in section 4.2, CLDASSD performs only slightly better than bilinear interpolation in the plains areas, and some subtle disturbances are absent or not well simulated. This shortcoming is due to the small terrain gradient in the plains, where the spatial distribution of temperature is less affected by the terrain. Therefore, in future work, we will select underlying-surface elements that influence the small disturbances in the plains, such as land cover, land use, incident surface solar radiation, and surface albedo. In addition, there is almost no improvement over the water areas. This may be because CLDAS and HRCLDAS directly use the same background field there, precluding better results.
In addition to the above analysis of the model's advantages and disadvantages, we want to re-emphasize the engineering advantages of deep-learning-based downscaling. First, a deep learning model is a powerful feature extractor, which saves us from complicated feature engineering (e.g., accurately selecting sensitive factors related to temperature) during data preparation. Second, many physical parameters must be tuned before running a physical model, whereas a deep learning model only needs a few parameters that are independent of the physics. Finally, the biggest advantage of deep learning in downscaling tasks is the considerable saving in computing power (Reichstein et al., 2019): a few GPUs suffice for our needs, which is especially valuable when the research area is larger and the resolution is higher, because the time complexity of a physical model will increase exponentially.
Finally, the downscaling task involves complex physical processes, and our experimental results still have room for improvement. We also need a deeper understanding of the use of deep learning for downscaling, such as designing models better suited to downscaling tasks.