-
During model development, the gas absorption/transmittance and radiance datasets are calculated in advance (the blue boxes in Fig. 2), and representative elements are selected for the two PCA/NN simulations. This section details the feature selection and the two PCA/NN procedures. Note that the datasets must be built backward during model development, i.e., from radiances back to gas transmittances, because the spectral information is compressed to a minimum. The representative variables are likewise chosen backward, from among the necessary ones only.
-
For the radiance-domain compression, the monochromatic radiances best suited for predicting the others should be chosen first. Both Liu et al. (2006) and Liu et al. (2020) developed methods to choose representative radiances for PCA-based simulations, and a more straightforward one is developed for our model. As shown in Fig. 4, the selection is mainly based on the correlations among the full monochromatic radiances. If the full set of radiances contains M elements, each with values from many different atmospheric profiles, the M × M correlation matrix of the radiances can be obtained, which gives the linear correlations among the elements. The sum of the absolute values in each row/column then quantifies the correlation of one element with all the other elements, and the sums can be expressed as a vector with M elements. A larger sum indicates that the element has stronger correlations with the other elements and corresponds to a more informative radiance. Thus, the element that gives the largest sum is chosen first, and all other elements that are highly correlated with the selected one are eliminated from the candidate list (because the selected one represents them well). We use a correlation threshold of 0.95 for this step. After the first selection and removal, the correlation matrix of the remaining elements and the vector of absolute-value sums are recalculated, and the selection procedure is repeated until all elements are either selected or eliminated. In addition, a small fraction of elements that give relatively large errors is added to the representative dataset. This procedure is performed for both the transmittances of the different gases and the monochromatic radiances.
Figure 4. Procedure to choose the representative elements based on the correlations among the elements.
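The selection loop described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the model's actual code: the function name `select_representative` and the array layout (profiles in rows, elements in columns) are assumptions, the sketch assumes no element is constant across profiles, and it omits the final step of adding elements with relatively large errors.

```python
import numpy as np

def select_representative(X, threshold=0.95):
    """Greedy correlation-based selection (hypothetical helper).

    X has shape (n_profiles, M): one column per element (e.g. one
    monochromatic radiance). Returns the selected column indices in
    the order they were chosen.
    """
    # Absolute correlation matrix between all M elements.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    remaining = np.arange(X.shape[1])
    selected = []
    while remaining.size > 0:
        # Correlations restricted to the elements still in play.
        sub = corr[np.ix_(remaining, remaining)]
        # Row sums measure how well each element represents the rest.
        sums = sub.sum(axis=1)
        best = int(np.argmax(sums))
        selected.append(int(remaining[best]))
        # Drop the chosen element (self-correlation is 1) and every
        # element correlated with it at or above the threshold.
        remaining = remaining[sub[best] < threshold]
    return selected
```

Because the correlation between two elements does not depend on which other elements remain, taking a sub-matrix of the full correlation matrix is equivalent to recalculating it at every iteration.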
Figure 5 illustrates the corresponding results for the LWB (left panels) and MWB (right panels). The upper panels give the absolute values of the correlation coefficients among the monochromatic radiances. Higher correlations (yellow) are noticed within the absorption regions of the individual gases, e.g., carbon dioxide within 680–1000 cm–1, ozone within 700–1050 cm–1, and water vapor within 1500–2000 cm–1. Based on the correlation distributions and the feature selection described above, we select 440 wavenumbers for the GIIRS LWB (among 18 161, i.e., 2.4%) and 650 for the MWB (among 24 161, i.e., 2.6%); their distributions in the spectral domain are given in the bottom panels. The chosen wavenumbers are, for the most part, evenly distributed within the spectrum, with more elements chosen between 700 and 800 cm–1 and between 2000 and 2200 cm–1 (the strongly absorbing spectral ranges), which makes the selected spectral set more representative.
Figure 5. (top) Correlation coefficients between monochromatic radiances within each spectrum and (bottom) spectral distributions of numbers of selected representative wavenumbers.
Figure 6 shows the summations of the absolute correlation coefficients for each wavenumber, which are the key criterion for the feature selection; note that the features are rearranged in increasing order. The summation is normalized by dividing by the total number of wavenumbers within each spectrum, ensuring a value between 0 and 1. For the LWB, over half of the wavenumbers show summations of ~0.8, indicating redundant information among many monochromatic radiances, while the summation slowly decreases to ~0.2 for the rest. The right panels show the density distributions over the correlation summation domain; clearly, radiances with higher summations are less likely to be chosen. The results for the MWB show slightly different summation variations, i.e., over 90% of the points have summations over 0.4, but similar distribution features are noticed. Overall, approximately 2% of the points are chosen where the summation exceeds ~0.5, and fewer than 10% suffice where the summations are relatively small.
-
Unlike the monochromatic radiances, which are functions of the wavenumber only, the gas transmittances/absorptions vary in both the atmospheric and spectral domains. Since there is clearly overlapping information in both dimensions, they are treated together, i.e., the gas absorption coefficients at each pressure level and each wavenumber are consolidated into a single dataset. The wavenumbers (only those selected for the radiance PCA/NN, i.e., Nwvn = 440 and 650 for the LWB and MWB, respectively) and pressure layers (100 layers between 0.005 and 1010 hPa) are fixed, so only the temperature difference is considered as a variable for dataset development. A similar feature selection is performed for each gas component. Considering that the full gas transmittances at Nwvn wavenumbers and NLayer altitudes (a total of ~50 000) are needed, 1000–1500 representative absorption coefficients (Ntrans) are selected for each absorbing gas in each band, which also constitute ~2% of all transmittances for this step. Note that all calculations are conducted on logarithmic absorption coefficients.
This study considers H2O, O3, and CO2 as absorbing gases, and examples of atmospheric transmittances in the LWB are illustrated in Fig. 7. The results are column transmittances from 1000 profiles and are given for 100 atmospheric layers. Only values at the selected wavenumbers are illustrated because the others are not needed for the radiance simulations. In our model, these transmittances are also compressed and calculated by PCA/NN. The selected points within the spectral and vertical (pressure) domains for the gas transmittance calculations are given in the right panels. Due to the high correlations among the gas absorption coefficients, only a few elements are chosen by our method, and more elements are added uniformly across the pressure and spectral domains to improve accuracy. There are more selections for spectral bands with strong absorption and fewer for spectral bands with transmittance close to 1, i.e., almost no absorption.
Figure 7. An example of column gas transmittances (left panels) for H2O, O3, and CO2 in the GIIRS longwave band and the corresponding locations (right panels) of the representative points selected in the pressure and spectral domains. Each dot in the right panels indicates one position, with the given wavenumber and pressure, where the transmittance is chosen as a representative variable.
Lastly, only the gas transmittances at a highly compressed number of wavenumbers and atmospheric conditions are needed, i.e., those in the right panels of Fig. 7. Absorption coefficients for any new profile are obtained by interpolating pre-calculated gas absorption coefficient look-up tables (LUTs). Because so few transmittances are required, the LUTs are small and easy to implement. Furthermore, each gas component is treated independently, so the interpolation can be performed quite accurately. The LUTs cover a wide temperature range of 160–310 K at an interval of 2 K.
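As a rough illustration of the LUT step, the sketch below interpolates logarithmic absorption coefficients in temperature on the 160–310 K grid with 2 K spacing. The text does not state the interpolation scheme or the table layout, so linear interpolation, the array shape `(n_temperatures, n_points)`, and the names `T_GRID` and `interp_log_absorption` are all assumptions made for this example.

```python
import numpy as np

# LUT temperature nodes: 160, 162, ..., 310 K (76 nodes).
T_GRID = np.arange(160.0, 310.0 + 1e-9, 2.0)

def interp_log_absorption(lut_log_k, temperature):
    """Linear temperature interpolation in a per-gas LUT (hypothetical).

    lut_log_k   : (len(T_GRID), n_points) log absorption coefficients,
                  one column per representative (wavenumber, layer) point.
    temperature : (n_points,) temperature at each representative point.
    Returns the interpolated log absorption coefficients, (n_points,).
    """
    # Clamp to the table range instead of extrapolating.
    t = np.clip(temperature, T_GRID[0], T_GRID[-1])
    # Index of the lower bracketing node for each point.
    idx = np.clip(np.searchsorted(T_GRID, t) - 1, 0, T_GRID.size - 2)
    # Fractional position between the two bracketing nodes.
    w = (t - T_GRID[idx]) / (T_GRID[idx + 1] - T_GRID[idx])
    cols = np.arange(lut_log_k.shape[1])
    return (1.0 - w) * lut_log_k[idx, cols] + w * lut_log_k[idx + 1, cols]
```

With 76 temperature nodes and roughly 1000–1500 representative points per gas, such a table holds on the order of 10^5 values, which is consistent with the statement that the LUTs are small.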
To summarize, Table 1 lists the specific parameters of our model. The first three columns give the numbers required for full-dataset simulations, i.e., for accurate LBL simulations: the number of atmospheric layers (MLayer), the number of full spectral wavenumbers (Mwvn), and their product (the number of full gas transmittances, Mtrans). The remaining columns give the numbers required for our spectrally compressed simulations. From the above selections, we obtain a set of representative wavenumbers whose size Nwvn is much smaller than Mwvn but which carries the most useful information.
| Band | MLayer | Mwvn | Gas Trans Mtrans | Comp-Trans Ntrans (H2O) | Comp-Trans Ntrans (O3) | Comp-Trans Ntrans (CO2) | Comp-Rad Nwvn | Comp-Trans Fraction | Comp-Rad Fraction |
|---|---|---|---|---|---|---|---|---|---|
| LWB | 100 | 18 161 | 1 816 100 | 1000 | 1100 | 1500 | 440 | ~0.1% | 2.4% |
| MWB | 100 | 24 161 | 2 416 100 | 1100 | 1300 | 1200 | 650 | ~0.1% | 2.6% |

Table 1. Specific parameters of the new model for the GIIRS Long-Wavenumber Band and Middle-Wavenumber Band. “Comp” in the table represents the number used for our compressed variables.
For the PCA calculation, the PCs are derived from our training atmospheric profiles and LBL simulations and are stored in advance. This is done during model development and is not repeated during model application. After testing, we selected the first 200 PCs to expand the data for both transmittance and radiance.
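The idea of storing PCs once and reusing them at application time can be sketched as follows. This is a generic PCA-by-SVD illustration under assumed conventions (training samples in rows, mean removal before the decomposition); the helper names `fit_pca`, `to_scores`, and `from_scores` are hypothetical, and how the model maps the representative values to PC scores is not covered here.

```python
import numpy as np

def fit_pca(train, n_pc):
    """Development step: derive and store the leading PCs.

    train : (n_profiles, n_features) LBL training data.
    Returns the feature mean and an (n_pc, n_features) PC matrix.
    """
    mean = train.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_pc]

def to_scores(x, mean, pcs):
    """Project data onto the stored PCs (compression)."""
    return (x - mean) @ pcs.T

def from_scores(scores, mean, pcs):
    """Expand PC scores back to the full feature space."""
    return scores @ pcs + mean
```

In the setting described above, `n_pc` would be 200; only `mean` and `pcs` need to be kept for application, so the SVD cost is paid once during development.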
For the NN-based transmittance model, we set up a two-layer network. The mean squared error and the Rectified Linear Unit (ReLU) are used as the loss and activation functions, respectively, and the number of hidden neurons is set to 2000 (Glorot et al., 2011; Taylor et al., 2016; Le et al., 2020). Here, the activation function introduces non-linearity into the neurons, and the ReLU is computationally efficient and accelerates training convergence (Le et al., 2020). Similarly, the radiance-domain model uses a three-layer ReLU network to account for the more complex relations among the monochromatic radiances, with 1000 and 2000 hidden neurons in its two hidden layers, respectively. The number of iterations for both models is set to 2000, which is large enough to achieve stable results. Notably, many NN configurations with different settings were tested, and the settings given above yielded results closest to the accurate ones.
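To make the ingredients concrete (a ReLU hidden layer, a linear output layer, and a mean-squared-error loss), the following is a minimal NumPy sketch of one such network with plain full-batch gradient descent. It assumes that "two-layer" means one hidden layer plus an output layer; the optimizer, initialization, learning rate, and all function names are assumptions for illustration and are not taken from the model described above, which uses 2000 hidden neurons and 2000 iterations.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Random initial weights (He-style scaling, an assumed choice)."""
    return {
        "W1": rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=np.sqrt(2.0 / n_hidden), size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(p, x):
    """Return the hidden activations and the network output."""
    h = np.maximum(0.0, x @ p["W1"] + p["b1"])  # ReLU activation
    return h, h @ p["W2"] + p["b2"]

def mse(pred, y):
    """Mean squared error over all samples and outputs."""
    return float(np.mean((pred - y) ** 2))

def train_step(p, x, y, lr=1e-2):
    """One gradient-descent update; returns the loss before the update."""
    h, pred = forward(p, x)
    g_pred = 2.0 * (pred - y) / y.size       # d(MSE)/d(pred)
    g_h = (g_pred @ p["W2"].T) * (h > 0)     # backprop through ReLU
    p["W2"] -= lr * (h.T @ g_pred)
    p["b2"] -= lr * g_pred.sum(axis=0)
    p["W1"] -= lr * (x.T @ g_h)
    p["b1"] -= lr * g_h.sum(axis=0)
    return mse(pred, y)
```

A three-layer variant for the radiance domain would add a second hidden layer in the same pattern; in practice a deep-learning framework would normally replace this hand-written update.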
-
Figure 8 compares the column gas transmittances at the selected wavenumbers given by the LBL and by our PCA- and NN-based simulations, which are the outputs of our first PCA/NN step. Again, a testing dataset of 80 EC-Global profiles is used. The top panels show the averaged transmittances; both the PCA and NN results closely agree with the LBL ones. The relative errors (REs) of the PCA-based and NN-based results are illustrated in the middle and bottom panels, respectively. The REs are defined relative to the maximum transmittance within each spectral band, following the definition by Liu et al. (2020). The blue curves represent errors from each of the 80 EC-Global testing profiles, and the red ones indicate their average. The maximum REs for both models and both bands are under 1%, and most are under 0.1%, yielding average REs for each profile within ±0.2%.
Figure 8. (upper) Averaged gas transmittances and their relative errors (RE) given by the (middle) PCA-based and (bottom) NN-based simulations compared with the LBL results at only the representative wavenumbers. The independent 80 EC-Global testing profiles are used for comparison here. The black curves are for the REs of each profile, and the red ones are the averaged REs.
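One possible reading of this error definition is sketched below: the transmittance difference is divided by the maximum LBL transmittance in the band rather than by the local value, which avoids inflated errors where the transmittance is close to zero. Taking the maximum separately for each profile is an assumption of this example, as is the function name; the exact normalization should be checked against Liu et al. (2020).

```python
import numpy as np

def relative_error_percent(t_model, t_lbl):
    """Relative error in percent, normalized by the band maximum.

    Both inputs have shape (n_profiles, n_wavenumbers) and cover one
    spectral band. The reference is the largest LBL transmittance of
    each profile within the band (assumed convention).
    """
    ref = t_lbl.max(axis=1, keepdims=True)
    return 100.0 * (t_model - t_lbl) / ref
```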
Given accurate gas transmittances, rigorous radiative transfer simulations and the radiance-domain PCA/NN are performed to give the monochromatic radiances at a spectral resolution of 0.025 cm–1. Figure 9 compares the radiances within the two bands given by the LBL, PCA, and NN models; the results are again based on the EC-Global profiles. The differences between the LBL-based and PCA/NN-based results are hardly noticeable, with monochromatic radiance differences on the order of 10–5 W m–2 sr–1 cm. Such accuracy is similar to that reported in previous studies (Liu et al., 2006; Le et al., 2020). The PCA-based results are slightly better in the MWB, while the NN-based model is better in the LWB. Note that the biases and mean differences shown here include contributions from the two PCA/NN simulations as well as from the absorption coefficient interpolation.
Figure 9. Similar to Fig. 8, but for the averaged hyperspectral-resolution radiances and the radiance differences and standard deviations of the PCA-based and NN-based results.
After the high-spectral-resolution monochromatic radiances are obtained, it is straightforward to convolve them with the GIIRS spectral response functions (SRFs) to give the channel-based radiances; the resulting channel brightness temperatures (BTs) are shown in Fig. 10. Here, we apply the GIIRS SRFs reported by Di et al. (2018). As expected, the accuracy is significantly improved after the convolution because the errors at individual wavenumbers can cancel or offset one another. Most channel BT differences (BTDs) are within 0.5 K, and the averages are on the order of (or less than) 0.1 K. In the MWB, a few profiles show larger NN errors, which suggests that the training samples should have greater coverage (Le et al., 2020). Relatively larger BTDs are noticed in the strongly absorbing regions, e.g., around 700 cm–1 and 2200 cm–1, which could have multiple causes. The errors from overlapping gas absorption and from the radiance simulations may accumulate within the strongly absorbing spectral regions. Furthermore, the BT variations within these regions are larger and more sensitive to the atmospheric profiles, and a larger atmospheric profile dataset may be needed to further improve the model performance here (Kan et al., 2020). Also, more representative wavenumbers can be added within the regions of larger errors in future development.
Figure 10. Similar to Fig. 9, but for the GIIRS convolved brightness temperatures (BTs) and the corresponding BT differences.
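The convolution and the conversion to brightness temperature can be illustrated with the sketch below. The real GIIRS SRFs (Di et al., 2018) are not reproduced here; a Gaussian with a user-supplied full width at half maximum stands in for them purely as a placeholder, and the function names are hypothetical. Radiances are taken per unit wavenumber, in W m–2 sr–1 (cm–1)–1, on a uniform wavenumber grid.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m s-1
KB = 1.380649e-23    # Boltzmann constant, J K-1

def planck_radiance(wvn_cm, temp_k):
    """Blackbody radiance in W m-2 sr-1 (cm-1)-1 at wavenumber wvn_cm (cm-1)."""
    nu = 100.0 * np.asarray(wvn_cm)  # cm-1 -> m-1
    # Factor 100 converts "per m-1" to "per cm-1".
    return 100.0 * 2.0 * H * C**2 * nu**3 / np.expm1(H * C * nu / (KB * temp_k))

def brightness_temperature(wvn_cm, radiance):
    """Invert the Planck function: radiance -> brightness temperature (K)."""
    nu = 100.0 * np.asarray(wvn_cm)
    return H * C * nu / (KB * np.log1p(100.0 * 2.0 * H * C**2 * nu**3 / radiance))

def convolve_channel(wvn, mono_radiance, center, fwhm):
    """SRF-weighted mean radiance of one channel.

    Uses a Gaussian placeholder SRF centered at `center` (cm-1);
    on a uniform grid the weighted sum approximates the SRF integral.
    """
    srf = np.exp(-4.0 * np.log(2.0) * ((wvn - center) / fwhm) ** 2)
    return np.sum(srf * mono_radiance) / np.sum(srf)
```

Because each channel averages many monochromatic values, random errors at individual wavenumbers partly cancel in the weighted sum, which is the effect noted in the text.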
It becomes clear that the PCA/NN simulations can maintain computational accuracy with a greatly reduced number of independent simulations for both gas transmittances and radiances. Table 2 summarizes the computational efficiency by comparing the multi-domain compression model with the LBL model. Because computational times differ among computer processors, only the orders of magnitude of the times, in seconds, used for each step (64-bit processors with a frequency of 2.4 GHz) are listed. For the transmittance calculations, when both the PCA/NN methods and the interpolation are used, a speedup of three orders of magnitude is obtained. For the radiance simulations, the improvement is almost proportional to the reduction in the number of independent RT simulations. Also, this study considers only clear-sky conditions, so the independent monochromatic simulations are highly efficient. Thus, the overall improvement in computational efficiency is roughly three orders of magnitude.
| | Transmittance | Radiance | Total |
|---|---|---|---|
| LBL-based Model (s) | $10^{2}$ | $10^{0}$ | $10^{2}$ |
| Our Model (s) | $10^{-1}$ | $10^{-2}$ | $10^{-1}$ |
| Speedup | $10^{3}$ | $10^{2}$ | $10^{3}$ |

Table 2. Comparison of computational time between our model and the LBL model. The computational times are scaled to the nearest order of magnitude in units of seconds.