2. Data and methodology
The flowchart in Fig. 1 illustrates the overall structure of this work and how the pre-processing and prediction steps are carried out. During the pre-processing stage, ML-based models are trained and tested using different features, ML techniques, and parameters, as detailed in this section. An optimal model is then selected and used for predictions. The model evaluations are presented in section 3.
2.1 Data
As a member of the Multifunction Transport Satellite series, the geostationary Himawari-8 satellite was launched on 7 October 2014 and is located above 140.7°E for observation of Earth's surface, atmospheric moisture, clouds, and the environment (Bessho et al., 2016; Shang et al., 2017; Wang et al., 2018; Letu et al., 2020). The AHI is an important instrument onboard Himawari-8 and provides images in 16 spectral bands with central wavelengths ranging from 0.46 to 13.3 μm, a temporal resolution of 10 minutes for full-disk observations, and spatial resolutions ranging from 0.5 to 2.0 km (Min et al., 2017; Wang et al., 2019).
CALIOP onboard CALIPSO is a two-wavelength, polarization-sensitive lidar that provides continuous measurements of the vertical structure of clouds and aerosols (Stephens et al., 2002; Winker et al., 2007). The collocated CALIOP Level 2, 1 km cloud layer product provides a reliable source of cloud mask labels for our model training and validation (Heidinger et al., 2012). After collocation, if there are one or more cloud layers in a CALIOP pixel, it is labeled cloudy; otherwise, it is labeled clear. We find all CALIOP pixels (1 km resolution) within each collocated AHI pixel (5 km spatial resolution) with an observational time difference of less than 10 minutes, and only homogeneous AHI pixels (i.e., those whose collocated CALIOP pixels are all cloudy or all clear) are retained in the training dataset to ensure its quality.
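For concreteness, this labeling rule can be written as a short function. The sketch below assumes hypothetical input arrays (the cloud-layer counts and observation-time offsets of the CALIOP pixels matched to one AHI footprint); it illustrates the rule rather than reproducing the operational collocation code.

```python
import numpy as np

def label_ahi_pixel(caliop_layer_counts, time_diffs_min, max_dt=10.0):
    """Label one 5 km AHI pixel from its collocated 1 km CALIOP pixels.

    caliop_layer_counts : cloud-layer counts of all CALIOP pixels falling
        inside the AHI footprint; time_diffs_min : their observation-time
        offsets from the AHI scan (minutes). Returns 1 (cloudy), 0 (clear),
        or None when the pixel is inhomogeneous or no match is close enough.
    """
    close = np.abs(time_diffs_min) < max_dt
    if not np.any(close):
        return None
    cloudy = caliop_layer_counts[close] >= 1  # >=1 layer -> cloudy CALIOP pixel
    if cloudy.all():
        return 1          # homogeneous cloudy AHI pixel
    if (~cloudy).all():
        return 0          # homogeneous clear AHI pixel
    return None           # mixed scene: excluded from the training set
```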
A collocated AHI-CALIOP dataset spanning the entirety of 2017 is built, and over 420 000 AHI pixels are labeled. We randomly separate the dataset into training (70%) and testing (30%) subsets, and the latter is used to tune the models and find the optimal ML parameters. Such a training/testing split is widely used in ML algorithm development to avoid overfitting. The CALIOP VFM product and MODIS cloud mask products are also considered for comparison and evaluation in section 3.
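The random 70%/30% split can be reproduced with scikit-learn; in the sketch below, X, y, and the random seed are placeholders for the actual collocated dataset, not values from this study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and CALIOP-derived labels; in practice these
# come from the collocated AHI-CALIOP dataset described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 13))    # e.g., 13 radiative features per pixel
y = rng.integers(0, 2, size=1000)  # 1 = cloudy, 0 = clear

# 70% / 30% random split; the testing subset is used for parameter tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
```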
2.2 Feature selection
Features, i.e., input parameters for cloud detection, can be chosen by considering the characteristics of different spectral bands with respect to cloudy and clear atmospheres. Analysis of previous cloud mask algorithms (both threshold-based and ML-based) guides our feature selection. Table 1 gives examples of features used by recent cloud mask or cloud classification algorithms. Aside from direct radiative variables such as reflectances (R), brightness temperatures (BTs), and brightness temperature differences (BTDs), auxiliary data such as observational geometries, geolocation information, and surface properties are also used as inputs. Most of our predictors are chosen from those considered by previous cloud detection algorithms, and these predictors have physical support. For example, the window band BT (11.2 μm) generally represents cloud top temperature and is one of the most important and widely used channels for distinguishing cloudy and clear pixels (Strabala et al., 1994). Saunders and Kriebel (1988) use BTD (11–12 μm) to detect cirrus clouds because BTDs over such clouds are greater than those in the absence of clouds. The low cloud test BTD (3.9–11 μm) is based on the differential absorption of water and ice cloud particles between these wavelengths (Platnick et al., 2003). This study uses only radiative variables (and surface characteristics) as input parameters to avoid possible influences due to collocation, noting that only local afternoon observations are available because of the CALIOP overpass time.
Table 1. Comparison of features used by some recent ML-based cloud detection and classification algorithms for spectral radiometers.
References | Feature parameters | Auxiliary parameters | Satellite
Lyapustin et al., 2008 | R (0.64 μm), R (0.47 μm), R (0.55 μm), R (0.86 μm), R (1.24 μm), R (2.11 μm), BT (11.03 μm) | None | MODIS
Chen et al., 2018 | R (0.47 μm), R (0.55 μm), R (0.66 μm), R (0.86 μm), R (1.24 μm), R (2.13 μm) | SZA, VZA, RAZ, surface elevation | MODIS
Zhang et al., 2019 | R (0.64 μm), BT (3.85 μm), BT (7.35 μm), BT (8.6 μm), BT (11.2 μm), BT (12.35 μm), BTD (11.2–3.85 μm), BTD (11.2–7.35 μm), BTD (11.2–8.6 μm), BTD (11.2–12.35 μm) | VZA, Ts, Lat, Lon | AHI
Gomis-Cebolla et al., 2020 | R (0.64 μm), R (0.47 μm), R (0.45 μm), R (0.86 μm), R (2.13 μm), R (1.38 μm) | None | MODIS
Wang et al., 2020 | R (0.86 μm), R (1.24 μm), R (1.38 μm), R (1.64 μm), R (2.25 μm), BT (8.6 μm), BT (11 μm), BT (12 μm) | VZA, Ts, Lat, Lon | VIIRS
*VZA = view zenith angle, SZA = solar zenith angle, RAZ = relative azimuth angle, Ts = surface skin temperature, Lat = latitude, and Lon = longitude.
Cloud detection algorithms with and without solar band observations are developed as the daytime and nighttime versions, respectively. The daytime algorithm includes solar band reflectances, so it is only applicable during local daytime. The nighttime algorithm, which excludes solar-related parameters, can be applied to observations at any time. For a fair comparison during algorithm development and validation, all collocated AHI and CALIOP observations are from local daytime; in this way, the same dataset serves both algorithms, which differ only in whether the solar band reflectances are used. The following brightness temperatures are considered for both algorithms: BT (3.85 μm), BT (7.35 μm), BT (8.6 μm), BT (11.2 μm), BT (12.35 μm), BTD (3.85–11.2 μm), BTD (11.2–7.35 μm), BTD (8.6–11.2 μm), and BTD (11.2–12.35 μm). The solar band reflectances, R (0.64 μm), R (0.86 μm), R (1.61 μm), and R (2.25 μm), are used only in the daytime model.
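A minimal sketch of assembling the two feature vectors is given below; the obs dictionary and its band keys are hypothetical stand-ins for the calibrated AHI observations, and the BTDs follow the sign conventions listed above.

```python
import numpy as np

def build_features(obs, daytime=True):
    """Assemble the feature vector for one pixel from an observation dict."""
    bt = {w: obs[f"BT{w}"] for w in ("3.85", "7.35", "8.6", "11.2", "12.35")}
    feats = [bt["3.85"], bt["7.35"], bt["8.6"], bt["11.2"], bt["12.35"],
             bt["3.85"] - bt["11.2"],   # BTD (3.85-11.2 um)
             bt["11.2"] - bt["7.35"],   # BTD (11.2-7.35 um)
             bt["8.6"] - bt["11.2"],    # BTD (8.6-11.2 um)
             bt["11.2"] - bt["12.35"]]  # BTD (11.2-12.35 um)
    if daytime:  # solar reflectances are appended only for the daytime model
        feats += [obs[f"R{w}"] for w in ("0.64", "0.86", "1.61", "2.25")]
    return np.asarray(feats)
```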
2.3 Surface type treatment
As mentioned above, the surface is a special but important factor influencing cloud detection. Clear desert pixels may be erroneously detected as cloudy in the daytime due to the high albedo and emissivity of desert sand (Ackerman et al., 1998), so most algorithms develop independent models for different surface conditions. To better eliminate the negative impact of surface features on cloud detection, three different methods of treating the surface are introduced. Assume there are N surface types (ST), referred to as ST1, ST2, …, STN. The first method (Model #1) develops a separate ML model for each surface type, yielding N independent models, each of which handles only observations from its particular surface type. Model #2 adds the surface type as a new input feature, with each type specified by an integer from 1 to N. In this way, only one ML-based model is needed for all observations, but the integers may misrepresent the physical differences among surfaces. To avoid such misrepresentation by a single integer, Model #3 is similar to Model #2 but adds N binary parameters, i.e., one binary variable per surface type: if an observation is over the nth surface type, its nth surface variable is set to one and all others to zero. The three models differ only in how the surface types are treated; all radiative features, ML models, and observational datasets are kept the same.
Figure 2 illustrates the structures of the three models. Each quadrangle in the figure represents an ML-based algorithm, and the panels for Model #2 and Model #3 also illustrate how the surface variables are defined. Considering the coverage of AHI, this study uses four surface types, i.e., ocean, forest, land, and desert, taken from the MODIS Collection 6 Land Cover Climate Modeling Grid annual surface type product (MCD12C1) (Loveland and Belward, 1997; Sulla-Menashe and Friedl, 2018). There are few observations over ice or snow surfaces in the covered area, so these are not yet included in our models.
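The three surface treatments differ only in how a pixel's surface type enters the feature vector. The helper below is an illustrative sketch of the Model #2 and Model #3 encodings; the function and variable names are ours, not from the original implementation.

```python
import numpy as np

SURFACE_TYPES = ["ocean", "forest", "land", "desert"]  # N = 4, from MCD12C1

def encode_surface(surface, scheme):
    """Encode a pixel's surface type for Model #2 or Model #3."""
    n = SURFACE_TYPES.index(surface)
    if scheme == "integer":    # Model #2: a single integer from 1 to N
        return np.array([n + 1])
    if scheme == "one-hot":    # Model #3: N binary variables, one per type
        onehot = np.zeros(len(SURFACE_TYPES))
        onehot[n] = 1.0
        return onehot
    raise ValueError(f"unknown scheme: {scheme}")

# Model #1 needs no encoding: one independent classifier is trained per type.
# Example: encode_surface("desert", "one-hot") -> array([0., 0., 0., 1.])
```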
2.4 Machine learning models
Two popular supervised ML methods, ANN and RF (Swami and Jain, 2013), are considered because their performance has been well demonstrated (Chen et al., 2018; Gomis-Cebolla et al., 2020; Wang et al., 2020). We pay more attention to the construction of the algorithms, e.g., preparation of the training dataset, feature selection, and surface treatment, than to the particular ML techniques, because the latter account for smaller differences in the results. Thus, we consider only ANN and RF here, and other techniques may be tested similarly in future studies.
The ANN is a multilayer perceptron consisting of an input layer, a hidden layer, and an output layer, which captures the non-linear relationship between the input and output variables. Since cloud detection here is a simple binary classification problem, we consider only one hidden layer and use the sigmoid function as the activation function. The neuron number is varied between 5 and 100 in steps of 5 to find the optimal value. The RF considers an ensemble of decision trees and uses bagging to train the model (Breiman, 2001). Two important parameters in the RF model are the number of trees (ntrees) and the maximum tree depth (mdepth). We test ntrees from 100 to 500 in steps of 100, and mdepth from 10 to 50. Table 2 lists the tuning parameters for the two ML models.
Table 2. Tuning parameters for the two ML models.
Model | Tuning parameter | Tested values
ANN | Number of neurons in the hidden layer | 5 to 100 in steps of 5
RF | Number of trees (ntrees) | 100 to 500 in steps of 100
RF | Maximum tree depth (mdepth) | 10 to 50
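The parameter scan can be expressed as a grid search. The sketch below uses scikit-learn's GridSearchCV, which tunes via internal cross-validation, whereas this study evaluates candidate parameters on the 30% testing subset; the step of 10 for mdepth is also an assumption, since only the range 10 to 50 is stated.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

# ANN: one hidden layer with a sigmoid (logistic) activation; the neuron
# number is scanned from 5 to 100 in steps of 5.
ann_search = GridSearchCV(
    MLPClassifier(activation="logistic", max_iter=1000),
    {"hidden_layer_sizes": [(n,) for n in range(5, 101, 5)]},
    scoring="accuracy")

# RF: number of trees from 100 to 500 in steps of 100, and maximum depth
# from 10 to 50 (a step of 10 is assumed here).
rf_search = GridSearchCV(
    RandomForestClassifier(),
    {"n_estimators": list(range(100, 501, 100)),
     "max_depth": list(range(10, 51, 10))},
    scoring="accuracy")

# ann_search.fit(X_train, y_train); rf_search.fit(X_train, y_train)
```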
Scenario | AHI cloudy | AHI clear
CALIOP cloudy | TP | FN
CALIOP clear | FP | TN
*TP = true positive, FN = false negative, FP = false positive, and TN = true negative.
Table 3. Contingency matrix for evaluating cloud detection results against CALIOP.
The optimal parameters of the ML algorithms are obtained through grid searches. Here, "accuracy" is defined as the ratio of the number of pixels (samples) correctly detected by our algorithm (according to the CALIOP results) to the total number of pixels, i.e., accuracy = (TP + TN)/(TP + FN + FP + TN) following Table 3. Figure 3 gives the accuracy of the ANN and RF algorithms with different parameters. The accuracies of the best ANN daytime and nighttime models are 0.88 and 0.80, respectively. The accuracy of the best RF daytime model reaches 0.94, and that of the nighttime model is 0.87. Evidently, the algorithm performance is not very sensitive to the model parameters, with the accuracy generally varying within approximately 0.03. For the daytime algorithm, the best neuron number for the ANN is 11, and the best ntrees and mdepth for the RF are 200 and 20, respectively. A larger mdepth may lead to overfitting, so two relatively small values are used to guarantee the robustness of the models (Scornet, 2018). For the nighttime models, the best neuron number for the ANN is 8, and the best ntrees and mdepth for the RF are 100 and 10.
Through feature selection, the contribution of each feature to the algorithm can be quantified. For the ANN, an "f_classif" score is obtained based on an analysis of variance: the higher the score of a feature, the more it contributes to the cloud detection. For the RF algorithm, the importance of a feature is measured by the "mean decrease gini", and larger values correspond to features that are more useful for the detection.
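Both scores are available in scikit-learn; a minimal sketch, reusing the X_train and y_train placeholders from above, might look as follows.

```python
from sklearn.feature_selection import f_classif
from sklearn.ensemble import RandomForestClassifier

# ANOVA F-statistic ("f_classif score") per feature: higher scores indicate
# stronger separation between the cloudy and clear classes.
f_scores, p_values = f_classif(X_train, y_train)

# "Mean decrease gini": impurity-based importances from a fitted forest
# (here using the optimal daytime parameters, ntrees=200 and mdepth=20).
rf = RandomForestClassifier(n_estimators=200, max_depth=20)
rf.fit(X_train, y_train)
gini_importance = rf.feature_importances_
```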
Figure 4 shows the f_classif scores and mean decrease gini for the two models to demonstrate the feature importance. The tests are performed using the entire training dataset, including observations from all surfaces. The six most influential parameters in the daytime ANN algorithm are BTD (11.2–7.35 μm), BT (12.35 μm), BT (11.2 μm), BT (8.6 μm), BTD (3.85–11.2 μm), and R (0.64 μm). For the RF-based algorithm, BTD (11.2–7.35 μm), BT (12.35 μm), BTD (3.85–11.2 μm), BT (11.2 μm), R (0.64 μm), and BTD (11.2–8.6 μm) are the six most important inputs. Clearly, the physically important bands and combinations all rank relatively high. It should also be noted that the single-band BT (7.35 μm) (a water vapor band) and BT (3.85 μm) contribute less to the two ML-based algorithms. Meanwhile, we have also tested geolocation and solar-viewing geometries, and their contributions are relatively limited. Thus, we retain only radiative information in the models.
Figure 4. F_classif scores (top panels) in the ANN algorithm and mean decrease gini (bottom panels) in the RF algorithm for both daytime (left) and nighttime (right) algorithms.
Figure 5 gives the feature contributions for the three surface models based on the RF algorithm. Figures 5a–5d show the feature contributions for the four individual surface types in Model #1, and Figs. 5e and 5f are for Model #2 and Model #3, respectively. For ocean surfaces, both solar and infrared features, e.g., BTD (3.85–11.2 μm), R (0.64 μm), and BTD (11.2–7.35 μm), show high feature importance, whereas, for the other surfaces, longwave infrared BT differences such as BTD (11.2–7.35 μm), BTD (11.2–12.35 μm), and BTD (11.2–8.6 μm) are more important. For Model #2 and Model #3, the surface variables do not rank high (not in the top six) but clearly influence the models by modifying the order of the radiative parameters. The relative performances of the three surface models are evaluated in the following section.
Figure 5. Mean decrease gini in the RF algorithm for the different surface models: (a–d) Model #1 with its four surface types, (e) Model #2, and (f) Model #3.
It is worth mentioning that CALIPSO is in an afternoon orbit, so the collocated AHI and CALIOP observations include only local afternoon data, which could introduce uncertainties into an all-time cloud detection algorithm. To avoid such influences, our algorithms consider only direct radiance observations; auxiliary information such as pixel position and viewing geometries (especially the solar zenith angle) is not included. In this way, the algorithms are less dependent on the observation time, the viewing geometries, or the spatiotemporal distribution of the clouds.