Abstract:
Cloud classification is a crucial aspect of meteorological research and operational applications. Traditional methods rely on spectral approaches applied to data from a single satellite and struggle to capture the vertical structure of clouds. This study proposes a multisource satellite data fusion method for cloud classification based on the random forest algorithm. A cloud type identification and cloud height inversion model is constructed from the high spatial and temporal resolution radiation data of the Fengyun-4A (FY-4A) satellite and the vertical observations of the CloudSat Cloud Profiling Radar (CPR). Training data are built from the radiation data of 14 FY-4A channels (including visible, shortwave infrared, and longwave infrared bands), matched in space and time with the CloudSat cloud classification mask parameters. Specific channel combinations are selected as input features according to the physical characteristics of the channel radiation and its statistical behavior. A random forest classification model is then used for fine-grained cloud type identification, and a regression model predicts the cloud base and cloud top heights from the classification results. Alongside the common random forest model (Model A), this study proposes a random forest model with a hierarchical architecture (Model B) for classification. In the daytime scene, the classification accuracies reach 94.2% (Model A) and 95.7% (Model B). When clear-sky samples are excluded, the coefficient of determination (R² score) of the cloud top height regression is close to 0.98 and 0.99, respectively. In the night scene, where the shortwave channels are unavailable, the classification accuracies are 92.0% and 93.08%, respectively. Finally, two typhoon cases are analyzed comparatively to verify that the model identifies strong convective cloud systems efficiently. However, the inversion accuracy of the cloud classification model in low cloud and thin fog scenes needs further improvement. The developed approach combines the advantages of geostationary and polar-orbiting satellites, performs cloud retrieval with the random forest algorithm, and improves the interpretability of the resulting model. It provides a technical reference for meteorological applications that integrate multisource satellite collaborative observations with machine learning.
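
As a hedged illustration of the regression metric quoted above, the following minimal Python sketch computes the coefficient of determination (R²) for cloud top height over cloudy samples only. The function name `r2_cloudy`, the boolean cloud mask, and the sample heights are hypothetical and are not taken from the study; the sketch only assumes the standard definition R² = 1 - SS_res / SS_tot.

```python
def r2_cloudy(y_true, y_pred, is_cloudy):
    """Coefficient of determination computed over cloudy samples only.

    y_true, y_pred: observed and predicted heights (same units, e.g. km).
    is_cloudy: booleans; False marks clear-sky samples, which are skipped.
    All three names are illustrative assumptions, not the study's code.
    """
    pairs = [(t, p) for t, p, c in zip(y_true, y_pred, is_cloudy) if c]
    if len(pairs) < 2:
        raise ValueError("need at least two cloudy samples")
    mean_true = sum(t for t, _ in pairs) / len(pairs)
    ss_res = sum((t - p) ** 2 for t, p in pairs)        # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t, _ in pairs)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observed heights are constant")
    return 1.0 - ss_res / ss_tot


# Made-up cloud top heights in km; the first sample is clear sky.
true_top = [0.0, 2.0, 6.0, 10.0, 14.0]
pred_top = [0.5, 2.5, 5.5, 10.5, 13.5]
cloudy = [False, True, True, True, True]

score = r2_cloudy(true_top, pred_top, cloudy)
print(round(score, 4))
```

Excluding clear-sky samples matters because they all share a single trivial height value, so including them would change the total variance and make the score harder to interpret as a measure of cloud height skill.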