Retrieving Cloud Classification and Cloud Height using FY-4A and CloudSat Satellite Data via Random Forest Algorithms

Luo Dingzhi (骆鼎之); Xu Dongmei (许冬梅); Zhuge Xiaoyong (诸葛小勇); Min Jinzhong (闵锦忠); Fei Haiyan (费海燕); Shen Feifei (沈菲菲); Sun Qilong (孙启龙)

doi:10.3878/j.issn.1006-9895.2509.25043

Abstract

Cloud classification is a key step in meteorological research and operational applications. Traditional methods are limited by the spectral information available from a single satellite and struggle to capture the vertical structure of clouds. This study proposes a multisource satellite data fusion method for cloud classification based on the random forest algorithm. Models for cloud type identification and cloud height retrieval are built from the high spatiotemporal resolution radiance data of the Fengyun-4A (FY-4A) satellite and the vertical observations of the CloudSat Cloud Profiling Radar (CPR). Training data are obtained by matching the radiance data from the 14 FY-4A channels (including visible, shortwave infrared, and longwave infrared bands) in space and time with the CloudSat cloud classification mask parameters. Specific channel combinations are selected as input features according to the physical characteristics of the channel radiances and their statistical behavior. A random forest classification model then identifies cloud types in detail, and regression models predict cloud base and cloud top heights on the basis of the classification results. Beyond a standard random forest classification model (Model A), a random forest classification model with a hierarchical architecture (Model B) is also designed. A comparison shows that the daytime classification accuracies reach 94.2% (Model A) and 95.7% (Model B). When clear-sky samples are excluded, the coefficients of determination (R² scores) of the cloud top height regression are close to 0.98 and 0.99, respectively. At night, where the shortwave channels are unavailable, the classification accuracies are 92.0% and 93.08%, respectively. Finally, two typhoon cases are used to verify that the cloud classification model identifies strong convective cloud systems efficiently. However, the retrieval accuracy for low cloud and thin fog scenes still needs improvement. The approach combines the respective strengths of geostationary and polar-orbiting satellites, performs cloud retrieval with the random forest algorithm, and improves model interpretability. It provides a technical reference for meteorological applications that combine multisource satellite observations with machine learning.
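
The spatiotemporal matching step that produces the training pairs can be illustrated with a minimal sketch. It is not the authors' implementation: the container classes `ImagerPixel` and `RadarProfile`, the function names, and the thresholds (4 km and 450 s) are all assumptions introduced here for illustration, since the abstract does not state the matching criteria. Each radar profile is paired with the nearest imager pixel that falls inside both a distance and a time window, and profiles with no such pixel are dropped.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0


@dataclass
class ImagerPixel:
    """One geostationary imager pixel (hypothetical container)."""
    lat: float
    lon: float
    time_s: float            # observation time, seconds since an arbitrary epoch
    radiances: List[float]   # one value per selected channel


@dataclass
class RadarProfile:
    """One cloud-radar profile with its cloud-type label (hypothetical container)."""
    lat: float
    lon: float
    time_s: float
    cloud_type: int


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_profile(profile: RadarProfile, pixels: List[ImagerPixel],
                  max_dist_km: float = 4.0,      # assumed threshold
                  max_dt_s: float = 450.0        # assumed threshold
                  ) -> Optional[ImagerPixel]:
    """Return the nearest pixel inside both thresholds, or None if there is none."""
    best, best_dist = None, max_dist_km
    for px in pixels:                             # brute force, fine for a sketch
        if abs(px.time_s - profile.time_s) > max_dt_s:
            continue
        dist = haversine_km(profile.lat, profile.lon, px.lat, px.lon)
        if dist <= best_dist:
            best, best_dist = px, dist
    return best


def build_training_pairs(profiles: List[RadarProfile], pixels: List[ImagerPixel],
                         **thresholds: float) -> List[Tuple[List[float], int]]:
    """Pair each matched profile's label with the radiances of its pixel."""
    pairs = []
    for prof in profiles:
        px = match_profile(prof, pixels, **thresholds)
        if px is not None:
            pairs.append((px.radiances, prof.cloud_type))
    return pairs


# Made-up values, for demonstration only.
pixels = [
    ImagerPixel(20.00, 120.00, 0.0, [0.31, 285.2]),
    ImagerPixel(20.03, 120.00, 0.0, [0.55, 250.7]),
    ImagerPixel(25.00, 125.00, 0.0, [0.10, 295.0]),
]
profiles = [
    RadarProfile(20.02, 120.00, 120.0, cloud_type=3),   # nearest to the second pixel
    RadarProfile(30.00, 130.00, 120.0, cloud_type=1),   # no pixel close enough
    RadarProfile(20.00, 120.00, 3600.0, cloud_type=2),  # too far apart in time
]
pairs = build_training_pairs(profiles, pixels)
print(pairs)  # [([0.55, 250.7], 3)]
```

The resulting `(radiances, cloud_type)` pairs are the kind of feature and label rows a random forest classifier could be trained on. For real data volumes the linear scan over pixels would normally be replaced by a spatial index, and the thresholds would need to follow the actual sensor resolution and scan timing.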
