ZHAO Yuhui, CHEN Guanghua, WANG Ziqing, et al. 2024. Applying Machine Learning in Clustering and Discriminant Analysis of Large-Scale Circulation Patterns Favorable for Tropical Cyclogenesis over the Western North Pacific [J]. Chinese Journal of Atmospheric Sciences (in Chinese), 48(2): 671−686. doi: 10.3878/j.issn.1006-9895.2208.22074

Applying Machine Learning in Clustering and Discriminant Analysis of Large-Scale Circulation Patterns Favorable for Tropical Cyclogenesis over the Western North Pacific

  • Based on the International Best Track Archive for Climate Stewardship (IBTrACS) dataset and ERA5 850 hPa winds for June to November of 1979–2020, the low-level, large-scale circulations associated with tropical cyclogenesis over the western North Pacific are clustered into five patterns with a self-organizing map: Monsoon Confluence (MC), Monsoon Gyre (MG), Strong Monsoon Trough (SMT), Weak Monsoon Trough (WMT), and Easterly Wave (EW). Tropical cyclones (TCs) in the MC pattern form in the confluence zone south of the subtropical high and account for the highest proportion of TC geneses. Cyclogenesis in the MG, SMT, and WMT patterns is affected by the cyclonic wind shear or the confluence zone associated with the monsoon trough. The EW pattern, which has the fewest TC geneses, features an easterly wave evolving directly into a TC. To select an optimal machine learning method for automatically identifying the pattern of a given TC circulation, three discriminant analysis models are compared: Support Vector Machine (SVM), k-nearest neighbors, and Random Forest (RF). The SVM achieves the highest accuracy (0.965) and the least sensitivity to imbalanced data, with recall and precision exceeding 0.94 for every circulation pattern. Model sensitivity to dataset size is also evaluated; the results indicate that the SVM most effectively captures characteristic signals from relatively limited training data.
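
The abstract reports overall accuracy together with per-pattern recall and precision, which matters because the five patterns contain unequal numbers of TC geneses. As a minimal sketch of how those three scores relate, the Python snippet below computes them from true and predicted pattern labels. The function name `per_class_metrics` and the toy labels are hypothetical and made up for illustration; they are not the study's data or code.

```python
from collections import Counter

# The five circulation patterns named in the abstract.
PATTERNS = ["MC", "MG", "SMT", "WMT", "EW"]


def per_class_metrics(y_true, y_pred, labels=PATTERNS):
    """Return overall accuracy and {label: (precision, recall)}.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    true_counts = Counter(y_true)         # samples per true pattern
    pred_counts = Counter(y_pred)         # samples per predicted pattern

    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    metrics = {}
    for c in labels:
        tp = pairs[(c, c)]
        # Precision: share of samples predicted as c that truly are c.
        precision = tp / pred_counts[c] if pred_counts[c] else 0.0
        # Recall: share of true c samples that were predicted as c.
        recall = tp / true_counts[c] if true_counts[c] else 0.0
        metrics[c] = (precision, recall)
    return accuracy, metrics


# Toy, made-up labels (NOT from the paper): an imbalanced sample in which
# the rare EW case is misclassified as the common MC pattern.
y_true = ["MC"] * 4 + ["MG"] * 2 + ["SMT"] * 2 + ["WMT"] + ["EW"]
y_pred = ["MC", "MC", "MC", "MG", "MG", "MG", "SMT", "SMT", "WMT", "MC"]

accuracy, metrics = per_class_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for label in PATTERNS:
    precision, recall = metrics[label]
    print(f"{label}: precision = {precision:.3f}, recall = {recall:.3f}")
```

In this toy case the overall accuracy looks acceptable while the rare EW pattern has zero recall, which is why per-pattern scores are a useful check on sensitivity to imbalanced data.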