Abstract:
Based on the International Best Track Archive for Climate Stewardship (IBTrACS) dataset and ERA5 850 hPa winds for June to November of 1979–2020, the low-level, large-scale circulations associated with tropical cyclogenesis over the western North Pacific are clustered into five patterns using a self-organizing map. The five patterns are Monsoon Confluence (MC), Monsoon Gyre (MG), Strong Monsoon Trough (SMT), Weak Monsoon Trough (WMT), and Easterly Wave (EW). Tropical cyclones (TCs) in the MC pattern form in the confluence zone south of the subtropical high, and this pattern accounts for the largest proportion of TC geneses. Cyclogenesis in the MG, SMT, and WMT patterns is influenced by the cyclonic wind shear or the confluence zone associated with the monsoon trough. The EW pattern, which has the smallest number of TC geneses, features an easterly wave evolving directly into a TC. To select an optimal machine learning method for automatically identifying the pattern of a given TC circulation, three discriminant analysis models are compared: Support Vector Machine (SVM), k-nearest neighbors, and Random Forest (RF). The SVM achieves the highest accuracy (0.965) and is the least sensitive to imbalanced data, with recall and precision exceeding 0.94 for every circulation pattern. Model sensitivity to dataset size is also evaluated, and the results indicate that the SVM most effectively captures characteristic signals from relatively limited training data.
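Because the five patterns contain unequal numbers of TC geneses, overall accuracy alone can hide poor performance on a rare pattern such as EW, which is why recall and precision are reported for each pattern. The following is a minimal sketch of how those per-pattern scores can be computed from true and predicted labels. The function name `per_class_metrics` and the ten toy labels are hypothetical, chosen only for illustration; they are not the study's data or code.

```python
from collections import Counter

# Pattern labels taken from the abstract.
PATTERNS = ["MC", "MG", "SMT", "WMT", "EW"]


def per_class_metrics(y_true, y_pred, labels=PATTERNS):
    """Return (accuracy, {label: (precision, recall)}) for a multi-class task.

    Hypothetical helper for illustration; a class with no predictions
    (or no true samples) is assigned a score of 0.0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    true_positives = Counter(t for t, p in pairs if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label truly occurs

    scores = {}
    for label in labels:
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / actual[label] if actual[label] else 0.0
        scores[label] = (precision, recall)
    return accuracy, scores


# Toy, deliberately imbalanced labels (NOT data from the study).
y_true = ["MC", "MC", "MC", "MC", "MG", "MG", "SMT", "SMT", "WMT", "EW"]
y_pred = ["MC", "MC", "MC", "MG", "MG", "MG", "SMT", "SMT", "WMT", "MC"]

accuracy, scores = per_class_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
for label in PATTERNS:
    precision, recall = scores[label]
    print(f"{label}: precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy case the accuracy is 0.80 even though the single EW case is missed entirely (recall 0.00), which illustrates why a model's sensitivity to imbalanced data is better judged from per-pattern recall and precision than from accuracy alone.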