Abstract:
Using ERA5 reanalysis data and basic surface meteorological observations from China for 2000–2019, this study statistically analyzes the spatiotemporal distribution of precipitation phases over the complex terrain of the Two Lakes region (Hunan and Hubei provinces), optimizes the physical predictor factors, and constructs two LightGBM (light gradient boosting machine) models based on data resampling techniques (ADASYN-LGBM and Hybrid-LGBM) together with a fully connected neural network trained with a focal loss function (FocalLoss-MLP). A soft voting strategy is then used to combine the individual models (Ensemble). The results show the following. The spatial distribution of winter precipitation phase frequency in the Two Lakes region is related to topography, circulation, and the climatic background. Owing to the blocking effect of the Nanling Mountains and the convergence of cold and warm air masses, freezing rain is significantly more frequent in Hunan Province than in Hubei Province. The time of day also affects the distribution of the different precipitation phases. Feature importance analysis indicates that 2 m temperature, 0 °C level height, and thickness factors dominate the model discrimination, while the binary day/night factor and latitude act as auxiliary variables that make some contribution. All four models discriminate the dominant category, rain, best, followed by snow, freezing rain, and sleet. For sleet in particular (2.2% of samples), the test-set TS scores are only 8–18%, which is related to the overlapping meteorological characteristics and blurred boundaries during the rain-snow phase transition. The Ensemble model can compensate for the weaknesses of individual models in discriminating certain precipitation phase categories and further improve overall recognition accuracy, although model performance remains highly dependent on sample size.
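As an illustration of the soft voting and per-class TS evaluation mentioned in the abstract, the sketch below averages class probabilities from several models and scores one class. It is a minimal sketch, not the study's implementation. It assumes that TS denotes the standard threat score, hits / (hits + misses + false alarms), and that soft voting uses equal weights. The function names, class ordering, and toy probabilities are all hypothetical.

```python
import numpy as np

# Hypothetical class ordering, for illustration only.
CLASSES = ["rain", "snow", "freezing rain", "sleet"]

def soft_vote(prob_list, weights=None):
    """Average per-model class probabilities, each of shape (n_samples, n_classes)."""
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    avg = np.average(stacked, axis=0, weights=weights)
    return avg.argmax(axis=1), avg

def threat_score(y_true, y_pred, cls):
    """Threat score for one class: hits / (hits + misses + false alarms)."""
    hits = np.sum((y_pred == cls) & (y_true == cls))
    false_alarms = np.sum((y_pred == cls) & (y_true != cls))
    misses = np.sum((y_pred != cls) & (y_true == cls))
    denom = hits + false_alarms + misses
    return hits / denom if denom else float("nan")

# Toy probabilities from three hypothetical models for three samples.
p1 = np.array([[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1], [0.4, 0.1, 0.1, 0.4]])
p2 = np.array([[0.6, 0.2, 0.1, 0.1], [0.3, 0.4, 0.2, 0.1], [0.3, 0.1, 0.1, 0.5]])
p3 = np.array([[0.8, 0.1, 0.05, 0.05], [0.5, 0.3, 0.1, 0.1], [0.2, 0.2, 0.1, 0.5]])

labels, avg_prob = soft_vote([p1, p2, p3])
y_true = np.array([0, 1, 0])  # made-up observed phases
ts_rain = threat_score(y_true, labels, cls=0)
```

In this toy case the third sample is observed as rain but voted as sleet, so it counts as a miss for rain and a false alarm for sleet. This is the kind of rain-sleet confusion that lowers the TS of a rare class.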