High-resolution soil moisture (SM) data are critical for drought monitoring and flood forecasting. This paper presents an interpretable machine learning (ML)-based framework for SM data fusion and uses it to generate a daily, 1-km resolution surface SM (0–10 cm) dataset over China (2000–2025). Four state-of-the-art ML models (Random Forest, XGBoost, LightGBM, and CatBoost) were trained on in situ SM data from 2371 automatic observation stations across China. Model performance was optimized via Recursive Feature Elimination (RFE) and automated hyperparameter tuning with Optuna, while SHapley Additive exPlanations (SHAP) provided mechanistic interpretability of the ML models. The key findings of this study are as follows: (1) the fusion model improves SM estimation chiefly in terms of error magnitude, with a lower root-mean-square error than CLDAS (China Meteorological Administration Land Data Assimilation System) SM, although its daily temporal correlation is marginally weaker; (2) RFE eliminated 57% of the features while preserving predictive accuracy; (3) SHAP analysis identified high-accuracy SM inputs as the most influential predictors, followed by static variables (terrain and soil properties) and meteorological variables. The SM fusion method developed in this study is transferable to multi-source satellite SM fusion and downscaling. The dataset is publicly available at
https://doi.org/10.11888/Terre.tpdc.302923.
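
For reference, the two evaluation metrics contrasted in finding (1) can be written in their standard forms. This is a sketch that assumes the usual definitions; the notation is introduced here for illustration and is not taken from the paper: $\hat{\theta}_i$ is an estimated SM value, $\theta_i$ the matching in situ observation, $N$ the number of paired samples, and overbars denote sample means. The paper may compute the metrics per station or over a different sampling window.

```latex
% Standard definitions; symbols are illustrative assumptions, not the paper's notation.
\begin{align}
  \mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{\theta}_i - \theta_i\bigr)^{2}}, \\
  R &= \frac{\sum_{i=1}^{N}\bigl(\hat{\theta}_i - \bar{\hat{\theta}}\bigr)\bigl(\theta_i - \bar{\theta}\bigr)}
            {\sqrt{\sum_{i=1}^{N}\bigl(\hat{\theta}_i - \bar{\hat{\theta}}\bigr)^{2}}\,
             \sqrt{\sum_{i=1}^{N}\bigl(\theta_i - \bar{\theta}\bigr)^{2}}}.
\end{align}
```

Because RMSE is sensitive to bias and amplitude errors while $R$ measures only how closely the two series co-vary in time, a product can reduce RMSE and still show a slightly lower $R$.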