-
For the ConvLSTM and FC-Unet models described above, the temporal coverage of available observational and reanalysis data is relatively short, so using pan-Arctic monthly fields as input yields too few samples to train the DL models adequately. To address this limitation, we employ transfer learning, which has been successfully applied in climate science (Ham et al., 2019). Transfer learning allows us to exploit the large volume of CMIP6 simulation data as a robust pre-training foundation; the DL models are then fine-tuned on the much smaller reanalysis dataset, which improves prediction accuracy while reducing the risk of overfitting.
To maximize the benefit of transfer learning, we first evaluate the impact of incorporating additional CMIP6 historical runs as transfer training data. Because of limited time and data download speed, we rely on previous evaluations of CMIP6 performance in the polar regions and select two models that perform relatively well in simulating SIC, SST, and SAT: IPSL-CM6A-LR and ACCESS-ESM1-5 (Table 1). Previous research shows that both models perform well in the polar regions with regard to sea ice and other climate factors (Long et al., 2021; Shen et al., 2021; Casagrande et al., 2023). Because CMIP models are less accurate in the polar regions than at lower latitudes, indiscriminately adding a large number of CMIP6 models risks introducing large systematic errors and does not necessarily improve model performance. Indeed, the few previous studies of Arctic SIC forecasting that use transfer learning also employ multiple members of only two evaluated CMIP models (Andersson et al., 2021). The first five members of each model's historical runs (see Table 1 for the variant labels) are used to generate large training datasets for transfer learning (~2000 samples per CMIP6 member), using the same sliding-window technique applied to the observations and reanalysis data.
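The sliding-window sampling can be sketched as follows. This is a minimal illustration, assuming a hypothetical 12-month input window and a 1-month lead (the actual window settings are not given in this section); the function name `sliding_window_samples` is also hypothetical. It shows why a CMIP6 historical run, which covers January 1850 to December 2014 (1980 months), yields roughly 2000 samples per member.

```python
import numpy as np

def sliding_window_samples(series, input_len, lead):
    """Build (input, target) pairs from a monthly array shaped (time, ...).

    Each sample takes `input_len` consecutive months as input; the target is
    the month `lead` steps after the last input month.
    """
    n_samples = series.shape[0] - input_len - lead + 1
    inputs = np.stack([series[i:i + input_len] for i in range(n_samples)])
    targets = np.stack([series[i + input_len + lead - 1] for i in range(n_samples)])
    return inputs, targets

# CMIP6 historical runs span January 1850 to December 2014.
n_months = (2014 - 1850 + 1) * 12                      # 1980 months
# Toy stand-in for one member's monthly fields on a tiny 4 x 4 grid.
member = np.random.default_rng(0).random((n_months, 4, 4))
x, y = sliding_window_samples(member, input_len=12, lead=1)  # hypothetical settings
print(x.shape, y.shape)                                # (1968, 12, 4, 4) (1968, 4, 4)
```

With these assumed settings each member contributes 1968 samples, in line with the ~2000 quoted above; a longer window or lead would reduce the count slightly.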
Table 1. Selected CMIP6 members for transfer learning.
Model | Country (Institution) | Sea ice model (resolution) | Members selected |
ACCESS-ESM1-5 | Australia (CSIRO) | CICE4.1 (360 × 300) | historical run (r1i1p1f1 to r5i1p1f1) |
IPSL-CM6A-LR | France (IPSL) | NEMO-LIM3 (362 × 332) | historical run (r1i1p1f1 to r5i1p1f2) |
Because IPSL-CM6A-LR contains missing grid points, whether its data are included in training is the main difference between the two groups of data sensitivity experiments. Owing to the grid scheme of IPSL-CM6A-LR, its oceanic variables have missing values along a specific meridional path between approximately 70°E and 110°W, so using its members for transfer learning introduces some local systematic errors. Accordingly, the CMIP6 data used for transfer learning in our prediction experiments follow one of two composition schemes: one uses only the 5 members of ACCESS-ESM1-5 (indicated by the suffix “5mem” on the x-axis of Fig. 2), and the other uses all 10 members from both models (indicated by the suffix “10mem”). During transfer learning, we train for four epochs in the “5mem” experiments and two epochs in the “10mem” experiments, both with a fixed learning rate of 0.0005.
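The two composition schemes and their epoch settings can be summarized as a small configuration sketch. The dictionary layout and names are illustrative only, and `SAMPLES_PER_MEMBER` uses the approximate figure of ~2000 samples per member given above. The arithmetic shows that, whether or not this was the intent, the chosen epoch counts give both schemes roughly the same number of sample presentations during transfer learning.

```python
# Illustrative summary of the two transfer-learning data schemes.
SAMPLES_PER_MEMBER = 2000        # approximate (~2000 samples per CMIP6 member)
TRANSFER_LR = 5e-4               # fixed learning rate in both schemes

schemes = {
    "5mem": {"models": ["ACCESS-ESM1-5"], "members": 5, "epochs": 4},
    "10mem": {"models": ["ACCESS-ESM1-5", "IPSL-CM6A-LR"], "members": 10, "epochs": 2},
}

def sample_presentations(scheme):
    """Approximate number of samples seen during transfer learning."""
    return scheme["members"] * SAMPLES_PER_MEMBER * scheme["epochs"]

for name, scheme in schemes.items():
    print(name, sample_presentations(scheme))  # 40000 for both schemes
```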
Figure 2. (left) Centered-RMSE and nRMSE between PIOMAS and the SIT predicted by the ConvLSTM and FC-Unet models under different data and predictor configurations. (right) Histograms of PIOMAS and predicted SITA for each experiment.
Figure 2 summarizes the prediction skill of each experiment in terms of error statistics. The metrics are the spatiotemporally averaged centered-RMSE (standardized SIT RMSE) and nRMSE (the SIT RMSE divided by the mean SIT over the sea ice-covered area in the corresponding month), computed for each season and for the annual period. The first four experiments (leftmost on the x-axis) use the ConvLSTM model to assess the impact of data selection, because ConvLSTM appears to be more sensitive to data quality than FC-Unet (as demonstrated in the next section). The results show that both models capture the main variability of pan-Arctic SIT relatively well, with the centered-RMSE lowest in SON and slightly higher in MAM, consistent with our weighting of September samples by a factor of 1.2. In addition, using more CMIP6 data for transfer learning (“10mem”) tends to yield smaller centered-RMSEs for SIT predictions, even though the additional IPSL-CM6A-LR data may introduce systematic errors at specific locations.
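The metrics and the September sample weighting can be sketched in NumPy as follows. This is a minimal illustration on toy data, not the authors' code: `centered_rmse` uses the conventional Taylor-diagram definition, and its optional division by the observed standard deviation is only an assumed reading of "standardized"; computing nRMSE over ice-covered points only and scaling each sample's loss directly by its weight are likewise assumptions.

```python
import numpy as np

def centered_rmse(pred, obs, normalize=False):
    """RMSE after removing each field's mean (Taylor-diagram definition).

    normalize=True divides by the observed standard deviation, an assumed
    reading of "standardized".
    """
    crmse = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return float(crmse / obs.std()) if normalize else float(crmse)

def nrmse(pred, obs, ice_mask):
    """RMSE over ice-covered points divided by the mean observed SIT there."""
    err = pred[ice_mask] - obs[ice_mask]
    return float(np.sqrt(np.mean(err ** 2)) / obs[ice_mask].mean())

def sample_weights(target_months, september_weight=1.2):
    """Weight samples whose target month is September (9) by 1.2, others by 1."""
    return np.where(np.asarray(target_months) == 9, september_weight, 1.0)

def weighted_mse(pred, obs, weights):
    """Per-sample MSE, scaled by each sample's weight, then averaged."""
    per_sample = ((pred - obs) ** 2).reshape(len(weights), -1).mean(axis=1)
    return float(np.mean(weights * per_sample))

rng = np.random.default_rng(0)
obs = 3.0 * rng.random((4, 8, 8))      # toy SIT samples (sample, y, x), in m
pred = obs + 0.1                        # prediction with a uniform 0.1-m bias
ice = obs > 0.5                         # toy ice-covered mask
w = sample_weights([8, 9, 10, 9])       # [1.0, 1.2, 1.0, 1.2]
print(centered_rmse(pred, obs))         # ~0: centering removes the uniform bias
print(nrmse(pred, obs, ice))            # 0.1 m / mean SIT over ice-covered points
print(weighted_mse(pred, obs, w))       # ~0.011 = 0.01 * mean(w)
```

The uniform-bias example highlights the design difference between the two metrics: centered-RMSE ignores a constant offset, whereas nRMSE still penalizes it.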
We also assess the sensitivity of prediction skill to the inclusion of additional atmospheric and oceanic variables as predictors. The predictor configurations likewise fall into two types. One uses only the spatiotemporal fields of two sea ice variables, SIC and SIT (indicated by the suffix “2inp” in Fig. 2). The other (indicated by the suffix “11inp” in Fig. 2) draws on several SIC forecasting studies (Fritzner et al., 2020; Kim et al., 2020; Andersson et al., 2021) and incorporates the spatiotemporal fields of the 11 climate variables listed in Table 2.
Table 2. Selected predictors for DL models.
Variable | Source | Abbreviation or Calculation | Variable name in CMIP6 | Units |
Sea ice thickness | PIOMAS | SIT | sithick | m |
Sea ice concentration | NSIDC | SIC | siconca | % |
10-m wind speed | ERA5 | sqrt(u10² + v10²) | sqrt(uas² + vas²) | m s⁻¹ |
Specific humidity | ERA5 | q | huss | kg kg⁻¹ |
2-m air temperature | ERA5 | t2m | tas | K |
Sea surface temperature | ERA5 | sst | tos | K |
Rain rate | ERA5 | mtpr − msr | pr − prsn | mm d⁻¹ |
Snow rate | ERA5 | msr | prsn | mm d⁻¹ |
Surface pressure | ERA5 | sp | ps | Pa |
Mean surface downward shortwave radiation flux | ERA5 | msdwswrf | rsds | W m⁻² |
Mean surface downward longwave radiation flux | ERA5 | msdwlwrf | rlds | W m⁻² |
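The derived predictors in Table 2 can be computed as sketched below, assuming the precipitation and snowfall rates are supplied in kg m⁻² s⁻¹ (the units of the ERA5 mean-rate fields and of CMIP6 `pr` and `prsn`). Because CMIP6 `tos` is distributed in °C, it would also need converting to match the K listed in Table 2. The helper names are hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0

def wind_speed(u10, v10):
    """10-m wind speed (m s-1) from zonal and meridional components."""
    return np.hypot(u10, v10)

def flux_to_mm_per_day(flux):
    """kg m-2 s-1 -> mm d-1 (1 kg of water over 1 m2 is a 1-mm layer)."""
    return flux * SECONDS_PER_DAY

def rain_rate(total_precip, snowfall):
    """Rain rate (mm d-1) as total precipitation minus snowfall."""
    return flux_to_mm_per_day(total_precip - snowfall)

def celsius_to_kelvin(temp_c):
    """Convert CMIP6 `tos` (degC) to K."""
    return temp_c + 273.15

# The same helpers apply to ERA5 (u10, v10, mtpr, msr) and CMIP6 (uas, vas, pr, prsn).
print(wind_speed(np.array([3.0]), np.array([4.0])))     # [5.]
print(rain_rate(np.array([2e-5]), np.array([5e-6])))    # about 1.3 mm/day
print(celsius_to_kelvin(np.array([-1.8])))
```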
Based on the same metrics, the four experiments with the suffix “10mem” in Fig. 2 indicate that, when the total SIT variability including the seasonal cycle is considered, adding more predictors appears to degrade prediction skill. However, climate research typically focuses on anomalies relative to the historical climatology. We therefore evaluate the anomalies of predicted and actual SIT relative to the 1989–2015 monthly climatology (not shown). The results show that using the larger set of climate variables listed in Table 2 as predictors reduces the centered-RMSEs of SIT anomalies. In addition, intermodel comparisons show that the FC-Unet model has smaller prediction errors than ConvLSTM. Consequently, we select the experiments with more transfer learning data and more predictors as the optimal configuration for subsequent analysis.
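The anomaly calculation amounts to subtracting a calendar-month climatology. The sketch below assumes monthly fields stored as complete years starting in January; the function names and toy data are hypothetical, while the 1989–2015 base period follows the text.

```python
import numpy as np

def monthly_climatology(fields, years, base_start, base_end):
    """Mean field for each calendar month over the base period.

    `fields` is (len(years) * 12, ...) and starts in January of years[0].
    """
    by_month = fields.reshape(len(years), 12, *fields.shape[1:])
    in_base = (years >= base_start) & (years <= base_end)
    return by_month[in_base].mean(axis=0)              # (12, ...)

def monthly_anomalies(fields, clim):
    """Subtract the matching calendar-month climatology from every month."""
    by_month = fields.reshape(-1, 12, *fields.shape[1:])
    return (by_month - clim).reshape(fields.shape)

years = np.arange(1979, 2021)                          # 1979-2020
sit = np.random.default_rng(0).random((years.size * 12, 5, 5))  # toy SIT fields
clim = monthly_climatology(sit, years, 1989, 2015)
sita = monthly_anomalies(sit, clim)
print(clim.shape, sita.shape)                          # (12, 5, 5) (504, 5, 5)
```

By construction, the anomalies average to zero for each calendar month within the base period, so the seasonal cycle no longer dominates the error statistics.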
The parameters for the subsequent fine-tuning phase are set as follows. The fine-tuning data are split into a training set for 1979–2011 and a test set for 2012–20, and 10% of the training set is further held out for validation. Early stopping is used to prevent overfitting and is triggered only if the validation loss does not decrease for 10 consecutive epochs. The learning rate is initialized at half that of the transfer training phase (i.e., 0.00025) and follows an exponentially decaying schedule. In practice, early stopping is usually triggered within 30 fine-tuning epochs.
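The fine-tuning settings can be sketched in plain NumPy as below. The year split, 10% validation fraction, patience of 10 epochs, and initial learning rate of half of 0.0005 follow the text; drawing the validation samples at random and the decay rate of 0.9 per epoch are hypothetical choices, as are all function and class names.

```python
import numpy as np

TRANSFER_LR = 5e-4                    # fixed learning rate of the transfer phase

def split_by_year(sample_years, train_end=2011, test_start=2012,
                  val_fraction=0.1, seed=0):
    """Return train, validation, and test indices for yearly-tagged samples."""
    sample_years = np.asarray(sample_years)
    train_val = np.flatnonzero(sample_years <= train_end)
    test = np.flatnonzero(sample_years >= test_start)
    shuffled = np.random.default_rng(seed).permutation(train_val)  # random hold-out (assumed)
    n_val = int(round(val_fraction * train_val.size))
    return np.sort(shuffled[n_val:]), np.sort(shuffled[:n_val]), test

def exponential_lr(epoch, initial_lr=0.5 * TRANSFER_LR, decay_rate=0.9):
    """Exponentially decaying learning rate; decay_rate is hypothetical."""
    return initial_lr * decay_rate ** epoch

class EarlyStopping:
    """Signal a stop once validation loss has not decreased for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = np.inf
        self.wait = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
        return self.wait >= self.patience

# Monthly samples tagged by year, 1979-2020.
sample_years = np.repeat(np.arange(1979, 2021), 12)
train_idx, val_idx, test_idx = split_by_year(sample_years)
print(len(train_idx), len(val_idx), len(test_idx))   # 356 40 108

# Toy validation losses: improving for 15 epochs, then flat.
stopper = EarlyStopping(patience=10)
val_losses = [1.0 / (e + 1) for e in range(15)] + [1.0] * 20
for epoch, loss in enumerate(val_losses):
    lr = exponential_lr(epoch)        # would be passed to the optimizer here
    if stopper.step(loss):
        break
print(epoch)                          # 24
```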
-
Using the ConvLSTM and FC-Unet models with the optimal configurations identified in the previous section, we comprehensively evaluate their skill in predicting Arctic SIT and SIT anomalies (SITA). Figure 3 shows the spatial correlation between predicted and actual SIT by month over 2014–17, the test period common to both models excluding 2012–13, which is reserved for the extreme-event assessment below. Both models reproduce the spatial pattern of SIT well, with spatial correlation coefficients above 0.95 in most months. SIT can also be decomposed into the historical monthly climatology plus SITA; the latter is important in climate research but has received little emphasis in most previous deep learning studies of Arctic SIC prediction. We therefore calculate SITAs using the same approach as in the previous section (not shown) and find that the skill differences between the models become more pronounced. The correlations of FC-Unet remain at around 90% in all months, whereas ConvLSTM shows stronger seasonality, with higher correlations during the melting season (highest in September at ~70%, consistent with the sample weight settings mentioned previously) and correlations about 10%–20% lower during the freezing season.
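The monthly spatial correlation shown in Fig. 3 can be sketched as a Pearson correlation over valid grid points for each map, averaged by calendar month. The mask definition, toy data, and function names are assumptions for illustration.

```python
import numpy as np

def pattern_correlation(pred, obs, mask):
    """Pearson correlation between two maps over points where `mask` is True."""
    return float(np.corrcoef(pred[mask], obs[mask])[0, 1])

def mean_correlation_by_month(pred, obs, mask, months):
    """Average the per-map pattern correlation for each calendar month."""
    months = np.asarray(months)
    r = np.array([pattern_correlation(p, o, mask) for p, o in zip(pred, obs)])
    return {int(m): float(r[months == m].mean()) for m in np.unique(months)}

rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 4)                 # four years of monthly maps
obs = 3.0 * rng.random((months.size, 10, 10))         # toy "PIOMAS" SIT
pred = 0.8 * obs + 0.2 * rng.random(obs.shape)        # toy prediction
ocean = np.ones((10, 10), dtype=bool)
ocean[:2, :] = False                                  # toy land points excluded
r_by_month = mean_correlation_by_month(pred, obs, ocean, months)
print(r_by_month[9])                                  # September mean correlation
```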
Figure 3. Mean spatial correlation between PIOMAS and predicted SIT by ConvLSTM (blue line) and FC-Unet (red line) within the time period of the test set.
In summary, the models differ clearly in terms of spatial correlation. The FC-Unet predictions remain spatially consistent with PIOMAS throughout the test period, whereas the ConvLSTM model slightly underperforms and shows stronger seasonality, partly because it is more sensitive to the sample weight settings.
We also assess non-transfer-learning models adopted in previous studies (Chi and Kim, 2017; Kim et al., 2020), such as a conventional CNN model trained on local samples (not shown). To overcome data scarcity, this approach subdivides the pan-Arctic prediction task into numerous local prediction tasks, each of which predicts SIT at the central grid point of an 11 × 11 local predictor patch (a radius of ~125 km). The assessment shows that its performance declines significantly when anomalies are examined.
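The local sampling used by such patch-based CNNs can be sketched with NumPy's `sliding_window_view` (NumPy 1.20 or later). Zero-padding at the domain edges and the (channels, y, x) layout are assumptions; with 25-km grid cells, as on the NSIDC polar stereographic grid (also an assumption here), the 5-cell half-width of an 11 × 11 patch corresponds to the ~125 km radius quoted above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_patches(fields, size=11):
    """Extract the size x size neighbourhood centred on every grid point.

    fields: (channels, ny, nx) -> patches: (ny * nx, channels, size, size),
    one patch per target grid point, zero-padded at the domain edges.
    """
    half = size // 2
    padded = np.pad(fields, ((0, 0), (half, half), (half, half)))
    windows = sliding_window_view(padded, (size, size), axis=(1, 2))  # (c, ny, nx, s, s)
    c, ny, nx = fields.shape
    return np.moveaxis(windows, 0, 2).reshape(ny * nx, c, size, size)

fields = np.random.default_rng(0).random((3, 20, 30))   # toy predictors on a 20 x 30 grid
patches = local_patches(fields)
print(patches.shape)                                     # (600, 3, 11, 11)
```

Each patch becomes an independent training sample, which multiplies the sample count but, as discussed next, discards information beyond the patch.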
The poor skill of the conventional CNN model can be attributed mainly to two factors. First, the approach assumes that SIT at every grid point across the pan-Arctic region is related to local climate variables through the same nonlinear relationship; it therefore sacrifices larger-scale spatial information and the corresponding teleconnections. Second, the conventional CNN model simply treats different time steps as separate features and thus inherently lacks the ability to extract temporal features from the inputs. As a result, it lags far behind ConvLSTM and FC-Unet, and we do not assess it further.
We next examine the temporal correlation between predicted and actual SITA by season and region, using the temporal anomaly correlation coefficient (TACC), i.e., the correlation between the predicted and actual anomaly time series. Figures 4a and 4b show the spatial distribution of TACCs for the ConvLSTM and FC-Unet models in each season. Overall, both DL models capture the anomalous variability of SIT well across all seasons, with TACCs close to 1 over much of the pan-Arctic region. However, the TACCs are relatively low along the eastern coast of Greenland throughout the year and at the southern entrance to the Bering Strait during the freezing season (DJFMAM). The TACCs even drop below zero along the ice edges in Baffin Bay and the Labrador Sea, as well as in northern Hudson Bay, in SON. Figure 4c shows the differences between the two models. Relative to FC-Unet, the ConvLSTM model has lower correlations throughout the year, mainly near the North Pole and in coastal areas with complex land-sea distributions, such as the Canadian Archipelago, Hudson Bay, Baffin Bay, and Davis Strait. Lower TACCs are also found along the western coast of the Labrador Sea and in the northern Bering and Okhotsk Seas during the freezing season.
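TACC maps such as those in Fig. 4, and the MAE maps in Fig. 5, can be sketched as per-grid-point statistics over time. This sketch assumes the centered Pearson form of the anomaly correlation (some studies use an uncentered variant) and anomaly arrays shaped (time, y, x); the helper names and season table are illustrative.

```python
import numpy as np

SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}

def tacc(pred_anom, obs_anom):
    """Temporal anomaly correlation at each grid point; inputs are (time, ny, nx)."""
    p = pred_anom - pred_anom.mean(axis=0)
    o = obs_anom - obs_anom.mean(axis=0)
    num = (p * o).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)   # NaN where a series is constant

def mae_map(pred, obs):
    """Mean absolute error at each grid point over the time axis."""
    return np.abs(pred - obs).mean(axis=0)

def seasonal(stat, pred, obs, months, season):
    """Apply `stat` to the time steps whose calendar month falls in `season`."""
    sel = np.isin(months, SEASONS[season])
    return stat(pred[sel], obs[sel])

rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 9)                  # e.g. nine test years
obs = rng.standard_normal((months.size, 4, 4))         # toy SITA
pred = 0.9 * obs + 0.3 * rng.standard_normal(obs.shape)
print(seasonal(tacc, pred, obs, months, "SON").round(2))
print(seasonal(mae_map, pred, obs, months, "SON").mean())
```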
Figure 4. Prediction skills of the (a) ConvLSTM and (b) FC-Unet models as measured by TACC for each season and the annual period. (c) TACC difference between the ConvLSTM and FC-Unet models. The numbers in the upper-left corner of the subplots are the mean pan-Arctic spatial correlation coefficients.
Notably, both DL models have slightly lower correlations along the meridional path between approximately 70°E and 110°W. This is due to the grid-point configuration of the IPSL-CM6A-LR ocean component, which leaves missing values in this elongated area. Although the difference in correlation is small, it lends some support to the finding that ConvLSTM is more sensitive to the data configuration when IPSL-CM6A-LR members are introduced as training samples.
In summary, the TACC and MAE results (Figs. 4 and 5) are consistent: both indicate that the performance gap between ConvLSTM and FC-Unet lies mainly around the North Pole and in regions with complex land-sea distributions. In these regions, subtle differences in the models' ability to capture spatiotemporal features become apparent, leading to the correlation gap described above.
Figure 5. Same as in Fig. 4, but for MAE patterns. The numbers in the upper-left corner of the subplots are pan-Arctic average MAE levels.
Building on the above results, we further conduct a systematic statistical analysis of the SITA predictions for each region using a Taylor diagram (Fig. 6). Regions are defined using the regional mask provided by NSIDC.
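A Taylor diagram summarizes three related statistics: the standard deviation ratio, the correlation R, and the centered root-mean-square difference E′, linked by E′² = σ_f² + σ_r² − 2σ_fσ_rR. The sketch below computes them for one region, assuming the SITA values of all grid points and months within an NSIDC region are pooled into a single sample; the integer codes in the toy mask are arbitrary and are not NSIDC's region codes.

```python
import numpy as np

def taylor_stats(pred, ref):
    """Normalized std, correlation, and normalized centered RMSD of pred vs ref."""
    sd_p, sd_r = pred.std(), ref.std()
    r = np.corrcoef(pred, ref)[0, 1]
    crmsd = np.sqrt(np.mean(((pred - pred.mean()) - (ref - ref.mean())) ** 2))
    return {"std_ratio": sd_p / sd_r, "corr": r, "crmsd": crmsd / sd_r}

def regional_taylor_stats(pred, ref, region_mask, region_code):
    """Pool all months and grid points of one region; pred/ref are (time, ny, nx)."""
    sel = region_mask == region_code
    return taylor_stats(pred[:, sel].ravel(), ref[:, sel].ravel())

rng = np.random.default_rng(0)
ref = rng.standard_normal((48, 6, 6))                   # toy PIOMAS SITA
pred = 0.7 * ref + 0.4 * rng.standard_normal(ref.shape)
region_mask = np.repeat([1, 2], 18).reshape(6, 6)      # two toy regions
s = regional_taylor_stats(pred, ref, region_mask, 1)
# The three statistics obey the law of cosines used to draw the diagram.
assert np.isclose(s["crmsd"] ** 2, s["std_ratio"] ** 2 + 1 - 2 * s["std_ratio"] * s["corr"])
print(s)
```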
For both models, the discrepancies between predictions and actual values occur mainly in the Central Arctic Ocean, Hudson Bay, and the Canadian Archipelago, whereas agreement is higher in other regions (Fig. 6). Overall, the FC-Unet SITA predictions are generally consistent with PIOMAS. In contrast, as evidenced by lower correlations and larger RMSDs, the ConvLSTM model underperforms mainly in the Canadian Archipelago and Hudson Bay, regions with complex land-sea distributions, and slightly underestimates the magnitude of SITA in the Central Arctic Ocean.
Statistical analysis of the distributions of predicted and true SIT (not shown) yields results consistent with the above findings. Seasonal statistics further show that the errors in Hudson Bay and the Canadian Archipelago exhibit distinct seasonality. In Hudson Bay, the discrepancies appear mainly in DJF as negative anomalies and are more pronounced in the ConvLSTM model. In the Canadian Archipelago, the seasonality of the discrepancies is even stronger: during the freezing season, the ConvLSTM model appears to overestimate the negative SITA, whereas in summer both models tend to underestimate the strong negative SITA. Meanwhile, the discrepancies in the Central Arctic Ocean show a slight underestimation of the negative SITA in all seasons, with a slightly larger deviation for the ConvLSTM model.
In summary, the SITA predictions of FC-Unet are generally consistent with PIOMAS, whereas the performance gaps of the ConvLSTM model lie mainly in regions with complex land-sea distributions and in the Central Arctic Ocean.
We further attempt to explain these performance gaps. The skill discrepancies in coastal regions are probably due to the fill-value settings. Specifically, the SIT distribution peaks at its minimum (zero, the same value as the output of an inactivated neuron), while the land mask and other missing values in the samples are also filled with zeros. As a result, the models cannot effectively distinguish ice-free sea surface from fill values. Moreover, a higher prevalence of inactivated values can exacerbate the vanishing gradient problem. The FC-Unet model mitigates this problem to some extent by adopting the ResNet architecture (He et al., 2016) in its main building blocks and thus outperforms the sequential ConvLSTM model. In contrast to SIT, the SIC distribution peaks near its maximum value (~1). The models therefore encounter these problems less often when predicting SIC, which is consistent with our finding that their prediction skill for SIT is slightly lower than that for SIC (not shown).
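The gradient argument can be illustrated with a toy element-wise example. This is a deliberate simplification (identity weights, a single ReLU layer), not the FC-Unet architecture: a unit whose pre-activation is zero or negative passes no gradient, whereas a residual block y = z + F(z) keeps a gradient of at least 1 through its identity path.

```python
import numpy as np

def relu_grad(z):
    """Derivative of ReLU: 1 for active units (z > 0), 0 for inactive ones."""
    return (z > 0).astype(float)

# Toy pre-activations: zero-filled inputs (land, missing values, ice-free
# ocean) leave most units at or below zero, i.e. inactive.
z = np.array([-0.3, 0.0, 0.0, 0.0, 0.8])

plain_grad = relu_grad(z)             # d relu(z) / dz
residual_grad = 1.0 + relu_grad(z)    # d [z + relu(z)] / dz via the identity path
print(plain_grad)                     # [0. 0. 0. 0. 1.]
print(residual_grad)                  # [1. 1. 1. 1. 2.]
```

In a real network the residual branch has learned weights, so the gradient is 1 plus a weighted term, but the identity path still provides a route for the gradient around inactive units.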
-
In September 2012, Arctic sea ice extent reached a record low of 3.63 million km², followed by a rapid recovery to 5.35 million km² in September 2013 (Liu and Key, 2014). Previous research suggests that anthropogenic influences, combined with multiple other factors, played a significant role in the extreme 2012 sea ice minimum (Kirchmeier-Young et al., 2017) and its subsequent recovery. Given the important role of Arctic sea ice in the global climate system, accurate predictions of such extreme events are essential both for investigating the attribution and impacts of climate change and for practical operations such as real-time shipping route planning.
Figure 7a illustrates the magnitude of the extreme sea ice minimum in September 2012, and Figs. 7b and 7c show the differences between the actual SIT/SIE and those predicted by the DL models. Overall, both the FC-Unet and ConvLSTM models accurately predict the record-low SIE. However, in the ice-edge region between 120°W and 180°, the prediction errors of FC-Unet are relatively small, whereas ConvLSTM exhibits clear positive errors.
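Sea ice extent is conventionally diagnosed as the total area of grid cells whose SIC is at least 15%. The sketch below assumes that convention, SIC as a fraction (use a threshold of 15 if SIC is stored in %, as in Table 2), and a uniform cell area of 625 km² for a 25-km grid; all names are illustrative.

```python
import numpy as np

def sea_ice_extent(sic, cell_area_km2, threshold=0.15):
    """Total area (million km^2) of grid cells with SIC >= threshold.

    `sic` is a fraction in [0, 1]; NaN cells (land or missing) count as ice-free.
    """
    ice = np.nan_to_num(sic, nan=0.0) >= threshold
    return float(np.sum(cell_area_km2 * ice) / 1e6)

sic = np.zeros((100, 100))
sic[:60, :] = 0.5              # 6000 cells above the 15% threshold
sic[60:, :10] = np.nan         # toy land points
print(sea_ice_extent(sic, cell_area_km2=625.0))   # 3.75
```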
Figure 7. (a, d) The actual SITA and the error between predicted and actual SITA for (b, e) ConvLSTM and (c, f) FC-Unet in September (a–c) 2012 and (d–f) 2013, with SIE contours superimposed on each subplot (actual SIE in green, predicted SIE in yellow).
Similarly, the bottom row of Fig. 7 illustrates the rapid recovery of sea ice in September 2013 and the corresponding prediction errors. Figure 7d shows that sea ice anomalies remained negative overall against the background of the multidecadal decline in Arctic sea ice, and both DL models exhibit similar error patterns (Figs. 7e and 7f). In comparison, the ConvLSTM model shows a broader spread of positive errors from 120°W to 60°E and markedly larger positive errors north of Greenland.
In summary, both DL models are capable of predicting extreme events. This may be because multiple climate variables are used as predictors, allowing the models to recognize precursors of such extreme events and thus to predict abrupt changes in Arctic sea ice successfully.