For a set of samples $x_1, x_2, \cdots, x_n, \cdots, x_N$, the normalized exponentials are ${\rm{x}}_1, {\rm{x}}_2, \cdots, {\rm{x}}_n, \cdots, {\rm{x}}_N$, where ${\rm{x}}_n$ is defined as

$${\rm{x}}_n = \frac{\exp(\beta x_n)}{\sum\limits_{m=1}^{N} \exp(\beta x_m)}. \tag{1}$$

This transformation, a normalized exponential function, is called the Softmax function (Bridle, 1990) and is commonly used for classification problems in neural networks. The numerator of Eq. (1) is the exponential function of $x_n$, and the denominator is the sum of the exponentials of all the samples; β is a parameter that controls how strongly the Softmax function increases the contrast between samples. The Softmax function can represent the probability of class membership when the class-conditional distributions belong to the exponential family, such as the Gaussian distribution (Bishop, 1995). It has also been employed to transform cloud fraction into a more Gaussian-like control variable when retrieving cloud fraction from satellite radiances (Auligné, 2014).

Because hydrometeor mixing ratios are small in magnitude and non-precipitating regions typically cover a large area, the denominator of the Softmax function may be nearly the same at different levels: where $\beta x_n \ll 1$, each $\exp(\beta x_n) \approx 1$, so the denominator approaches the number of grid points included in the sum. As a result, the vertical distribution characteristics of hydrometeor mixing ratios may be lost after the transformation. To handle this issue, the Softmax function is modified, and the modified version is referred to as the Quasi-Softmax function. With the Quasi-Softmax function, the original hydrometeor mixing ratio $q_{i,j,k}$ is transformed to $Q_{i,j,k}$:

$$Q_{i,j,k} = \frac{\exp(\beta q_{i,j,k})}{\sum\limits_{\bar q_{i,j} > 0} \exp(\beta \bar q_{i,j}) - \sum\limits_{q_{i,j,k} > 0} \exp(\beta \bar q_{i,j})}. \tag{2}$$

In Eq. (2), $q_{i,j,k}$ is the hydrometeor mixing ratio at grid position $(i,j,k)$, and $\bar q_{i,j}$ is the vertical mean of the hydrometeor profile at horizontal position $(i,j)$, defined as

$$\bar q_{i,j} = \frac{1}{K}\sum\limits_{k=1}^{K} q_{i,j,k},$$

where K is the number of vertical levels. Compared with the original Softmax function, the denominator of the Quasi-Softmax function is the sum of $\exp(\beta \bar q_{i,j})$ over a restricted area rather than over the whole model domain: at each level k, the sum is taken over the grid columns where $\bar q_{i,j} > 0$ and $q_{i,j,k} = 0$. To increase the contrast after transformation, β is set to 100 for the cloud water mixing ratio ($q_{\rm{c}}$) and rain water mixing ratio ($q_{\rm{r}}$), and to 1000 for the cloud ice mixing ratio ($q_{\rm{i}}$) and snow mixing ratio ($q_{\rm{s}}$). In this study, the Quasi-Softmax function is applied to the full variable states rather than to the perturbations.
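The two transforms can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the field layout `q[i][j][k]` and the function names `softmax`, `column_mean`, and `quasi_softmax` are hypothetical; `softmax` implements Eq. (1) for a one-dimensional list of samples, and applying it to hydrometeors level by level is an assumption. Eq. (2) is undefined at a level where no column satisfies $\bar q_{i,j} > 0$ and $q_{i,j,k} = 0$, so the sketch returns NaN there.

```python
import math


def softmax(x, beta):
    """Eq. (1): normalized exponentials of the samples x_1..x_N."""
    # Subtracting the maximum avoids overflow; the shift cancels in the ratio.
    m = max(beta * v for v in x)
    e = [math.exp(beta * v - m) for v in x]
    total = sum(e)
    return [v / total for v in e]


def column_mean(q):
    """Vertical mean qbar[i][j] of a field stored as q[i][j][k]."""
    return [[sum(col) / len(col) for col in row] for row in q]


def quasi_softmax(q, beta):
    """Eq. (2): Quasi-Softmax transform of a full (non-negative) field q[i][j][k].

    At each level k the denominator sums exp(beta * qbar) over the columns with
    qbar > 0 and q[i][j][k] == 0. Because mixing ratios are non-negative, this
    equals the difference of the two sums written in Eq. (2).
    """
    nx, ny, nz = len(q), len(q[0]), len(q[0][0])
    qbar = column_mean(q)
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for k in range(nz):
        denom = sum(
            math.exp(beta * qbar[i][j])
            for i in range(nx)
            for j in range(ny)
            if qbar[i][j] > 0 and q[i][j][k] == 0
        )
        for i in range(nx):
            for j in range(ny):
                # NaN marks a level where Eq. (2) has an empty denominator.
                out[i][j][k] = (
                    math.exp(beta * q[i][j][k]) / denom if denom > 0 else math.nan
                )
    return out


# Tiny 2 x 1 x 2 field (kg/kg) with beta = 100, the value used for qc and qr.
q = [[[1e-5, 0.0]], [[0.0, 2e-5]]]
Q = quasi_softmax(q, beta=100.0)
print(Q[0][0])  # column (0, 0) at levels k = 0 and k = 1
```

In this toy field the denominator changes from level to level because a different set of cloud-free columns enters the sum at each level, which is the property the modification is meant to preserve.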
The degree to which samples deviate from a Gaussian distribution can be detected from the skewness and kurtosis of their PDF. Skewness measures the asymmetry of the PDF about its mean, while kurtosis measures how peaked the distribution is. For a given sample $x_1, x_2, \cdots, x_n, \cdots, x_N$, its skewness and kurtosis can be calculated as

$$G_3 = \frac{\frac{1}{N}\sum\limits_{n=1}^{N}(x_n - \bar x)^3}{\left[\frac{1}{N}\sum\limits_{n=1}^{N}(x_n - \bar x)^2\right]^{3/2}},\qquad G_4 = \frac{\frac{1}{N}\sum\limits_{n=1}^{N}(x_n - \bar x)^4}{\left[\frac{1}{N}\sum\limits_{n=1}^{N}(x_n - \bar x)^2\right]^{2}},$$

where $G_3$ and $G_4$ are the skewness and kurtosis of the sample, respectively, and $\bar x$ is the sample mean. For a Gaussian distribution the skewness is zero, whereas positive (negative) $G_3$ values indicate a PDF whose median is smaller (larger) than its mean, with a long right (left) tail. The kurtosis is 3 for a Gaussian distribution; heavier tails and a narrower modal peak result in larger $G_4$ values. Sample skewness and kurtosis can be used together to detect deviations from Gaussianity, but in NWP the ensemble size is relatively small (typically <100), so the sampling distributions of skewness and kurtosis often do not approach normality with sufficient accuracy (Thode, 2002). Therefore, we introduce the D'Agostino test (hereafter K2 test; D'Agostino et al., 1970) to diagnose the degree of NG of the samples. The K2 test is a univariate statistical test that combines the transformed skewness and kurtosis, and it can be used to test the NG of samples of size >20 (Thode, 2002). In the K2 test, $G_3$ and $G_4$ are transformed to $f_3(G_3)$ and $f_4(G_4)$, respectively, where $f_3(G_3)$ is defined as

$$f_3(G_3) = \delta \ln\left[\frac{Y}{\alpha} + \sqrt{\left(\frac{Y}{\alpha}\right)^2 + 1}\right],$$

with

$$Y = G_3\sqrt{\frac{(N+1)(N+3)}{6(N-2)}},\quad \gamma_2 = \frac{3(N^2+27N-70)(N+1)(N+3)}{(N-2)(N+5)(N+7)(N+9)},\quad W^2 = -1 + \sqrt{2(\gamma_2 - 1)},\quad \delta = \frac{1}{\sqrt{\ln W}},\quad \alpha = \sqrt{\frac{2}{W^2 - 1}},$$

and $f_4(G_4)$ is defined as

$$f_4(G_4) = \left[1 - \frac{2}{9A} - \left(\frac{1 - 2/A}{1 + x\sqrt{2/(A-4)}}\right)^{1/3}\right] \bigg/ \sqrt{\frac{2}{9A}},$$

with

$$x = \frac{G_4 - \frac{3(N-1)}{N+1}}{\sqrt{\frac{24N(N-2)(N-3)}{(N+1)^2(N+3)(N+5)}}},\quad \sqrt{\gamma_1} = \frac{6(N^2-5N+2)}{(N+7)(N+9)}\sqrt{\frac{6(N+3)(N+5)}{N(N-2)(N-3)}},\quad A = 6 + \frac{8}{\sqrt{\gamma_1}}\left[\frac{2}{\sqrt{\gamma_1}} + \sqrt{1 + \frac{4}{\gamma_1}}\right].$$

Positive (negative) $f_3(G_3)$ values mean that the sample PDF has a median smaller (larger) than the mean, with a longer right (left) tail, while positive (negative) $f_4(G_4)$ values indicate that the PDF has a higher (lower) modal peak than the Gaussian distribution. Finally, $f_3(G_3)$ and $f_4(G_4)$ are combined into the omnibus statistic $K^2$:

$$K^2 = f_3^2(G_3) + f_4^2(G_4).$$

The $K^2$ statistic (hereafter K2) is small when the sample is drawn from a Gaussian distribution; under the null hypothesis of Gaussianity it approximately follows a $\chi^2$ distribution with two degrees of freedom. The higher the K2 value, the greater the NG of the sample. Legrand et al. (2016) used the K2 test to diagnose the NG of forecast and analysis errors in a convective-scale model, and the NG of common variables related to the wind, temperature, and humidity fields was well quantified by the test. Therefore, in this study, the K2 test is employed to diagnose the NG of the background errors of hydrometeors and of the transformed hydrometeors. A detailed description of the K2 test can be found in Thode (2002) and Legrand et al. (2016).
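For reference, the statistics above can be computed with the following pure-Python sketch. It assumes that $G_3$ and $G_4$ are the moment ratios written above and that $f_3$ and $f_4$ are D'Agostino's skewness transform and the Anscombe-Glynn kurtosis transform, the usual pairing behind the omnibus $K^2$; the function names are hypothetical, and the code is an illustration rather than the implementation used in this study. Each call works on one sample, for example the 80 ensemble deviations at one grid point.

```python
import math


def moments(x):
    """Return (G3, G4): sample skewness and (non-excess) kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0.0:
        raise ValueError("constant sample: skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def f3(g3, n):
    """D'Agostino transform of skewness to an approximately N(0, 1) variable."""
    y = g3 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    gamma2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
              / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (gamma2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))  # 0.5 * ln(W^2) = ln(W)
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(y / alpha)  # asinh(u) = ln(u + sqrt(u^2 + 1))


def f4(g4, n):
    """Anscombe-Glynn transform of kurtosis to an approximately N(0, 1) variable."""
    mean_g4 = 3.0 * (n - 1) / (n + 1)
    var_g4 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (g4 - mean_g4) / math.sqrt(var_g4)
    sqrt_gamma1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                   * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_gamma1 * (2.0 / sqrt_gamma1
                                   + math.sqrt(1.0 + 4.0 / sqrt_gamma1 ** 2))
    ratio = (1.0 - 2.0 / a) / (1.0 + x * math.sqrt(2.0 / (a - 4.0)))
    cube_root = math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)  # real cube root
    return (1.0 - 2.0 / (9.0 * a) - cube_root) / math.sqrt(2.0 / (9.0 * a))


def k2_test(x):
    """Return (f3, f4, K2) for one sample; intended for sample sizes > 20."""
    n = len(x)
    g3, g4 = moments(x)
    z3, z4 = f3(g3, n), f4(g4, n)
    return z3, z4, z3 ** 2 + z4 ** 2


z3, z4, k2 = k2_test([float(i) for i in range(30)])  # symmetric, flat sample
print(z3, z4, k2)
```

SciPy's `scipy.stats.normaltest` implements the D'Agostino-Pearson omnibus test built from the same two transforms, so it could serve as an independent cross-check of a sketch like this one.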
In this study, we focus on the variational DA method, in which the background error covariance is static, homogeneous, and isotropic. The control variable transforms (CVTs) method (Barker et al., 2004), which is commonly employed to model the background error covariance in variational DA systems, is used here. With the CVTs method, the square root of the background error covariance matrix $\mathbf{B}$ is decomposed into a series of sub-matrices:

$$\mathbf{B}^{1/2} = \mathbf{U}_{\rm{p}}\mathbf{U}_{\rm{v}}\mathbf{U}_{\rm{h}},$$

where $\mathbf{U}_{\rm{p}}$, $\mathbf{U}_{\rm{v}}$, and $\mathbf{U}_{\rm{h}}$ are the physical, vertical, and horizontal transforms, respectively. In this study, the cross-variable correlations between hydrometeors and other control variables are not considered in the physical transform $\mathbf{U}_{\rm{p}}$; a recursive iterative filter is employed to calculate the vertical autocorrelations in the vertical transform $\mathbf{U}_{\rm{v}}$; and the horizontal autocorrelations are calculated by applying recursive filters in the horizontal transform $\mathbf{U}_{\rm{h}}$. A Gaussian transform $\mathbf{U}_{\rm{g}}$ is added before the existing three transforms, so that the square root of the $\mathbf{B}$ matrix is expressed as the product

$$\mathbf{B}^{1/2} = \mathbf{U}_{\rm{g}}\mathbf{U}_{\rm{p}}\mathbf{U}_{\rm{v}}\mathbf{U}_{\rm{h}}.$$

The Gaussian transform is conducted before the physical transform and is applied to the full model variables rather than to the perturbations.
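The recursive filters mentioned above can be illustrated with a minimal one-dimensional sketch. It assumes a first-order filter with a single hypothetical smoothing coefficient `alpha`, a chosen number of forward-backward sweeps, and zero values outside the domain; operational implementations typically tie the coefficient to a prescribed length scale and treat boundaries more carefully, none of which is modeled here. A single forward-backward pass gives a nearly symmetric, exponentially decaying response to a spike, and repeated passes make the response more Gaussian-like.

```python
def recursive_filter(field, alpha, passes=1):
    """Apply `passes` forward-backward sweeps of a first-order recursive filter.

    Forward sweep:  y[i] = alpha * y[i-1] + (1 - alpha) * x[i]
    Backward sweep: z[i] = alpha * z[i+1] + (1 - alpha) * y[i]
    Values outside the domain are taken as zero (a simplification).
    """
    out = list(field)
    n = len(out)
    for _ in range(passes):
        prev = 0.0
        for i in range(n):  # forward (left-to-right) sweep
            prev = alpha * prev + (1.0 - alpha) * out[i]
            out[i] = prev
        prev = 0.0
        for i in range(n - 1, -1, -1):  # backward (right-to-left) sweep
            prev = alpha * prev + (1.0 - alpha) * out[i]
            out[i] = prev
    return out


# Impulse response: a unit spike in the middle of a 41-point line.
spike = [0.0] * 41
spike[20] = 1.0
response = recursive_filter(spike, alpha=0.5)
print([round(v, 4) for v in response[16:25]])
```

Pairing a forward sweep with a backward sweep is what removes the one-sided lag of a single sweep, so the smoothed increment stays centred on the spike.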
This study examines a heavy rainfall event that occurred in the middle and lower reaches of the Yangtze River from late June to early July 2016 and caused heavy economic losses in China. The period from 0600 to 1800 UTC 2 July 2016 was selected as the period of interest. Figure 1a shows the 12-h accumulated precipitation over the simulation domain during this period from the China Hourly Merged Precipitation Analysis (CHMPA; Shen et al., 2014). Figure 1b shows the brightness temperature of channel 8 of the Himawari-8 Advanced Himawari Imager (AHI) valid at 1800 UTC 2 July 2016; the cold colors indicate cloudy regions, which correspond well to the precipitation areas in Fig. 1a.
Figure 1. (a) Observed 12-h accumulated precipitation (mm) from 0600 UTC to 1800 UTC 2 July 2016 in the study domain, (b) the brightness temperature (K) of channel 8 from Himawari-8 AHI valid at 1800 UTC 2 July 2016, and (c) the vertical profiles of qc, qi, qr, and qs (g kg$^{-1}$) from one ensemble member valid at 1800 UTC 2 July 2016.
The Weather Research and Forecasting (WRF) model V3.8.1 (Skamarock et al., 2008) is used as the NWP model in this study. The horizontal grid spacing is 4 km, with 550 × 450 horizontal grid points. The model has 51 vertical levels, and the model top is set to 10 hPa. The following physics parameterization schemes are adopted: the WRF single-moment 6-class microphysics scheme (WSM6), the Rapid Radiative Transfer Model for GCMs (RRTMG) shortwave and longwave radiation schemes, and the Mellor-Yamada-Janjić (MYJ) boundary layer scheme. No cumulus parameterization is used.
Because hydrometeors evolve rapidly with time, ensemble samples are used in this study to calculate hydrometeor background errors, as in previous studies (Michel et al., 2011; Legrand et al., 2016). To obtain statistical samples of the hydrometeor background errors, an 80-member ensemble forecast was carried out, initialized from an 80-member ensemble analysis valid at 0600 UTC 2 July 2016 that was provided by the EnKF system of NCEP's operational Global Data Assimilation System (GDAS). The 12-h forecasts of the 80 members valid at 1800 UTC 2 July 2016 were used as the statistical samples, and the hydrometeor background errors were approximated by the deviations of each ensemble member from the ensemble mean. Figure 1c shows the vertical profiles of $q_{\rm{c}}$, $q_{\rm{i}}$, $q_{\rm{r}}$, and $q_{\rm{s}}$ from one ensemble member. The liquid hydrometeor mixing ratios ($q_{\rm{c}}$ and $q_{\rm{r}}$) are confined primarily to levels below 500 hPa, while the ice-phase mixing ratios ($q_{\rm{i}}$ and $q_{\rm{s}}$) are found only in the middle and upper levels between 700 and 150 hPa. The four hydrometeor mixing ratios are on the order of $10^{-5}$ kg kg$^{-1}$, so only the levels at which the mean value of each hydrometeor exceeds $10^{-6}$ kg kg$^{-1}$ are diagnosed in this study.

This study aims to find a Gaussian transform method that yields more Gaussian hydrometeor control variables in data assimilation systems. Four experiments are designed, and their details are listed in Table 1. The experiment Origin uses the original hydrometeors as a benchmark. It has been noted that logarithmic transforms, such as the base-10 logarithm (Log10), can bring the PDFs of background errors for some variables closer to Gaussian (Errico et al., 2007; Fletcher and Zupanski, 2007), so experiment Log10 uses the logarithm of the hydrometeors, as in Michel et al. (2011). Experiment Softmax uses the Softmax function, and experiment Q_softmax uses the newly constructed Quasi-Softmax function.
Table 1. Four experiments and their hydrometeor control variables.
| Experiments | Hydrometeor control variables |
| --- | --- |
| Origin | $q_{i,j,k}$ |
| Log10 | $\lg\left(\dfrac{q_{i,j,k}}{q_0}\right)$; $q_0 = 10^{-3}\ {\rm{kg}}\ {\rm{kg}}^{-1}$ |
| Softmax | $\dfrac{\exp(\beta q_{i,j,k})}{\sum \exp(\beta q_{i,j,k})}$ |
| Q_softmax | $\dfrac{\exp(\beta q_{i,j,k})}{\sum\limits_{\bar q_{i,j} > 0} \exp(\beta \bar q_{i,j}) - \sum\limits_{q_{i,j,k} > 0} \exp(\beta \bar q_{i,j})}$; $\bar q_{i,j} = \dfrac{1}{K}\sum\limits_{k=1}^{K} q_{i,j,k}$ |
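The sampling procedure described above, in which a control-variable transform is applied to the full field of every member and the ensemble mean is then removed, can be sketched as follows for the Log10 experiment. This is a minimal illustration under loud assumptions: the data layout (one list of grid-point values per member), the helper names, the floor `q_min` that keeps the logarithm finite where $q = 0$, and the use of precomputed per-level means for the $10^{-6}$ kg kg$^{-1}$ threshold are all hypothetical, since the text does not specify these details.

```python
import math

Q0 = 1e-3          # kg/kg, reference value q0 of the Log10 experiment (Table 1)
THRESHOLD = 1e-6   # kg/kg, minimum mean mixing ratio for a level to be diagnosed


def log10_transform(q, q_min=1e-12):
    """Log10 control variable lg(q / q0) applied to full mixing ratios.

    q_min is a hypothetical floor so that q = 0 stays finite; the text does
    not say how zero mixing ratios are handled.
    """
    return [math.log10(max(v, q_min) / Q0) for v in q]


def perturbations(members):
    """Background-error samples: each member minus the ensemble mean.

    members[m][p] holds the (already transformed) full value of member m at
    grid point p, so the transform is applied before the mean is removed.
    """
    n_mem = len(members)
    mean = [sum(col) / n_mem for col in zip(*members)]
    return [[v - mu for v, mu in zip(member, mean)] for member in members]


def diagnosed_levels(level_means):
    """Indices of the levels whose mean mixing ratio exceeds THRESHOLD."""
    return [k for k, m in enumerate(level_means) if m > THRESHOLD]


# Usage with two hypothetical members and two grid points (kg/kg).
raw = [[1e-5, 0.0], [3e-5, 2e-5]]
samples = perturbations([log10_transform(member) for member in raw])
print(samples)
```

The deviations at each grid point (one value per member) are what a K2 test would then be applied to; swapping `log10_transform` for a Softmax or Quasi-Softmax transform gives the corresponding samples for the other experiments.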