
Forecasting Zonda Wind Occurrence with Vertical Sounding Data


doi: 10.1007/s00376-021-1007-0

  • Zonda wind is a typical downslope windstorm over the eastern slopes of the Central Andes in Argentina, which produces extremely warm and dry conditions and creates substantial socioeconomic impacts. The aim of this work is to obtain an index for predicting the probability of Zonda wind occurrence. The Principal Component Analysis (PCA) is applied to the vertical sounding data on both sides of the Andes. Through the use of a binary logistic regression, the PCA is applied to discriminate those soundings associated with Zonda wind events from those that are not, and a probabilistic forecasting tool for Zonda occurrence is obtained. This index is able to discriminate between Zonda and non-Zonda events with an effectiveness close to 91%. The best model consists of four variables from each side of the Andes. From an event-based statistical perspective, the probability of detection of the mixed model is above 97% with a probability of false detection lower than 7% and a missing ratio below 1%. From an alarm-based perspective, models exhibit false alarm rate below 7%, a missing alarm ratio lower than 1.5% and higher than 93% for the correct alarm ratio. The zonal component of the wind on both sides of the Andes and the windward temperature are the key variables in class discrimination. The vertical structure of Zonda wind includes two wind maximums and an unstable lapse rate at midlevels on the lee side and a wind maximum at 700 hPa accompanied by a relatively stable layer near the mountain top.
  • Figure 1.  (a) South America region with topographic height (shading, units: m) and (b) zoomed-in region over surface weather stations. Black dots and white line correspond to sounding stations and filled black dots correspond to the stations used for Zonda wind classification.

    Figure 2.  (a) Zonda events distribution for the surface stations and for sounding data used. (b) Zonda duration annual frequency distribution. (c) Zonda onset time annual frequency distribution. The straight line corresponds to the sounding hour (1200 UTC).

    Figure 3.  Mean (solid line) and Zonda (dotted line) vertical soundings at 1200 UTC for (a) Mendoza Airport and (b) Santo Domingo. Wind barbs on the left correspond to the Zonda sounding, and wind barbs on the right correspond to the mean sounding.

    Figure 4.  Leeside model for temperature. (a) Index values distribution for Zonda (dark) and for non-Zonda (light) events, (b) model efficiency according to each cut-off value, and (c) index boxplot.

    Figure 5.  Same as Fig. 4 but for the zonal component model. (a) Leeside sounding, (b) windward sounding, and (c) mixed model.

    Figure 6.  (a) 1-var mixed model for temperature for the lee side and zonal component of wind for the windward side and (b) the total efficiency for all possible combinations with one variable on each side.

    Figure 7.  1-var mixed model metrics: true positives (a) and negatives (d), surprises (g) and false alarms (b), POD (e), POFD (h), MR (c), FAR (f), and MAR (i).

    Figure 8.  (a) Discriminant sounding between Zonda and non-Zonda classes for the 1-var models for the lee side and (b) for the windward side. Since the discriminant soundings are anomalies with respect to the mean profile, temperature and dewpoint temperature variables are multiplied by a factor of five and added to the mean sounding to better appreciate the differences with the mean sounding. The zonal component (U) of the wind is multiplied by a factor of 10 as anomalous values (i.e., the mean sounding is not added). The meridional component is omitted due to its lesser relevance in the models. The squared Brunt–Väisälä frequency presents the original values of the discriminant sounding.

    Figure 9.  Total efficiency, true positives and true negatives, false alarms, POD (right axis), POFD, MR, CAR (right axis), FAR, and MAR for the lee side (a, d, g, and j), for the windward side (b, e, h, and k), and for mixed models (c, f, i, and l).

    Figure 10.  Best mixed models considering the total efficiency for two, three, and four variables. (a) 2-var mixed model for temperature and zonal component for each side. (b) 3-var mixed model for temperature, zonal component, and squared Brunt–Väisälä for both sides. (c) 4-var mixed model for temperature, zonal and meridional components, and squared Brunt–Väisälä for the lee side and temperature, dewpoint, zonal component, and squared Brunt–Väisälä for the windward side.

    Figure 11.  Discriminant sounding between Zonda and non-Zonda classes for the 4-var model (a) for the lee side and (b) for windward side.

    Table 1.  Contingency table. TP (true positives), FP (false positives), FN (false negatives), and TN (true negatives).

    | Observed \ Predicted | Yes (Zonda) | No (non-Zonda) |
    |---|---|---|
    | Event observed (Zonda) | TP (Hit Zonda) | FN (surprise) |
    | No event (non-Zonda) | FP (Zonda False Alarm) | TN (Hit non-Zonda) |

    Table 2.  Description of metrics used for the validation of the model.

    | Metric | Expression | Description |
    |---|---|---|
    | POD | TP/(TP+FN) | Probability Of Detection (event-based) |
    | MR | FN/(TP+FN) = 1−POD | Miss Ratio (event-based) |
    | CAR | TP/(TP+FP) | Correct Alarm Ratio (alarm-based) |
    | FAR | FP/(TP+FP) = 1−CAR | False Alarm Ratio (alarm-based) |
    | POFD | FP/(TN+FP) | Probability Of False Detection (event-based) |
    | MAR | FN/(TN+FN) | Missed Alarm Ratio (alarm-based) |
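    The definitions in Table 2 translate directly into code. The following is a minimal Python sketch rather than the authors' implementation; the function name `verification_metrics` and the example counts are hypothetical and serve only as an illustration.

```python
def verification_metrics(tp, fn, fp, tn):
    """Return the Table 2 metrics as fractions (multiply by 100 for %)."""
    return {
        "POD": tp / (tp + fn),   # event-based: detected events / all events
        "MR": fn / (tp + fn),    # event-based: missed events / all events
        "CAR": tp / (tp + fp),   # alarm-based: correct alarms / all alarms
        "FAR": fp / (tp + fp),   # alarm-based: false alarms / all alarms
        "POFD": fp / (tn + fp),  # event-based: false alarms / all non-events
        "MAR": fn / (tn + fn),   # alarm-based: misses / all "no alarm" cases
    }


# Hypothetical contingency counts, not taken from the paper.
metrics = verification_metrics(tp=45, fn=5, fp=10, tn=40)
```

    By construction, POD and MR sum to one, as do CAR and FAR, which gives a quick consistency check on any reported set of metrics.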

    Table 3.  Model metrics (%) at the best cut-off index value. MDZ: Mendoza sounding; CH: Santo Domingo sounding; MDZ-CH: mixed model using the same variable on each side.

    (a) Leeside Model 1-var

    | var MDZ | Total efficiency | True positives | True negatives | Error | Surprises | False alarms |
    |---|---|---|---|---|---|---|
    | T | 71.64 | 86.45 | 53.84 | 28.36 | 6.78 | 21.59 |
    | Td | 70.56 | 88.95 | 48.12 | 29.44 | 5.53 | 23.91 |
    | U | 79.35 | 91.53 | 64.28 | 20.65 | 4.23 | 16.41 |
    | V | 57.11 | 46.12 | 61.4 | 42.89 | 26.94 | 15.95 |
    | N2 | 67.33 | 65.9 | 65.16 | 32.67 | 17.05 | 15.62 |

    (b) Windward Model 1-var

    | var CH | Total efficiency | True positives | True negatives | Error | Surprises | False alarms |
    |---|---|---|---|---|---|---|
    | T | 65.91 | 49.45 | 77.59 | 34.09 | 25.28 | 8.82 |
    | Td | 68.32 | 67.24 | 66.9 | 31.68 | 16.38 | 15.3 |
    | U | 83.06 | 82.36 | 82.67 | 16.94 | 8.82 | 8.12 |
    | V | 72.16 | 72 | 69 | 27.84 | 14 | 13.84 |
    | N2 | 58.15 | 68.52 | 41.33 | 41.85 | 15.74 | 26.11 |

    (c) Mixed Model 1-var

    | var MDZ-CH | Total efficiency | True positives | True negatives | Error | Surprises | False alarms |
    |---|---|---|---|---|---|---|
    | T | 78.84 | 86.83 | 67.36 | 21.16 | 6.59 | 14.57 |
    | Td | 70.88 | 83.16 | 54.95 | 29.12 | 8.42 | 20.7 |
    | U | 83.57 | 77.97 | 86.57 | 16.43 | 11.02 | 5.41 |
    | V | 72.52 | 72.67 | 69.41 | 27.48 | 13.66 | 13.82 |
    | N2 | 67.21 | 62.91 | 68.03 | 32.79 | 18.54 | 14.25 |

    Table 4.  Model metrics (%) at the best cut-off value. MDZ: Mendoza sounding; CH: Santo Domingo sounding; mixed model: same variable on each side. POD (probability of detection), POFD (probability of false detection), MR (miss ratio), CAR (correct alarm ratio), FAR (false alarm ratio), and MAR (missed alarm ratio).

    (a) Leeside Model 1-var

    | Leeside var | POD | POFD | MR | FAR | MAR | CAR |
    |---|---|---|---|---|---|---|
    | T | 92.73 | 30.00 | 7.27 | 21.07 | 11.18 | 78.93 |
    | Td | 94.15 | 35.03 | 5.85 | 22.58 | 10.30 | 77.42 |
    | U | 95.58 | 21.75 | 4.42 | 16.33 | 6.18 | 83.67 |
    | V | 63.13 | 23.92 | 36.87 | 29.50 | 30.50 | 70.50 |
    | N2 | 79.44 | 21.10 | 20.56 | 20.91 | 20.74 | 79.09 |

    (b) Windward Model 1-var

    | Windward var | POD | POFD | MR | FAR | MAR | CAR |
    |---|---|---|---|---|---|---|
    | T | 66.17 | 12.62 | 33.83 | 18.48 | 24.57 | 81.52 |
    | Td | 80.41 | 19.83 | 19.59 | 19.75 | 19.67 | 80.25 |
    | U | 90.33 | 9.49 | 9.67 | 9.52 | 9.64 | 90.48 |
    | V | 83.72 | 18.34 | 16.28 | 17.71 | 16.87 | 82.29 |
    | N2 | 81.32 | 41.52 | 18.68 | 29.98 | 27.58 | 70.02 |

    (c) Mixed Model 1-var

    | Combined var | POD | POFD | MR | FAR | MAR | CAR |
    |---|---|---|---|---|---|---|
    | T | 92.95 | 19.50 | 7.05 | 15.82 | 8.91 | 84.18 |
    | Td | 90.80 | 29.08 | 9.20 | 21.32 | 13.29 | 78.69 |
    | U | 87.62 | 7.20 | 12.38 | 7.93 | 11.29 | 92.07 |
    | V | 84.17 | 18.05 | 15.83 | 17.39 | 16.45 | 82.61 |
    | N2 | 77.24 | 19.02 | 22.76 | 20.26 | 21.42 | 79.74 |

    Table 5.  Mixed model metrics for the best combination.

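    | Mixed | Efficiency | True Positives | True Negatives | Surprises | False Alarms | POD | POFD | MR | FAR | MAR | CAR |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | 1-var | 86.31 | 90.36 | 81.83 | 4.82 | 5.41 | 94.94 | 7.20 | 5.06 | 7.93 | 5.71 | 92.07 |
    | 2-var | 89.66 | 94.34 | 86.97 | 2.83 | 6.52 | 97.09 | 6.97 | 3.34 | 6.91 | 3.74 | 93.09 |
    | 3-var | 90.41 | 94.16 | 87.66 | 2.91 | 5.81 | 97.00 | 6.17 | 3.00 | 6.36 | 3.36 | 93.64 |
    | 4-var | 90.77 | 90.88 | 81.38 | 0.86 | 6.56 | 98.90 | 7.47 | 1.05 | 7.92 | 1.36 | 92.94 |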

    Table 6.  Ten best models by total efficiency (%) for two and three variables, and five best models for four variables. In each panel, the first pair of columns lists the leeside models, the second pair the windward models, and the third pair the mixed models (leeside variables − windward variables).

    (a) 2-var models

    | Leeside model | Efficiency | Windward model | Efficiency | Mixed model | Efficiency |
    |---|---|---|---|---|---|
    | U-N2 | 84.83 | U-V | 84.12 | T-U − T-U | 89.66 |
    | T-U | 83.28 | U-N2 | 83.21 | T-N2 − T-U | 89.21 |
    | U-V | 79.83 | Td-U | 83.08 | T-N2 − U-V | 89.07 |
    | Td-U | 79.44 | T-U | 82.78 | T-U − U-V | 88.84 |
    | T-Td | 77.11 | T-V | 75.35 | T-U − U-N2 | 88.84 |
    | T-N2 | 73.75 | V-N2 | 75.06 | T-U − Td-U | 88.68 |
    | T-V | 72.46 | Td-V | 73.78 | T-Td − U-V | 88.41 |
    | Td-N2 | 71.08 | T-Td | 70.93 | T-V − T-U | 88.33 |
    | Td-V | 70.78 | T-N2 | 68.59 | T-Td − T-U | 88.27 |
    | V-N2 | 69.18 | Td-N2 | 68.53 | T-N2 − U-N2 | 88.25 |

    (b) 3-var models

    | Leeside model | Efficiency | Windward model | Efficiency | Mixed model | Efficiency |
    |---|---|---|---|---|---|
    | T-U-N2 | 87.26 | U-V-N2 | 85.34 | T-U-N2 − T-U-N2 | 90.68 |
    | Td-U-N2 | 84.38 | Td-U-N2 | 84.97 | T-Td-U − T-U-N2 | 90.41 |
    | T-U-V | 83.87 | T-U-V | 83.71 | T-Td-U − T-Td-U | 90.03 |
    | U-V-N2 | 83.81 | Td-U-V | 83.66 | T-Td-U − T-U-V | 89.94 |
    | T-Td-U | 83.47 | T-U-N2 | 83.28 | T-V-N2 − T-U-N2 | 89.86 |
    | Td-U-V | 80.72 | T-Td-U | 83.25 | T-U-N2 − T-Td-U | 89.78 |
    | T-Td-N2 | 78.10 | T-V-N2 | 77.51 | T-U-V − T-U-N2 | 89.63 |
    | T-Td-V | 78.01 | T-Td-V | 76.62 | T-U-N2 − Td-U-N2 | 89.39 |
    | T-V-N2 | 74.25 | Td-V-N2 | 74.85 | T-Td-V − T-U-N2 | 89.22 |
    | Td-V-N2 | 72.75 | T-Td-N2 | 74.43 | T-U-N2 − U-V-N2 | 89.13 |

    (c) 4-var models

    | Leeside model | Efficiency | Windward model | Efficiency | Mixed model | Efficiency |
    |---|---|---|---|---|---|
    | T-Td-U-N2 | 87.75 | Td-U-V-N2 | 86.34 | T-U-V-N2 − T-Td-U-N2 | 90.77 |
    | T-U-V-N2 | 87.25 | T-U-V-N2 | 85.65 | T-Td-U-V − T-Td-U-N2 | 90.26 |
    | Td-U-V-N2 | 85.48 | T-Td-U-N2 | 85.60 | T-Td-U-N2 − T-Td-U-N2 | 90.22 |
    | T-Td-U-V | 84.64 | T-Td-U-V | 84.86 | T-Td-U-N2 − T-U-V-N2 | 89.99 |
    | T-Td-V-N2 | 79.01 | T-Td-V-N2 | 79.47 | T-Td-V-N2 − T-U-V-N2 | 89.83 |
  • Araneo, D. C., S. C. Simonelli, F. A. Norte, M. Viale, and J. R. Santos, 2011: Caracterización de sondeos estivales del norte de Mendoza mediante el análisis de componentes principales y obtención de un índice de convección. Meteorológica, 36(1), 31−47. (in Spanish).
    Barnes, L. R., E. C. Gruntfest, M. H. Hayden, D. M. Schultz, and C. Benight, 2007: False alarms and close calls: A conceptual model of warning accuracy. Wea. Forecasting, 22, 1140−1147, https://doi.org/10.1175/WAF1031.1.
    Courvoisier, H. W., and T. Gutermann, 1971: Zur praktischen Anwendung des Föhntests von Widmer. Rep. 21, 10 pp. (in German). [Available from http://www.agfoehn.org/doc/Courvoisier_1971.pdf]
    Damiens, F., F. Lott, C. Millet, and R. Plougonven, 2018: An adiabatic foehn mechanism. Quart. J. Roy. Meteor. Soc., 144(714), 1369−1381, https://doi.org/10.1002/qj.3272.
    Drechsel, S., and G. J. Mayr, 2008: Objective forecasting of foehn winds for a subgrid-scale alpine valley. Wea. Forecasting, 23(2), 205−218, https://doi.org/10.1175/2007WAF2006021.1.
    Dürr, B., 2008: Automatisiertes Verfahren zur Bestimmung von Föhn in Alpentälern. Arbeitsbericht 223, Bundesamt für Meteorologie und Klimatologie, MeteoSchweiz, 22 pp. (in German).
    Durran, D. R., 1990: Mountain waves and downslope winds. Atmospheric Processes over Complex Terrain, R. M. Banta et al., Eds., Springer, 59−81, https://doi.org/10.1007/978-1-935704-25-6_4.
    Jackson, P. L., G. Mayr, and S. Vosper, 2013: Dynamically-driven winds. Mountain Weather Research and Forecasting, F. K. Chow et al., Eds., Springer, 121−218, https://doi.org/10.1007/978-94-007-4098-3_3.
    Norte, F. A., 2015: Understanding and forecasting zonda wind (Andean Foehn) in Argentina: A review. Atmospheric and Climate Sciences, 5(3), 163−169, https://doi.org/10.4236/acs.2015.53012.
    Otero, F., and F. A. Norte, 2015: Métodos de clasificación y climatología del viento Zonda en San Juan. Geoacta, 40(1), 45−53. (in Spanish).
    Otero, F., and D. Araneo, 2021: Zonda wind classification using machine learning algorithms. International Journal of Climatology, 41, E342−E353, https://doi.org/10.1002/joc.6688.
    Otero, F., F. Norte, and D. Araneo, 2018: A probability index for surface zonda wind occurrence at Mendoza city through vertical sounding principal components analysis. Theor. Appl. Climatol., 131(1−2), 213−225, https://doi.org/10.1007/s00704-016-1983-7.
    Reinecke, P. A., and D. Durran, 2009a: The overamplification of gravity waves in numerical solutions to flow over topography. Mon. Wea. Rev., 137(5), 1533−1549, https://doi.org/10.1175/2008MWR2630.1.
    Reinecke, P. A., and D. R. Durran, 2009b: Initial-condition sensitivities and the predictability of downslope winds. J. Atmos. Sci., 66(11), 3401−3418, https://doi.org/10.1175/2009JAS3023.1.
    Smith, C. M., and E. D. Skyllingstad, 2011: Effects of inversion height and surface heat flux on downslope windstorms. Mon. Wea. Rev., 139, 3750−3764, https://doi.org/10.1175/2011MWR3619.1.
    Smith, R. B., 2007: Interacting mountain waves and boundary layers. J. Atmos. Sci., 64(2), 594−607, https://doi.org/10.1175/JAS3836.1.
    Sprenger, M., S. Schemm, R. Oechslin, and J. Jenkner, 2017: Nowcasting foehn wind events using the adaboost machine learning algorithm. Wea. Forecasting, 32(3), 1079−1099, https://doi.org/10.1175/WAF-D-16-0208.1.

Manuscript History

Manuscript received: 20 January 2021
Manuscript revised: 06 July 2021
Manuscript accepted: 12 August 2021
    Corresponding author: Federico OTERO, fotero@mendoza-conicet.gob.ar
  • Instituto Argentino de Nivología, Glaciología y Ciencias Ambientales (IANIGLA) CCT Mendoza - CONICET Av. Ruiz Leal s/n., Parque Gral. San Martín, Mendoza 5500, Argentina


    • Zonda wind (the Argentinian foehn) is a strong, warm, and very dry wind that warms by adiabatic compression as it descends the eastern slopes of the Andes Cordillera. It occurs most often in winter and spring, mainly in the provinces of Mendoza and San Juan (Norte, 2015; Otero and Norte, 2015). Despite the steady improvement of numerical weather prediction (NWP) models and the advances in the understanding of mountain meteorology dynamics over the past decades, the forecasting of downslope windstorms is still limited by several factors. These include sensitivity to model resolution (e.g., Reinecke and Durran, 2009b; Jackson et al., 2013), numerical schemes, vertical coordinates, and diffusion parametrizations (e.g., Smith, 2007), physical formulations (especially the boundary layer; see Smith, 2007), and initial-condition uncertainties (e.g., Reinecke and Durran, 2009a). Statistical forecasting models developed for specific stations are therefore needed to predict Zonda occurrence; such models can complement or improve NWP guidance.

      In the 1960s, Widmer developed a “foehn test” for the Altdorf, Switzerland foehn station that was refined by Courvoisier and Gutermann (1971). This test remains the operational tool in use today. Later, Dürr (2008) developed an automated method for identifying foehn (i.e., nowcasting foehn) based on 10-min real-time data from the automated Swiss surface network. Drechsel and Mayr (2008) developed an objective, probabilistic forecasting method for foehn in the Wipp Valley (Innsbruck) based on ECMWF model output. More recently, Sprenger et al. (2017) proposed an objective method for foehn prediction that uses a machine learning algorithm (AdaBoost, short for adaptive boosting) to distinguish between foehn and non-foehn events. Further improvement will require statistical as well as deterministic methods. In particular, ongoing work shows that model output statistics (MOS) are a promising tool for improving foehn forecasting.

      The development of the phenomenon is closely tied to how the flow responds to a topographic barrier, and hence to the vertical structure of the atmosphere: mountain height, buoyancy frequency, and incident wind control mountain-wave activity and the dynamics of downslope wind (Durran, 1990; Smith and Skyllingstad, 2011; Damiens et al., 2018, among others). Statistical techniques such as Empirical Orthogonal Functions (EOF) or Principal Component Analysis (PCA) allow an objective statistical characterization of scalar and vector physical variables such as temperature, humidity, wind, and stability. Otero et al. (2018) obtained an index for predicting Zonda occurrence from the vertical sounding on the lee side of the Andes. A PCA was used to identify the patterns of the vertical structure of the atmosphere leading up to a Zonda wind event and to construct the index model. The resulting Zonda/non-Zonda index can be calculated from the T and Td profiles and depends on the climatological features of the region.

      In the present work, the methodology of Otero et al. (2018) is followed, with substantial improvements over that study, which used only two combined leeward variables (temperature and dewpoint temperature). Here, windward soundings (Chilean side) are added, together with new combinations of variables and soundings, using up to four variables on each side of the Andes. The newly added variables are the wind components (U and V) and the squared Brunt–Väisälä frequency (N2), which are key factors in the atmospheric conditions for the development of this kind of downslope windstorm. In addition, new metrics of the prediction model (not presented in Otero et al., 2018) are calculated: the probability of detection (POD), probability of false detection (POFD), miss ratio (MR), false alarm ratio (FAR), missed alarm ratio (MAR), and correct alarm ratio (CAR). The PCA is used to characterize the vertical structure on both sides of the Andes Mountains prior to the onset of a downslope windstorm (Zonda). A complete description of the vertical structure of wind, temperature, dewpoint, and stability for Zonda and non-Zonda events is obtained, along with the characteristic structure that discriminates between the two classes.

      The structure of this paper is as follows: Section 2 describes the data and methodology, where the PCA and index model are described. Section 3 presents the results including mean and Zonda soundings, the index model efficiency and metrics, and the discriminant sounding. Finally, the discussion and conclusions and their applications are presented in section 4.

    2.   Data and methodology
    • For this study, the available daily sounding data at 1200 UTC for the 1981–2019 period are taken for both sides of the Andes Mountains. The Santo Domingo surface station (33.65°S, 71.61°W, 75 m A. S. L.) on the Chilean side (Andes windward) and the Mendoza Airport surface station (32.83°S, 68.77°W, 704 m A. S. L.) on the Argentinean side (Andes lee side) are used (Fig. 1). The pressure levels selected for Mendoza's soundings are the standard levels between 850 hPa and 300 hPa (i.e., 850, 700, 500, 400, and 300 hPa), and for Santo Domingo those between 1000 hPa and 200 hPa (i.e., 1000, 850, 700, 500, 400, 300, and 200 hPa). The choice of levels reflects the vertical resolution of the soundings on each side (those on the windward side have the higher vertical and temporal resolution) and the altitude of each location. The selected variables are temperature (T), dewpoint temperature (Td), zonal (U) and meridional (V) wind components, and the squared Brunt–Väisälä frequency (N2). A data consistency check is performed and inconsistent values are removed. Missing data are detected and flagged, along with suspicious and out-of-range values; flagging the latter is extremely challenging because of the extreme nature of the events.

    • The Zonda wind classification is made using hourly data from three surface stations on the lee side (Fig. 1, black dots) belonging to the National Weather Service of Argentina (SMN): Mendoza Airport, Mendoza Observatory (32.9°S, 68.866°W, 827 m A. S. L.), and San Juan Airport (31.56°S, 68.5°W, 598 m A. S. L.). The predictor variables are temperature (T), dewpoint temperature (Td), surface pressure (P), and 10-m wind speed (V). The surface stations report hourly and have data records of 35 years for Mendoza Airport, 24 years for San Juan Airport, and 13 years for Mendoza Observatory. The classification is made manually (and is therefore subjective), based on an abrupt increase in temperature and decrease in dewpoint temperature together with an increase in wind speed.

      Figure 2a shows the Zonda climatology for the three surface stations and for the sounding data. Note that the Zonda wind occurs most frequently in winter and spring (mainly from June to October). The soundings used in this work must meet the requirements of section 2.1 and must be available on both sides of the Andes on the same day. This yields a total of 58 soundings (each preceding the onset of an event), with a maximum frequency in October. The duration of Zonda events shows maximum frequencies between two and eight hours for Mendoza's station and longer for San Juan Airport (Fig. 2b). The onset time is most frequent in the afternoon, from 1500 UTC to 2100 UTC during winter, with a second maximum between 0000 UTC and 0300 UTC in spring (Fig. 2c).

      Once all Zonda events and their onset times have been identified, the sounding data series is constructed. For each event, the 1200 UTC sounding that most closely precedes the onset time is used. According to the onset time frequency, the 1200 UTC sounding is more likely to be close to the event than the 0000 UTC sounding; moreover, no 0000 UTC sounding is carried out at Mendoza Airport for a large part of the record. For example, if the event starts at 1500 UTC, the 1200 UTC sounding on the same day is considered (i.e., three hours before the event). If the event starts at 0900 UTC, the 1200 UTC sounding of the previous day is considered (i.e., 21 hours before the event). In this way, the soundings represent the characteristics of the atmosphere prior to the development of the Zonda wind.
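      The selection rule in the two examples above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `preceding_1200utc_sounding` is hypothetical, datetimes are assumed to be in UTC, and an onset exactly at 1200 UTC is assumed to use the same-day sounding, a case the text does not specify.

```python
from datetime import datetime, timedelta


def preceding_1200utc_sounding(onset):
    """Return the time of the latest 1200 UTC sounding at or before `onset`.

    `onset` is a datetime assumed to be expressed in UTC.
    """
    candidate = onset.replace(hour=12, minute=0, second=0, microsecond=0)
    if candidate > onset:
        # The same-day sounding is launched after the onset: use the previous day.
        candidate -= timedelta(days=1)
    return candidate


# The two cases described in the text (the dates themselves are arbitrary).
afternoon_onset = datetime(2010, 8, 15, 15, 0)
morning_onset = datetime(2010, 8, 15, 9, 0)
lead_afternoon = afternoon_onset - preceding_1200utc_sounding(afternoon_onset)
lead_morning = morning_onset - preceding_1200utc_sounding(morning_onset)
```

      With these inputs, the lead times are 3 hours and 21 hours, matching the two examples.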

      The days without Zonda are selected so that the closest Zonda day is at least five days away. This ensures that the synoptic conditions are as different as possible from those of a Zonda day. Data from a total of 116 soundings associated with Zonda and non-Zonda events are chosen to perform the PCA. This dataset is the same as that used in Otero and Araneo (2021).
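      The five-day separation criterion can be expressed as a simple filter. The sketch below is only an illustration under stated assumptions: the function name `non_zonda_candidates` and the example dates are hypothetical, and the random draw of non-Zonda days from the resulting candidates is not shown.

```python
from datetime import date, timedelta


def non_zonda_candidates(days, zonda_days, min_gap_days=5):
    """Keep the days whose distance to every Zonda day is at least `min_gap_days`."""
    return [
        day
        for day in days
        if all(abs((day - zonda).days) >= min_gap_days for zonda in zonda_days)
    ]


# Hypothetical example: one Zonda day in a 15-day window.
window = [date(2010, 8, 1) + timedelta(days=i) for i in range(15)]
zonda_days = [date(2010, 8, 8)]
candidates = non_zonda_candidates(window, zonda_days)
```

      Here the days from 4 to 12 August are excluded, since each lies fewer than five days from the Zonda day.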

    • The methodology for the PCA follows that used in Araneo et al. (2011) and Otero et al. (2018). The Zonda/non-Zonda probability index is obtained from a logistic regression between the PCA loading components and a vector of zeros and ones associated with Zonda/non-Zonda. To detect the patterns capable of discriminating between Zonda and non-Zonda cases, the Zonda soundings must first be compared with random non-Zonda conditions. The non-Zonda dates are carefully selected so that they are not close to any Zonda event: as a necessary condition, these days must be at least five days away from any Zonda event.

      For the calculation of the PCA, the sounding data are arranged in a matrix $[\boldsymbol{X}]_{5\times 116}$ (i.e., a matrix of size 5 × 116) in which each row contains the values at one pressure level for the variable under evaluation (T, Td, U, V, or N2), and each column represents the sounding for a given day. The mean sounding is then removed to obtain the deviation matrix $[\tilde{\boldsymbol{X}}]_{5\times 116}$, which is standardized by columns. Of the 116 days, half correspond to Zonda soundings, and the other half correspond to non-Zonda soundings. In the case of Santo Domingo, the matrix has dimensions of [7 × 116] because more vertical levels are considered.
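      The data preparation described above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic random numbers, the function name `prepare_and_decompose` is hypothetical, "standardized by columns" is read as centring and scaling each day's profile across levels, and the PCA is obtained from a singular value decomposition with loadings defined as covariances between each day's standardized profile and the unit-variance scores.

```python
import numpy as np


def prepare_and_decompose(X):
    """X has shape (levels, days). Returns standardized anomalies, scores, loadings."""
    m = X.shape[0]                                    # number of rows (pressure levels)
    anomalies = X - X.mean(axis=1, keepdims=True)     # remove the mean sounding
    Xs = (anomalies - anomalies.mean(axis=0)) / anomalies.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    keep = s > 1e-10 * s.max()                        # drop the null component
    Zs = U[:, keep] * np.sqrt(m - 1)                  # unit-variance component scores
    F = Xs.T @ Zs / (m - 1)                           # loadings, one row per day
    return Xs, Zs, F


# Synthetic stand-in for a leeside variable: 5 levels x 116 days.
rng = np.random.default_rng(42)
X = rng.normal(size=(5, 116))
Xs, Zs, F = prepare_and_decompose(X)
```

      Because each column is centred, at most four components carry variance for a five-level profile, and the squared loadings of each day sum to one when all of them are retained.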

      The probability model (index) is built from the fitted coefficients of a binomial logistic regression, which maps the linear predictor to probability values between 0 and 1. For the ith element, the probability is given by the logistic function:

      $$\widehat{c}_{i}=\frac{1}{1+e^{-w_{i}}}$$

      where $ {w}_{i} $ is the ith element of $\boldsymbol{w}={{b}_{0}+b}_{1}{\boldsymbol{f}}_{1}+{b}_{2}{\boldsymbol{f}}_{2}+\dots $$ +{b}_{n}{\boldsymbol{f}}_{n}$, ${\boldsymbol{f}}_{1},{\boldsymbol{f}}_{2}, \dots ,{\boldsymbol{f}}_{n}$ are the principal component loadings (columns of F), $ {b}_{0},{b}_{1},\dots \dots ,{b}_{n} $ are the regression coefficients (fitted with the maximum likelihood method), and $ n $ is the number of significant components retained. Then, $ {\widehat{c}}_{i} $ is an estimator of the ith coefficient of $ \mathit{c} $ (known vector of zeros and ones, associated with Zonda/non-Zonda events) corresponding to the ith day. This estimator represents the probability that day has of being classified as Zonda or non-Zonda based on a preset cutoff threshold of the index.

      The component loadings matrix can be written as $\boldsymbol{F}={{\tilde{\boldsymbol{X}}}_{\mathrm{s}}}'{\boldsymbol{Z}}_{\mathrm{s}}/(m-1)$ where $ m $ is the number of rows (days) of $\boldsymbol{X}$. So, $\boldsymbol{w}={{\tilde {\boldsymbol{X}}}_{\mathrm{s}}}'{{\boldsymbol{Z}}^{*}_{\mathrm{s}}}{\boldsymbol{b}}^{*}/(m-1)+{b}_{0}\left[1\right]$ where ${{\boldsymbol{Z}}^{*}_{\mathrm{s}}}$ is the matrix containing the standardized score components corresponding to the predictor variables used in the model (i.e., all those components related to the significant fit coefficients $ {b}_{i} $), ${\boldsymbol{b}}^{*}$ is the column vector containing the coefficients $ {b}_{1},\dots ,{b}_{n} $, and [1] is a column vector of elements equal to 1.

      The vector $\boldsymbol{A}={{\boldsymbol{Z}}^{*}_{\mathrm{s}}}{\boldsymbol{b}}^{*}/(m-1)$ only depends on the PCA results and the regression analysis. Once $\boldsymbol{A}$ and $ {b}_{0} $ are determined from the statistical analysis described, given any standardized anomaly sounding ${\tilde {\boldsymbol{x}}}_{\mathrm{s}}$ (not necessarily belonging to this analysis), the Zonda/non-Zonda index can be estimated by the equation:

      $$ \hat{c}=\frac{1}{1+\exp\left[-\left({{\tilde {\boldsymbol{x}}}_{\mathrm{s}}}'\boldsymbol{A}+{b}_{0}\right)\right]} $$

      where $\hat{c}$ represents the Zonda wind occurrence probability, serving as a useful forecast tool for that particular day.

      In order to estimate prediction errors, the leave-one-out cross-validation method is implemented. In other words, in each step the Zonda index is constructed with all the dates except one, and the excluded date is used for verification. Thus, in each step, the set formed by $ n-1 $ observations is considered the fitting set. The observation that is left out is then used to test the regression model obtained with the remaining ones. This procedure is repeated for each date, obtaining a verification for each particular date.
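
      The leave-one-out loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the predictors are taken as a given matrix (a complete implementation would presumably also repeat the PCA in each step), the logistic fit uses a simple gradient ascent as a stand-in for a maximum likelihood solver, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logistic(F, c, n_iter=500, lr=0.5):
    """Fit [b0, b1, ..., bn] by gradient ascent on the mean log-likelihood."""
    X = np.column_stack([np.ones(len(c)), F])       # prepend the intercept column
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))            # current fitted probabilities
        b += lr * X.T @ (c - p) / len(c)            # gradient step
    return b

def predict_index(F, b):
    """Logistic index for each row of F given fitted coefficients b."""
    X = np.column_stack([np.ones(F.shape[0]), F])
    return 1.0 / (1.0 + np.exp(-X @ b))

def leave_one_out(F, c):
    """Index value of each date, predicted by a model fitted without that date."""
    n = len(c)
    held_out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # fitting set: n - 1 observations
        b = fit_logistic(F[keep], c[keep])
        held_out[i] = predict_index(F[i:i + 1], b)[0]
    return held_out

# Synthetic predictors and labels (1 = Zonda, 0 = non-Zonda), for illustration only.
rng = np.random.default_rng(0)
F_demo = rng.normal(size=(30, 2))
c_demo = (F_demo[:, 0] > 0).astype(float)
index_demo = leave_one_out(F_demo, c_demo)
```

      Each entry of the returned vector is an out-of-sample index value, so comparing it with the known class gives an error estimate that is not inflated by fitting and testing on the same date.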

      This procedure can be generalized using all the predictor variables in a single prediction index to improve the efficiency of the forecast index. Suppose the case with 2 variables, in principle with the same number of cases $ {n} $, ${\boldsymbol{X}}_{1}$ and ${\boldsymbol{X}}_{2}$, with dimensions $ t $ and $ r $, respectively. For example, suppose that we have $ n $ days, ${\boldsymbol{X}}_{1}$ is the matrix containing the Mendoza station’s soundings for each day’s temperatures at $ t $ vertical levels, and ${\boldsymbol{X}}_{2}$ is the matrix containing the Santo Domingo station’s soundings for each day’s squared Brunt–Väisälä frequency at $ r $ vertical levels; then:

      From a separate PCA for each variable, the components of each variable, $ {\boldsymbol{Z}}_{{s}_{1}} $ and $ {\boldsymbol{Z}}_{{s}_{2}} $, are obtained. Neglecting the non-significant components of each PCA (suppose $ {d}_{1} $ and $ {d}_{2} $ components, respectively), the significant standardized score components are ${{\boldsymbol{Z}}^{*}_{{s}_{1}}}={\left[{{\boldsymbol{Z}}^{*}_{{s}_{1}}}\right]}_{t\times (n-{d}_{1})}$ and ${{\boldsymbol{Z}}^{*}_{{s}_{2}}}={\left[{{\boldsymbol{Z}}^{*}_{{s}_{2}}}\right]}_{r\times (n-{d}_{2})}$, and the associated matrices of eigenvectors and eigenvalues are ${{\boldsymbol{Q}}^{*}_{1}}={\left[{{\boldsymbol{Q}}^{*}_{1}}\right]}_{n\times (n-{d}_{1})}$, ${{\boldsymbol{Q}}^{*}_{2}}={\left[{{\boldsymbol{Q}}^{*}_{2}}\right]}_{n\times (n-{d}_{2})}$, ${{\boldsymbol{D}}^{*}_{1}}={\left[{{\boldsymbol{D}}^{*}_{1}}\right]}_{n\times (n-{d}_{1})}$, and ${{\boldsymbol{D}}^{*}_{2}}={\left[{{\boldsymbol{D}}^{*}_{2}}\right]}_{n\times (n-{d}_{2})}$. With these matrices as a whole, the logistic regression is carried out, correlating the $ \boldsymbol{c} $ vector with the joint matrix $\tilde {\boldsymbol{Q}}={\left({{\boldsymbol{Q}}^{*}_{1}}|{{\boldsymbol{Q}}^{*}_{2}}\right)}_{n\times (2n-{d}_{1}-{d}_{2})}$ from which the regression coefficients $\boldsymbol{b}={\left[{\boldsymbol{b}}\right]}_{(2n-{d}_{1}-{d}_{2})\times 1}$ are obtained, where $ {b}_{0} $ is the independent coefficient, the following $ {b}_{1},\dots ,{b}_{n-{d}_{1}} $ correspond to the first variable $ {\boldsymbol{X}}_{1} $ (temperature for Mendoza’s sounding), and the following $ {b}_{n-{d}_{1}+1},\dots ,{b}_{n-{d}_{2}} $ correspond to the second variable $ {\boldsymbol{X}}_{2} $ (squared Brunt–Väisälä frequency for Santo Domingo’s sounding). With these data, the matrices $ {\left[{\boldsymbol{A}}_{1}\right]}_{t\times 1} $ and $ {\left[{\boldsymbol{A}}_{2}\right]}_{r\times 1} $ are calculated as:

      where ${\boldsymbol{b}}_{1}$ and ${\boldsymbol{b}}_{2}$ are the column vectors formed by the coefficients $ {b}_{1},\dots \dots ,{b}_{n-{d}_{1}} $ and $ {b}_{n-{d}_{1}+1},\dots \dots ,{b}_{n-{d}_{2}} $, respectively. Finally, the joint index for these variables is obtained as:

      Note that ${\left[{\boldsymbol{A}}_{1}\right]}_{t\times 1}$ and ${\left[{\boldsymbol{A}}_{2}\right]}_{r\times 1}$ have the same dimensions as the predictor variables (i.e., the number of vertical levels of Mendoza’s and Santo Domingo’s soundings, respectively) and that the $ \widehat{c} $ value depends on the dot product with the particular variable of the day to forecast. The value of $ \widehat{c} $ approaches 1 as the scalar products between the particular variables and the vectors $\boldsymbol{A}$ become larger and positive, and it approaches 0 as these scalar products become more negative. Therefore, the $\boldsymbol{A}$ vectors can be interpreted as a "discriminant" vertical sounding of Zonda or non-Zonda situations.
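
      The role of the dot products can be illustrated with a short sketch. It assumes that the joint index takes the usual logistic form, with an argument equal to the intercept plus the sum of the scalar products between each standardized anomaly sounding and its discriminant vector. The function name and all numerical values below are hypothetical and are not taken from the fitted models.

```python
import math

def joint_index(soundings, discriminants, b0):
    """Logistic index from one or more (sounding, discriminant vector) pairs."""
    w = b0
    for x_s, A in zip(soundings, discriminants):
        w += sum(x * a for x, a in zip(x_s, A))     # scalar product of sounding and A
    return 1.0 / (1.0 + math.exp(-w))

# Hypothetical discriminant vectors with different numbers of vertical levels.
A1 = [0.8, 0.5, -0.2, -0.4, 0.1]                    # e.g., lee side, 5 levels
A2 = [0.3, 0.6, 0.4, -0.1, -0.2, 0.0, 0.2]          # e.g., windward side, 7 levels

aligned = joint_index([A1, A2], [A1, A2], b0=0.0)   # soundings resembling A
opposed = joint_index([[-a for a in A1], [-a for a in A2]], [A1, A2], b0=0.0)
neutral = joint_index([[0.0] * 5, [0.0] * 7], [A1, A2], b0=0.0)
```

      A sounding that resembles the discriminant profile yields positive scalar products and an index above 0.5, while its mirror image yields an index below 0.5.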

    • To compare among the models and the predictive skills of each one, different metrics, obtained from the confusion matrix, are used following the corrigendum of Barnes et al. (2007) (Tables 1 and 2).

      | | Predicted: Yes (Zonda) | Predicted: No (non-Zonda) |
      |---|---|---|
      | Observed (Zonda) | TP (Hit Zonda) | FN (surprise) |
      | No event (non-Zonda) | FP (Zonda False Alarm) | TN (Hit non-Zonda) |

      Table 1.  Contingency table. TP (true positives), FP (false positives), FN (false negatives), and TN (true negatives).

      | Metric | Expression | Description |
      |---|---|---|
      | POD | TP/(TP+FN) | Probability Of Detection (event-based) |
      | MR | FN/(TP+FN) = 1−POD | Miss Ratio (event-based) |
      | CAR | TP/(TP+FP) | Correct Alarm Ratio (alarm-based) |
      | FAR | FP/(TP+FP) = 1−CAR | False Alarm Ratio (alarm-based) |
      | POFD | FP/(TN+FP) | Probability Of False Detection (event-based) |
      | MAR | FN/(TN+FN) | Missed Alarm Ratio (alarm-based) |

      Table 2.  Description of metrics used for the validation of the model.
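
      The expressions in Table 2 translate directly into code. The sketch below is illustrative only: the function name is hypothetical and the counts in the example are invented, not taken from the results of this work.

```python
def skill_metrics(tp, fn, fp, tn):
    """Event-based and alarm-based metrics from the contingency table counts."""
    return {
        "POD": tp / (tp + fn),     # probability of detection (event-based)
        "MR": fn / (tp + fn),      # miss ratio = 1 - POD
        "CAR": tp / (tp + fp),     # correct alarm ratio (alarm-based)
        "FAR": fp / (tp + fp),     # false alarm ratio = 1 - CAR
        "POFD": fp / (tn + fp),    # probability of false detection (event-based)
        "MAR": fn / (tn + fn),     # missed alarm ratio (alarm-based)
    }

# Invented counts for illustration: 58 Zonda and 58 non-Zonda soundings.
metrics = skill_metrics(tp=50, fn=8, fp=5, tn=53)
```

      Event-based metrics are normalized by what was observed (rows of Table 1), whereas alarm-based metrics are normalized by what was predicted (columns of Table 1).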

    3.   Results
    • The objective of this work is to identify those vertical profiles that may anticipate the development of a Zonda event. For the statistical analysis (i.e., PCA), sounding anomalies with respect to the mean sounding of each station are used.

      Figure 3a shows Mendoza’s mean sounding (solid line) and the mean sounding for Zonda events (dotted line) on a skew-T diagram. The mean sounding is statically stable throughout the vertical. The vertical profile associated with Zonda is also stable, but less so than the mean sounding because of warming at low levels. Dry conditions are observed throughout the profile, with a dew point depression of approximately 10°C and higher at midlevels. Winds are weak and from the northwest between the surface and 700 hPa, rotating to the west with a maximum of 45 kt (knot; where 1 kt = 0.51 m s−1) at 300 hPa. The mean sounding associated with Zonda events shows positive temperature anomalies from 850 hPa up to 600 hPa in conjunction with negative dewpoint anomalies, with a maximum dewpoint depression at 700 hPa. Above 500 hPa (approximately the height of the orographic barrier), there are no significant differences with the mean sounding, other than small negative temperature anomalies in the Zonda soundings. Positive wind anomalies are observed from 700 hPa to upper levels. The observed speeds are generally greater than mean conditions, exceeding 60 kt at 300 hPa, while directions are almost the same as in the mean sounding.

      Figure 3.  (a) Mean (solid line) and Zonda (dotted line) vertical soundings at 1200 UTC for Mendoza Airport and (b) Santo Domingo (right). Wind barbs on the left correspond to the Zonda sounding, and wind barbs on the right correspond to the mean sounding.

      The Santo Domingo station’s mean sounding presents a vertical profile with a quasi-isothermal layer between 1000 hPa and 850 hPa, mainly associated with subsidence due to the presence of the Semipermanent Pacific Anticyclone (Fig. 3b). Near-surface winds are weak and rotating to the NW at 700 hPa with a maximum speed of 60 kt at 200 hPa. The vertical windward profile associated with Zonda events presents a relatively wetter and colder environment throughout the vertical. The surface layer is less stable and the winds are more intense, with values of 45 kt at 500 hPa and a maximum of 95 kt at 200 hPa. The windward mid and upper level winds are greater than those observed on the lee side, indicating the incidence of the mountain waves (Durran, 1990).

    • The PCA is primarily used as an exploratory tool in data analysis and for making predictive models. In this work, this analysis is used to characterize the vertical structure of the atmosphere on both sides of the Andes Mountains prior to a Zonda wind event. Also, by implementing a multiple logistic regression model for a binomial function between the response vector $ \boldsymbol{c} $ and the loading components ${\boldsymbol{f}}_{j}$, a probability index of Zonda occurrence is defined (probabilistic predictive model) as shown in the methodology.

      For the PCA and the index construction, 116 soundings from the Mendoza and Santo Domingo stations at 1200 UTC are taken, constructing the anomaly matrix for the selected variables and levels. Half of these soundings correspond to Zonda wind events and the other half to random dates on which the presence of Zonda was not recorded (see the methodology for their selection). To obtain the index, the PCA is iterated with 115 soundings, leaving 1 out to evaluate the efficiency of the model for each cut-off value of the index. This procedure is carried out for each date, obtaining the Principal Components ($\boldsymbol{F}$), the regression coefficients ($ {b}_{0},{b}_{1},\dots ,{b}_{n} $, related to the predictors used in the regression), the discriminant sounding ($\boldsymbol{A}$ vector, see section 2.2.3), the standardized score components (${{\boldsymbol{Z}}^{*}_{\mathrm{s}}}$), and the eigenvalues ($\boldsymbol{D}$) and eigenvectors ($\boldsymbol{Q}$).

      After performing the PCA, an index value is obtained for each date. Taking into account that the Zonda and non-Zonda events are known in advance, the obtained index values for each date are assigned to each class, and their distribution is analyzed. The model efficiency depends on the selected cut-off value of the index. We first consider the temperature model for the Mendoza soundings (lee side).

    • Figures 4a, 4b, and 4c show, respectively, the index distribution for the Zonda (dark) and non-Zonda (light) classes using only the vertical profile of temperature at Mendoza, the total efficiency for each cut-off value, and a boxplot of the dispersion of the index for the two classes (an index value equal to one corresponds to the Zonda class and an index value equal to zero corresponds to the non-Zonda class; whiskers represent the 3%–97% interval). If, for example, an index cut-off value of 0.5 is chosen, the hit rate for Zonda events is 80.4%, while the hit rate for non-Zonda events is 58.4%, and the total efficiency is below 70%. The total efficiency (Fig. 4b) reveals that, for this particular model, the best cut-off point (i.e., the value associated with the maximum efficiency) is located at 0.41, giving a total efficiency of 72.23% (86.44% hits of Zonda events and 53.84% for non-Zonda). Thus, the maximum effectiveness of the index is 72.23%, with an error of 27.77%, divided into 6.77% probability of surprise (i.e., Zonda cases that present an index value lower than the cut-off and therefore are predicted as non-Zonda) and 21% probability of false alarm (i.e., cases of non-Zonda that present an index value higher than the cut-off, so they are predicted as Zonda events). Those error values can be modified by changing the cut-off value of the index. Following these definitions, moving the cut-off closer to one reduces the false alarms but increases the surprises. Likewise, moving the cut-off closer to zero reduces the surprises and increases the false alarms.

      Figure 4.  Leeside model for temperature. (a) Index values distribution for Zonda (dark) and for non-Zonda (light) events, (b) model efficiency according to each cut-off value, and (c) index boxplot.

      For example, a cut-off value of 0.78 (50.54% efficiency) yields 1.89% and 47.56% for false alarms and surprises, respectively. Furthermore, looking at the boxplot diagrams (Fig. 4c) and their dispersion, index values higher than 0.78 lie in the upper 3% tail of the non-Zonda cases, while index values lower than 0.2 lie in the lower 3% tail of the Zonda cases. Therefore, for a given new sounding for which the index is calculated, values greater than 0.78 would indicate a Zonda occurrence, with an error lower than 3%. Likewise, an index value lower than 0.2 would indicate a non-Zonda occurrence with the same error. The range between those index values represents an uncertainty interval in which the index fails to discriminate between classes (at least with the previous error rates).
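
      The search for the best cut-off can be sketched as a simple sweep over candidate values. In this sketch, efficiency is taken as the fraction of correctly classified dates and a date is predicted as Zonda when its index is greater than or equal to the cut-off; both conventions, the function names, and the example values are assumptions for illustration and may differ from the exact definitions used for the figures.

```python
def efficiency_at(cutoff, index_values, labels):
    """Fraction of dates whose predicted class (index >= cutoff) matches the label."""
    hits = sum((v >= cutoff) == bool(y) for v, y in zip(index_values, labels))
    return hits / len(labels)

def best_cutoff(index_values, labels, step=0.01):
    """Smallest cut-off on a regular grid that maximizes the efficiency."""
    n_steps = int(round(1 / step))
    cutoffs = [round(k * step, 10) for k in range(n_steps + 1)]
    return max(cutoffs, key=lambda c: efficiency_at(c, index_values, labels))

# Invented index values and classes (1 = Zonda, 0 = non-Zonda).
index_values = [0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 1, 1]
cutoff = best_cutoff(index_values, labels)
```

      Raising the cut-off above this optimum trades false alarms for surprises, and lowering it does the opposite.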

      The same analysis is done for the windward side using the Santo Domingo vertical soundings and using both soundings at the same time (same variables). Figure 5 shows the index distribution for the zonal component of wind for the leeward (Fig. 5a), windward (Fig. 5b), and mixed (Fig. 5c) models. The windward model efficiency surpasses that for the lee side. However, the windward true positives (82.36%) are considerably lower than those obtained for the lee side (91.53%), but the true negatives and false alarms results are better. If both soundings are combined using the same variables, the model efficiency increases for some variables. The total efficiency (Fig. 5) reaches 83.57% for the zonal component of wind, slightly higher than the windward model. Again, the true positives are lower than for the leeside model, but higher than for the windward model, and false alarms are substantially reduced. Observe that the separation between classes (boxplot) is more evident in the mixed model. This indicates that moving the cut-off value of the index to the left leads to a large increase in true positives with relatively fewer surprises, but there is a higher false alarm rate and vice versa for moving the cut-off value to the right.

      Figure 5.  Same as Fig. 4 but for the zonal component model. (a) Leeside sounding, (b) windward sounding, and (c) mixed model.

      If different variables for each side are combined, a new mixed model is obtained. Figure 6a shows the mixed model using temperature (T) for the lee side and U-wind for the windward side. Here, the model efficiency is maximum (86.23%), and separation between classes is greater. Note that an index value lower than 0.18 leads to 3% of Zonda surprises and a value greater than 0.9 leads to 3% of Zonda false alarms (whiskers are 3%–97%). The total efficiency for all possible combinations with one variable for each side is shown in Fig. 6b. The highest efficiency is achieved using temperature for the lee side and the U-wind for the windward side (a total efficiency of 86.31%). It is worth noting that the zonal component of the wind as a predictor is a key factor on both sides, followed by temperature on the lee side and by the meridional component on the windward side. Although this model increases the total efficiency (i.e., decreases the total error) and decreases the false alarms, the true positives and surprises results are better when only considering the leeside soundings, and the true negatives results are better for the windward soundings.

      Figure 6.  (a) 1-var mixed model for temperature for the lee side and zonal component of wind for the windward side and (b) the total efficiency for all possible combinations with one variable on each side.

      Table 3 shows the total efficiency, the Zonda hits (true positives), the non-Zonda hits (true negatives), and the total error separated into surprises and false alarms for all variables and models. These results show that the U-wind component is the best variable predictor for all models, with a maximum efficiency of 83.57% for the mixed model. Also, the true positives are maximized for the leeward model with 91.53% and a minimum of surprises of 4.23% (leeside model, U), and true negatives and a minimum of false alarms are best predicted for the mixed model (U).

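      (a) Leeside Model 1-var (columns from True positives to False alarms refer to the best cut-off value)

      | var MDZ | Total efficiency | True positives | True negatives | Error | Surprises | False alarms |
      |---|---|---|---|---|---|---|
      | T | 71.64 | 86.45 | 53.84 | 28.36 | 6.78 | 21.59 |
      | Td | 70.56 | 88.95 | 48.12 | 29.44 | 5.53 | 23.91 |
      | U | 79.35 | 91.53 | 64.28 | 20.65 | 4.23 | 16.41 |
      | V | 57.11 | 46.12 | 61.4 | 42.89 | 26.94 | 15.95 |
      | N2 | 67.33 | 65.9 | 65.16 | 32.67 | 17.05 | 15.62 |

      (b) Windward Model 1-var (columns from True positives to False alarms refer to the best cut-off value)

      | var CH | Total efficiency | True positives | True negatives | Error | Surprises | False alarms |
      |---|---|---|---|---|---|---|
      | T | 65.91 | 49.45 | 77.59 | 34.09 | 25.28 | 8.82 |
      | Td | 68.32 | 67.24 | 66.9 | 31.68 | 16.38 | 15.3 |
      | U | 83.06 | 82.36 | 82.67 | 16.94 | 8.82 | 8.12 |
      | V | 72.16 | 72 | 69 | 27.84 | 14 | 13.84 |
      | N2 | 58.15 | 68.52 | 41.33 | 41.85 | 15.74 | 26.11 |

      (c) Mixed Model 1-var (columns from True positives to False alarms refer to the best cut-off value)

      | var MDZ-CH | Total efficiency | True positives | True negatives | Error | Surprises | False alarms |
      |---|---|---|---|---|---|---|
      | T | 78.84 | 86.83 | 67.36 | 21.16 | 6.59 | 14.57 |
      | Td | 70.88 | 83.16 | 54.95 | 29.12 | 8.42 | 20.7 |
      | U | 83.57 | 77.97 | 86.57 | 16.43 | 11.02 | 5.41 |
      | V | 72.52 | 72.67 | 69.41 | 27.48 | 13.66 | 13.82 |
      | N2 | 67.21 | 62.91 | 68.03 | 32.79 | 18.54 | 14.25 |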

      Table 3.  Model metrics for the best cut-off index value. MDZ, Mendoza’s sounding, CH, Santo Domingo’s sounding and MDZ-CH, mixed model for same variable on each side.

    • To compare the models’ skills, their metrics are shown in Table 4. From an event-based perspective, the leeward U model is the best of all, with a POD value of 95.58%. This model also has the lowest MR. The POFD is lowest for the mixed model U. From an alarm-based perspective (which may be more important from a forecasting point of view), the mixed model performs better than the others, except for the MAR. The False Alarm Ratio (FAR) is below 8%, and the CAR is above 92% for the mixed model U.

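      (a) Leeside Model 1-var (all metrics refer to the best cut-off value)

      | Leeside var | POD | POFD | MR | FAR | MAR | CAR |
      |---|---|---|---|---|---|---|
      | T | 92.73 | 30.00 | 7.27 | 21.07 | 11.18 | 78.93 |
      | Td | 94.15 | 35.03 | 5.85 | 22.58 | 10.30 | 77.42 |
      | U | 95.58 | 21.75 | 4.42 | 16.33 | 6.18 | 83.67 |
      | V | 63.13 | 23.92 | 36.87 | 29.50 | 30.50 | 70.50 |
      | N2 | 79.44 | 21.10 | 20.56 | 20.91 | 20.74 | 79.09 |

      (b) Windward Model 1-var (all metrics refer to the best cut-off value)

      | Windward var | POD | POFD | MR | FAR | MAR | CAR |
      |---|---|---|---|---|---|---|
      | T | 66.17 | 12.62 | 33.83 | 18.48 | 24.57 | 81.52 |
      | Td | 80.41 | 19.83 | 19.59 | 19.75 | 19.67 | 80.25 |
      | U | 90.33 | 9.49 | 9.67 | 9.52 | 9.64 | 90.48 |
      | V | 83.72 | 18.34 | 16.28 | 17.71 | 16.87 | 82.29 |
      | N2 | 81.32 | 41.52 | 18.68 | 29.98 | 27.58 | 70.02 |

      (c) Mixed Model 1-var (all metrics refer to the best cut-off value)

      | Combined var | POD | POFD | MR | FAR | MAR | CAR |
      |---|---|---|---|---|---|---|
      | T | 92.95 | 19.50 | 7.05 | 15.82 | 8.91 | 84.18 |
      | Td | 90.80 | 29.08 | 9.20 | 21.32 | 13.29 | 78.69 |
      | U | 87.62 | 7.20 | 12.38 | 7.93 | 11.29 | 92.07 |
      | V | 84.17 | 18.05 | 15.83 | 17.39 | 16.45 | 82.61 |
      | N2 | 77.24 | 19.02 | 22.76 | 20.26 | 21.42 | 79.74 |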

      Table 4.  Model metrics for the best-cut value. MDZ, Mendoza’s sounding, CH, Santo Domingo’s sounding, and mixed model for same variable on each side. POD (probability of detection), POFD (probability of false detection), MR (miss ratio), CAR (correct alarm ratio), FAR (false alarm ratio), and MAR (missed alarm ratio).

      Figure 7 shows all metrics used for the mixed model, where the dark colors represent an enhancement of the models. The true positives are mainly controlled by the lee side (Fig. 7a). The leeside model surpasses the mixed model for T, Td, and N2. For the mixed model, the true positives are increased using U-wind, followed by temperature and dewpoint on the lee side. On the windward side, the squared Brunt–Väisälä frequency, aside from U, also increases the true positives. By contrast, the true negatives (Fig. 7d) appear to be controlled by the windward side, in the same way that the true positives are controlled by the lee side. According to the definition of the metrics, the surprises (Fig. 7g), POD (Fig. 7e), and MR (Fig. 7c) show patterns similar to those of the true positives, while the false alarms (Fig. 7b), FAR (Fig. 7f), and POFD (Fig. 7h) show patterns similar to those of the true negatives. The MAR (Fig. 7i) is a combination of both patterns.

      Figure 7.  1-var mixed model metrics: true positives (a) and negatives (d), surprises (g) and false alarms (b), POD (e), POFD (h), MR (c), FAR (f), and MAR (i).

      So, it is clear that the leeside soundings are better predictors for the detection of Zonda events, while the windward side helps to improve false detections and alarms.

    • The discriminant sounding ($\boldsymbol{A}$ vector, see section 2.2.3) between Zonda and non-Zonda classes obtained from the PCA is shown in Fig. 8. For the lee side (Fig. 8a), this vertical sounding is characterized by positive temperature anomalies between 850 hPa and 600 hPa and negative anomalies in upper levels. The low-level temperature increase is accompanied by positive anomalies of dewpoint near the surface and negative anomalies between 800 hPa and 500 hPa. The zonal component of wind shows negative anomalies at low levels with two relative maximums of positive anomalies at 700 hPa and 400 hPa, while the meridional component is the opposite of the previous one (not shown). The squared Brunt–Väisälä frequency has positive anomalies at low and upper levels with negative anomalies at midlevels.

      Figure 8.  (a) Discriminant sounding between Zonda and non-Zonda classes for the 1-var models for the lee side and (b) for the windward side. Since the discriminant soundings are anomalies with respect to the mean profile, temperature and dewpoint temperature variables are multiplied by a factor of five and added to the mean sounding to better appreciate the differences with the mean sounding. The zonal component (U) of the wind is multiplied by a factor of 10 as anomalous values (i.e., the mean sounding is not added). The meridional component is omitted due to its lesser relevance in the models. The squared Brunt–Väisälä frequency presents the original values of the discriminant sounding.

      This leeside discriminant sounding represents a sounding that tends to become unstable at midlevels due to strong heating at low levels and cooling at midlevels, with greater drying at 700 hPa and moistening near the surface. Above this layer, the sounding tends to normalize in humidity and temperature, approaching the climatological one, but with positive anomalies of zonal wind and more static stability at 400 hPa. According to this profile, a stable layer near the surface is present and the windstorm is probably already present at 700 hPa (the soundings precede the event at ground level) with a NW component of wind. Also, the upper-level jet streak could be present at this time.

      For the windward model (Fig. 8b), the vertical sounding is characterized by a positive temperature anomaly near the mountain top, between 500 hPa and 400 hPa, with a stable region above this layer. The layer below this region has negative anomalies of the squared Brunt–Väisälä frequency, with U-wind anomaly maximums of opposite sign at 700 hPa and near the surface. The meridional component acts only at low levels (not shown).

      So, in accordance with the highest efficiency values to discriminate between classes, positive U-wind (westerly) anomalies at 700 hPa on the windward side and two maximums of positive U-wind anomalies on the lee side are the key predictors for Zonda occurrence. The leeside U-wind anomalies correspond to the presence of Zonda at midlevels and to the jet streak in upper levels. This vertical structure could be accompanied by a temperature inversion (and a stable layer above) near the mountain top on the windward side and significant drying at midlevels on the lee side.

    • If more variables are added to the model, an efficiency gain is observed, especially for the lee side. Figure 9 (a, d, g and j) shows the best result (i.e., best variables combinations) of the metrics for the leeside models by adding more variables. The total efficiency and the true negatives present a considerable increase from one variable to two, with little variation when adding more than two. The efficiency goes from 80% to 88%, and the true negatives go from 65% to 82%. The true positives seem not to change much by adding more variables and present a maximum for two variables. False alarms are also reduced, but surprises are not. The event-based statistic presents a considerable diminution in the POFD and a rise in the POD between one and two variables, but the MR remains almost constant. From an alarm-based perspective, the CAR and the FAR are improved, and the missing alarm ratio does not present any changes.

      Figure 9.  Total efficiency, true positives and true negatives, false alarms, POD (right axis), POFD, MR, CAR (right axis), FAR, and MAR for the lee side (a, d, g, and j), for the windward side (b, e, h, and k), and for mixed models (c, f, i, and l).

      For the windward side (Figs. 9b, e, h and k), the total efficiency and true negatives increase almost linearly by adding more variables, while true negatives slightly increase adding more variables. Surprises decrease by 2 points up to four variables, and the false alarms improve considerably going from one to two variables. The event-based statistic presents almost no change in the POFD (or even a decrease), the POD rises linearly, and the Missing Ratio decreases. From an alarm-based perspective, the False Alarm Ratio and the Missing Alarm Ratio decrease, improving the model. The Correct Alarm Ratio increases 1.5 times going from two to four variables.

      The mixed model (Figs. 9c, f, i, and l) presents its maximums of true negatives and positives for two and three variables, while the total efficiency reaches its maximum for four variables, although it changes little among two, three, and four variables. In this model, the false alarms seem not to improve by adding variables, as observed for the previous models. From an event-based perspective, the POD increases by 4 points (from 94.94% to 98.9%) from one to four variables and the MR also presents an improvement, but the POFD does not. Finally, the MAR is highly reduced, while the CAR and FAR reach their optimal values for three variables.

      It is clear that the mixed models turn out to be significantly better than the separate models. On the other hand, adding a greater number of variables to the mixed model does not always improve the metrics, as is the case of the true positives and negatives, POFD, CAR, and FAR, whose maximums are reached with two or three variables and not with four (Table 5).

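      | Mixed | Efficiency | True Positives | True Negatives | Surprises | False Alarms | POD | POFD | MR | FAR | MAR | CAR |
      |---|---|---|---|---|---|---|---|---|---|---|---|
      | 1-var | 86.31 | 90.36 | 81.83 | 4.82 | 5.41 | 94.94 | 7.20 | 5.06 | 7.93 | 5.71 | 92.07 |
      | 2-var | 89.66 | 94.34 | 86.97 | 2.83 | 6.52 | 97.09 | 6.97 | 3.34 | 6.91 | 3.74 | 93.09 |
      | 3-var | 90.41 | 94.16 | 87.66 | 2.91 | 5.81 | 97.00 | 6.17 | 3.00 | 6.36 | 3.36 | 93.64 |
      | 4-var | 90.77 | 90.88 | 81.38 | 0.86 | 6.56 | 98.90 | 7.47 | 1.05 | 7.92 | 1.36 | 92.94 |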

      Table 5.  Mixed model metrics for the best combination.

      Figure 10 shows the best mixed models (considering the total efficiency) for two, three, and four variables. As can be noted, the separation between classes is more evident than in the 1-var mixed model, causing the uncertainty interval to decrease considerably. The Zonda 25th–75th percentile interval is reduced to the index interval [0.8–1] ([0.7–0.9] for the 1-var mixed model), while for non-Zonda cases, this interval is reduced to [0–0.2] ([0.05–0.45] for the 1-var mixed model). This value is greater than the 3rd percentile of the Zonda cases. For the 2-var mixed model (Fig. 10a), an index value lower than 0.19 leads to only 3% surprises for Zonda events. Unlike the 1-var mixed model, the 25th percentile, which was located at 0.7, is above 0.78, and the 75th percentile moves towards a value of 0.98 in the 4-var mixed model (Fig. 10c). In this way, surprises are considerably reduced. The total efficiencies for all two-variable mixed models (figure not shown) clearly show maximums by considering T-U (89.66%) and T-N2 (89.2%) for the lee side combined with T-U for the windward side.

      Figure 10.  Best mixed models considering the total efficiency for two, three, and four variables. (a) 2-var mixed model for temperature and zonal component for each side. (b) 3-var mixed model for temperature, zonal component, and squared Brunt–Väisälä for both sides. (c) 4-var mixed model for temperature, zonal and meridional components, and squared Brunt–Väisälä for the lee side and temperature, dewpoint, zonal component, and squared Brunt–Väisälä for the windward side.

      Table 6 shows the best combinations of variables for the leeside, windward, and mixed models with two, three, and four variables. It can be seen that the variables U, N2, and T and Td (to a lesser extent) are the best classifiers between Zonda and non-Zonda soundings. In addition, note how little is gained by adding a greater number of variables. For example, for the mixed model, there is a difference of 1.02 and 1.11 points between the 2-var model and the 3-var and 4-var models, respectively. These differences are greater when considering the leeward model, where the difference in efficiency between two and three variables is almost 2.5%, and between three and four variables the difference is 0.5%. The windward models do not seem to be very sensitive to the addition of variables.

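      (a) 2-var models

      | Leeside model 2-var | Efficiency | Windward model 2-var | Efficiency | Mixed model 2-var | Efficiency |
      |---|---|---|---|---|---|
      | U-N2 | 84.83 | U-V | 84.12 | T-U−T-U | 89.66 |
      | T-U | 83.28 | U-N2 | 83.21 | T-N2−T-U | 89.21 |
      | U-V | 79.83 | Td-U | 83.08 | T-N2−U-V | 89.07 |
      | Td-U | 79.44 | T-U | 82.78 | T-U−U-V | 88.84 |
      | T-Td | 77.11 | T-V | 75.35 | T-U−U-N2 | 88.84 |
      | T-N2 | 73.75 | V-N2 | 75.06 | T-U−Td-U | 88.68 |
      | T-V | 72.46 | Td-V | 73.78 | T-Td−U-V | 88.41 |
      | Td-N2 | 71.08 | T-Td | 70.93 | T-V−T-U | 88.33 |
      | Td-V | 70.78 | T-N2 | 68.59 | T-Td−T-U | 88.27 |
      | V-N2 | 69.18 | Td-N2 | 68.53 | T-N2−U-N2 | 88.25 |

      (b) 3-var models

      | Leeside model 3-var | Efficiency | Windward model 3-var | Efficiency | Mixed model 3-var | Efficiency |
      |---|---|---|---|---|---|
      | T-U-N2 | 87.26 | U-V-N2 | 85.34 | T-U-N2−T-U-N2 | 90.68 |
      | Td-U-N2 | 84.38 | Td-U-N2 | 84.97 | T-Td-U−T-U-N2 | 90.41 |
      | T-U-V | 83.87 | T-U-V | 83.71 | T-Td-U−T-Td-U | 90.03 |
      | U-V-N2 | 83.81 | Td-U-V | 83.66 | T-Td-U−T-U-V | 89.94 |
      | T-Td-U | 83.47 | T-U-N2 | 83.28 | T-V-N2−T-U-N2 | 89.86 |
      | Td-U-V | 80.72 | T-Td-U | 83.25 | T-U-N2−T-Td-U | 89.78 |
      | T-Td-N2 | 78.10 | T-V-N2 | 77.51 | T-U-V−T-U-N2 | 89.63 |
      | T-Td-V | 78.01 | T-Td-V | 76.62 | T-U-N2−Td-U-N2 | 89.39 |
      | T-V-N2 | 74.25 | Td-V-N2 | 74.85 | T-Td-V−T-U-N2 | 89.22 |
      | Td-V-N2 | 72.75 | T-Td-N2 | 74.43 | T-U-N2−U-V-N2 | 89.13 |

      (c) 4-var models

      | Leeside model 4-var | Efficiency | Windward model 4-var | Efficiency | Mixed model 4-var | Efficiency |
      |---|---|---|---|---|---|
      | T-Td-U-N2 | 87.75 | Td-U-V-N2 | 86.34 | T-U-V-N2−T-Td-U-N2 | 90.77 |
      | T-U-V-N2 | 87.25 | T-U-V-N2 | 85.65 | T-Td-U-V−T-Td-U-N2 | 90.26 |
      | Td-U-V-N2 | 85.48 | T-Td-U-N2 | 85.60 | T-Td-U-N2−T-Td-U-N2 | 90.22 |
      | T-Td-U-V | 84.64 | T-Td-U-V | 84.86 | T-Td-U-N2−T-U-V-N2 | 89.99 |
      | T-Td-V-N2 | 79.01 | T-Td-V-N2 | 79.47 | T-Td-V-N2−T-U-V-N2 | 89.83 |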

      Table 6.  Best 10 mixed models considering the total efficiency for two and three variables and best 5 mixed models for four variables. First column: leeside 2-, 3- and 4-var best 10 models, with second column for the windward side and third column for the mixed models.
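
      The efficiency gains quoted in the text can be checked directly from the top row of each block of Table 6. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name `gains_over_two_var` are illustrative assumptions, and only the efficiency values come from the table.

```python
# Best total efficiency (%) for each model family, copied from the top row
# of blocks (a), (b), and (c) of Table 6. Keys are the number of variables.
best_efficiency = {
    "leeside": {2: 84.83, 3: 87.26, 4: 87.75},
    "windward": {2: 84.12, 3: 85.34, 4: 86.34},
    "mixed": {2: 89.66, 3: 90.68, 4: 90.77},
}


def gains_over_two_var(scores):
    """Gain in percentage points of the 3- and 4-var models over the 2-var model."""
    return {n: round(scores[n] - scores[2], 2) for n in (3, 4)}


# Hypothetical helper structure: one gain dictionary per model family.
gains = {family: gains_over_two_var(scores)
         for family, scores in best_efficiency.items()}

for family, gain in gains.items():
    print(f"{family:9s} 3-var: +{gain[3]:.2f}  4-var: +{gain[4]:.2f}")
```

      For the mixed model this reproduces the gains of 1.02 and 1.11 points, and for the leeside model the step from two to three variables comes to 2.43 points.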

      From Table 6, it can also be concluded that, regardless of whether a single sounding or soundings for both sides (windward and leeward) are available for the forecasting day, the efficiency values of the models presented here can be high, with a difference of less than 5% (this is the case between the windward and mixed models). However, the leeside sounding is more important for detecting Zonda events precisely and for reducing false alarms.

    • The combination of soundings on both sides of the Andes produces a slight change in the obtained profiles due to the change in the fit coefficients of the new model. The discriminant sounding between Zonda and non-Zonda classes obtained from the PCA is shown in Fig. 11 for the 4-var mixed model.

      Figure 11.  Discriminant sounding between Zonda and non-Zonda classes for the 4-var model (a) for the lee side and (b) for the windward side.

      This model presents a leeside (Fig. 11a) vertical structure characterized by positive temperature anomalies between 850 hPa and 650 hPa and a minimum of negative anomalies at midlevels, which tends to represent an unstable profile. The vertical structure of dewpoint presents negative anomalies at 700 hPa and a maximum of positive anomalies near the surface. The vertical structure of U anomalies correlates well with the presence of vertically (backward) propagating waves, with an alternating pattern of negative and positive anomalies, with a maximum of negative anomalies at 850 hPa and positive anomalies at 400 hPa. The vertical sounding is accompanied by a stable layer at 400 hPa and a statically more unstable layer at midlevels.

      The windward (Fig. 11b) discriminant sounding presents positive anomalies at 850 hPa and negative anomalies at upper levels. As on the lee side, this sounding becomes unstable. Unlike the 1-var model (Fig. 8b), this vertical sounding does not present a temperature inversion. Positive anomalies of dewpoint can be seen near the surface and at 500 hPa, while negative anomalies are observed at 800 hPa and at 400 hPa. The U-wind anomalies present strong maxima at 700 hPa and near the surface, with opposite signs. As on the lee side, positive and negative changes in the zonal component anomalies can be seen, and the vertical sounding is accompanied by a stable layer above 450 hPa and a statically more unstable layer at 500 hPa.

    4.   Conclusions and discussion
    • This paper improves on previous work. In Otero et al. (2018), soundings of T and Td were used along with surface values of the same variables, for the lee side only. The development of the Zonda phenomenon is closely related to the vertical structure of the atmosphere, where the mountain height, the buoyancy frequency, and the incident wind control the mountain wave activity and the dynamics of the downslope wind, which the previous work did not consider. Here, the characteristics of stability, wind, temperature, and humidity on both sides of the barrier that are most favorable for this type of event are identified. This work thus advances the understanding of the characteristics of the atmosphere, both windward and leeward, prior to the development of a Zonda event. The efficiency values obtained are even higher than before, without the need for surface data. Statistical values (event-based and alarm-based model metrics) associated with the model were also obtained and can provide additional information for decision makers.

      The results show that this index model is able to discriminate between Zonda and non-Zonda events with an effectiveness close to 80% using only one variable on one side of the barrier, increasing to almost 84% when one variable is used on each side of the barrier. The effectiveness in class separation increases to almost 91% using four variables on each side of the Andes. The Zonda hits are mainly captured by the leeside soundings, while the non-Zonda hits are mostly controlled by the windward soundings. Combining the soundings in the mixed model produces better class separation and significantly reduces false alarms. The correlation between the real index and the estimated index increases considerably when variables are added, reaching values of more than 0.84.

      It is found that the zonal component of the wind (U-wind) on both sides and the windward temperature are the key variables in class discrimination. The true positives and surprises are mainly controlled by the leeside sounding, while the POFD improves with the mixed model (i.e., by adding the windward sounding). Although the mixed model attains its best efficiency, surprises, POD, MR, and MAR with four variables on each side, the true positives and negatives, POFD, CAR, and FAR reach their best values with two or three variables.
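
      As a reminder of how scores of this kind are derived, the sketch below computes a few of them from a 2x2 contingency table. It uses the standard forecast-verification definitions of POD, POFD, and false alarm ratio, and takes total efficiency to be the fraction of correctly classified soundings; these definitions are assumptions here and may differ in detail from those adopted in the paper, whose MR, MAR, and CAR are not reproduced. The function name and the counts are hypothetical.

```python
def contingency_scores(hits, misses, false_alarms, correct_negatives):
    """Scores from a 2x2 contingency table (standard definitions, assumed here).

    hits              : Zonda forecast and Zonda observed
    misses            : Zonda observed but not forecast
    false_alarms      : Zonda forecast but not observed
    correct_negatives : non-Zonda forecast and non-Zonda observed
    """
    total = hits + misses + false_alarms + correct_negatives
    return {
        # Fraction of all soundings classified correctly (assumed meaning of "efficiency").
        "efficiency": (hits + correct_negatives) / total,
        # Event-based: fraction of observed events that were forecast.
        "POD": hits / (hits + misses),
        # Event-based: fraction of observed non-events that were forecast as events.
        "POFD": false_alarms / (false_alarms + correct_negatives),
        # Alarm-based: fraction of issued alarms that were false.
        "FAR": false_alarms / (hits + false_alarms),
    }


# Hypothetical counts, for illustration only (not taken from the paper).
scores = contingency_scores(hits=90, misses=10, false_alarms=20, correct_negatives=180)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

      The event-based scores are normalized by what was observed and the alarm-based score by what was forecast, which is why the two families can favor different numbers of variables.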

      The vertical profile that best discriminates between Zonda and non-Zonda events is characterized by a leeside sounding that tends to become unstable at midlevels and presents two wind maxima, possibly associated with the foehn effect at low levels and the presence of the jet streak at high levels. On the windward side, there is also a wind maximum at 700 hPa, accompanied by a relatively more stable layer near the top of the barrier.

      The methodology used in this work could be replicated in any region that has sounding data (preferably on both sides of the barrier). Although the results obtained may vary from region to region, it could be a valuable tool for this type of phenomenon, which is so difficult to forecast. Once the discriminant vector A has been determined from the PCA and the multiple regression model, the probability index for Zonda/non-Zonda occurrence can be easily obtained to forecast Zonda occurrence up to 24 hours in advance.
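
      A minimal sketch of this last step is given below. It assumes that the index is the projection of a standardized sounding anomaly onto the discriminant vector A, and that a fitted binary logistic regression maps the index to a probability. The function name, the `intercept` and `slope` parameters, and all numerical values are hypothetical and are not the coefficients obtained in this work.

```python
import numpy as np


def zonda_probability(anomaly_profile, discriminant_vector, intercept, slope):
    """Probability of Zonda occurrence from one sounding (illustrative only).

    anomaly_profile     : standardized anomalies of the sounding variables
    discriminant_vector : vector A from the PCA and regression (assumed given)
    intercept, slope    : hypothetical logistic-regression coefficients
    """
    # Index: projection of the anomaly profile onto the discriminant vector.
    index = float(np.dot(discriminant_vector, anomaly_profile))
    # Logistic link: maps the index to a probability between 0 and 1.
    return float(1.0 / (1.0 + np.exp(-(intercept + slope * index))))


# Hypothetical four-element profile and discriminant vector.
A = np.array([0.5, -0.5, 0.5, 0.5])
profile = np.array([1.0, -1.0, 2.0, 0.0])
probability = zonda_probability(profile, A, intercept=-1.0, slope=1.0)
print(f"Zonda probability: {probability:.2f}")
```

      In an operational setting the profile would hold the anomalies of the selected variables at all sounding levels on one or both sides of the barrier, and the resulting probability would be compared with a decision threshold.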

      This work shows, in a preliminary way, that the PCA is a useful tool for detecting the Zonda phenomenon. However, using more data (such as surface values or reanalysis) could significantly improve the efficiency of the model. In addition, monthly anomalous soundings were tested, but the model did not improve significantly, so it was decided not to consider separate seasons. Seasonality could affect the frequency of events but not the atmospheric conditions (synoptic and local) under which they originate.
