
A Tutorial Review of the Solar Power Curve: Regressions, Model Chains, and Their Hybridization and Probabilistic Extensions

doi: 10.1007/s00376-024-3229-4

  • Owing to the persisting hype in pushing toward global carbon neutrality, the study scope of atmospheric science is rapidly expanding. Among numerous trending topics, energy meteorology has been attracting the most attention hitherto. One essential skill of solar energy meteorologists is solar power curve modeling, which seeks to map irradiance and auxiliary weather variables to solar power, by statistical and/or physical means. In this regard, this tutorial review aims to deliver a complete overview of those fundamental scientific and engineering principles pertaining to the solar power curve. Solar power curves can be modeled in two primary ways, one of regression and the other of model chain. Both classes of modeling approaches, alongside their hybridization and probabilistic extensions, which allow accuracy improvement and uncertainty quantification, are scrutinized and contrasted thoroughly in this review.
  • Figure 1.  A typical scatter between 100-m hub-height wind speed and wind power; data are obtained from a real wind power plant. Brighter colors denote more points in the neighborhood.

    Figure 2.  A typical scatter between GHI and PV power; data are obtained from a real PV plant. Brighter colors denote more points in the neighborhood. (The normalized solar power does not reach 1 because the standard test condition, under which the nominal power is determined, is almost impossible to meet during operation.)

    Figure 3.  Schematic of irradiance-to-power conversion via a typical model chain. A model chain takes GHI as the main input and outputs PV power. An arrow going into a block indicates a required input, whereas an arrow leaving a block indicates the output.

    Figure 4.  Clear-sky GHI time series modeled using REST2, at Table Mountain (40.125° N, 105.237° W), United States, over September 2018, alongside the satellite-derived irradiance from the National Solar Radiation Database (NSRDB) and ground-based measurements from the Surface Radiation Budget Network (SURFRAD). The time on the $x$-axis is local time.

    Figure 5.  NSRDB’s BNI (powered by MERRA-2) time series plot for six selected days in 2018 and 2019, at Bondville (40.052° N, 88.373° W), United States. This plot exemplifies the exceptional, but legitimate, sudden changes in clear-sky BNI, which are caused by surges and a lack of temporal interpolation of the hourly aerosol optical depth.

    Figure 6.  One-minute diffuse fraction prediction using the logistic function, BRL, Engerer2, and Yang4 models, using data from Carpentras (44.083° N, 5.059° E), France, over 2017. Measurements are shown as the gray background, and predictions are shown as scatters. Brighter colors denote more points in the neighborhood.

    Figure 7.  Illustration of a differential solid angle and its representation in polar coordinates.

    Figure 8.  Illustration of the three-part geometrical framework used in the original Perez model, with respect to a horizontal plane.

    Figure 9.  Illustration of the three-part geometrical framework used in the original Perez model, with respect to a tilted plane.

    Figure 10.  Incidence and refraction angles in media with refractive indices $ n_1 $ and $ n_2 $.

    Figure 11.  Relative transmittances for beam radiation ($ \tau_b $) estimated using different models, as functions of $ 1/\cos\theta $, with $ 0^\circ\leqslant \theta\leqslant 85^\circ $. Model parameters are: $ b_0 = 0.05 $ is used for the ASHRAE model (Duffie and Beckman, 2013); $ \beta_0=1 $, $ \beta_1 = -2.438\times10^{-3} $, $\beta_2= $ $ 3.1003\times 10^{-4} $, $ \beta_3=-1.246\times10^{-5} $, $ \beta_4=2.11\times10^{-7} $, $ \beta_5=-1.36\times10^{-9} $ are used for the King model (King et al., 2004); and $ a_r = 0.173 $ is used for the Martin model (Martin and Ruiz, 2001). The physical model follows Eq. (41) with $ n_\text{PV} = 1.3 $.

    Figure 12.  Relative transmittances for diffuse radiation ($ \tau_d $) and ground-reflected radiation ($ \tau_g $) estimated using different models, as functions of module tilt angle $ S $, with $ 0^\circ\leqslant S\leqslant 90^\circ $. Model parameters are: $ a_r = 0.173 $ is used for the Martin model (Martin and Ruiz, 2001); $ n_\text{PV} = 1.526 $ and $ n_T = 1.4585 $ are used for the Xie model (Xie et al., 2022). The references for the other two models appearing in this figure are Brandemuehl and Beckman (1980) and Marion (2017).

    Figure 13.  The $ I $–$ V $ curves of a Canadian Solar CS5P-220M module, under an incident irradiance of 800 W m$^{-2}$ and varying cell temperature (0°C–60°C).

    Figure 14.  The design of an actual roof-top PV system with a total DC capacity of 103.04 kWp: (a) module layout and (b) single-line diagram. Information courtesy of Licheng LIU, RENOVA, Inc., Singapore.

    Figure 15.  Equivalent circuit of a PV module/cell: the multi-diode model (see text for a description of the symbols in this figure).

    Figure 16.  (a) The $ I $–$ V $ curves and (b) the corresponding $ P $–$ V $ curves of a JA Solar JAM72S20-460/MR module, under various operating conditions. The maximum power points are marked with dots. Three $\rm{pvlib}$ functions are used: "$\rm{fit\_desoto}$" estimates the five parameters of the one-diode model at STC according to the electrical parameters given in the datasheet, "$\rm{calcparams\_desoto}$" estimates the five parameters for various operating conditions, and "$\rm{singlediode}$" retrieves the $ I $–$ V $ curves.

    Figure 17.  Illustration of diffuse self-shading.

    Figure 18.  Average masking angle (a) and relative diffuse irradiance (b) as a function of tilt angle for various ground coverage ratios.

    Figure 19.  Illustration of beam self-shading.

    Figure 20.  (top) The efficiency curves of the Huawei SUN2000-100KTL-USH0 inverter under three fixed $ V_\text{dc, inv} $ values each with varying $ P_\text{dc, inv} $, modeled using the AC model of King et al. (2004). (bottom) Zoomed view over the region $ 5\% < \eta_\text{inv}^\text{King} < 100\% $.

    Figure 21.  Outline of a general hybrid PV power forecasting process, involving the post-processing of the raw NWP output, the optimal model chain selection, and creating the PV power forecasts based on the physical predictors by a machine-learning model (Mayer, 2022a).

    Figure 22.  Schematics of (a) ensemble NWP and (b) ensemble model chain, where each circle represents a component model. The red paths mark the “best-guess” predictions, whereas the blue paths exemplify the member trajectories (Mayer and Yang, 2023b).
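The three parametric curves in Figure 11 can be reproduced approximately from the parameter values quoted in its caption. The sketch below is illustrative only: the functional forms are the ones commonly associated with the ASHRAE, King, and Martin models, and they are assumed here rather than taken from the article's equations, which this page does not show. The function names are hypothetical, and the physical model of Eq. (41) is not included.

```python
import math

# Parameter values quoted in the Figure 11 caption.
B0_ASHRAE = 0.05
KING_BETAS = (1.0, -2.438e-3, 3.1003e-4, -1.246e-5, 2.11e-7, -1.36e-9)
AR_MARTIN = 0.173


def tau_b_ashrae(theta_deg, b0=B0_ASHRAE):
    """Assumed ASHRAE form: 1 - b0 * (1/cos(theta) - 1), floored at zero."""
    inv_cos = 1.0 / math.cos(math.radians(theta_deg))
    return max(0.0, 1.0 - b0 * (inv_cos - 1.0))


def tau_b_king(theta_deg, betas=KING_BETAS):
    """Assumed King form: fifth-order polynomial in theta (degrees)."""
    return sum(beta * theta_deg ** i for i, beta in enumerate(betas))


def tau_b_martin(theta_deg, a_r=AR_MARTIN):
    """Assumed Martin form: (1 - exp(-cos(theta)/a_r)) / (1 - exp(-1/a_r))."""
    cos_theta = math.cos(math.radians(theta_deg))
    return (1.0 - math.exp(-cos_theta / a_r)) / (1.0 - math.exp(-1.0 / a_r))


if __name__ == "__main__":
    # Tabulate over the incidence-angle range used in Figure 11 (0 to 85 degrees).
    print("theta  1/cos   ASHRAE  King    Martin")
    for theta in (0, 30, 60, 75, 85):
        inv_cos = 1.0 / math.cos(math.radians(theta))
        print(
            f"{theta:5d}  {inv_cos:6.3f}  "
            f"{tau_b_ashrae(theta):.4f}  {tau_b_king(theta):.4f}  {tau_b_martin(theta):.4f}"
        )
```

All three functions return 1 at normal incidence, which is what "relative" transmittance implies. The models differ mainly at large incidence angles, which is the region the $ 1/\cos\theta $ axis of Figure 11 stretches out.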

  • Abdeen, E., M. Orabi, and E. S. Hasaneen, 2017: Optimum tilt angle for photovoltaic system in desert environment. Solar Energy, 155, 267−280,
    Acikgoz, H., 2022: A novel approach based on integration of convolutional neural networks and deep feature selection for short-term solar radiation forecasting. Applied Energy, 305, 117912
    Ahmed, R., V. Sreeram, Y. Mishra, and M. D. Arif, 2020: A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renewable and Sustainable Energy Reviews, 124, 109792
    Antonanzas-Torres, F., R. Urraca, J. Polo, O. Perpiñán-Lamigueiro, and R. Escobar, 2019: Clear sky solar irradiance models: A review of seventy models. Renewable and Sustainable Energy Reviews, 107, 374−387,
    Appelbaum, J., and J. Bany, 1979: Shadow effect of adjacent solar collectors in large scale systems. Solar Energy, 23, 497−507,
    Appelbaum, J., Y. Massalha, and A. Aronescu, 2019: Corrections to anisotropic diffuse radiation model. Solar Energy, 193, 523−528,
    Armstrong, J. S., 2001: Combining forecasts. Principles of Forecasting: A Handbook for Researchers and Practitioners, J. S. Armstrong, Ed., Springer, 417−439,
    Ayompe, L. M., A. Duffy, S. J. McCormack, and M. Conlon, 2010: Validated real-time energy models for small-scale grid-connected PV-systems. Energy, 35, 4086−4091,
    Bacher, P., H. Madsen, and H. A. Nielsen, 2009: Online short-term solar power forecasting. Solar Energy, 83, 1772−1783,
    Barry, J., D. Böttcher, K. Pfeilsticker, A. Herman-Czezuch, N. Kimiaie, S. Meilinger, C. Schirrmeister, H. Deneke, J. Witthuhn, and F. Gödde, 2020: Dynamic model of photovoltaic module temperature as a function of atmospheric conditions. Advances in Science and Research, 17, 165−173,
    Beyer, H. G., J. Betcke, A. Drews, D. Heinemann, E. Lorenz, G. Heilscher, and S. Bofinger, 2004: Identification of a general model for the MPP performance of PV-modules for the application in a procedure for the performance check of grid connected systems. Proceedings of the 19th European Photovoltaic Solar Energy Conference, Paris, France, 1−5.
    Blaga, R., A. Sabadus, N. Stefu, C. Dughir, M. Paulescu, and V. Badescu, 2019: A current perspective on the accuracy of incoming solar energy forecasting. Progress in Energy and Combustion Science, 70, 119−144,
    Blanc, P., and L. Wald, 2012: The SG2 algorithm for a fast and accurate computation of the position of the Sun for multi-decadal time period. Solar Energy, 86, 3072−3083.
    Brandemuehl, M. J., and W. A. Beckman, 1980: Transmission of diffuse radiation through CPC and flat plate collector glazings. Solar Energy, 24, 511−513,
    Bright, J. M., X. Y. Bai, Y. Zhang, X. X. Sun, B. Acord, and P. Wang, 2020: Irradpy: Python package for MERRA-2 download, extraction and usage for clear-sky irradiance modelling. Solar Energy, 199, 685−693,
    Bugler, J. W., 1977: The determination of hourly insolation on an inclined plane using a diffuse irradiance model based on hourly measured global horizontal insolation. Solar Energy, 19, 477−491.
    Burger, B., and R. Rüther, 2006: Inverter sizing of grid-connected photovoltaic systems in the light of local solar resource distribution characteristics and temperature. Solar Energy, 80, 32−45,
    Cabrera-Tobar, A., E. Bullich-Massagué, M. Aragüés-Peñalba, and O. Gomis-Bellmunt, 2016: Topologies for large scale photovoltaic power plants. Renewable and Sustainable Energy Reviews, 59, 309−319,
    Cañadillas, D., H. Valizadeh, J. Kleissl, B. González-Díaz, and R. Guerrero-Lemus, 2021: EDA-based optimized global control for PV inverters in distribution grids. IET Renewable Power Generation, 15, 382−396,
    Cano, D., J. M. Monget, M. Albuisson, H. Guillard, N. Regas, and L. Wald, 1986: A method for the determination of the global solar radiation from meteorological satellite data. Solar Energy, 37, 31−39,
    Causi, S. L., C. Messana, G. Noviello, A. Parretta, A. Sarno, W. Freiesleben, W. Palz, H. A. Ossenbrink, and P. Helm, 1995: Performance analysis of single crystal silicon modules in real operating conditions. Proceedings of the 13th European Photovoltaic Solar Energy Conference, Nice, France, 1469 pp,
    Ceylan, İ., S. Yilmaz, Ö. Inanç, A. Ergün, A. E. Gürel, B. Acar, and A. İlker Aksu, 2019: Determination of the heat transfer coefficient of PV panels. Energy, 175, 978−985,
    Chen, S., P. Li, D. Brady, and B. Lehman, 2013: Determining the optimum grid-connected photovoltaic inverter size. Solar Energy, 87, 96−116,
    Chowdhury, B. H., and S. Rahman, 1987: Forecasting sub-hourly solar irradiance for prediction of photovoltaic output. Proceedings of the 19th IEEE Photovoltaic Specialists Conference, 171−176.
    Chu, Y. H., D. Z. Yang, H. X. Yu, X. Zhao, and M. Y. Li, 2024: Can end-to-end data-driven models outperform traditional semi-physical models in separating 1-min irradiance? Applied Energy, 356, 122434
    Conceição, R., J. González-Aguilar, A. A. Merrouni, and M. Romero, 2022: Soiling effect in solar energy conversion systems: A review. Renewable and Sustainable Energy Reviews, 162, 112434
    Corripio, J. G., 2021: Insol: Solar radiation. R Package Version 1.2.2.
    Creutzig, F., P. Agoston, J. C. Goldschmidt, G. Luderer, G. Nemet, and R. C. Pietzcker, 2017: The underestimated potential of solar energy to mitigate climate change. Nature Energy, 2, 17140
    De Prada Gil, M., J. L. Domínguez-García, F. Díaz-González, M. Aragüés-Peñalba, and O. Gomis-Bellmunt, 2015: Feasibility analysis of offshore wind power plants with DC collection grid. Renewable Energy, 78, 467−477,
    De Soto, W., S. A. Klein, and W. A. Beckman, 2006: Improvement and validation of a model for photovoltaic array performance. Solar Energy, 80, 78−88,
    Dobos, A. P., 2012: An improved coefficient calculator for the California Energy Commission 6 parameter photovoltaic module model. Journal of Solar Energy Engineering, 134, 021011
    Dobos, A. P., 2014: PVWatts version 5 manual. Technical Report NREL/TP-6A20-62641.
    Dong, Z. B., D. Z. Yang, T. Reindl, and W. M. Walsh, 2013: Short-term solar irradiance forecasting using exponential smoothing state space model. Energy, 55, 1104−1113,
    Doubleday, K., S. Jascourt, W. Kleiber, and B. -M. Hodge, 2021: Probabilistic solar power forecasting using Bayesian model averaging. IEEE Transactions on Sustainable Energy, 12, 325−337,
    Duffie, J. A., and W. A. Beckman, 2013: Solar Engineering of Thermal Processes. John Wiley & Sons,
    Engerer, N. A., 2015: Minute resolution estimates of the diffuse fraction of global irradiance for southeastern Australia. Solar Energy, 116, 215−237,
    Engerer, N. A., and F. P. Mills, 2014: $ K_\text{PV} $: A clear-sky index for photovoltaics. Solar Energy, 105, 679−693.
    Engerer, N. A., and F. P. Mills, 2015: Validating nine clear sky radiation models in Australia. Solar Energy, 120, 9−24,
    Evans, D. L., and L. W. Florschuetz, 1977: Cost studies on terrestrial photovoltaic power systems with sunlight concentration. Solar Energy, 19, 255−262,
    Faiman, D., 2008: Assessing the outdoor operating temperature of photovoltaic modules. Progress in Photovoltaics: Research and Applications, 16, 307−315,
    Fu, D. S., M. Q. Liu, D. Z. Yang, H. Z. Che, and X. G. Xia, 2022: Influences of atmospheric reanalysis on the accuracy of clear-sky irradiance estimates: Comparing MERRA-2 and CAMS. Atmos. Environ., 277, 119080
    Fuentes, M., G. Nofuentes, J. Aguilera, D. L. Talavera, and M. Castro, 2007: Application and validation of algebraic methods to predict the behaviour of crystalline silicon PV modules in Mediterranean climates. Solar Energy, 81, 1396−1408,
    Fuentes, M. K., 1987: A simplified thermal model for flat-plate photovoltaic arrays. Technical Report SAND85-0330.
    Gernaat, D. E. H. J., H. S. de Boer, V. Daioglou, S. G. Yalew, C. Müller, and D. P. van Vuuren, 2021: Climate change impacts on renewable energy supply. Nature Climate Change, 11, 119−125,
    Gilman, P., A. Dobos, N. DiOrio, J. Freeman, S. Janzou, and D. Ryberg, 2018: SAM photovoltaic model technical reference update. Technical Report NREL/TP-6A20-67399.
    Gneiting, T., and M. Katzfuss, 2014: Probabilistic forecasting. Annual Review of Statistics and its Application, 1, 125−151,
    Gneiting, T., F. Balabdaoui, and A. E. Raftery, 2007: Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69, 243−268,
    Grena, R., 2012: Five new algorithms for the computation of sun position from 2010 to 2110. Solar Energy, 86, 1323−1337,
    Gschwind, B., L. Wald, P. Blanc, M. Lefèvre, M. Schroedter-Homscheidt, and A. Arola, 2019: Improving the McClear model estimating the downwelling solar radiation at ground level in cloud-free conditions – McClear-v3. Meteor. Z., 28, 147−163,
    Gueymard, C. A., 2008: REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation – Validation with a benchmark dataset. Solar Energy, 82, 272−285,
    Gueymard, C. A., 2009: Direct and indirect uncertainties in the prediction of tilted irradiance for solar engineering applications. Solar Energy, 83, 432−444,
    Gueymard, C. A., 2017a: Cloud and albedo enhancement impacts on solar irradiance using high-frequency measurements from thermopile and photodiode radiometers. Part 1: Impacts on global horizontal irradiance. Solar Energy, 153, 755−765,
    Gueymard, C. A., 2017b: Cloud and albedo enhancement impacts on solar irradiance using high-frequency measurements from thermopile and photodiode radiometers. Part 2: Performance of separation and transposition models for global tilted irradiance. Solar Energy, 153, 766−779,
    Gueymard, C. A., and J. A. Ruiz-Arias, 2016: Extensive worldwide validation and climate sensitivity analysis of direct irradiance predictions from 1-min global irradiance. Solar Energy, 128, 1−30,
    Gueymard, C. A., V. Lara-Fanego, M. Sengupta, and Y. Xie, 2019: Surface albedo and reflectance: Review of definitions, angular and spectral effects, and intercomparison of major data sources in support of advanced solar irradiance modeling over the Americas. Solar Energy, 182, 194−212,
    Hafez, B., H. S. Krishnamoorthy, P. Enjeti, U. Borup, and S. Ahmed, 2014: Medium voltage AC collection grid for large scale photovoltaic plants based on medium frequency transformers. Proceedings of 2014 IEEE Energy Conversion Congress and Exposition (ECCE), Pittsburgh, PA, USA, IEEE, 5304−5311,
    Haffaf, A., F. Lakdja, D. Ould Abdeslam, and R. Meziane, 2021: Monitoring, measured and simulated performance analysis of a 2.4 kWp grid-connected PV system installed on the Mulhouse campus, France. Energy for Sustainable Development, 62, 44−55,
    Hansen, C., 2015: Parameter estimation for single diode models of photovoltaic modules. Technical Report SAND2015-2065,
    Hay, J., and J. Davies, 1980: Calculation of the solar radiation incident on an inclined surface. Proceedings of the First Canadian Solar Radiation Data Workshop, Toronto, Ontario, Canada, 59−72.
    Heusinger, J., A. M. Broadbent, D. J. Sailor, and M. Georgescu, 2020: Introduction, evaluation and application of an energy balance model for photovoltaic modules. Solar Energy, 195, 382−395,
    Hoadley, D., 2021: Efficient calculation of solar position using rectangular coordinates. Solar Energy, 220, 80−87,
    Holmgren, W. F., C. W. Hansen, and M. A. Mikofski, 2018: Pvlib python: A python package for modeling solar energy systems. Journal of Open Source Software, 3, 884
    Hong, T., P. Pinson, S. Fan, H. Zareipour, A. Troccoli, and R. J. Hyndman, 2016: Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond. International Journal of Forecasting, 32, 896−913,
    Hottel, H. C., and A. F. Sarofim, 1967: Radiative Transfer. McGraw Hill.
    Hu, A. X., S. Levis, G. A. Meehl, W. Q. Han, W. M. Washington, K. W. Oleson, B. J. van Ruijven, M. Q. He, and W. G. Strand, 2016: Impact of solar panels on global climate. Nature Climate Change, 6, 290−294,
    Huang, J., and M. Perry, 2016: A semi-empirical approach using gradient boosting and k-nearest neighbors regression for GEFCom2014 probabilistic solar power forecasting. International Journal of Forecasting, 32, 1081−1086,
    Huang, Y. H., J. Lu, C. Liu, X. Y. Xu, W. S. Wang, and X. X. Zhou, 2010: Comparative study of power forecasting methods for PV stations. Proceedings of 2010 International Conference on Power System Technology, Zhejiang, China, IEEE, 1−6,
    Huld, T., G. Friesen, A. Skoczek, R. P. Kenny, T. Sample, M. Field, and E. D. Dunlop, 2011: A power-rating model for crystalline silicon PV modules. Solar Energy Materials and Solar Cells, 95, 3359−3369,
    Hussain, N., N. Shahzad, T. Yousaf, A. Waqas, A. Hussain Javed, S. Khan, M. Ali, and R. Liaquat, 2021: Designing of homemade soiling station to explore soiling loss effects on PV modules. Solar Energy, 225, 624−633,
    Hyndman, R. J., and G. Athanasopoulos, 2018: Forecasting: Principles and Practice. 2nd ed. OTexts.
    Ineichen, P., and R. Perez, 2002: A new airmass independent formulation for the Linke turbidity coefficient. Solar Energy, 73, 151−157, doi: 10.1016/S0038-092X(02)00045-2.
    Jain, A., and A. Kapoor, 2004: Exact analytical solutions of the parameters of real solar cells using Lambert W-function. Solar Energy Materials and Solar Cells, 81, 269−277.
    Jerez, S., I. Tobin, R. Vautard, J. P. Montávez, J. M. López-Romero, F. Thais, B. Bartok, O. B. Christensen, A. Colette, M. Déqué, G. Nikulin, S. Kotlarski, E. van Meijgaard, C. Teichmann, and M. Wild, 2015: The impact of climate change on photovoltaic power generation in Europe. Nature Communications, 6, 10014.
    Juban, R., H. Ohlsson, M. Maasoumy, L. Poirier, and J. Z. Kolter, 2016: A multiple quantile regression approach to the wind, solar, and price tracks of GEFCom2014. International Journal of Forecasting, 32, 1094−1102,
    Kamphuis, N. R., C. A. Gueymard, M. T. Holtzapple, A. T. Duggleby, and K. Annamalai, 2020: Perspectives on the origin, derivation, meaning, and significance of the isotropic sky model. Solar Energy, 201, 8−12,
    Kardakos, E. G., M. C. Alexiadis, S. I. Vagropoulos, C. K. Simoglou, P. N. Biskas, and A. G. Bakirtzis, 2013: Application of time series and artificial neural network models in short-term forecasting of PV power generation. Proceedings of the 48th International Universities' Power Engineering Conference (UPEC), Dublin, Ireland, IEEE, 1−6,
    King, D. L., W. E. Boyson, and J. A. Kratochvil, 2004: Photovoltaic array performance model. Technical Report SAND2004-3535,
    Laudani, A., G. M. Lozito, F. Mancilla-David, F. Riganti-Fulginei, and A. Salvini, 2015: An improved method for SRC parameter estimation for the CEC PV module model. Solar Energy, 120, 525−535,
    Lee, G., Y. Ding, M. G. Genton, and L. Xie, 2015: Power curve estimation with multivariate environmental factors for inland and offshore wind farms. Journal of the American Statistical Association, 110, 56−67,
    Lefèvre, M., A. Oumbe, P. Blanc, B. Espinar, B. Gschwind, Z. Qu, L. Wald, M. Schroedter-Homscheidt, C. Hoyer-Klick, A. Arola, A. Benedetti, J. W. Kaiser, and J. -J. Morcrette, 2013: McClear: A new model estimating downwelling solar radiation at ground level in clear-sky conditions. Atmospheric Measurement Techniques, 6, 2403−2418,
    Lim, L. H. I., Z. Ye, J. Ye, D. Z. Yang, and H. Du, 2015a: A linear identification of diode models from single $ I $–$ V $ characteristics of PV panels. IEEE Transactions on Industrial Electronics, 62, 4181−4193.
    Lim, L. H. I., Z. Ye, J. Y. Ye, D. Z. Yang, and H. Du, 2015b: A linear method to extract diode model parameters of solar panels from a single $ I $–$ V $ curve. Renewable Energy, 76, 135−142.
    Liu, L. B., G. He, M. X. Wu, G. Liu, H. R. Zhang, Y. Chen, J. S. Shen, and S. C. Li, 2023: Climate change impacts on planned supply–demand match in global wind and solar energy systems. Nature Energy, 8, 870−880,
    Lundstrom, L., 2016: CamsRad: Client for CAMS radiation service. R Package Version 0.3.0.
    Luoma, J., J. Kleissl, and K. Murray, 2012: Optimal inverter sizing considering cloud enhancement. Solar Energy, 86, 421−429,
    Macêdo, W. N., and R. Zilles, 2007: Operational results of grid-connected photovoltaic system with different inverter's sizing factors (ISF). Progress in Photovoltaics: Research and Applications, 15, 337−352,
    Malamaki, K. N. D., and C. S. Demoulias, 2014: Analytical calculation of the electrical energy losses on fixed-mounted PV plants. IEEE Transactions on Sustainable Energy, 5, 1080−1089,
    Maor, T., and J. Appelbaum, 2012: View factors of photovoltaic collector systems. Solar Energy, 86, 1701−1708,
    Marion, B., 2002: A method for modeling the current–voltage curve of a PV module for outdoor conditions. Progress in Photovoltaics: Research and Applications, 10, 205−214,
    Marion, B., 2017: Numerical method for angle-of-incidence correction factors for diffuse radiation incident photovoltaic modules. Solar Energy, 147, 344−348,
    Markovics, D., and M. J. Mayer, 2022: Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renewable and Sustainable Energy Reviews, 161, 112364
    Martin, N., and J. M. Ruiz, 2001: Calculation of the PV modules angular losses under field conditions by means of an analytical model. Solar Energy Materials and Solar Cells, 70, 25−38,
    Masters, G. M., 2013: Renewable and Efficient Electric Power Systems. 2nd ed. John Wiley & Sons.
    Mattei, M., G. Notton, C. Cristofari, M. Muselli, and P. Poggi, 2006: Calculation of the polycrystalline PV module temperature using a simple method of energy balance. Renewable Energy, 31, 553−567,
    Maxwell, E. L., 1987: A quasi-physical model for converting hourly global horizontal to direct normal insolation. Technical Report SERI/TR-215-3087.
    Mayer, M. J., 2021: Influence of design data availability on the accuracy of physical photovoltaic power forecasts. Solar Energy, 227, 532−540,
    Mayer, M. J., 2022a: Benefits of physical and machine learning hybridization for photovoltaic power forecasting. Renewable and Sustainable Energy Reviews, 168, 112772
    Mayer, M. J., 2022b: Impact of the tilt angle, inverter sizing factor and row spacing on the photovoltaic power forecast accuracy. Applied Energy, 323, 119598
    Mayer, M. J., and G. Gróf, 2020: Techno-economic optimization of grid-connected, ground-mounted photovoltaic power plants by genetic algorithm based on a comprehensive mathematical model. Solar Energy, 202, 210−226,
    Mayer, M. J., and G. Gróf, 2021: Extensive comparison of physical models for photovoltaic power forecasting. Applied Energy, 283, 116239
    Mayer, M. J., and D. Z. Yang, 2022: Probabilistic photovoltaic power forecasting using a calibrated ensemble of model chains. Renewable and Sustainable Energy Reviews, 168, 112821
    Mayer, M. J., and D. Z. Yang, 2023a: Calibration of deterministic NWP forecasts and its impact on verification. International Journal of Forecasting, 39, 981−991,
    Mayer, M. J., and D. Z. Yang, 2023b: Pairing ensemble numerical weather prediction with ensemble physical model chain for probabilistic photovoltaic power forecasting. Renewable and Sustainable Energy Reviews, 175, 113171
    Mazorra Aguiar, L., B. Pereira, P. Lauret, F. Díaz, and M. David, 2016: Combining solar irradiance measurements, satellite-derived data and a numerical weather prediction model to improve intra-day solar forecasting. Renewable Energy, 97, 599−610.
    Mejia, F. A., and J. Kleissl, 2013: Soiling losses for solar photovoltaic systems in California. Solar Energy, 95, 357−363,
    Mermoud, A., 1994: PVsyst: A user-friendly software for PV-systems simulation. Proceedings of the Twelfth European Photovoltaic Solar Energy Conference, HS Stephens, 1703−1706.
    Messenger, R. A., and J. Ventre, 2004: Photovoltaic Systems Engineering. CRC Press.
    Michalsky, J. J., 1988: The Astronomical Almanac's algorithm for approximate solar position (1950−2050). Solar Energy, 40, 227−235.
    Micheli, L., E. F. Fernández, M. Muller, and F. Almonacid, 2020: Extracting and generating PV soiling profiles for analysis, forecasting, and cleaning optimization. IEEE Journal of Photovoltaics, 10, 197−205,
    Mondol, J. D., Y. G. Yohanis, and B. Norton, 2006: Optimal sizing of array and inverter for grid-connected photovoltaic systems. Solar Energy, 80, 1517−1539,
    Mora Segado, P., J. Carretero, and M. Sidrach-de-Cardona, 2015: Models to predict the operating temperature of different photovoltaic modules in outdoor conditions. Progress in Photovoltaics: Research and Applications, 23, 1267−1282,
    Muzathik, A. M., 2014: Photovoltaic modules operating temperature estimation using a simple correlation. International Journal of Energy Engineering, 4, 151−158.
    Nagy, G. I., G. Barta, S. Kazi, G. Borbély, and G. Simon, 2016: GEFCom2014: Probabilistic solar and wind power forecasting using a generalized additive tree ensemble approach. International Journal of Forecasting, 32, 1087−1093.
    Narang, D., R. Mahmud, M. Ingram, and A. Hoke, 2021: An overview of issues related to IEEE Std 1547−2018 requirements regarding voltage and reactive power control. Technical Report NREL/TP-5D00-77156,
    Notton, G., V. Lazarov, and L. Stoyanov, 2010: Optimal sizing of a grid-connected PV system for various PV module technologies and inclinations, inverter efficiency characteristics and locations. Renewable Energy, 35, 541−554,
    Ogliari, E., A. Dolara, G. Manzolini, and S. Leva, 2017: Physical and hybrid methods comparison for the day ahead PV output power forecast. Renewable Energy, 113, 11−21,
    Osterwald, C. R., 1986: Translation of device performance measurements to reference conditions. Solar Cells, 18, 269−279,
    Passias, D., and B. Källbäck, 1984: Shading effects in rows of solar cell panels. Solar Cells, 11, 281−291,
    Pedro, H. T. C., D. P. Larson, and C. F. M. Coimbra, 2019: A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods. Journal of Renewable and Sustainable Energy, 11, 036102
    Peratikou, S., and A. G. Charalambides, 2022: Estimating clear-sky PV electricity production without exogenous data. Solar Energy Advances, 2, 100015
    Perez, R., R. Stewart, C. Arbogast, R. Seals, and J. Scott, 1986: An anisotropic hourly diffuse radiation model for sloping surfaces: Description, performance validation, site dependency evaluation. Solar Energy, 36, 481−497,
    Perez, R., R. Seals, P. Ineichen, R. Stewart, and D. Menicucci, 1987: A new simplified version of the Perez diffuse irradiance model for tilted surfaces. Solar Energy, 39, 221−231,
    Perez, R., R. Stewart, R. Seals, and T. Guertin, 1988: The development and verification of the Perez diffuse radiation model. Technical Report SAND88-7030,
    Perez, R., P. Ineichen, R. Seals, J. Michalsky, and R. Stewart, 1990: Modeling daylight availability and irradiance components from direct and global irradiance. Solar Energy, 44, 271−289,
    Persson, C., P. Bacher, T. Shiga, and H. Madsen, 2017: Multi-site solar power forecasting using gradient boosted regression trees. Solar Energy, 150, 423−436,
    Pierro, M., F. Bucci, M. De Felice, E. Maggioni, D. Moser, A. Perotto, F. Spada, and C. Cornaro, 2016: Multi-model ensemble for day ahead prediction of photovoltaic power generation. Solar Energy, 134, 132−146,
    Pierro, M., D. Gentili, F. R. Liolli, C. Cornaro, D. Moser, A. Betti, M. Moschella, E. Collino, D. Ronzio, and D. van der Meer, 2022: Progress in regional PV power forecasting: A sensitivity analysis on the Italian case study. Renewable Energy, 189, 983−996,
    Quan, H., and D. Z. Yang, 2020: Probabilistic solar irradiance transposition models. Renewable and Sustainable Energy Reviews, 125, 109814
    Reda, I., and A. Andreas, 2008: Solar position algorithm for solar radiation applications. Technical Report NREL/TP-560-34302,
    Ridley, B., J. Boland, and P. Lauret, 2010: Modelling of diffuse solar fraction with multiple predictors. Renewable Energy, 35, 478−483,
    Rigollier, C., M. Lefèvre, and L. Wald, 2004: The method Heliosat-2 for deriving shortwave solar radiation from satellite images. Solar Energy, 77, 159−169,
    Rodríguez-Gallegos, C. D., H. H. Liu, O. Gandhi, J. P. Singh, V. Krishnamurthy, A. Kumar, J. S. Stein, S. T. Wang, L. Li, T. Reindl, and I. M. Peters, 2020: Global techno-economic performance of bifacial and tracking photovoltaic systems. Joule, 4, 1514−1541,
    Ross, R. G., 1982: Flat-plate photovoltaic module and array engineering. Proceedings of 1982 Annual Meeting of the American Section of the International Solar Energy Society, 909−914.
    Roulston, M. S., and L. A. Smith, 2003: Combining dynamical and statistical ensembles. Tellus A: Dynamic Meteorology and Oceanography, 55, 16−30,
    Ruiz-Arias, J. A., and C. A. Gueymard, 2018: Worldwide inter-comparison of clear-sky solar radiation models: Consensus-based review of direct and global irradiance components simulated at the earth surface. Solar Energy, 168, 10−29,
    Sauer, K. J., T. Roessler, and C. W. Hansen, 2015: Modeling the irradiance and temperature dependence of photovoltaic modules in PVsyst. IEEE Journal of Photovoltaics, 5, 152−158,
    Schlick, C., 1994: An inexpensive BRDF model for physically-based rendering. Computer Graphics Forum, 13, 233−246,
    Skoplaki, E., and J. A. Palyvos, 2009a: On the temperature dependence of photovoltaic module electrical performance: A review of efficiency/power correlations. Solar Energy, 83, 614−624,
    Skoplaki, E., and J. A. Palyvos, 2009b: Operating temperature of photovoltaic modules: A survey of pertinent correlations. Renewable Energy, 34, 23−29,
    Sobri, S., S. Koohi-Kamali, and N. A. Rahim, 2018: Solar photovoltaic generation forecasting methods: A review. Energy Conversion and Management, 156, 459−497,
    Sun, X. X., J. M. Bright, C. A. Gueymard, B. Acord, P. Wang, and N. A. Engerer, 2019: Worldwide performance assessment of 75 global clear-sky irradiance models using principal component analysis. Renewable and Sustainable Energy Reviews, 111, 550−570,
    Sun, X. X., J. M. Bright, C. A. Gueymard, X. Y. Bai, B. Acord, and P. Wang, 2021: Worldwide performance assessment of 95 direct and diffuse clear-sky irradiance models using principal component analysis. Renewable and Sustainable Energy Reviews, 135, 110087
    TamizhMani, G., L. Ji, Y. Tang, L. Petacci, and C. Osterwald, 2003: Photovoltaic module thermal/wind performance: Long-term monitoring and model development for energy rating. Technical Report NREL/CP-520-35645.
    Testa, A., S. De Caro, R. La Torre, and T. Scimone, 2012: A probabilistic approach to size step-up transformers for grid connected PV plants. Renewable Energy, 48, 42−51,
    Toreti Scarabelot, L., G. Arns Rampinelli, and C. R. Rambo, 2021: Overirradiance effect on the electrical performance of photovoltaic systems of different inverter sizing factors. Solar Energy, 225, 561−568,
    Ullah, A., A. Amin, T. Haider, M. Saleem, and N. Z. Butt, 2020: Investigation of soiling effects, dust chemistry and optimum cleaning schedule for PV modules in Lahore, Pakistan. Renewable Energy, 150, 456−468,
    Valerino, M., M. Bergin, C. Ghoroi, A. Ratnaparkhi, and G. P. Smestad, 2020: Low-cost solar PV soiling sensor validation and size resolved soiling impacts: A comprehensive field study in western India. Solar Energy, 204, 307−315,
    Varga, N., and M. J. Mayer, 2021: Model-based analysis of shading losses in ground-mounted photovoltaic power plants. Solar Energy, 216, 428−438,
    Vignola, F., J. Michalsky, and T. Stoffel, 2019: Solar and Infrared Radiation Measurements. 2nd ed. CRC Press,
    Visser, L., T. AlSkaif, and W. van Sark, 2022: Operational day-ahead solar power forecasting for aggregated PV systems with a varying spatial distribution. Renewable Energy, 183, 267−282,
    Visser, L., T. AlSkaif, J. Hu, A. Louwen, and W. van Sark, 2023: On the value of expert knowledge in estimation and forecasting of solar photovoltaic power generation. Solar Energy, 251, 86−105,
    Voyant, C., G. Notton, S. Kalogirou, M. L. Nivet, C. Paoli, F. Motte, and A. Fouilloy, 2017: Machine learning methods for solar radiation forecasting: A review. Renewable Energy, 105, 569−582,
    Wang, Y., Q. H. Hu, L. H. Li, A. M. Foley, and D. Srinivasan, 2019: Approaches to wind power curve modeling: A review and discussion. Renewable and Sustainable Energy Reviews, 116, 109422
    Wolff, B., J. Kühnert, E. Lorenz, O. Kramer, and D. Heinemann, 2016: Comparing support vector regression for PV power forecasting to a physical modeling approach using measurement, numerical weather prediction, and cloud motion data. Solar Energy, 135, 197−208,
    Wu, Y.-Y., S.-Y. Wu, and L. Xiao, 2017: Numerical study on convection heat transfer from inclined PV panel under windy environment. Solar Energy, 149, 1−12,
    Xie, Y., M. Sengupta, A. Habte, and A. Andreas, 2022: The "Fresnel Equations" for Diffuse radiation on Inclined photovoltaic Surfaces (FEDIS). Renewable and Sustainable Energy Reviews, 161, 112362
    Yagli, G. M., D. Z. Yang, and D. Srinivasan, 2019: Automatic hourly solar forecasting using machine learning models. Renewable and Sustainable Energy Reviews, 105, 487−498,
    Yagli, G. M., D. Z. Yang, and D. Srinivasan, 2022: Ensemble solar forecasting and post-processing using dropout neural network and information from neighboring satellite pixels. Renewable and Sustainable Energy Reviews, 155, 111909
    Yang, D. Z., 2016: Solar radiation on inclined surfaces: Corrections and benchmarks. Solar Energy, 136, 288−302,
    Yang, D. Z., 2018a: A correct validation of the National Solar Radiation Data Base (NSRDB). Renewable and Sustainable Energy Reviews, 97, 152−155,
    Yang, D. Z., 2018b: SolarData: An R package for easy access of publicly available solar datasets. Solar Energy, 171, A3−A12,
    Yang, D. Z., 2019a: Post-processing of NWP forecasts using ground or satellite-derived data through kernel conditional density estimation. Journal of Renewable and Sustainable Energy, 11, 026101
    Yang, D. Z., 2019b: SolarData package update v1.1: R functions for easy access of Baseline Surface Radiation Network (BSRN). Solar Energy, 188, 970−975,
    Yang, D. Z., 2020: Choice of clear-sky model in solar forecasting. Journal of Renewable and Sustainable Energy, 12, 026101
    Yang, D. Z., 2021a: Temporal-resolution cascade model for separation of 1-min beam and diffuse irradiance. Journal of Renewable and Sustainable Energy, 13, 056101
    Yang, D. Z., 2021b: Validation of the 5-min irradiance from the National Solar Radiation Database (NSRDB). Journal of Renewable and Sustainable Energy, 13, 016101
    Yang, D. Z., 2022: Estimating 1-min beam and diffuse irradiance from the global irradiance: A review and an extensive worldwide comparison of latest separation models at 126 stations. Renewable and Sustainable Energy Reviews, 159, 112195
    Yang, D. Z., and Z. B. Dong, 2018: Operational photovoltaics power forecasting using seasonal time series ensemble. Solar Energy, 166, 529−541,
    Yang, D. Z., and J. Boland, 2019: Satellite-augmented diffuse solar radiation separation models. Journal of Renewable and Sustainable Energy, 11, 023705
    Yang, D. Z., and C. A. Gueymard, 2020: Ensemble model output statistics for the separation of direct and diffuse components from 1-min global irradiance. Solar Energy, 208, 591−603,
    Yang, D. Z., and C. A. Gueymard, 2021a: Probabilistic merging and verification of monthly gridded aerosol products. Atmospheric Environment, 247, 118146
    Yang, D. Z., and C. A. Gueymard, 2021b: Probabilistic post-processing of gridded atmospheric variables and its application to site adaptation of shortwave solar radiation. Solar Energy, 225, 427−443,
    Yang, D. Z., and D. van der Meer, 2021: Post-processing in solar forecasting: Ten overarching thinking tools. Renewable and Sustainable Energy Reviews, 140, 110735
    Yang, D. Z., P. Jirutitijaroen, and W. M. Walsh, 2012: Hourly solar irradiance time series forecasting using cloud cover index. Solar Energy, 86, 3531−3543,
    Yang, D. Z., Z. Ye, A. M. Nobre, H. Du, W. M. Walsh, L. I. Lim, and T. Reindl, 2014: Bidirectional irradiance transposition based on the Perez model. Solar Energy, 110, 768−780,
    Yang, D. Z., V. Sharma, Z. Ye, L. I. Lim, L. Zhao, and A. W. Aryaputera, 2015: Forecasting of global horizontal irradiance by exponential smoothing, using decompositions. Energy, 81, 111−119,
    Yang, D. Z., S. Alessandrini, J. Antonanzas, F. Antonanzas-Torres, V. Badescu, H. G. Beyer, R. Blaga, J. Boland, J. M. Bright, C. F. M. Coimbra, M. David, Â. Frimane, C. A. Gueymard, T. Hong, M. J. Kay, S. Killinger, J. Kleissl, P. Lauret, E. Lorenz, D. van der Meer, M. Paulescu, R. Perez, O. Perpiñán-Lamigueiro, I. M. Peters, G. Reikard, D. Renné, Y.-M. Saint-Drenan, Y. Shuai, R. Urraca, H. Verbois, F. Vignola, C. Voyant, and J. Zhang, 2020: Verification of deterministic solar forecasts. Solar Energy, 210, 20−37,
    Yang, D. Z., W. T. Wang, and T. Hong, 2022a: A historical weather forecast dataset from the European Centre for Medium-Range Weather Forecasts (ECMWF) for energy forecasting. Solar Energy, 232, 263−274,
    Yang, D. Z., W. T. Wang, and X. Xia, 2022b: A concise overview on solar resource assessment and forecasting. Adv. Atmos. Sci., 39, 1239−1251,
    Yang, D. Z., Y. Z. Gu, M. J. Mayer, C. A. Gueymard, W. T. Wang, J. Kleissl, M. Y. Li, Y. H. Chu, and J. M. Bright, 2024: Regime-dependent 1-min irradiance separation model with climatology clustering. Renewable and Sustainable Energy Reviews, 189, 113992
    Yang, P. P., L. H. C. Chua, K. N. Irvine, and J. Imberger, 2021: Radiation and energy budget dynamics associated with a floating photovoltaic system. Water Research, 206, 117745
    You, S. M., Y. J. Lim, Y. J. Dai, and C. H. Wang, 2018: On the temporal modelling of solar photovoltaic soiling: Energy and economic impacts in seven cities. Applied Energy, 228, 1136−1146,
    Zhong, X. H., and J. Kleissl, 2015: Clear sky irradiances using REST2 and MODIS. Solar Energy, 116, 144−164,


Manuscript History

Manuscript received: 27 September 2023
Manuscript revised: 07 January 2024
Manuscript accepted: 23 January 2024
A Tutorial Review of the Solar Power Curve: Regressions, Model Chains, and Their Hybridization and Probabilistic Extensions

    Corresponding author: Dazhi YANG,
    Corresponding author: Martin János MAYER,
  • 1. School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
  • 2. Key Laboratory for Middle Atmosphere and Global Environment Observation (LAGEO), Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
  • 3. Department of Energy Engineering, Faculty of Mechanical Engineering, Budapest University of Technology and Economics, Műegyetem rkp. 3, Budapest H-1111, Hungary

Abstract: Owing to the persisting hype in pushing toward global carbon neutrality, the study scope of atmospheric science is rapidly expanding. Among numerous trending topics, energy meteorology has been attracting the most attention hitherto. One essential skill of solar energy meteorologists is solar power curve modeling, which seeks to map irradiance and auxiliary weather variables to solar power, by statistical and/or physical means. In this regard, this tutorial review aims to deliver a complete overview of those fundamental scientific and engineering principles pertaining to the solar power curve. Solar power curves can be modeled in two primary ways, one of regression and the other of model chain. Both classes of modeling approaches, alongside their hybridization and probabilistic extensions, which allow accuracy improvement and uncertainty quantification, are scrutinized and contrasted thoroughly in this review.

    1.   Introduction

    • Tackling anthropogenic climate change is a long-lasting research hotspot, and the morphing of the global energy mix, from one that is predominated by fossil fuels to one where renewable energy contributes the most, is widely perceived as one of the most important enablers of the pathway towards carbon neutrality. Because the two most abundant forms of renewable energy, that is, solar and wind, are both weather dependent, it is known a priori that the multi-disciplinary domain of study called “energy meteorology” is going to play a cardinal role in advancing the utilization of renewable energy. Insofar as solar energy meteorology is concerned, it encompasses two key topics, namely, solar resource assessment and solar forecasting, both of which have been reviewed recently, in a way that is easily comprehensible by the atmospheric science community (Yang et al., 2022b). In short, the central aim of solar resource assessment is to estimate the long-term power generation potential of a (prospective) solar farm of interest, whereas that of solar forecasting is to predict the power generation of a solar farm in the near future. Both investigations depend for their success upon the granularity and precision of the omnichannel information pertaining to irradiance conditions.

      Irradiance information can be acquired through ground-based measurement, remote sensing retrieval, and numerical weather modeling. These three complementary approaches have concerned those who are now known as solar energy meteorologists since the 1960s, and the body of literature is notoriously gigantic. Whereas the acquisition of irradiance data is one aspect, the optimal utilization of solar energy also relies on the ability to convert the irradiance information to power output information. Stated differently, it is the power output of a solar energy system, such as a photovoltaic (PV) plant or a concentrating solar power (CSP) plant, that is of eventual interest to most solar engineering endeavors. Indeed, most atmospheric scientists are well acquainted with irradiance, but the irradiance-to-power conversion has hitherto been handled by solar engineers. Examining the numerous recent studies on the impacts of climate change on solar power generation [e.g., Jerez et al. (2015); Gernaat et al. (2021); Liu et al. (2023)] or vice versa [e.g., Hu et al. (2016); Creutzig et al. (2017)], a common trait is the overly simplistic modeling approaches for solar power production used to derive the conclusions, which can be misleading due to the uncertainty introduced during modeling. To that end, it is thought beneficial to briefly outline the scientific and engineering principles regarding the conversion to atmospheric scientists, such that a more holistic understanding of the status quo of solar energy meteorology can be established.

      The need for converting meteorological variables to power output is not unique to solar applications; another obvious case is wind, insofar as wind speed needs to be converted to the power output of a turbine through a wind power curve. The theoretical power $ P $ that can be extracted from the wind at speed $ V $ is $ P=0.5C_p\rho\pi R^2V^3 $, where $ C_p $, $ \rho $, and $ R $ are the power coefficient of the turbine, air density, and turbine rotor radius, respectively. However, because various uncertainty factors and loss mechanisms affect the actual operation of a wind turbine, the theoretical wind power curve rarely has real-life appeal, and much effort has been poured into developing more appropriate mathematical relationships that can explain the mapping from wind speed to wind power [see Wang et al. (2019) for a review]. To give perspective on the challenges confronting such curve fitting tasks, Fig. 1 shows a typical relationship between wind speed and wind power, with data taken from an actual wind farm; the relationship is evidently non-injective. Indeed, although wind speed is the most influential factor affecting wind power, other meteorological variables, such as wind direction, air density, or humidity, could all have an impact on wind power generation, and thus should be considered jointly during the wind power curve modeling (Lee et al., 2015). In this regard, the term “wind power curve” is almost always used in the broad sense: instead of restricting its meaning to a one-dimensional curve, when multiple predictors are involved, it suggests a power response surface.
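      To make the cubic dependence in the theoretical wind power expression concrete, the following minimal sketch evaluates $ P=0.5C_p\rho\pi R^2V^3 $ for two wind speeds. The function name and the default values of the power coefficient (0.4) and air density (1.225 kg m$^{-3}$) are illustrative assumptions, not values taken from this review.

```python
import math


def theoretical_wind_power(v, rotor_radius, cp=0.4, rho=1.225):
    """Theoretical power (W) extracted from wind at speed v (m/s).

    cp (power coefficient) and rho (air density, kg/m^3) are assumed,
    illustrative defaults; rotor_radius is in meters.
    """
    swept_area = math.pi * rotor_radius ** 2  # rotor disc area, m^2
    return 0.5 * cp * rho * swept_area * v ** 3


# Doubling the wind speed multiplies the theoretical power by 2^3 = 8.
p8 = theoretical_wind_power(8.0, rotor_radius=50.0)
p16 = theoretical_wind_power(16.0, rotor_radius=50.0)
print(p16 / p8)
```

      The ratio illustrates why even small wind speed errors translate into large power errors, which is one reason the theoretical curve alone is rarely adequate in practice.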

      Figure 1.  A typical scatter between 100-m hub-height wind speed and wind power; data are obtained from a real wind power plant. Brighter colors denote more points in the neighborhood.

      In contrast to the ubiquitously accepted phrase “wind power curve,” the term “solar power curve,” which should be analogously used to denote the mapping function from solar irradiance to solar power, is somewhat less popular. There are two reasons for this. First, there is an alternative and more descriptive terminology for the irradiance-to-power conversion framework, that is, the model chain, which cascades a series of energy meteorology models to convert irradiance into PV power in a step-by-step fashion. The second reason is that the irradiance-to-power conversion is even more intricate than the wind-speed-to-power conversion, such that a one-dimensional curve would be grossly insufficient to narrate the mapping. Be that as it may, this review should use the phrase “solar power curve” throughout, to denote the mapping from irradiance (and auxiliary variables) to PV power. (The conversion from irradiance to CSP power is not considered in this review.) Figure 2 depicts the relationship between the global horizontal irradiance (GHI) and the power output of an actual PV farm. Unlike the s-shaped wind-speed-to-power relationship, the scatter shown in Fig. 2 does not seem to be linked to the shape of any well-known mathematical functions. Therefore, the remaining part of this tutorial should elucidate how such a relationship between GHI and PV power can be narrated through modeling means.

      Figure 2.  A typical scatter between GHI and PV power; data are obtained from a real PV plant. Brighter colors denote more points in the neighborhood. (The normalized solar power does not reach 1 because the standard test condition, under which the nominal power is determined, is almost impossible to meet during operation.)

    2.   Two classes of approaches for solar power curve modeling
    • After half a century of research, the basic scientific and engineering principles governing electricity generation from PV are now known very well and very widely. However, it is also true that such principles do not belong to a single subject of study. For instance, the principle governing the transposition of horizontal irradiance components onto a tilted surface is one of physics; governing the composition and deposition of particulate matter on the PV panel surface is one of chemistry; and governing the DC/AC power inversion efficiency is one of electrical engineering. If all information relevant to operating those principles is known to a high exactitude, one may in theory calculate the PV power in a deterministic fashion with an exceptional quantitative precision. Unfortunately, this most delicate form of irradiance-to-power conversion faces two practical challenges: (1) the information needed to conform to all those principles is, more often than not, unknown, due to a lack of appropriate equipment and monitoring skill; and (2) the principles themselves may be incomplete or imperfect, which necessarily leads to conversion error. On this point, irradiance-to-power conversion via a solar power curve becomes relevant. There are two distinct classes of approaches with which one may construct a solar power curve, first of regression and second of model chain. Some authors also make a distinction between these two classes of techniques through the words “statistical” and “physical,” or through “single-stage” and “multiple-stage,” for obvious literal reasons (Yang and van der Meer, 2021; Markovics and Mayer, 2022).

      The regression approach to solar power curve modeling should be straightforward to comprehend, as it establishes a regressive relationship between the weather variables and PV power through statistical and machine-learning models. Because fitting a regression is a one-step procedure, it is a direct way of constructing a solar power curve. In contrast, model chain, as mentioned in the introduction, arranges a bag of energy meteorology models in cascade, each being responsible for a single conversion stage/mechanism within the whole process. Figure 3 visualizes a typical model chain, which takes as input time and location, GHI, ground albedo, ambient temperature, and wind speed, and issues as output the AC power. For instance, with time and location information, one may compute the solar zenith angle and the extraterrestrial irradiance via solar positioning; with GHI and its extraterrestrial counterpart, one could split GHI into a diffuse component and a beam component through a separation model; and with the DC power estimated by the PV model and the solar position information, the power loss due to row-to-row shading may be calculated. Generally, Fig. 3 clearly suggests that the output of a preceding model is used as input for a succeeding model, and the entire procedure resembles a chain-like assembly, which leads to the coining of the term “model chain.” Model chain signifies an indirect way of constructing a solar power curve.
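      The cascading idea can be sketched in a few lines of code. The example below is a deliberately truncated chain that starts from the global tilted irradiance rather than GHI, and strings together three simple stages: a Ross-type linear cell temperature model, a linear temperature-corrected DC power model, and a constant-efficiency inverter with clipping. All function names and coefficient values are illustrative assumptions; an operational model chain would contain the additional stages shown in Fig. 3 and more refined component models.

```python
def cell_temperature(gti, t_amb, k=0.03):
    """Ross-type linear model: cell temperature (deg C) rises with irradiance.

    k (K per W/m^2) is an assumed, illustrative coefficient.
    """
    return t_amb + k * gti


def dc_power(gti, t_cell, p_dc0=1000.0, gamma=-0.004):
    """DC power (W) scaled by irradiance and corrected for cell temperature.

    p_dc0 is the nominal DC power at 1000 W/m^2 and 25 deg C; gamma is an
    assumed temperature coefficient of power (1/K).
    """
    return p_dc0 * (gti / 1000.0) * (1.0 + gamma * (t_cell - 25.0))


def ac_power(p_dc, efficiency=0.96, p_ac_max=900.0):
    """Constant-efficiency inverter with clipping at its AC rating (W)."""
    return min(max(p_dc, 0.0) * efficiency, p_ac_max)


def model_chain(gti, t_amb):
    """Each stage consumes the output of the preceding stage."""
    t_cell = cell_temperature(gti, t_amb)
    p_dc = dc_power(gti, t_cell)
    return ac_power(p_dc)


p_mild = model_chain(gti=800.0, t_amb=20.0)   # cooler day
p_hot = model_chain(gti=800.0, t_amb=35.0)    # hotter day, same irradiance
p_clip = model_chain(gti=1500.0, t_amb=-10.0) # over-irradiance, inverter clips
print(p_mild, p_hot, p_clip)
```

      Even in this toy form, the chain reproduces two behaviors that a single irradiance-only curve cannot: lower output at higher ambient temperature for the same irradiance, and saturation at the inverter rating.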

      Figure 3.  Schematic of irradiance-to-power conversion via a typical model chain. A model chain takes GHI as the main input and outputs PV power. An arrow going into a block indicates a required input, whereas an arrow leaving a block indicates the output.

      Comparing the utilities of these two classes of approaches, neither strictly dominates the other. In terms of complexity, model chain most certainly requires more domain knowledge to execute. If we are to assume an increase in energy meteorology knowledge is accompanied by an increase in wisdom, model chain should result in higher irradiance-to-power conversion accuracy. The difficulty nevertheless is that model chain requires the design information and operating conditions of the PV plant, such as the panel wiring schematics, row spacing, inverter manufacturer and model, or the soiling condition, to be known, which is not always the case, especially for smaller distributed PV systems managed by individuals. In such situations, the straightforward option is to leverage the regression alternative. However, in most of the stages of a model chain, there are general model options that can be used without detailed design data, which makes it possible to rely on model chains even if the design information on the PV plants of interest is limited. The effect of the design data on the model chain accuracy was investigated by Mayer (2021), who compared five different scenarios of design-information availability, based on data collected at 16 PV plants in Hungary. The overarching conclusion of that study is that the full model chain encompassing all design parameters could give the best conversion results, but the most critical ones are only the site location, module orientation, and nameplate capacities. On the other hand, it must be highlighted that regression fitting demands a long-enough dataset, which can only be gathered as time passes. Therefore, it is not possible to use regression-based solar power curves for prospective or newly commissioned plants. Inasmuch as the present research can show, the two classes of approaches are both indispensable, as their accuracies are situation-dependent and are often comparable (Markovics and Mayer, 2022).

      The rivalry between statistical and physical modeling naturally leads to a third option: hybrid solar power curves. The principle underpinning the hybridization is very simple: One should use model chain up to a stage that the available information can support, and then leave the remaining fraction of the conversion process to regression. For example, one may use solar positioning, separation, and transposition models to obtain the global tilted irradiance (GTI; cf. Fig. 3), and then apply a neural network of some sort to map GTI to the PV power output. This kind of hybrid solar power curve has been investigated by Mayer (2022a). Earlier works on model chains revealed that the two most critical stages are clearly the separation and transposition modeling (Mayer, 2021; Mayer and Gróf, 2021). Logically, as long as the GTI and nominal power of the PV plant are known, the hybridized conversion would not be dramatically worse than a full-information case (Mayer, 2022a). Another driver for hybrid modeling is the fact that not even the most detailed model chains are perfectly accurate, and thus combining them with a regression method can help to eliminate the error patterns that can be identified from the historical data. Moreover, as shown by Mayer (2022b), even a perfectly accurate model chain will introduce bias in the PV forecasts depending on the plant design parameters due to the errors in the input GHI forecasts, which again calls for a data-driven correction step. However, since hybrid solar power curves also rely on a certain amount of historical data for training the regression part of the conversion, they are constrained by the same limitations as those limiting a regression-based solar power curve.
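      A minimal sketch of the hybrid idea is given below. It assumes that the physical part of the chain has already produced GTI, and it substitutes an ordinary least-squares fit for the neural network mentioned above, purely to keep the example short. The data are synthetic, and the feature set, coefficients, and function names are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Stand-in for the GTI delivered by the physical stages (W/m^2), plus
# ambient temperature (deg C); both are synthetic here.
gti = rng.uniform(0.0, 1100.0, n)
t_amb = rng.uniform(0.0, 35.0, n)

# Synthetic "measured" normalized power with a temperature-dependent loss
# and measurement noise; this generating rule is an assumption.
power = (0.9 * gti / 1000.0
         * (1.0 - 0.004 * (t_amb + 0.03 * gti - 25.0))
         + rng.normal(0.0, 0.01, n))


def features(gti, t_amb):
    """Regression features built from the model-chain output (GTI)."""
    g = gti / 1000.0
    return np.column_stack([np.ones_like(g), g, g * t_amb, g ** 2])


# Statistical part of the hybrid: fit the remaining GTI-to-power mapping.
coef, *_ = np.linalg.lstsq(features(gti, t_amb), power, rcond=None)


def hybrid_power(gti_new, t_amb_new):
    """Predict normalized power from GTI and ambient temperature."""
    g = np.atleast_1d(gti_new).astype(float)
    t = np.atleast_1d(t_amb_new).astype(float)
    return features(g, t) @ coef


rmse = float(np.sqrt(np.mean((hybrid_power(gti, t_amb) - power) ** 2)))
print(rmse)
```

      The split point is the design choice: everything that can be computed from known geometry stays physical, whereas the plant-specific losses that are hard to parameterize are learned from historical data.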

      Just as hybridization is a conspicuous extension of solar power curve modeling, probabilistic modeling constitutes another. Since both classes of approaches and their hybrids provide by default just point (i.e., deterministic) estimates of the PV power, it is attractive to inquire into ways to quantify any uncertainty associated with solar power curve modeling; this leads to a very new but exceptionally useful concept known as the probabilistic solar power curve. The idea of probabilistic or ensemble modeling should be well understood by atmospheric scientists, so there is no need to explain it further. However, given the fact that the notion of probability can be introduced in all too many ways into solar power curve modeling, with many being redundant and inefficient, the relevance of probabilistic modeling of solar power curves is in identifying the optimal strategy of doing so. In the remaining pages of this tutorial, we provide a thorough rundown on the two main classes of techniques for solar power curve modeling in sections 3 and 4, respectively. Then in sections 5 and 6, the hybridization and probabilistic extensions of solar power curve modeling are thoroughly elaborated, before concluding the tutorial.

    3.   Regression-based solar power curves
    • The setup behind regression-based solar power curve modeling is very simple, in that, one seeks to establish a mathematical mapping between the (normalized) output power of a PV system and a set of predictor variables (such as GHI, zenith angle, ambient temperature, or wind speed), and once the mapping is fitted/trained using historical data, one can estimate the (normalized) power output of the same system for any new vector of predictor variables. Mathematically, denoting the vector of predictor variables corresponding to instance $ i $ as $ {\bf{x}}_i = \left(x_{i}^{(1)}, x_{i}^{(2)}, \dots, x_{i}^{(m)}\right)^{{\top}} $, where symbol “$\top $” denotes the transpose of a vector/matrix, $ i = 1, \dots, n $ indexes the training samples that may or may not be time ordered, and $ j = 1, \dots, m $ indexes the elements in the $ m $-dimensional input vector, a regression-based solar power curve can be written as:

      $$ y_i = f\left({\bf{x}}_i; {\text{θ}}\right), \tag{1} $$

      where $ f $ is the mapping function to be established, $ {\text{θ}} $ is the vector of parameters of $ f $, and $ y_i $ is the $ i $th (normalized) power output value in the training set. Using the training set, the estimated value of $ {\text{θ}} $, as denoted by $ \hat{{\text{θ}}} $, can be found. Then, with any new $ {\bf{x}}_t $, i.e., $ t > n $, the predicted (normalized) power output at that new instance $ t $ would be:

      $$ \hat{y}_t = f\left({\bf{x}}_t; \hat{{\text{θ}}}\right). \tag{2} $$

      To give perspective, suppose $ f $ is a linear function of GHI ($ G_h $) and solar zenith angle ($ Z $), Eq. (2) would simply be:

      $$ \hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 G_{h,t} + \hat{\beta}_2 Z_t, \tag{3} $$

      where $ {\text{θ}} = (\beta_0, \beta_1, \beta_2)^{{\top}} $ are the linear regression coefficients. Certainly, moving beyond the simple linear model of Eq. (3), there are countless variants of input vector $ {\bf{x}} $ and choices of mapping function $ f $, which makes the regression-based solar power curve modeling exceedingly versatile.
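
      To make the fit-then-predict procedure of Eqs. (1)–(3) concrete, the following minimal Python sketch estimates the coefficients of the linear curve by ordinary least squares and evaluates the fitted curve at a new instance. The data are synthetic and noiseless, generated from hypothetical coefficients chosen purely for illustration, and the function name `predict` is an assumption of this sketch rather than anything prescribed here.

```python
import numpy as np

# Synthetic training set (hypothetical values, for illustration only).
ghi = np.array([150.0, 650.0, 400.0, 850.0, 700.0, 450.0])   # GHI, W/m^2
zenith = np.array([75.0, 40.0, 60.0, 25.0, 30.0, 55.0])      # zenith angle, degrees

# Noiseless normalized power generated from assumed coefficients,
# so that the recovered estimate can be checked against them.
true_theta = np.array([0.05, 0.0009, -0.0005])
power = true_theta[0] + true_theta[1] * ghi + true_theta[2] * zenith

# Design matrix with an intercept column: theta = (beta0, beta1, beta2).
X = np.column_stack([np.ones_like(ghi), ghi, zenith])

# Ordinary least squares estimate of theta.
theta_hat, *_ = np.linalg.lstsq(X, power, rcond=None)


def predict(ghi_new, zenith_new):
    """Evaluate the fitted linear power curve at a new predictor vector."""
    return theta_hat[0] + theta_hat[1] * ghi_new + theta_hat[2] * zenith_new


y_hat = predict(600.0, 45.0)  # prediction for a new instance t > n
```

      With real measurements the residuals would of course be nonzero, and a nonlinear or nonparametric $ f $ would replace the design matrix and the least-squares step, while the fit-then-predict pattern stays the same.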

      Regression-based solar power curve modeling is not a new concept, and various works had already appeared by around 2010 [e.g., Bacher et al. (2009); Huang et al. (2010)]. One of the most influential initiatives in promoting regression-based solar power curves is the Global Energy Forecasting Competition 2014 (GEFCom2014) set up by Hong et al. (2016). The competition endorsed the much-celebrated two-step solar forecasting procedure: first, numerical weather prediction (NWP), which supplies the predictor variables for the regression, and second, solar power curve modeling, which converts the forecast weather variables to forecast PV power. A total of 12 NWP forecast variables, including GHI, 2-m temperature, and total cloud cover, from the European Centre for Medium-range Weather Forecasts (ECMWF), were disclosed to the contestants in a rolling manner spanning several weeks, and the contestants were tasked to construct a solar power curve, and thus forecast the PV power output at three Australian sites based on newly released NWP forecasts. GEFCom2014 brought to light several strategies for enhancing the performance of regression-based solar power curve modeling, which, also in view of other evidence from the literature, are thought to be quite general. They are, in order of importance: (1) utilization of clear-sky information; (2) feature selection and engineering; (3) probabilistic and ensemble modeling; and (4) other known general guidelines for regression applications in solar engineering, such as opting for nonparametric and/or tree-based methods [e.g., see conclusions of Yagli et al. (2019); Yang (2019a); Yang and Gueymard (2021a, b)].

      The clear-sky condition refers to a cloud-free atmosphere; one should not confuse that with an “atmosphere-less” condition. Stated differently, perfectly modeled clear-sky irradiance would account for all transmittances except for that of clouds. The reason that the utilization of clear-sky information is ranked with the highest importance in solar power curve modeling is this: The winning team of GEFCom2014 (Huang and Perry, 2016) was the only team that integrated clear-sky information into its modeling process, which explains to a large extent the substantial leading margin between the forecast performance of the winning team and that of the other teams. In fact, clear-sky irradiance/PV power is the best way to describe the seasonal and diurnal variability in irradiance/PV power, which has been recognized as such since at least the 1980s (Chowdhury and Rahman, 1987). The second most important aspect of solar power curve modeling is feature selection and engineering, which is evidenced by the fact that such strategies in one form or another were adopted by all top-five teams in GEFCom2014. The importance of feature selection and engineering has also been confirmed by Markovics and Mayer (2022), who compared 24 machine-learning-based solar power curves and concluded that feature selection and engineering have an even higher effect on accuracy than the functional forms of the curves themselves. Stated differently, proper feature selection and engineering outweigh the choice of regression method. Third in order of importance is probabilistic and ensemble modeling of PV power output, which serves as an uncertainty quantification tool. As GEFCom2014 requested the PV power forecasts to be submitted in the form of quantiles, all top-five teams chose nonparametric approaches, among which variants of quantile regression and gradient boosting were most popular. These strategies of different orders of importance are elaborated further in the next few subsections.

    • In a purely regressive setting, clear-sky information can be represented in either irradiance terms or power terms. A very large amount of effort has been devoted to clear-sky irradiance modeling. About 100 clear-sky irradiance models have been proposed to date, ranging from simple empirical models to physical ones that explicitly consider broadband radiative transfer using effective parameterizations. The performance of clear-sky irradiance models, just like that of any other family of energy meteorology models, varies across geographical locations and time periods. In a recent pair of works, Sun et al. (2019, 2021) compared 75 models for global irradiance and 95 models for diffuse and direct irradiance, which are by far the most inclusive and informative documents on this topic, despite the existence of other smaller efforts [e.g., Engerer and Mills (2015); Ruiz-Arias and Gueymard (2018); Antonanzas-Torres et al. (2019)]. The overarching conclusion of Sun et al. (2019, 2021) is that physical models, with the REST2 model (Gueymard, 2008) being the highest-performing one, have clear ascendancy over empirical models in terms of accuracy. However, as argued by Yang (2020), the benefits of the highest-accuracy clear-sky irradiance models are not always quantifiable when such models are used in solar applications. For example, it has been shown that even REST2-modeled irradiance is unable to yield a clear-sky index (the ratio between GHI and its clear-sky counterpart) that is second-order stationary, perhaps due to the lack of accurate information on aerosols and water vapor (Yang, 2020). As such, the choice of clear-sky models, as used in a solar power curve modeling context, needs further discussion.

      In what follows, we briefly discuss three clear-sky models for global irradiance, in increasing order of model performance: the Ineichen–Perez model (Ineichen and Perez, 2002), the McClear model (Lefèvre et al., 2013; Gschwind et al., 2019), and the REST2 model (Gueymard, 2008). The Ineichen–Perez model relies only on calculable and climatological inputs, which implies that the model is static, and thus can be computed over any time period. This model is less elaborate in formulation than the other two, and thus lends itself to direct implementation. The entire model depends on just one exogenous variable, namely, Linke turbidity. Because Linke turbidity is not a commonly available variable, its monthly climatology values are used. Gridded monthly average Linke turbidity can be downloaded from the SoDa website in the format of georeferenced TIFF maps. These maps have global coverage, and the user just needs to read off the value for the location of interest. The model can be implemented with just a few lines of code, and standard versions are available in the $\rm{pvlib}$ library of Python (Holmgren et al., 2018) and the $\rm{SolarData}$ package of R (Yang, 2018b, 2019b).
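
      As an indication of how compact the Ineichen–Perez model is, a minimal sketch of its clear-sky GHI formulation is given below. The sketch assumes, unless an air mass is supplied, a plane-parallel air mass of $ 1/\cos Z $ at sea level in place of the more refined air mass expressions used in standard implementations; the function name, the extraterrestrial irradiance of 1361 W m$^{-2}$, and the Linke turbidity values are hypothetical choices for illustration. For actual work, the standard library versions (e.g., `pvlib.clearsky.ineichen`) are preferable.

```python
import math


def ineichen_perez_ghi(zenith_deg, linke_turbidity, altitude=0.0,
                       e0n=1361.0, airmass=None):
    """Clear-sky GHI (W/m^2) following the Ineichen-Perez formulation.

    zenith_deg      : solar zenith angle in degrees
    linke_turbidity : Linke turbidity (e.g., a monthly climatology value)
    altitude        : site altitude in meters
    e0n             : extraterrestrial beam normal irradiance (assumed value)
    airmass         : absolute air mass; if None, 1/cos(Z) is used, which is
                      a sea-level, plane-parallel simplification
    """
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        return 0.0  # sun at or below the horizon
    if airmass is None:
        airmass = 1.0 / cos_z

    # Altitude-dependent coefficients of the model.
    fh1 = math.exp(-altitude / 8000.0)
    fh2 = math.exp(-altitude / 1250.0)
    cg1 = 5.09e-5 * altitude + 0.868
    cg2 = 3.92e-5 * altitude + 0.0387

    transmittance = math.exp(
        -cg2 * airmass * (fh1 + fh2 * (linke_turbidity - 1.0))
    )
    return cg1 * e0n * cos_z * transmittance


ghi_clean = ineichen_perez_ghi(30.0, linke_turbidity=3.0)
ghi_turbid = ineichen_perez_ghi(30.0, linke_turbidity=5.0)
```

      Since Linke turbidity is the sole exogenous input, swapping one monthly climatology value for another is all that is needed to move the calculation to a different month or site.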

      The McClear model is the only clear-sky irradiance model that is not open source, in that it can only be accessed from the SoDa website via a web service. On a positive note, this web service allows browser-based downloading of all three clear-sky irradiance components (global horizontal or $ G_{hc} $, diffuse horizontal or $ D_{hc} $, and beam normal or $ B_{nc} $) for global locations, for a time range of 2004 to two days ago, in 1-min to 1-month resolutions. Knowing that surface radiation is marginally affected by altitude, McClear applies an on-the-fly altitude correction to the radiation values. Although the web service is occasionally unavailable due to scheduled maintenance or server downtime, the service is free of charge. In situations where the McClear irradiance needs to be integrated with other computer tasks, one can also access McClear programmatically, with the help of the official R package $\rm{camsRad}$ (Lundstrom, 2016). The obvious drawback of McClear lies in the proprietary nature of its implementation, which prevents other researchers from modifying the model. For forecasting applications, McClear only offers forecast clear-sky irradiance up to two days ahead, which is shorter than the typical day-ahead horizon required for grid integration; this is another major shortcoming of McClear.

      The highest-performance REST2 model, being a physical model, demands nine atmospheric parameters to operate: extraterrestrial beam normal irradiance ($ E_{0n} $), zenith angle ($ Z $), ground albedo ($ \rho_g $), surface pressure ($ p $), aerosol optical depth at 550 nm ($ \tau_{550} $), Ångström exponent ($ \alpha $), total column ozone ($ u_{\text{O}_3} $), total nitrogen dioxide amount ($ u_{\text{NO}_2} $), and total precipitable water vapor ($ u_{\text{H}_2\text{O}} $). To that end, if REST2 is to be used as a predictor of the solar power curve, all nine parameters have to be solicited in advance. The current recommendation for using REST2 is to power it with reanalysis data, such as the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) or Copernicus Atmosphere Monitoring Service [CAMS; Fu et al. (2022)]. Nonetheless, as reanalyses are not real-time, using MERRA-2 or CAMS limits the REST2 model to resource assessment applications, whereas for forecasting applications, alternative sources of input parameters need to be sought. Among the nine inputs to REST2, $ E_{0n} $ and $ Z $ can be calculated via solar positioning, whereas $ \rho_g $, $ p $, and $ u_{\text{H}_2\text{O}} $ are common output fields of NWP models. Information related to aerosol (i.e., $ \tau_{550} $ and $ \alpha $) and other chemical species (e.g., $ u_{\text{O}_3} $ and $ u_{\text{NO}_2} $), however, is usually not available in regular NWP models, but in atmospheric composition models. Indeed, neither ECMWF’s High Resolution (HRES) model nor the National Centers for Environmental Prediction’s (NCEP’s) North American Mesoscale (NAM) model offers forecasts of aerosol and other chemical species. 
One possible source of forecast aerosol, ozone, and nitrogen dioxide is the CAMS Global Atmospheric Composition Forecasts from ECMWF, which produces forecasts twice daily at an approximately 40-km spatial grid on 137 vertical levels, and has been operational since July 2015. Implementation-wise, REST2 is complex, and only a handful of researchers other than the inventor himself have attempted to implement it [e.g., Engerer and Mills (2015); Zhong and Kleissl (2015); Sun et al. (2019)]. Among these implementations, the only reproducible one is offered by Sun et al. (2019), who made the R code open source, which has subsequently attracted much practical uptake. Other initiatives, such as the MERRA-2 downloading Python library $\rm{irradpy}$ (Bright et al., 2020), have followed. To visualize the clear-sky GHI modeled by REST2, Fig. 4 provides an example, in which the REST2 GHI at Table Mountain, United States, over September 2018, is displayed alongside the satellite-derived irradiance from the National Solar Radiation Database (NSRDB) and ground-based measurements from the Surface Radiation Budget Network (SURFRAD). Worth noting is that REST2 is able to compute the clear-sky expectations for all three irradiance components; nevertheless, its clear-sky beam normal irradiance (BNI) estimates are extremely sensitive to aerosol loading, as evidenced by Fig. 5. Those “artifacts” are caused by rapid changes in aerosol (Yang, 2021b); the ability of REST2 to capture such changes is therefore commendable.

      Figure 4.  Clear-sky GHI time series modeled using REST2, at Table Mountain (40.125° N, 105.237° W), United States, over September 2018, alongside the satellite-derived irradiance from the National Solar Radiation Database (NSRDB) and ground-based measurements from the Surface Radiation Budget Network (SURFRAD). The time on the $x$-axis is local time.

      Figure 5.  NSRDB’s BNI (powered by MERRA-2) time series plot for six selected days in 2018 and 2019, at Bondville (40.052° N, 88.373° W), United States. This plot exemplifies the exceptional, but legitimate, sudden changes in clear-sky BNI, which are caused by surges and a lack of temporal interpolation of the hourly aerosol optical depth.

      Moving beyond acquiring clear-sky irradiance, one may also choose to obtain the clear-sky PV power output, which can describe the bell-shaped diurnal transient of PV power better than the clear-sky irradiance. There are three ways to obtain clear-sky PV power. The most intuitive option is to pass, instead of the all-sky irradiance, the clear-sky irradiance and other auxiliary variables through a model chain [e.g., see Engerer and Mills (2014)]. However, due to the involvement of a model chain, this approach should no longer be considered as purely regressive, and so its discussion is deferred to section 5, in which hybrid solar power curves are reviewed. The second approach involves the identification of clear-sky situations within a PV power time series. Once clear-sky situations are identified, one may then construct a separate regression for those situations only. Notwithstanding, the identification of clear-sky situations in terms of PV power is not as straightforward a task as one may perceive; the reader should refer to Peratikou and Charalambides (2022) for an example of such methods. The last and simplest option is to invoke statistical time series decomposition methods, to retrieve the seasonal components of the time series; this was in fact the approach of Huang and Perry (2016), which is directly responsible for the team's success in GEFCom2014. In their approach, the seasonal components were represented by Fourier terms with low-pass frequency components determined from the data. Although the use of Fourier and other time series decompositions to model clear-sky instances can be dated back at least to the early 2010s (Yang et al., 2012; Dong et al., 2013), such decompositions are known to be inferior to proper clear-sky models. The reason why Huang and Perry (2016) chose the Fourier-based method is that the site locations were not revealed during the competition, which is thought to be an oversight of the organizers, so solar positioning was not possible. 
However, in hindsight, the choice must be deemed fruitful and hence offers some penetrating insights into circumstances of this sort—the design parameters of PV plants are not always available, or may be too inhomogeneous for a model chain to be effective (e.g., sites over complex terrain or mixed use of panels and inverters).

      The reason why clear-sky modeling is said to have absolute importance to solar power curve modeling lies in its ability to de-seasonalize (i.e., remove both the seasonal and diurnal, or “double-seasonal,” cycles) the GHI and PV power time series. It has long been known that, when modeling a time series, one should seek the best way to stabilize the variance of the series being modeled; in other words, de-seasonalizing a time series before modeling is a general principle that can enhance a model’s predictive performance (Armstrong, 2001; Hyndman and Athanasopoulos, 2018). The seasonal components of GHI and PV power are both multiplicative rather than additive, which implies that the de-seasonalized quantities should be acquired through division. More specifically, the clear-sky index (often denoted as $ \kappa $) is the ratio of GHI to clear-sky GHI, which may be regarded as a normalized version of GHI, for its value usually falls between 0 and 1.2 (Pedro et al., 2019). Similarly, the clear-sky index of PV (denoted as $ k_\text{PV} $ as advocated by Engerer and Mills, 2014) is the ratio of PV power to clear-sky PV power. In regression-based solar power curve modeling, the predictand, i.e., the quantity being regressed, ought to be $ k_\text{PV} $, whenever its determination is possible. This is a rule-of-thumb principle that must not be overlooked. Naturally, if the predictand is $ k_\text{PV} $, one should use $ \kappa $ as a predictor instead of GHI itself.
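
      The multiplicative de-seasonalization described above amounts to an element-wise division, which the following minimal sketch illustrates. The function name, the zenith-angle cutoff of 85°, and the sample values are assumptions made for illustration; the cutoff is one common way to avoid dividing by near-zero clear-sky values around sunrise and sunset, not a prescription.

```python
import numpy as np


def clear_sky_index(value, clear_sky_value, zenith_deg, max_zenith=85.0):
    """Ratio of an all-sky quantity (GHI or PV power) to its clear-sky
    counterpart; entries at night or at low sun are returned as NaN."""
    value = np.asarray(value, dtype=float)
    clear = np.asarray(clear_sky_value, dtype=float)
    zenith = np.asarray(zenith_deg, dtype=float)

    # Keep only daytime samples with a strictly positive clear-sky value.
    valid = (zenith < max_zenith) & (clear > 0.0)
    index = np.full(value.shape, np.nan)
    index[valid] = value[valid] / clear[valid]
    return index


# Hypothetical samples: night, low sun, partly cloudy, cloud enhancement.
ghi = [0.0, 50.0, 400.0, 900.0]
ghi_clear = [0.0, 80.0, 800.0, 850.0]
zenith = [95.0, 86.0, 50.0, 20.0]
kappa = clear_sky_index(ghi, ghi_clear, zenith)
```

      The same function applies to PV power and clear-sky PV power to give $ k_\text{PV} $; a prediction of $ k_\text{PV} $ is converted back to power by multiplying with the clear-sky PV power at the target time.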

    • The next important discussion to have about regression-based solar power curve modeling is feature selection and engineering. First, one ought to acknowledge that the weather system is complex in the sense that everything is related to everything. However, although a typical NWP system issues forecasts of hundreds of variables, one must not expect all of those variables to be meaningful (i.e., statistically significant) predictors for PV power. Scanning through the literature, statistical methods for feature selection and dimension reduction are abundant and have been popularly applied to weather variables, to identify the relevant and appropriate ones that can contribute to the explanatory power of a regression [e.g., Juban et al. (2016); Nagy et al. (2016); Persson et al. (2017)]. The strategy is intuitive, even trivial, to anyone with basic literacy in data science. In comparison, relatively little attention is paid to feature engineering, especially to how meteorological knowledge can be best integrated into the modeling and forecasting of solar power. Indeed, most feature engineering approaches found in the relevant literature have hitherto been limited to general-purpose ones, such as deriving lagged versions of predictors [e.g., Persson et al. (2017)], statistical aggregation and smoothing of predictors [e.g., Pedro et al. (2019)], or automatic feature generation and extraction via machine learning [e.g., Acikgoz (2022)]. Since there are infinitely many ways of doing feature selection and engineering, it is not possible to conclude with high certainty that one method dominates another, especially when the test dataset used by one work is seldom used in another. Thus, instead of continuing the “who did what” kind of monotonous enumeration of references, which has been done all too many times with high repetitiveness [e.g., Voyant et al. (2017); Sobri et al. (2018); Ahmed et al. (2020)], this section presents in the most concise manner those features that are thought absolutely essential for PV power forecasting.

      There are five classes of meteorological variables, i.e., relevant features that enter a regression-based solar power curve, that are thought to be beneficial to PV power prediction. The rationale for selecting each class is as follows:

      Irradiance: It must be universally accepted that GHI is the most influential parameter determining the PV power output. However, other shortwave and longwave irradiance components are also more useful than not in solar power curve modeling. For instance, BNI and the diffuse horizontal irradiance (DHI) contribute to irradiance on an inclined surface by different mechanisms: Whereas the former simply follows geometry, the latter is related to the sky-view factor (see section 4.3 for information). This has been known since at least the 1960s (Kamphuis et al., 2020). As for longwave radiation, research has shown that its impact on the energy budget and temperature dynamics of PV is profound (Heusinger et al., 2020; Barry et al., 2020; Yang et al., 2021). Most importantly, as mentioned earlier, clear-sky irradiance (and to a certain extent, extraterrestrial irradiance) is able to explain very well the multiplicative seasonal components in GHI and PV power time series, and thus ought to be included as a predictor. One should take special note that, even if the predictand is $ \kappa $ or $ k_\text{PV} $, including clear-sky irradiance or extraterrestrial irradiance is still thought important, for the de-seasonalized series could still contain some small-scale cyclic component due to the deficiency in clear-sky modeling (Yang, 2020).

      Temperature: Besides irradiance, the second most influential class of variables to PV power is temperature, which primarily includes ambient temperature, module temperature, and cell temperature, among which the latter two may be derived from the first (see section 4.5 for detail). It is customary to account for the effect of temperature on PV power output through temperature coefficients. An increase in temperature reduces the bandgap of a semiconductor, which correspondingly increases the energy of the electrons in the material. Solar cells under higher temperatures have a slightly elevated short-circuit current but a much lower open-circuit voltage, which translates to an overall 0.2%–0.45% °C$^{-1}$ decrease in cell efficiency, and thus an eventual drop in PV power.

      Wind: Near-surface wind speed and to a lesser extent wind direction, which are commonly output at a height of 10 m by NWP models, have a noticeable effect on the module temperature. Because ambient temperature directly affects the module and cell temperature, the cooling effect enabled by wind is but secondary in affecting PV power. Heat removal from PV panels through convection has been studied thoroughly [e.g., Wu et al. (2017); Ceylan et al. (2019)], and in-depth scientific details, such as how 10-m wind translates to rear-side wind or how the convection Nusselt number varies with tilt angle, wind direction and velocity, have been understood to a great extent. Notwithstanding, to what degree the complex heat transfer process can be captured by regression models remains largely unclear. Additionally, the mapping between the 10-m wind information and the wind flowing across a flat inclined surface cannot be modeled without complex fluid mechanics simulation.

      Albedo: For the purpose of maximizing the total annual direct radiation, PV panels are installed on an inclined surface with a tilt comparable to the site’s latitude. Due to this inclination, the portion of irradiance due to ground reflection is not to be neglected. The surface albedo, which is the ratio of global upwelling to downwelling irradiance (Gueymard et al., 2019), is thus identified as another important variable for solar power curve modeling. Whereas the irradiance due to ground reflection may be calculated with a simple formulation without losing too much accuracy, albedo is also responsible for another mechanism called backscattering. When backscattering is strong, the GTI is further boosted, a phenomenon known as albedo enhancement (Gueymard, 2017a, b). During solar power curve modeling of conventional PV, the broadband albedo is usually sufficient, and it can be acquired by remote sensing means.

      Cloud: It has been argued that a good clear-sky irradiance model should account for all sources of variability in solar irradiance except for that of clouds, and the a priori cardinal importance of cloud information goes without saying. Cloud cover, for instance, is a main statistic describing the clouds and has been deemed useful in time series forecasting (Yang et al., 2012, 2015). As such, both the GEFCom2014 dataset (Hong et al., 2016) and the ECMWF HRES dataset for solar forecasting research (Yang et al., 2022a) include cloud cover as a variable. In another case, since in NWP, the cloud radiative effect is dominantly determined by the liquid (ice) water path and effective radius, one may directly employ those variables instead of cloud cover, which is more statistical than physical. NWP models and remote-sensing techniques provide a wide range of variables related to clouds, such as the cloud optical depth or cloud phase. All of those are thought useful for regression-based solar power curve modeling, although their degree of usefulness may vary.

      In selecting the meteorological features for solar power curve modeling, the domain knowledge of energy meteorology surely plays a part. This is also true for feature engineering. For instance, if the predictand is PV power, one may choose to simply multiply cloud cover with the clear-sky PV power, to arrive at a more meaningful feature; if the predictand is $ k_\text{PV} $, one may choose to map the cloud cover, which represents a form of cloud index, to clearness or clear-sky index, through some predefined function [the concept is similar to the mapping function used in the Heliosat family of methods for radiation retrieval from satellites, see Passias and Källbäck (1984); Cano et al. (1986); Rigollier et al. (2004)]. Similarly, for ambient temperature, instead of using it directly, one may convert it to a percentage representing the reduction in PV power, through, e.g., the well-known expression “$ 1-\gamma_{P_\text{mpp}}(T_\text{cell} -25) $,” which is discussed further in section 4.6.1. For wind, according to the Sandia Array Performance Model [SAPM; King et al. (2004)], the rear-side module temperature relates to wind speed through scaled exponents; hence, linearly scaling the wind speed and then taking the exponential is likely to be more effective than letting the regression figure out such a relationship on its own. At this stage, one should notice from these examples that the kind of feature engineering that we have considered is closely coupled with energy meteorology models, in that the feature-engineered regression could be viewed as a hybrid solar power curve, for it integrates the regression concept with some of the component models of a model chain. Section 5 elaborates on such possibilities further. Another highly effective feature engineering tactic is to consider spatial information. Although spatio-temporal information is already embedded in irradiance forecasts from physics-based methods, it has been reported that using forecasts from pixels or lattice points neighboring the focal location is able to drive the forecast accuracy even higher (Mazorra Aguiar et al., 2016; Pedro et al., 2019; Yagli et al., 2022).
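
      The three knowledge-based features mentioned above can be sketched in a few lines of Python. All function names are hypothetical, and the numerical defaults are assumptions for illustration only: the temperature coefficient of 0.004 °C$^{-1}$ is a typical crystalline-silicon magnitude, and the SAPM coefficients $ a = -3.56 $ and $ b = -0.075 $ correspond to one of the tabulated module and mounting configurations of King et al. (2004), which may not suit the plant at hand.

```python
import math


def temperature_factor(t_cell, gamma=0.004):
    """Fraction of rated power retained at cell temperature t_cell (deg C),
    following 1 - gamma * (T_cell - 25); gamma is an assumed coefficient."""
    return 1.0 - gamma * (t_cell - 25.0)


def sapm_wind_feature(wind_speed, a=-3.56, b=-0.075):
    """Wind feature exp(a + b * WS): the factor that scales irradiance into
    module heating above ambient in the SAPM temperature model."""
    return math.exp(a + b * wind_speed)


def cloud_scaled_power(cloud_cover, clear_sky_power):
    """Product of cloud cover (0 to 1) and clear-sky PV power, used as a
    single engineered predictor when the predictand is PV power."""
    return cloud_cover * clear_sky_power


# Hypothetical NWP forecast values for one time stamp.
features = {
    "temp_factor": temperature_factor(45.0),
    "wind_calm": sapm_wind_feature(0.0),
    "wind_breezy": sapm_wind_feature(5.0),
    "cloud_x_clear": cloud_scaled_power(0.5, 0.8),
}
```

      Each engineered value would replace, or sit alongside, the raw variable in the predictor vector, so that the regression only has to learn a near-linear relationship.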

    • Insofar as the regression methodology is concerned, solar power curve modeling is conceptually identical to post-processing of weather forecasts; it is just that the predictand, instead of being the measured/remote-sensed weather parameter, is now (normalized) PV power. On this point, the vast majority of knowledge and insights derived and gathered from forecast post-processing investigations can be directly transferred to regression-based solar power curve modeling. Because forecasts can be either deterministic or probabilistic, of which the latter can be sub-categorized into interval, quantile, ensemble and distributional forecasts, post-processing can be summarized into four mutually exclusive but collectively exhaustive types: (1) deterministic-to-deterministic (D2D), (2) probabilistic-to-deterministic (P2D), (3) deterministic-to-probabilistic (D2P), and (4) probabilistic-to-probabilistic (P2P) post-processing. This typology originally proposed by Yang and van der Meer (2021) is thought readily applicable to the current task.

      D2D solar power curve modeling is possibly the most abundant and definitely the most fundamental case in the present literature. Early demonstrations of D2D solar power curves focused on using weather variables as a supplement to extrapolative time series methods (Bacher et al., 2009; Huang et al., 2010; Kardakos et al., 2013). Stated differently, the time series methods themselves are able to project PV power into the future, but weather variables are thought to offer additional information for that projection. With the lapse of time, solar forecasters have expanded the boundary of modeling in several directions, in that the methods are now more numerous, procedures more elaborate, and comparisons more thorough. For example, 11 regression-based solar power curves were compared by Visser et al. (2022) to a model chain, on data from 152 PV systems in the Netherlands, in a setting dealing with day-ahead market (DAM) operations. With a total of 17 variables from the ECMWF HRES and site-related information, the study concluded that ensemble learning and deep learning are more advantageous than simple linear regression and support vector regression, as might be expected. An interesting note, however, is that although ensemble learning and deep learning methods were able to outperform the model chain in terms of mean absolute error, from an economic perspective, which considers both the initial revenues made on the DAM and the net imbalance costs due to the observed forecast error, the model chain was found to be superior.

      P2D solar power curve modeling requires more consideration. When probabilistic weather variables, e.g., the outcome of ensemble forecasting, are available, there are two alternative ways in which they can be mapped to deterministic PV power. One of those is to first summarize the ensemble into a deterministic set of weather variables, and proceed with a D2D conversion procedure. The other is to apply the D2D conversion procedure to each ensemble member, and then summarize the ensemble PV power into a deterministic one. Ensemble, as a very general strategy, has been widely shown to be able to reduce the uncertainty in the data, model, and parameters; the reader is referred to Mayer and Yang (2022), Yang and Gueymard (2021b), Yang and Dong (2018), and Wolff et al. (2016) for a few solar case studies, while noting that searching the literature would result in hundreds of similar works. In the case of solar power curve modeling, Pierro et al. (2016) considered ECMWF and an original and a post-processed version of the Weather Research and Forecasting model (WRF) as three sources of NWP inputs, which, when paired with four regression models, resulted in numerous combinations, each giving a distinct set of PV power predictions. Their conclusion suggests that the discrimination in the final forecast performance is mainly due to the choice of NWP input, whereas various regressions with the same NWP input yield highly similar forecasts. Pairing ensemble inputs with ensemble power curves is discussed further in section 6.
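
      That the two P2D routes are not interchangeable can be seen from a toy example. The sketch below assumes a deliberately simple, hypothetical power curve, in which normalized power is proportional to GHI and clipped at 0.8, together with a four-member GHI ensemble and the mean as the summarizing statistic; none of these choices is prescribed here. Whenever the power curve is nonlinear, summarizing before conversion and converting before summarizing generally give different answers.

```python
from statistics import mean


def power_curve(ghi, capacity=0.8):
    """Hypothetical D2D curve: normalized power proportional to GHI,
    clipped at an assumed inverter capacity."""
    return min(ghi / 1000.0, capacity)


# Hypothetical GHI ensemble members (W/m^2) for one time stamp.
members = [600.0, 800.0, 1000.0, 1200.0]

# Route 1: summarize the weather ensemble first, then convert once.
summarize_then_convert = power_curve(mean(members))

# Route 2: convert each member, then summarize the power ensemble.
convert_then_summarize = mean(power_curve(g) for g in members)
```

      Here the first route returns the clipped value of 0.8, whereas the second returns 0.75, because the clipping acts on three of the four members individually.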

      Similar to P2D solar power curve modeling, one also faces in D2P modeling the choice of whether the deterministic set of weather variables should be processed into a probabilistic set and then converted to probabilistic PV power with one power curve, or should be directly converted into a probabilistic set of PV power via ensemble power curves. Again, a formal discussion is deferred to section 6. For now, one should note that D2P solar power curve modeling is exemplified by the setup of GEFCom2014, where the contestants were tasked to convert the deterministic NWP forecasts into quantiles (Hong et al., 2016). In regressing the weather variables into quantiles, one may simply adopt the quantile regression and its variants, with the predictand being the (normalized) PV power. However, besides probabilistic regressions, there are two other forms of D2P solar power curves, namely, analog ensemble (AnEn) and method of dressing. Whereas AnEn seeks to search for weather patterns in history that are similar to the one at hand, and then uses the corresponding historical PV power measurements as predictions for the current PV power, the method of dressing leverages the errors of historical PV power predictions and dresses them onto the current prediction. The reader is referred to Pierro et al. (2022) for an AnEn-based solar power curve, but the literature seems to lack an example of the method of dressing at the time of writing.
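
      The analog search underlying AnEn can be sketched as a nearest-neighbor lookup. In the sketch below, similarity is measured by the Euclidean distance between standardized predictor vectors, which is an assumption of this illustration rather than the metric of any particular AnEn implementation; the function name, the number of analogs, and the data are likewise hypothetical.

```python
import numpy as np


def analog_ensemble(x_hist, y_hist, x_new, k=3):
    """Return the historical PV power values of the k past instances whose
    weather predictors are most similar to the current forecast x_new."""
    x_hist = np.asarray(x_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Standardize each predictor so that no variable dominates the distance.
    mu = x_hist.mean(axis=0)
    sd = x_hist.std(axis=0)
    sd[sd == 0.0] = 1.0
    dist = np.linalg.norm((x_hist - mu) / sd - (x_new - mu) / sd, axis=1)

    # The k closest analogs form the ensemble of PV power predictions.
    nearest = np.argsort(dist)[:k]
    return y_hist[nearest]


# Hypothetical history: columns are clear-sky index and temperature (deg C).
x_hist = [[0.20, 10.0], [0.50, 15.0], [0.55, 16.0], [0.90, 25.0], [0.52, 14.0]]
y_hist = [0.15, 0.40, 0.45, 0.75, 0.42]   # normalized PV power

members = analog_ensemble(x_hist, y_hist, x_new=[0.50, 15.0], k=3)
quantiles = np.quantile(members, [0.1, 0.5, 0.9])
```

      The members can be used directly as an ensemble forecast or, as in the last line, summarized into quantiles of the kind requested in GEFCom2014.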

      The last category of solar power curves is P2P. This category of curves requires the input to be an ensemble representation of weather variables, which can be produced either by running the same weather model with perturbed initial conditions or by assembling predictions from several weather models. Doubleday et al. (2021) presented the first PV power forecasting application using Bayesian model averaging (BMA), which is a form of dressing method that places a parametric distribution around each deterministic forecast. In their approach, forecasts from a poor man’s ensemble with four NWP models were individually converted to PV power using a solar power curve similar to the regression used by Ayompe et al. (2010). With those ensemble PV power forecasts, each member is dressed with a two-part density function, which is a discrete–continuous mixture, to explicitly model the effect of inverter clipping, which refers to the trimming of the power output when the maximum capacity of the inverter is reached. The method has been thoroughly compared to, and showed superiority over, the ensemble model output statistics (EMOS), which is another P2P method. One drawback of their proposal may be the lack of comparison to nonparametric approaches.

    4.   Model-chain-based solar power curves
    • The initial conceptualization and the subsequent uptake of model chain predate the advent of modern solar forecasting, in both academia and industry. This is because resource assessment, as a long-standing procedure recognized as necessary for any PV plant development and performance evaluation, also requires a model chain. To give perspective, Beyer et al. (2004) had already employed the concept of model chain in their early work on PV system performance evaluation, and the results were quite stimulating, for the scatter plots between the measured power and modeled AC power were tightly packed around the identity line, indicating a high degree of correspondence between the two. Similarly, in industry, the commercial software $\rm{PVSyst}$ (Mermoud, 1994), which is still ubiquitously accepted for bankability reports today, was already well developed back in 1992, and the capabilities of the software in terms of 3D shading analysis, simulation of a stand-alone PV system, and pumping PV systems were already quite powerful. Most certainly, with the development and popularization of the $\rm{pvlib}$ Python library (Holmgren et al., 2018), model chain has seen another wave of significant advancement and uptake. Given its history, making a compendium on model chain would easily fill a book; therefore, this tutorial is presented on a “need-to-know” basis, and important references that lead to the current discussions are carefully summarized for further reading.

    • The position of the sun relative to an observer on earth can be fully described by two angles: the solar zenith angle ($ Z $) and the solar azimuth angle ($ \varphi_s $). In many early papers and introductory textbooks [e.g., Michalsky (1988); Masters (2013); Vignola et al. (2020)], the calculation formulas for these two angles are given as

      where $ \alpha = \pi/2-Z $ is the elevation angle, $ L $ is the latitude of the location at which solar positioning is conducted, $ \delta $ is the solar declination, and $ H $ is the hour angle. There are two things to note when using these formulas. First, they give only approximations rather than the exact astronomical expressions, and more precise alternatives are available [e.g., Blanc and Wald (2012); Grena (2012); Hoadley (2021)]. Second, there may be some computational issues resulting from the trigonometry involved. More specifically, $ \varphi_s $ in Eq. (5) follows a zero-north–east-positive convention, in that its range is from $ 0^\circ $ to $ 360^\circ $, whereas in many solar applications a zero-south–east-positive–west-negative convention is assumed. In both cases, because the inverse sine function is ambiguous, i.e., $ \sin \varphi_s = \sin(\pi - \varphi_s) $, a test is needed to determine the correct solution [see p. 17 of Vignola et al. (2020) for details].
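      One way to sidestep the inverse-sine ambiguity is sketched below. It is not the formulation of the cited textbooks: it assumes the textbook relation $ \sin\alpha = \cos L\cos\delta\cos H + \sin L\sin\delta $ for the elevation, but obtains the azimuth from the east and north components of the sun vector with a two-argument arctangent, which returns the zero-north–east-positive angle directly. The hour angle is assumed negative before solar noon, and the function name is hypothetical.

```python
import math

def solar_elevation_azimuth(lat_deg, decl_deg, hour_angle_deg):
    """Approximate solar elevation and azimuth (degrees).

    Azimuth is measured from north, positive toward east, in [0, 360).
    Hour angle is assumed negative in the morning, positive in the afternoon.
    """
    lat, dec, ha = (math.radians(v) for v in (lat_deg, decl_deg, hour_angle_deg))
    # Components of the sun's unit vector in a local east-north-up frame
    east = -math.cos(dec) * math.sin(ha)
    north = math.cos(lat) * math.sin(dec) - math.sin(lat) * math.cos(dec) * math.cos(ha)
    up = math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(ha)
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, up))))
    # atan2 resolves the quadrant, so no extra test on the inverse sine is needed
    azimuth = math.degrees(math.atan2(east, north)) % 360.0
    return elevation, azimuth

# Solar noon at 40 deg N on an equinox: the sun is due south, 50 deg above the horizon
elev_noon, az_noon = solar_elevation_azimuth(40.0, 0.0, 0.0)
# Six hours before solar noon on the same day: the sun is on the horizon, due east
elev_am, az_am = solar_elevation_azimuth(40.0, 0.0, -90.0)
```

      Like the formulas it mirrors, this remains an approximation; the declination and hour angle must still come from an astronomical algorithm.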

      Aside from $ Z $ and $ \varphi_s $, a third angle is needed for a model chain, namely, the incidence angle ($ \theta $), which is the angle between the direction of the sun and the normal of the inclined surface on which PV is installed. The formula for $ \theta $ is (Masters, 2013; Vignola et al., 2020)

      where $ S $ and $ \varphi_c $ are the tilt and azimuth angles of the inclined collector surface, and $ \varphi_c $ again follows the zero-north–east-positive convention. Unlike the case of Eq. (5), no ambiguity emerges from the trigonometry in Eq. (6).
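      The incidence angle can be sketched in a few lines. The snippet assumes the standard spherical-trigonometry relation $ \cos\theta = \cos Z\cos S + \sin Z\sin S\cos(\varphi_s-\varphi_c) $, with both azimuths in the same convention; the function name is hypothetical.

```python
import math

def incidence_angle(zenith_deg, sun_az_deg, tilt_deg, surf_az_deg):
    """Angle (degrees) between the sun direction and the collector normal."""
    z = math.radians(zenith_deg)
    s = math.radians(tilt_deg)
    d_az = math.radians(sun_az_deg - surf_az_deg)
    cos_theta = math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(d_az)
    # Clamp against round-off before taking the inverse cosine
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

theta_flat = incidence_angle(30.0, 180.0, 0.0, 180.0)     # horizontal plane: theta equals Z
theta_facing = incidence_angle(30.0, 180.0, 30.0, 180.0)  # plane pointed at the sun
theta_away = incidence_angle(30.0, 180.0, 30.0, 0.0)      # plane tilted away from the sun
```

      Because only the inverse cosine over $ [0^\circ, 180^\circ] $ is involved, no quadrant test is required; values above $ 90^\circ $ indicate that the sun is behind the collector plane.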

      Existing algorithms for solar positioning differ from one another in accuracy and computational complexity. The most insightful reference on this matter is the one by Hoadley (2021), who gave a table listing the accuracy and computational complexity of all major algorithms, from which the trade-off between the two properties is immediately obvious. For instance, the solar position algorithm (SPA) of Reda and Andreas (2008) takes 13 623 steps to compute, whereas the algorithm of Michalsky (1988) only requires 530 steps. Nevertheless, the difference in accuracy is also quite significant: SPA has an accuracy of $ \pm0.0003^\circ $, whereas that of Michalsky is accurate only up to $ \pm0.01^\circ $. Given that computational power nowadays is no longer a major issue as compared to several decades ago, it is advised to use whenever possible the highest-accuracy algorithm, that is, SPA, which is available in both $\rm{pvlib}$ of Python (Holmgren et al., 2018) and $\rm{insol}$ of R (Corripio, 2021).

      Solar positioning as the foremost stage of the model chain has to be carried out very carefully, for its validity will impact all subsequent stages. Owing to the different conventions used in different software packages, it is necessary to follow the documentation exactly. For instance, one common source of mistakes in the literature, in our experience, is the time convention, where the choice between Coordinated Universal Time (UTC) and local time could result in a shift in zenith and azimuth angles. Therefore, performing a sanity check is essential. To do so, one can simply plot out the extraterrestrial irradiance, which could be computed with just the zenith angle, versus GHI, and the two bell-shaped curves should align nicely at sunrise and sunset times. Another common mistake is due to the time stamp convention used in irradiance data logging. To represent the average irradiance over an hour, the time stamp could be ceilinged, centered, or floored (i.e., placed at the end, middle, or start of the averaging interval). Without careful consideration, the extraterrestrial irradiance could still shift from the GHI data by a small margin depending on the temporal resolution of the data. This time alignment problem is thoroughly discussed in the validation paper by Yang (2018a). One can never be too careful with solar positioning.
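      A small helper illustrates the time stamp issue. It is only a sketch under the assumption that the solar position for an averaged irradiance value should be evaluated at the middle of the averaging interval; the function name and convention labels are hypothetical.

```python
from datetime import datetime, timedelta

def interval_midpoint(stamp, resolution, convention):
    """Map a logged time stamp to the middle of its averaging interval.

    convention: "floor" (stamp at interval start), "center", or
    "ceiling" (stamp at interval end).
    """
    offsets = {"floor": 0.5, "center": 0.0, "ceiling": -0.5}
    if convention not in offsets:
        raise ValueError(f"unknown time stamp convention: {convention}")
    return stamp + offsets[convention] * resolution

# An hourly value stamped 12:00 under the ceiling convention covers 11:00 to 12:00
stamp = datetime(2018, 9, 1, 12, 0)
mid_ceiling = interval_midpoint(stamp, timedelta(hours=1), "ceiling")
mid_floor = interval_midpoint(stamp, timedelta(hours=1), "floor")
```

      For hourly data, confusing the two conventions moves the evaluation time by a full hour, which is the kind of shift the plot of extraterrestrial irradiance against GHI is meant to expose.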

    • In attempting to split the GHI ($ G_h $) into DHI ($ D_h $) and BNI ($ B_n $), scientists have proposed more strategies but achieved less success than in any other stage of a model chain. In other words, separation modeling introduces by far the highest error among all stages of a model chain. This is largely owing to the non-injective relationship between the GHI and DHI/BNI, i.e., a single GHI value could correspond to an infinite number of DHI–BNI combinations, and the proportion of DHI could range from a mere 10% to nearly 100%. In fact, the ratio between DHI and GHI is called the “diffuse fraction” ($ k = D_h/G_h $), which is what separation models are essentially estimating. Since $ k $ is a normalized version of diffuse radiation, it is logical to normalize GHI as well before modeling. As such, a corresponding quantity called the clearness index ($ k_t $), which is the ratio between GHI and extraterrestrial GHI ($ E_0 $), is almost always the choice. Generally, separation modeling seeks to establish a $ k_t $–$ k $ relationship.
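      The quantities involved can be sketched as follows. Besides the two definitions given in the text, the snippet assumes the usual closure relation $ G_h = D_h + B_n\cos Z $ to recover BNI once the diffuse fraction is known; the function names and the numbers are hypothetical.

```python
import math

def clearness_index(ghi, e0):
    """k_t = G_h / E_0, with E_0 the extraterrestrial GHI."""
    return ghi / e0 if e0 > 0 else float("nan")

def diffuse_fraction(dhi, ghi):
    """k = D_h / G_h."""
    return dhi / ghi if ghi > 0 else float("nan")

def bni_from_diffuse_fraction(ghi, k, zenith_deg):
    """Recover BNI via the closure relation G_h = D_h + B_n cos Z."""
    dhi = k * ghi
    return (ghi - dhi) / math.cos(math.radians(zenith_deg))

# Made-up values in W m-2
k_t = clearness_index(600.0, 680.0)
k = diffuse_fraction(150.0, 600.0)
b_n = bni_from_diffuse_fraction(600.0, k, 60.0)
```

      In a model chain, $ k $ is of course not measured but predicted by a separation model, after which DHI and BNI follow from the last function.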

      The difficulty of using $ k_t $ as the sole predictor for $ k $ is made apparent in Fig. 6a, in which the $ k_t $–$ k $ pairs calculated using 1-min radiometry data collected at Carpentras, France, over the year 2017, are presented as the gray background scatter; the non-injective relationship is evident. Although it is too ambitious to use a single line to represent the entire gray background scatter, numerous attempts were still made, and among those, the logistic-function-based fitting, that is,

      Figure 6.  One-minute diffuse fraction prediction using the logistic function, BRL, Engerer2, and Yang4 models, using data from Carpentras (44.083°N, 5.059°E), France, over 2017. Measurements are shown as the gray background, and predictions are shown as scatters. Brighter colors denote more points in the neighborhood.

      may be deemed the most representative one. A natural extension of the logistic model is to include additional predictors, such that the fitted line becomes a response surface. In this regard, Ridley et al. (2010) proposed the well-known BRL model (the naming follows the initials of the three authors of that paper), which considered the apparent solar time (AST), elevation angle in degrees ($ \alpha $), the daily average $ k_t $ ($ k_{t, \text{daily}} $), and a variability index ($ \psi $), which is computed by smoothing three successive $ k_t $ values. The BRL model reads:

      where the model coefficients can be obtained via least squares using some data. The $ k $ values predicted by the BRL model for Carpentras data are depicted in Fig. 6b.
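      The logistic family can be sketched as below, assuming the form $ k = 1/[1+\exp(\beta_0 + \beta_1 k_t + \cdots)] $. The default coefficients are values commonly quoted from the literature (Boland-type logistic fit and Ridley et al., 2010); they are not given in this text and should be treated as assumptions to be checked against the original papers or refitted by least squares. Function names are hypothetical.

```python
import math

def logistic_k(kt, b0=-5.0033, b1=8.6025):
    """Single-predictor logistic diffuse fraction (coefficients are assumed values)."""
    return 1.0 / (1.0 + math.exp(b0 + b1 * kt))

def brl_k(kt, ast_hours, elevation_deg, kt_daily, psi,
          beta=(-5.38, 6.63, 0.006, -0.007, 1.75, 1.31)):
    """BRL-type multi-predictor logistic form (coefficients are assumed values)."""
    b0, b1, b2, b3, b4, b5 = beta
    z = (b0 + b1 * kt + b2 * ast_hours + b3 * elevation_deg
         + b4 * kt_daily + b5 * psi)
    return 1.0 / (1.0 + math.exp(z))

k_cloudy = logistic_k(0.2)   # low clearness index: mostly diffuse
k_clear = logistic_k(0.8)    # high clearness index: mostly beam
k_brl = brl_k(kt=0.6, ast_hours=12.0, elevation_deg=45.0, kt_daily=0.6, psi=0.6)
```

      Both functions are bounded between 0 and 1 by construction, which is one reason the logistic form is attractive for a ratio such as $ k $.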

      The BRL model itself is not a very high-performing model and its inclusion of AST as a predictor is questionable (Chris GUEYMARD, 2019, personal communication); however, its unique functional form gives rise to another model that has achieved unprecedented success in the separation modeling literature, namely, the Engerer2 model (Engerer, 2015). Indeed, prior to 2016, separation models were already large in quantity, and the performance ranking of those models was completely opaque, as every newly proposed model was claimed to be superior to its peers. It was not until the seminal review by Gueymard and Ruiz-Arias (2016) that the heated debate was partially settled. In that review, a total of 140 separation models were compared using worldwide data from 54 research-grade stations, and the Engerer2 model was found to be quasi-universal with the highest accuracy at that time. The Engerer2 model takes a similar form to the BRL model


      which accounts for the cloud-enhancement (i.e., over-irradiance) events [see Gueymard (2017b), for a full analysis], and

      which represents the difference between the clearness index of clear-sky GHI and that of GHI—clear-sky GHI is denoted by $ G_{hc} $, so that $ G_{hc}/E_0 $ would be its clearness index. The reader may refer to the original publication by Engerer (2015) for the rationale and physics behind introducing these new predictors. Figure 6c shows the results of the Engerer2 model, and the benefit of including the cloud-enhancement index $ k_{de} $ into the modeling is evidenced by the points on the right side of the main body of scatter.
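      The two Engerer2 predictors can be sketched as follows. The difference of clearness indices follows the description above, whereas the cloud-enhancement index is assumed here to be $ k_{de} = \max(0, 1 - G_{hc}/G_h) $, and the overall model is assumed to be a generalized logistic function with an offset $ C $ plus a linear term in $ k_{de} $; these forms should be checked against Engerer (2015). No coefficient values are supplied, and all function names are hypothetical.

```python
import math

def cloud_enhancement_index(ghi, ghi_clear):
    """Assumed form: positive only when GHI exceeds its clear-sky expectation."""
    return max(0.0, 1.0 - ghi_clear / ghi) if ghi > 0 else 0.0

def delta_ktc(ghi, ghi_clear, e0):
    """Clearness index of clear-sky GHI minus that of GHI."""
    return ghi_clear / e0 - ghi / e0

def engerer2_form(kt, ast_hours, zenith_deg, dktc, kde, c, beta):
    """Assumed Engerer2-type form; c and beta must be supplied by the user."""
    b0, b1, b2, b3, b4, b5 = beta
    z = b0 + b1 * kt + b2 * ast_hours + b3 * zenith_deg + b4 * dktc
    return c + (1.0 - c) / (1.0 + math.exp(z)) + b5 * kde

# Made-up values in W m-2: a cloud-enhancement event, then an ordinary cloudy instant
kde_enhanced = cloud_enhancement_index(ghi=1100.0, ghi_clear=900.0)
kde_cloudy = cloud_enhancement_index(ghi=400.0, ghi_clear=900.0)
dktc_cloudy = delta_ktc(ghi=400.0, ghi_clear=900.0, e0=1200.0)
```

      With all coefficients set to zero the assumed form returns 0.5, which is a convenient structural check before plugging in fitted values.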

      Ever since it topped the comparison of 2016, the Engerer2 model has appeared as a benchmark in almost all subsequent proposals of separation models. Numerous new proposals were made, most of which outperformed Engerer2, and opinions on the best separation model again became divided. Consequently, another seminal review was conducted, in which Yang (2022) compared 10 representative models proposed post-2016, on a 5-year dataset consisting of radiometry measurements from 126 stations worldwide, which is larger in size and wider in coverage than the dataset of Gueymard and Ruiz-Arias (2016). The Yang4 model (Yang and Boland, 2019; Yang, 2021a) emerged as the best-performing separation model. Formulation-wise, Yang4 builds on Engerer2, and adds a new form of variability index, namely, $ k_\text{hourly}^\text{Engerer2} $:

      Here, $ k_\text{hourly}^\text{Engerer2} $ is the diffuse fraction predicted by the Engerer2 model on hourly data. The inclusion of this predictor is motivated by the BRL model, which uses the daily mean $ k_t $ as a term representing the variability in global radiation. Since separation modeling deals with diffuse radiation rather than the global one, the hourly diffuse estimate is thought to be, and in fact is, useful. Empirically, Fig. 6d depicts the prediction outcome of Yang4, which has better coverage of the gray scatter than Engerer2. Recently, a regime-switching version of Yang4, namely, Yang5 (Yang et al., 2024), was published; it fits a separate set of model coefficients for each radiation climatology regime and shows further performance improvements.

      Before we close this subsection, it must be noted that the above-mentioned models are not readily available in various model chain software packages. This is due to two reasons: (1) the models were proposed very recently and the software packages have yet to be updated, and (2) the bulk of model chain applications deal with hourly data, whereas the models are most suitable for 1-min data; some antiquated hourly models such as the DISC model (Maxwell, 1987) are still being used during the production of the latest irradiance databases such as the NSRDB (Yu XIE, 2021, personal communication). Be that as it may, since the hourly models are becoming increasingly outdated for today’s solar energy meteorology, it is advised to use the latest models whenever possible to attain the best model chain performance. The possibility of using machine learning for irradiance separation has also been recently investigated (Chu et al., 2024), but the results do not dominate the semi-empirical models.

    • There is much to cover in transposition modeling, as it comprises the third largest class of models in a model chain, after separation modeling and cell temperature modeling (see section 4.5 below). Nonetheless, insofar as opinion on the best transposition model is concerned, the consensus is far stronger than that on separation models. Although a small fraction of researchers object to this view, the 1990 version of the Perez model (Perez et al., 1990) is widely recognized as the quasi-universal transposition modeling choice. Strong empirical evidence has been provided by Yang (2016), who conducted by far the most comprehensive performance comparison of transposition models in terms of the number of models compared and the dimensionality of the dataset used; the Perez model won that contest with flying colors. Therefore, unless there are truly compelling reasons against its use (we shall see one below), the Perez model ought to be prioritized.

      In a nutshell, transposition modeling deals with converting the three horizontal or normal irradiance components, that is, GHI, DHI, and BNI, into those on an inclined surface. As suggested by geometry, the GTI on an inclined surface is composed of three additive components: the beam tilted irradiance ($ B_c $), diffuse tilted irradiance (DTI, $ D_c $), and the ground-reflected irradiance ($ D_g $). Whereas the beam component can be calculated by simple trigonometric projection, the ground-reflected component depends on GHI, surface albedo ($ \rho_g $), and a ground-view factor ($ R_r $, also known as the transposition factor for ground reflection), which can be assumed to be isotropic without too much loss of precision (Gueymard, 2009). To that end, almost all transposition models exclusively model DTI, or to be more precise, the sky-view factor ($ R_d $, also known as the diffuse transposition factor). Mathematically, the transposition equation is given by

      where the isotropic $ R_r $ is given by $ 0.5(1-\cos S) $. In what follows, we closely examine how $ R_d $ is modeled by Perez et al. (1990).
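      The additive structure of the transposition equation can be sketched as follows, assuming $ G_c = B_n\cos\theta + R_d D_h + \rho_g R_r G_h $ with the isotropic $ R_r $ given above. The isotropic sky-view factor $ 0.5(1+\cos S) $ is used only as a placeholder default for $ R_d $, the albedo of 0.2 is an assumed typical value, and clipping $ \cos\theta $ at zero when the sun is behind the plane is a choice of this sketch.

```python
import math

def transpose(bni, dhi, ghi, theta_deg, tilt_deg, albedo=0.2, r_d=None):
    """Return (GTI, beam, sky-diffuse, ground-reflected) in the units of the inputs."""
    s = math.radians(tilt_deg)
    if r_d is None:
        r_d = 0.5 * (1.0 + math.cos(s))   # isotropic placeholder for R_d
    r_r = 0.5 * (1.0 - math.cos(s))       # isotropic ground-view factor
    b_c = bni * max(0.0, math.cos(math.radians(theta_deg)))
    d_c = r_d * dhi
    d_g = albedo * r_r * ghi
    return b_c + d_c + d_g, b_c, d_c, d_g

# Horizontal plane with Z = theta = 60 deg: GTI must reduce to GHI (made-up values, W m-2)
gti_flat, *_ = transpose(bni=900.0, dhi=150.0, ghi=600.0, theta_deg=60.0, tilt_deg=0.0)
# Plane tilted at 30 deg and facing the sun at the same instant
gti_tilt, b_c, d_c, d_g = transpose(900.0, 150.0, 600.0, theta_deg=30.0, tilt_deg=30.0)
```

      Any anisotropic transposition model enters only through the argument `r_d`; the beam and ground-reflected terms are unaffected.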

      The original Perez model was proposed in 1986 (Perez et al., 1986), and it underwent several major changes/simplifications after that, becoming the canonical Perez model which has gained massive popularity today. One should be aware that the entire modeling philosophy of the Perez model is established upon integrating the radiance of the hemispheric sky. The sky radiance is anisotropic in nature, and its value depends on the position in the sky, as marked by the polar angle ($ \vartheta $) and azimuthal angle ($ \varphi $). Figure 7 shows a differential solid angle and its representation in polar coordinates. Denoting the coordinate-dependent radiance with $ L(\vartheta, \varphi) $, one may integrate it over the hemispheric solid angle, and thus obtain DHI:

      Figure 7.  Illustration of a differential solid angle and its representation in polar coordinates.

      Clearly then, to arrive at $ D_h $, we are interested in knowing the analytic form of the radiance distribution of $ L(\vartheta, \varphi) $.
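      The hemispheric integration can be checked numerically. The sketch assumes the usual form $ D_h = \int_0^{2\pi}\!\int_0^{\pi/2} L(\vartheta,\varphi)\cos\vartheta\sin\vartheta\,\mathrm{d}\vartheta\,\mathrm{d}\varphi $ and evaluates it with a midpoint rule; the grid sizes and the toy radiance functions are arbitrary choices.

```python
import math

def dhi_from_radiance(radiance, n_polar=90, n_azim=180):
    """Integrate a sky radiance function L(polar, azimuth) over the upper hemisphere."""
    d_pol = (math.pi / 2.0) / n_polar
    d_az = (2.0 * math.pi) / n_azim
    total = 0.0
    for i in range(n_polar):
        pol = (i + 0.5) * d_pol
        # cos(pol) projects onto the horizontal plane; sin(pol) is the solid-angle Jacobian
        weight = math.cos(pol) * math.sin(pol) * d_pol * d_az
        for j in range(n_azim):
            az = (j + 0.5) * d_az
            total += radiance(pol, az) * weight
    return total

# Isotropic sky with unit radiance: the integral equals pi
dhi_iso = dhi_from_radiance(lambda pol, az: 1.0)
# Toy anisotropic sky that brightens toward the horizon
dhi_bright_horizon = dhi_from_radiance(lambda pol, az: 1.0 + 0.5 * math.sin(pol))
```

      The isotropic case recovers the familiar result $ D_h = \pi L $, against which any anisotropic radiance distribution can be compared.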

      The radiance distribution is complex and dependent on sky conditions. However, there are two phenomena that, once considered, could give a fairly good approximation. One of those is the forward scattering of beam radiation by aerosols, which makes the sky in the vicinity of the sun (the circumsolar region) appear brighter than the regions of the sky dome far from the sun. The other is that the blue light created by Rayleigh scattering is “diluted” by the white light created by Mie scattering, due to the larger airmass at the horizon as compared to that at the zenith. Hence, on a clear day, the horizon band appears white and bright. Building upon these two phenomena, Perez et al. (1986) proposed a three-part geometrical framework, as shown in Fig. 8. In the original work, the circumsolar region was assumed to have a radius $ \alpha = 15^\circ $, whereas the horizon band was assumed to have an angular thickness of $ \xi = 6.5^\circ $. The overarching assumption of the original Perez model is that the radiances originating from these three parts are different, but remain constant within each part. In Fig. 8, the radiance from each part is represented by $ L $, $ F_1\times L $, and $ F_2\times L $, respectively, where $ F_1 $ and $ F_2 $ are sky-condition-dependent coefficients to be modeled. Through integration, one obtains

      Figure 8.  Illustration of the three-part geometrical framework used in the original Perez model, with respect to a horizontal plane.

      where $ \chi_h(\cdot) $ is a function of $ Z $, denoting the fraction of the circumsolar region above the horizon, which is wholly geometrically obtainable, and $ Z' $ is the average zenith angle of the visible part of the circumsolar region; this is Eq. (1) of Perez et al. (1986).

      Whereas Eq. (15) gives an expression for DHI under this three-part framework, a similar expression can be derived for DTI, following the geometry of Fig. 9, albeit the integration is much more difficult than in the horizontal case:

      Figure 9.  Illustration of the three-part geometrical framework used in the original Perez model, with respect to a tilted plane.

      which is identical to Eq. (2) of Perez et al. (1986). In Eq. (16), $ \chi_c(\cdot) $ is a function of $ \theta $, denoting the fraction of the circumsolar region seen by the collector plane, $ \theta' $ is the average incident angle of the visible part of the circumsolar region, and

      Since the diffuse transposition factor is the ratio of $ D_c $ to $ D_h $, one obtains


      It is worth noting that when $ F_1=F_2=1 $, Eq. (18) collapses to $ R_d = 0.5(1+\cos S) $, which is the isotropic transposition model.

      Notwithstanding this formulation, the original Perez model has to this day a series of unresolved issues, such as the double-counting problem during radiance integration when the circumsolar region overlaps with the horizon band at low-sun conditions, and the effects of the approximations used for $ \chi_h $ and $ \chi_c $. However, the most challenging aspect is the complexity of the model, which is not conducive to easy uptake and prompted Perez et al. (1987) to make four major changes one year after the original model was proposed. These four changes were: (1) a reparameterization of the model coefficients; (2) allowance for negative coefficients; (3) a simplified geometric framework; and (4) a revised binning strategy for differentiating the sky conditions. Whereas the reader is referred to Perez et al. (1987) for a more detailed explanation of these four changes, it should just be noted here that the simplified geometric framework considers two physical surrogates: the brightening of the horizon band is concentrated at an infinitesimally thin line at the horizon, and the brightening of the circumsolar region is concentrated at a point at the center of the disk.

      The diffuse transposition factor of the simplified Perez model is given by


      and $ F'_1 $, $ F'_2 $ are new sets of sky-condition-dependent model coefficients, which are not the same as, but are analytically related to, $ F_1 $ and $ F_2 $ of the original model. To describe the sky-condition dependence of $ F'_1 $ and $ F'_2 $, Perez et al. (1990) proposed using the sky’s clearness ($ \varepsilon' $) and sky’s brightness ($ \Delta $), in that,


      The units for $ Z $ in Eq. (28) are radians, and $ E_{0n} $ in Eq. (29) is the extraterrestrial BNI, whereas $ m_r\approx 1/\cos Z $ is the relative air mass. The values of $ F'_{11}(\varepsilon'), \dots, F'_{23}(\varepsilon') $ are given in Table 6 of Perez et al. (1990), which were trained using hourly data from nine locations in the United States and Europe. (Unfortunately, that dataset was lost during a hard disk crash that took place in the early 1990s.) This latest “official” set of model coefficients is thought to have reached an asymptotic level of optimization (Yang et al., 2014); however, refitting needs may arise if the model is to be used for higher-resolution applications that require minute data. The fitting procedure for $ F'_{11}(\varepsilon'), \dots, F'_{23}(\varepsilon') $, which is a least-squares approach, can be found in the documents by Perez et al. (1988) and Yang et al. (2014).
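      The pieces of the simplified Perez model can be sketched as follows, under several assumptions that should be verified against Perez et al. (1990): $ \varepsilon' = [(D_h+B_n)/D_h + \kappa Z^3]/(1+\kappa Z^3) $ with $ \kappa = 1.041 $ and $ Z $ in radians, $ \Delta = m_r D_h/E_{0n} $, and $ R_d = (1-F'_1)(1+\cos S)/2 + F'_1 a/b + F'_2\sin S $ with $ a = \max(0, \cos\theta) $ and $ b = \max(\cos 85^\circ, \cos Z) $. The tabulated coefficients are not reproduced here, so $ F'_1 $ and $ F'_2 $ are plain inputs.

```python
import math

KAPPA = 1.041  # assumed constant, valid for the zenith angle in radians

def sky_clearness(dhi, bni, zenith_deg):
    """Sky's clearness (assumed Perez-type definition)."""
    z3 = math.radians(zenith_deg) ** 3
    return ((dhi + bni) / dhi + KAPPA * z3) / (1.0 + KAPPA * z3)

def sky_brightness(dhi, zenith_deg, e0n):
    """Sky's brightness, using the approximation m_r = 1 / cos Z."""
    m_r = 1.0 / math.cos(math.radians(zenith_deg))
    return m_r * dhi / e0n

def perez_rd(f1, f2, theta_deg, zenith_deg, tilt_deg):
    """Diffuse transposition factor of the simplified Perez framework (assumed form)."""
    s = math.radians(tilt_deg)
    a = max(0.0, math.cos(math.radians(theta_deg)))
    b = max(math.cos(math.radians(85.0)), math.cos(math.radians(zenith_deg)))
    return (1.0 - f1) * 0.5 * (1.0 + math.cos(s)) + f1 * a / b + f2 * math.sin(s)

eps_overcast = sky_clearness(dhi=150.0, bni=0.0, zenith_deg=60.0)  # no beam radiation
delta = sky_brightness(dhi=150.0, zenith_deg=60.0, e0n=1361.0)     # made-up inputs
rd_iso = perez_rd(0.0, 0.0, theta_deg=30.0, zenith_deg=40.0, tilt_deg=30.0)
```

      With both brightening coefficients at zero the assumed form falls back to the isotropic factor, and an overcast sky without beam radiation gives a clearness of exactly one, the lower bound of the binning variable.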

      Much effort has been spent on introducing the Perez model, due to its cardinal importance in transposition modeling. There are nevertheless other transposition models, each following certain assumptions and modeling philosophies, that are much simpler but do not perform substantially worse than the Perez model. Typifying such a trade-off between modeling complexity and performance is the Hay model (Hay and Davies, 1980), which has the form

      where $ A = B_n/E_{0n} $ is the anisotropy index as termed by Hay and Davies (1980); more generally, $ A $ should be referred to as the direct transmittance, since it is the ratio of BNI and extraterrestrial BNI. The formulation of Eq. (30) is intuitive, in that the first part of the formulation gives the isotropic $ R_d $, whereas the second part gives an $ R_d $ that is purely directional. Clearly then, if $ A=1 $, the atmosphere is scattering-free, and all diffuse radiation is represented by the circumsolar collimated component; if $ A=0 $, the sky is overcast, and the isotropic diffuse transposition factor results; and if $ A $ is between 0 and 1, some degree of anisotropy is assumed to exist, and $ R_d^\text{Hay} $ is a convex combination of the isotropic and purely directional $ R_d $’s. Another example is the Bugler model (Bugler, 1977), which assumes the brightening of the circumsolar region is 5% of BNI, which leads to the following model form:

      The $\rm{pvlib}$ Python library offers a good collection of transposition models, which should be sufficient for most model chain applications. That said, a more complete code base for transposition models is the $\rm{SolMod}$ R package, in which implementations of all 26 models that appeared in the review of Yang (2016) are available. At this point, one may question the need for having so many alternatives implemented when the Perez model is known to be quasi-universal. One compelling reason is that the best-performing transposition model may not necessarily lead to the optimal model chain, largely owing to the intricate and untraceable error propagation that takes place within the model chain. This is in fact the main conclusion of Mayer and Gróf (2021), among whom the lead author has since been advocating the viewpoint that the construction of a model chain should be treated as a “system” in its entirety, rather than as an assembly of the best-performing component model for each stage. In any case, what comes out of transposition models is the GTI. Yet, GTI is still not exactly the irradiance reaching the solar cells, for most PV modules are encapsulated by glass and other protective materials, which reflect and transmit the incoming irradiance differently, depending on the material properties. To model the effective irradiance reaching the solar cells, reflection loss models are needed, which are explained next.
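      As a concrete instance of the simpler alternatives discussed above, the Hay model can be sketched in a few lines. The snippet assumes that the isotropic part is $ (1-A)(1+\cos S)/2 $ and that the purely directional part is $ A\cos\theta/\cos Z $; the function name and the numbers are hypothetical.

```python
import math

def hay_rd(bni, e0n, theta_deg, zenith_deg, tilt_deg):
    """Hay-type diffuse transposition factor as a convex combination of two parts."""
    a_index = bni / e0n                                   # direct transmittance A
    iso = 0.5 * (1.0 + math.cos(math.radians(tilt_deg)))  # isotropic R_d
    directional = (max(0.0, math.cos(math.radians(theta_deg)))
                   / math.cos(math.radians(zenith_deg)))  # purely directional R_d
    return (1.0 - a_index) * iso + a_index * directional

rd_overcast = hay_rd(bni=0.0, e0n=1361.0, theta_deg=30.0, zenith_deg=40.0, tilt_deg=30.0)
rd_partly = hay_rd(bni=700.0, e0n=1361.0, theta_deg=30.0, zenith_deg=40.0, tilt_deg=30.0)
```

      The overcast case reproduces the isotropic factor, whereas increasing BNI shifts weight toward the directional term.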

    • The nominal power of a PV module is determined under the standard test condition (STC), which encompasses an incident irradiance of 1000 W m−2, a reference air mass 1.5 spectrum, and a cell temperature of 25°C. On top of these, STC also entails in geometry that the incident light is perpendicular to the module’s surface. Notwithstanding, the incidence angle can vary between 0° and 90° in field conditions, leading to a certain amount of reflection loss as compared to the STC case. This amount of reflection loss ought not to be regarded as negligible. For instance, Causi et al. (1995) reported that reflection may cause a 5%–10% energy loss for beam radiation, whereas this is 11%–15% for diffuse radiation. In another work, Martin and Ruiz (2001) noted that the pyranometer–PV disparity may lead to a 1.3%–14.8% difference in monthly yield estimation, which is due to both reflection loss and mismatch in the spectral response of pyranometer and PV. On this point, it is advised not to pass GTI directly onto the next stage of a model chain without accounting for the reflection loss, because that would lead to an overestimation in the PV power production. Terminology-wise, reflection loss models are also known as relative transmittance models (Xie et al., 2022) or angular loss models (Martin and Ruiz, 2001). In terms of model output, reflection loss models seek to estimate/compute a quantity known as the relative transmittance, herein denoted using $ \tau $, which is to be further elaborated below.

      When light strikes the interface between a medium with refractive index $ n_1 $ and another medium with refractive index $ n_2 $, reflection and refraction occur (see Fig. 10). The physics that governs this process is explained by the Fresnel equations. For a smooth surface, the reflection of unpolarized radiation is

      Figure 10.  Incidence and refraction angles in media with refractive indices $ n_1 $ and $ n_2 $.


      in which $ \theta $ and $ \theta' $ are the incident and refractive angles, respectively, which are linked to the indices of refraction by Snell’s law,

      Recall Eq. (13) in which GTI is written as the sum of three irradiance components, namely, $ B_c $, $ D_c $, and $ D_g $. As such, if the Fresnel equations are to be applied to GTI, the three irradiance components should be considered separately. Applying the Fresnel equations to $ B_c $ is quite direct, especially when $ n_1 $ and $ n_2 $ of many materials are already well known. On the other hand, owing to the omni-directional property of $ D_c $ and $ D_g $, deriving analytic expressions of the transmittance of diffuse and reflected radiation must undergo some forms of integration of the Fresnel equations, which are not straightforward. To that end, many early works resort to using empirical approaches. More generally, studies on reflection loss modeling can be divided into two kinds: those that consider the Fresnel equations and those that do not—Table 1 of Xie et al. (2022) presents a good summary of available models according to this division.
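      For the beam component, the Fresnel calculation can be sketched as follows. The snippet assumes Snell’s law in the form $ n_1\sin\theta = n_2\sin\theta' $ and the standard unpolarized reflectance $ r = \tfrac{1}{2}[\sin^2(\theta'-\theta)/\sin^2(\theta'+\theta) + \tan^2(\theta'-\theta)/\tan^2(\theta'+\theta)] $; the default indices (air to ordinary glass) are illustrative.

```python
import math

def refraction_angle(theta_deg, n1=1.0, n2=1.526):
    """Refraction angle (degrees) from Snell's law, n1 sin(theta) = n2 sin(theta')."""
    return math.degrees(math.asin(n1 * math.sin(math.radians(theta_deg)) / n2))

def fresnel_reflectance(theta_deg, n1=1.0, n2=1.526):
    """Reflectance of unpolarized light at a smooth interface."""
    if theta_deg == 0.0:
        # Limit of the general expression at normal incidence
        return ((n1 - n2) / (n1 + n2)) ** 2
    t = math.radians(theta_deg)
    tp = math.radians(refraction_angle(theta_deg, n1, n2))
    perpendicular = math.sin(tp - t) ** 2 / math.sin(tp + t) ** 2
    parallel = math.tan(tp - t) ** 2 / math.tan(tp + t) ** 2
    return 0.5 * (perpendicular + parallel)

r_normal = fresnel_reflectance(0.0)    # about 4% for air to glass
r_60 = fresnel_reflectance(60.0)
r_grazing = fresnel_reflectance(90.0)  # everything is reflected at grazing incidence
```

      The reflectance stays close to its normal-incidence value up to moderate angles and then rises steeply, which is why reflection loss matters most at large incidence angles.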

      In what follows, we use the subscripts $ b $, $ d $, and $ g $ to denote “beam,” “diffuse,” and “ground-reflected,” as per the common convention. As such, $ \tau_b $, $ \tau_d $, and $ \tau_g $ represent the relative transmittances of the beam component, diffuse component, and ground-reflected component of GTI. It should be highlighted here that the term “relative transmittance” is not a universally accepted one, as a variety of other names have been used, such as the transmittance–absorptance product (Duffie and Beckman, 2013), angle-of-incidence correction factor (Marion, 2017), (one minus) angular loss factor (Martin and Ruiz, 2001), or incident angle modifier [IAM; De Soto et al. (2006)]. These aliases introduce confusion, but the way in which these relative transmittances affect GTI is clear: By defining the absorbed radiation, $ G'_c $, one may write

      In this equation, $ B_c = B_n\cos\theta $ follows geometry, $ D_g = \rho_gG_h(1-\cos S)/2 $ results from the assumption of Lambertian foreground, and $ D_c $ can be obtained via any transposition model. In other words, once the GTI is obtained after irradiance transposition, if the relative transmittances are estimated next using relative transmittance models, one arrives at the absorbed radiation, which, if spectral mismatch and soiling are to be temporarily neglected, can be regarded as the effective irradiance, a well-known term in PV performance modeling. As mentioned earlier, there are two schools of modeling approaches for relative transmittance: one empirical and the other physical. While acknowledging that there are more options, the two most representative and technically refined models, one from each school, are elaborated next.
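      The bookkeeping implied by the absorbed-radiation equation can be sketched as follows, assuming that each GTI component is simply scaled by its own relative transmittance, $ G'_c = \tau_b B_c + \tau_d D_c + \tau_g D_g $. The function names and the numerical values are hypothetical.

```python
def absorbed_irradiance(b_c, d_c, d_g, tau_b, tau_d, tau_g):
    """Weight each GTI component by its relative transmittance."""
    return tau_b * b_c + tau_d * d_c + tau_g * d_g

def reflection_loss_fraction(b_c, d_c, d_g, tau_b, tau_d, tau_g):
    """Share of GTI that does not reach the cells because of reflection."""
    gti = b_c + d_c + d_g
    return 1.0 - absorbed_irradiance(b_c, d_c, d_g, tau_b, tau_d, tau_g) / gti

# Made-up component irradiances (W m-2) and relative transmittances
g_abs = absorbed_irradiance(780.0, 140.0, 8.0, tau_b=0.99, tau_d=0.95, tau_g=0.80)
loss = reflection_loss_fraction(780.0, 140.0, 8.0, tau_b=0.99, tau_d=0.95, tau_g=0.80)
```

      Setting all three transmittances to one returns the GTI itself, i.e., the case in which reflection loss is ignored and PV power is overestimated.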

      Representing the empirical modeling of relative transmittances is the Martin model (Martin and Ruiz, 2001), which is by far the most popular choice in the literature. For $ \tau_b $, Martin and Ruiz (2001) found that an exponential function describes fairly well how the relative transmittance depends on the incidence angle ($ \theta $), with a single angular loss coefficient ($ a_r $):

      Here, $ a_r $, which varies for different PV encapsulation designs, needs to be empirically determined. On the other hand, the expressions for $ \tau_d^\text{Martin} $ and $ \tau_g^\text{Martin} $ are calculated by integrating the contribution of each solid angle unit incident on the tilted surface, assuming isotropy:


      Whereas $ c_1 = 4/(3\pi) $, the other two model coefficients $ a_r $ and $ c_2 $ depend on the PV panel encapsulation configuration, e.g., $ a_r=0.173 $ and $ c_2= -0.0675 $ for an air–glass configuration. As the original paper of Martin and Ruiz (2001) offers a comprehensive list of $ a_r $ values, $ c_2 $ could be customarily retrieved using the linear function $ c_2 = 0.5a_r - 0.154 $, as specified by the international standard BS EN IEC 61853-3:2018.
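      The Martin model can be sketched as follows. The functional forms are assumed from the commonly used statement of Martin and Ruiz (2001) and IEC 61853-3 and should be checked against those sources: $ \tau_b = [1-\exp(-\cos\theta/a_r)]/[1-\exp(-1/a_r)] $, and $ \tau_{d,g} = 1-\exp[-(c_1 t + c_2 t^2)/a_r] $, with $ t = \sin S + (\pi - S - \sin S)/(1+\cos S) $ for the sky and $ t = \sin S + (S - \sin S)/(1-\cos S) $ for the ground, $ S $ in radians. The sketch is restricted to $ 0^\circ\leqslant S\leqslant 90^\circ $.

```python
import math

C1 = 4.0 / (3.0 * math.pi)

def martin_c2(a_r):
    """Linear relation between c2 and a_r quoted in the text."""
    return 0.5 * a_r - 0.154

def martin_tau_b(theta_deg, a_r=0.173):
    """Relative transmittance for beam radiation (assumed exponential form)."""
    cos_theta = max(0.0, math.cos(math.radians(theta_deg)))
    return (1.0 - math.exp(-cos_theta / a_r)) / (1.0 - math.exp(-1.0 / a_r))

def martin_tau_d_g(tilt_deg, a_r=0.173):
    """Relative transmittances for sky-diffuse and ground-reflected radiation."""
    s = math.radians(tilt_deg)
    c2 = martin_c2(a_r)
    t_sky = math.sin(s) + (math.pi - s - math.sin(s)) / (1.0 + math.cos(s))
    # The ground term tends to zero as the tilt tends to zero
    t_gnd = math.sin(s) + (s - math.sin(s)) / (1.0 - math.cos(s)) if s > 0.0 else 0.0
    tau_d = 1.0 - math.exp(-(C1 * t_sky + c2 * t_sky ** 2) / a_r)
    tau_g = 1.0 - math.exp(-(C1 * t_gnd + c2 * t_gnd ** 2) / a_r)
    return tau_d, tau_g

tau_b_60 = martin_tau_b(60.0)
tau_d_flat, tau_g_flat = martin_tau_d_g(0.0)
tau_d_vert, tau_g_vert = martin_tau_d_g(90.0)
```

      By symmetry, a vertical surface sees the sky and the ground through the same geometry, so the two transmittances coincide at $ S = 90^\circ $ under these assumed forms.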

      Whereas the Martin model is empirical, a full physical account for the transmittance of a cover system ought to consider both reflection loss at the interface and absorption within the glazing. The attenuation of a light beam by an optically homogeneous medium is described by Bouguer’s law, which is sometimes referred to as Beer’s law, or the Bouguer–Lambert law. Bouguer’s law, when applied to PV, results in $ \exp(-KL/\cos\theta') $—see Eq. (3.78) of Duffie and Beckman (2013)—where $ K = 4 $ m$ ^{-1} $ is the extinction coefficient of glass, $ L = 2 $ mm is the typical thickness of the glazing, and the refractive angle $ \theta' $, following Snell’s law, is

      where $ n_\text{PV} $ is the refractive index of the PV module cover material, with $ n_\text{PV} = 1.526 $ for normal glass and $ n_\text{PV} = 1.3 $ for anti-reflection-coated glass (De Soto et al., 2006; Duffie and Beckman, 2013). However, because absorption has a negligible effect compared to reflection, many have chosen to exclude absorption in their modeling [e.g., Marion (2017); Xie et al. (2022)]. Also often neglected is multiple reflection, which is a prominent feature for flat-plate solar collectors, but not for PV. To that end, the physical relative transmittance for beam radiation is only based on Fresnel equations:

      As for $ \tau_d $ and $ \tau_g $, their analytic expressions through integrating the Fresnel equations have troubled physicists over the past century or so. However, just very recently, the situation was relieved by Xie et al. (2022), after discovering an alternative form of the Fresnel equations originally proposed by Schlick (1994), through which the integration became feasible. Whereas readers are referred to the publication of Xie et al. (2022) for details, the expressions are simply listed here. The relative transmittance for diffuse radiation is


      is a weighting function, of which the value depends not only on the refraction index of the PV module cover material, but also on that of the pyranometer cover, which is usually a fused silica dome with $ n_T = 1.4585 $. As for the relative transmittance for ground-reflected radiation, it is

      To give perspective on how various reflection loss models can differ, Figs. 11 and 12 depict the $ \tau_b $, $ \tau_d $, and $ \tau_g $ modeled by various options. Again, the modeling details of models other than those by Martin and Ruiz (2001) and Xie et al. (2022) are omitted for brevity, but the references are provided in the captions for those who are interested. From Fig. 11 it is observed that for $ \theta < 60^\circ $, i.e., for $ 1/\cos\theta < 2 $, all models behave similarly. However, the model outputs diverge markedly for $ \theta > 60^\circ $, suggesting that the empirical models are highly sensitive to the parameter choice. As for Fig. 12, it is evident that, except for the Brandemuehl model, the other three models estimate $ \tau_g $ quite similarly over the entire valid range of $ S $. Quite large deviations are seen, on the other hand, for the $ \tau_d $ of the four models. It should be noted again that $ \tau_d^\text{Xie} $ is sensitive to $ n_\text{PV} $, as evidenced by Fig. 3b of Xie et al. (2022). Both figures reveal that careful selection of model parameters is essential. At the moment, no study has compared these reflection loss models, which presents a major research gap in the existing literature.

      Figure 11.  Relative transmittances for beam radiation ($ \tau_b $) estimated using different models, as functions of $ 1/\cos\theta $, with $ 0^\circ\leqslant \theta\leqslant 85^\circ $. Model parameters are: $ b_0 = 0.05 $ is used for the ASHRAE model (Duffie and Beckman, 2013); $ \beta_0=1 $, $ \beta_1 = -2.438\times10^{-3} $, $ \beta_2=3.1003\times 10^{-4} $, $ \beta_3=-1.246\times10^{-5} $, $ \beta_4=2.11\times10^{-7} $, $ \beta_5=-1.36\times10^{-9} $ are used for the King model (King et al., 2004); and $ a_r = 0.173 $ is used for the Martin model (Martin and Ruiz, 2001). The physical model follows Eq. (41) with $ n_\text{PV} = 1.3 $.

      Figure 12.  Relative transmittances for diffuse radiation ($ \tau_d $) and ground-reflected radiation ($ \tau_g $) estimated using different models, as functions of module tilt angle $ S $, with $ 0^\circ\leqslant S\leqslant 90^\circ $. Model parameters are: $ a_r = 0.173 $ is used for the Martin model (Martin and Ruiz, 2001); $ n_\text{PV} = 1.526 $ and $ n_T = 1.4585 $ are used for the Xie model (Xie et al., 2022). The references for the other two models appearing in this figure are Brandemuehl and Beckman (1980) and Marion (2017).
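      To make the comparison of beam-transmittance models more tangible, a minimal sketch is given below. It is not the code behind Figs. 11 and 12: the ASHRAE and Martin expressions are written in their commonly cited forms, the physical model is assumed to be the single-interface Fresnel transmittance normalized by its value at normal incidence, air with a refractive index of 1 is assumed above the cover, and all function names are hypothetical. The Bouguer term is included only to show how small the absorption effect is for $ K = 4 $ m$ ^{-1} $ and $ L = 2 $ mm.

```python
import math

def refraction_angle(theta_deg, n_pv=1.526):
    """Refraction angle (rad) from Snell's law, assuming air (n = 1) above the cover."""
    return math.asin(math.sin(math.radians(theta_deg)) / n_pv)

def bouguer_transmittance(theta_deg, n_pv=1.526, K=4.0, L=0.002):
    """Absorption-only transmittance exp(-K L / cos(theta'))."""
    return math.exp(-K * L / math.cos(refraction_angle(theta_deg, n_pv)))

def fresnel_transmittance(theta_deg, n_pv=1.526):
    """Single-interface transmittance (1 - r) for unpolarized light."""
    if theta_deg == 0:
        return 1.0 - ((n_pv - 1.0) / (n_pv + 1.0)) ** 2
    t = math.radians(theta_deg)
    tr = refraction_angle(theta_deg, n_pv)
    r = 0.5 * (math.sin(tr - t) ** 2 / math.sin(tr + t) ** 2
               + math.tan(tr - t) ** 2 / math.tan(tr + t) ** 2)
    return 1.0 - r

def tau_b_physical(theta_deg, n_pv=1.526):
    """Assumed normalization: transmittance relative to normal incidence."""
    return fresnel_transmittance(theta_deg, n_pv) / fresnel_transmittance(0, n_pv)

def tau_b_ashrae(theta_deg, b0=0.05):
    """ASHRAE form: 1 - b0 (1/cos(theta) - 1), floored at zero."""
    return max(0.0, 1.0 - b0 * (1.0 / math.cos(math.radians(theta_deg)) - 1.0))

def tau_b_martin(theta_deg, a_r=0.173):
    """Martin and Ruiz form with angular loss coefficient a_r."""
    c = math.cos(math.radians(theta_deg))
    return (1.0 - math.exp(-c / a_r)) / (1.0 - math.exp(-1.0 / a_r))

for theta in (0, 30, 60, 80):
    print(theta,
          round(tau_b_ashrae(theta), 3),
          round(tau_b_martin(theta), 3),
          round(tau_b_physical(theta), 3),
          round(bouguer_transmittance(theta), 4))
```

      Under these assumptions, the three beam models agree closely up to $ \theta = 60^\circ $ and separate at larger incidence angles, while the absorption term stays within about 1% of unity, which is in line with the practice of neglecting it.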

    • As we shall see in a later subsection, the output power of a PV module/cell is characterized by $ I $–$ V $ curves. While more information is to be provided below, one should know that the position where the $ I $–$ V $ curve cuts the $ y $-axis represents the short-circuit current ($ I_\text{sc} $), and the position where the curve cuts the $ x $-axis represents the open-circuit voltage ($ V_\text{oc} $). It is widely known that when the irradiance reaching the module/cell drops, $ I_\text{sc} $ drops quasi-linearly with it. On the other hand, when the operating temperature of the module/cell rises, $ V_\text{oc} $ drops substantially with a slight increase in $ I_\text{sc} $. In the latter case, considering the scales of change happening to $ V_\text{oc} $ and $ I_\text{sc} $, elevated operating temperature is accompanied by a net decrease in output power. The effect of temperature on the power output is depicted in Fig. 13, in which the $ I $–$ V $ curves of a Canadian Solar CS5P-220M module under a constant irradiance of 800 W m$ ^{-2} $ but varying cell temperature are plotted. Clearly then, accurate modeling of the cell temperature is essential to model chain. In fact, the ambient temperature may be converted to both module and cell temperatures, which makes it the second most important meteorological variable, right next to GHI, in PV modeling.

      Figure 13.  The $ I $–$ V $ curves of a Canadian Solar CS5P-220M module, under an incident irradiance of 800 W m$ ^{-2} $ and varying cell temperature (0°C–60°C).

      Deriving the module or cell temperature wholly from physical principles is known a priori to be difficult; for instance, it depends upon the technical specifications of the module, which include but are not limited to material, encapsulation, whether or not aluminum fins are installed as heat sink, as well as the habitat of installation (e.g., mounting, shading, ventilation). Even if these are all known, the derivation still requires immense knowledge of photonics and heat transfer, which is in itself incomplete at the moment to fully address the problem at hand. To that end, all cell temperature models are empirical, insofar as their usage in model chain construction is concerned. Largely owing to the empirical nature of this problem, a quick scan of the literature reveals that all too many possibilities have been proposed, which makes the cell temperature models the second most numerous among all steps of model chain. However, most models consider a basic set of meteorological variables, namely, the ambient temperature ($ T_\text{amb} $), the incident (or effective) irradiance ($ G_c $ or $ G'_c $), and the wind speed ($ W $).

      In an ideal scenario, cell temperature should be derived under effective weather conditions. For instance, the effective irradiance reaching the solar cell, which is lower than GTI due to reflection loss, should be used. As for wind, the actual wind blowing across the PV array may differ, due to the structure of the arrays, from that at the location where wind speed and direction are measured, which leads to heat removal effects that can be quite different. However, the difficulty is that, in practice, these effective weather conditions are more often than not inaccessible. Consequently, cell temperature models are almost always developed based on what is measured, e.g., GTI and wind information acquired at a nearby weather station. Usually, the advice is to check the condition of usage before applying any cell temperature model, and the models should only be used if the same kind of inputs with which they were fitted are available.

      A review on temperature modeling was conducted by Skoplaki and Palyvos (2009b), who classified the available models into implicit ones and explicit ones. Implicit models refer to those based on the heat transfer mechanisms and the thermal properties of the modules. This kind of model considers the energy balance of the module, including the convection loss, and the radiation loss to the sky and to the ground, to estimate an overall heat loss coefficient. Nonetheless, such calculations necessarily assume the module to be in a steady state, which is rarely achievable during operation (Mora Segado et al., 2015). A workaround to the complex energy balance calculation is thus to accompany the implicit models with fitted coefficients and various assumptions; this greatly defies the virtue of considering physics in the first place. In comparison, explicit models directly map weather variables to cell temperature, and a linear function is often thought sufficient. Tables 1 and 2 of Skoplaki and Palyvos (2009b) list some of the cell temperature models available at that time. In this section, considering that the temperature only exerts secondary effects on PV power generation, the principles according to which various temperature models are derived are not reiterated herein; instead, only the results of selected models are presented.

      Linear models are always the simplest. In the present case, one may write the cell temperature ($ T_\text{cell} $) as a linear function of $ T_\text{amb} $, $ G_c $, and $ W $ (TamizhMani et al., 2003; Muzathik, 2014):

      $$ T_\text{cell} = \beta_0 + \beta_1 T_\text{amb} + \beta_2 G_c + \beta_3 W, \tag{45} $$

      where $ \beta_0, \dots, \beta_3 $ are model coefficients, which change with the location, material, encapsulation type, and mounting type of the module, among other influencing factors. Clearly, there can be an infinite number of sets of coefficients, and the best set should always be the one produced based on the local context. One way to circumvent fitting during such linear modeling is to introduce the notion of nominal parameters, so long as the PV manufacturers comply with the standard and report these nominal parameters. On this point, the concept of nominal operating cell temperature (NOCT) has been used by solar engineers as a convenient means to quantify the thermal design of a PV module and to provide a reference temperature for rating power output. As per the current standard, NOCT is to be determined by the module manufacturers under a standard environment: a wind speed of 1 m s$ ^{-1} $ at PV module height, an ambient temperature of $ 20^\circ $C, and an irradiance of 800 W m$ ^{-2} $. With NOCT, Ross (1982) rewrote Eq. (45) into an equivalent form:

      $$ T_\text{cell} = T_\text{amb} + \frac{\text{NOCT}-20}{800}\,G_c, $$

      where $ \beta_1=1 $, $ \beta_2 = (\text{NOCT}-20^\circ\text{C})/(800 \text{ W m}^{-2}) $, and $ \beta_0= \beta_3=0 $.
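      As a quick illustration of the NOCT-based linear model, the sketch below evaluates Eq. (45) with the Ross coefficients ($ \beta_1=1 $, $ \beta_2=(\text{NOCT}-20)/800 $, $ \beta_0=\beta_3=0 $). The function name and the NOCT value of 45°C are hypothetical and serve only as an example.

```python
def ross_cell_temperature(t_amb, g_c, noct=45.0):
    """Cell temperature (deg C) from ambient temperature (deg C) and
    incident irradiance (W m^-2), using the NOCT-based linear form."""
    beta_2 = (noct - 20.0) / 800.0  # deg C per (W m^-2)
    return t_amb + beta_2 * g_c

# At the NOCT reference environment the model returns NOCT itself.
print(ross_cell_temperature(t_amb=20.0, g_c=800.0, noct=45.0))
# A warmer, brighter condition.
print(ross_cell_temperature(t_amb=30.0, g_c=1000.0, noct=45.0))
```

      Because $ \beta_3=0 $, wind speed plays no role in this form, which is one reason why the more refined models discussed next exist.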

      There is an interesting saying: If we are to divide real-world systems and processes into linear and nonlinear ones, it is as if we are classifying animals in a zoo into elephants and non-elephants. There are also undoubtedly many nonlinear cell temperature models, because the underlying process is nonlinear. Typifying the refined nonlinear modeling of cell temperature is the model by Fuentes (1987), who extended the concept of NOCT to “installed” NOCT (INOCT), which accounts for the deviation from NOCT due to the mounting configuration and wind information. The Fuentes model is based upon energy balance: The PV module is treated as a lump of solid material, which receives heat in the form of irradiance, and loses heat in the form of convection to the ambient environment. However, some have argued that this model is overly complex, and a much-simplified model would not result in any intolerable difference (King et al., 2004). Fortunately, the Fuentes model is now available in $\rm{pvlib}$, which contributes much to its uptake. As for a simpler model, King et al. (2004) proposed

      $$ T_\text{cell} = T_\text{amb} + G_c \exp(a+bW) + \frac{G_c}{1000}\,\Delta T, $$

      where the model coefficients $ a $, $ b $, and $ \Delta T $ depend on the module encapsulation and mounting. For instance, if a glass/glass sealed module is mounted on a closed roof, $ a = -2.98 $, $ b = -0.0471 $, and $ \Delta T = 1 $; if a glass/polymer sealed module is mounted on an open rack, $ a = -3.56 $, $ b= -0.075 $, and $ \Delta T = 3 $. For other encapsulation and mounting configurations, the reader is referred to Table 1 of King et al. (2004). Accuracy-wise, the King model has a $ \pm5^\circ $C uncertainty, which, though seemingly quite substantial, only results in a less than 3% difference in the final power estimates.
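      The effect of the encapsulation and mounting coefficients can be seen from a small numerical experiment. The sketch below assumes the commonly cited form of the King et al. (2004) model, in which an exponential wind term gives the module back-surface temperature and $ \Delta T $ scales the cell-to-module difference with irradiance; the function name is hypothetical, and the two coefficient sets are the ones quoted in the text.

```python
import math

def king_cell_temperature(g_c, t_amb, wind, a, b, delta_t):
    """Cell temperature (deg C), assuming the commonly cited King form:
    T_mod = G_c * exp(a + b * W) + T_amb, T_cell = T_mod + G_c / 1000 * dT."""
    t_mod = g_c * math.exp(a + b * wind) + t_amb
    return t_mod + g_c / 1000.0 * delta_t

# Coefficient sets quoted in the text.
GLASS_GLASS_CLOSED_ROOF = dict(a=-2.98, b=-0.0471, delta_t=1.0)
GLASS_POLYMER_OPEN_RACK = dict(a=-3.56, b=-0.075, delta_t=3.0)

conditions = dict(g_c=1000.0, t_amb=20.0, wind=1.0)
print(king_cell_temperature(**conditions, **GLASS_GLASS_CLOSED_ROOF))
print(king_cell_temperature(**conditions, **GLASS_POLYMER_OPEN_RACK))
```

      Under identical weather, the closed-roof configuration runs markedly hotter than the open-rack one, which is the expected consequence of restricted ventilation.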

      Another popular nonlinear cell temperature model is the Faiman model (Faiman, 2008):

      $$ T^\text{Faiman} = T_\text{amb} + \frac{G_c}{u_0 + u_1 W}, $$

      where $ u_0 $ and $ u_1 $ are heat loss factor coefficients, which take the values of 25 W m$ ^{-2} $ K$ ^{-1} $ and 6.84 W s m$ ^{-3} $ K$ ^{-1} $, as per the experimental outcome of Faiman (2008). The drawbacks of this model are apparent: the values of $ u_0 $ and $ u_1 $ are situation-dependent, and the model does not distinguish module temperature from cell temperature. Nevertheless, the $\rm{PVsyst}$ software, whose analysis is accepted by banks for loan applications, uses a variant of the Faiman model. $\rm{PVsyst}$ includes two additional parameters, namely, the module external efficiency ($ \eta_\text{mod} $) and absorption coefficient ($ \alpha $), as a means to discount the incident irradiance to effective incident irradiance:

      $$ T_\text{cell}^\text{PVsyst} = T_\text{amb} + \frac{\alpha\, G_c \left(1-\eta_\text{mod}\right)}{u_0 + u_1 W}. \tag{50} $$

      The $\rm{PVsyst}$ implementation also resolves the ambiguity in the temperature of the original Faiman model. The rationale behind including the module efficiency is that the electric power output of the PV module also contributes to the energy balance; in other words, the part of the absorbed irradiance that is converted to electricity does not contribute to the heating of the module. That said, Eq. (50) is associated with a practical difficulty, which originates from the modeling of $ \eta_\text{mod} $:

      $$ \eta_\text{mod} = \frac{P_\text{dc}}{A_\text{mod}\, G_c}, $$

      where $ A_\text{mod} $ is the area of the module, and $ P_\text{dc} $ is the DC power output by the module under the particular $ G_c $. Inasmuch as model chain construction is concerned, cell temperature modeling precedes DC power modeling, and hence this inter-dependency of $ T_\text{cell}^\text{PVsyst} $ and $ P_{\text{dc}} $ could lead to a chicken-and-egg problem.
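      The difference between the Faiman model and its $\rm{PVsyst}$ variant can be sketched as follows. The forms used are the commonly cited ones (a heat-loss denominator $ u_0 + u_1 W $, with the $\rm{PVsyst}$ variant discounting the irradiance by $ \alpha(1-\eta_\text{mod}) $); the function names are hypothetical, and $ \alpha = 0.9 $ and $ \eta_\text{mod} = 0.2 $ are illustrative values rather than recommendations. Treating $ \eta_\text{mod} $ as a fixed input, as done here, is a simplification that sidesteps the inter-dependency with $ P_\text{dc} $ rather than resolving it.

```python
def faiman_temperature(g_c, t_amb, wind, u0=25.0, u1=6.84):
    """Faiman-type temperature (deg C); u0 and u1 as reported by Faiman (2008)."""
    return t_amb + g_c / (u0 + u1 * wind)

def pvsyst_cell_temperature(g_c, t_amb, wind, eta_mod=0.2, alpha=0.9,
                            u0=25.0, u1=6.84):
    """PVsyst-type variant: only the absorbed irradiance that is not
    converted to electricity heats the module. eta_mod is held fixed here,
    which is an assumption made for illustration."""
    return t_amb + alpha * g_c * (1.0 - eta_mod) / (u0 + u1 * wind)

print(faiman_temperature(800.0, 20.0, 1.0))
print(pvsyst_cell_temperature(800.0, 20.0, 1.0))
```

      With the same heat loss factors, the variant always returns a lower temperature than the original model, since $ \alpha(1-\eta_\text{mod}) < 1 $.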

      A solution to the inter-dependency of the cell temperature and efficiency can be given by combining a PV module efficiency model into the thermal energy balance equation. A good example of this is the model proposed by Mattei et al. (2006), where the linear temperature-dependent PV efficiency model is integrated into the cell temperature calculation, resulting in the following formula

      where $ U_\text{PV} = 26.6 + 2.3 W $ is the heat exchange coefficient as a function of the wind speed, $ \gamma_{P_\text{mpp}} $ is the temperature coefficient of the maximum power, and $ \eta_\text{mpp, ref} $ is the module external efficiency at STC (i.e., the nameplate efficiency of the module), which can be obtained from the module datasheet. Finally, $ (\tau\alpha) $ is the transmittance–absorptance product, which expresses that the part of the incident irradiance that is not absorbed by the module does not contribute to its heating. Although the relative transmittances can be modeled for each irradiance component in a time-varying fashion, such an approach could easily lead to a double-counting problem if the reflection losses are already accounted for in the previous modeling step during the calculation of the effective irradiance. To avoid this, the $ (\tau\alpha) $ term in cell temperature modeling is better reduced to a constant that stands for the transmittance under nominal conditions, e.g., a value of 0.81 was recommended for the Mattei model by its authors.

      The last cell temperature model that should be discussed is the one employed by the System Advisory Model [SAM; Gilman et al. (2018)], which is another popular PV modeling software beside $\rm{PVsyst}$ and $\rm{pvlib}$. The SAM model writes

      where $ \text{NOCT}' $ is the adjusted NOCT based on the mounting stand-off,

      $ W' $ is the wind speed adjusted for height above the ground, that is,

      while for the $ (\tau\alpha) $ in this model, both SAM and $\rm{pvlib}$ use a default value of 0.9.

    • In the power systems literature, the DC power output from PV is often abbreviated into one equation:

      $$ P_\text{dc, mpp} = P_\text{dc, mpp, ref}\,\frac{G'_c}{1000}\left[1+\gamma_{P_\text{mpp}}\left(T_\text{cell}-25\right)\right]. \tag{56} $$

      The above equation is known as the Evans model (Evans and Florschuetz, 1977) and is adopted by PVWatts (Dobos, 2014); it links the DC maximum power point (MPP) output of a PV module to its nameplate power ($ P_\text{dc, mpp, ref} $), the MPP temperature coefficient ($ \gamma_{P_\text{mpp}} $), the cell temperature ($ T_\text{cell} $), and the effective incident irradiance ($ G'_c $). Whereas various concepts such as MPP or temperature coefficient are to be explained shortly, it should first be highlighted that, despite the popularity of Eq. (56), refined DC power modeling can be far more complex. To give perspective, Fig. 14 shows the module layout and single-line diagram of an actual roof-top PV system.

      Figure 14.  The design of an actual roof-top PV system with a total DC capacity of 103.04 kWp: (a) module layout and (b) single-line diagram. Information courtesy of Licheng LIU, RENOVA, Inc., Singapore.

      The installed capacity of the roof-top PV system is 103.04 kWp, which suggests that the system generates 103.04 kW of power under STC. The PV module selected for the system is JA Solar JAM72S20-460/MR, and there are 224 pieces of those, each having a nominal power of 460 W (224 × 460 W = 103.04 kW). From Fig. 14b, one can see that these 224 modules are arranged into 15 strings, which are respectively tied to the 10 maximum power point trackers (MPPTs) of a Huawei SUN2000-100KTL inverter. It is worth noting here that the MPP denotes the position on an $ I $–$ V $ curve that maximizes the power output. As the $ I $–$ V $ curve changes with the irradiance and temperature conditions, the function of the MPPT is to ensure that the MPP is tracked in real time. As for the inverter, it converts DC power to AC power, and having multiple MPPTs makes the strings operate independently of each other, which is useful in situations where partial shading takes place over some strings.

      Clearly, the most refined way to model a PV system ought to proceed from the exact system design. When the layout is fully specified, it is possible to calculate the output of each string according to the weather conditions, as well as other environmental factors such as shading. In fact, professional/commercial PV system design software, such as $\rm{PVsyst}$ or SAM, has long had such capabilities. However, it should be noted that the goal of this kind of software is designing the system, which requires just a typical meteorological year (TMY) dataset, rather than conducting performance evaluation or forecasting, which requires the most up-to-date information in a rolling manner. Therefore, it is still of interest to know the refined way in which DC power could be modeled in a model chain, such that the software dependence can be removed, thus broadening the range of applications of model chain.

      DC power models can be broadly categorized into two groups: empirical and diode models. Empirical PV models estimate the DC power (or sometimes, efficiency) of the PV system at the MPP using surrogate equations or regression equations with coefficients determined based on measurement data. Empirical DC models have the benefit of being simple, but their validity is limited to the extent of operations under MPP, which might not always be ensured in practice. On the other hand, lumped-circuit models with multiple diodes (as shown in Fig. 15) have been broadly accepted as they can accurately describe the $ I $–$ V $ characteristics of a PV module/cell by tracing out the entire $ I $–$ V $ curve. In a sense, because the diode-model-based power estimates can be scaled up to the power of the entire system, this approach is physical in nature, which is more accurate and thus should be preferred if system design information is available. System design information refers to the kind of design document such as the one shown in Fig. 14, which encompasses the module choice, the inverter choice, and the series–parallel configuration.

      Figure 15.  Equivalent circuit of a PV module/cell: the multi-diode model (see text for a description of the symbols in this figure).

    • Before the diode model is introduced, the empirical models are first reviewed. The Evans model depicted in Eq. (56) appeared in many works including those of Fuentes et al. (2007) and Marion (2002), but can be further traced to Osterwald (1986) and Evans and Florschuetz (1977), if not earlier. Despite its simple form, many have concluded that its accuracy is acceptable [e.g., Rodríguez-Gallegos et al. (2020); Haffaf et al. (2021)]. Such conclusions might be due to the fact that the model allows some degree of fine-tuning to its formulation. For example, it is possible to multiply the DC power output estimated by the model with an efficiency term or a correction factor, which can compensate to a certain degree the over- or under-predictions resulting from using Eq. (56) alone.

      However, some care must be taken when using the Evans model. Firstly, the model requires as input the effective irradiance, which is by formal definition the GTI less the reflection loss, soiling loss, and spectral loss. Yet, in the PVWatts implementation, the latter two are not involved at this stage. In other words, to be fully consistent with PVWatts, only the reflection loss needs to be subtracted from the GTI. Secondly, the temperature coefficient is an artificial conception rather than a physical one, which implies that its determination is empirical. Besides $ \gamma_{P_\text{mpp}} $, which denotes the MPP temperature coefficient, there are also the short-circuit current temperature coefficient $ \alpha_{I_\text{sc}} $ and the open-circuit voltage temperature coefficient $ \beta_{V_\text{oc}} $, which ought not to be mixed up in usage. The common range for $ \gamma_{P_\text{mpp}} $ is from $ -0.3 $% °C$ ^{-1} $ to $ -0.5 $% °C$ ^{-1} $, and the value can be found in the module's datasheet. Alternatively, to achieve better accuracy, it is also common practice to refit a $ \gamma_{P_\text{mpp}} $ value from data, i.e., via outdoor testing. Also important is that when $ \gamma_{P_\text{mpp}} $ is applied to Eq. (56), its value needs to be first divided by 100, to convert percentage to decimal.

      The Evans model as shown in Eq. (56) is expressed as a function of the nameplate power. Alternatively, one may also use the product of the nameplate efficiency ($ \eta_\text{mpp, ref} $) and module area ($ A_\text{mod} $) to write the model equivalently as

      $$ P_\text{dc, mpp} = \eta_\text{mpp, ref}\,A_\text{mod}\,G'_c\left[1+\gamma_{P_\text{mpp}}\left(T_\text{cell}-25\right)\right]. \tag{57} $$

      For example, the JA Solar JAM72S20-460/MR module as used in the system in Fig. 14 has $ P_\text{dc, mpp, ref} = 460 $ W and $ A_\text{mod}=2.23 $ m$ ^2 $. With these, one can calculate the nameplate efficiency to be $ \eta_\text{mpp, ref} = 460/(1000\times2.23) \approx 0.206 $ or 20.6%. Because the conversion between nameplate power and efficiency is trivial, the literature usually does not distinguish between modeling the power and modeling the efficiency.
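      The equivalence of the power-based and efficiency-based forms, as well as the percentage-to-decimal conversion of $ \gamma_{P_\text{mpp}} $, can be checked numerically. In the sketch below, the nameplate power and area are those of the module quoted above, whereas the function names, the operating condition, and the temperature coefficient of $ -0.35 $% °C$ ^{-1} $ (a value inside the common range) are assumptions made for illustration.

```python
def evans_power(g_eff, t_cell, p_ref, gamma_pct):
    """DC MPP power (W) from nameplate power; gamma_pct is in % per deg C."""
    gamma = gamma_pct / 100.0  # convert percentage to decimal
    return p_ref * g_eff / 1000.0 * (1.0 + gamma * (t_cell - 25.0))

def evans_power_from_efficiency(g_eff, t_cell, eta_ref, area, gamma_pct):
    """Same model written with nameplate efficiency and module area."""
    gamma = gamma_pct / 100.0
    return eta_ref * area * g_eff * (1.0 + gamma * (t_cell - 25.0))

P_REF = 460.0   # W, nameplate power quoted in the text
A_MOD = 2.23    # m^2, module area quoted in the text
ETA_REF = P_REF / (1000.0 * A_MOD)

print(round(ETA_REF, 3))
print(evans_power(800.0, 45.0, P_REF, -0.35))
print(evans_power_from_efficiency(800.0, 45.0, ETA_REF, A_MOD, -0.35))
```

      At 800 W m$ ^{-2} $ and a cell temperature of 45°C, both forms return about 342 W, i.e., a 7% temperature-related reduction relative to the irradiance-scaled nameplate power of 368 W.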

      The main criticism of the Evans model is that it only accounts for the temperature dependence of the PV module efficiency, i.e., the $ \eta_\text{mpp, ref}[1+\gamma_{P_\text{mpp}}(T_\text{cell}-25)] $ part of Eq. (57). But in reality, the efficiency also depends on the incident irradiance. Hence, Huld et al. (2011) proposed to estimate the efficiency as a function of both the cell temperature and relative effective irradiance:

      where $ G'_\text{rel} = {G'_c}/{1000 } $ and $ T'_\text{cell} = T_\text{cell}-25 $. The Huld model has been developed for, and is used in, the widely known online PV simulation tool called PVGIS. A conspicuous shortcoming of this model is that its parameters must be fitted. For crystalline silicon PV modules, one may use $ k_1 = -0.017237 $, $ k_2 = -0.040465 $, $ k_3 = -0.004702 $, $ k_4 = 0.000149 $, $ k_5 = 0.000170 $, and $ k_6 = 0.000005 $.
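      To see how the irradiance dependence enters, the sketch below evaluates a relative efficiency (i.e., efficiency divided by its STC value) with the crystalline silicon coefficients listed above. The polynomial in $ \ln G'_\text{rel} $ and $ T'_\text{cell} $ is written in the form commonly attributed to Huld et al. (2011), which is an assumption here, and the function name is hypothetical.

```python
import math

# Crystalline silicon coefficients k1..k6 quoted in the text.
K_CSI = (-0.017237, -0.040465, -0.004702, 0.000149, 0.000170, 0.000005)

def huld_relative_efficiency(g_eff, t_cell, k=K_CSI):
    """Efficiency relative to STC, assuming the commonly cited Huld form.
    g_eff is the effective irradiance (W m^-2, must be positive) and
    t_cell is the cell temperature (deg C)."""
    ln_g = math.log(g_eff / 1000.0)
    t_rel = t_cell - 25.0
    return (1.0
            + k[0] * ln_g + k[1] * ln_g ** 2
            + t_rel * (k[2] + k[3] * ln_g + k[4] * ln_g ** 2)
            + k[5] * t_rel ** 2)

for g in (200.0, 500.0, 1000.0):
    print(g, round(huld_relative_efficiency(g, 25.0), 4))
print(round(huld_relative_efficiency(1000.0, 45.0), 4))
```

      By construction the relative efficiency equals 1 at STC; at 25°C it falls as the irradiance decreases, which is precisely the low-light behavior that a temperature-only correction cannot capture.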

      Another appealing empirical DC power model is the one proposed by Beyer et al. (2004), where the temperature and irradiance dependence of the efficiency are separated into two multiplicative terms:

      The effect of the cell temperature is accounted for in exactly the same way as in the Evans model, whereas the irradiance dependence is modeled by a linear-logarithmic relationship with three parameters. To fit these parameters, one only needs the relative efficiency of the modules at three different irradiance levels, which makes this model fairly easy to tailor to the PV module of interest. Even though such data are not universally available, many PV manufacturers disclose such information in their product datasheets.

      Besides the Evans, Huld, and Beyer models, other empirical DC power models also exist, and are in fact great in number. Among the various modeling strategies, linear regression and nonlinear regression with interaction terms are the most popular. Skoplaki and Palyvos (2009a) offered a comprehensive list of empirical DC power models. Whereas most empirical models do not substantially differ in functionality from each other, a rather unique one is the model developed by King et al. (2004), which not only models the DC power, but also provides equations to estimate the MPP current ($ I_\text{mpp} $) and voltage ($ V_\text{mpp} $). Because the formulation is fairly tedious and involves many module-specific empirical coefficients, the details are omitted here. Worth mentioning, however, is that the model of King et al. (2004) essentially estimates five points on the $ I $–$ V $ curve, which makes it a special form of the diode model, which is the next subject of discussion.

    • According to the $ m $-diode model as depicted in Fig. 15, the current $ I $ of a PV module, as governed by Kirchhoff’s current law, is expressed as

      $$ I = I_\text{L} - \sum_{j=1}^{m} I_{0_j}\left[\exp\left(\frac{V+IR_\text{s}}{a_j}\right)-1\right] - \frac{V+IR_\text{s}}{R_\text{sh}}, \tag{59} $$

      where $ I_\text{L} $ is the photocurrent, which is proportional to the effective irradiance; $ I_{0_j} $ is the reverse saturation current of the $ j $th diode; $ a_j = N_sn_jkT_\text{cell}/q $ is the modified ideality factor for the $ j $th diode ($ N_s $ is the number of cells connected in series, $ n_j $ is the ideality factor, $ k=1.380649 \times 10^{-23} $ J K$ ^{-1} $ is Boltzmann’s constant, $ T_\text{cell} $ is the cell temperature, and $ q=1.60217663\times10^{-19} $ C is the electronic charge); $ R_\text{s} $ and $ R_\text{sh} $ are the series and shunt resistances, respectively.

      Physically, it is well known that diode $ \text{D}_1 $ accounts for the carriers diffusing across the P–N junction and recombination that takes place in the bulk and at the surface. Diode $ \text{D}_2 $ can be attributed to carrier recombination by traps within the depletion region, or to carrier recombination at an unpassivated cell edge. As for the other diodes $ \text{D}_3 $ to $ \text{D}_m $, the motivation for having them is more mathematical than physical, but one may think of them as accounting for distributed and localized effects in solar cells such as Auger recombination. Compared to $ \text{D}_1 $ and $ \text{D}_2 $, the contribution of $ \text{D}_3 $ to $ \text{D}_m $ to DC power modeling is small; the literature is populated with works that deal with one- and two-diode models. Of course, in terms of a mathematical solution, multi-diode modeling, though entirely possible to solve analytically [see Lim et al. (2015a)], is notoriously intricate, which has hitherto been limiting its uptake. In what follows, only the one-diode model is considered, and Eq. (59) reduces to

      $$ I = I_\text{L} - I_0\left[\exp\left(\frac{V+IR_\text{s}}{a}\right)-1\right] - \frac{V+IR_\text{s}}{R_\text{sh}}. \tag{60} $$

      Equation (60) contains five parameters, namely, $ I_\text{L} $, $ I_0 $, $ a $, $ R_\text{s} $, and $ R_\text{sh} $. When these five parameters are all known, the equation should theoretically allow one to obtain $ I $ for any $ V $ value, or vice versa. However, because Eq. (60) is a transcendental equation (i.e., not algebraic), it is difficult to solve. On this point, Jain and Kapoor (2004) showed that the explicit solution for $ I $ and $ V $ can be expressed using the Lambert W function; this bi-directional retrieval method is implemented in the "i_from_v" and "v_from_i" functions of $\rm{pvlib}$. Among the five unknown parameters, $ R_\text{s} $ is a constant, whereas the other four parameters are time-varying and depend on the meteorological conditions under which the module operates. Generally, if the one-diode model is to be used in a model chain, three main steps are involved: (1) estimating the five parameters at some reference condition, which usually refers to STC, (2) estimating the five parameters for the operating conditions, and (3) solving the circuit equation to find the MPP.
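      The implicit nature of the one-diode equation can be appreciated from a small numerical example. The sketch below does not use the Lambert W solution of Jain and Kapoor (2004); it swaps in plain bisection on the residual of the one-diode equation, which is slower but needs only the standard library. The five parameter values are hypothetical and merely chosen to resemble a large crystalline silicon module.

```python
import math

def diode_residual(i, v, i_l, i_0, a, r_s, r_sh):
    """Right-hand side minus left-hand side of the one-diode equation."""
    x = v + i * r_s
    return i_l - i_0 * (math.exp(x / a) - 1.0) - x / r_sh - i

def current_from_voltage(v, i_l, i_0, a, r_s, r_sh, n_iter=100):
    """Solve for I at a given V by bisection. The residual decreases
    monotonically in I, and the bracket [-I_L, I_L] is adequate for
    voltages between zero and the open-circuit voltage."""
    lo, hi = -i_l, i_l
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if diode_residual(mid, v, i_l, i_0, a, r_s, r_sh) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters: I_L (A), I_0 (A), a (V), R_s (ohm), R_sh (ohm).
PARAMS = dict(i_l=11.0, i_0=1e-10, a=1.9, r_s=0.2, r_sh=300.0)

i_sc = current_from_voltage(0.0, **PARAMS)    # short-circuit current
i_40 = current_from_voltage(40.0, **PARAMS)   # current at V = 40 V
print(round(i_sc, 4), round(i_40, 4))
```

      The short-circuit current comes out marginally below $ I_\text{L} $ because of the small voltage drop across $ R_\text{s} $, and the current starts to fall quickly once the diode term becomes active at higher voltages.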

      Perhaps surprisingly, a typical datasheet of a PV module does not contain the values of the five parameters at STC. However, solving the one-diode model parameters has been extensively studied, and many strategies are now well known. First and foremost, it is necessary to apply Eq. (60) for the short-circuit, open-circuit, and MPP conditions, which leads to three equations:

      $$ I_\text{sc} = I_\text{L} - I_0\left[\exp\left(\frac{I_\text{sc}R_\text{s}}{a}\right)-1\right] - \frac{I_\text{sc}R_\text{s}}{R_\text{sh}}, $$

      $$ 0 = I_\text{L} - I_0\left[\exp\left(\frac{V_\text{oc}}{a}\right)-1\right] - \frac{V_\text{oc}}{R_\text{sh}}, $$

      $$ I_\text{mpp} = I_\text{L} - I_0\left[\exp\left(\frac{V_\text{mpp}+I_\text{mpp}R_\text{s}}{a}\right)-1\right] - \frac{V_\text{mpp}+I_\text{mpp}R_\text{s}}{R_\text{sh}}, $$

      where $ I_\text{mpp} $, $ V_\text{mpp} $, $ I_\text{sc} $, and $ V_\text{oc} $ are the MPP current and voltage, short-circuit current, and open-circuit voltage, respectively. Subsequently, knowing that the derivative of the power at MPP with respect to voltage is zero, the fourth equation obtains,

      $$ \left.\frac{\text{d}(IV)}{\text{d}V}\right|_\text{mpp} = I_\text{mpp} + V_\text{mpp}\left.\frac{\text{d}I}{\text{d}V}\right|_\text{mpp} = 0. $$

      In regard to the fifth equation, opinions diverge, which has led to various proposals [e.g., De Soto et al. (2006); Laudani et al. (2015); Lim et al. (2015b)]. It should be made clear that not all methods for solving the fifth equation in the literature have comparable accuracies, because some methods make more suppositions than others, leading to identification errors in the first parameter and thus all subsequent ones. Practically, there are two options to obtain the one-diode model parameters for a given module. The first is that the California Energy Commission (CEC) module library, as available in both $\rm{pvlib}$ and SAM, offers the identified parameters for a very wide range of modules on the market, and one may directly search and use those values. The second applies if some particular module is not within the database, such as the JA Solar JAM72S20-460/MR module used in the design in Fig. 14: $\rm{pvlib}$ offers various options, including "$\rm{fit\_cec\_sam}$" (Dobos, 2012), "$\rm{fit\_desoto}$" (De Soto et al., 2006), and "$\rm{fit\_pvsyst\_sandia}$" (Hansen, 2015), which can convert the datasheet information into one-diode model parameters.

      Upon successfully estimating the values of the five parameters of the one-diode model under STC, the next step is to translate these parameters in accordance with an arbitrary operating condition as specified by a particular set of $ G'_c $ and $ T_\text{cell} $ values. Again, various options are available for this step [e.g., De Soto et al. (2006); Dobos (2012); Sauer et al. (2015)]. For instance, the system of equations offered by De Soto et al. (2006) is:

      where $ k = 8.61733\times10^{-5} $ eV K$ ^{-1} $ is Boltzmann’s constant; $ T_\text{cell} $ is in K; $ E_\text{g, ref} = 1.121 $ eV for crystalline silicon but takes other values for other cell materials; $ \alpha_\text{sc} $ is the short-circuit current temperature coefficient of the module with a unit of A °C$ ^{-1} $, a parameter available from the datasheet; and also recall that $ R_\text{s} $ is a constant, $ R_\text{s} = R_\text{s, ref} $. The units here can be a bit confusing, and thus need to be clarified. Equation (66) follows Eq. (10.4) of Messenger and Abtahi (2004), in which the unit of temperature is K, which forces $ T_\text{cell} $ to take the same unit. The bandgap of a semiconductor has the unit of eV, which forces Boltzmann’s constant to take the unit of eV K$ ^{-1} $ instead of the more common J K$ ^{-1} $. Interestingly, both Messenger and Abtahi (2004) and De Soto et al. (2006) explicitly stated the unit of Boltzmann’s constant as J K$ ^{-1} $, which seems to be an oversight, but the $\rm{pvlib}$ implementation makes a correction.
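      Since the unit handling is the error-prone part of this step, a compact sketch of the translation may help. It assumes the commonly cited De Soto et al. (2006) relations: $ a $ scales with absolute temperature, $ I_\text{L} $ scales with irradiance and shifts with temperature, $ R_\text{sh} $ scales inversely with irradiance, and $ I_0 $ follows a cubic-exponential law with a linearly temperature-dependent bandgap. The bandgap slope of 0.0002677 K$ ^{-1} $ is the value usually quoted for silicon, and the function name and reference parameter values are hypothetical.

```python
import math

K_EV = 8.61733e-5  # Boltzmann's constant in eV K^-1

def desoto_translate(g_eff, t_cell_c, i_l_ref, i_0_ref, a_ref, r_s_ref,
                     r_sh_ref, alpha_sc, eg_ref=1.121, eg_slope=0.0002677,
                     g_ref=1000.0, t_ref_c=25.0):
    """Translate reference one-diode parameters to operating conditions
    (assumed De Soto-type relations). Temperatures are given in deg C and
    converted to K internally; g_eff must be positive."""
    t = t_cell_c + 273.15
    t_ref = t_ref_c + 273.15
    a = a_ref * t / t_ref
    i_l = g_eff / g_ref * (i_l_ref + alpha_sc * (t - t_ref))
    e_g = eg_ref * (1.0 - eg_slope * (t - t_ref))  # bandgap in eV
    i_0 = i_0_ref * (t / t_ref) ** 3 * math.exp(
        (eg_ref / t_ref - e_g / t) / K_EV)
    r_sh = r_sh_ref * g_ref / g_eff
    return dict(i_l=i_l, i_0=i_0, a=a, r_s=r_s_ref, r_sh=r_sh)

# Hypothetical reference parameters at STC.
REF = dict(i_l_ref=11.0, i_0_ref=1e-10, a_ref=1.9, r_s_ref=0.2,
           r_sh_ref=300.0, alpha_sc=0.005)

print(desoto_translate(1000.0, 25.0, **REF))  # reproduces the reference set
print(desoto_translate(500.0, 45.0, **REF))
```

      A useful sanity check is that the reference parameters are returned unchanged at 1000 W m$ ^{-2} $ and 25°C; a mistake in the unit of Boltzmann’s constant would typically show up as an absurd value of $ I_0 $ at any other temperature.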

      After the five parameters $ I_\text{L} $, $ I_0 $, $ a $, $ R_\text{s} $, and $ R_\text{sh} $ are estimated for the particular operating condition of concern, the corresponding $ I $–$ V $ curve may be traced out using Eq. (2) or (3) of Jain and Kapoor (2004), who leveraged the Lambert W function in expressing the analytical solution to the one-diode model:

      To conclude the procedure outlined thus far, Fig. 16 demonstrates four $ I $–$ V $ curves corresponding to a JA Solar JAM72S20-460/MR module, under four operating conditions, with $ G'_c $ ranging from 400 to 1200 W m$ ^{-2} $ and $ T_\text{cell} $ ranging from 15°C to 45°C. Because the module is not listed by the CEC module library, the "$\rm{fit\_desoto}$" function in $\rm{pvlib}$ is used to estimate $ I_\text{L, ref} $, $ I_\text{0, ref} $, $ a_\text{ref} $, $ R_\text{s, ref} $, and $ R_\text{sh, ref} $. Following that, the five parameters of the one-diode model under various operating conditions are acquired via the "$\rm{calcparams\_desoto}$" function. Then, the "$\rm{singlediode}$" function, which implements Jain and Kapoor (2004) and a few other alternatives, is used to compute the $ I $–$ V $ curve under each operating condition. Finally, from the $ I $–$ V $ curves, the $ P $–$ V $ curves and MPPs follow, as marked by the black dots in Fig. 16.
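      The last step of this procedure, namely, going from an $ I $–$ V $ curve to the MPP, can be mimicked without any PV library. The sketch below is not the $\rm{pvlib}$ workflow used for Fig. 16: it solves the one-diode equation by bisection instead of the Lambert W function, sweeps the voltage on a regular grid, and picks the grid point with the largest $ IV $ product. The parameter values and function names are hypothetical.

```python
import math

def current_from_voltage(v, i_l, i_0, a, r_s, r_sh, n_iter=100):
    """Bisection solution of the one-diode equation for I at a given V."""
    lo, hi = -i_l, i_l
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        x = v + mid * r_s
        residual = i_l - i_0 * (math.exp(x / a) - 1.0) - x / r_sh - mid
        if residual > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def trace_iv_curve(params, n_points=500):
    """Return (V, I) pairs from V = 0 up to the ideal open-circuit voltage
    a * ln(I_L / I_0 + 1), an upper bound on the actual one."""
    v_max = params["a"] * math.log(params["i_l"] / params["i_0"] + 1.0)
    voltages = [v_max * j / (n_points - 1) for j in range(n_points)]
    return [(v, current_from_voltage(v, **params)) for v in voltages]

def maximum_power_point(curve):
    """Grid point on the curve that maximizes P = V * I."""
    v_mp, i_mp = max(curve, key=lambda point: point[0] * point[1])
    return v_mp, i_mp, v_mp * i_mp

# Hypothetical one-diode parameters at some operating condition.
PARAMS = dict(i_l=11.0, i_0=1e-10, a=1.9, r_s=0.2, r_sh=300.0)

curve = trace_iv_curve(PARAMS)
v_mp, i_mp, p_mp = maximum_power_point(curve)
print(round(v_mp, 2), round(i_mp, 2), round(p_mp, 1))
```

      A grid search is crude but transparent; its resolution is set by the number of points, and a finer search around the best grid point would be the natural refinement.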

      Figure 16.  (a) The $ I $–$ V $ curves and (b) the corresponding $ P $–$ V $ curves of a JA Solar JAM72S20-460/MR module, under various operating conditions. The maximum power points are marked with dots. Three $\rm{pvlib}$ functions are used: "$\rm{fit\_desoto}$" estimates the five parameters of the one-diode model at STC according to the electrical parameters given in the datasheet, "$\rm{calcparams\_desoto}$" estimates the five parameters for various operating conditions, and "$\rm{singlediode}$" retrieves the $ I $–$ V $ curves.

      The review of the one-diode model has been focusing on its application for just a single module. Since the PV system consists of many modules connected in series or parallel, scaling of the MPP current and voltage is needed. For series-connected modules, the voltage is additive, whereas the amperage remains unchanged. In contrast, for parallel-connected modules, the current is additive, whereas the voltage remains unchanged. Ideally, the series–parallel scaling should be performed for each MPPT, but this is only possible if the inverter information and MPPT connection configuration are fully known.

    • PV systems that operate under outdoor conditions inevitably suffer from various loss mechanisms. Shading is more often than not a significant loss factor for ground-mounted PV systems, due to limited fleet spacing, and the loss amount is governed by geometry. Soiling may or may not be a significant loss factor, for it depends on the cleaning schedule, the rate at which dust accumulates, and the frequency and severity of rain events. For PV plants installed in cold climates, losses due to snow would be relevant. On the DC side, the equivalent resistance of the wires and the operating current can be used to calculate the ohmic loss in units of power. The inverter loss is a portion of the energy lost during the conversion of DC power to AC power due to potential power clipping and conversion efficiency. On the AC side, the cable that connects an inverter to a transformer also has a resistive component, which results in AC cable losses. In many MW-scale PV farms, the transformer, which acts as an external device between the PV system and the medium-voltage or even high-voltage grid, introduces two main losses just before power is injected into the grid. These losses are the copper loss in the primary and secondary windings and the iron loss caused by hysteresis and eddy currents in the transformer core. The degradation of the systems, which can be categorized into numerous types and to which numerous factors contribute, further complicates the situation. Dobos (2014) gave a comprehensive list of major loss mechanisms as well as their typical values; they are: 2% soiling loss, 3% shading loss, 2% mismatch loss, 2% wiring loss, 0.5% connection loss, 1.5% light-induced degradation, 1% nameplate loss, and 3% availability loss.
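      To see what the itemized values amount to in total, the sketch below combines them multiplicatively, i.e., each loss acts on the power remaining after the previous one. This combination rule is the convention of the PVWatts calculator documented by Dobos (2014); the dictionary keys and the function name are hypothetical.

```python
# Typical loss values listed above, in percent.
LOSSES_PERCENT = {
    "soiling": 2.0,
    "shading": 3.0,
    "mismatch": 2.0,
    "wiring": 2.0,
    "connections": 0.5,
    "light_induced_degradation": 1.5,
    "nameplate": 1.0,
    "availability": 3.0,
}


def total_loss_fraction(losses_percent):
    """Combine individual losses multiplicatively into one fraction."""
    remaining = 1.0
    for value in losses_percent.values():
        remaining *= 1.0 - value / 100.0
    return 1.0 - remaining


total = total_loss_fraction(LOSSES_PERCENT)  # about 0.1408, i.e., 14.08%
```

      The result is slightly smaller than the plain sum of the percentages (15%), since the losses compound rather than add.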

      Even though not all loss factors apply to all PV systems—for example, row-to-row shading does not affect PV panels tiled on the roof of a residential house, or low-to-medium voltage transformers collocated among the PV rows make AC cable loss negligible—accurate modeling of the eventual AC power injecting into the grid cannot be dismissed as trivial, even if DC power is known in high confidence. The scientific and engineering principles governing the different loss factors have been studied in the past, but the validity of those conclusions is frequently only limited to the specific experimental setup used by the researchers, making them context-specific rather than generally applicable. Even though some principles such as the geometrical calculations for beam shading loss are in fact general, their applicability is always constrained to some degree, depending on whether or to what extent the modeler has access to information regarding the row spacing, the slope of the mounting surface, or the slant height. For this reason, there does not seem to be much motivation to enumerate who did what and under what boundary conditions. Therefore, only the fundamental ideas underlying the various loss factors are discussed in what follows.

    • Shading loss is a problem concerning geometry (Appelbaum and Bany, 1979), which implies that if the 3D layout of a PV system and its nearby structures is known, at least the beam shading can be calculated to an exactitude. Shading due to diffuse radiation is by nature more challenging, for it involves integration. Diffuse shading (also known as diffuse masking) calculation relies upon the choice of transposition model as well as the accuracy to which the sky-view factor can be determined [e.g., see Maor and Appelbaum (2012); Appelbaum et al. (2019); Varga and Mayer (2021)]. Because of the series connection of cells or modules, shading affects not just the shaded part itself but the whole module or string. Installing bypass diodes, on this point, has been a standard way to mitigate shading loss. Consequently, depending on how bypass diodes are connected, modules installed in portrait and landscape orientation may have a very different response to shading. In any case, owing partly to the complexity of calculation, and more to the unavailability of the 3D layout and the lack of information on the efficacy of bypass diodes under the arrangement of concern, shading loss estimation often relies on various assumptions and simplifications on geometry [e.g., see section 9.1 of Gilman et al. (2018)], or on many occasions, reduces simply to a constant percentage, so as to be subtracted from the PV DC power output.

      Various commercial and research software tools have done well in shading analysis. Users of such tools are trained engineers, who are able to portray a 3D drawing of the PV system of interest using predefined 3D shapes, representing trees, chimneys, and other possible structures that can cause shading (Gilman et al., 2018). The calculation for shading loss is a two-part procedure, dealing with beam and diffuse radiation separately. Coordinate transformation underpins the calculation procedure for beam shading, in that, the entire 3D scene is rotated to align with the sun-ray and then “flattened” to a 2D set of polygons with a back-to-front order. With a 2D polygon clipping algorithm, the shade fraction can be converted into a loss percentage. As for diffuse shading, the procedure is to grid the hemispherical sky into small elements, each acting as a light source, so that the procedure for beam shading can be executed repeatedly. Through integration, the sky-view factor and thus diffuse shading estimates result. It should be clear that the above procedure needs to be conducted for each time instance. Because the procedure has been known for decades, nothing is too difficult, except for the fact that the information needed for the 3D construction of scenes is often proprietary, and those who possess such information are rarely interested or skilled enough to proceed with shading calculation. In any case, since the row-to-row shading is the dictating mechanism for large PV plants, the remaining discussion should focus just on that. The row-to-row diffuse shading is discussed first, followed by row-to-row beam shading.

      A schematic featuring the geometry for diffuse self-shading is shown in Fig. 17. Two adjacent rows of PV arrays are represented by the two thick tilted lines, whereas the arrows pointing inward to the second row of panels represent the omnidirectional diffuse radiation. Following simple geometry, the obscuring angle ($ \psi $), also known as the masking angle, as a function of the slant height $ l $, i.e., the distance from the bottom of the back-row panel to an arbitrary position along that panel, is given by:

      Figure 17.  Illustration of diffuse self-shading.

      The dashed arrows in Fig. 17 denote the fraction of the incoming diffuse radiation that is not seen by that arbitrary point. As $ l $ changes from the top to the bottom of the back-row panel, this fraction increases in gradation. Assuming that the diffuse transposition factor is isotropic, the diffuse irradiance masking can be taken into account by modifying the isotropic diffuse transposition factor, i.e., $ R_d^\text{ISO} = [1+\cos (S)]/2 $, as

      The above equation can be applied to all points of the modules to calculate how the diffuse irradiance is distributed along the height of the modules (Varga and Mayer, 2021). However, in most cases, the average of the diffuse shading over the whole module plane is of interest. To calculate this, a common simplification is to introduce an average obscuring angle, by integrating $ \psi(l) $ along $ 0 $ to $ L $, of which the analytical solution is available as

      where $ K = D/L $ is the relative row spacing. Another possible simplification is to just assume the worst case, as SAM does, in which the highest portion of diffuse radiation is obstructed by the front-row panel, and the obscuring angle becomes

      The rationale behind using the highest obscuring angle is that it results in the lowest remaining diffuse irradiance, and thus the lowest current in the PV cells. Due to their series connection, the cell with the lowest current can limit the current of the whole string, further reducing the overall power output. However, as simulated by Varga and Mayer (2021), the diffuse masking causes only a small difference in the absorbed irradiance, which translates to small current differences, which are well compensated by small changes in the voltage without a significant power loss resulting from the mismatch of the modules (see the flat peak of the $ P $–$ V $ curves in Fig. 16).

      A theoretically more accurate method to calculate the average diffuse shading is based on the discovery that the isotropic transposition factor is actually the view factor between the modules and the sky (Maor and Appelbaum, 2012). Therefore, the reduced view factor that also considers the masking of the adjacent rows can be calculated by Hottel’s crossed string method [see pp. 31–37 of Hottel and Sarofim (1967)] as

      This reduced transposition factor is an average of the whole module surface. To include diffuse self-shading in a model chain, $ G'_c $ obtained from Eq. (36) should subtract the amount

      to account for diffuse self-shading. Since the first row of panels is not shaded, a derating term $ (N_\text{row}-1)/N_\text{row} $ is devised. The average masking loss factor changes with both the surface tilt $ S $ and the ground coverage ratio (GCR), which is defined to be the ratio of $ L $ to $ D $, thus, $ K = 1/ $GCR. The relative diffuse irradiance (the ratio between DTI after accounting for shading and DHI), that is $ R_d^\text{ISO, mask} $ as calculated by Eq. (75), is plotted in Fig. 18b.
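      The diffuse self-shading quantities discussed above can be sketched as follows. Since the display equations are not reproduced here, the expressions in the code are reconstructions from the geometry and should be read as assumptions: $ D $ is taken as the row pitch (so that $ K = D/L = 1/ $GCR), the masked factor is taken as $ [1+\cos(S+\psi)]/2 $, the crossed-string factor is derived for infinitely long parallel rows on flat ground, and the deduction is taken as the unmasked-minus-masked factor times DHI times the row derating term. All function names are hypothetical.

```python
import math


def masking_angle(l_rel, k, tilt):
    """Obscuring angle (rad) at relative slant height l_rel = l/L."""
    rise = (1.0 - l_rel) * math.sin(tilt)
    run = k - (1.0 - l_rel) * math.cos(tilt)
    return math.atan2(rise, run)


def masked_isotropic_factor(psi, tilt):
    """Isotropic diffuse transposition factor reduced by masking."""
    return (1.0 + math.cos(tilt + psi)) / 2.0


def average_masking_angle(k, tilt, n=1000):
    """Average of psi(l) over the panel, by the midpoint rule."""
    return sum(masking_angle((j + 0.5) / n, k, tilt) for j in range(n)) / n


def crossed_string_factor(k, tilt):
    """Panel-to-sky view factor from Hottel's crossed-string method."""
    return (1.0 + k - math.sqrt(1.0 + k * k - 2.0 * k * math.cos(tilt))) / 2.0


def diffuse_shading_deduction(dhi, k, tilt, n_row):
    """Irradiance (W/m^2) to subtract for diffuse self-shading."""
    r_iso = (1.0 + math.cos(tilt)) / 2.0
    derate = (n_row - 1) / n_row  # the first row is not shaded
    return derate * (r_iso - crossed_string_factor(k, tilt)) * dhi


tilt = math.radians(30.0)
r_worst = masked_isotropic_factor(masking_angle(0.0, 2.0, tilt), tilt)
r_avg = masked_isotropic_factor(average_masking_angle(2.0, tilt), tilt)
r_view = crossed_string_factor(2.0, tilt)
```

      For a 30° tilt and $ K = 2 $, the worst-case factor is noticeably lower than the other two, whereas the average-angle and crossed-string factors nearly coincide, which is in line with the discussion above.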

      Figure 18.  Average masking angle (a) and relative diffuse irradiance (b) as a function of tilt angle for various ground coverage ratios.

      The modeling for beam shading is more straightforward, for it depends upon the shadow dimensions, which are related just to solar positions. Over the course of a day, the zenith and azimuth angles change, with which the shadows the front rows cast onto the back rows morph in two directions. Graphically, the two directions are represented by $ l $- and $ w $-directions as shown in the zoomed inset of Fig. 19. If the row is sufficiently long, which is usually the case, the changes in $ w $-direction can be neglected, and the problem reduces to calculating the shadow height. Let the shadow height be denoted by $ L_\text{shadow} $. Its ratio with respect to $ L $, i.e., the relative shaded area of the array, has been expressed in Eqs. (9.17) and (9.19) of Gilman et al. (2018), which, under the present notation convention, is

      Figure 19.  Illustration of beam self-shading.

      where $ \varphi_S $ and $ \varphi_C $ are the azimuth angles of the sun and the array, and $ \alpha = \pi/2-Z $ is the elevation angle; these have already been defined in section 4.1. At this stage, two options are available for estimating the BTI under self-shading, of which the first is a nonlinear option, and the other is linear. The nonlinear option considers the actual design of the bypass diodes, and determines which diode is activated and which is not, according to whether the modules are positioned in portrait or landscape; see section 9.5 of Gilman et al. (2018) or section 3.2 of Mayer and Gróf (2020), which also includes a graphical explanation. The linear option is simpler, in that, one may assume the reduction in BTI to be proportional to $ L_{\text{shadow}}/L $; however, it must be known that this approach always underestimates the shading losses. Put differently, if one is to include beam self-shading in a model chain under the linear option, $ G'_c $ obtained from Eq. (36) should subtract the amount

      to account for beam self-shading.
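      The linear option can be sketched as below. The shaded fraction is reconstructed here from the row geometry (flat ground, infinitely long rows, $ D $ taken as the row pitch and $ K = D/L $), so it should be treated as an assumption rather than a transcription of Eqs. (9.17) and (9.19) of Gilman et al. (2018). The row derating term in the deduction mirrors the one used for diffuse self-shading and is likewise an assumption. Function names are hypothetical and all angles are in radians.

```python
import math


def shaded_fraction(sun_elevation, sun_azimuth, array_azimuth, tilt, k):
    """Relative shadow height L_shadow/L on a back row, clipped to [0, 1]."""
    if sun_elevation <= 0.0:
        return 0.0  # sun below the horizon: no beam to shade
    cos_daz = math.cos(sun_azimuth - array_azimuth)
    if cos_daz <= 0.0:
        return 0.0  # sun behind the rows: front row casts no shadow backward
    denom = math.cos(tilt) + math.sin(tilt) * cos_daz / math.tan(sun_elevation)
    return min(1.0, max(0.0, 1.0 - k / denom))


def beam_shading_deduction(bti, fraction, n_row):
    """Irradiance (W/m^2) to subtract under the linear option."""
    return (n_row - 1) / n_row * fraction * bti


tilt = math.radians(30.0)
south = math.radians(180.0)
low_sun = shaded_fraction(math.radians(10.0), south, south, tilt, 2.0)
high_sun = shaded_fraction(math.radians(60.0), south, south, tilt, 2.0)
```

      With a 30° tilt and $ K = 2 $, a sun elevation of 10° shades about 46% of the back-row slant height, whereas an elevation of 60° leaves the back rows unshaded.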

    • Solar panels are installed in outdoor environments, and their performance is thus affected by various pollutants, such as dust, sediment, or bird excrement. The term “soiling” generally describes the buildup of pollutants on solar panels, which reduces the amount of effective light that reaches the solar cells and, as a result, the effectiveness of the panel and the system. In the past, solar engineers were mainly concerned with converting the effects of soiling into a proportion of the overall energy yield of a PV system, so that they could be taken into account when designing, simulating, and evaluating the performance of the system. To put it another way, up until now, soiling studies have followed the same time scale as resource assessments (Mejia and Kleissl, 2013). When forecasting is the application of concern, the interactions between short-term variations in meteorological conditions and soiling become prominent, for a rain event can undo the power losses caused by soiling and thereby boost the power output by a relatively sizable margin. It is best to take into account the PV plant’s cleaning schedule when forecasting because it has an impact on soiling as well. Regardless, a sizable amount of research has been carried out on soiling [see Conceição et al. (2022), for a review]. Again, it is important to note that the results from one scenario can rarely be transferred to another, due to the various experimental designs and ambient conditions.

      As previously stated, the severity of soiling is primarily influenced by three variables: (1) the buildup of dust, which is further influenced by the exposure time (i.e., the interval between successive rain events), as well as other environmental factors at the PV system installation location; (2) the effectiveness of rain events in removing dust, which is further influenced by the intensity of rain over a period; and (3) the active cleaning schedule. Theoretically, one could incorporate the first two variables into the soiling modeling process by using NWP, particularly those atmospheric composition models that can reasonably predict episodes of precipitation and dust storms of varying severity. However, a reliable soiling forecasting model should also take into account local anthropogenic particle sources, which are much more difficult to collect and standardize information about. Examples of these sources include farming, industrial facilities, airports, and major roads (Conceição et al., 2022). Additionally, one should be aware that cleaning is entirely influenced by soiling economics and human factors and varies from case to case. In order to increase optical efficiency while reducing the number of cleanings, numerous studies have inquired into cleaning optimization in accordance with local dust accumulation behaviors [e.g., Abdeen et al. (2017); You et al. (2018); Micheli et al. (2020); Ullah et al. (2020)]. At this time and for the foreseeable future, it would likely not be possible to incorporate this knowledge into forecasting because these optimizations call for an excessive number of location-dependent and largely unknown parameters.

      Although it is difficult to fully incorporate the effects of soiling for PV power forecasting, this problem can be mitigated to a large extent if in situ soiling measurements are available. In fact, just as installing in-plane reference cells, thermocouples, and anemometers for GTI, cell temperature, and wind speed measurements became the standard for large PV systems a long time ago, one can expect soiling sensors to follow suit in the near future. Commercial soiling sensors have emerged, including DustIQ from Kipp & Zonen and MARS from Atonometrics. Low-cost soiling sensors have also recently advanced, according to the reports of Hussain et al. (2021) and Valerino et al. (2020). Because hardware is no longer a barrier preventing better soiling modeling, the remaining difficulty is, in the main, public acceptance. With the exception of severe weather events like dust storms or torrential rain, the daily changes in soiling rate only account for a tiny portion of the overall variations in the PV power the following day. To that end, it is hypothesized that forecasts of the soiling ratio, a dimensionless parameter used to quantify the soiling impact, can be obtained by combining the immediate past soiling ratio measurements and the forecast impact of severe weather.

    • It is preferable to use DC cable loss, also referred to as DC wiring loss, in conjunction with the physical DC model covered in section 4.6.2. Between the power produced by the modules and the power that reaches the terminals of the PV array, losses are caused by the ohmic resistance of the wiring circuit. The resistance $ R_\text{dc, wire} $, which is the equivalent resistance of the wires as seen from the MPPT in relation to the array, is an important parameter in this context. Recalling that the one-diode model follows the $ I $–$ V $ curve of a single module, and that this curve must be scaled to fit the series and parallel configuration at the MPPT, and supposing the number of parallel strings is denoted by $ N_p $ and that of series-connected modules by $ N_s $, then the DC current and voltage are:


      which, when multiplied, gives

      where $ I_\text{mpp} $, $ V_\text{mpp} $, and $ P_\text{mpp} $ result from the one-diode model. In other words, the second term of Eq. (81) quantifies the ohmic loss in DC cables, which has a quadratic relationship with the MPP current. Needless to say, the above calculation needs to be repeated for each MPPT.

      In calculating the equivalent resistance $ R_\text{dc, wire} $, there are two ways: either from the length in meters and resistivity of the cables in ohms per meter, or from a percentage ohmic loss ($ l_\text{dc, wire} $), that is,

      In particular, when empirical DC models, such as those exemplified in section 4.6.1, are used, $ I_\text{mpp} $ and $ V_\text{mpp} $ may or may not result from the modeling process. If $ I_\text{mpp} $ and $ V_\text{mpp} $ are not output by the empirical DC model, the only option for the modeler is to assume, based on experience, a percentage to represent the ohmic loss. Regarding the importance of implementing DC cable loss modeling for real PV systems, it is important to be aware of the industrial practices that string inverters are typically installed close to the string, implying a negligible DC cable loss but a non-negligible AC cable loss, whereas the situation for central inverters is reversed, in that they are placed closer to the transformer, implying a non-negligible DC loss but a negligible AC loss (Cabrera-Tobar et al., 2016).
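      The DC cable loss calculation described above can be sketched as follows. The scaling follows the text: the current is multiplied by $ N_p $, the voltage by $ N_s $, and the wire voltage drop is subtracted so that the loss term is quadratic in the current. The conversion from a percentage loss to a resistance is an assumption; it uses one common convention in which the percentage is defined at the array-level reference (STC) MPP. Function names and inputs are hypothetical.

```python
def dc_array_output(i_mpp, v_mpp, n_s, n_p, r_dc_wire):
    """Scale module-level MPP values to the MPPT and apply the ohmic loss.

    Returns (I_dc, V_dc, P_dc), where
    P_dc = N_p * N_s * P_mpp - I_dc**2 * R_dc_wire.
    """
    i_dc = n_p * i_mpp                     # parallel strings add current
    v_dc = n_s * v_mpp - i_dc * r_dc_wire  # series modules add voltage
    return i_dc, v_dc, i_dc * v_dc


def wire_resistance_from_percent(loss_percent, i_mpp_ref, v_mpp_ref, n_s, n_p):
    """Equivalent resistance (ohm) giving loss_percent at the reference MPP."""
    return loss_percent / 100.0 * (n_s * v_mpp_ref) / (n_p * i_mpp_ref)


# Hypothetical array: 5 strings of 20 modules, 2% ohmic loss at reference.
r_wire = wire_resistance_from_percent(2.0, 10.0, 40.0, n_s=20, n_p=5)
i_dc, v_dc, p_dc = dc_array_output(10.0, 40.0, n_s=20, n_p=5, r_dc_wire=r_wire)
```

      In this example the lossless array power is 40 kW, and operating at the reference point reproduces the assumed 2% loss; at lower currents the relative loss is smaller because of the quadratic dependence.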

    • The power loss due to the inverter results from two main mechanisms. The first is linked to the DC–AC inversion efficiency. Insofar as the power electronics within the inverter operate, a proportion of the power dissipates as heat, and another part is consumed as stand-by power for keeping the inverter in powered mode. The conversion efficiency of an inverter is described by an efficiency curve, which is a function of the load-to-nominal ratio and the input voltage of the inverter. Worth noting is that the inverter only activates when the DC input voltage is higher than the inverter’s start-up voltage, implying a minimum input power (i.e., a small portion of the nominal power) below which the inverter is not activated. The second mechanism that causes power loss is inverter clipping, which refers to the truncation of power when the maximum input power rating of an inverter is exceeded by the MPP power from a PV array. The inverter typically switches to fold-back mode in this situation, forcing the PV array to operate at a voltage higher than the MPP voltage and reducing the current and consequently power (Chen et al., 2013); cf. the $ I $–$ V $ curves in Fig. 16. Due to these two differing loss mechanisms, inverter loss is often modeled in two stages, one consisting of estimating the efficiency under a given operating condition, and the other of checking whether the minimum and maximum power limits are exceeded.

      In what follows, we consider inverter loss with respect to the AC model of King et al. (2004), who advocated representing the AC power output of the inverter as

      in which the intermediate parameters $ A $, $ B $, and $ C $ are given by

      where $ V_\text{dc, inv} $ and $ P_\text{dc, inv} $ are the input voltage and power received by the inverter; $ P_\text{ac, inv, ref} $ is the AC power rating of the inverter; $ V_\text{dc, inv, ref} $ and $ P_\text{dc, inv, ref} $ are the DC voltage and power with which $ P_\text{ac, inv, ref} $ is achieved; $ P_\text{s, inv, ref} $ is the DC power required to start the inversion process; and $ C_0, \dots, C_3 $ are empirical coefficients that describe the intrinsic properties of an inverter. Besides $ V_\text{dc, inv} $ and $ P_\text{dc, inv} $, which are to be acquired from the preceding steps of the model chain, the remaining parameters are usually found from an inverter database—recall the CEC module library in section 4.6.2, a similar database is available for commercial inverters. Alternatively, the datasheet of most inverters includes the inverter efficiency curves as a function of the load for three voltage levels in a graphical (and sometimes even tabular) form, which one can use to fit the model parameters if they are not directly available from a database. For instance, the Huawei SUN2000-100KTL-USH0 inverter has $ P_\text{ac, inv, ref} = 100 $ kW, $ V_\text{dc, inv, ref} = 1120 $ V, $ P_\text{dc, inv, ref} = 101.45 $ kW, $ P_\text{s, inv, ref} = 181.58 $ W, $ C_0=0 $, $ C_1=-1.1\times10^{-5} $, $ C_2 = 3\times 10^{-5} $, and $ C_3 = 1.63\times10^{-3} $. With Eq. (83), the inverter efficiency at some specific $ V_\text{dc, inv} $ and $ P_\text{dc, inv} $ values is given as

      When $ P_\text{dc, inv} $ is smaller than $ P_\text{nt, inv} $, which is the AC power consumed by the inverter at night, i.e., night tare, King et al. (2004) assigns $ P_\text{ac, inv}^\text{King} = -P_\text{nt, inv} $, where the negative sign indicates that power is drawn from the grid. When $ P_\text{dc, inv} $ is greater than $ P_\text{ac, inv, ref} $, the model assumes $P_\text{ac, inv}^\text{King} = $ $ P_\text{ac, inv, ref} $. Figure 20 presents the efficiency curves of the Huawei SUN2000-100KTL-USH0 inverter under several different $ V_\text{dc, inv} $ values. Due to modeling errors, there are some differences between the curves shown in this figure and those in the datasheet of the inverter.
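      The inverter model above can be sketched as follows, using the standard form of the Sandia inverter model, in which $ A $, $ B $, and $ C $ are the reference DC power, the start-up power, and the curvature coefficient $ C_0 $, each adjusted linearly for the deviation of the input voltage from its reference. The Huawei parameter values are those quoted above. The night tare value is hypothetical, since it is not given in the text, and clipping is applied here to the modeled AC power rather than triggered on the DC power, which is a common implementation choice and an assumption of this sketch.

```python
def king_ac_power(p_dc, v_dc, p_aco, p_dco, v_dco, p_so,
                  c0, c1, c2, c3, p_nt):
    """AC power (W) from DC power (W) and DC voltage (V)."""
    if p_dc < p_nt:
        return -p_nt  # night tare: power drawn from the grid
    dv = v_dc - v_dco
    a = p_dco * (1.0 + c1 * dv)
    b = p_so * (1.0 + c2 * dv)
    c = c0 * (1.0 + c3 * dv)
    p_ac = (p_aco / (a - b) - c * (a - b)) * (p_dc - b) + c * (p_dc - b) ** 2
    return min(p_ac, p_aco)  # clipping at the AC power rating


HUAWEI = dict(p_aco=100e3, p_dco=101.45e3, v_dco=1120.0, p_so=181.58,
              c0=0.0, c1=-1.1e-5, c2=3e-5, c3=1.63e-3,
              p_nt=1.0)  # p_nt is a hypothetical placeholder


def king_efficiency(p_dc, v_dc, **params):
    """Inverter efficiency, defined as AC power over DC power."""
    return king_ac_power(p_dc, v_dc, **params) / p_dc


eta_half_load = king_efficiency(50e3, 1120.0, **HUAWEI)
```

      At the reference voltage and reference DC power, the model returns the AC rating of 100 kW by construction, which is a quick sanity check for any implementation.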

      Figure 20.  (top) The efficiency curves of the Huawei SUN2000-100KTL-USH0 inverter under three fixed $ V_\text{dc, inv} $ values each with varying $ P_\text{dc, inv} $, modeled using the AC model of King et al. (2004). (bottom) Zoomed view over the region $ 5\% < \eta_\text{inv}^\text{King} < 100\% $.

      The above model is undoubtedly a simplified one because the actual working of an inverter is much more intricate. The fold-back mode, in particular, is only one component of the overall inverter protection system. Other measures include cooling systems and inverter protection delays, which refer to the time period during which some energy lost as a result of excessive radiation can be recycled. The King model assumes that every watt exceeding $ P_\text{ac, inv, ref} $ is truncated; this is also the case in many other conventional approaches to inverter modeling [e.g., Burger and Rüther (2006)]. In reality, however, the inverter is able to operate in over-irradiance situations for a brief period of several minutes. Invoking protection delay generates excessive heat, which harms the inverter, and hence it is of interest to invest in a cooling system to prolong the delay. Most inverters in the kW range, including the Huawei SUN2000-100KTL, adopt natural convection cooling via finned metal housings. Some may additionally include a fan as a supportive cooling method, which is only turned on if overloading is severe. This, however, might lower the overall reliability of the inverter, and thus the inverter controller must have the ability to disable the protection delay when the cooling system fails. On the other hand, if no active cooling is installed, the maximum output power of the inverter can even decrease if the ambient temperature is too high, which is called the derating of the inverter. In this regard, mimicking the actual operation of an inverter in a model chain is a tedious task, although the engineering principles have long been investigated.

      Inaccuracies in inverter modeling originate not just from operational issues, but also from design issues. There are climatological and meteorological factors, such as the local irradiance and temperature statistical regime, which impact the DC output of the PV system; there are factors related to energy economics, such as feed-in tariff or electricity pricing policies, which determine the return on investment; and there are those aforementioned intrinsic inverter properties, such as efficiency curves or overload protection schemes, which define the hardware constraints (Chen et al., 2013). Whereas all these practical factors may influence the sizing of an inverter, when they are not considered, two common sizing strategies are available. The first strategy matches the inverter size to the nominal DC output (e.g., a 100-kWp system is sized with a 100-kW inverter, which is the case in Fig. 14). The other strategy is to use an inverter with 30% smaller capacity than the nominal DC output (e.g., a 100-kWp system is sized with a 70-kW inverter). This latter strategy is motivated by the fact that the PV system rarely generates power close to the nominal DC capacity, such that sizing a smaller inverter is more economical. In contrast, if PV systems are employed for reactive power control in distribution systems, then a larger inverter size is desirable as it reduces the need to curtail real PV power during times of overvoltages (Cañadillas et al., 2021). Clearly then, there is no definitive answer as to what the optimal sizing strategy should be, since the practical factors are specific to each installation (Macêdo and Zilles, 2007). The reader is referred to Toreti Scarabelot et al. (2021), Luoma et al. (2012), Notton et al. (2010) and Mondol et al. (2006) for more engineering considerations on inverter sizing.
An important note to highlight at this stage is that information on inverter sizing and operation strategy, just like other PV system design parameters, is rarely available, which hinders model chain construction.

    • Conductor loss is the main reason explaining the attenuation of electric power transmitted by cables. In comparison to the DC conductor loss, the severity of the AC loss is higher as a result of the skin effect and the proximity effect. Faraday’s law of electromagnetic induction suggests that when AC power is transmitted by a cable, an alternating magnetic field stimulated by the current induces an electromotive force opposite to the driving force. The counter electromotive force is strongest at the center of the conductor and diminishes radially outwards, and hence propels electrons towards the outer part of the conductor. Consequently, the current density is highest at the conductor surface and reduces in magnitude moving deeper into the conductor, which is known as the skin effect. The skin effect may be quantified by the skin depth, which is the depth at which the current density declines to 1/e (about 0.368) of its value near the surface. At low frequencies, the general formula for the skin depth ($ \delta $) is

      where $ \mu $ and $ \sigma $ are the permeability and conductivity of the conductor, whereas $ f $ is the frequency.

      Since the current can only flow between the surface and several skin depths below that, the same conductor would have a greater apparent AC resistance than it would have under DC conditions. For cylindrical conductors, the AC resistance is:

      where $ l_\text{wire} $ and $ r_\text{wire} $ are the length and the radius of the cylindrical conductor, respectively. The AC ohmic loss can then be calculated with Joule's law as $ I_\text{rms}^2R_\text{ac, wire} $, where $ I_\text{rms} $ is the root mean square (RMS) amplitude of the transmitted current. In practice, to acquire the power entering the transformer, the AC power coming out of the inverter needs to be subtracted by the AC loss:

      where $ V_\text{rms} $ is the RMS value of the line voltage at the low-voltage side of the transformer.
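      The skin-effect quantities above can be sketched as follows. The skin depth uses the standard low-frequency expression $ \delta = 1/\sqrt{\pi f \mu \sigma} $. The AC resistance assumes that the current flows uniformly in an annulus of thickness $ \delta $ beneath the surface, which is one common approximation for cylindrical conductors and may differ in form from the expression used in the article; when the conductor radius does not exceed the skin depth, the DC resistance is returned. Function names and the copper conductivity value are assumptions.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space, H/m
SIGMA_COPPER = 5.96e7   # conductivity of copper, S/m (assumed typical value)


def skin_depth(freq, conductivity, permeability=MU_0):
    """Skin depth (m) at frequency freq (Hz)."""
    return math.sqrt(1.0 / (math.pi * freq * permeability * conductivity))


def ac_resistance(length, radius, freq, conductivity, permeability=MU_0):
    """Approximate AC resistance (ohm) of a cylindrical conductor."""
    delta = skin_depth(freq, conductivity, permeability)
    if delta >= radius:
        area = math.pi * radius ** 2  # current fills the cross-section
    else:
        # Annulus of thickness delta below the conductor surface.
        area = math.pi * (radius ** 2 - (radius - delta) ** 2)
    return length / (conductivity * area)


def joule_loss(i_rms, resistance):
    """Ohmic loss (W) from Joule's law."""
    return i_rms ** 2 * resistance


delta_50hz = skin_depth(50.0, SIGMA_COPPER)  # roughly 9 mm for copper
```

      At 50 Hz the skin depth of copper is roughly 9 mm, so the effect matters mainly for conductors whose radius exceeds that scale.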

      Besides the skin effect, the current density distribution of one conductor is also impacted by other current-carrying conductors nearby, which is known as the proximity effect. Although the proximity effect is also caused by electromagnetic induction, it differs from the skin effect in that the proximity effect results from the mutual induction between insulated conductors rather than self-induction. However, the proximity effect enhances the AC resistance of a conductor as well as its thermal loss by preventing current distribution from being even over the cross-section. The quantification of proximity effects must be performed by considering the distance between, as well as the cross-sectional area of, the acting and acted conductors: The nearer the conductors are placed and the larger the cross-sections are, the more prominent the proximity effect. On this point, if one is to investigate the quantification of the proximity effect on AC resistance, the usual strategy is to engage the finite element method, which is not only a very specialized skill, but also time-consuming. In this regard, it is possible to estimate the AC resistance based on the DC resistance, as exemplified in the work of Hafez et al. (2014); this approach is suitable for medium-voltage grid inter-connection, which is typical for MW-scale PV plants.

    • The final part of a grid-tied PV system, the transformer, increases the inverter output's AC voltage to the grid voltage. For the classification of low, medium, and high grid voltage ratings, there are numerous international and national standards. The IEEE 1547 Standard for Interconnection and Interoperability of Distributed Energy Resources with Associated Electric Power Systems Interfaces (IEEE Std 1547-2018) is frequently regarded as one of the founding documents for solar energy systems (Narang et al., 2021). Low voltage is defined in IEEE Std 1547-2018 as a class of nominal voltages below 1 kV; medium voltage is defined as ranges between 1 kV and 35 kV; and high voltage is defined as voltages above that.

      A topology of a PV system describes how its component parts are interconnected and related to one another. Radial, ring, and star topologies can be used to describe AC collection grids (Cabrera-Tobar et al., 2016); these three arrangements are also suitable for wind power plants (De Prada Gil et al., 2015). Several transformers are connected in series by a radial collection grid. The radial configuration has low reliability despite being cost-effective because losing one transformer renders the entire collection grid inoperable. A ring collection grid can increase reliability because it joins the open ends of the serially connected transformers, closing the circuit, so that even if one transformer is lost, the grid can continue to function as a radial one. Compared to the other two configurations, the star collection grid has the highest level of reliability because all transformers are connected to the same medium voltage point.

      Similar to the process of sizing an inverter, the process of sizing a transformer generally involves striking a balance between the conservative approach of sizing in accordance with the rated power of the plant and the economical approach of under-sizing. In the former situation, the transformer frequently operates at a lower efficiency because the operational ambient conditions do not always match the STC, which may cause significant oscillations in the power injected into the grid (Testa et al., 2012). In the latter scenario, a transformer that is too small acts as a power export bottleneck, wasting energy. A transformer sizing strategy based on the loss of produced power probability (LPPP) index was proposed by Testa et al. (2012). This index quantifies the likelihood that the transformer will be unable to deliver to the grid all of the power that enters it, due to overloads or power losses in the transformer. Conceptually, the LPPP ought to be minimized, and it depends on the load profile, the availability of solar resources, and the presence of energy storage devices (Testa et al., 2012).

      Transformer power losses are brought on by two different mechanisms. One of them is referred to as the core loss, which is an umbrella term for the various losses that happen in the transformer when there is no load, including dielectric loss, stray eddy current loss, hysteresis loss, and eddy current loss. Iron being the primary component of the transformer, core loss is also referred to as iron loss ($ P_\text{Fe} $). Insofar as the transformer is excited, core loss depends on the voltage level and operating frequency and can be taken for granted to be constant. The heat generated by the currents in the transformer windings is the source of the other type of loss in transformers, namely, the copper loss ($ P_\text{Cu} $). The loading of the transformer determines the copper loss, and is therefore only relevant during operation.

      In the case of oil-cooled (oil-immersed) transformers, both the iron and copper losses vary quasi-linearly with the nominal transformer rating $ P_\text{trans, ref} $, which has units of kVA:

      of which the coefficients $ \beta_0 $, $ \beta_1 $, and $ \beta_2 $ for oil-cooled transformers, with a nominal transformer rating ranging from 50 to 2500 kVA, with the highest voltage for equipment not exceeding 36 kV, have been tabulated by Malamaki and Demoulias (2014). For cast-resin (dry-type) transformers, the iron loss still varies quasi-linearly with $ P_\text{trans, ref} $, but the copper loss is quadratic with $ P_\text{trans, ref} $, i.e.,

      of which the coefficients $ \beta'_0, \dots, \beta'_3 $ for cast-resin transformers, with a nominal transformer rating ranging from 100 to 3150 kVA, with the highest voltage for equipment not exceeding 36 kV, have again been tabulated by Malamaki and Demoulias (2014).

      Supposing there are $ N_\text{trans} $ identical transformers connected in parallel, the total transformer loss under an input power $ P_\text{trans} $ (recall Eq. (90)) is given as the sum of iron and copper losses, which needs to be subtracted from $ P_\text{trans} $ during model chain evaluation, and the final power injected into the grid is:

      Alternatively, when information for identifying $ P_\text{Fe} $ and $ P_\text{Cu} $ is unavailable, one may account for the transformer loss via a lump medium-voltage loss factor at nominal power, which is herein denoted as $ l_\text{trans} $. The power injection into the grid in this case may be written as:

      As a rule of thumb, transformers consume approximately 1.5% of the nominal energy output by the PV plant, i.e., $ l_\text{trans}\approx1.5\% $.
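
      As a rough illustration of the two routes described above, the following Python sketch subtracts transformer losses from the input power. It assumes the common textbook model in which the iron loss is constant while the transformer is energized and the copper loss scales with the square of the loading ratio; this is not necessarily the form of the tabulated-coefficient equations referred to above. The lumped variant assumes a loss proportional to the input power, which is likewise only one possible reading of the loss factor. All function names and numerical values are hypothetical.

```python
def transformer_loss_kw(p_in_kw, n_trans, rating_kva, p_fe_kw, p_cu_rated_kw):
    """Total loss of n_trans identical parallel transformers, in kW.

    Assumptions (not from the source): unity power factor, equal load
    sharing, constant iron loss per unit, and a copper loss per unit that
    equals p_cu_rated_kw at rated load and scales with loading squared.
    """
    loading = p_in_kw / (n_trans * rating_kva)  # per-unit loading ratio
    return n_trans * (p_fe_kw + p_cu_rated_kw * loading ** 2)


def grid_power_detailed(p_in_kw, n_trans, rating_kva, p_fe_kw, p_cu_rated_kw):
    """Power injected into the grid after iron and copper losses."""
    if p_in_kw <= 0.0:
        return 0.0  # nighttime no-load consumption is ignored in this sketch
    loss = transformer_loss_kw(p_in_kw, n_trans, rating_kva, p_fe_kw, p_cu_rated_kw)
    return max(p_in_kw - loss, 0.0)


def grid_power_lumped(p_in_kw, l_trans=0.015):
    """Lumped alternative: a fixed fraction of the input is lost (assumed form)."""
    return max(p_in_kw, 0.0) * (1.0 - l_trans)


# Hypothetical example: one 1000-kVA unit with 1 kW iron loss and
# 9 kW copper loss at rated load, operating at 1000 kW input.
detailed = grid_power_detailed(1000.0, 1, 1000.0, 1.0, 9.0)
lumped = grid_power_lumped(1000.0)
```

      The detailed route needs datasheet values for each transformer, whereas the lumped route needs only a single factor, which is why the latter is convenient when design information is incomplete.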

    5.   Hybrid solar power curves
    • Cascading two or more techniques in order to enhance the overall modeling performance is a well-accepted strategy in many fields; this strategy is known as hybrid modeling. In the case of solar power curve modeling, regression methods have two drawbacks: from a practical aspect, they rely on historical data, and from a theoretical aspect, they do not incorporate all the theoretical knowledge that has been accumulated in the domains of the atmospheric sciences and solar engineering. On the other hand, most stages of model chains are actually partly empirical in nature, since their parameters are determined by theoretical inference or learning from data. Consequently, even the most detailed model chains must rely on several simplifications that compromise their accuracy, especially when the available design data are incomplete. Moreover, model chains, or solar power curves in general, are nonlinear, which suggests that the initial errors of the input dataset propagate differently in different irradiance domains.

      The mean square error (MSE), or its equivalent, the root mean square error (RMSE), is the most commonly used error metric for deterministic solar predictions or estimates (Blaga et al., 2019; Yang et al., 2020). Therefore, optimizing predictions in terms of MSE has been discussed extensively. However, MSE-optimized GHI predictions are always under-dispersed, which means that they have a positive conditional bias in the low, and a negative conditional bias in the high, irradiance domain (Mayer and Yang, 2023a). Due to their nonlinearity and the errors of the input GHI predictions, model chains are likely to introduce bias even if their inputs are unbiased. Moreover, the introduced bias also depends on the design parameters, such as the tilt angle, inverter sizing factor, and row spacing; thus, it is different for all PV plants (Mayer, 2022b). Therefore, model chains are expected to benefit from a data-driven correction method. It is worth noting that a combination of two or more regression models is also a form of hybridization, but such purely statistical hybridization approaches have less added value than combining the physical and statistical methods (Yang and Dong, 2018). Therefore, this section exclusively focuses on methods in which model chains and regression methods are used together.

      Insofar as the performance of a model chain is to be improved, hybridization is not the only option. One “brute force” approach is simply testing a large collection of model chains, and selecting the one with the highest performance based on historical data. This brute force approach was first introduced by Mayer and Gróf (2021), who gathered a pool of component models, which, when exhaustively combined, resulted in a total of 32 400 different model chains. By testing those model chains on the forecasting of 16 PV plants in Hungary, the results revealed a difference of up to 13% in mean absolute error (MAE) and 12% in RMSE between the most and the least accurate sets of forecasts on the average of all locations. The drawback of this enumeration–selection approach for model chain optimization is that it requires the implementation of not only one set of component models to create a single model chain, but several different models in each stage to be able to construct a large number of model chains, which calls for an even higher level of domain knowledge. As an alternative, it is easier to use only a single model chain and correct it with a regression model, which is the true manifestation of a physical–statistical hybrid solar power curve.
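
      The enumeration–selection idea can be sketched in a few lines of Python. The component-model names below are placeholders, the pool is far smaller than the one described above, and `run_chain` stands for a user-supplied function that evaluates one model chain; none of these come from the cited study.

```python
from itertools import product

# Hypothetical pool of component models per stage (placeholder names).
POOL = {
    "separation": ["sep_a", "sep_b", "sep_c"],
    "transposition": ["trans_a", "trans_b"],
    "temperature": ["temp_a", "temp_b"],
}


def enumerate_chains(pool):
    """Return every combination of one component model per stage."""
    stages = list(pool)
    return [dict(zip(stages, combo)) for combo in product(*pool.values())]


def mean_absolute_error(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def select_best_chain(pool, run_chain, weather, observed_power):
    """Score every chain on historical data and return (MAE, chain).

    run_chain(chain, weather) must return a list of power predictions;
    it is supplied by the user and is not defined here.
    """
    scored = [
        (mean_absolute_error(run_chain(chain, weather), observed_power), chain)
        for chain in enumerate_chains(pool)
    ]
    return min(scored, key=lambda item: item[0])
```

      With three, two, and two candidates per stage, the pool above yields 3 × 2 × 2 = 12 chains; the count grows multiplicatively with every added stage or candidate, which is why a large pool quickly leads to tens of thousands of chains.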

      Hybridization is conceptually similar to post-processing, as both procedures depend on using regression to modify predictions. Generally, if the regression is applied to a set of PV power predictions, the procedure is post-processing, whereas if the regression consolidates different partial results of a model chain as predictors, then it is more of a hybrid solar power curve. One of the earliest attempts at making hybrid solar power curves was presented by Ogliari et al. (2017), who proposed a so-called physical hybrid artificial neural network (PHANN), in which the physical part was no more than a clear-sky irradiance predictor, whereas the statistical part was a neural network. From today’s viewpoint, PHANN, in which no model chain is explicitly involved, is more likely to be classified into the regression category of solar power curve modeling, for clear-sky irradiance is but one basic feature as recommended in section 3.1. In state-of-the-art hybrid solar power curve modeling, it is typical to involve several early stages of a model chain (e.g., solar positioning, separation, or transposition), and the goal is to exploit the advantages of physical modeling as much as the design information can support, and then to integrate the results using regression.

      Naturally, two questions arise: (1) If several model chain stages are used, each yielding some output variables, should all these intermediate variables be used as predictors; and (2) which model chain stages contribute most to the accuracy of hybrid solar power curves? The first study aiming to provide a detailed answer to these questions was presented by Mayer (2022a). In that paper, the author compared physical model chains, regression-based solar power curves in the form of multilayer perceptrons (MLPs), and their hybridization, for PV power forecasting of 14 PV plants in Hungary based on NWP irradiance, temperature, and wind speed forecasts. In addition to the reference case where only an optimized model chain is used, 12 further cases involving regression were defined with an increasing number of predictor variables. The possible predictors include:

      1. NWP outputs, including GHI, the 2-m ambient temperature, and the 10-m wind speed forecasts.

      2. Solar zenith, azimuth, and declination angles, as calculated via solar positioning.

      3. Clear-sky index and/or clear-sky irradiance.

      4. Beam horizontal irradiance and DHI, as decomposed from the GHI by a separation model.

      5. GTI, as derived by a transposition model.

      6. Cell temperature and MPP module power, as calculated by temperature and DC power models.

      7. Power fed into the grid by the PV plant, which is the ultimate output of the model chain.

      The above list is not exhaustive, and further predictors can also be defined from the intermediate results of the different stages of a model chain. The results of Mayer (2022a) revealed that using all predictors mentioned above leads to the most accurate forecasts. A more recent analysis of the predictor importance in hybrid power curve modeling was presented by Visser et al. (2023). In that paper, an even wider range of intermediate model chain outputs were used as predictors, e.g., not only the GTI, but also its beam, diffuse, and ground-reflected components, recall Eq. (13). The results support the earlier finding that including more predictors improves forecast accuracy. However, the added error reduction by involving a further predictor diminishes to an insignificant level after the 8–10 most important predictors.

      The benefit of including intermediate results besides the final PV power output is also easy to explain theoretically: If the regression model can condition its output on the intermediate results, the hybridization can indirectly correct the inaccuracy in the individual stages of the model chain instead of only correcting the overall errors of the model chain as a whole. Including too many predictors, on the other hand, can make the model more prone to overfitting, and the less important predictors can obscure the effect of the more important ones. In summary, the answer to the first question is that using intermediate model chain outputs as predictors is beneficial, but one should take care to include only the most relevant variables, selected based on either domain knowledge or a data-driven predictor importance analysis. The exact number and range of the useful predictors depend on many factors, e.g., the complexity of the regression model and the length of training data.
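
      A minimal sketch of this idea follows: selected intermediate outputs of a model chain are stacked into a predictor matrix, and a regression maps them to measured power. For brevity, an ordinary least-squares fit is swapped in for the MLP used in the cited work, and the dictionary keys, function names, and numbers are hypothetical.

```python
def build_predictors(stage_outputs, selected):
    """Stack the selected intermediate outputs into one row per time step."""
    return [list(row) for row in zip(*(stage_outputs[name] for name in selected))]


def fit_linear(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations.

    Assumes the predictors are not collinear; otherwise a pivot becomes zero.
    """
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict_linear(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


# Made-up toy values, for illustration only.
stage_outputs = {
    "gti": [120.0, 480.0, 760.0, 910.0, 300.0],
    "cell_temp": [12.0, 25.0, 41.0, 47.0, 20.0],
    "chain_power": [0.10, 0.42, 0.63, 0.74, 0.27],
}
measured = [0.11, 0.40, 0.60, 0.70, 0.26]
X = build_predictors(stage_outputs, ["gti", "cell_temp", "chain_power"])
beta = fit_linear(X, measured)
hybrid_prediction = predict_linear(beta, X)
```

      Changing the `selected` list is the code-level counterpart of the predictor selection discussed above; in practice, the fit would be evaluated on data withheld from training to detect overfitting.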

      The importance of the different stages of a model chain was first assessed by Mayer and Gróf (2021), and the ranking was based on how the component model selection in a given stage affects the overall accuracy of the model chain. In this respect, the most critical step is transposition modeling, closely followed by separation modeling. This finding motivated Mayer (2022a) to compare two different kinds of hybrid models, one of which involves a complete model chain (including all seven predictor groups listed above), whereas the other restricts the physical modeling to the steps up to the calculation of the GTI (including only the predictors from the first five points of the above list). The results showed that the accuracy of these two hybrid models is similar in PV power forecasting. A clear benefit of the second approach, however, is its simplicity, as it only requires a partial and shorter model chain including solar position, separation, transposition (and optionally reflection) models, and it does not require any design information aside from the tilt and azimuth angles of the PV modules, which are known most of the time. The answer to the second question is therefore this: The second half of model chains (beyond the calculation of GTI) can be well substituted by machine learning, and therefore, if there are enough historical data for hybrid modeling, one can rely on a shorter model chain that calculates only GTI, without significantly compromising the prediction accuracy of the solar power curve.

      Another practical question related to hybrid modeling is the error reduction potential over the use of regression or model chain alone. Since the accuracy of both individual approaches depends on many factors, this question can only be answered qualitatively. Extending a regression method with relevant physical information is almost always beneficial; therefore, hybrid modeling has an edge over regression in all but a few exceptional practical cases. However, the comparison of physical and hybrid modeling is not that clear-cut. Mayer (2022a) showed that if the model chains are optimized, extending them with regression might only bring a clear improvement if at least two years of training data are available. However, as long as hybridization is used as an alternative to model chain optimization, it has a clear benefit even with just one year of training data. It not only reduces the errors, but also equalizes the accuracy of different model chains, and thus reduces the uncertainty coming from the use of a random non-optimized model chain. The error reduction also depends on the directive, which refers to the error metric that is intended to be optimized by the predictions. Hybrid solar power curves are able to achieve a higher degree of error reduction in MAE than in RMSE compared to the errors of the model chains (Mayer, 2022a).

      Finally, the applied regression method also affects the accuracy of the hybrid solar power curve. Visser et al. (2023) compared two machine learning methods, namely an MLP and a random forest (RF), paired with the same model chain and predictor variables, and discovered a noticeable error difference in favor of the RF model. Overall, the selection and hyperparameters of the regression model can offer added value comparable to that of the physical predictors, especially for such regression methods that can also describe deeper relationships such as the spatio-temporal correlations between multiple sites. Theoretically, almost any kind of regression method can be utilized in hybrid modeling, which, considering the large body of literature on regression methods with increasing complexity and accuracy, gives a broad perspective to further research on hybrid solar power curves.

      An outline of the general concept of state-of-the-art hybrid modeling (in a forecasting setting) is shown in Fig. 21. The left-hand column shows the training and optimization of the hybrid model, which is more complex, but only needs to be performed on a monthly or quarterly basis, if not longer. The right-hand column shows the process of creating the PV power forecasts using the trained models. The outline also includes the post-processing of the raw NWP irradiance forecasts, which can be done by a large variety of different methods (Yang and van der Meer, 2021). It is followed by a model chain optimization procedure, which generally improves the accuracy, but could also be omitted for simplicity (Mayer, 2022a). The last step of the process is the application of regression, via an MLP in this case, but it can be substituted by any other regression method, including the current hype—deep learning models.

      Figure 21.  Outline of a general hybrid PV power forecasting process, involving the post-processing of the raw NWP output, the optimal model chain selection, and creating the PV power forecasts based on the physical predictors by a machine-learning model (Mayer, 2022a).

      Overall, the essence of hybrid modeling is the combination of selecting a detailed and accurate model chain, an effective regression method, and most importantly, an optimal way to combine them. Hybrid solar power curves, in terms of their complexity, are still a largely unexplored area with only a few examples in the literature to date, which calls for extensive further research in this field.

    6.   Probabilistic solar power curves
    • The probabilistic representation of a prediction can take several forms, among which predictive density is most preferred as it covers all other forms including quantiles, prediction intervals, and samples that form an ensemble. The overarching motivation for using a probabilistic representation is to quantify the uncertainty associated with the prediction (Gneiting et al., 2007; Gneiting and Katzfuss, 2014). Despite many application scenarios still only demanding deterministic predictions, the best attempts to make informative predictions should perpetually be probabilistic. Weather forecasters have been at the frontier of making probabilistic predictions, particularly forecasts, ever since the early 1960s, when the chaotic nature of the weather was first discovered. On the other hand, statistics is the science of uncertainty, and probabilistic methods have therefore also flourished in that field. In regard to probabilistic solar power curve modeling, several options are available, among which some have been discussed in section 3.3, where general statistical procedures to convert from deterministic predictions to, or to improve the quality of, probabilistic predictions are discussed. Nonetheless, this section moves beyond those statistical means of generating probabilistic solar power curves. More specifically, the following discussion deals exclusively with the probabilistic methods for solar power curve modeling that involve model chains.

      Although probabilistic modeling is a concept that has been well recognized and studied for a very long time, its realization in model chain applications is far more recent. The earliest attempts, which emerged just a few years ago, focused on particular stages of a model chain, such as transposition (Quan and Yang, 2020) or decomposition (Yang and Gueymard, 2020). Two distinct approaches of probabilistic modeling for energy meteorology models (i.e., the component models of a model chain) are possible. One of those is to modify an existing model by upgrading its construct, to integrate the notion of uncertainty into the modeling itself. For instance, the Perez model (recall section 4.3) is essentially a least-squares problem assuming homogeneous Gaussian errors (Perez et al., 1988; Yang et al., 2014), which by nature offers not just the predictive mean but also the standard error, which in turn allows probabilistic estimation of GTI (Quan and Yang, 2020). However, modifying the construct of an energy meteorology model is not always possible, the statistical derivations are often tedious, and the predictive distributions are usually confined to a presupposed parametric form. Therefore, the second approach for probabilistic modeling, namely, using an ensemble, is much more amenable. The idea is very simple: One collects several models of the same class, and treats their outputs as member predictions. This was in fact the approach of Quan and Yang (2020) and Yang and Gueymard (2020), who applied the approach to transposition and decomposition modeling, respectively. Another benefit of ensemble-based probabilistic modeling of energy meteorology models is that such predictions allow further calibration, e.g., through P2P post-processing methods such as BMA or EMOS, which yield predictive densities of various types.

      If we are to apply ensemble modeling to every stage of a model chain (this is possible as numerous component models are available for each stage), an obvious difficulty is the sheer dimensionality. To give perspective, suppose we sample 100 deterministic predictions from the predictive distribution of a probabilistic separation model. Each of these 100 predictions needs to be used as the input for a probabilistic transposition model, which would result in 100 different GTI predictive distributions, and so on and so forth, such that the dimensionality scales exponentially with the number of stages. On this point, a better approach is to use an ensemble of model chains; that is, each model chain is treated as an ensemble member. The concept of a model-chain ensemble is depicted in Fig. 22b, in which each blue path represents a possible member, whereas the red path denotes the best-guess option (i.e., the optimized model chain). This framework was first formally put forward by Mayer and Yang (2022), who presented a model-chain ensemble in a PV power forecasting setting.

      Figure 22.  Schematics of (a) ensemble NWP and (b) ensemble model chain, where each circle represents a component model. The red paths mark the “best-guess” predictions, whereas the blue paths exemplify the member trajectories (Mayer and Yang, 2023b).

      Using power measurements from eight ground-mounted PV plants and their corresponding deterministic day-ahead (24–48-h horizon) forecasts from the operational Application of Research to Operations at Mesoscale (AROME) model of the Hungarian Meteorological Services, Mayer and Yang (2022) tested five different probabilistic model chains, each differing from the others either in terms of the number of members or how the members are selected. Two strategies for selecting member model chains were easily thought of: (1) random selection and (2) ensuring each component model gets selected at least once—this is referred to as ACM, which stands for “all component models.” Furthermore, the authors also studied the effect on predictive performance when quantile regression as a post-processing tool is applied to the model-chain ensemble. The conclusions of that study were many. First, it found that the raw model-chain ensemble is under-dispersed, which echoes the necessity for calibration. Based on the empirical evidence, linear quantile regression with one year of fitting data seems sufficient, because the nonlinearity in PV power modeling has been effectively handled by the model chains. The calibration can also neutralize to a certain extent the subjectivity during the selection of ensemble members, making the number of members a less important factor. Last but not least, it has been concluded that model-chain ensembles are also beneficial even if only deterministic predictions are needed. This is because eliciting deterministic PV power forecasts from the model-chain ensembles can outperform the classical approach, in which the deterministic forecasts are generated through a single optimized model chain.
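
      The second member-selection strategy can be sketched as follows. This is only one possible way of guaranteeing that every component model is drawn at least once, not necessarily the procedure used in the cited study; the function name and the pool contents are hypothetical.

```python
import random


def sample_acm_members(pool, n_members, seed=0):
    """Draw member model chains so that every component model appears
    at least once in its stage (the "all component models" idea).

    pool maps a stage name to a list of candidate component models.
    Duplicate chains are possible and are not filtered in this sketch.
    """
    largest = max(len(models) for models in pool.values())
    if n_members < largest:
        raise ValueError("n_members must be at least the largest stage pool size")
    rng = random.Random(seed)
    columns = {}
    for stage, models in pool.items():
        # Each model once, padded with random picks, then shuffled so that
        # the pairing across stages is random.
        picks = list(models)
        picks += [rng.choice(models) for _ in range(n_members - len(models))]
        rng.shuffle(picks)
        columns[stage] = picks
    return [{stage: columns[stage][i] for stage in pool} for i in range(n_members)]


# Placeholder component-model names.
example_pool = {
    "separation": ["sep_a", "sep_b", "sep_c"],
    "transposition": ["trans_a", "trans_b"],
}
members = sample_acm_members(example_pool, n_members=5, seed=1)
```

      Purely random selection would instead draw each stage independently with `rng.choice`, which can leave some component models unused when the number of members is small.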

      At this stage, one follow-up question is whether using deterministic weather input is sufficient, or would there be any added benefits if probabilistic weather predictions are used in conjunction with a model-chain ensemble? Indeed, as shown in Fig. 22a, which is a well-known representation of the ensemble NWP concept, several equally probable forecast trajectories may be produced using different initial conditions but the same model—this is the case of dynamical ensemble NWP, whereas probabilistic weather forecasts from several NWP models form a poor man’s ensemble; the reader is referred to Roulston and Smith (2003) for a review of different types of ensemble NWP. Moving beyond forecasts, probabilistic predictions of weather variables, such as the irradiance or aerosol optical depth, can also be derived from remote sensing data [e.g., Yang and Gueymard (2021a, b)]. Clearly then, there are three ways to materialize a probabilistic model chain without post-processing: (1) a deterministic weather input with an ensemble model chain; (2) probabilistic weather input with a single optimized model chain; and (3) probabilistic weather input with an ensemble model chain. While the first option was investigated by Mayer and Yang (2022), the latter two were considered in the follow-up work of Mayer and Yang (2023b).

      The raw ensemble is under-dispersed since it only covers some, but not all sources of the uncertainties. This is especially true for the model-chain ensemble, which only accounts for the uncertainty of the irradiance-to-power conversion, while most of the uncertainty comes from the NWP. Good reliability is a prerequisite of good probabilistic forecasts; therefore, ensemble models always need to be calibrated, which can be done practically with any P2P method, but the most commonly used ones are EMOS and QR. In a more general sense, even deterministic predictions can be post-processed into probabilistic ones, which makes the workflows of creating a probabilistic solar power curve with post-processing extremely versatile:

      1. Deterministic weather input + D2P post-processing + single model chain;

      2. Deterministic weather input + D2D post-processing + ensemble model chain;

      3. Deterministic weather input + D2D post-processing + single model chain + D2P post-processing;

      4. Deterministic weather input + D2P post-processing + single model chain + P2P post-processing;

      5. Deterministic weather input + D2D post-processing + ensemble model chain + P2P post-processing;

      6. Probabilistic weather input + P2P post-processing + single model chain;

      7. Probabilistic weather input + P2D post-processing + ensemble model chain;

      8. Probabilistic weather input + P2D post-processing + single model chain + D2P post-processing;

      9. Probabilistic weather input + P2P post-processing + single model chain + P2P post-processing;

      10. Probabilistic weather input + P2D post-processing + ensemble model chain + P2P post-processing.

      Clearly, this sort of freedom of choice implies a large amount of work to be done, in order to determine which workflow is optimal and why. At this moment, there is no published work in this regard; however, several studies are being prepared (Sebastian LERCH, 2023, personal communication). Some preliminary results from Lerch, as well as the current authors, suggest that the intermediate post-processing step is not so efficient, as the post-processing at the final stage is able to correct most of the calibration problem, while not losing too much accuracy. In this regard, the possible options are reduced to:

      1. Deterministic weather input + single model chain + D2P post-processing;

      2. Deterministic weather input + ensemble model chain + P2P post-processing;

      3. Probabilistic weather input + single model chain + P2P post-processing;

      4. Probabilistic weather input + ensemble model chain + P2P post-processing.

      These four workflows correspond to methods 0, 1C, 2C, and 3C in the article by Mayer and Yang (2023b), where “C” stands for “calibration,” which contrasts methods 1R, 2R, and 3R that denote the corresponding “raw” versions without P2P post-processing. It should be remarked that because post-processing is involved in all four workflows, one may interpret them as a form of hybridization, where model chain and (probabilistic) regression work hand-in-hand.

      In the empirical part of the work, Mayer and Yang (2023b) considered data from 14 utility-scale PV plants from Hungary alongside ensemble NWP forecasts from the ECMWF, over a course of two years (2019–20), at a 15-min temporal resolution. Whereas the ensemble model chain construction largely followed the previous work (Mayer and Yang, 2022), the P2P post-processing tool employed was quantile regression. The results reveal that the overall most accurate workflow was the one that used both ensemble NWP and ensemble model chain with post-processing, i.e., method 3C, although the other alternatives deteriorated the continuous ranked probability score (CRPS) only marginally, so long as post-processing was applied. However, when post-processing was not applied, e.g., due to a lack of additional training data, the CRPS was generally higher. CRPS is a composite score, in that it evaluates both calibration and sharpness. It was found that methods 1R, 2R, and 3R have better sharpness than 1C, 2C, and 3C, but are not calibrated, both of which can be attributed to the fact that the raw versions of probabilistic solar power curves are under-dispersed. In terms of deterministic forecasting, i.e., eliciting deterministic forecasts from ensemble forecasts, method 3C again shows the smallest error, confirming the need for probabilistic modeling even if the final target required is just deterministic. This pioneering work has shed light on the forecasting applications of probabilistic solar power curves, whereas resource assessment applications remain unexplored.
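
      For readers unfamiliar with the CRPS, a minimal sketch of its standard estimator for an ensemble with equally weighted members is given below, using the identity CRPS = E|X − y| − ½E|X − X′|, where X and X′ are independent draws from the ensemble and y is the observation. The function name is hypothetical, and the cited study may have computed the score from quantiles rather than from members.

```python
def crps_ensemble(members, obs):
    """CRPS of an equally weighted ensemble for one observation.

    The first term rewards members that are close to the observation;
    the second term credits the ensemble spread. Lower is better.
    """
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return accuracy - spread
```

      For a single-member ensemble the spread term vanishes and the score reduces to the absolute error, which is why the CRPS is often described as a probabilistic generalization of the MAE. Averaging the score over all forecast–observation pairs gives the value used to rank workflows.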

    7.   Conclusion and outlook
    • Owing to the cardinal importance of the topic at hand—solar power curve modeling—this tutorial review is longer, and provides a lot more technical details, than a typical review article, with the aim of providing atmospheric scientists with a complete knowledge map pertaining to the irradiance-to-power conversion. Indeed, solar power is an indispensable part of modern power systems, which can best exemplify the rapid morphing from a fossil-fuel-dominant energy mix to one in which renewables take the largest share, so as to eventually reach carbon neutrality. Just a few years ago, neither atmospheric scientists nor power system engineers were paying enough attention to solar power curve modeling, as the former were mostly confined to the physics of radiation but not its downstream applications, whereas the latter largely relied on very simple surrogates to represent the relatively small proportion of renewables in power systems. Notwithstanding, with the advent of energy meteorology, the status quo has changed, and the gravity of the solar power curve is being recognized by increasingly many. It is on this account that we hope the present tutorial could become a must-read for anyone entering the field or wishing to stay up to date.

      If we are to summarize solar power curves in one sentence, it should be this: Solar power curves convert irradiance and auxiliary variables to PV power; they can be either deterministic or probabilistic, data-driven or physical (or a combination of both); and their highest performance can only be achieved with calibration and optimization, but would always differ by geographical, meteorological, and sky conditions. Depending on whether or not the input is forecast, solar power curves can be used for both forecasting and resource assessment purposes, with the latter encompassing a wide range of applications, such as PV resource mapping, PV system design and evaluation, firm generation, microgrid configuration, power system simulation, or climate change impact on solar generation. Certainly, each of these applications has been attracting attention, but each is still subject to continual refinement, which is why continuous development of methods and techniques is most welcome.

      A rule of thumb in energy meteorology is that if physics is able to provide a complete picture, we tend to use physics; otherwise, we have to turn to data-driven methods. Cases of the former kind are rare, which explains the rapid development of data-driven methods; the question is then how much physics remains exploitable with respect to the current epistemological edifice. Some enhancements to solar power curve modeling, such as integrating radiative transfer in the separation of GHI or upgrading the piecewise parameterization of the Perez model with a continuous one, may be achievable in the near future, but it seems to us that there is no way to model the solar power curve with pure physics, as doing so would amount to proving the existence of Laplace’s Demon. In other words, scientific understanding of the world will perpetually be partly physical and partly empirical. However, this viewpoint is not incompatible with developing techniques that are more general and models that are more accurate.

      As the complexity and performance of solar power curve modeling increase, e.g., through exploiting either more advanced deep-learning techniques or model chains with more adequate combinations of component models, it seems necessary to simultaneously advance the modeling capability of accompanying technologies, such as battery storage or control systems. Only then can the final conclusions regarding the best way of utilizing solar energy be truly justified. Finally, we should remark that a shared basis of all the aforementioned research is data. Although weather data are available in bulk, PV system data are more often than not proprietary, which has hitherto been viewed as a major limiting factor for advancing solar power curve research. Therefore, we urge PV system owners to share their data openly, such that the benefits gained through research can one day be translated to true economic gain and advancements towards century-long energy sustainability, which is absolutely vital to the continuation of human life on this planet.

      Acknowledgements. Dazhi YANG is supported by the National Natural Science Foundation of China (project no. 42375192), and the China Meteorological Administration Climate Change Special Program (CMA-CCSP; project no. QBZ202315).

      Xiang'ao XIA is supported by the National Natural Science Foundation of China (project no. 42030608).

      Martin J. MAYER is supported by the National Research, Development and Innovation Fund, project no. OTKA-FK 142702, and by the Hungarian Academy of Sciences through the Sustainable Development and Technologies National Programme (FFT NP FTA) and the János Bolyai Research Scholarship.

      Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

      Funding note: Open access funding provided by Budapest University of Technology and Economics.



