A data-driven approach to predicting power outages during winter storms in the southern U.S. leveraging nonparametric machine learning models
Jangjae Lee, Zhe Zhang, Stephanie G. Paal
Computational Urban Science, 2025, Vol. 5, Issue 1: 62
In February 2021, Winter Storm Uri severely impacted much of the southern United States, triggering unprecedented large-scale power outages. Because a similar extreme weather event could recur, the primary objective of this study is to develop a baseline power outage prediction model tailored to the southern United States. The central research question is: which variables and which regression models contribute most to accurately predicting power outages in this context? Because large-scale outages are, in essence, a direct result of imbalances between electricity supply and demand, population was considered a key influencing factor. To reflect the meteorological characteristics of winter storms, several atmospheric variables, such as dew point and atmospheric pressure, were also incorporated; these are intended to capture the environmental dynamics that underpin outage occurrence during extreme cold events. Four machine learning models were employed: Random Forest, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost). To enable a comparison with traditional statistical models, Ridge regression and Lasso regression were also implemented, using population and geographic information data together with the meteorological variables. To determine the optimal model configuration, Bayesian optimization was employed with tenfold cross-validation. XGBoost achieved the highest performance, with an R² score of 0.92. A permutation importance analysis of the XGBoost model identified population, dew point, and pressure, in that order, as the most influential variables.
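As an illustration of how a permutation importance ranking of this kind can be computed, the following is a minimal standard-library sketch. It is not the authors' implementation: the toy linear predictor stands in for a fitted XGBoost model, and the data, coefficients, and the names `toy_predict` and `unused_noise` are hypothetical. The idea is to shuffle one feature column at a time and record how much the R² score drops.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: columns are [population, dew_point, unused_noise].
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]


def toy_predict(row):
    # Stand-in for a fitted model; the coefficients are invented for illustration.
    return 3.0 * row[0] + 1.0 * row[1]


y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
ranking = sorted(
    zip(["population", "dew_point", "unused_noise"], importances),
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this toy setting the feature with the largest coefficient ranks first and the feature the predictor ignores receives an importance of zero. With a real model, the scores should be computed on held-out data rather than on the training set.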
Additionally, because the number of data points varied by state during test evaluation, a weighted evaluation metric was also computed using the data counts for each state. Under this weighted evaluation, XGBoost still achieved the highest R² score (0.74), further underscoring its robustness across heterogeneous state-level datasets. This paper thus provides a foundational baseline model for predicting winter-storm power outages in the southern United States and identifies the essential predictor variables.
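The abstract does not spell out the formula behind the weighted evaluation metric. One plausible reading, shown here only as an assumption, is a mean of per-state R² scores weighted by each state's share of the test data. The function name `count_weighted_r2`, the state labels, and the numbers below are hypothetical.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


def count_weighted_r2(states, y_true, y_pred):
    """Average the per-state R^2 scores, weighting each state by its row count.

    Assumed interpretation of a count-weighted metric, not the paper's
    confirmed formula.
    """
    groups = defaultdict(lambda: ([], []))
    for state, t, p in zip(states, y_true, y_pred):
        groups[state][0].append(t)
        groups[state][1].append(p)
    per_state = {s: r2_score(t, p) for s, (t, p) in groups.items()}
    total = len(y_true)
    weighted = sum(len(groups[s][0]) / total * r2 for s, r2 in per_state.items())
    return weighted, per_state


# Hypothetical test set: four rows for one state, two for another.
states = ["TX", "TX", "TX", "TX", "OK", "OK"]
y_true = [1.0, 2.0, 3.0, 4.0, 1.0, 3.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 2.0, 2.0]

weighted_r2, per_state_r2 = count_weighted_r2(states, y_true, y_pred)
```

Here the perfectly predicted state contributes with weight 4/6 and the poorly predicted one with weight 2/6, so states with more test rows dominate the summary score.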
Power outages / Data-Driven / Winter Storms / Machine learning