Can extremely high-temperature weather forecast oil prices?

Donglan ZHA , Shuo ZHANG , Yang CAO

Front. Eng ›› 2025, Vol. 12 ›› Issue (3) : 529 -542.

DOI: 10.1007/s42524-025-4075-5
Energy and Environmental Systems
RESEARCH ARTICLE



Abstract

Participants in oil markets are increasingly aware of the climate risks posed by frequent extreme weather. This paper examines the role of extremely high-temperature weather information in predicting oil futures prices on the China International Energy Exchange (INE). An extreme high-temperature weather index (HTI) is developed on the basis of meteorological data at INE’s crude oil production and storage sites. The local interpretable model-agnostic explanations (LIME) and accumulated local effects (ALE) methods are used to compare the predictive contribution of the HTI with that of 15 common predictors. The results indicate that the HTI enhances the out-of-sample accuracy of five classical prediction models for INE oil prices. The recurrent neural network (RNN) model exhibits superior out-of-sample forecast performance, with a mean absolute error (MAE) of 14.379, a root mean square error (RMSE) of 19.624, and a directional symmetry (DS) of 66.67%. The predictive importance of the HTI in the best RNN model ranks third in most test instances, surpassing conventional oil price predictors such as stock market indicators. The ALE analysis reveals a positive correlation between extremely high-temperature weather and INE oil prices. These findings can help investors and oil market regulators improve oil price forecast accuracy while also providing new evidence about the relationship between climate risk and oil prices.


Keywords

crude oil futures / climate risks / explainable machine learning

Cite this article

Donglan ZHA, Shuo ZHANG, Yang CAO. Can extremely high-temperature weather forecast oil prices? Front. Eng, 2025, 12(3): 529-542 DOI:10.1007/s42524-025-4075-5


1 Introduction

As crude oil becomes increasingly influenced by market dynamics, fluctuations in its price have a significant effect on the global economic and financial landscape (Naser, 2016). Accurate forecasts of crude oil prices play a crucial role in providing scientific support for energy-intensive enterprises and helping investors optimize their portfolios to effectively manage risks (Tian et al., 2023; Zhang and Wang, 2022).

To develop precise projections, it is essential to uncover the underlying factors driving fluctuations in crude oil prices. Previous research has indicated that the long-term trend of oil prices is determined by market supply and demand fundamentals (Dées et al., 2007; Hamilton, 2009), whereas short-term volatility may be affected by external factors such as stock performance (Bouri et al., 2022), exchange rates (Sun et al., 2022), and investor sentiment (Dai et al., 2022a). The increasing frequency of extreme weather events in recent years has made oil products more vulnerable to climate-related risks (Cruz and Krausmann, 2013; Wen et al., 2021; Tumala et al., 2023), including the impact of global warming caused by the greenhouse effect (Kweku et al., 2018; Zhang et al., 2024). Demand for crude oil and other fossil fuels tends to rise during periods of extremely hot weather (van Ruijven et al., 2019). Moreover, elevated temperatures can disrupt operations at drilling and refinery sites (Yalew et al., 2020; Qui et al., 2023) and threaten the integrity of oil transportation infrastructure such as pipelines (Izaguirre et al., 2021), potentially affecting oil supply. Consequently, stakeholders in oil markets must consider extremely high temperatures when assessing market conditions and making pricing decisions. However, existing research on crude oil forecasting has not adequately considered extremely high-temperature weather. Although recent studies have highlighted the importance of extreme weather information in predicting crude oil prices (Xu et al., 2023), their reliance on media reports introduces subjective bias. Microscale meteorological observations, by contrast, can provide oil market managers with precise weather information (Katopodis and Sfetsos, 2019). Our objective is to contribute additional empirical evidence on the relationship between extreme weather events and oil price forecasts by using precise meteorological data from the storage and supply locations of a specific target oil market.

There are two primary types of models used for forecasting oil prices. The first type consists of traditional statistical models, such as exponential smoothing (ES) (Azevedo and Campos, 2016; He, 2018), the autoregressive integrated moving average (ARIMA) (Xiang and Zhuang, 2013; Zhao and Wang, 2014), and vector autoregression (VAR) models (Baumeister and Kilian, 2014). However, these models face challenges in capturing the inherent nonlinearity in oil price dynamics (Gao and Lei, 2017).

The second category comprises emerging machine learning models (Zhao et al., 2017), which are primarily represented by support vector regression (SVR) (Wang et al., 2020), recurrent neural networks (RNNs) (Chaitanya Lahari et al., 2018), and long short-term memory (LSTM) models (Güleryüz and Özden, 2020). These models can characterize the nonlinear relationship between influencing factors and crude oil prices, and they achieve good forecasting accuracy (Öztunç Kaymak and Kaymak, 2022). However, machine learning models are commonly perceived as black boxes because they offer users little insight into their predictive mechanisms.

The advent of explainable machine learning methods has offered a valuable tool for elucidating how factors drive predictive outcomes, and their application has expanded into research fields such as forecasting bitcoin prices (Goodell et al., 2023) and energy consumption (Aras and Hanifi Van, 2022). Nevertheless, these explainable methods have not yet been applied in crude oil price forecasting research.

Considering the aforementioned limitations, this research makes three significant contributions to the current literature. First, we develop an extremely high-temperature weather index (HTI) based on daily meteorological data specific to the crude oil production and storage sites of the China International Energy Exchange (INE). Unlike previous indices that broadly describe the frequency of extreme weather events on a global or national scale (Guo et al., 2023a) or those derived from textual reports on extreme weather attention (Xu et al., 2023), our HTI provides finer-scale extreme high-temperature weather information for INE crude oil price prediction.

Second, our study confirms that including the HTI as a predictive factor enhances the accuracy of INE crude oil futures price prediction in terms of errors and trend changes. In fact, the out-of-sample predictive contribution of the HTI even surpasses several common indicators, such as the stock market index, in most instances. As the HTI value increases, a corresponding rise in the predicted INE crude oil futures price is observed. Third, we introduce explainable methods to improve the credibility of machine learning models in the crude oil prediction process, overcoming the inherent deficiencies of previous black box models. This enables us to gain deeper insights into the correlation between varying degrees of extreme heat and crude oil price dynamics. The remainder of the paper is organized as follows: Section 2 introduces the methodology. Section 3 describes the data. Section 4 presents our results. Finally, Section 5 provides concluding remarks.

2 Methodology

2.1 Model framework

Fig.1 illustrates our model framework, which consists of four parts. First, we calculate the extreme HTI based on weather data from crude oil storage locations and production locations. Second, we conduct variable selection and feature engineering steps. We employ linear Granger causality tests to select predictive factors that are beneficial for forecasting crude oil prices. Additionally, Pearson correlation analysis and principal component analysis are used to eliminate multicollinearity among variables. Third, we employ five classical regression models to forecast INE crude oil prices and identify the best-performing model on the basis of comprehensive evaluation criteria. Finally, on the basis of the selected model, we use local interpretable model-agnostic explanations (LIME) and accumulated local effects (ALE) analysis to analyze the role of the HTI in forecasting INE oil prices.

2.2 Feature engineering

2.2.1 Linear Granger causality test

The Granger causality test is a statistical hypothesis test that determines whether the past values of one time series help predict another (Granger, 1969). Before conducting the test, we verify the stationarity of each data series via the augmented Dickey-Fuller (ADF) test. If a series is nonstationary, it is differenced and the ADF test is repeated on the differenced series.
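A minimal sketch of the bivariate Granger test follows. It assumes a plain OLS F-test on lags 1 to p with an intercept. The restricted model regresses the target on its own lags, and the full model adds the lags of the candidate cause. The helper names are hypothetical, and this is not the authors' code. In practice, `statsmodels.tsa.stattools.grangercausalitytests` reports this F-test together with p-values. A nonstationary input would first be differenced, for example with `np.diff`.

```python
import numpy as np


def lagged_design(series_list, p):
    """Intercept plus lags 1..p of each series, aligned to targets at t = p..T-1."""
    T = len(series_list[0])
    cols = [np.ones(T - p)]
    for s in series_list:
        for lag in range(1, p + 1):
            cols.append(s[p - lag:T - lag])  # value of s at time t - lag
    return np.column_stack(cols)


def granger_f_stat(y, x, p):
    """F-statistic for H0: lags 1..p of x add no predictive power for y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        return float(resid @ resid)

    X_restricted = lagged_design([y], p)   # y on its own lags
    X_full = lagged_design([y, x], p)      # plus lags of the candidate cause x
    rss_r, rss_f = rss(X_restricted), rss(X_full)
    df_num = p
    df_den = len(target) - X_full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)
```

The paper tests up to five lags and keeps a factor that is significant at any of them. To mirror that, one could compute `granger_f_stat(d_ine, factor, p)` for p = 1, ..., 5, where `d_ine` and `factor` are hypothetical arrays. Each value would then be compared with the corresponding F critical value at the 10% level.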

2.2.2 Pearson correlation and principal component analysis

In the Pearson correlation analysis, the Pearson correlation coefficient is used as a statistical measure to quantify the degree of linear association between two variables, ranging from −1 to 1 (Zhang et al., 2023). Principal component analysis (PCA) simplifies data by reducing correlated variables to a smaller set of uncorrelated components known as principal components. This method captures the information of diverse factors within a reduced-dimensional space (Wen and Cao, 2020). PCA is used to integrate variables with correlation coefficients higher than 0.8 into comprehensive indices.
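The correlation screen and PCA merge can be sketched for one pair of series as follows. Standardizing the series before PCA is our assumption, because the text does not say whether the series were scaled. The function name is hypothetical.

```python
import numpy as np


def merge_correlated_pair(a, b, threshold=0.8):
    """Return (first PC, Pearson r, explained share) if |r| > threshold, else (None, r, None)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    r = float(np.corrcoef(a, b)[0, 1])
    if abs(r) <= threshold:
        return None, r, None
    # Standardize (assumption) so both series enter PCA on the same scale
    Z = np.column_stack([(a - a.mean()) / a.std(), (b - b.mean()) / b.std()])
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # ascending eigenvalues
    component = Z @ eigvecs[:, -1]          # scores on the largest-eigenvalue axis
    explained = float(eigvals[-1] / eigvals.sum())
    return component, r, explained
```

For a standardized pair, the first component's share of variance equals (1 + |r|)/2. A pair with |r| above 0.8 therefore keeps at least 90% of its variance in a single component, which is why merging such pairs loses little information.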

2.3 Forecasting models

2.3.1 Exponential smoothing method

ES is suitable for forecasting time series with irregular trends or no seasonal pattern because it assigns exponentially decreasing weights to past observations (Yorucu, 2003). We define DINE as the first-order differenced series of the original INE series. The ES recursion is given in Eq. (1):

$$\widehat{DINE}_{t+1}=\alpha\,DINE_t+(1-\alpha)\,\widehat{DINE}_t, \tag{1}$$

where $\widehat{DINE}_t$ represents the predicted value of DINE at time $t$, $DINE_t$ represents the actual value of DINE at time $t$, and $\alpha$ is the smoothing constant, ranging from 0 to 1. On the basis of the results on the training set, we set $\alpha$ to 0.1.
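Eq. (1) can be run as a simple recursion. This sketch initializes the first forecast to the first observation, which is our assumption because the text does not specify it. The function name is hypothetical.

```python
def exponential_smoothing_forecasts(series, alpha=0.1):
    """One-step-ahead forecasts from Eq. (1).

    Returns len(series) + 1 values: forecasts[t] predicts series[t], and the
    last entry is the forecast for the next, unseen period.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    forecasts = [float(series[0])]  # initialization: an assumption
    for t, actual in enumerate(series):
        # Eq. (1): new forecast = alpha * actual + (1 - alpha) * previous forecast
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[t])
    return forecasts
```

With the paper's value of $\alpha = 0.1$, each new observation moves the forecast only a tenth of the way toward it, so the forecast of the differenced series stays heavily smoothed.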

2.3.2 Support vector regression

SVR is a kernel-based approach for regression tasks (Cortes and Vapnik, 1995). SVR maps low-dimensional input data into a high-dimensional feature space and fits the linear function in Eq. (2) in that space:

$$y=w^{\top}\varphi(x)+b, \tag{2}$$

where $x$ represents the raw input data, $y$ represents the output of the model, $w$ represents the weight coefficient vector, $\varphi(x)$ represents the mapping function that converts low-dimensional data into a high-dimensional space, and $b$ represents the bias.

By introducing the slack variables $\xi_i$ and $\xi_i^{*}$, the fitting task is transformed into the quadratic programming problem in Eq. (3), where $C$ is the penalty parameter and $\varepsilon$ is the width of the insensitive zone:

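$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{n}\left(\xi_i+\xi_i^{*}\right)\quad\text{s.t.}\quad\begin{cases}w^{\top}\varphi(x_i)+b-y_i\le\varepsilon+\xi_i,\\ y_i-w^{\top}\varphi(x_i)-b\le\varepsilon+\xi_i^{*},\\ \xi_i,\ \xi_i^{*}\ge 0,\quad i=1,2,\ldots,n.\end{cases} \tag{3}$$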
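To make Eq. (3) concrete, the sketch below evaluates the primal objective for a linear feature map $\varphi(x) = x$. The linear map is an assumption, since the paper does not state its kernel here. Each slack is set to the smallest value that satisfies its constraint. The sketch does not solve the optimization. Library solvers such as scikit-learn's `sklearn.svm.SVR`, with its `C`, `epsilon`, and `kernel` parameters, work with the dual form instead.

```python
import numpy as np


def svr_primal_objective(w, b, X, y, C=1.0, eps=0.1):
    """Evaluate Eq. (3) at (w, b) with phi(x) = x and the tightest feasible slacks."""
    w = np.asarray(w, dtype=float)
    f = np.asarray(X, dtype=float) @ w + b   # model output w^T phi(x_i) + b
    y = np.asarray(y, dtype=float)
    xi = np.maximum(0.0, f - y - eps)        # first constraint: f - y <= eps + xi
    xi_star = np.maximum(0.0, y - f - eps)   # second constraint: y - f <= eps + xi*
    objective = 0.5 * float(w @ w) + C * float(np.sum(xi + xi_star))
    return objective, xi, xi_star
```

Points that fall inside the $\varepsilon$-tube get zero slack and add nothing to the loss. Only larger deviations are penalized, and they are penalized linearly with weight $C$.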

2.3.3 LightGBM (LGBM)

LGBM is an algorithm built on the gradient boosting decision tree (GBDT) (Ke et al., 2017). It uses optimized network communication algorithms for parallel learning and grows trees leafwise, which accelerates GBDT training without compromising accuracy. Its rapid training speed has made it a popular choice among data scientists and machine learning practitioners.
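The boosting idea that LightGBM accelerates can be illustrated with plain GBDT under squared loss. Each round fits a small tree, here a depth-one stump, to the current residuals, which are the negative gradients. A shrunken copy of that tree is then added to the ensemble. This is a hypothetical teaching sketch, not LightGBM: it omits histogram binning, leafwise growth, and parallel learning.

```python
import numpy as np


def fit_stump(X, residual):
    """Depth-one regression tree minimizing squared error on the residuals."""
    best_sse, best = np.inf, (0, -np.inf, residual.mean(), residual.mean())
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:     # candidate split points
            left = X[:, j] <= thr
            lv, rv = residual[left].mean(), residual[~left].mean()
            sse = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lv, rv)
    return best


def gbdt_fit(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the negative gradient."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                     # negative gradient of 0.5 * (y - pred)^2
        j, thr, lv, rv = fit_stump(X, residual)
        pred += learning_rate * np.where(X[:, j] <= thr, lv, rv)
        stumps.append((j, thr, lv, rv))
    return base, learning_rate, stumps


def gbdt_predict(model, X):
    """Sum the base value and every shrunken stump output."""
    base, learning_rate, stumps = model
    out = np.full(X.shape[0], base)
    for j, thr, lv, rv in stumps:
        out += learning_rate * np.where(X[:, j] <= thr, lv, rv)
    return out
```

The learning rate trades speed for stability. Each round closes only a fraction of the remaining residual, so many small trees are combined instead of a few large ones.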

2.3.4 Recurrent neural networks

RNNs are a class of supervised machine learning models consisting of input, hidden, and output layers. They use recurrent connections that feed the hidden state from the previous time step into the current step, allowing information to be passed on to the next cell. The RNN cell can be represented by Eq. (4):

$$h_t=\sigma\left(W x_t+U h_{t-1}+b\right), \tag{4}$$

where $h_t$ represents the hidden state of the cell at time $t$, $\sigma(\cdot)$ represents the activation function, $W$ represents the weight matrix of the input, $x_t$ represents the input at time $t$, $U$ represents the weight matrix of the recurrent input, and $b$ represents the bias.
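A forward pass through Eq. (4) over one input sequence can be sketched in NumPy as below. The sketch assumes tanh as the activation $\sigma$ and a zero initial hidden state. The function name and shapes are illustrative, not the authors' implementation.

```python
import numpy as np


def rnn_forward(xs, W, U, b, h0=None):
    """Apply Eq. (4) step by step; xs has shape (T, d_in). Returns all hidden states (T, d_h)."""
    h = np.zeros(U.shape[0]) if h0 is None else np.asarray(h0, dtype=float)
    states = []
    for x_t in np.asarray(xs, dtype=float):
        h = np.tanh(W @ x_t + U @ h + b)   # sigma = tanh (assumed activation)
        states.append(h)
    return np.stack(states)
```

In this sketch, an input sequence length of 1 means the loop runs a single step per sample. The recurrent term $U h_{t-1}$ then acts only on the initial state.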

Backpropagation through time in an RNN may lead to vanishing or exploding gradients, which hampers the learning of long-term dependencies. LSTM mitigates this issue by incorporating special gating units.

For the SVR, RNN, and LSTM models, we normalize the data via min-max normalization. To mitigate overfitting, the hyperparameters of all machine learning models are optimized via time series split cross-validation and Bayesian optimization. The tables in Appendix A provide detailed information on the hyperparameter settings.
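The two preprocessing ideas above can be sketched as follows. The first is expanding-window folds, in which every validation block lies after its training block in time. This is similar in spirit to scikit-learn's `TimeSeriesSplit`, though not identical. The second is min-max scaling whose minimum and maximum are taken from the training portion only. Fitting the scaler on training data alone is our assumption, because the text does not say how the normalization bounds were chosen. The function names are hypothetical.

```python
import numpy as np


def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train_idx, val_idx); each validation block comes right after its training data."""
    block = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = block * k
        yield np.arange(train_end), np.arange(train_end, train_end + block)


def minmax_scale(train, other):
    """Scale both arrays with the per-column min and max of the training array only."""
    train = np.asarray(train, dtype=float)
    other = np.asarray(other, dtype=float)
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant columns
    return (train - lo) / span, (other - lo) / span
```

Validation values may fall outside [0, 1] under this scheme. That is intended: the scaler never sees future data, so no information leaks backward in time.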

2.3.5 Evaluation metrics

We utilize the following three metrics to assess the out-of-sample predictive performance of each model, as expressed in Eqs. (5)−(7):

1) The mean absolute error (MAE) measures the average prediction error of a model and quantifies the mean magnitude of the discrepancy between the predicted and observed values.

2) The root mean square error (RMSE) quantifies the deviation between the predicted and true values by taking the square root of the mean of the squared differences. Because the errors are squared, it penalizes large deviations more heavily than the MAE does.

3) Directional symmetry (DS) is a metric used to evaluate a model’s ability to accurately capture trends in oil futures prices. It is a crucial gauge monitored by market speculators.

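$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \tag{5}$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}, \tag{6}$$

$$\mathrm{DS}=\frac{1}{n}\sum_{i=1}^{n}I\left(\operatorname{sign}(y_i)=\operatorname{sign}(\hat{y}_i)\right), \tag{7}$$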

where $I(\cdot)$ is an indicator function that equals 1 when the condition in parentheses is satisfied and 0 otherwise; $y_i$ and $\hat{y}_i$ denote the actual and predicted values, respectively; and $n$ is the total number of samples.
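Eqs. (5)-(7) translate directly into NumPy. Because the forecasting target is a first-order price difference, the sign comparison in `ds` checks whether the predicted and actual daily price changes move in the same direction. This is a sketch, and the function names are our own.

```python
import numpy as np


def mae(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))                 # Eq. (5)


def rmse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))         # Eq. (6)


def ds(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.sign(y) == np.sign(y_hat)))      # Eq. (7), a share in [0, 1]
```

Multiplying `ds` by 100 gives the percentage form used in the results, such as a DS of 66.67%.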

2.4 Explainable analysis methods

2.4.1 Local interpretable model-agnostic explanations

LIME is a post-hoc interpretation method that explains the local prediction mechanism of a black box model by constructing a locally interpretable proxy model around the instances of interest (Ribeiro et al., 2016). LIME generates a list of explanations that describe the local predictive contribution of each feature to a data sample.

The computation of LIME involves four steps. First, new samples are generated around the instance being explained by applying Gaussian perturbations. Second, an exponential kernel function of the Euclidean distance weights each perturbed sample by its proximity to the original instance. Third, a local linear regression model is trained on these weighted samples to approximate the behavior of the prediction model locally. Finally, the coefficients of the trained linear regression represent the contribution of each feature to the prediction target.
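The four steps can be sketched for a tabular instance as follows. Several choices here are assumptions of the sketch: the default kernel width, the per-feature perturbation scale, and the use of unregularized weighted least squares. The reference `lime` package makes its own choices, such as fitting a ridge surrogate. `predict_fn` stands for any black-box model that maps an array of rows to predictions.

```python
import numpy as np


def lime_local_weights(predict_fn, x0, scale, n_samples=2000, kernel_width=None, seed=0):
    """Local linear surrogate around instance x0; returns one slope per feature."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    scale = np.asarray(scale, dtype=float)
    d = x0.size
    # Step 1: Gaussian perturbations around the instance
    Z = x0 + rng.normal(size=(n_samples, d)) * scale
    # Step 2: exponential kernel on the (scaled) Euclidean distance to x0
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)        # assumed default width
    dist = np.linalg.norm((Z - x0) / scale, axis=1)
    weights = np.exp(-dist ** 2 / kernel_width ** 2)
    # Step 3: weighted least squares fit of the black-box predictions
    y = np.asarray(predict_fn(Z), dtype=float)
    A = np.column_stack([np.ones(n_samples), Z])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    # Step 4: slopes are the local feature contributions (intercept dropped)
    return coef[1:]
```

Ranking features by the absolute value of the returned slopes gives a per-instance ordering. Repeating this for every test day yields a ranking over time of the kind the LIME analysis reports.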

2.4.2 Accumulated local effects analysis

ALE was first proposed by Apley and Zhu (2020) to describe how features affect the average predictions of a machine learning model. ALE is considered an improved and more efficient alternative to partial dependence plots (PDPs). PDPs ignore correlations among input features and can therefore average over unrealistic feature combinations, whereas ALE computes prediction differences within local intervals of the observed data.

The centered ALE values for the HTI are calculated as follows. First, the HTI values are divided into multiple intervals on the basis of their distribution. Next, for each interval, the average difference in the predicted values is computed as the HTI moves from the lower to the upper bound of the interval, with the other features of the samples in that interval held fixed. Finally, the centered ALE value is obtained by accumulating these average differences across intervals and subtracting the mean of the accumulated effect.
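The four steps (intervals, local differences, accumulation, and centering) can be sketched for a single feature as below. Quantile-based intervals and centering by the count-weighted mean of interval midpoints follow common practice for first-order ALE. The bin count and function name are assumptions, not the authors' settings.

```python
import numpy as np


def ale_1d(predict_fn, X, j, n_bins=10):
    """Centered first-order ALE of feature j; returns interval edges and ALE at each edge."""
    X = np.asarray(X, dtype=float)
    # Step 1: intervals from the feature's quantiles (duplicates removed)
    edges = np.unique(np.quantile(X[:, j], np.linspace(0.0, 1.0, n_bins + 1)))
    n_int = len(edges) - 1
    # interval k covers (edges[k-1], edges[k]]; the minimum goes into interval 1
    idx = np.clip(np.searchsorted(edges, X[:, j], side="left"), 1, n_int)
    local = np.zeros(n_int)
    counts = np.zeros(n_int)
    for k in range(1, n_int + 1):
        members = X[idx == k]
        if len(members) == 0:
            continue
        lower, upper = members.copy(), members.copy()
        lower[:, j], upper[:, j] = edges[k - 1], edges[k]
        # Step 2: average prediction change across the interval, other features fixed
        local[k - 1] = np.mean(predict_fn(upper) - predict_fn(lower))
        counts[k - 1] = len(members)
    # Step 3: accumulate the local effects from the lowest edge upward
    accumulated = np.concatenate([[0.0], np.cumsum(local)])
    # Step 4: center by the count-weighted mean of the interval midpoints
    midpoints = 0.5 * (accumulated[:-1] + accumulated[1:])
    return edges, accumulated - np.sum(midpoints * counts) / counts.sum()
```

Plotting the returned ALE values against `edges` gives a curve of the kind used in the ALE analysis. A rising curve means that higher feature values push predictions above the average prediction.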

3 Data

This study focuses on analyzing INE crude oil futures in China for two primary reasons. First, the INE has emerged as the world’s third-largest center for trading crude oil futures, significantly influencing international crude oil pricing (Dai et al., 2022b). While there have been research efforts dedicated to forecasting crude oil prices, most of them have focused predominantly on the WTI and Brent markets, paying limited attention to INE crude oil futures. Second, meteorological station data from the oil storage and production sites of the INE are readily accessible, making it convenient to construct an extremely high-temperature index.

3.1 INE crude oil futures prices

The data set used in our study covers daily time series data from March 26, 2018, to December 31, 2022. In machine learning forecasting, directly predicting the crude oil price level may produce a pseudo-prediction phenomenon in which the predicted series merely lags behind the actual values (Cao et al., 2023). Therefore, we use the first-order difference in the closing price of INE crude oil futures (D-INE) as the target variable. The timeline of D-INE is illustrated in Fig.2.

3.2 Extreme high-temperature weather index

Considering the regional variability across the production and storage areas related to INE, we employ relative extreme weather thresholds instead of a uniform absolute threshold. A day is defined as an extreme high-temperature day when its maximum temperature exceeds the 85th percentile of the maximum temperatures recorded on that calendar date over the past 30 years (Li et al., 2023). The meteorological data for the storage and production areas were obtained from the National Center for Environmental Information (NCEI) of the US National Oceanic and Atmospheric Administration (NOAA). Our calculation of the HTI is expressed as Eq. (8):

$$HTI=\sum_{i=1}^{n}\alpha_i H_i+\sum_{k=1}^{m}\beta_k H_k, \tag{8}$$

where $n$ and $m$ represent the numbers of INE crude oil storage and production sites, respectively; $\alpha_i$ represents the weight of storage location $i$, determined by the proportion of effective capacity at each crude oil storage site; $\beta_k$ represents the weight of production location $k$, based on the utilization frequency of the supply port on the INE’s official website; and $H$ represents the degree of extreme high-temperature weather at a specific location ($H_i$ for storage site $i$ and $H_k$ for production site $k$), calculated via Eq. (9):

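$$H=\begin{cases}\dfrac{x_{\text{observed}}-x_{\text{threshold}}}{x_{\max}-x_{\text{threshold}}}, & x_{\text{observed}}>x_{\text{threshold}},\\[6pt] 0, & x_{\text{observed}}\le x_{\text{threshold}},\end{cases} \tag{9}$$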

where $x_{\text{observed}}$ is the observed value on a specific day, $x_{\max}$ is the historical maximum value, and $x_{\text{threshold}}$ is the 85th-percentile threshold. A table in Appendix A shows the weather station information and weights for every location.
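Eqs. (8) and (9) can be sketched as follows. The threshold would be computed per site and calendar date from the past 30 years of daily maximum temperatures, for example as `np.percentile(history, 85)` with a hypothetical `history` array. The function names and the small guard against a zero denominator are our assumptions. Note that under Eq. (9), H exceeds 1 on a day that beats the historical maximum.

```python
import numpy as np


def heat_degree(x_observed, x_threshold, x_max):
    """Eq. (9): 0 at or below the threshold, (x - thr) / (max - thr) above it."""
    x_observed = np.asarray(x_observed, dtype=float)
    denom = np.maximum(np.asarray(x_max, dtype=float) - x_threshold, 1e-12)  # guard (assumption)
    return np.where(x_observed > x_threshold, (x_observed - x_threshold) / denom, 0.0)


def hti(h_storage, alpha, h_production, beta):
    """Eq. (8): weighted sum over n storage sites plus m production sites."""
    return float(np.dot(alpha, h_storage) + np.dot(beta, h_production))
```

For example, a storage site at 35 °C with a 30 °C threshold and a 40 °C record has H = 0.5. The HTI then weights that value by the site's capacity share.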

Fig.3 presents the daily HTI from March 26, 2018, to December 31, 2022, highlighting various spikes and the corresponding key extreme heat events. Notably, the HTI captures the influence of the central-type El Niño and La Niña events of 2020 and 2021, respectively, which led to dry anomalies over eastern China and significant sea surface temperature anomalies in the Pacific and Indian Oceans (Wang et al., 2023). This demonstrates the effectiveness of our HTI in capturing extremely high-temperature events relevant to INE crude oil.

3.3 Other predictive factors

Crude oil futures prices can be influenced by many factors. This paper develops an indicator system with 16 factors across eight dimensions to predict D-INE, as shown below.

1) International crude oil markets: The returns and volatility series of INE, WTI, and Brent oil futures are closely interconnected, indicating significant information transfer with respect to Chinese INE pricing (Wei et al., 2022). Therefore, we include the WTI crude oil futures closing price (WTI) and Brent crude oil futures closing price (Brent) as proxy variables.

2) Energy market prosperity: We consider the Chinese energy industry index from Wind (EI) and the oil and gas index of the Chinese National Index (OG) to represent the development of energy markets (Jiang et al., 2022).

3) Exchange and bond markets: The RMB exchange rate affects crude oil futures pricing (Sun et al., 2022). We use the daily real effective exchange rate for China (REER) and the overnight Shanghai Interbank Offered Rate (SHIBOR) as the interest rate measure (Guo et al., 2023b).

4) Stock markets: There are significant risk interconnections between the crude oil and stock markets (Zhu et al., 2024). Therefore, we include the Hushen 300 index (CSI300) to represent the Chinese stock market and the S&P 500 index (SPX) to represent the American stock market (Jiang et al., 2022; Shi et al., 2020).

5) Public attention: Internet search activity data are widely used to estimate trader attention in various markets (Afkhami et al., 2017). As Baidu is the largest search engine in China, we selected the search term ‘crude oil futures’ and used the Baidu Index search volume (BD) as a proxy for public attention (Shen et al., 2017).

6) Geopolitical risks: To represent geopolitical risk, we adopt three classical variables, namely, global geopolitical risk (GPR), geopolitical act risk (GPA), and geopolitical threat risk (GPT) (Caldara and Iacoviello, 2022).

7) Climate policy uncertainty: To reflect transition climate risk, we introduce the China climate policy uncertainty (CCPU) index (Ma et al., 2023).

8) Technical indicators: The application of technical indicators in crude oil price forecasting has led to significant improvements in forecasting performance (Zhang et al., 2018). We selected two widely used indicators, the moving average convergence divergence (MACD) and the five-day moving average (5D-MA), to represent price trends.
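The text does not give the indicator parameters. This sketch therefore assumes the conventional MACD line (the 12-day EMA minus the 26-day EMA, without the signal line) and a simple trailing five-day moving average of closing prices. The EMA is seeded with the first price. All names and parameter values are illustrative.

```python
import numpy as np


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    values = np.asarray(values, dtype=float)
    alpha = 2.0 / (span + 1)
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = alpha * values[t] + (1 - alpha) * out[t - 1]
    return out


def macd_line(close, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of closing prices (signal line omitted)."""
    return ema(close, fast) - ema(close, slow)


def moving_average(close, window=5):
    """Trailing simple moving average; the first window - 1 entries are NaN."""
    close = np.asarray(close, dtype=float)
    out = np.full(len(close), np.nan)
    out[window - 1:] = np.convolve(close, np.ones(window) / window, mode="valid")
    return out
```

Both indicators are smoothed functions of the same closing-price series. This is consistent with the high correlation between MACD and 5D-MA that the feature engineering step later merges with PCA.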

Following the alignment of date information for all indicators, the linear interpolation method was employed to fill in any missing daily values. Tab.1 provides descriptive statistics for all the predictive factors.

4 Empirical analysis

4.1 Feature engineering results

Before conducting the Granger causality test, we apply the ADF test to check the stationarity of each data series; the results are presented in Tab.1. Nonstationary series such as WTI, Brent, EI, OG, REER, CSI300, and SPX were differenced once to achieve stationarity, and the stationary series of each variable were used in the Granger causality test. Because the INE crude oil futures market operates five days a week, we tested lags of up to five days, and variables passing the test at any lag were retained for subsequent forecasting. Tab.2 displays the test results between each predictive factor (cause) and D-INE (effect). At the 10% significance level, WTI, Brent, CSI300, GPA, HTI, MACD, and 5D-MA passed the test, whereas EI, OG, REER, SHIBOR, SPX, BD, GPR, GPT, and CCPU did not.

The Pearson correlation coefficients among all factors that pass the Granger causality tests are shown in Tab.3. The HTI is only weakly correlated with the other variables. In contrast, the correlation coefficients within the related crude oil futures markets (WTI & Brent) and within the technical indicators (MACD & 5D-MA) both exceed 0.8.

Therefore, PCA is employed to synthesize these two pairs of variables into two comprehensive indices. The first component, labeled ‘B-T’, explains approximately 99.42% of the variance of the related crude oil futures markets (WTI & Brent), indicating that B-T retains nearly all of their market information. Similarly, the second component, labeled ‘Trend’, represents the price trend (MACD & 5D-MA) and explains approximately 98.29% of the variance of the technical indicators, with an eigenvalue of 36.68.

4.2 Prediction results

In this subsection, five models are used to investigate the impact of introducing the HTI on prediction accuracy. Tab.4 displays the out-of-sample evaluation results; a lower MAE and RMSE, along with a higher DS, indicate better performance. The RNN model with the HTI (input sequence length: 1) outperforms all the other models. Our results suggest that, given the limited size of our data set, shorter input sequences mitigate overfitting and promote more effective feature learning than longer sequences do. Integrating the HTI into all the models led to a noticeable increase in prediction accuracy. For the RNN (input sequence length: 1), introducing the HTI decreases the MAE and RMSE by 0.018 and 0.025, respectively, and improves the DS by 0.87% relative to the prediction without the HTI. With the HTI, the DS improves by 0.43% for the SVR, 4.33% for the LGBM, 2.22% for the LSTM (input sequence length: 5), 0.43% for the LSTM (input sequence length: 1), and 2.66% for the RNN (input sequence length: 5). These findings indicate that the HTI carries useful information for crude oil futures investment, with important practical implications. Fig.4 compares the predicted and true values of the best RNN model across the full data set, showing that it tracks D-INE closely in both in-sample and out-of-sample predictions.

To validate the robustness of our feature engineering module, an experiment was conducted in which all feature variables were incorporated without Granger causality tests. The best-performing RNN model was used for forecasting while maintaining the processes of calculating correlation coefficients and performing principal component analysis, with unchanged hyperparameter optimization settings. The detailed results can be found in Appendix B.

4.3 Explainable machine learning analysis

4.3.1 LIME analysis

All explainable machine learning analyses were based on the best-performing RNN (input sequence length: 1) with HTI. The feature variables in the test data set were ranked according to their absolute impact on the model’s local predictions, as shown in Fig.5.

The trend variable, which represents the movement tendency of crude oil futures prices, consistently ranked first across the test samples. The GPA, an indicator of geopolitical actions, generally ranked second, largely because of the tensions surrounding the 2022 Russian–Ukrainian conflict. The HTI ranked third on most days, providing a larger predictive contribution to D-INE forecasts than traditional indicators such as CSI300. The relatively high average HTI value in the test set may partly explain its high ranking in this LIME analysis. The HTI also rose to second place between March and April 2022. This may be linked to the stratospheric final warming event in the Northern Hemisphere, which was accompanied by high HTI values during this period.

4.3.2 ALE analysis

The ALE method was employed to plot the marginal effect of the HTI, as shown in Fig.6. The y-axis represents the difference between the model’s predicted D-INE output and the average predicted value across all samples. The x-axis reflects the range of normalized feature values.

We observed a positive slope in the ALE plot, indicating that as the HTI increases, the model predicts a larger D-INE relative to the average prediction. When the HTI value is less than 0.19, the ALE value is negative, meaning that the predicted outcome is lower than the average predicted D-INE. As the HTI continues to increase, the predicted value of the model rises gradually. When the HTI exceeds 0.8, the predicted D-INE exceeds the average prediction by more than 0.35.

The ALE analysis results align with the economic principle of oil supply and demand. Under extremely hot weather conditions, the efficiency of crude oil production may be affected, resulting in reduced production (Cruz and Krausmann, 2013). Additionally, extremely high temperatures can cause transportation delays and increase shipping and storage costs. These factors work together to reduce the supply of crude oil. On the demand side, the extremely high temperatures may lead to an increase in demand for cooling energy, subsequently increasing the demand for crude oil. These changes in both supply and demand lead to an increase in oil prices.

5 Conclusions and implications

Participants in oil markets have become increasingly aware of the potential climate risks linked to frequent extreme temperature events. In this study, an HTI is constructed on the basis of meteorological data from the supply and storage locations of crude oil futures in China’s INE. Explainable machine learning methods are employed to analyze the contributions of HTI and other common indicators to forecasting INE crude oil futures prices. Our findings primarily include three aspects.

First, the RNN model (input sequence length: 1) with the HTI has the best out-of-sample forecasting performance, with an MAE of 14.379, an RMSE of 19.624, and a DS of 66.67%. Second, the inclusion of the HTI as a predictive factor improves the prediction accuracy of the SVR, LGBM, LSTM, and RNN models. In particular, the DS of the best RNN model increases by 0.87%, and the MAE and RMSE decrease by 0.018 and 0.025, respectively. Third, the predictive role of the HTI on D-INE ranks third in most instances in the test set, surpassing several common indicators such as CSI300. The increase in the HTI value is associated with a corresponding increase in D-INE, indicating a positive relationship between extremely high temperatures and INE crude oil futures prices.

Our findings offer insights for stakeholders in crude oil markets. Investors in crude oil futures should incorporate the influence of extremely high temperatures into their trading strategies and price expectations. In particular, during periods of high HTI, the predicted daily changes in crude oil futures closing prices tend to exceed their average values. Investors should be vigilant against potential oil price fluctuations stemming from extreme high-temperature weather. Additionally, temperature monitoring in oil production and storage areas should be strengthened. Regulatory bodies should establish comprehensive temperature monitoring systems and promptly disseminate high-temperature risk alerts to the market, raising market participants’ awareness of potential weather-related risks.

While this paper contributes to the literature, it has limitations that future research should address. First, we constructed an extreme HTI using information from crude oil storage and supply locations. However, some extreme weather events that reflect the demand for crude oil have not been considered because clear site information is lacking. We suggest that future research construct a more comprehensive extreme weather index that covers more types of weather information and considers the impact of extreme weather on the demand side of crude oil. Second, although we have confirmed the effect of extreme weather factors on the price prediction of INE crude oil futures, whether such an effect holds in other international crude oil markets remains unverified.

Appendix A

Appendix B

We conduct a supplementary experiment based on the RNN (input sequence length: 1) to verify the predictive role of the feature engineering module. The Granger causality tests are removed in this experiment; that is, all variables are included as predictive factors. A table in Appendix B shows the Pearson correlation coefficients among all the variables. The extremely high-temperature weather index is only weakly correlated with the other variables. In addition, the correlation coefficients within the related crude oil futures markets (WTI & Brent), the exchange and stock markets (REER & SPX), geopolitical risk (GPR & GPT), and the technical indicators (MACD & 5D-MA) exceed 0.8.

We employ PCA to synthesize these four highly correlated variable pairs into four comprehensive indices. The first component explains 99.99% of the variance of the exchange and stock markets (SPX & REER). The second component explains 97.86% of the variance of geopolitical risk (GPR & GPT). The third component, ‘B-T’, explains 99.42% of the variance of the crude oil futures markets (WTI & Brent). The fourth component, ‘Trend’, represents price trends (MACD & 5D-MA) and explains 98.29% of the cumulative variance.

The resulting prediction model achieves an MAE of 17.955, an RMSE of 24.163, and a DS of 63.32%. All three metrics are worse than those of the best-performing RNN model with the complete feature engineering module proposed in this paper (MAE 14.379, RMSE 19.624, DS 66.67%). These results demonstrate the necessity of the Granger causality test and the effectiveness of PCA.
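
For reference, the three evaluation metrics can be computed as below. MAE and RMSE follow their standard definitions. For DS, the sketch assumes the definition common in the oil price forecasting literature, namely the percentage of periods t ≥ 2 in which (y_t − y_{t−1})(ŷ_t − y_{t−1}) ≥ 0, because the paper’s own formula is not restated in this appendix. The function names are illustrative.

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error, in the units of the price series."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def directional_statistic(y, y_hat):
    """Percentage of steps where the forecast moves the same way as the actual price.

    Assumed definition: both moves are measured from the previous actual price
    y[t-1], and a product >= 0 counts as a correct direction.
    """
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    actual_move = y[1:] - y[:-1]
    predicted_move = y_hat[1:] - y[:-1]
    hits = actual_move * predicted_move >= 0
    return float(100.0 * hits.mean())
```

Lower MAE and RMSE (both in price units) and a higher DS (a percentage) indicate a better model, which is the basis of the comparison above.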

References

[1]

Afkhami M, Cormack L, Ghoddusi H, (2017). Google search keywords that best predict energy price volatility. Energy Economics, 67: 17–27

[2]

Apley D W, Zhu J, (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 82(4): 1059–1086

[3]

Aras S, Hanifi Van M, (2022). An interpretable forecasting framework for energy consumption and CO2 emissions. Applied Energy, 328: 120163

[4]

Azevedo V G, Campos L M S, (2016). Combination of forecasts for the price of crude oil on the spot market. International Journal of Production Research, 54(17): 5219–5235

[5]

Baumeister C, Kilian L, (2014). Real-time analysis of oil price risks using forecast scenarios. IMF Economic Review, 62(1): 119–145

[6]

Bouri E, Iqbal N, Klein T, (2022). Climate policy uncertainty and the price dynamics of green and brown energy stocks. Finance Research Letters, 47: 102740

[7]

Caldara D, Iacoviello M, (2022). Measuring geopolitical risk. American Economic Review, 112(4): 1194–1225

[8]

Cao Y, Zha D, Wang Q, Wen L, (2023). Probabilistic carbon price prediction with quantile temporal convolutional network considering uncertain factors. Journal of Environmental Management, 342: 118137

[9]

Chaitanya Lahari M, Ravi D H, Bharathi R, (2018). Fuel price prediction using RNN. In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI): 1510–1514

[10]

Cortes C, Vapnik V, (1995). Support-vector networks. Machine Learning, 20(3): 273–297

[11]

Cruz A M, Krausmann E, (2013). Vulnerability of the oil and gas sector to climate change and extreme weather events. Climatic Change, 121(1): 41–53

[12]

Dai Z, Zhu J, Zhang X, (2022a). Time-frequency connectedness and cross-quantile dependence between crude oil, Chinese commodity market, stock market and investor sentiment. Energy Economics, 114: 106226

[13]

Dai X, Xiao L, Li M C, Wang Q, (2022b). Toward energy finance market transition: Does China’s oil futures shake up global spots market. Frontiers of Engineering Management, 9(3): 409–424

[14]

Dées S, Karadeloglou P, Kaufmann R K, Sanchez M, (2007). Modelling the world oil market: Assessment of a quarterly econometric model. Energy Policy, 35(1): 178–191

[15]

Gao S, Lei Y, (2017). A new approach for crude oil price prediction based on stream learning. Geoscience Frontiers, 8(1): 183–187

[16]

Goodell J W, Ben Jabeur S, Saâdaoui F, Nasir M A, (2023). Explainable artificial intelligence modeling to forecast bitcoin prices. International Review of Financial Analysis, 88: 102702

[17]

Granger C W, (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3): 424–438

[18]

Güleryüz D, Özden E, (2020). The prediction of Brent crude oil trend using LSTM and Facebook prophet. European Journal of Science and Technology, 1(20): 1–9

[19]

Guo K, Liu F, Sun X, Zhang D, Ji Q, (2023a). Predicting natural gas futures’ volatility using climate risks. Finance Research Letters, 55: 103915

[20]

Guo L, Huang X, Li Y, Li H, (2023b). Forecasting crude oil futures price using machine learning methods: Evidence from China. Energy Economics, 127: 107089

[21]

Hamilton J D, (2009). Causes and consequences of the oil shock of 2007–08. Brookings Papers on Economic Activity, 40(1): 215–283

[22]

He X J, (2018). Crude oil prices forecasting: time series vs. SVR models. Journal of International Technology and Information Management, 27(2): 25–42

[23]

Izaguirre C, Losada I J, Camus P, Vigh J L, Stenek V, (2021). Climate change risk to global port operations. Nature Climate Change, 11(1): 14–20

[24]

Jiang Z, Zhang L, Zhang L L, Wen B, (2022). Investor sentiment and machine learning: Predicting the price of China’s crude oil futures market. Energy, 247: 123471

[25]

Katopodis T, Sfetsos A, (2019). A review of climate change impacts to oil sector critical services and suggested recommendations for industry uptake. Infrastructures, 4(4): 74

[26]

Öztunç Kaymak Ö, Kaymak Y, (2022). Prediction of crude oil prices in COVID-19 outbreak using real data. Chaos, Solitons, and Fractals, 158: 111990

[27]

Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T Y, (2017). LightGBM: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems 30 (NIPS 2017): 3149–3157

[28]

Kweku D, Bismark O, Maxwell A, Desmond K, Danso K, Oti-Mensah E, Quachie A, Adormaa B, (2018). Greenhouse effect: greenhouse gases and their impact on global warming. Journal of Scientific Research and Reports, 17(6): 1–9

[29]

Li L, Wu S, Chen Z, (2023). Extreme weather risks and macroeconomic fluctuations: a dual perspective based on network connectedness and spatial spillovers. Journal of Financial Research, 519(09): 58–75

[30]

Ma Y R, Liu Z, Ma D, Zhai P, Guo K, Zhang D, Ji Q, (2023). A news-based climate policy uncertainty index for China. Scientific Data, 10(1): 881

[31]

Naser H, (2016). Estimating and forecasting the real prices of crude oil: A data rich model using a dynamic model averaging (DMA) approach. Energy Economics, 56: 75–87

[32]

Qui M, Qui L H, Umar M, Su C W, Jiao W, (2023). The inevitable role of El Niño: a fresh insight into the oil market. Economic Research-Ekonomska Istraživanja, 33(1): 1943–1962

[33]

Ribeiro M T, Singh S, Guestrin C, (2016). “Why Should I Trust You?”: explaining the predictions of any classifier. In: KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 1135–1144

[34]

Shen D, Zhang Y, Xiong X, Zhang W, (2017). Baidu index and predictability of Chinese stock returns. Financial Innovation, 3(1): 4

[35]

Shi X, Wang K, Cheong T S, Zhang H, (2020). Prioritizing driving factors of household carbon emissions: An application of the LASSO model with survey data. Energy Economics, 92: 104942

[36]

Sun C, Zhan Y, Peng Y, Cai W, (2022). Crude oil price and exchange rate: Evidence from the period before and after the launch of China’s crude oil futures. Energy Economics, 105: 105707

[37]

Tian G, Peng Y, Meng Y, (2023). Forecasting crude oil prices in the COVID-19 era: Can machine learn better. Energy Economics, 125: 106788

[38]

Tumala M M, Salisu A, Nmadu Y B, (2023). Climate change and fossil fuel prices: A GARCH-MIDAS analysis. Energy Economics, 124: 106792

[39]

van Ruijven B J, De Cian E, Sue Wing I, (2019). Amplification of future energy demand growth due to climate change. Nature Communications, 10(1): 2762

[40]

Wang B, Sun W, Jin C, Luo X, Yang Y M, Li T, Xiang B, McPhaden M J, Cane M A, Jin F, Liu F, Liu J, (2023). Understanding the recent increase in multiyear La Niñas. Nature Climate Change, 13(10): 1075–1081

[41]

Wang J, Zhou H, Hong T, Li X, Wang S, (2020). A multi-granularity heterogeneous combination approach to crude oil price forecasting. Energy Economics, 91: 104790

[42]

Wei Y, Zhang Y, Wang Y, (2022). Information connectedness of international crude oil futures: Evidence from SC, WTI, and Brent. International Review of Financial Analysis, 81: 102100

[43]

Wen J, Zhao X X, Chang C P, (2021). The impact of extreme events on energy price risk. Energy Economics, 99: 105308

[44]

Wen L, Cao Y, (2020). Influencing factors analysis and forecasting of residential energy-related CO2 emissions utilizing optimized support vector machine. Journal of Cleaner Production, 250: 119492

[45]

Xiang Y, Zhuang X H, (2013). Application of ARIMA model in short-term prediction of international crude oil price. Advanced Materials Research, 798–799: 979–982

[46]

Xu Y, Duong D, Xu H, (2023). Attention! Predicting crude oil prices from the perspective of extreme weather. Finance Research Letters, 57: 104190

[47]

Yalew S G, van Vliet M T H, Gernaat D E H J, Ludwig F, Miara A, Park C, Byers E, De Cian E, Piontek F, Iyer G, Mouratiadou I, Glynn J, Hejazi M, Dessens O, Rochedo P, Pietzcker R, Schaeffer R, Fujimori S, Dasgupta S, Mima S, da Silva S R S, Chaturvedi V, Vautard R, van Vuuren D P, (2020). Impacts of climate change on energy systems in global and regional scenarios. Nature Energy, 5(10): 794–802

[48]

Yorucu V, (2003). The analysis of forecasting performance by using time series data for two Mediterranean islands. Review of Social, Economic & Business Studies, 2: 175–196

[49]

Zhang J, Zhang Y, Wei Y, Wang Z, (2024). Normal and extreme impact and connectedness between fossil energy futures markets and uncertainties: Does El Niño-Southern Oscillation matter. International Review of Economics & Finance, 89: 188–215

[50]

Zhang M, Li W, Zhang L, Jin H, Mu Y, Wang L, (2023). A Pearson correlation-based adaptive variable grouping method for large-scale multi-objective optimization. Information Sciences, 639: 118737

[51]

Zhang Y, Ma F, Shi B, Huang D, (2018). Forecasting the prices of crude oil: An iterated combination approach. Energy Economics, 70: 472–483

[52]

Zhang Y, Wang Y, (2022). Crude oil price forecasting: A 30-year literature review and future directions. Journal of Systems & Management, 31(6): 1169

[53]

Zhao C, Wang B, (2014). Forecasting Crude Oil Price with an Autoregressive Integrated Moving Average (ARIMA) Model. In: Cao B Y, Nasseri H, eds. Fuzzy Information & Engineering and Operations Research & Management. Advances in Intelligent Systems and Computing. Berlin, Heidelberg: Springer, 211: 275–286

[54]

Zhao Y, Li J, Yu L, (2017). A deep learning ensemble approach for crude oil price forecasting. Energy Economics, 66: 9–16

[55]

Zhu H, Huang X, Ye F, Li S, (2024). Frequency spillover effects and cross-quantile dependence between crude oil and stock markets: Evidence from BRICS and G7 countries. North American Journal of Economics and Finance, 70: 102062

RIGHTS & PERMISSIONS

Higher Education Press
