1. State Key Laboratory of Operation and Control of Renewable Energy & Storage Systems, China Electric Power Research Institute, Beijing 100192, China; Department of Electrical Engineering, Tsinghua University, Beijing 100084, China
2. State Key Laboratory of Operation and Control of Renewable Energy & Storage Systems, China Electric Power Research Institute, Beijing 100192, China
wangzhaohsgd@126.com
History
Received: 2017-02-26
Accepted: 2017-04-06
Published: 2017-06-01
Issue Date
Revised Date
2017-04-14
Abstract
Unlike traditional fossil energy, wind is a clean, renewable energy source that can reduce greenhouse gas emissions. To take full advantage of the environmental benefits of wind energy, wind power forecasting has to be studied to overcome the difficulties caused by the variable nature of wind. Power forecasting for regional wind farm groups is a problem of concern to many power system operators, and high-dimensional feature sets with redundant information are frequently encountered when dealing with it. In this paper, two feature set construction methods are proposed, which obtain a proper feature set either by selecting subsets of the original variables or by transforming the original variables through specific combinations. The former method selects the subset according to the criterion of minimal-redundancy-maximal-relevance (mRMR), while the latter is based on principal component analysis (PCA). A locally weighted learning method is also proposed, which uses the processed feature set to produce the power forecast. The proposed model is simple and easy to use, with parameters optimized automatically. Finally, a case study of 28 wind farms in East China is provided to verify the effectiveness of the proposed method.
Zhao WANG, Weisheng WANG, Bo WANG.
Regional wind power forecasting model with NWP grid data optimized.
Front. Energy, 2017, 11(2): 175-183 DOI:10.1007/s11708-017-0471-9
1 Introduction
Climate change mitigation has become a hot topic around the world. Since the Paris climate summit was held in December 2015, many countries have been making great efforts to reduce greenhouse gas (GHG) emissions. An energy target set by the Chinese government is to increase the share of non-fossil energy in total primary energy consumption to 20% by 2030 [1]. Wind energy, as a fast-developing renewable energy source, has the potential to reduce GHG emissions. However, the variable and uncertain nature of wind energy brings many problems to power systems with a high penetration of wind power. Wind power forecasting (WPF), as an important tool, has been extensively studied to address these problems in applications such as energy trading [2–4], unit commitment (UC), economic dispatch (ED) [5,6], determination of the operating reserve requirements [7,8], and energy storage optimization [9,10].
The capacity of wind power generation has increased dramatically in recent years. According to the report [11] provided by the Global Wind Energy Council (GWEC), the global total for wind power at the end of 2015 was already 432.9 GW. Under this circumstance, many wind farms have been built in areas with rich wind energy resources. In a region of concern to transmission system operators (TSO) and regional energy traders, the number of wind farms is large, and the total wind power output can directly affect the decision-making of power system operation and electricity market management. Thus, the forecast of the aggregated wind power output for regional wind farm groups is of great importance. Due to the spatial correlation among the outputs of regional wind farms, the regional situation is different from that of a single wind farm. As revealed in Ref. [12], the regional forecast turns out to be more precise because of spatial smoothing effects. A simple method for achieving this aggregated forecast is to directly add up the forecasts of all the wind farms. However, a point forecast system then has to be established for each wind farm in the region considered, which can be too costly for some forecast service providers. Moreover, as described in Ref. [13], online information is not always available for all the wind farms concerned, especially when the number of wind farms is large.
A few researchers have studied regional WPF in recent years. An aggregate prediction method is proposed in Ref. [14] to search for the samples with similar weather conditions, and the proposed methodology is compared with the predictions obtained with the WPF tool used by the Spanish system operator. However, it does not address the construction of better feature sets when the weather data are redundant and high-dimensional. In Ref. [15], the k-means clustering algorithm and the mutual information-based feature selection (MIFS) algorithm are studied to determine the best feature set for the forecast model. A principal component analysis (PCA) method is used in Ref. [16] to reduce the dimension of datasets when forecasting the regional wind power and solar irradiance. For the probabilistic forecasting of regional wind farms, a recursively backtracking framework based on the particle filter is provided in Ref. [17].
A complete scheme for regional wind power forecasting is proposed in this paper. Two different methods for constructing the optimized feature set and a locally weighted learning method based on the constructed feature set are introduced. The locally weighted learning method, whose parameters can be estimated automatically, selects the samples with weather conditions similar to those of the target point, and establishes the proper historical power sample set to produce the power forecast. Finally, a case study of 28 wind farms in East China, in which the differences between the two feature set optimizing methods are analyzed, is provided to verify the effectiveness of the proposed model.
2 Methodology
In this paper, the total power output P of the regional wind farms is the target variable. Meteorological variables, such as the wind speed v, at the regional numerical weather prediction (NWP) grid points are the explanatory variables, which are also known as features (or attributes) in pattern recognition and machine learning. The number of NWP grid points is large when the geographical space considered is broad, which can lead to a high dimensionality problem. Since the grid points in the considered region are influenced by similar weather systems, the NWP outputs at these grid points are more or less correlated with each other, which also means that some information in these features is redundant. It is therefore of great importance to obtain the useful information from these correlated input data without such high dimensionality. Feature selection and feature extraction are the answers to this question. Feature selection selects an appropriate subset of the original feature set to be used in the later model construction, while feature extraction constructs new features in a reduced dimensional space by transforming the original features with some combinations.
2.1 Feature selection based on mRMR
The feature set of m regional NWP grid points is represented by $V_m = \{v_i, i = 1, \ldots, m\}$, while $S_n = \{v_i, i = 1, \ldots, n\}$ denotes an n-dimensional subset of $V_m$. Taking 28 NWP grid points as an example, the number of potential 10-dimensional subsets is $\binom{28}{10} = 13123110$. Therefore, it is inefficient to examine the model performance with all these subsets. The criterion of mRMR proposed in Ref. [18] provides a simple way to select a proper feature subset without much computational cost. Unlike Ref. [18], in which the dependence is described by the mutual information, the method in this paper is simply based on two well-known nonparametric measures of dependence, namely Spearman's rho and Kendall's tau. Compared with the classic Pearson's correlation coefficient, these two measures have the advantage of describing variable pairs which have nonlinear functional dependence. In the area of wind power forecasting, this kind of dependence is very common. The details of the two measures are introduced in Ref. [19]. For an n-dimensional subset $S_n$, the maximal relevance (MR) criterion can be obtained, in the form of Ref. [18] with Corr in place of the mutual information, by

$$\max D(S_n, P), \quad D = \frac{1}{n} \sum_{v_i \in S_n} \mathrm{Corr}(v_i, P) \tag{1}$$

where Corr is the chosen nonparametric measure. However, it is not sufficient to satisfy the MR criterion alone, because if two highly dependent features are chosen, the total information provided by them to explain the target variable would be similar to that provided by a single feature. Therefore, it is necessary to add the criterion of minimal redundancy (mR), which is given by

$$\min R(S_n), \quad R = \frac{1}{n^2} \sum_{v_i, v_j \in S_n} \mathrm{Corr}(v_i, v_j) \tag{2}$$

With Eqs. (1) and (2), the objective of the mRMR problem can be obtained, in its difference form, as

$$\max \Phi(D, R), \quad \Phi = D - R \tag{3}$$

The incremental search method can be used to find the near-optimal feature set. Suppose the (n–1)-dimensional feature set $S_{n-1}$ has been obtained; the rest of the features are included in the set $\{V_m - S_{n-1}\} = \{v_k, k = 1, \ldots, m-n+1\}$. The incremental search algorithm chooses the feature in the set $V_m - S_{n-1}$ which satisfies the objective function:

$$\max_{v_k \in V_m - S_{n-1}} \left[ \mathrm{Corr}(v_k, P) - \frac{1}{n-1} \sum_{v_i \in S_{n-1}} \mathrm{Corr}(v_k, v_i) \right] \tag{4}$$
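The incremental search can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function names `rank`, `spearman` and `mrmr_order` are hypothetical, the difference form of the mRMR score is assumed, Spearman's rho is computed without tie correction, and the data are synthetic (two nearly identical features plus one independent feature).

```python
import numpy as np

def rank(x):
    """Ranks 1..N of a 1-D array (no tie averaging; assumes continuous data)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return np.corrcoef(rank(x), rank(y))[0, 1]

def mrmr_order(V, P, n_select, corr=spearman):
    """Greedy mRMR ordering of the columns of V (N samples x m features).

    Assumption: dependence with the target is positive (as for wind speed
    and power); for signed relationships, absolute values could be used.
    """
    m = V.shape[1]
    relevance = np.array([corr(V[:, k], P) for k in range(m)])
    selected = [int(np.argmax(relevance))]  # first pick: maximal relevance
    while len(selected) < n_select:
        best_k, best_score = None, -np.inf
        for k in range(m):
            if k in selected:
                continue
            # mean dependence between the candidate and the chosen features
            redundancy = np.mean([corr(V[:, k], V[:, i]) for i in selected])
            score = relevance[k] - redundancy  # relevance minus redundancy
            if score > best_score:
                best_k, best_score = k, score
        selected.append(best_k)
    return selected

# Synthetic demo: columns 0 and 1 are near-duplicates, column 2 is independent.
rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
V = np.column_stack([a, a + 0.01 * rng.normal(size=500), b])
P = a + 0.8 * b
order = mrmr_order(V, P, n_select=3)
print(order)
```

A pure max-dependence ranking would place the two near-duplicate columns first, whereas the redundancy term defers the duplicate in favor of the independent column.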
2.2 Feature extraction with PCA
Feature extraction is related to dimensionality reduction, and PCA is an efficient method to accomplish this work. The main idea of PCA is to reduce the dimensionality of a data set consisting of many interrelated variables, while retaining as much as possible of the relevant information present in the data set. The relevant information is represented by the variance of the principal components. The principal components (PC) are obtained as linear combinations of the original feature variables and are ordered by their variance. For example, the first component summarizes the maximal information over the data set, while the second one captures as much of the remaining information as possible under the constraint of being uncorrelated with the other components. Finally, the dimensionality of the original data set is reduced by choosing the first several orthogonal components to establish the new data set.
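For convenience of description, the forecasted wind speed for the m grid points across the region at time i is represented by $\mathbf{v}_i = (v_i^1, v_i^2, \ldots, v_i^m)^{\mathrm{T}}$, in which the superscript m corresponds to the number of features in the set $V_m$. Given the input data with N samples, the above-mentioned feature set $V_m$ can be denoted by $\{\mathbf{v}_i, i = 1, \ldots, N\}$, which represents the set of vectors $\mathbf{v}_i$ at different time points. In this paper, PCA is done by the eigenvalue decomposition of the data covariance matrix. The sample covariance matrix S of $V_m$ (written here with the unbiased normalization) is obtained by

$$S = \frac{1}{N-1} \sum_{i=1}^{N} (\mathbf{v}_i - \bar{\mathbf{v}})(\mathbf{v}_i - \bar{\mathbf{v}})^{\mathrm{T}} \tag{5}$$

where

$$\bar{\mathbf{v}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{v}_i \tag{6}$$

The eigenvalues of S are $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_m$ and the corresponding eigenvectors are $\mathbf{u}_1, \ldots, \mathbf{u}_m$. With the first l (l ≤ m) eigenvalues selected, the component vector for $\mathbf{v}_i$ can be obtained by

$$\mathbf{x}_i = [\mathbf{u}_1, \ldots, \mathbf{u}_l]^{\mathrm{T}} (\mathbf{v}_i - \bar{\mathbf{v}}) \tag{7}$$

The percentage of the original variance explained by the PCA set $X_l = \{\mathbf{x}_i, i = 1, \ldots, N\}$ is C%, which is shown in Eq. (8).

$$C\% = \frac{\sum_{j=1}^{l} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \times 100\% \tag{8}$$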
Three methods for determining the dimension l of the PCA algorithm are described in Ref. [20]:
(1) Set the value of C% first and then choose the dimension l.
(2) Select the principal components with eigenvalues larger than one.
(3) Determine the dimension l with the help of a scree plot.
Since the eigenvalues of most components are very small, the second method is sufficient to get all the information needed. Therefore, the second method is adopted in this paper.
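The eigendecomposition route can be illustrated with NumPy. The sketch below is an assumption-laden example, not the authors' code: the helper names `pca_eig`, `explained_percent` and `project` are hypothetical, the covariance uses the N−1 normalization of `numpy.cov`, the data are centered before projection, and the wind speeds are synthetic (one common regional factor plus local noise).

```python
import numpy as np

def pca_eig(V):
    """PCA of V (N samples x m features) via the covariance eigendecomposition.

    Returns the sample mean, the eigenvalues in descending order and the
    matching eigenvectors as columns.
    """
    v_bar = V.mean(axis=0)
    S = np.cov(V, rowvar=False)          # m x m, normalized by N - 1
    lam, U = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    idx = np.argsort(lam)[::-1]          # reorder to descending
    return v_bar, lam[idx], U[:, idx]

def explained_percent(lam, l):
    """Percentage of total variance carried by the first l components."""
    return 100.0 * lam[:l].sum() / lam.sum()

def project(V, v_bar, U, l):
    """Component vectors: centered data projected on the first l eigenvectors."""
    return (V - v_bar) @ U[:, :l]

# Synthetic regional wind speeds: one shared weather factor plus local noise.
rng = np.random.default_rng(0)
N, m = 400, 6
common = rng.normal(8.0, 3.0, size=N)
V = common[:, None] * np.linspace(0.8, 1.2, m) + rng.normal(0.0, 0.5, size=(N, m))

v_bar, lam, U = pca_eig(V)
l_rule = int((lam > 1.0).sum())          # method (2): eigenvalues larger than one
X2 = project(V, v_bar, U, 2)             # first two components, for illustration
print(l_rule, explained_percent(lam, 1))
```

Note that the eigenvalue-larger-than-one rule depends on the scale of the data; whether the inputs are standardized beforehand is not stated in the text, so the raw covariance is used here.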
2.3 Locally weighted learning method for regional wind power forecasts
With the feature set obtained either by mRMR or by the PCA method mentioned previously, a local model based on the weighted average is used to predict the total power output. This memory-based learning method searches for the relevant data in its stored historical dataset to answer a particular query [14]. A detailed survey of this kind of model can be found in Ref. [21]. The relevant data are selected according to the similarities (defined by a distance function d) between the NWP wind speed vectors $\mathbf{v}_i$ and $\mathbf{v}_{t+h}$, where $\mathbf{v}_i$ is a vector in the historical feature set, while $\mathbf{v}_{t+h}$ is the vector for the target time t+h. The distance function for $\mathbf{v}_i$ and $\mathbf{v}_{t+h}$, written here as a weighted Euclidean distance, is given by

$$d(\mathbf{v}_i, \mathbf{v}_{t+h}) = \sqrt{\sum_{j=1}^{m} w_j \left( v_i^j - v_{t+h}^j \right)^2} \tag{9}$$

where $w_1, \ldots, w_m$ are the weights corresponding to the dimensions of the feature set. In the case of mRMR, the values of $w_1, \ldots, w_m$ are equal to 1, which leads to the Euclidean distance, while in the case of PCA, the corresponding eigenvalues are assigned as the weights, which reflects the significance of the different features in the PCA feature set $X_l$.
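Given that $P_i$ is the stored total power output corresponding to the vector $\mathbf{v}_i$, the total power forecast can be obtained from the K nearest neighbors as shown in Eq. (10).

$$\hat{P}_{t+h} = \frac{\sum_{i \in \mathcal{N}_K} w\left(d(\mathbf{v}_i, \mathbf{v}_{t+h})\right) P_i}{\sum_{i \in \mathcal{N}_K} w\left(d(\mathbf{v}_i, \mathbf{v}_{t+h})\right)} \tag{10}$$

where the set of the K nearest neighbors of $\mathbf{v}_{t+h}$, denoted here by $\mathcal{N}_K$, is obtained through a simple k-Nearest Neighbors (k-NN) algorithm [22]. The weighting function w in Eq. (10) should attain its maximum value at zero distance, and its value should decay smoothly as the distance increases. In this paper, a Gaussian kernel as expressed in Eq. (11) is adopted.

$$w(d) = \exp\left( -\frac{d^2}{\mu^2} \right) \tag{11}$$

where d is the distance, and μ is the parameter that determines the locality of the model (any constant factor in the exponent can be absorbed into μ). It can be seen from Fig. 1 that a larger μ causes a slower decay of the weight.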
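A compact sketch of the kernel-weighted nearest-neighbor forecast is given below. It assumes the weighted Euclidean distance and the Gaussian kernel exp(−(d/μ)²); the function names (`weighted_distance`, `gaussian_kernel`, `lwl_forecast`) and the four-sample toy history are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def weighted_distance(V_hist, v_query, w):
    """Weighted Euclidean distance from every historical vector to the query."""
    diff = V_hist - v_query
    return np.sqrt((w * diff ** 2).sum(axis=1))

def gaussian_kernel(d, mu):
    """Weight 1 at zero distance, decaying smoothly; mu sets the locality."""
    return np.exp(-(d / mu) ** 2)

def lwl_forecast(V_hist, P_hist, v_query, K, mu, w=None):
    """Kernel-weighted average of the power at the K nearest neighbors."""
    if w is None:
        w = np.ones(V_hist.shape[1])     # unit weights: plain Euclidean distance
    d = weighted_distance(V_hist, v_query, w)
    nn = np.argsort(d)[:K]               # indices of the K nearest samples
    k = gaussian_kernel(d[nn], mu)
    if k.sum() == 0.0:                   # all weights underflowed to zero
        return float(P_hist[nn].mean())
    return float((k * P_hist[nn]).sum() / k.sum())

# Toy history: one feature, power proportional to wind speed.
V_hist = np.array([[0.0], [1.0], [2.0], [3.0]])
P_hist = np.array([0.0, 10.0, 20.0, 30.0])
query = np.array([0.0])

f_local = lwl_forecast(V_hist, P_hist, query, K=3, mu=1e-3)   # very local
f_global = lwl_forecast(V_hist, P_hist, query, K=3, mu=1e9)   # near-equal weights
print(f_local, f_global)
```

With a very small μ only the exact match contributes, while a very large μ reduces the forecast to the plain average of the K neighbors, which mirrors the limiting behavior described in the text.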
When μ tends to infinity, all the weights tend to 1. To obtain the best parameter μ, the optimization problem (12) is solved by the simplex algorithm [23]. A k-fold cross validation can be used to partition the training dataset for the optimization.

$$\mu^{*} = \arg\min_{\mu} F(\mu) \tag{12}$$

where F is the objective function, which is the sum of squared errors as Eq. (13) shows, and arg min means the argument of the minimum.

$$F(\mu) = \sum_{i} \left( P_i - \hat{P}_i(\mu) \right)^2 \tag{13}$$
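The parameter search can be sketched as follows. To keep the example dependency-free, a coarse grid search stands in for the simplex (Nelder-Mead) algorithm used in the paper; the function names, the grid, the choice K = 30, the contiguous five-fold split and the synthetic power curve are all assumptions made for illustration.

```python
import numpy as np

def lwl_forecast(V_hist, P_hist, v_query, K, mu):
    """Gaussian-kernel weighted average over the K nearest neighbors."""
    d = np.sqrt(((V_hist - v_query) ** 2).sum(axis=1))
    nn = np.argsort(d)[:K]
    k = np.exp(-(d[nn] / mu) ** 2)
    if k.sum() == 0.0:                   # guard against underflow of all weights
        return P_hist[nn].mean()
    return (k * P_hist[nn]).sum() / k.sum()

def cv_sse(mu, V, P, K, n_folds=5):
    """Sum of squared errors over contiguous validation folds."""
    all_idx = np.arange(len(P))
    sse = 0.0
    for val in np.array_split(all_idx, n_folds):
        train = np.setdiff1d(all_idx, val)
        for i in val:
            p_hat = lwl_forecast(V[train], P[train], V[i], K, mu)
            sse += (P[i] - p_hat) ** 2
    return sse

def best_mu(V, P, K, grid):
    """Grid search for the mu that minimizes the cross-validated SSE."""
    scores = [cv_sse(mu, V, P, K) for mu in grid]
    return grid[int(np.argmin(scores))], scores

# Synthetic data: a cubic-like power curve with noise (normalized power).
rng = np.random.default_rng(1)
V = rng.uniform(3.0, 15.0, size=(200, 1))
P = np.clip((V[:, 0] / 12.0) ** 3, 0.0, 1.0) + 0.05 * rng.normal(size=200)

grid = np.logspace(-1, 1, 9)
mu_star, scores = best_mu(V, P, K=30, grid=grid)
print(mu_star)
```

In practice the grid minimum could serve as the starting point for a simplex search, which refines μ without requiring derivatives of the objective.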
Finally, combining the feature set construction method with the forecast model, a flowchart summarizing the whole process is illustrated in Fig. 2.
There are three major parts in the flowchart, which are constructing the feature set (corresponding to Subsections 2.1 and 2.2), predicting the total output power with the processed feature set (Subsection 2.3), and evaluating the forecasting results (Subsection 2.4). The input data needed by the proposed model are also listed in the flowchart, which include the chosen NWP grid data and the total wind power output of the considered region.
2.4 Point forecast evaluation
Bias, mean absolute error (MAE), root mean square error (RMSE) and Pearson’s correlation coefficient are used to evaluate the performance of point forecast.
Bias is obtained by calculating the mean of the differences between the prediction $\hat{P}_i$ and the observation $P_i$, as shown in Eq. (14), where N denotes the number of samples in the test set. The bias evaluates the systematic error. Zero is the perfect value for this measure. A positive (negative) value of bias indicates underestimation (overestimation) of the observation.

$$\mathrm{Bias} = \frac{1}{N} \sum_{i=1}^{N} \left( P_i - \hat{P}_i \right) \tag{14}$$

MAE is the average of the absolute errors over the test set, whose formula is given by Eq. (15). Lower MAE values indicate higher forecast accuracy.

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| P_i - \hat{P}_i \right| \tag{15}$$

RMSE is the square root of the average of the squared errors over the test set, whose formula is given by Eq. (16). Similar to MAE, lower RMSE values indicate better accuracy.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( P_i - \hat{P}_i \right)^2} \tag{16}$$

Pearson's correlation coefficient is used to assess the linearity between the predictions and observations.
In this paper, bias, MAE and RMSE are normalized by the total capacity of wind farms.
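The four measures are straightforward to compute. The sketch below assumes the sign convention stated above (observation minus prediction, so a positive bias means underestimation) and normalization by capacity; the function name `point_forecast_metrics` and the three-sample data are hypothetical.

```python
import numpy as np

def point_forecast_metrics(p_obs, p_hat, capacity):
    """Capacity-normalized bias, MAE and RMSE, plus Pearson's correlation."""
    p_obs = np.asarray(p_obs, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    err = p_obs - p_hat                  # positive error: forecast too low
    return {
        "bias": err.mean() / capacity,
        "mae": np.abs(err).mean() / capacity,
        "rmse": np.sqrt((err ** 2).mean()) / capacity,
        "corr": np.corrcoef(p_obs, p_hat)[0, 1],
    }

# Toy example in MW with an assumed installed capacity of 1000 MW.
metrics = point_forecast_metrics([100, 200, 300], [90, 210, 280], capacity=1000.0)
print(metrics)
```

Because RMSE squares the errors before averaging, it is never smaller than MAE and penalizes large deviations more strongly.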
3 Case study
In this paper, 28 wind farms in East China with a total capacity of 2123.6 MW are selected to verify the effectiveness of the proposed methods. The locations of these wind farms are depicted in Fig. 3. The wind power forecast results for each wind farm and the corresponding NWP data from May 5, 2014 to March 24, 2016 are provided by China Electric Power Research Institute (CEPRI), while the observations of the wind power outputs are collected from the supervisory control and data acquisition (SCADA) system. After invalid data are deleted, about 9620 observations with hourly temporal resolution are left. The initial training set includes 5060 observations, while the testing set includes 4560. The WRF (Weather Research and Forecasting)-based mesoscale NWP provided by CEPRI obtains its initial and boundary conditions from the Global Forecast System (GFS), the global environment multi-scale model (GEM) and the Japan spectrum model (JSM) backgrounds, with the technologies of real-time four-dimensional data assimilation (RTFDDA) and rapid update cycle (RUC). The spatial resolution of the NWP grid points is 9 km × 9 km. The predicted wind speed at 100 m above ground level (AGL) is the main meteorological variable applied to the wind power forecast model. The NWP data used in this paper are updated every 24 h, so the proposed day-ahead wind power forecast with a 24-hour forecast horizon is generated each time new NWP data become available.
The subsets of the feature set selected based on the criteria of mRMR and max-dependence are provided in Tables 1 and 2. All the wind farms are ranked from 1 to 28 according to each of these two criteria. With the max-dependence method, the features are sorted by their dependence on the target variable, and the first few features with high dependence values are then selected to form the feature subset. The max-dependence method only searches for highly dependent variables and therefore fails to reduce the redundant information among the variables. By selecting the first 10 wind farms in each subset, the locations of the 10 wind farms can be marked, as shown in Fig. 4. It is clearly observed that the wind farms selected by mRMR have larger average distances from each other. In practice, wind farms close to each other tend to carry much redundant information due to the adjacent weather system and the influence of similar topography. By selecting the proper feature subset, the redundant information is reduced, and the forecast service providers can purchase fewer grid points of NWP data, which saves costs while improving forecast performance.
As a dimension reduction technology, PCA is used to establish a new feature set with fewer features which can represent most of the variance in the original set. In the scree plot of Fig. 5, the cumulative percentage of total variance explained by the first four principal components is as large as 95%. The number of components chosen as the new feature set can be determined according to the method of selecting the components with eigenvalues higher than 1.
The forecast performance (evaluated by MAE and RMSE) with different numbers of features is demonstrated in Fig. 6. Four methods are compared: the feature selection methods based on mRMR with two different dependence measures (Kendall's tau and Spearman's rho), the feature selection method based on max-dependence, and the feature extraction method with PCA. No significant difference can be seen between the mRMR methods based on the two dependence measures. The method of Kendall's tau achieves its best RMSE with 11 features, while the method of Spearman's rho achieves its best RMSE with 22 features. Therefore, the method of Kendall's tau is preferred for its ability to reduce the feature set dimensionality. For other cases, it is recommended that different measures be tried in order to select the most suitable dependence measure. The forecast errors of these two methods decrease rapidly for the first 5 features, and the fluctuations of RMSE and MAE tend to be small once the dimension is larger than 10. The forecast error of the max-dependence method decreases rather slowly, because this method fails to select the most useful feature with less redundancy each time the dimension of the set increases. Its best performance is achieved only when the data of all 28 wind farms are included, which is not the desired result. As for the method of PCA, no matter how many principal components are finally included, the data of all 28 wind farms have to be considered when performing PCA. Since nearly 80% of the variance can be explained by the first component as shown in Fig. 5, the PCA method achieves the best results for dimensions smaller than 5.
For the case of very high dimensions, PCA proves to be the most effective way to simplify the original data set by extracting several principal components without losing much information.
In Table 3, the best dimensions for the different feature set construction methods are provided according to RMSE. The forecast model of the first four methods is the proposed locally weighted learning model, while the method named "original" is obtained by adding up the power forecast results of each wind farm provided by CEPRI. The PCA method is able to provide good performance with the lowest dimensions. In practice, dimensions lower than 5 are also acceptable, which is very useful when many wind farms are considered and the subset selected by the other methods is still a large one. The MaxDep method is poor at selecting the feature set, and it achieves its best performance only when all wind farms are considered. Since it then uses the data of all wind farms, it can be compared with the original method to demonstrate the advantage of the proposed locally weighted learning model. The reasons for this advantage might be that the total output of the original method is constrained by the accuracy of the forecast of each wind farm, which is only based on its own NWP data, and that simply adding the forecasts up cannot distinguish the contributions of different wind farms to the total power output. The main improvement of the proposed method comes from correcting the high bias of the original method. The time series of the forecast results and measurements over a period of one month are provided in Fig. 7. The improved forecast corresponds to the method of mRMR(t) in Table 3. The forecast results produced by the original method are used as the reference.
4 Conclusions
Wind power forecasting at the regional level can significantly influence the decisions made by TSOs and local energy traders, who can collect the total wind power records conveniently. For this application scenario, the method designed in this paper can map the available explanatory variables, such as the regional NWP data, to the total wind power output directly. Compared with the traditional method of adding up the power forecasting results of each wind farm, it saves the work of establishing a forecast system for each wind farm. The influence among different wind farms can be included when modelling these wind farm groups together.
Since the feature set of this problem is of high dimension (corresponding to the many NWP grid points of the considered region) and some features in it are redundant or irrelevant, two different feature set construction methods are provided. The first one selects the feature subsets based on the criterion of mRMR, which is able to reduce the number of the needed NWP grid points and improve the forecast performance at the same time. With the feature set selected, the cost of producing or purchasing NWP data is reduced, and the forecast model can be simplified with lower dimensions. The second one is based on PCA, which can dramatically reduce the dimensions through different linear combinations of the original variables. For cases in which the feature subset selected by mRMR is still of high dimension, PCA can be a good choice to simplify the feature set.
A case study of 28 wind farms in East China shows that the proposed methods perform well, mainly by correcting the high bias of the original method. The description of the uncertainty in regional forecasts and the use of the forecast results in solving actual power system decision-making problems are left for future work.
References
[1]
Xinhua Net. China-U.S. joint presidential statement on climate change. 2017-1-12
[2]
Pinson P, Chevallier C, Kariniotakis G N. Trading wind generation from short-term probabilistic forecasts of wind power. IEEE Transactions on Power Systems, 2007, 22(3): 1148–1156
[3]
Botterud A, Zhou Z, Wang J, Bessa R J, Keko H, Sumaili J, Miranda V. Wind power trading under uncertainty in LMP markets. IEEE Transactions on Power Systems, 2012, 27(2): 894–903
[4]
González-Aparicio I, Zucker A. Impact of wind power uncertainty forecasting on the market integration of wind energy in Spain. Applied Energy, 2015, 159: 334–349
[5]
Makarov Y V, Etingov P V, Ma J, Huang Z, Subbarao K. Incorporating uncertainty of wind power generation forecast into power system operation, dispatch, and unit commitment procedures. IEEE Transactions on Sustainable Energy, 2011, 2(4): 433–442
[6]
Botterud A, Zhou Z, Wang J, Sumaili J, Keko H, Mendes J, Bessa R J, Miranda V. Demand dispatch and probabilistic wind power forecasting in unit commitment and economic dispatch: A case study of Illinois. IEEE Transactions on Sustainable Energy, 2013, 4(1): 250–261
[7]
Bessa R J, Matos M A, Costa I C, Bremermann L, Franchin I G, Pestana R, Machado N, Waldl H, Wichmann C. Reserve setting and steady-state security assessment using wind power uncertainty forecast: a case study. IEEE Transactions on Sustainable Energy, 2012, 3(4): 827–836
[8]
Menemenlis N, Huneault M, Robitaille A. Computation of dynamic operating balancing reserve for wind power integration for the time-horizon 1–48 hours. IEEE Transactions on Sustainable Energy, 2012, 3(4): 692–702
[9]
Pinson P, Papaefthymiou G, Klöckl B, Verboomen J. Dynamic sizing of energy storage for hedging wind power forecast uncertainty. In: Proceedings of PESGM 2009. Alberta: IEEE, 2009, 1760–1768
[10]
Bludszuweit H, Domínguez-Navarro J A. A probabilistic method for energy storage sizing based on wind power forecast uncertainty. IEEE Transactions on Power Systems, 2011, 26(3): 1651–1658 doi:10.1109/TPWRS.2010.2089541
[11]
Global Wind Energy Council. Global wind report 2015. Brussels, Belgium, 2016, 9–10
[12]
Focken U, Lange M, Mönnich K, Waldl H P, Beyer H G, Luig A. Short-term prediction of the aggregated power output of wind farms—a statistical analysis of the reduction of the prediction error by spatial smoothing effects. Journal of Wind Engineering and Industrial Aerodynamics, 2002, 90(3): 231–246
[13]
Monteiro C, Bessa R, Miranda V, Botterud A, Wang J, Conzelmann G, Porto I. Wind power forecasting: state-of-the-art 2009. Office of Scientific & Technical Information Technical Reports, 2009
[14]
Lobo M G, Sanchez I. Regional wind power forecasting based on smoothing techniques, with application to the Spanish peninsular system. IEEE Transactions on Power Systems, 2012, 27(4): 1990–1997
[15]
Siebert N. Development of methods for regional wind power forecasting. Dissertation for the Doctoral Degree. Paris: ENSMP (École Nationale Supérieure des Mines de Paris), 2008, 127–189
[16]
Davò F, Alessandrini S, Sperati S, Delle Monache L, Airoldi D, Vespucci M T. Post-processing techniques and principal component analysis for regional wind power and solar irradiance forecasting. Solar Energy, 2016, 134: 327–338
[17]
Li P, Guan X, Wu J. Aggregated wind power generation probabilistic forecasting based on particle filter. Energy Conversion and Management, 2015, 96: 579–587
[18]
Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(8): 1226–1238
[19]
Genest C, Favre A. Everything you always wanted to know about copula modeling but were afraid to ask. Journal of Hydrologic Engineering, 2007, 12(4): 347–368
[20]
Jolliffe I T. Principal Component Analysis. Berlin: Springer, 1986, 111–147
[21]
Atkeson C G, Moore A W, Schaal S. Locally weighted learning. In: Aha D W. ed. Lazy Learning. Dordrecht: Springer Netherlands, 1997, 11–73
[22]
Friedman J H, Bentley J L, Finkel R A. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 1977, 3(3): 209–226 doi:10.1145/355744.355745
[23]
Lagarias J C, Reeds J A, Wright M H, Wright P E. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 1998, 9(1): 112–147
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag Berlin Heidelberg