1 Introduction

After annually doubling its total installed wind turbine capacity from 2006 to 2009, the Chinese wind power industry has entered a stable phase of development, reaching a cumulative installed capacity of 145362 MW by the end of 2015 [1]. Wind power forecasting is a basic supporting technique for wind power development, valuable for efficient energy utilization and power flow control of wind farms [2, 3]. Relevant standards and codes have stressed the application of wind power forecasting systems to mitigate the adverse effects of wind fluctuations on the safe and stable operation of the power system [4, 5]. Many forecasting systems have been developed with different operation schemes adapted to field conditions [6, 7]. Over the past few years, wind power forecasting in China has grown from nothing to the point where almost all large wind farms now have at least one forecasting system operating in their daily business routine.

The typical schemes for short-term wind power forecasting (with forecasting horizons of several hours to day-ahead) usually involve a physical and/or statistical approach to wind speed prediction, plus a power curve model that converts the results from wind speed to wind power. Currently, the field forecasting accuracy in China can barely meet users' expectations, which may be caused by limited numerical weather prediction (NWP) precision, data quality issues, etc. Modeling forecasting errors to amend the results is a universal way to remedy the defects of prediction procedures, independent of the specific forecasting method used. Such an approach can easily, effectively and automatically improve forecasting accuracy for commercially operating forecasting systems, which may cover dozens of wind farms and involve composite forecasting models.

Some studies have borrowed from the model output statistics (MOS) procedure of NWP to amend wind power forecasting errors, training a regression model on historical records of forecasting results and actual power output [8, 9]. Most of these studies applied linear models derived from the MOS procedure and originally designed for linear variables. Reference [10] indicates that the nonlinear features of forecasting errors are mainly derived from the typical nonlinear procedure of forecasting systems, i.e., the power curve model, which ideally follows Betz' law and involves the cube of the wind speed. Therefore, the wind power and its forecasting error sequence exhibit nonlinear characteristics after the energy conversion procedure, even though the model input variables, such as wind speed, may be linear. Overall, linear models contradict the nonlinear forecasting error sequence to some extent and are limited in characterizing the relationship between the lagged variables.

References [11, 12] applied back propagation (BP) neural networks and support vector machines (SVM) to learn the relationship between historical forecasting errors and other variables. Such intelligent learning methods use the kernel function as a basis, which extends linear algorithms to nonlinear relationships among variables. In this way, the fast and simple calculation steps of linear methods are retained, and the nonlinear features can be captured by the kernel-based algorithms. The basic idea is to substitute the inner product in the high-dimensional space with the kernel function value of the input vectors, avoiding complex calculations on the high-dimensional vectors, i.e., the kernel trick.
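As a minimal textbook illustration of the kernel trick (using a simple polynomial kernel rather than the Gaussian kernel adopted later in this paper), the kernel value reproduces the inner product of an explicit feature map without ever computing the map:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D homogeneous polynomial kernel
    k(x, z) = (x . z)^2: phi(x) = [x1^2, x2^2, sqrt(2) x1 x2]."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
# Kernel trick: <phi(x), phi(z)> equals (x . z)^2, with no mapping needed
assert np.isclose(phi(x) @ phi(z), (x @ z) ** 2)
```

The same identity is what lets the KRLS algorithm below work entirely with kernel values instead of high-dimensional vectors.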

Owing to the stochastic nature of the process, the forecasting accuracy of the target wind farm changes with the varying situations, resulting in fluctuating and non-stationary forecasting errors. Several publications detect the fluctuation characteristics of historical data to capture future trends of the errors [13, 14]. Reference [13] analyzes the statistics of the fluctuation and amplitude of wind power to establish the error estimation model. Reference [14] studies the numerical characteristics of the "recent forecasting errors" to predict the error of the next moment.

The non-stationary features mean that the mean and variance of the series are time-varying. For online operation, the model should be adjusted with updated samples at each time stamp, so that it reflects the latest changes in wind resource, turbines and other elements. Due to practical constraints, the computational time for updating models should not increase significantly with the growing sample size. Thus, the challenge for kernel-based intelligent methods is how to maintain a reasonable calculation speed as the sample size grows, in order to meet the efficiency requirement of online rolling modelling. Recursive least squares algorithms ensure that the computational complexity introduced by a new sample is independent of how many samples have already been processed, which meets the demand of online applications [15].

The kernel recursive least-squares (KRLS) algorithm combines the kernel method with recursive least squares, and has a remarkable computational time-saving effect based on its sparse solution [16]. KRLS has been widely applied to nonlinear signal processing [17–19], and a few explorations have also been performed for wind power forecasting. Some work discussed how to improve the accuracy of very short-term wind speed forecasting using KRLS [20, 21].

In this paper, a new error-amending approach is proposed to exploit the potential benefit of statistical methods; it consists of several error forecasting models arranged in an iterative way. We introduce the KRLS method to model the forecasting errors, which is adaptive to their non-linear and non-stationary characteristics. The input variables are selected by teleconnection analysis of forecasting errors from neighboring wind farms. This helps capture the fluctuation characteristics in view of the spatial and temporal correlations in forecasting errors, contributing to the error prediction. The presented approach can be part of an aggregated wind power forecasting platform, which breaks down the information barriers among single wind farm systems and enables our approach to perform better in improving forecasting accuracy.

The rest of the paper is structured as follows. Section 2 gives a brief introduction on how to extend the general regression model to the KRLS one. The design of the error amending approach and the illustration of the teleconnection of forecasting errors among neighboring wind farms are presented in Section 3. The performance of the method for different horizons of wind power forecasts is demonstrated in Section 4, based on field data from an aggregated forecasting platform in China. Section 5 gives the conclusion.

2 KRLS method

2.1 General regression form

Generally, for a set of multi-input-single-output (MISO) samples, \([\varvec{x}_{i} ,y_{i} ]\), i = 1, 2, …, N, the regression model f(·) can be written as:

$$y_{i} = f(\varvec{x}_{i} ) + \varepsilon_{i}$$
(1)

where y i is the response variable at time i; \(\varvec{x}_{i} = [x_{i}^{1} ,x_{i}^{2} , \ldots ,x_{i}^{l} ]^{\text{T}}\) is the vector of explanatory variables, and ɛ i is the white noise. For wind power forecasting error modeling, y i is the forecasting error at time i, and \(\varvec{x}_{i}\) is the l-dimensional vector of relevant regression variables, e.g., auto-regression variables \(y_{i - \tau } ,y_{i - \tau - 1} , \ldots ,y_{i - \tau - d}\), with τ as the modifying horizon and d as the time lag, or hybrid regression variables composed of forecasting errors from neighboring wind farms.
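For instance, the auto-regression samples above can be assembled as follows (a minimal sketch; the function name and array conventions are ours, and the error series is assumed to be a 1-D numpy array):

```python
import numpy as np

def lagged_features(err, tau, d):
    """Build auto-regression samples [x_i, y_i] from an error series:
    x_i = [err[i-tau], err[i-tau-1], ..., err[i-tau-d]], y_i = err[i]."""
    X, y = [], []
    for i in range(tau + d, len(err)):
        # Slice covers err[i-tau-d]..err[i-tau]; reverse to newest-lag-first
        X.append(err[i - tau - d : i - tau + 1][::-1])
        y.append(err[i])
    return np.array(X), np.array(y)
```

Each row of `X` is one regression vector x_i, paired with its response y_i.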

The least-squares algorithm is used to find the estimation \(\hat{f}( \cdot )\) of f(·) by minimizing the sum of squared residuals:

$$J = \sum\limits_{i = 1}^{t} {(y_{i} - \hat{f}(\varvec{x}_{i} ))^{2} }$$
(2)

With the multivariate linear regression (MLR) model, f(·) can be expressed as:

$$y_{i} = \varvec{b}^{\text{T}} \varvec{x}_{i} + \varepsilon_{i}$$
(3)

where b is the weight vector, and \(\varvec{b} \in {\mathbf{R}}{}^{l \times 1}\). At time t, the cost function is:

$$J(\varvec{b}) = \sum\limits_{i = 1}^{t} {(y_{i} - \varvec{b}^{\text{T}} \varvec{x}_{i} )^{2} } = \left| {\varvec{Y}_{t} - \varvec{X}_{t} \varvec{b}} \right|^{2}$$
(4)

where \(\varvec{Y}_{t} = [y_{1} ,y_{2} , \ldots ,y_{t} ]^{\text{T}}\) and \(\varvec{X}_{t} = [\varvec{x}_{1} ,\varvec{x}_{2} , \ldots ,\varvec{x}_{t} ]^{\text{T}}\).

2.2 KRLS model and its recursive formulation

The explanatory vector x is mapped into a high-dimensional space by a fixed finite map ϕ. Thus, (3) and (4) are reformulated as:

$$\hat{f}(\varvec{x}) = \left\langle {\varvec{w},\phi (\varvec{x})} \right\rangle$$
(5)
$$J(\varvec{w}) = \sum\limits_{i = 1}^{t} {\left( {y_{i} - \left\langle {\varvec{w},\phi (\varvec{x}_{i} )} \right\rangle } \right)^{2} } = \left| {\varvec{Y}_{t} -\varvec{\varPhi}_{t} \varvec{w}} \right|^{2}$$
(6)

where 〈·, ·〉 denotes the inner product; w is the weight vector with the same dimension as ϕ(x), and the map matrix is \(\varvec{\varPhi}_{t} = [\phi (\varvec{x}_{1} ),\phi (\varvec{x}_{2} ), \ldots ,\phi (\varvec{x}_{t} )]^{\text{T}}\). Our aim is to have:

$$\varvec{w}_{t} = \mathop {\arg }\limits_{w} {\kern 1pt} \hbox{min} \left| {\varvec{Y}_{t} -\varvec{\varPhi}_{t} \varvec{w}} \right|^{2}$$
(7)

which is difficult to calculate directly in the high-dimensional space. Since w t lies in the span of the mapped samples, it can be expressed as:

$$\varvec{w}_{t} = \sum\limits_{i = 1}^{t} {q_{i} \phi (\varvec{x}_{i} ) =\varvec{\varPhi}_{t}^{\text{T}} \varvec{q}}$$
(8)

where \(\varvec{q} = [q_{1} ,q_{2} , \ldots ,q_{t} ]^{\text{T}}\) is the coefficient vector of \(\phi (\varvec{x})\) for \(\varvec{w}_{t}\). Plugging this into (6) and substituting the inner product by kernel function values, we obtain:

$$J(\varvec{q}) = \left| {\varvec{Y}_{t} - \varvec{K}_{t} \varvec{q}} \right|^{2}$$
(9)

where K t is the kernel matrix with dimension t × t, \(\varvec{K}_{t} (i,j) = k(\varvec{x}_{i} ,\varvec{x}_{j} ),\quad i,j = 1,2, \ldots ,t\) and \(k(\varvec{x}_{i} ,\varvec{x}_{j} )\) is the kernel function, which is chosen as the Gaussian kernel in this work:

$$k(\varvec{x}_{i} ,\varvec{x}_{j} ) = \exp \left( { - \left\| {\varvec{x}_{i} - \varvec{x}_{j} } \right\|^{2} /2\eta^{2} } \right)$$
(10)

where η is the bandwidth. The Gaussian kernel is a measure of the similarity between x i and x j , which is in accordance with the analysis of the correlations in forecasting errors. Studies on time series forecasting show that the Gaussian kernel can perform better than other functions, such as the polynomial and triangular kernels, when prior knowledge is lacking [16, 22]. Much research on kernel methods discusses the selection and construction of kernel functions and continues to put forward more complicated kernels [23, 24]. Although this is not the focus of this work, such achievements may be used to enhance the performance of the error-amending model in the future.
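Equation (10) can be evaluated directly; a minimal sketch (the bandwidth value used in the test below is illustrative):

```python
import numpy as np

def gaussian_kernel(xi, xj, eta):
    """Gaussian kernel (10): similarity between input vectors xi and xj,
    with bandwidth eta."""
    diff = np.asarray(xi, float) - np.asarray(xj, float)
    return float(np.exp(-(diff @ diff) / (2.0 * eta ** 2)))
```

By construction the value lies in (0, 1], reaching 1 only when xi = xj, so it behaves as a bounded similarity measure.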

In classic least-squares algorithms, the dimension of K t equals the number of samples. As the sample size increases, the calculation time increases dramatically and overfitting may occur. We employ the sparse method in [16] to solve these problems. Its basic idea is to use a selected sample dictionary \(D_{t} = \{ \tilde{\varvec{x}}_{i} \} ,\quad i = 1,2, \ldots ,m_{t}\), with m t  ≪ t, instead of the whole dataset \(\{ \varvec{x}_{i} \} ,\quad i = 1,2, \ldots ,t\). Accordingly, K t is substituted by the selected kernel matrix \(\tilde{\varvec{K}}_{t}\) with dimension m t  × m t . Thus, the amount of computation needed to solve q t can be reduced greatly. The recursive formulations are listed as follows.

1) Initialization: \(k_{1,1} = k(\varvec{x}_{1} ,\varvec{x}_{1} )\), \(\tilde{\varvec{K}}_{1} = k_{1,1}\), \(\tilde{\varvec{K}}_{1}^{ - 1} = 1/k_{1,1}\), \(\tilde{\varvec{q}}_{1} = y_{1} /k_{1,1}\), the optimal expansion coefficient matrix A 1 = 1, D 1 = {x 1}, and set the sparse parameter υ.

2) At time t, the regression coefficients are updated as follows.

a) Calculate the kernel vector \(\tilde{\varvec{k}}_{t - 1,t}\), \(\tilde{\varvec{k}}_{t - 1,t} \in {\mathbf{R}}{}^{{m_{t - 1} \times 1}}\), by \(\tilde{\varvec{k}}_{t - 1,t} (i) = k(\tilde{\varvec{x}}_{i} ,\varvec{x}_{t} ),\quad {\kern 1pt} i = 1,2, \ldots ,m_{t - 1}\);

b) Define \(\varvec{s}_{t} = \tilde{\varvec{K}}_{t - 1}^{ - 1} \tilde{\varvec{k}}_{t - 1,t}\);

c) Calculate the residual \(\varepsilon_{t} = k_{t,t} - \tilde{\varvec{k}}_{t - 1,t}^{\text{T}} \varvec{s}_{t}\).

If ɛ t  > υ, the new sample x t is considered approximately linearly independent of the elements of D t−1 and should be added to the dictionary. The update steps are:

$$D_{t} = D_{t - 1} \cup \{ \varvec{x}_{t} \}$$
(11)
$$\tilde{\varvec{K}}_{t}^{ - 1} = \frac{1}{{\varepsilon_{t} }}\left[ {\begin{array}{*{20}c} {\varepsilon_{t} \tilde{\varvec{K}}_{t - 1}^{ - 1} + \varvec{s}_{t} \varvec{s}_{t}^{\text{T}} } & { - \varvec{s}_{t} } \\ { - \varvec{s}_{t}^{\text{T}} } & 1 \\ \end{array} } \right]$$
(12)
$$\varvec{A}_{t} = \left[ {\begin{array}{*{20}c} {\varvec{A}_{t - 1} } & 0 \\ {0^{\text{T}} } & 1 \\ \end{array} } \right]$$
(13)
$$\tilde{\varvec{q}}_{t} = \left[ {\begin{array}{*{20}c} {\tilde{\varvec{q}}_{t - 1} - \frac{{\varvec{s}_{t} }}{{\varepsilon_{t} }}(y_{t} - \tilde{\varvec{k}}_{t - 1,t}^{\text{T}} \tilde{\varvec{q}}_{t - 1} )} \\ {\frac{1}{{\varepsilon_{t} }}(y_{t} - \tilde{\varvec{k}}_{t - 1,t}^{\text{T}} \tilde{\varvec{q}}_{t - 1} )} \\ \end{array} } \right]$$
(14)

Otherwise, if ɛ t  ≤ υ, x t can be approximately expressed by D t−1. Therefore D t  = D t−1, and the update steps are:

$$\varvec{A}_{t} = \varvec{A}_{t - 1} - \frac{{\varvec{A}_{t - 1} \varvec{s}_{t} \varvec{s}_{t}^{\text{T}} \varvec{A}_{t - 1} }}{{1 + \varvec{s}_{t}^{\text{T}} \varvec{A}_{t - 1} \varvec{s}_{t} }}$$
(15)
$$\tilde{\varvec{q}}_{t} = \tilde{\varvec{q}}_{t - 1} + \tilde{\varvec{K}}_{t}^{ - 1} \varvec{r}_{t} \left( {y_{t} - \tilde{\varvec{k}}_{t - 1,t}^{\text{T}} \tilde{\varvec{q}}_{t - 1} } \right)$$
(16)

where r t is defined by \(\varvec{r}_{t} = \varvec{A}_{t - 1} \varvec{s}_{t} /\left( {1 + \varvec{s}_{t}^{\text{T}} \varvec{A}_{t - 1} \varvec{s}_{t} } \right)\).

3) Given x i , the corresponding estimation is:

$$\hat{y}_{i} = \tilde{\varvec{k}}_{t,i}^{\text{T}} \tilde{\varvec{q}}_{t}$$
(17)

where \(\tilde{\varvec{k}}_{t,i} (j) = k(\tilde{\varvec{x}}_{j} ,\varvec{x}_{i} ),\quad j = 1,2, \ldots ,m_{t}\).
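The recursion above (initialization, the linear-dependence test of steps a)–c), and the updates (11)–(17)) can be sketched as a small online learner. This follows [16], but the class layout, parameter defaults and variable names are ours, and the Gaussian kernel of (10) is assumed so that k(x, x) = 1:

```python
import numpy as np

class KRLS:
    """Sparse kernel recursive least-squares (Section 2.2, after [16])."""

    def __init__(self, eta=1.6, nu=0.1):
        self.eta, self.nu = eta, nu   # kernel bandwidth, sparse parameter
        self.D = None                 # dictionary D_t (rows are samples)
        self.Kinv = None              # inverse of the selected kernel matrix
        self.A = None                 # expansion coefficient matrix A_t
        self.q = None                 # coefficient vector q_t

    def _k(self, X, x):
        """Gaussian kernel (10) between each dictionary row and x."""
        d = X - x
        return np.exp(-np.sum(d * d, axis=1) / (2.0 * self.eta ** 2))

    def update(self, x, y):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        if self.D is None:                       # 1) initialization, k(x, x) = 1
            self.D = x[None, :]
            self.Kinv = np.array([[1.0]])
            self.A = np.array([[1.0]])
            self.q = np.array([float(y)])
            return
        kvec = self._k(self.D, x)                # a) kernel vector
        s = self.Kinv @ kvec                     # b)
        eps = 1.0 - kvec @ s                     # c) dependence residual
        resid = y - kvec @ self.q                # prediction residual
        if eps > self.nu:                        # grow the dictionary, (11)-(14)
            m = len(self.q)
            Kinv = np.empty((m + 1, m + 1))
            Kinv[:m, :m] = eps * self.Kinv + np.outer(s, s)
            Kinv[:m, m] = Kinv[m, :m] = -s
            Kinv[m, m] = 1.0
            self.Kinv = Kinv / eps               # (12)
            A = np.zeros((m + 1, m + 1))
            A[:m, :m] = self.A
            A[m, m] = 1.0                        # (13)
            self.A = A
            self.q = np.append(self.q - s * resid / eps, resid / eps)  # (14)
            self.D = np.vstack([self.D, x])
        else:                                    # dictionary unchanged, (15)-(16)
            r = self.A @ s / (1.0 + s @ self.A @ s)
            self.A = self.A - np.outer(r, s @ self.A)                  # (15)
            self.q = self.q + self.Kinv @ r * resid                    # (16)

    def predict(self, x):
        """(17): estimate for a new input."""
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return float(self._k(self.D, x) @ self.q)
```

Each `update` call costs O(m t ²) regardless of how many samples have been processed, which is what makes the online rolling modification of Section 3 feasible.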

3 Error amending of wind power forecasts

3.1 Amending approach

The historical samples are assumed to reflect the properties of wind power forecasting errors. The designed approach modifies the original forecasts with an error forecasting model, which is trained and adjusted on the historical dataset. The process can be explained as follows.

Step 1: Partition the historical dataset into two consecutive stages, DB − and DB +. The forecasting error over DB − is:

$$x_{t} = p_{t}^{F} - p_{t} \quad t \in DB^{ - }$$
(18)

where p F t is the forecasting power at time t and p t is the actual power.

Step 2: Train the first order error forecasting model f 1 on {x t , t ∊ DB − } and apply it to modify the forecasts of DB +:

$$p_{t}^{F,1} = p_{t}^{F} - x_{t}^{f_{1}} \quad t \in DB^{ + }$$
(19)

where \(x_{t}^{{f_{1} }}\) is the forecast of x t .

Step 3: Test if either of the terminal conditions is valid: 1) the modified forecasting error \(x_{t,1} = p_{t}^{F,1} - p_{t} ,{\kern 1pt} \quad t \in DB^{ + }\) meets the required precision; 2) the modification order has reached the scheduled number. If so, go to Step 6; otherwise, go to Step 4.

Step 4: Train the second order error forecasting model f 2 for \(x_{t,1} = p_{t}^{F,1} - p_{t} ,\quad t \in DB^{ - }\) and apply it to modify the forecasts of DB +:

$$p_{t}^{F,2} = p_{t}^{F} - x_{t}^{{f_{1} }} - x_{t}^{{f_{2} }} \quad t \in DB^{ + }$$
(20)

where \(x_{t}^{{f_{2} }}\) is the forecast of x t,1.

Step 5: Similar to Step 3, check to decide whether the model order should be increased again as in Step 4 or if the target is attained and we can proceed to Step 6.

Step 6: A set of modification models {f i , i = 1, 2, …, n} with the highest order n has been trained on the historical dataset DB = DB − ∪ DB +.

Hybrid variables composed of forecasting errors from the target wind farm and its neighbors are recommended by the authors for model f 1. For models \(f_{i} ,\;i > 1\), auto-regression is preferred, considering practical problems such as computational efficiency and storage space for online systems.

The above process is mainly used to determine the modeling order, the type of input variables and the value of key parameters for recursive methods. For non-recursive methods, the model trained in DB can be directly applied to the next time domain.

The flowchart of the iterative forecasting error modification approach is summarized in Fig. 1. The final modification for the original forecast at time t is:

$$p_{t}^{F,n} = p_{t}^{F} - \sum\limits_{i = 1}^{n} {x_{t}^{{f_{i} }} }$$
(21)

where n is the highest modeling order of the amending approach.
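A compact sketch of Steps 1–6 follows. For clarity it uses an ordinary least-squares auto-regressive error model as a stand-in for the KRLS model at every order (the real approach uses hybrid KRLS inputs for f 1); the split index and lag-handling conventions are ours:

```python
import numpy as np

def fit_ar(err, tau, d):
    """Least-squares AR error model: predicts err[i] from
    err[i-tau], ..., err[i-tau-d] (a linear stand-in for KRLS)."""
    X = np.column_stack([err[d - j : len(err) - tau - j] for j in range(d + 1)])
    coef, *_ = np.linalg.lstsq(X, err[tau + d:], rcond=None)
    return coef

def iterative_amend(p_fcst, p_act, split, tau=1, d=1, n=2):
    """Steps 1-6 of Section 3.1: train f_i on DB- (t < split), apply to the
    whole record, iterate on the residual error as in (20), and return the
    amended DB+ forecasts per (21)."""
    err = p_fcst - p_act                       # (18)
    total = np.zeros_like(err)
    for _ in range(n):                         # modification orders 1..n
        coef = fit_ar(err[:split], tau, d)     # train f_i on DB- errors
        pred = np.zeros_like(err)
        for t in range(tau + d, len(err)):
            pred[t] = coef @ err[t - tau - d : t - tau + 1][::-1]
        total += pred
        err = err - pred                       # next-order target x_{t,i}
    return (p_fcst - total)[split:]            # (21) on DB+
```

On synthetic data with a persistent (e.g., AR(1)) error component, the amended DB + forecasts should show a lower RMSE than the originals.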

Fig. 1
figure 1

Flowchart of iterative forecasting error modification approach

3.2 Teleconnection in forecasting errors

The spatial and temporal teleconnection in forecasting errors is the physical basis that determines whether the error amending approach can significantly improve forecasting accuracy. Due to the inertia of meteorological phenomena, the current forecasting error of an upwind wind farm may have a strong correlation with that of a downwind wind farm [25]. In meteorological studies, teleconnection means a stable correlation between synoptic processes that are separated spatially and temporally [26, 27]. Considering the similar physical background, we extend this concept to the stable correlation between forecasting errors of different wind farms.

Let {x j,t } be the forecasting error sequence and j be the wind farm number. The teleconnection between forecasting errors of different wind farms is expressed by the cross-correlation function (CCF) as below:

$$\begin{aligned} r_{ij} (d) & = r(x_{i,t} ,x_{j,t - d} ) \\ & = \frac{{E[(x_{i,t} - \mu_{i} )(x_{j,t - d} - \mu_{j} )]}}{{\sigma_{i} \sigma_{j} }} \\ \end{aligned}$$
(22)

where r ij (d) is the CCF value between wind farms i and j with time lag d; μ i and μ j are the means of {x i,t } and {x j,t−d } respectively; σ i and σ j are the corresponding standard deviations. It is of great importance for error-amending models with hybrid input variables to understand the teleconnection characteristics; with them, we can choose proper parameters and regression variables based on the temporal and spatial correlation regularities of error fluctuation.
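The sample estimate of (22) for a non-negative lag d can be computed as below (the alignment convention for t is ours):

```python
import numpy as np

def ccf(xi, xj, d):
    """Sample CCF (22): correlation of x_{i,t} with x_{j,t-d}, lag d >= 0."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    a = xi[d:]                  # x_{i,t} for t = d, ..., N-1
    b = xj[:len(xj) - d]        # x_{j,t-d}, aligned with a
    return float(np.corrcoef(a, b)[0, 1])
```

Sweeping d over, say, 0 to 12 hours for each neighboring farm reproduces the kind of lag-correlation profile shown later in Fig. 4.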

4 Case study

4.1 Data description

The data for the case study have been collected from an aggregated wind power forecasting platform in Jilin, China, which consists of seven wind farms, as shown in Fig. 2. Note that Cap a = 99 MW, Cap b = 49.5 MW, Cap c = 99 MW, Cap d = 57.35 MW, Cap e = 249.9 MW, Cap f = 400.5 MW, Cap g = 49.5 MW, where Cap a to Cap g are the capacities of WFa to WFg respectively. Both short-term forecasts, with forecasting horizons ranging from 1 to 24 hours ahead, and actual power data are collected, and the time window covers May 2014 to October 2015. The sampling interval of the forecasting and actual power data is 1 hour. The preprocessing work has removed data that are apparently erroneous or influenced by wind power curtailment. The wind farm marked as WFa in Fig. 2 is chosen as the target wind farm, whose forecasting errors are to be amended. Figure 3 gives the auto-correlation function (ACF) of forecasting errors of WFa at different time lags. Remarkable auto-correlation is seen at time lags of less than 5 to 6 hours, which means that the forecasting error is very different from white noise.

Fig. 2
figure 2

Positions of sampling wind farms used in case study

Fig. 3
figure 3

Auto-correlation of WFa (blue lines represent 95% confidence interval)

Figure 4 shows the CCF results of WFa and the other wind farms. One can see significant lag correlations between WFa and its neighbors, WFe and WFf, at time lags of less than 10 hours. Therefore, according to the teleconnection analysis, forecasting errors from WFe and WFf are selected as the potential input variables to improve the forecasts of WFa. Note that in this case the teleconnection is easily captured because west (W) to southwest (SW) is the prevailing wind direction of this region throughout the year, as shown in Fig. 5. Note that NNE stands for north-northeast; ENE for east-northeast; ESE for east-southeast; SSE for south-southeast; SSW for south-southwest; WSW for west-southwest; WNW for west-northwest; NNW for north-northwest.

Fig. 4
figure 4

CCF values between WFa and its neighboring wind farms (blue lines represent 95% confidence interval)

Fig. 5
figure 5

Rose maps of wind direction of WFa, WFe, WFf

For cases under complex wind conditions, more attention should be paid to the influence of wind direction on the teleconnection in forecasting errors. If so, seasonal models according to different prevailing wind directions may be required.

4.2 Results

1) Modeling of wind power forecasting errors based on KRLS

Error amending for WFa is used as the example to illustrate the establishment of KRLS model, for which the modifying horizons are 1 to 24 hours, i.e., τ = 1, …, 24.

First, to train the first order model, the response variable and explanatory variables in (1) are defined as: the forecasting error at WFa, \(y_{i} = x_{a,i}\), and the historical forecasting error at WFa, WFe, WFf, \(\varvec{x}_{i} = [x_{a,i - \tau } , \ldots ,x_{{a,i - \tau - d_{1} }} ,x_{e,i - \tau } , \ldots ,x_{{e,i - \tau - d_{2} }} ,x_{f,i - \tau } , \ldots ,x_{{f,i - \tau - d_{3} }} ]^{\text{T}}\), where d 1 is the auto-regression step of WFa; and d 2, d 3 are the hybrid regression steps of WFe and WFf, respectively. A modification model with modeling order n (n > 1) is established based on the auto-regression of the forecasting error after modification with order n − 1.

The key parameters to be chosen include the sparse parameter υ n for each order, the bandwidth η n , the hybrid regression steps d 1, d 2, d 3, and the auto-regression step d ar,n . Generally, larger regression steps and smaller bandwidths can help improve modelling accuracy. Beyond a certain point, however, the improvement in forecasting accuracy is very limited and computation time is wasted. Thus the parameter selection principle is to check whether the mean squared error (MSE) alters significantly with changes in the parameters.

The total number of sample points after preprocessing is 4560; they are normalized to the interval [0, 1] and divided into three groups: 1) points 1 to 500 are used for modeling initialization, to accumulate the sample dictionary and stabilize the modeling effects; 2) points 501 to 2000 are for the cross-validation test of parameters; 3) the remaining points, 2001 to 4560, are the accuracy test samples. The final models are determined as:

a) First order: υ 1 = 0.1, η 1 = 1.6, d 1 = 6, d 2 = 4, d 3 = 4;

b) Second order: υ 2 = 0.01, η 2 = 0.9, d ar,2 = 3.

2) Model comparison

The proposed model is denoted f 2 K,M . To further examine the effect of modification, MLR is chosen as the benchmark method, as it is widely used for MOS modeling. At time t, for samples \([\varvec{x}_{i} ,y_{i} ]\), i = 1, 2, …, t, the regression model is:

$$\varvec{Y}_{t} = \varvec{X}_{t}^{{\prime }} \varvec{b}_{t}^{{\prime }} + \varepsilon_{t}$$
(23)

where \(\varvec{Y}_{t} = [y_{1} ,y_{2} , \ldots ,y_{t} ]^{\text{T}}\); \(\varvec{X}_{t}^{{\prime }} = [\begin{array}{*{20}c} \varvec{I} & {\varvec{X}{}_{t}} \\ \end{array} ]\); I is a column vector of ones; \(\varvec{X}_{t} = [\varvec{x}_{1} ,\varvec{x}_{2} , \ldots ,\varvec{x}_{t} ]^{\text{T}}\); \(\varepsilon_{t} = [\varepsilon_{1} ,\varepsilon_{2} , \ldots ,\varepsilon_{t} ]^{\text{T}}\); \(\varvec{b}_{t}^{{\prime }} = [b_{t}^{0} ,b_{t}^{1} , \ldots ,b_{t}^{l} ]^{\text{T}}\) and l is the dimension of x i . \(\varvec{b}_{t}^{{\prime }}\) is estimated by least squares as:

$$\hat{\varvec{b}}_{t}^{{\prime }} = \left( {\varvec{X}_{t}^{{{\prime {\rm T}}}} \varvec{X}_{t}^{{\prime }} } \right)^{ - 1} \varvec{X}_{t}^{{{\prime {\rm T}}}} \varvec{Y}_{t}$$
(24)

which can then be extended to the next time domain to estimate the forecasting error.
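The benchmark of (23)–(24) amounts to ordinary least squares with an intercept column of ones prepended; a minimal sketch (function names are ours):

```python
import numpy as np

def mlr_fit(X, y):
    """MLR benchmark (23)-(24): least-squares estimate of b' = [b0, ..., bl]
    with X' = [1  X], i.e., an intercept column of ones prepended."""
    Xp = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    return b

def mlr_predict(b, X):
    """Estimate the forecasting error for new regression vectors X."""
    return b[0] + np.asarray(X) @ b[1:]
```

In practice \(\hat{\varvec{b}}_{t}^{{\prime }}\) would be re-estimated as new samples arrive before being applied to the next time domain.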

All of the models compared with f 2 K,M are summarized in Table 1, and Fig. 6 gives the forecasting error amending results of all the models, judged by the root mean square error (RMSE). Note that the maximum value of the horizontal coordinate is set to 14 because all models fail when the modifying horizon is longer than 14 hours. The RMSE has been calculated as a percentage of the nominal capacity of the wind farm. It can be seen that the modification effect varies with the modifying horizon and the model. With the same type of explanatory variables, all of the MLR models perform worse than KRLS. When the horizon is very short, i.e., τ = 1, the difference is not large. However, the performance of the MLR models decreases very quickly as the horizon grows, and the advantage of KRLS increases. When the modified results approach the original accuracy, the improvement fluctuates slightly around the original level. Thus we identify the failure of the modification when the difference between the modified error and original error is less than 0.05%; the horizon before this point is defined as the effective modifying horizon. By this criterion, the effective horizons of f 1 M,A , f 1 K,A , f 1 M,M , f 1 K,M and f 2 K,M are 5, 9, 7, 11 and 11 hours, respectively. It can be seen that the effective horizon can be increased by applying more complicated nonlinear models and by considering the teleconnection for input variables.

Table 1 Models to be compared with f 2 K,M
Fig. 6
figure 6

Comparison of different modification models

4.3 Discussion

1) Key factors of the modification effect for different horizons

To analyze the key factors of the amending effect, Fig. 7 is a bar diagram showing the improvement in RMSE by different models. When a model fails, the improvement is set to 0. The key factors are as follows.

a) Base value: f 1 M,A is a simple auto-regressive linear model, which can be regarded as the base value of the modification;

b) NL + NS: the improvement by f 1 K,A is due to its applicability to the nonlinear and non-stationary (NL + NS) characteristics of the samples;

c) TC: the advantage of f 1 K,M over f 1 K,A is the consideration of the teleconnection (TC) in the samples;

d) IT: f 2 K,M can modify the error of forecasting error iteratively (IT) for further improvement.

Fig. 7
figure 7

Key factors to improve forecasting accuracy at each modifying horizon

It can be seen from Fig. 7 that the contribution of 'NL + NS' to forecasting accuracy grows significantly as the horizon increases from 1 to 3 hours, gradually becoming the major influencing factor of error amending. On the contrary, the importance of the base value and 'TC', which are both based on the spatial and temporal correlation of the samples, gradually wears off at such horizons, which conforms to the ACF and CCF results in Figs. 3 and 4. As the horizon continues to increase from 4 to 10 hours, the modification gradually becomes saturated, with the contribution of 'NL + NS' decreasing, and the teleconnection is the key factor in increasing the effective modifying horizons of f 1 K,M and f 2 K,M . Note that in Fig. 7, base value, NL + NS, TC and IT are the error modifying results from f 1 M,A , f 1 K,A , f 1 K,M and f 2 K,M respectively. The RMSE has been calculated as a percentage of the nominal capacity of the wind farm.

Table 2 demonstrates the forecasting error statistics of f 1 K,M and f 2 K,M to help illustrate the effect of multi-order models, including mean error (ME), variance, RMSE and maximum absolute error. One can see that the error bias, reflected by ME, has decreased significantly after the first order modification. Other statistical metrics can also be reduced when the horizon is short enough. The second-order model achieves a better performance at shorter horizons, helping lower the amplitude and volatility of the forecasting error. Although this performance drops off as the horizon increases, most of its metrics are still better than those of the first order model. The principle for increasing the order of the modification model is to strike a reasonable balance between improving accuracy and keeping the model simple. This is why higher order models are not applied in this case study.

2) Modification for different error intervals

Table 2 Forecasting error statistics of f 1 K,M and f 2 K,M

The modified RMSEs for different error intervals are summarized in Table 3, in which it can be seen that the contribution of f 2 K,M is mainly due to its correction of large forecasting errors, which wears off as the horizon increases. The phase error of short-term wind power forecasting is the lead or lag of the forecasts relative to the actual power along the time axis [6]. Many large RMSE results are caused by this kind of forecasting error. At short time horizons, the phase error can be corrected to some extent by f 2 K,M due to its consideration of the teleconnection of forecasting errors, resulting in smaller RMSE values for large error intervals.

3) Computational time

Table 3 Modified RMSEs for different error intervals

A reduction in computational time for online application is ensured by the sparse feature of KRLS, which uses the selected sample dictionary D t for training instead of the whole dataset. The best combination of forecasting accuracy and computational efficiency is achieved by adjusting the sparse parameter υ. For example, Fig. 8 shows the results of υ 1 for f 2 K,M . Note that the calculation is based on the computing environment of MATLAB R2013a and an Intel Core i7-3520M CPU at 2.90 GHz with 8.00 GB RAM. With decreasing values of υ 1, the elapsed time keeps growing, while the changes in RMSE can be divided into three stages.

Fig. 8
figure 8

Computational efficiency with different values of sparse parameter

Stage 1: when υ 1 > 0.1, RMSE decreases as υ 1 is reduced, because the size of D t is not yet large enough to ensure good forecasting accuracy;

Stage 2: when 0.05 < υ 1 < 0.1, RMSE remains at approximately the same level;

Stage 3: when υ 1 < 0.05, RMSE increases dramatically as υ 1 is reduced, due to over-fitting with a redundant sample dictionary.

Thus, we choose 0.1 as the best value for υ 1. The principle for achieving the best combination is to reach the best level of RMSE (e.g., no more than the minimum RMSE + 0.02%) with as little elapsed time as possible.

5 Conclusion

This paper presents a novel error-amending approach for short-term wind power forecasting systems to improve accuracy, which is independent of specific forecasting algorithms. The approach consists of several error-amending models based on the KRLS method to reduce the error iteratively. The teleconnection in forecasting errors is probed to help establish the hybrid regression models for aggregated wind power forecasting systems. Compared with the linear methods derived from the MOS procedure of NWP, the proposed approach is more adaptive to the nonlinear and non-stationary characteristics of the forecasting errors, and the improvement effect is verified by the results of the case study.

In addition to reducing forecasting error, it is important to understand the mechanism of the accuracy improvement for different modifying horizons. The simulation results show that the nonlinear and non-stationary characteristics of the samples gradually become the primary factor for short modifying horizons of less than 4 hours, while the teleconnection is the key factor for long-horizon modification. The multi-order models contribute to maximizing the effect of statistical methods at different horizons. The simulation also confirms the particular modification effect of our approach on large forecasting errors, which may help reduce the phase error of forecasting systems.

The proposed approach also has merits for online application. The weak generalization ability of many kernel-based intelligent algorithms, such as support vector machines, requires large datasets and long training times to ensure modeling performance, which is computationally expensive and ineffective for online rolling situations. Such a problem can be significantly mitigated by the recursive and sparse features of KRLS. Simulation results show that, by choosing a proper sparse parameter, the elapsed time is distinctly reduced without compromising the modification effect.

Another concern for online application of our approach is an open forecasting platform to support data transfer among different wind farms, which can provide a solid foundation for improving forecasting accuracy based on errors from neighboring wind farms. The hardware and software requirements for such a platform have been discussed in our recent work [28]; it is an internet-based platform with the forecasting server implemented at the service provider side. The target of the platform is to deal intelligently and flexibly with the teleconnection between wind farms, which may learn from the structure of multi-agent systems in the future [29, 30].