Journal of Modern Power Systems and Clean Energy

ISSN 2196-5625 CN 32-1884/TK

网刊加载中。。。

使用Chrome浏览器效果最佳,继续浏览,你可能不会看到最佳的展示效果,

确定继续浏览么?

复制成功,请在其他浏览器进行阅读

Data-driven Approach for State Prediction and Detection of False Data Injection Attacks in Smart Grid  PDF

  • Haftu Tasew Reda
  • Adnan Anwar
  • Abdun Mahmood
  • Naveen Chilamkurti
1. Department of Computer Science and IT, La Trobe University, Bundoora, VIC 3086, Australia.; 2. School of IT, Deakin University, Waurn Ponds, VIC 3216, Australia.

Updated:2023-03-25

DOI:10.35833/MPCE.2020.000827

  • Full Text
  • Figs & Tabs
  • References
  • Authors
  • About
CITE
OUTLINE

Abstract

In a smart grid, state estimation (SE) is a very important component of energy management system. Its main functions include system SE and detection of cyber anomalies. Recently, it has been shown that conventional SE techniques are vulnerable to false data injection (FDI) attack, which is a sophisticated new class of attacks on data integrity in smart grid. The main contribution of this paper is to propose a new FDI attack detection technique using a new data-driven SE model, which is different from the traditional weighted least square based SE model. This SE model has a number of unique advantages compared with traditional SE models. First, the prediction technique can better maintain the inherent temporal correlations among consecutive measurement vectors. Second, the proposed SE model can learn the actual power system states. Finally, this paper shows that this SE model can be effectively used to detect FDI attacks that otherwise remain stealthy to traditional SE-based bad data detectors. The proposed FDI attack detection technique is evaluated on a number of standard bus systems. The performance of state prediction and the accuracy of FDI attack detection are benchmarked against the state-of-the-art techniques. Experimental results show that the proposed FDI attack detection technique has a higher detection rate compared with the existing techniques while reducing the false alarms significantly.

I. Introduction

THE emergence of cutting-edge information and communication technology with the power grid has transformed the energy ecosystem into the current arena of cyber-physical system known as the smart grid. However, their integration into the power grid has also brought a great number of vulnerabilities that pose breaches of data integrity, confidentiality, availability, and so forth. Threats to the smart grid can take many forms, from compromising meter reading, carrying out remote attacks against communication protocols, to compromising power system state estimation (SE) results.

Weighted least squares (WLS) [

1] is the most commonly used SE technique in the industry. As an alternative to WLS, the least absolute value (LAV) [2] is considered better for its robustness. Yet, LAV estimators are typically slow and thus insufficient for real-time system monitoring due to the non-convexity and non-smoothness [3]. Further, the conventional SE methods are facing a growing threat from an emerging data integrity cyber-attack known as false data injection (FDI) [4], [5]. Cyber criminals can launch FDI attacks by injecting malicious data over the measurement reading of intelligent electronic devices (IEDs) [1] and these attacks can be crafted in a way to bypass the bad data detection (BDD) process of the SE. In consequence, the FDI attack can mislead the outcome of the SE and lead to a myriad of security risks, including failure of power system operation. Moreover, the conventional SE methods are inadequate to fully track power system variables and customer load profiles that are changing dynamically.

The main motivation of this paper is the challenge of existing SEs against the incumbent FDI attack. Besides, because of the inherent complexity of power systems, the sheer volume of data and the fact that high-performance computing devices are becoming available, data-driven techniques are increasingly powering various applications of the smart grid. For example, [

6] has proposed a data-driven FDI attack design where subspace identification technique is employed for the attack construction, and the authors have investigated the attack detection scheme using coding theory in the cyber-physical system environment. Moreover, in [7], a data-driven method based on partial observable Markov decision process for an FDI attack against automatic voltage controls is evaluated using Q-learning algorithm, where a data-driven FDI attack construction and a data-driven defense strategy are suggested. To address the security issues of the existing SEs, this paper proposes a two-stage power state prediction and attack detection (SPAD) framework for power system security. The first stage aims to predict the power system states, and the second stage is used to detect the FDI attacks. Binary classification based on Kullback-Leibler (KL) distance is used for the detection of FDI attacks. Overall, the proposed SPAD framework aims to improve the detection of incumbent cyber-attack. Deep neural networks (DNNs) are used for the implementation of the proposed framework.

The main contributions of this paper are summarized as follows. ① A model is developed for the state prediction and a KL distance is derived as an attack detection metric. ② Experiments have been conducted considering attack-free scenario and a wide range of false data attack scenarios. In the proposed model, a false data attack alert is generated when the computed KL value of the predicted states is greater than a decision threshold. One of the main reasons why existing BDD techniques fail to detect FDI attack is that the dissimilarity measure (i.e., residual vectors of the SE) after the attack drops below the threshold of BDD. However, our proposed FDI attack detection technique is capable of adjusting the detection threshold adaptive using the probability distributions between estimated states and previously known attack-free states. ③ In addition, the experiments incorporate medium- to large-scale power system transmission networks (namely 39-, 118-, 300-, and 500-bus systems) to evaluate the scalability of the proposed state prediction model. Besides, the proposed model has been evaluated towards network topology changes and compared with the WLS-based SE models. ④ Moreover, the numerical results show that the proposed model can perform estimations very similar to the results of the WLS estimator. The estimation error in terms of mean square error (MSE) of the predictive SE and the WLS is in the order of 10-3, which is acceptable [

1] for the purpose of power system SE. ⑤ Finally, the proposed detection technique for FDI attack detects false data attacks with an accuracy of over 98% compared with the existing false data attack detection with an accuracy of below 85%.

The remainder of this paper is organized as follows. First, the background on measurement models and security of the power system SE is briefly presented. Next, Section II reviews the existing literature where related works of existing SE techniques and FDI attack detection techniques are discussed in Sections II-A and II-B, respectively. Section III discusses the architecture and methodology of the proposed SPAD framework and covers comprehensively each method of state prediction and SPAD. Section IV illustrates the experimental setup and performance evaluation of the proposed state prediction. Moreover, the adversary models used within the proposed SPAD framework are examined in Section V. Further, Section VI presents the numerical results and discussion of the proposed SPAD framework. Finally, this paper is concluded in Section VII.

In a control center, an SE aims to obtain optimum system states based on the received measurements. The m-dimensional measurement vector y can be formulated through a non-linear AC model or a linearised DC model [

1], represented by y=h(x)+w and y=Hx+w, respectively, where xRn×1 is the n-dimensional state vector; h() or HRm×n is a function relating measurements and state vectors (also known as the Jacobian matrix); and wRm×1 is a noise attributed to the measurement errors. Considering a Gaussian noise distribution approximated by 𝒩(0,σi2), the WLS estimator uses minimization of weighted sum of the residual squares [1]. After estimation is performed, the SE conducts detection of malicious data, usually through BDDs [1], [8], e.g., chi-squared distribution (χ2) and the largest normalized residue (LNR) tests. The BDDs compute the residual vectors in terms of 𝓁2-norm (where 𝓁2-norm of r is defined as ||r||22=r2) between the original measurements y and the estimated measurements y^=Hx^, given by ||r||22=||y-Hx^||22. However, [4] and [5] have indicated that if ||r||22<τ, it also holds true that ||rfalse||22<τ for detection threshold τ. This implies that the FDI attack vector (which can be represented by a=Hb=[a1,a2,...,am]T, where b=[b1,b2,...,bn]T is the error vector injected by the adversary) can successfully bypass existing BDD techniques that further calls for the development of new SE and/or new detection schemes.

II. Related Work

A. Review on SE Methods

Various SE methods have been reported in the literature. Reference [

9] has summarised that the SE problem formulation can follow optimization-only [1]-[3], hybrid machine learning (ML) optimization [9], [10], or just data-driven methods through ML algorithms [11], [12]. Various existing researches on power system SE deal with optimization schemes including WLS and LAV using convex or non-convex iterative solvers. However, the traditional SE models have a number of limitations. First, they are vulnerable to bad data injection including the FDI attack which has been demonstrated in a number of literatures including [4], [5]. Second, the SE algorithms incur high computational complexity as the estimation process mainly depends on initializations [13]. Third, the estimation procedures take longer time [13] that may leave operators to wait between successive measurement instants. Furthermore, the SEs are challenged by the ever-increasing scale and dynamics of the power system [9], [10]. In contrast, data-driven methods can use states from both current and previous timeslots, and can achieve better performance in accuracy, efficiency, and stability. The application of artificial neural network (ANN) in power system SE is one of the contemporary data-driven researches [11], [12]. Autoencoder-based ANN for power system SE using measurement data has been proposed in [12]. Recent research in [10] has shown that DNN-based SE methods outperform the previously mentioned neural network (NN) techniques. In this paper, we propose a predictive power SE based on deep recurrent NN (DRNN) that utilizes the physical network of the power system.

B. Review on FDI Attack Detection Techniques

1) Detection Categories

χ2 and LNR are two mostly used attack detectors in the SE. However, it has been shown earlier that they are vulnerable to FDI attack. A considerable amount of research has been done in the mitigation strategy against FDI attack and can be broadly classified into three main categories: protection, detection based on SE, and detection based on ML [

8]. The first type, which accounts for the majority of the mitigation strategies against the FDI attack, aims to combat the attack by protecting a set of measurement devices [8], [14] (e.g., graph-theoretic, game-theoretic, and topology perturbation). These strategies, however, have a few drawbacks. First, they could only reach on some minimum number of measurements required to ensure the system observability. In addition, the solutions provided could increase burdens to the operator (for example, the perturbation method). Second, their implementation is not economically feasible while adding the protections over a large number of IEDs or other consumer-side devices. Finally, the protection depends on external elements such as a secure IED; however, if the IED itself is compromised, it will affect the method. Detection methods on the basis of SE include forecasting [8], signal processing [15], and statistical modeling [16].

2) Detection Based on Statistical Distance

Statistical-based detection schemes [

8], [14] are further divided into cumulative sum test, quickest change detection, and detection based on statistical distance. A statistical distance quantifies the consistency of two probability distributions. Compared with the other categories, the distance-based detection methods achieve significant performance against the incumbent cyber-attacks. KL and Jensen-Shannon (JS) distance are two main statistical distance-based methods which have recently been used for detecting malicious measurements. Reference [16] proposes FDI attack detection for power system measurements using KL distance metric. Similarly, [17] suggests FDI attack detection leveraging KL distance metric and joint power and log transformation, where the latter is used to transform measurement variations to improve detection probability. The JS distance metric has been used for attack detection [18] and electricity theft detection of advanced metering infrastructure networks [19].

Table I summarises a comparison of the proposed SPAD framework in this paper with existing works, where PD is the detection probability; FPR represents false positive rate; TPR represents the true positive rate, and AUC is the area under the ROC curve.

Table I  Comparison of Proposed SPAD Framework with Existing Works
TypeComparison attributes[16][17][18][19]Proposed
SE Approach WLS WLS WLS WLS Data-driven
Power flow model AC AC AC AC DC and AC
Estimation performance Not considered Not considered Not considered Not considered Considered
Computational efficiency Low Low Low Low Low
Detection Proposed detection approach Detection of FDI using KL distance Detection of FDI using KL distance and joint image processing Detection of FDI using JS distance Detection of energy theft using JS distance Detection of FDI using predictive SE and KL distance
Probability distribution based on Estimated measurements Estimated measurements Estimated measurements Estimated measurements Predicted states
Adaptive threshold Not considered Not considered Not considered Not considered Considered
Detection performance Detection parameter PD PD, FPR PD FPR v.s. TPR FPR v.s. TPR
Detection accuracy High High Unknown Unknown Very high
AUC Not considered Not considered Not considered Not considered Considered
Recall Not considered Not considered Not considered Not considered Considered
Precision Not considered Not considered Not considered Not considered Considered

In summary, this paper is unique in the following aspects. One of the key differences lies on the SE method. We have taken the unique FDI attack detection technique using predictive SE, which is different from the existing WLS-based FDI attack detection techniques. The main idea is to avoid WLS-based SE, which is vulnerable to the majority of stealthy FDI attacks. The other feature of this paper is based on the detection threshold. If the detection threshold is relatively high, existing detectors will incorrectly report a false negative, and existing detectors report a false positive when the threshold is very small. This can be challenging especially when the malicious user can inject sparse attack vectors into the measurements. The proposed binary classification algorithm uses adaptive detection threshold using probability distributions of the normal and attack data. Instead of using a default threshold obtained from the KL distance metric, the detector is evaluated using a number of thresholds, where one that results in the optimal detection performance is selected as the optimal decision threshold. Furthermore, unlike existing works on the KL-based detectors, this paper has evaluated a number of detection performance metrics including receiver operating characteristic (ROC) curve, AUC, recall, and precision.

III. Architecture and Methodology

A. Proposed SPAD Framework

Figure 1 shows a block diagram of power system SE and attack detection. Figure 1(a) shows the procedures of SE and attack detection adopted in the conventional energy control center, where SCADA stands for the supervisory control and data acquisition system; and EMS stands for the energy management system. Figure 1(b) is the proposed architecture of the SPAD framework, which augments the SE procedures and cyber-attack detection.

Fig. 1  Block diagram of power system SE and attack detection. (a) Considering conventional framework. (b) Considering proposed SPAD framework.

Conventional SE requires network topology processing [

1] to be performed prior to the SE procedure. This is specifically indicated by the “network topology processor” in Fig. 1(a), which collects data about circuit breakers and configures real-time system parameters. That is to say, once the network topology is known, the SE assumes that the topology is correct and continues with the estimation and BDD procedures. In our proposed SPAD framework, there is a module called “topology parameter settings” to account for the changes in topology. The following assumption is introduced.

Assumption 1 (topology parameter settings) In one training instance, it is assumed that the topology parameter does not change.

Assumption 1 implies that only one topology parameter is active at any point of time instead of combining the different network topology configurations. In other words, the topology parameter does not change for a configuration considered, and each training procedure is associated with each setup of system configuration. Hence, the datasets are created following the network topology configurations identified by the network topology processor, and are stored in the database of the control center.

For the implementation of the proposed SPAD framework, two DNN concepts namely DRNN and deep feed forward NN (DFFNN) are used. The proposed framework includes three modular elements, namely data acquisition and pre-processing module, predictive SE module (i.e., Stage 1), and the FDI detection attack module (i.e., Stage 2). These are discussed in the following subsections.

B. Data Acquisition and Data Pre-processing

In real-world scenarios, field devices installed over transmission networks measure electrical quantities, and relay their readings to the control centre through IEDs (e.g., Fig. 1(a)). In a typical modern EMS, system operators store plenty of historical measurement data in database systems for a variety of applications such as monitoring and security tools [

1]. In this regard, the ML model can be trained using the historical data. In this paper, the proposed data-driven state prediction technique is assessed using data from IEEE standard benchmark systems. The dataset includes measurement sets y as an input and x as an output. Nodal voltage angles/magnitudes, and sensor measurements constituting real power injections at the buses and real/reactive power flows across the branches are generated (as to be demonstrated in Section IV-A).

Further, real-time power load data are used which are obtained from Global Energy Forecasting Competition 2012 [

20], hereafter referred to as the GEFCom. The real-time power load data range from the fiscal year 2004 to 2008, altogether 56 months over 20 power utility regions of the USA. More details can be found in Section IV. In DL models, data preprocessing is a critical approach for achieving improved learning efficiency and accuracy. In this step, we deal with inconsistent data, dataset normalization, scaling, and reshaping. In addition, the output variables are scaled to reduce the size of the stochastic gradient descent (SGD), which is used to update weights, resulting in a more stable training model.

C. Proposed Power System State Prediction

The objective of the proposed state prediction model is to infer x based on y. The model is trained using a training data sequence given by St={(y1t,x1t), (y2t,x2t),..., (ymt,xnt)}. As mentioned previously, real-time power load data are used that span a range of time intervals. Hence, the matrix notations Y and X are used for the training and testing, respectively, where Y=y11y12y1ty21y22y2tym1ym2ymt, and X=x11x12x1tx21x22x2txn1xn2xnt.

Algorithm 1 summarises the step-by-step implementation of the power system state prediction. Following Assumption 1 on the topology parameter settings, the following assumption is also considered.

Algorithm 1  : power system state prediction

1: procedure ESTIMATE(Y)

2:  function loadData()

    Read network topology configuration

     for ν=1 to K do

     if ψ=Ψν then

      load Dataset{(Y, X)} corresponding to ψ

     end if

     end for

    end function

    Initialize weights, epoch, batchSize, learningRate

3:  trainingSet80%(Dataset)

4:  testSet20%(Dataset)

5:  Configure model hyperparameters

6:  Compute non-linear activation functions

7:  Compute loss function over forward loop

8:  repeat

9:  Using ADAM optimiser and learningRate

10:  Backprop errors, Update weights

11:  function TRAINMOD(trainingSet)

12:   for each feature of training set do

13:    PSEModPSEMod.fit(trainingSet,epoch,batchSize)

14:    return PSEMod

15:   end for

16:  end function

17:  function TESTMOD(testSet)

18:   for each feature of test set do

19:    PSEValTRAINMOD.evaluate(testSet)

20:    return PSEVal

21:   end for

22:  end function

23:  end repeat until model improved

24:  function PREDMOD(y)

25:   for m=1 to M do

26:   Obtain ym, x̂TESTMOD.predict(y)

27:   return x̂

28:   end for

29:  end function

30: end procedure

Assumption 2 (network topology configurations) The power grid considered has a finite number of network topology configurations denoted by KN. Suppose ν is the index of the configurations with ν={1,2,...,K}.

The ESTIMATE(Y) procedure in Algorithm 1 is used to predict the system states based on the provided input Y. The procedure checks which topology configuration is active at any time point and starts loading the dataset, which is given by Dataset{(Y, X)}, using the loadData(·) function. The corresponding topology configuration is obtained from the topology parameter settings of Fig. 1 which is linked to the network topology processor. Suppose that the set of all finite network configurations is given by Ψ={Ψ1,Ψ2,...,Ψν}|ν. And suppose that the instantaneous network configuration is given by ψ. The procedure is thus expected to load the datasets corresponding to ψ.

Before training the model using the TRAINMOD(·) function, hyperparameters such as optimizer, batch size (represented as batchSize), and learning rate (represented as learningRate), etc. are configured in each layer and weights are initialized using the Xavier normal distribution [

21]. A training model PSEMod is built after fitting the model on the training set trainingSet and the hyperparameters. The function used to train the model is the TRAINMOD(·). Then, the trained model is evaluated on the testing set testSet. Here, the evaluated model is represented by PSEVal where the function is represented by TESTMOD(·). Finally, for each measurement reading m=1 to M, the state vector correponding to y is predicted using the improved model. This part is performed by the PREDMOD(·) function. Although there are plenty of SGD optimisers including Adagrad, Adadelta, etc. that are suitable for different NN models, ADAM [22] optimizer has been used during the training of the DRNN and DFFNN models. To reduce overfitting, regularization method based on dropout [22] has been used while training the DRNN/DFNN models. Further, Huber loss function is defined while training the DNN models. First, an attack-free dataset is used to train the model, then evaluated with and without the FDI attacks.

1) Motivation for Selection of DNN

RNNs are a class of NNs specialised for predicting a sequence of data involving time. In contrast to FFNNs, RNNs allow cyclical connections that can map to each output from previous inputs. Theoretical and experimental evidences [

23] show that DRNNs benefit from the depth of hidden layers and outperform the conventional and shallow RNNs. The depth of RNN is introduced in many ways [23]: input-to-hidden, hidden-to-output, and hidden-to-hidden. Further, for estimating states of correlated power system measurement data, DRNNs outperform the DFFNNs [23]. In this paper, DRNNs are chosen as they are much more robust for prediction and classification tasks than other NNs. Their prediction performance is also compared with the DFFNNs.

2) Model Configuration

For a given observation time t and number of features d, the training sequence for the network model is denoted as stRd. The construction of the model is defined through functions between the input, hidden, and output layers. The neuron at the lth hidden layer (represented by hltRd×nh, where nh is the number of the hidden neurons) receives the input vector of y and hidden neurons of the previous state. This is represented by (1).

hlt=ψl(yt,hlt-1)=ψl(ωlψl-1(ωl-1ψl-2,,(ω1ht-1+b1yt))) (1)

where ψl is the non-linear activation function of the lth hidden layer; and ωl and bl are the weight vector and bias vector, respectively.

D. Proposed FDI Attack Detection

The KL distance is applied in numerous cases [

16]-[19], [24] such as information theory, anomaly detection, data mining, and ML algorithms, and recently, in cyber-attack detection of power system. In this paper, the cyber-attack detection is formulated as a binary classification problem using the KL distance metric. By computing the KL value of the probability distributions between consecutive time steps (i.e., states at the previous time and states at the current time), the detector can effectively monitor the dynamics of the power system states. In the following, we first define the state variations and the corresponding probability distributions, which are important for the detection algorithm.

Definition 1   (state variation) The state variation Δxi (xix) is the difference between two consecutive states given by Δxi=xi(t)-xi(t-1), where t and t-1 denote the current and previous time instants, respectively. Let lΔxi and mΔx^i be the probability distribution of the previous Δx and the predicted current Δx^i, respectively.

Remark 1   After computation of lΔx and mΔx in Definition 1, the KL distance Λ from lΔx to mΔx^ can be expressed mathematically as:

Λ(mΔx^||lΔx)=mΔx^log2mΔx^lΔx (2)

Definition 2   (KL distance) Λ() is a non-negative number, and can be defined as:

Λ()=R+   mΔxlΔx0      mΔx=lΔx,xx (3)

Remark 2   Λ() of Definition 2 is asymmetrical, meaning that:

Λ(mΔx||lΔx)=Λ(lΔx||mΔx)    lΔx=mΔx,xxΛ(mΔx||lΔx)Λ(lΔx||mΔx)    xx (4)

Definition 3   (attack detection) The FDI anomaly detection is formulated as a binary classification problem (defined by (5)) as our aim is to classify measurement samples into normal and FDI attack classes.

δ(x^)=1    Λ(x^)τ0    Λ(x^)<τ (5)

where τ is obtained from a statistical estimation based on the confidence interval.

The confidence interval is given by:

c(ρ)=Λ¯±zσssqrt(S) (6)

where Λ¯ is the mean of the set of KL values; z is a value obtained corresponding to each detection confidence level ρ as given in Table I; and σs is the standard deviation of Λ samples with a sample size of S.

Table I  z Values Based on Normal Distribution
ρ (%)z value
90.0 1.645
95.0 1.960
98.0 2.330
99.0 2.576
99.5 2.807
99.9 3.291

The attack detection algorithm is trained through predicted states of the normal traffic, and evaluated using both normal and bad data injected traffics. Algorithm 2 explains the offline and online FDI attack detection system. DETECT(·) is the attack detection procedure. In the offline mode (represented by the OFFLDETECTION(·) function), the training model uses adaptive threshold based on the confidence interval of the attack-free traffic. Then, the KL of any incoming traffic is compared against the threshold. Tables II and III illustrate the threshold values obtained for the 1-year and 5-year data corresponding to the z value in Table I, respectively. The two cases are based on the results of the proposed state prediction model. As it can be observed from the two tables, the Λ() value of those without attack is quite smaller than those of FDI-manipulated data, where the adversary model I is used (threat model is explained in Section IV-C). The performance of the proposed detection system is also compared with the WLS-based KL detection systems in Section V. For a given topological configuration, the detector is trained using the normal traffic to obtain adaptive thresholds for the considered detection confidence level. In addition, the online FDI attack detection is performed for each incoming measurement data and this is illustrated by the ONLDETECTION(·) function of Algorithm 2.

Table II  Thresholds of Δx for 1-year Data
ρ (%)z valueKL threshold
Without attackWith attack
95.0 2.330 0.0653 2.4637
98.0 2.576 0.0684 2.4891
99.0 2.807 0.0695 2.4975
99.5 3.291 0.0700 2.5017
99.9 3.291 0.0704 2.5051
Table III  Thresholds of Δx for 5-year Data
ρ (%)z valueKL threshold
Without attackWith attack
95.0 1.960 0.2223 2.0764
98.0 2.330 0.2262 2.0992
99.0 2.576 0.2275 2.1068
99.5 2.807 0.2282 2.1106
99.9 3.291 0.2287 2.1136

Algorithm 2  : FDI attack detection

1: Input: Dataset{(Y, X)}

2: procedure DETECT(Y)

3:  function OFFLDETECTION(Y)

4:   for each y in Y do

5:    Evaluate x̂ using Algorithm 1

6:    Evaluate mΔx̂ as well as lΔx

7:    ΛhistmΔx̂log2mΔx̂lΔx

8:    Evaluate τ using (6) and Λhist

9:    return τ

10:  end for

11:  end function

12: function ONLDETECTION(y)

13:   for each smart meter measurement do

14:    Compute x̂ and mΔx̂

15:    ΛmΔx̂log2mΔx̂lΔx

16:    if Λ(x̂)τ then FDI attack detected

17:    end if

18:    if Λ(x̂)<τ then no FDI attack detected

19:    end if

20:   end for

21: end function

22: end procedure

IV. Experimental Setup and Performance Evaluation of Proposed State Prediction

In this section, justifications through numerical simulations of the proposed state prediction are presented.

A. Test System Scenario

The data from four power grid standard test systems, i.e., IEEE 39-bus, 118-bus, 300-bus, and ACTIVSg 500-bus systems, are used to assess the performance of the proposed state prediction. The IEEE 118-bus system has 118 buses, 186 branches, with a total of 304 sensor measurements considering the DC power flow. The IEEE 39-bus system has 39 buses, 46 branches, and a total of 85 nodal injections and branch power flows given the DC power flow. For the AC power flow, while the system states include voltage magnitude and phase angle, the measurement data include real and reactive power injections, and real and reactive power flows. Similarly, the IEEE 300-bus and ACTIVSg 500-bus systems have a total of 711 and 1097 measurement data, respectively, considering DC power flow.

The experimental dataset is prepared using GEFCom based on an hourly interval. Real-time power load profile of five power zones of two categories are used. The first is for 2004 (a total of 43920 dataset used) and the other is for 2004 to 2008 (a total of 90320 dataset used). The hourly real-time power load data have been sampled to a 5-min interval. This system-level load distribution has further been normalized, distributed to the bus-level load rating of the simulated system. The procedures of dataset preparation of the two test systems and the actual load power data using MATLAB R2019B and MATPOWER [

25] are defined as follows. First, the sampled GEFCom power load values are normalized. After setting up the original load case data of the test systems, for each number of samples over the target utility regions, the power loads (including real and reactive) are multiplied by each of the normalized load values. Next, for each of the test systems, the DC and AC optimal power flows are generated. Given the new power load and generation conditions, the system state vectors are then generated. Furthermore, the measurement matrices Y are generated for the AC and DC power flows of the test systems. To be more realistic (i.e., to account for the inherent communication noises against the measurement data), a normally distributed Gaussian noise with mean μ and standard deviation σ is added to the generated measurement. After obtaining the measurement data and system states, the FDI attack is generated using adversarial models I and II shown in Section V. Note that the dataset of the DC and AC power flows are performed independently.

B. Prediction Performance

The proposed state prediction is analysed using the DC and AC power flow models. Implementation results of the DC model for the IEEE 39-bus and 118-bus systems are based on three densely-connected hidden layers. Likewise, the AC model for the IEEE 39-bus and 118-bus systems and the DC model for the IEEE 300-bus and ACTIVSg 500-bus systems are based on six densely-connected hidden layers. Table V is the configuration of the state prediction models with respect to the four test systems for a given observation time. As depicted in Table V and used throughout our numerical simulation, the NN model input refers to the measurement data (e.g., 85 for the IEEE 39-bus system and 304 for the IEEE 118-bus system given the DC power flow), and the NN model output refers to the estimated system states (e.g., 77 for the IEEE 39-bus system and 236 for the IEEE 118-bus system considering the AC power flow) for a given time interval. Similarly, the number of neurons for each densely-connected layer is also described. The following parameters are used while training the NN models for both DC and AC power flows: learning rate is 0.01, epochs are 100, and batch size of training algorithm is 64.

Table V  Configuration of State Prediction Models
Bus systemInput layerHidden layerOutput layer
39-bus (DC) 85 h1=85×85, h2=h1×85, h3=h2×85 38 (for voltage angle)
(DC) 118-bus 304 h1=304×304, h2=h1×304, h3=h2×304 118 (for voltage angle)
39-bus (AC) 117 h1=117×117, h2=h1×117, ..., h6=h5×117 77 (38 for voltage angle and 39 for voltage magnitude)
118-bus (AC) 354 h1=354×354, h2=h1×354, ..., h6=h5×354 236 (118 for voltage angle and 118 for voltage magnitude)
300-bus (DC) 711 h1=711×711, h2=h1×711, ..., h6=h5×711 300 (for voltage angle)
500-bus (DC) 1097 h1=1097×1097, h2=h1×1097, ..., h6=h5×1097 500 (for voltage angle)

Figures 2 and 3 demonstrate the voltage angle and voltage magnitude predictions of the proposed state prediction model and WLS state estimator for IEEE 118-bus and 39-bus systems, respectively. In Fig. 2 and Fig. 3, the predictions are evaluated using unseen test sets.

Fig. 2  Voltage angle prediction of proposed state prediction model and WLS state estimator. (a) IEEE 118-bus system. (b) IEEE 39-bus system.

Fig. 3  Voltage magnitude prediction of proposed state prediction model and WLS state estimator. (a) IEEE 118-bus system. (b) IEEE 39-bus system.

Additionally, to see the efficiency of the proposed model, it has been evaluated using additional unseen data of four different weeks (labeled as week-27, week-28, week-29, and week-52). This is demonstrated in Fig. 4, implemented using DRNN of the proposed model. Additionally, to assess the scalability of the proposed predictive SE method, two larger power system test cases are used: IEEE 300-bus and ACTIVSg 500-bus systems. The plots using the IEEE 300-bus and ACTIVSg 500-bus benchmarks are shown in Fig. 5. In fact, increasing the number of buses of the power grid can also increase the number of measurement points significantly. This, of course, poses a computational complexity towards the learner. However, such challenges can be alleviated by deploying sufficient number of computing devices such as high-bandwidth GPUs or RAMs and/or leveraging feature selection techniques.

Fig. 4  Evaluation of proposed state prediction model using various unseen datasets (DC power flow). (a) IEEE 118-bus system. (b) IEEE 39-bus system.

Fig. 5  Voltage angle prediction before and after topology changes. (a) IEEE 300-bus system. (b) ACTIVSg 500-bus system.

The prediction performance of the proposed model is also evaluated against topology changes. This part is assessed in accordance with Assumptions 1 and 2 using the IEEE 300-bus and ACTIVSg 500-bus systems. The aim of this experiment is to evaluate the performance of the predictive SE model if the underlying network topology changes (e.g., such topology updates can be obtained from the network topology processor). In this regard, topology configurations of some selected networks of the IEEE 300-bus and ACTIVSg 500-bus systems have been randomly modified. Only 5 sets of configurations are modified in the IEEE 300-bus system, while 10 sets of configurations are modified in the ACTIVSg 500-bus system.

The sets of configurations are parts of the Jacobian matrices H and are generated from MATPOWER [

25]. For each configuration, various scaling factors are used to get the topology change. Accordingly, datasets are generated and used for evaluating the prediction performance of the DNN model. The results of SE using the proposed approach and WLS-based SE are shown in Fig. 5 for the IEEE 300-bus and ACTIVSg 500-bus systems.

Numerical results demonstrate that the proposed state prediction is very comparable to the conventional WLS estimator whose estimation error is acceptable for the purposes of power system SE. Except for the AC power flow in Fig. 3, where the DFFNN deteriorates with less prediction results than that of the WLS, the DRNN can perform estimations very close to the result of the WLS estimator.

Furthermore, the performance of the predictive SE is evaluated through the metric MSE. The MSE is defined as (7) for a total number of n states and d observations.

MSE(x-x^)=1dt=1d1ni=1n(xit-x^it)2 (7)

The performance evaluation in terms of the MSE with the different models for IEEE 39-bus and 118-bus systems is shown in Table VI.

Table VI  Performance Evaluation in Terms of MSE with Different Models for IEEE 39-bus and 118-bus Systems
TypePrediction modelMSE
IEEE 39-bus systemIEEE 118-bus system
DC WLS 6.13×10-3 8.34×10-3
DFFNN 4.65×10-3 5.96×10-3
DRNN 2.50×10-3 3.52×10-3
AC WLS 3.92×10-3 5.34×10-3
DFFNN 7.57 ×10-3 8.21×10-3
DRNN 2.15×10-3 3.18×10-3

V. Adversary Models

Attackers come up with different adversarial strategies whereby the final effect of the malicious data leads to damage the state variables across the power system domain. Generally, there are two main FDI anomaly strategies for power system measurement models. One requires knowledge of power system topology [

4], and the other is based on a data-driven approach also known as the blind FDI attack strategy [26].

The vulnerability of the detection module to adversarial ML is analyzed as follows. Although the main assumption of adversarial model considered in this paper is the false injection attack, coordinated adversarial ML attacks can also be potential challenges against the proposed cyber-attack detection or decision-making module. The adversarial ML may bring inconsistencies against the model during its training and retraining phases and introduce errors into unseen datasets, thereby creating a confusion over a previously trained detection model. Overall, such threat models can cause the ML model to make a wrong decision or misclassifications and affect its detection performance. This current work assumes that the cyber-physical processes involved include data sensing, communication, and decision-making using the ML module. While the false injection attack is against the data integrity of the smart grid, the adversarial ML attack can be like poisoning or evasion attack against the ML module.

Therefore, the whole cyber-physical process is considering the false injection attack during the communication and/or injection across the sensors. As a result, we limit the scope of this paper to just the cyber-attack against the data integrity (i.e., the FDI attack).

The adversary is assumed to have access to the network topology. In particular, two realistic attack models are examined: random FDI attack and targeted FDI attack. While the former aims to inject an attack vector to the measurement quantities that will lead to a falsified estimate of state vectors, the latter aims to find an attack vector that can inject arbitrary errors into some state vectors.

A. Adversary Model I

Here, it is assumed that the adversary can have access only to some κ sensor readings (where κm-n+1). This may be due to the fact that some sensors have specific physical defences or may be beyond the reach of the adversary. Let Im={I1,I2,...,Iκ} be the set of indices of sensors. According to Theorem II in [

4], the adversary can compromise κ sensors under the following conditions:

a(i)=Hb    iIm0        iIm (6)

Algorithm 3 illustrates the implementation of this attack model used in our case scenario. In Algorithm 3, the ATTACKCONS() procedure (with κ, H, and y arguments) generates the attack vector a(i) and adds this attack vector to the measurement vector y(i) of k meters. randi() and randn() are random number generator functions where the former is used to generate the set of meters and the latter is used to generate non-zero numbers of the k-sparse attack vector. The zeros() function returns an array of zeros whose size corresponds to the non-compromised sensor indices.

Algorithm 3  : adversary model I

1: Input: κ, H, y

2: Output: yfalse

3: procedure ATTACKCONS(κ, H, y)

4: Iκrandi([1,m],1,κ)

5: if iIκ then

6:  a(i)H×randn(κ,1) 

7:  yfalse(i)y(i)+a(i)

8: end if

9: if iIκ then

10:  a(i)zeros(κ,1)

11: end if

12: end procedure

B. Adversary Model II

Here we consider a targeted FDI attack where the adversary intends to inject bad data into certain chosen state vectors. Again, for a successful FDI attack, we assume that the adversary has the knowledge of the network topology such as bus and chosen state vectors. Suppose the adversary has chosen ζ<n set of state variables x1,x2,...,xζi, where ζi={1,2,...,ζ} is the position of the subset of state vectors. In this model of the attack, the adversary aims to construct xfalse=x^+b. Details of the construction of this attack model are given in [

4].

VI. Numerical Results and Discussion of Proposed SPAD Framework

A. Quantifying KL Metric of Detector

Here, we present scenarios to justify the performance of the data-driven KL metric for the detection of bad data injection. The scenarios demonstrated in this subsection are based on the adversary model I. The histogram of state variations of normal and injections of false data are demonstrated in Fig. 6 for the 1-year and 5-year data. The mean differences in the distribution between these two classes of data are also shown in Fig. 6. In addition, the computation results of the KL distance are demonstrated in Fig. 7 and Fig. 8, respectively, for the 1-year and 5-year data. In general, for both data cases, there is a distinction between the histogram of the KL values of the normal and the tampered traffic, and the plots of KL values of these two classes. However, the results also show that the dissimilarity between the normal and the tampered traffic is more noticeable under the 1-year data compared with the 5-year data, mainly due to the difference in the load profile as the latter covers a wide range of 5-year data. Finally, while Fig. 9 illustrates the cumulative distribution function (CDF) between the ground truth and predicted states of the KL distance of the 1-year and the 5-year data. Therefore, the results shown in Figs. 6-9 confirm that power system states at normal operating conditions have almost similar estimations provided that the topology remains the same. However, when FDI attack is injected to some of the measurement data, it results in a different SE in the control center. These results are, in particular, owing to the combined data-driven approach for prediction of power system states and KL distance metric based attack detection.

Fig. 6  Histogram of state variations without and with FDI attack. (a) 1-year data. (b) 5-year data.

Fig. 7  KL distances of 1-year data (with and without FDI attack). (a) KL of predicted states (without FDI attack). (b) KL of predicted states (with FDI attack). (c) KL distance.

Fig. 8  KL distances of 5-year data (with and without FDI attack). (a) KL of predicted states (without FDI attack). (b) KL of predicted states (with FDI attack). (c) KL distance.

Fig. 9  CDF between ground truth and predicted states of KL distance of 1-year and 5-year data.

B. Detection Performance

The detection module is trained through the KL values of the predicted state vectors. ROC curve, AUC, recall and precision are employed for the detection performance. The performance of ROC is evaluated in terms of the probability of correctly classifying the computed KL distance of the predicted states as either attack-free or manipulated data using the decision rule given by (5). The ROC curve, which is a plot of FPR v.s. TPR, is obtained by varying the decision thresholds. FPR and TPR are defined as (9) and (10), respectively. Additionally, the recall and precision are given by (11) and (12), respectively.

FPR=FPFP+TN (9)
TPR=TPTP+FN (10)
Recall=TPFN+TP (11)
Precision=TPFP+TP (12)

where TP represents the successfully identified FDI attacks; FP represents the number of states that are wrongly classified as FDI attacks; and FN represents the number of states that are wrongly classified as normal.

The proposed detection performance is compared against χ2 distribution test, and with one of recent findings on KL metric based FDI attack detection using WLS estimation [

16]. For the χ2 distribution test, the 𝓁2-norm is computed based on the residuals between the actual and the WLS predicted values using a degree of freedom given by m-n. Similarly, to compare with the other technique, the KL of Δx denoting the difference between WLS-predicted states and the actual states is computed.

C. Attack Scenarios

To perform the FDI attack detection, three attack scenarios are considered. For each scenario, 12000 samples are used.

1) Attack Scenario I

For attack simulation of the IEEE 118-bus system, we assume that the adversary has access to k=190 measurement meters, a condition that satisfies the criteria km-n+1 [

4]. The attack vector is generated following implementation Algorithm 3 and (6). Figure 10 shows the attack detection results in this attack scenario. In this case, the proposed system has a much better detection performance than the BDD and existing KL-based FDI attack detection technique. The BDD has very poor performance in detecting the FDI anomalies. The KL-based FDI attack detection technique demonstrates better detection results compared with the BDD. However, as the KL metric is dependent on the WLS estimation result, its detection performance is still less than the proposed SPAD framework.

Fig. 10  Attack detection results using 5-year data in attack scenario I.

2) Attack Scenarios II and III

These two case scenarios are based on the targeted FDI attack. When the adversary manipulates the state variables, the measurements associated with these elements will be tempered. To inject the bad data, we simulate the bias vector b to be added on each of the 118 state variables of the power network using different settings. First, we use a 10% increase from the initial state value (here referred to as attack scenario II). Then, we increase the bad data by 40%, which means xflase=x+0.4x (attack scenario III). In both attack scenarios, although the proposed SPAD framework outperforms existing techniques, its classification accuracy deteriorates when the magnitude of the attacks are too small. However, as the magnitude of the attacks increases, so does the KL distance, which leads to a much higher probability of attack detection by the SPAD framework. Figures 11 and 12 are the attack detection results for these two attack scenarios. Table VII demonstrates the detection performance of the proposed SPAD framework in terms of AUC, precision and recall considering both the DC and AC optimal power flows. From the detection numerical results we can conclude that the BDDs are vulnerable to the FDI attack models. In contrast, our proposed scheme has much better detection rate compared with the existing KL-based FDI attack detection and the BDD.

Fig. 11  Attack detection results using 5-year data in attack scenario II.

Fig. 12  Attack detection results using 5-year data in attack scenario III.

Table VII  Detection Performance of SPAD Framework in Terms of AUC, Precision, and Recall
Power flow modelAttack scenario I (%)Attack scenario II (%)Attack scenario III (%)
AUCRecallPrecisionAUCRecallPrecisionAUCRecallPrecision
DC 99.41 98.59 99.25 86.47 84.77 85.63 98.15 97.54 97.76
AC 97.41 95.54 96.36 84.90 82.48 82.67 96.43 94.89 95.03

VII. Conclusion

This research work identifies vulnerabilities of existing power system SEs against the FDI attack and proposes data-driven state prediction and defence strategies to ensure the data integrity of power systems. In particular, the proposed SPAD framework is formulated using DL, where a predictive SE is deployed for estimating the system states, and a KL metric-based detection leverages the predicted states. Accordingly, an attack alert is generated when the computed KL value of the predicted states is greater than the decision threshold. Numerical simulations show that under normal operating conditions of the power system, there occurs only a minimal dissimilarity between consecutive state vectors; however, the KL score rises when falsified measurement data is injected into the meter readings. The proposed SPAD framework detects FDI attacks with a higher accuracy compared with the existing FDI attack detection algorithms. In the future, given low-dimensional properties of power system measurement data and sparsity properties of the FDI attack, attack localization can be explored using data-driven techniques. Furthermore, to deter the growing challenges of data integrity cyber-attacks, a comprehensive data-driven approach of FDI attack construction and cyber defence strategy can be proposed leveraging state-of-the-art DL, reinforcement learning, or deep reinforcement learning models along with optimization approaches. Finally, ML adversarial attacks can exploit the ML-based cyber-attack detection of the smart grid SE and can inject bad data to the decision-making module. Hence, cyber-attack detection against the ML adversarial attacks is recommended as an open research issue.

REFERENCES

1

A. Abur and A. G. Exposito, Power System State Estimation: Theory and Implementation, 1st ed., Boca Raton: CRC Press, 2004, pp. 1-327. [Baidu Scholar] 

2

M. Gol and A. Abur, “LAV based robust state estimation for systems measured by PMUs,” IEEE Transactions on Smart Grid, vol. 5, no. 4, pp. 1808-1814, Jul. 2014. [Baidu Scholar] 

3

G. Wang, G. B. Giannakis, and J. Chen, “Robust and scalable power system state estimation via composite optimization,” IEEE Transactions on Smart Grid, vol. 10, no. 6, pp. 6137-6147, Nov. 2019. [Baidu Scholar] 

4

Y. Liu, P. Ning, and M. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Transactions on Information and System Security, vol. 14, no. 13, pp. 1-33, May 2011. [Baidu Scholar] 

5

H. T. Reda, A. Anwar, and A. Mahmood. (2021, Jul.). Comprehensive survey and taxonomies of false injection attacks in smart grid: attack models, targets, and impacts. [Online]. Available: https://arxiv.org/abs/2103.10594 [Baidu Scholar] 

6

Z. Zhao, Y. Huang, Z. Zhen et al., “Data-driven false data-injection attack design and detection in cyber-physical systems,” IEEE Transactions on Cybernetics, vol. 51, no. 12, pp. 6179-6187, Dec. 2021. [Baidu Scholar] 

7

Y. Chen, S. Huang, F. Liu et al., “Evaluation of reinforcement learning-based false data injection attack to automatic voltage control,” IEEE Transactions on Smart Grid, vol. 10, no. 2, pp. 2158-2169, Mar. 2019. [Baidu Scholar] 

8

H. T. Reda, A. Anwar, A. N. Mahmood et al. (2021, Mar.). A taxonomy of cyber defence strategies against false data attacks in smart grid. [Online]. Available: https://arxiv.org/abs/2103.16085 [Baidu Scholar] 

9

A. S. Zamzam, X. Fu, and N. D. Sidiropoulos, “Data-driven learning-based optimization for distribution system state estimation,” IEEE Transactions on Power Systems, vol. 34, no. 6, pp. 4796-4805, Nov. 2019. [Baidu Scholar] 

10

L. Zhang, G. Wang, and G. B. Giannakis, “Real-time power system state estimation and forecasting via deep unrolled neural networks,” IEEE Transactions on Signal Processing, vol. 67, no. 15, pp. 4069-4077, Aug. 2019. [Baidu Scholar] 

11

H. Mosbah and M. El-Hawary, “Multilayer artificial neural networks for real time power system state estimation,” in Proceedings of IEEE Electrical Power and Energy Conference, London, Canada, Oct. 2015, pp. 344-351. [Baidu Scholar] 

12

P. N. P. Barbeiro, J. Krstulovic, H. Teixeira et al., “State estimation in distribution smart grids using autoencoders,” in Proceedings of IEEE 8th International Power Engineering and Optimization Conference, Langkawi, Malaysia, Mar. 2014, pp. 358-363. [Baidu Scholar] 

13

G. Wang, G. B. Giannakis, Y. Saad et al., “Retrieval via reweighted amplitude flow,” IEEE Transactions on Signal Processing, vol. 66, no. 11, pp. 2818-2833, Jun. 2018. [Baidu Scholar] 

14

M. H. Ansari, V. T. Vakili, B. Bahrak et al., “Graph theoretical defense mechanisms against false data injection attacks in smart grids,” Journal of Modern Power Systems and Clean Energy, vol. 6, no. 5, pp. 860-871, Sept. 2018. [Baidu Scholar] 

15

N. Zivkovi and A. T. Sari, “Detection of false data injection attacks using unscented Kalman filter,” Journal of Modern Power Systems and Clean Energy, vol. 6, no. 5, pp. 847-859, Sept. 2018. [Baidu Scholar] 

16

G. Chaojun, P. Jirutitijaroen, and M. Motani, “Detecting false data injection attacks in AC state estimation,” IEEE Transactions on Smart Grid, vol. 6, no. 5, pp. 2476-2483, Sept. 2015. [Baidu Scholar] 

17

S. K. Singh, K. Khanna, R. Bose et al., “Joint-transformation-based detection of false data injection attacks in smart grid,” IEEE Transactions on Industrial Informatics, vol. 14, no. 1, pp. 89-97, Jan. 2018. [Baidu Scholar] 

18

H. Manyun, N. Ming, L. Manli et al., “Detecting false data injection attacks on modern power systems based on Jensen-Shannon distance,” in Proceedings of IEEE 8th Annual International Conference on Cyber Technology in Automation, Control, and Intelligent Systems, Tianjin, China, Jul. 2018, pp. 1154-1159. [Baidu Scholar] 

19

S. K. Singh, R. Bose, and A. Joshi, “Minimizing energy theft by statistical distance based theft detector in AMI,” in Proceedings of 24th National Conference on Communications, Hyderabad, India, Feb. 2018, pp. 1-5. [Baidu Scholar] 

20

T. Hong, P. Pinson, and S. Fan, “Global energy forecasting competition 2012,” International Journal of Forecasting, vol. 30, no. 2, pp. 357-363. [Baidu Scholar] 

21

X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, May 2010, pp. 249-256. [Baidu Scholar] 

22

D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” in Proceedings of International Conference on Learning Representations (ICLR), San Diego, USA, May 2015, pp. 1-41. [Baidu Scholar] 

23

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436-444, May 2015. [Baidu Scholar] 

24

S. Kullback and R. A. Leibler, “On information and sufficiency,” The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79-86, Mar. 1951. [Baidu Scholar] 

25

R. D. Zimmerman and C. Murillo-Sanchez. (2016, Dec.). Matpower 6.0 user’s manual. [Online]. Available: https://matpower.org/docs/MATPOWER-manual-6.0.pdf [Baidu Scholar] 

26

A. Sayghe, Y. Hu, I. Zografopoulos et al., “Survey of machine learning methods for detecting false data injection attacks in power systems,” IET Smart Grid, vol. 3, no. 5, pp. 581-595, Nov. 2020. [Baidu Scholar]