Abstract
With the development of advanced metering infrastructure (AMI), large amounts of electricity consumption data can be collected for electricity theft detection. However, electricity consumption data are severely imbalanced, which makes training a detection model challenging. This paper therefore proposes an electricity theft detection method based on ensemble learning and prototype learning, which performs well on imbalanced datasets and on abnormal data with different abnormal levels. A convolutional neural network (CNN) and long short-term memory (LSTM) are employed to obtain abstract features from electricity consumption data. The prototype of each class is obtained by averaging the abstract features of that class and is then used to predict the labels of unknown samples. Meanwhile, training the network on different balanced subsets of the training set makes the prototypes representative. Compared with mainstream methods including CNN and random forest (RF), the proposed method is shown to deal effectively with electricity theft detection when abnormal data account for only 2.5% and 1.25% of normal data. The results show that the proposed method outperforms other state-of-the-art methods.
ELECTRICITY has become essential in our daily life. However, electricity loss occurs in every process with electricity such as electricity generation, transmission, and distribution [
To restrain these economic losses, power enterprises often assign workers to check the meters of suspicious customers or upgrade the protective devices of meters. However, these traditional methods have obvious disadvantages. For example, manual inspection relies heavily on expert experience, which makes it difficult to apply in small enterprises. Besides, improving protective devices requires replacing smart meters, which is costly. Meanwhile, with the development of computer science, the methods of electricity theft are evolving quickly, such as cyber-attacks on the two-way communication network of the smart grid that require no circuit tampering [
There are three mainstream directions among current data-driven algorithms of electricity theft detection, including anomaly detection, state estimation, and supervised learning. Anomaly detection aims at seeking the similarity of normal samples or designating an index to judge the class of samples such as clustering, correlation analysis, principal component analysis (PCA), and local outlier algorithm. Compared with supervised learning, anomaly detection is capable of learning consumption pattern and information from unlabeled samples. Reference [
In this paper, a novel electricity theft detection model is proposed, which deals with imbalanced dataset well. Firstly, one-class support vector machine (OCSVM) is conducted on every user’s consumption data to ascertain their constant electricity usage. Then, CNN, long short-term memory (LSTM), and prototype learning are employed to construct the prototype of each class. Through calculating the Euclidean distance between the sample and each prototype, the label of the sample is determined by the nearest prototype. In this process, the neural network minimizes the distance of the same class and maximizes the distance of different class to make critical features learned by the model. For the unbalanced dataset, the network is trained by different subsets of the training set. Compared with some supervised learning algorithms, the proposed method has better performance on imbalanced dataset.
The main contributions of this paper are summarized as follows.
1) Prototype learning and ensemble learning are applied to electricity theft detection for the first time. In the real world, abnormal users are far fewer than normal users, which causes a large imbalance in electricity consumption datasets. In this case, traditional theft detection based on artificial intelligence (AI) performs poorly due to the risk of overfitting and insufficient feature extraction. According to the experiments, the proposed method can still distinguish normal and abnormal data when the other methods fail to do so.
2) Apart from the imbalance in the amount of abnormal data, the influence of abnormal data with different abnormal levels is also considered. Slight electricity theft causes only a small reduction in consumption data, which lowers the electricity charge as well as the risk of being detected, so such abnormal data are highly similar to normal consumption data. Compared with traditional deep learning, the proposed method performs better on these samples that are difficult to detect. The design of prototype learning significantly improves the performance of the network on this kind of sample.
3) OCSVM is utilized to verify the constant consumption pattern of single customers. In this case, the features from consumption data and the electricity theft dataset are reliable enough for model training. This process can be considered as evidence supporting the model learning process.
The rest of the paper is organized as follows. In Section II, the characteristics of electricity consumption and electricity theft are analyzed. In Section III, a novel electricity theft detection method is proposed. Some experiments which verify the performance of the proposed method in imbalanced dataset will be narrated in Section IV. Finally, Section V concludes this paper.
Electricity theft is behavior intended to avoid or reduce electricity cost. Electricity theft can be summarized into three classes: tampering with electric energy meters, bypassing them, and false data injection. These behaviors leave clues in the consumption data such as an abnormal maximum value or an abnormal mean value. If a customer’s behavior is normal, his or her electricity usage remains largely constant due to a fixed lifestyle. Therefore, finding the features that separate abnormal usage from normal usage is the key to detecting electricity theft. In this section, the characteristics of customers’ consumption data are analyzed by OCSVM, which is utilized to verify the constant usage of most customers.
In this experiment, the public dataset containing 536 days’ electricity consumption data of 4225 residential customers, released by Electric Ireland and Sustainable Energy Authority of Ireland in January 2012 is going to be utilized [

Fig. 1 Trends of daily and weekly consumption data. (a) Daily consumption data sampled in one hour. (b) Weekly consumption data sampled in twelve hours.
The above conclusions come from visual inspection of these curves without precise calculation. To verify the constant consumption pattern of most customers, OCSVM is applied to the electricity consumption data. As a classical machine learning method for novelty detection, OCSVM establishes a boundary around normal samples and determines the label of a sample from its position in feature space [

Fig. 2 Non-outlier rate of OCSVM for all customers in raw dataset.
Due to the lack of abnormal data in the original datasets, the abnormal data are constructed based on the characteristics of real electricity theft. Tampering with the circuit permanently changes the measurements of the smart meter, for example by lowering all measurements in the same proportion or setting measurements to zero during some period. Bypassing the circuit means using electricity directly. In this case, the power meter reads zero all the time. Compared with the above two types of electricity theft, false data injection brings various changes to the measurements. Because the electricity price differs at different times, peak-load shifting and replacing all measurements with the mean value can help thieves reduce the cost of the power used. There is no reduction in total electricity consumption but a large reduction in cost. Meanwhile, some thieves choose to add noise to these data to create various fluctuations. On account of the analyses above and referring to the past abnormal function [
Table I lists five specific abnormal functions, where is a vector including 48 measurements of daily electricity consumption data; and is the

Fig. 3 Daily consumption data in normal usage and abnormal usage.
Class | Abnormal function |
---|---|
Class 1 | , , where is the uniform sample operation |
Class 2 | where start means the start time of electricity theft and duration means the lasting time of electricity theft |
Class 3 | , where mean is the average of value in x |
Class 4 | , |
Class 5 |
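The kinds of manipulation described in the text (proportional lowering, zeroing during a time window, and replacement by the mean value) can be sketched as follows. This is only an illustration of the prose description: the exact formulas of Table I are not reproduced here, and the function names, the sampling range of the coefficient, and the example window are assumptions.

```python
import random

def scale_down(x, alpha):
    """Lower all measurements in the same proportion (0 < alpha < 1)."""
    return [alpha * v for v in x]

def zero_window(x, start, duration):
    """Set the measurements to zero from `start` for `duration` samples."""
    return [0.0 if start <= t < start + duration else v for t, v in enumerate(x)]

def flatten_to_mean(x):
    """Replace every measurement with the daily mean (total consumption is unchanged)."""
    m = sum(x) / len(x)
    return [m] * len(x)

rng = random.Random(0)
x = [float(t % 5 + 1) for t in range(48)]   # toy day with 48 half-hourly measurements
alpha = rng.uniform(0.1, 0.8)               # assumed sampling range, for illustration only
scaled = scale_down(x, alpha)
zeroed = zero_window(x, start=10, duration=6)
flat = flatten_to_mean(x)
```

Replacing the measurements with their mean leaves the daily total unchanged, which matches the observation that this kind of theft reduces cost rather than recorded consumption.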
In the real world, the obvious characteristic of electricity consumption datasets is the small ratio of electricity theft data to normal electricity data. However, traditional supervised learning methods have difficulty in dealing with this characteristic. Many methods have been proposed for this case, but their effect is limited. Compared with data augmentation, ensemble learning makes full use of the existing dataset by training several weak classifiers and synthesizing their predictions. However, constructing weak classifiers based on neural networks costs a large amount of time and memory. Therefore, this paper focuses on improving the accuracy of theft detection when abnormal samples are few.
A weak classifier is a classifier whose accuracy is better than random prediction. The training sets of different weak classifiers are different subsets of the total training set. As a result, the trained classifiers learn different features of the training set and may give contrary predictions for the same sample. Meanwhile, most classifiers give the right prediction, which corrects the mistakes of the few. By considering these weak classifiers and their predictions together, a strong classifier is produced.
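The benefit of combining weak classifiers can be illustrated with a small calculation. The sketch below assumes a simple majority vote over independent classifiers of equal accuracy; both the voting rule and the independence assumption are illustrative and are not specified in this paper.

```python
from collections import Counter
from math import comb

def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def majority_accuracy(n_classifiers, p_correct):
    """Probability that more than half of the classifiers are right.

    Assumes a binary task, an odd number of independent classifiers, and
    the same accuracy p_correct for each of them (illustrative assumptions).
    """
    need = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * p_correct**k * (1 - p_correct) ** (n_classifiers - k)
        for k in range(need, n_classifiers + 1)
    )

label = majority_vote([1, 1, 0, 1, 0])   # three of five classifiers vote 1
strong = majority_accuracy(11, 0.6)      # eleven classifiers, each 60% accurate
```

Under these assumptions, the voted prediction is more accurate than any single member, which is the sense in which the few wrong classifiers are corrected by the majority.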
However, it takes a long time for a neural network to train its parameters, and combining multiple deep neural networks has a high memory requirement. Therefore, a single deep neural network is used to replace all weak classifiers in this paper. To ensure a smooth training process, a balanced subset of the total training set is extracted, which contains all abnormal samples and the same number of normal samples. The balance of the training subset keeps the network from preferring a certain class. At the same time, using different training subsets in different epochs helps the parameters of the neural network avoid falling into local optima. After many epochs of training, all samples are used by the network. Because this training method is similar to the design of batch training, it is called batch ensemble learning.
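The sampling step of batch ensemble learning can be sketched as below. It is a minimal illustration of drawing a fresh balanced subset for each epoch; the function name, the string placeholders for samples, and the label encoding (0 for normal, 1 for abnormal) are assumptions, and the network update itself is omitted.

```python
import random

def balanced_subset(normal, abnormal, rng):
    """All abnormal samples plus an equal number of randomly drawn normal samples."""
    picked = rng.sample(normal, len(abnormal))
    subset = [(x, 0) for x in picked] + [(x, 1) for x in abnormal]
    rng.shuffle(subset)
    return subset

rng = random.Random(0)
normal = [f"n{i}" for i in range(900)]    # placeholders for normal samples
abnormal = [f"a{i}" for i in range(90)]   # placeholders for abnormal samples

# One balanced subset per epoch; each epoch sees different normal samples.
epochs = [balanced_subset(normal, abnormal, rng) for _ in range(3)]
```

Every abnormal sample appears in every epoch, while the normal samples rotate, so over many epochs the whole training set is used without any single epoch being imbalanced.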
Before the training process, the raw data need to be preprocessed because different value ranges may influence the convergence speed and generalization performance of the model. There are two common standardization methods.
Z-score standardization rescales the raw data to zero mean and unit standard deviation. It is expressed as:
$$x' = \frac{x - \mu}{\sigma} \tag{1}$$
where $\mu$ and $\sigma$ represent the mean and standard deviation of $x$, respectively. This standardization may worsen the performance of the model if the raw data do not follow a Gaussian distribution.
Min-max scaling maps the raw data linearly onto the interval [0,1]. It is expressed as:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$
where $x_{\max}$ and $x_{\min}$ represent the maximum and minimum values of $x$, respectively. Compared with z-score standardization, this method is more widely used and places no precondition on the raw data. After testing both methods, we choose the second to normalize our dataset.
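The two standardization methods can be compared on a toy vector. The sketch below uses the population standard deviation and returns zeros for a constant vector; both choices are assumptions made for the example rather than details given in the paper.

```python
from statistics import mean, pstdev

def z_score(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(x), pstdev(x)
    if sigma == 0:                      # constant vector: nothing to scale
        return [0.0] * len(x)
    return [(v - mu) / sigma for v in x]

def min_max(x):
    """Map the values linearly onto [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:                        # constant vector: nothing to scale
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]

scaled = min_max([2.0, 4.0, 6.0])       # -> [0.0, 0.5, 1.0]
standardized = z_score([2.0, 4.0, 6.0])
```

Min-max scaling only needs the observed range of the data, which is consistent with the remark that it places no precondition on the distribution of the raw data.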
Prototype learning [
(3) |

Fig. 4 Basic construction of prototype network.
where is the class of consumption data; is the embedded feature of the
Then, the representations of the samples in Q are used to predict their classes by calculating the Euclidean distance between each representation and every prototype and finding the nearest prototype. From these distances, the probability of each class is calculated by a softmax layer. With the cross-entropy function and back propagation, the parameters are optimized in the right direction. The loss function is:
$$L = -\sum_{k} y_k \log p_k \tag{4}$$
where $y$ is the one-hot coding of the true label, with components $y_k$; and $p$ is the corresponding probability vector, with components $p_k$. As the loss function is optimized, the distance between representations from the same class decreases while that between representations from different classes increases.
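The prediction and loss steps described above can be sketched on toy 2-D features. The sketch assumes that the softmax is applied to the negative Euclidean distances, which is the usual choice in prototype networks but is not spelled out in the text; the function names and the toy feature values are also assumptions, and the embedding network is omitted.

```python
import math

def class_prototypes(features, labels):
    """Prototype of each class: the mean of the embedded features of that class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for j, v in enumerate(f):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def class_probabilities(feature, prototypes):
    """Softmax over negative Euclidean distances to the prototypes (assumed form)."""
    dists = {y: math.dist(feature, p) for y, p in prototypes.items()}
    nearest = min(dists.values())       # shift for numerical stability
    exps = {y: math.exp(-(d - nearest)) for y, d in dists.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def cross_entropy(probs, true_label):
    """Cross-entropy with a one-hot target reduces to -log of the true-class probability."""
    return -math.log(probs[true_label])

features = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = [0, 0, 1, 1]
prototypes = class_prototypes(features, labels)
probs = class_probabilities([1.0, 1.0], prototypes)
predicted = max(probs, key=probs.get)
```

The loss is small when the query lies close to the prototype of its own class and far from the others, which is how minimizing it pulls same-class representations together.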
In a traditional CNN with a softmax layer and cross-entropy function, the samples are often mapped to a certain area in feature space. In this case, the distance between features from the same class may be larger than that between features from different classes. In this paper, the samples from the same class are mapped toward a single point in feature space. This design makes the prototype representative and improves the robustness of the network.
The quality of the prototype depends on the distribution of the dataset and the ability of the network. Following current deep learning practice, CNN and LSTM are utilized to extract the features of samples. In this subsection, CNN focuses on extracting the periodicity of raw samples, while LSTM focuses on extracting their global features. The detailed structures of the two subnetworks are described as follows.
According to the analysis of consumption data in different days, some characteristics of consumption pattern such as the maximum value, the minimum value, their corresponding time indices, and the fluctuations can be revealed. As the variant of RNN, LSTM [
The construction of LSTM cell is shown in

Fig. 5 Construction of LSTM cell.
The red route can be regarded as forgetting information. The forgetting signal is constructed as:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{5}$$
where $f_t$ is the forgetting signal; $\sigma$ is the sigmoid function, which maps each element into the range between 0 and 1; $x_t$ is the current input; $h_{t-1}$ is the previous output; $W_f$ and $U_f$ are trainable coefficient matrices; and $b_f$ is a trainable bias matrix. Therefore, the element-wise product of $f_t$ and the previous cell state $c_{t-1}$ drops some information in $c_{t-1}$.
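The green route can be regarded as recording information. The recording signal is built from the following formulations:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{6}$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{7}$$
where $i_t$ is the recording signal, which is similar to the forgetting signal $f_t$; $\tilde{c}_t$ is the abstract feature of the current input $x_t$; $h_{t-1}$ is the previous output; $W_i$, $U_i$, $W_c$, and $U_c$ are trainable coefficient matrices; and $b_i$ and $b_c$ are trainable bias matrices. Through the element-wise product of $i_t$ and $\tilde{c}_t$, the background information decays less and irrelevant information is removed.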
After that, the cell state retains the features about the relationship between the past and current inputs. However, finite matrices can retain only finite information, so information from a long time ago will be overwritten. Therefore, the blue route filters the information of the cell state to produce the output, so that information from long before can still be kept. In the experiment, the last output is utilized to represent the global feature of the sample.
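The three routes can be traced in a scalar LSTM step. This is a sketch of the standard LSTM formulation, not a reproduction of the network used in the experiments: the parameter names, the scalar state, the toy parameter values, and the form of the output (blue) route are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell in the standard formulation."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forgetting signal (red route)
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # recording signal (green route)
    g = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # abstract feature of the input
    c = f * c_prev + i * g                                   # updated cell state
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output filter (blue route)
    h = o * math.tanh(c)                                     # output of this step
    return h, c

names = ["wf", "uf", "bf", "wi", "ui", "bi", "wc", "uc", "bc", "wo", "uo", "bo"]
params = {name: 0.5 for name in names}   # arbitrary toy values

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1, 0.9]:           # a short toy consumption sequence
    h, c = lstm_step(x, h, c, params)
global_feature = h                        # the last output summarizes the sequence
```

Only the final output is kept here, mirroring the use of the last output as the global feature of a sample.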
Through the novelty detection on weekly consumption data by OCSVM and the observation of consumption data in different weeks, the periodicity of electricity consumption for most customers can be verified. For example, consumption on weekends is usually higher than on weekdays. In LSTM, the consumption data are handled in order, so the relation between values separated by a fixed interval tends to be ignored. Therefore, CNN is utilized to extract the periodicity of electricity usage. In this subsection, the daily electricity consumption data are folded into a 2-D shape. By sliding a convolution window, features about the relation among the consumption data inside the window can be extracted. The CNN consists of five similar blocks, which are listed in Table II.
Layer | Parameter | Number |
---|---|---|
Conv2d | (C, 3, 3) or (C, 5, 3) | 1, 2, 3, 4, 5 |
ReLU | 0 | 1, 2, 3, 4, 5 |
AvgPool2d | 0 | 1 |
BatchNorm2d | 0 | 1, 3 |
Table II lists all parameters of the blocks in the CNN, where C in the parameter is the number of channels. In this table, the numbers indicate the blocks in which a layer exists. A two-dimensional convolutional layer (Conv2d) exists in all blocks for extracting features. In general, convolution kernels of and (5,5) are conducive to the performance of the network. Considering the actual data, the size of all convolutional kernels in the last Conv2d is (5,3). The rectified linear unit (ReLU) following each Conv2d increases the nonlinearity of the network and prevents the CNN from degenerating into a linear mapping. Besides, a two-dimensional average pooling layer (AvgPool2d) is utilized to adjust the shape of the input while retaining most of its information, which helps reduce the depth of the network. Meanwhile, a two-dimensional batch normalization layer (BatchNorm2d) is utilized to speed up convergence.
After processing by the LSTM and the CNN, the two 1-D feature vectors are concatenated, and prototypes are generated by calculating the mean of the features of each class. However, the concatenated prototype would be too long, which increases the time cost. Therefore, a fully connected layer is used to adjust the length of the prototype and the proportion of the two features.
The framework of the proposed algorithm is shown in

Fig. 6 Framework of proposed algorithm.
In this section, training process and parameter optimization will be narrated in detail. To demonstrate the performance of the proposed method, some experiments are set including parameter optimization, comparing experiment, sensitivity analysis of abnormal level, and ablation experiment. Besides these, three metrics including true positive rate (TPR), false positive rate (FPR), and area under curve (AUC) are chosen to evaluate the performance of the proposed method.
According to the abnormal functions in Table I, we have a benign dataset and five abnormal datasets whose shapes are all 4225×536×48, where 4225, 536, and 48 represent the number of customers, the number of days, and the number of samples per day, respectively. The training set, test set, and validation set are sampled from the benign dataset and the five abnormal datasets. Firstly, 2760 customers’ indices are randomly chosen, including 1800 customers for the training set, 480 customers for the test set, and 480 customers for the validation set. The same procedure is then applied to each of the three sets. Taking the training set as an example, the 1800 customers are randomly divided into six parts, where the normal class accounts for half and each abnormal class accounts for 10% of all customers. The electricity consumption of the customers belonging to each class is collected to assemble the training set. With this method, the normal data and the corresponding abnormal data of the same customer are never available to the network at the same time, which is more practical. Meanwhile, as the generalization of the model needs to be verified, different customers’ future electricity consumption is used for testing and validation in this paper. The following experiments focus on the ability of the proposed model to handle imbalanced datasets. Therefore, only a small part of the abnormal data in the training set is utilized. Meanwhile, a balanced dataset that is not used for training is used to test the proposed model.
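The customer-level split described above can be sketched as follows. The counts come from the text; the function names, the fixed random seeds, and the use of 0 as the index of the normal class are assumptions made for the illustration.

```python
import random

def split_customers(n_customers=4225, sizes=(1800, 480, 480), seed=0):
    """Randomly choose disjoint customer indices for training, test, and validation."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_customers), sum(sizes))
    splits, start = [], 0
    for size in sizes:
        splits.append(chosen[start:start + size])
        start += size
    return splits

def assign_classes(customers, n_abnormal_classes=5, seed=0):
    """Half of the customers supply normal data; the rest are split evenly
    among the abnormal classes (10% of all customers each for five classes)."""
    rng = random.Random(seed)
    shuffled = list(customers)
    rng.shuffle(shuffled)
    n_normal = len(shuffled) // 2
    per_class = (len(shuffled) - n_normal) // n_abnormal_classes
    groups = {0: shuffled[:n_normal]}            # class 0: normal
    for c in range(1, n_abnormal_classes + 1):   # classes 1..5: abnormal
        begin = n_normal + (c - 1) * per_class
        groups[c] = shuffled[begin:begin + per_class]
    return groups

train, test, val = split_customers()
train_groups = assign_classes(train)
```

Because each customer belongs to exactly one group, a customer's normal curve and its manipulated counterpart never both appear in the same set.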
In the experiment, three performance metrics, i.e., TPR, FPR, and AUC, are considered [
Table III presents the confusion matrix, which stores all prediction outcomes.
Label | Predicted positive | Predicted negative |
---|---|---|
Positive | TP | FN |
Negative | FP | TN |
From this confusion matrix, the following three metrics can be calculated, which also helps the calculation of AUC.
$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{8}$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{9}$$
$$\mathrm{Diff} = \mathrm{TPR} - \mathrm{FPR} \tag{10}$$
where TPR indicates the ratio of true positive samples to all positive samples; and FPR indicates the ratio of false positive samples to all negative samples. In electricity theft detection, our purpose is to find all abnormal data while avoiding predicting normal samples as abnormal. If TPR is high and FPR is low, the classifier performs well on the dataset. However, it is difficult to drive both indices in the ideal direction at the same time. When an algorithm gives many positive predictions, the ratio of wrong predictions will inevitably rise. Therefore, Diff, the difference between TPR and FPR, is also considered to evaluate the performance of our method.
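The three metrics follow directly from the four entries of the confusion matrix. The counts below are made up for illustration, and Diff is taken as TPR minus FPR, which matches the values reported in the result tables.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR, Diff) from the entries of a binary confusion matrix."""
    tpr = tp / (tp + fn)   # share of abnormal samples that are found
    fpr = fp / (fp + tn)   # share of normal samples flagged as abnormal
    return tpr, fpr, tpr - fpr

# Hypothetical counts: 100 abnormal and 100 normal test samples.
tpr, fpr, diff = rates(tp=90, fn=10, fp=5, tn=95)
```

A classifier that labels everything as normal gets TPR = FPR = 0 and therefore Diff = 0, so a low FPR alone does not indicate good detection.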
However, even if two methods obtain the same TPR and FPR, there are still differences between them. For example, when model A assigns a positive sample a positive probability of 0.9 and model B assigns the same sample a positive probability of 0.6, both models give the sample a positive prediction. If a random sample that was never used in training needs to be predicted, model B has less confidence to give a definite prediction, which can also be regarded as part of the ability of the model. Therefore, AUC is used to check the confidence of the proposed method. Compared with TPR and FPR, this index accounts for the score of a randomly chosen sample. In general, an excellent method gives different scores to different classes, such as scores close to 0 for negative samples and scores close to 1 for positive samples. Therefore, AUC shows whether the method distinguishes the classes of samples well. AUC is calculated as the mean of TPR over different thresholds from 0 to 1. Before AUC is calculated, a series of boundaries needs to be set. When the positive probability of a sample is not less than a boundary, the model gives the sample a positive prediction. The following formulation is the expression of AUC:
(11) |
where and denote the values of TPR and FPR when the boundary is , respectively; and is the number of boundaries.
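One common way to compute AUC from a series of boundaries is the trapezoidal rule over the resulting (FPR, TPR) points. The sketch below uses that rule as a stand-in; it is not necessarily the exact expression of the paper, and the evenly spaced boundaries, the rule that a sample is predicted positive when its score is at least the boundary, and the function names are assumptions.

```python
def roc_points(scores, labels, n_boundaries=101):
    """(FPR, TPR) pairs for evenly spaced boundaries between 0 and 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = {(0.0, 0.0), (1.0, 1.0)}   # end points of the ROC curve
    for k in range(n_boundaries):
        b = k / (n_boundaries - 1)
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= b)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= b)
        points.add((fp / n_neg, tp / n_pos))
    return sorted(points)

def auc(points):
    """Area under the piecewise-linear curve through the sorted points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [0, 0, 1, 1]
separated = auc(roc_points([0.1, 0.2, 0.8, 0.9], labels))   # positives score higher
reversed_ = auc(roc_points([0.9, 0.8, 0.2, 0.1], labels))   # positives score lower
```

Scores that rank every positive sample above every negative one give an area of 1, while the reversed ranking gives 0, regardless of which single boundary would be used for the final prediction.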
Before the performance of the proposed method is compared with other methods, four comparison experiments are set up to choose the best parameters. Because prediction is based on distances in feature space, we consider the number of prototype dimensions to be more important than other parameters such as batch size and learning rate. Therefore, four prototype dimensions are tested: 16, 32, 64, and 128. Because the proportion of each class is equal, accuracy is simply chosen as the metric to compare the performance of the network.

Fig. 7 Performance of network with different lengths of prototype.
According to
To verify the superiority of the proposed algorithm, five other classification methods that have been used for electricity theft detection are applied to the given training set. These five methods and their concrete parameters are introduced as follows.
1) SVM [
2) RF [
3) Adaboost [
4) CNN [
5) Deep belief network (DBN): DBN is a probabilistic generative model which consists of multiple restricted Boltzmann machines (RBMs) and fully connected layers. Because unsuitable initial parameters can make the model get stuck at a local optimum, pre-training is conducted on the RBMs to obtain a good mapping function that loses little information in the process of mapping. This pre-training can be regarded as the fine adjustment of initial parameters. After pre-training, the DBN is trained by backpropagation.
Table IV shows the concrete hyper-parameters of the compared methods. The hyper-parameters of SVM, RF, Adaboost, and DBN refer to the existing research. The hyper-parameter of CNN is the same as the CNN section of the proposed method. In general, the existing research deals with the imbalance of dataset by two methods, i.e., enlarging abnormal datasets and giving different weights to different classes. In our experiments, the second method is utilized on CNN and RF. Abnormal data are given larger weights than normal data according to the ratio of abnormal data to normal data.
Compared method | Hyper-parameter |
---|---|
SVM [ | Kernel is “”, Gamma is “”, , and weight is the rate of abnormal data and normal data |
RF [ | The number of is 40, is “entropy”, and |
RF (weight) [ | Weight is the same with SVM, and other hyper-parameters are the same with RF |
Adaboost [ | , , classifier is “DT” |
CNN (weight) [ | Weight is the same with SVM, and Epoch is 100 |
DBN | The number of RBM is 3, and the number of neurons in each RBM is |
As stated above, there are 900 normal customers in our training set. Meanwhile, different numbers of abnormal customers in training set are utilized to form the imbalanced set, which are 10%, 5%, 2.5%, and 1.25% the size of normal data, respectively. For test section, the same balanced datasets are utilized to test all of the methods. The classifying result of all of the methods for different imbalanced datasets is shown in Table V.
Table V shows the TPR, FPR, Diff, and AUC of different methods under different degrees of dataset imbalance. In this table, the first four methods are statistics-based methods while the last three are based on neural networks. Comparing Diff across all methods shows that only the proposed method and CNN (weight) succeed in labeling most samples correctly when the ratio is 10%. For the first five methods, the low TPR together with an FPR close to 0 indicates that many abnormal samples are mistakenly judged as normal. However, their high AUC shows that, when the probability that each sample belongs to a certain class is calculated, there is still a clear boundary between abnormal and normal samples. This may be because SVM, RF, and Adaboost are non-parametric methods, which depend heavily on the distribution of training samples. If the difference between abnormal and normal samples in the training set is not obvious to these methods, they fail to separate the samples in the test set completely and instead judge abnormal samples as normal with a lower probability than real normal samples. As for DBN, although it has RBMs and sigmoid activation layers to obtain features, its small capacity makes it difficult to extract useful features and distinguish samples. Consequently, the performance of the first five methods becomes worse as the ratio decreases. By contrast, deep learning methods such as CNN (weight) and the proposed method can deal with imbalanced datasets, and the proposed method achieves better performance than CNN (weight). Owing to batch ensemble learning, only balanced subsets of the training set are fed into the network at each epoch of training, so balanced abstract features are used by the proposed method to optimize its parameters. On the contrary, CNN (weight) obtains a large number of features from normal samples and few features from abnormal samples, which results in overfitting toward normal data. This is also reflected in Table V, where the gap in Diff between CNN (weight) and the proposed method becomes larger as the ratio is reduced.
Method | TPR (%) (ratio of abnormal data to normal data is 10%) | FPR (%) (10%) | Diff (%) (10%) | AUC (10%) | TPR (%) (ratio of abnormal data to normal data is 5%) | FPR (%) (5%) | Diff (%) (5%) | AUC (5%) | TPR (%) (ratio of abnormal data to normal data is 2.5%) | FPR (%) (2.5%) | Diff (%) (2.5%) | AUC (2.5%) | TPR (%) (ratio of abnormal data to normal data is 1.25%) | FPR (%) (1.25%) | Diff (%) (1.25%) | AUC (1.25%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SVM | 36.82 | 0.58 | 36.24 | 0.8491 | 28.17 | 0.27 | 27.90 | 0.8598 | 13.33 | 0.08 | 13.25 | 0.8491 | 9.97 | 0.15 | 9.82 | 0.8367 |
RF | 66.41 | 0.35 | 66.06 | 0.9788 | 51.78 | 0.07 | 51.71 | 0.9671 | 36.94 | 0.12 | 36.82 | 0.9336 | 16.00 | 0.10 | 15.90 | 0.8879 |
RF (weight) | 61.09 | 0.26 | 60.83 | 0.9792 | 40.78 | 0.06 | 40.72 | 0.9681 | 26.01 | 0.05 | 25.96 | 0.9338 | 9.16 | 0.09 | 9.07 | 0.9056 |
Adaboost | 7.67 | 0.13 | 7.54 | 0.9202 | 3.76 | 0 | 3.76 | 0.8999 | 0.31 | 0.01 | 0.30 | 0.9254 | 0 | 0 | 0 | 0.8729 |
DBN | 51.85 | 1.13 | 50.72 | 0.8453 | 46.60 | 0.76 | 45.84 | 0.8135 | 0 | 0 | 0 | 0.5457 | 0 | 0 | 0 | 0.5342 |
CNN (weight) | 91.91 | 4.94 | 86.97 | 0.9803 | 84.76 | 4.10 | 80.66 | 0.9665 | 83.55 | 5.50 | 78.05 | 0.9672 | 79.89 | 10.64 | 69.25 | 0.9253 |
Proposed | 96.32 | 2.52 | 93.80 | 0.9837 | 92.80 | 2.34 | 90.47 | 0.9709 | 94.97 | 5.20 | 89.76 | 0.9654 | 95.50 | 10.34 | 85.16 | 0.9405 |
In this experiment, the ability of dealing with the samples which are difficult to be detected is tested. Compared with class 3, class 4, and class 5, the abnormal levels of class 1 and class 2 are mutable for the different values of coefficients and . As shown in
Because and are set as research objects, the classes of abnormal data in our training set are only class 1 and class 2 which account for 50%, respectively. Meanwhile, the classes of abnormal data in the validation set and test set are the same with the training set. Besides, the ratio of abnormal data to normal data is 10% in the training set. According to twelve groups’ experiment, the result is shown in

Fig. 8 AUC and Diff for different groups of α and β.
To verify the good performance of the proposed method, CNN (weight) is chosen to conduct partial experiment. (, ), (, ), and (, ) represent three abnormal levels. For abnormal data at these three levels, the performance of CNN (weight) and the proposed method is shown in Table VII.
Parameter | Value |
---|---|
α | |
β |
Method | (, ) | (, ) | (, ) | |||
---|---|---|---|---|---|---|
Diff (%) | AUC | Diff (%) | AUC | Diff (%) | AUC | |
CNN (weight) | 96.80 | 0.9988 | 68.92 | 0.9220 | 41.10 | 0.7806 |
Proposed | 98.17 | 0.9938 | 73.12 | 0.9286 | 54.73 | 0.8394 |
When the abnormal level is reduced from (, ) to (, ), Diff of CNN (weight) decreases by 27.88 percentage points, which is about 1.1 times the reduction of the proposed method. When the abnormal level is reduced from (, ) to (, ), the reduction in Diff of CNN (weight) is about 1.51 times that of the proposed method. Meanwhile, AUC of CNN (weight) also drops sharply. While AUC of the proposed method drops by only about 0.15 across the three levels, AUC of CNN (weight) drops by about 0.22. When the similarity between abnormal samples and normal samples increases, the performance of CNN (weight) deteriorates faster than that of the proposed method. Therefore, it is concluded that the proposed method has greater robustness in dealing with abnormal data at low abnormal levels.
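The comparison can be recomputed from the Diff and AUC values in Table VII. The short script below only repeats that arithmetic; the dictionary layout and variable names are assumptions, and the three tuple positions correspond to the three abnormal levels in table order.

```python
# Diff (%) and AUC at the three abnormal levels, copied from Table VII.
diff = {"CNN (weight)": (96.80, 68.92, 41.10), "Proposed": (98.17, 73.12, 54.73)}
auc = {"CNN (weight)": (0.9988, 0.9220, 0.7806), "Proposed": (0.9938, 0.9286, 0.8394)}

def drops(values):
    """Reductions between consecutive abnormal levels."""
    return [a - b for a, b in zip(values, values[1:])]

cnn_drops = drops(diff["CNN (weight)"])        # about 27.88 and 27.82 points
proposed_drops = drops(diff["Proposed"])       # about 25.05 and 18.39 points
first_ratio = cnn_drops[0] / proposed_drops[0]
second_ratio = cnn_drops[1] / proposed_drops[1]

# Total AUC change from the first to the third abnormal level.
cnn_auc_drop = auc["CNN (weight)"][0] - auc["CNN (weight)"][2]
proposed_auc_drop = auc["Proposed"][0] - auc["Proposed"][2]
```

Both ratios are greater than 1, meaning that Diff of CNN (weight) falls faster than Diff of the proposed method at each reduction of the abnormal level.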
In this experiment, the contributions of prototype learning and batch ensemble learning to electricity theft detection are tested when the dataset is severely imbalanced. There are three models in this experiment including , , and the proposed model. To avoid the influence of irrelevant variables, the same training, validation, and test datasets are used for all models. As the ratio of abnormal data to normal data decreases, the performance of every model becomes worse. Therefore, to highlight the differences among the models, the amount of abnormal data used for training is only 2.5% of the normal data.
Table VIII shows Diff and AUC of different models when the ratio of abnormal data to normal data is 2.5%. The first model is set as the basic model, which obtains the lowest Diff and AUC. With the addition of batch ensemble learning, Diff grows by 24.55 percentage points and AUC grows by 0.061. Because balanced subsets are used in the training process, the overfitting of the basic model is alleviated. However, as training goes on, overfitting eventually happens because all normal samples have been fed into the network. With the addition of prototype learning, Diff increases to 89.76% while AUC decreases to 0.9654. This is attributed to the way features are utilized. The basic model prefers to extract the relevant features to determine the labels of samples. On the contrary, prototype learning uses the idea of clustering to place samples of the same class at the same position in feature space. In this process, some weakly relevant features are utilized by the basic model to predict more samples correctly, which weakens the generalization of the model. Prototype learning pays more attention to the similarity of features than to partial information.
Model | Diff (%) | AUC |
---|---|---|
Basic model | 60.11 | 0.9154 |
Basic model with batch ensemble learning | 84.66 | 0.9763 |
Proposed method | 89.76 | 0.9654 |
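The batch ensemble learning component trains the network on balanced subsets of the training set. The following is a minimal sketch of one way to build such subsets, not the exact procedure of this paper. The function name `balanced_subsets`, the use of sample identifiers, and the partitioning of normal samples without replacement are illustrative assumptions.

```python
import random


def balanced_subsets(normal, abnormal, seed=0):
    """Pair every chunk of normal samples with all abnormal samples.

    Assumption: normal samples are shuffled once and partitioned into
    chunks whose size equals the number of abnormal samples, so each
    subset is balanced and every normal sample is used at most once.
    """
    rng = random.Random(seed)
    shuffled = list(normal)
    rng.shuffle(shuffled)
    k = len(abnormal)
    subsets = []
    # Leftover normal samples that cannot fill a whole chunk are skipped.
    for start in range(0, len(shuffled) - k + 1, k):
        chunk = shuffled[start:start + k]
        subsets.append((chunk, list(abnormal)))
    return subsets


# Toy setting mirroring the 2.5% ratio: 1000 normal and 25 abnormal samples.
normal_ids = list(range(1000))
abnormal_ids = list(range(1000, 1025))
subsets = balanced_subsets(normal_ids, abnormal_ids)
print(len(subsets))  # 40 balanced subsets of 25 normal + 25 abnormal samples
```

Under this assumption, one pass over all subsets feeds every normal sample into the network once, which is consistent with the observation that overfitting eventually appears after all normal samples have been used.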
In this paper, an electricity theft detection method based on ensemble learning and prototype learning is proposed, which performs well on imbalanced datasets. Through feature embedding, the abstract feature of every sample is obtained to construct the prototype of each class. After that, the label of each sample is determined by searching for the nearest prototype in the feature space. In the training process, by extracting balanced subsets from the total training set, the bias of the model is restrained and the generalization of the model improves. To verify the performance of the proposed method on imbalanced datasets, experiments including parameter optimization, a comparative experiment, a sensitivity analysis of the abnormal level, and an ablation study are conducted. Compared with mainstream ensemble learning and deep learning methods, the proposed method shows the strongest classification ability. When the abnormal level of the abnormal data decreases, the proposed method is less affected, whereas the compared model loses its classification ability. Although the proposed method performs well, it also has disadvantages, such as a less stable training process than CNN. In our analysis, if consumption data from customers with the same occupation can be obtained, the proposed method can achieve better results using fewer abnormal data. In our opinion, future work on electricity theft detection should focus on imbalanced datasets and on how to combine data from different sources, such as occupation and permanent resident population, to improve the detection model.
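The prototype construction and nearest-prototype prediction summarized above can be sketched as follows. This is an illustrative example under stated assumptions rather than the implementation used in this paper: the embeddings are hand-written 2-D toy vectors instead of CNN-LSTM outputs, Euclidean distance is assumed as the metric, and the names `class_prototypes` and `predict` are hypothetical.

```python
import math


def class_prototypes(embeddings, labels):
    """Return the mean embedding (prototype) of each class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict(embedding, prototypes):
    """Assign the label of the nearest prototype (Euclidean distance assumed)."""
    return min(prototypes,
               key=lambda label: math.dist(embedding, prototypes[label]))


# Toy 2-D embeddings standing in for the abstract features of the network.
embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["normal", "normal", "abnormal", "abnormal"]
prototypes = class_prototypes(embeddings, labels)

print(prototypes["normal"])               # [0.0, 1.0]
print(prototypes["abnormal"])             # [5.0, 4.0]
print(predict([1.0, 1.0], prototypes))    # normal
print(predict([4.0, 3.0], prototypes))    # abnormal
```

Because a prediction depends only on the distance to each class mean, a single weakly relevant feature has limited influence on the label, which matches the intuition given for prototype learning in the ablation study.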