

Data-Based Fault Diagnosis of Power Cable System: Comparative Study of k-NN, ANN, Random Forest, and CART

Iftikhar Ahmad ∗  Hiroyuki Mabuchi ∗  Manabu Kano ∗  Shinji Hasebe ∗  Yoshikazu Inoue ∗∗  Hiroaki Uegaki ∗∗

∗ Kyoto University, Nishikyo-ku, Kyoto 615-8510, Japan (e-mail: [email protected])

∗∗ Kansai Electrical Safety Inspection Association, Moriguchi, Osaka 570-0005, Japan

Abstract: Ground faults are major problems of power cable systems. They are caused by various accidents and need to be fixed as early as possible. Therefore, it is crucial to diagnose a ground fault. Time-series data of voltage and current when a ground fault occurs are available for diagnosis. In the present work, a data-based fault diagnosis system of power cable systems is developed. In order to achieve high fault diagnosis performance, new feature variables are generated by using both wavelet analysis and cepstrum analysis. In addition, six classification techniques, i.e. k-nearest neighbor (k-NN), artificial neural network (ANN), boosted ANN (B-ANN), random forest (RF), classification and regression trees (CART) and boosted CART (B-CART), are compared. Furthermore, B-ANN and B-CART are combined with the naive Bayes classifier to cope with multiclass problems. The results of applying the proposed methods to real ground fault data show that B-ANN with the naive Bayes classifier can achieve the best diagnosis performance, which satisfies the requirement for its industrial application.

Keywords: Fault diagnosis; Electric power system; Power cable system; Ground fault; Classification; Boosting; Classification and regression trees; Naive Bayes classifier; Wavelet analysis; Cepstrum analysis.

1. INTRODUCTION

A power cable system is a very important infrastructure for our daily life; thus it is always monitored to detect any fault that can damage the electric power supply. Ground faults are major problems of power cable systems. They are caused by various accidents and need to be fixed as early as possible. Therefore, it is crucial to diagnose ground faults, i.e. to identify their cause, when they are detected.

Fault diagnosis problems can be formulated as classification problems. The most basic classification method is linear discriminant analysis (LDA), followed by quadratic discriminant analysis (QDA). Their extensions include discriminant partial least squares (DPLS) and kernel-based nonlinear discriminant analysis. As a classification method using kernels, the support vector machine (SVM) has been applied to various problems. Another well-known technique is artificial neural networks (ANN). While these methods require building statistical models, there are also memory-based methods such as k-nearest neighbors (k-NN). In addition, tree-based methods such as classification and regression trees (CART) and random forest (RF) have been widely used for classification.

Approaches for fault diagnosis of power cable systems have mostly been based on the use of wavelet analysis for feature extraction and ANN for classification, as shown in Table 1. Wavelet analysis has many advantages over the conventional Fourier analysis (Heydt et al. (1997)). The mother wavelet in a wavelet transform employs time compression or dilation rather than the frequency modulation used in Fourier analysis. Techniques such as the fast Fourier transform are generally computationally more complex than the wavelet transform. ANN can implicitly detect complex nonlinear relationships between independent and dependent variables, but it is prone to overfitting.

In the present research, a data-based fault diagnosis system of power cable systems is developed. Our preliminary study has shown that the conventional system based on wavelet analysis and ANN does not achieve sufficient diagnosis performance for industrial application. Therefore, in order to further improve the performance, new feature variables are generated by using cepstrum analysis as well as wavelet analysis. In addition, six classification techniques, i.e. k-NN, ANN, boosted ANN (B-ANN), RF, CART and boosted CART (B-CART), are compared. Furthermore, B-ANN and B-CART are combined with the naive Bayes classifier to cope with multiclass problems. The developed systems are tested by using real industrial ground-fault data recorded by commercial ground-fault detection units.

In the next section, the six classification techniques are briefly explained. Then, feature variables derived from wavelet analysis and cepstrum analysis are introduced in Section 3. The application results of the developed fault diagnosis system are described in Section 4, which is followed by concluding remarks.


Table 1. Classification and feature extraction techniques used for fault diagnosis of power cable systems in the literature.

Reference               Classification                                            Feature extraction
Lee et al. (1997)       Learning vector quantization (LVQ)                        Bispectra
Chung et al. (2002)     Hidden Markov models (HMM)                                Wavelet analysis
Zhao et al. (2000)      Rule-based method                                         Wavelet analysis
Yeo et al. (2003)       Adaptive network-based fuzzy inference system (ANFIS)     Root-mean-square values
Liao et al. (2004)      Fuzzy-expert system and artificial neural network         Fourier transform and wavelet analysis
Zadeh et al. (2006)     Artificial neural network                                 Butterworth filters and finite impulse response (FIR) digital filters
Uyar (2008)             Artificial neural network                                 Wavelet analysis
Mammone et al. (2009)   Artificial neural network                                 Fourier transform, Hartley transform and wavelet analysis


2. METHODS

This section briefly describes the classification methods: k-NN, ANN, RF, CART, B-ANN, B-CART, and the naive Bayes classifier.

2.1 k-Nearest Neighbor (k-NN)

The k-NN classifier is a supervised learning algorithm based on the minimum distance from a new test sample to the training samples. The test sample is assigned to the most frequently occurring class among its k nearest neighbours. Closeness is usually defined in terms of Euclidean distance.
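As an illustration of this rule, a minimal k-NN classifier can be written in a few lines; the following Python/NumPy sketch uses hypothetical toy data, not the ground-fault datasets.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test sample by majority vote among its k nearest neighbours."""
    # Euclidean distance from the test sample to every training sample
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # most frequent class label among the neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy usage with hypothetical data
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.05]), k=3))  # -> 1
```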

2.2 Artificial Neural Networks (ANN)

ANN tries to mimic the biological neural network of the brain. Neurons are the constitutive units of ANN; they receive inputs and use a given transfer function to produce outputs. The input-output structure of each neuron follows the pattern shown in Fig. 1.

Fig. 1. Input-output structure of a neuron in ANN

For a dataset x_i, each input feature f_l (l = 1, · · · , L) is transmitted through a connection, which multiplies its strength by the scalar weight w_l, and the weighted input features are summed up. Then a bias value is added to get a scaled net input for the activation function φ, which is typically a step function or a sigmoid function and produces the output. In ANN, the weights are adjustable parameters.
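A minimal sketch of this input-output pattern, assuming a sigmoid activation and hypothetical weights and bias, is:

```python
import numpy as np

def neuron_output(f, w, b):
    """Weighted sum of input features plus bias, passed through a sigmoid activation."""
    net = np.dot(w, f) + b              # scaled net input
    return 1.0 / (1.0 + np.exp(-net))   # sigmoid activation phi

# hypothetical feature vector, weights, and bias
f = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(f, w, b=0.2))
```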

2.3 Random Forest (RF)

RF is an ensemble classifier that consists of many decision trees (Breiman (2001)). RF combines Breiman's bagging idea and the random selection of features (Ho (1995); Ho (1998)). Bagging is a mechanism used to improve machine learning of classification and regression models in terms of stability and classification accuracy. Given a training set D of size N, bagging generates M new training sets D∗_m (m = 1, · · · , M), each of size N' = N, by random sampling from D with replacement. Each set D∗_m is expected to contain about 63.2 percent of the unique datasets in D, with the rest duplicated. The newly created training sets are called bootstrapped samples, while the fraction of original data that is not bootstrapped is termed out-of-bag (OOB) data. In addition, the number of split variables is determined by trial and error, and RF selects the best split variables from the total input variables on the basis of the split performance at each node.

RF creates multiple trees; each tree is trained by using the bootstrapped samples and the fixed number of split variables. The out-of-bag (OOB) datasets are used for error calculation of the respective trees.
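The bootstrap mechanism, and an RF model with the settings later reported for case (a) (20 trees, 12 split variables), can be sketched as follows; the feature matrix and labels are random placeholders, not the actual fault data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N = 292
# one bootstrapped sample: N draws with replacement from the N training indices
boot = rng.integers(0, N, size=N)
oob = np.setdiff1d(np.arange(N), boot)          # out-of-bag indices for this tree
print(len(np.unique(boot)) / N)                 # close to 0.632 on average

# a random forest roughly matching the settings reported for case (a)
X = rng.normal(size=(N, 160))                   # placeholder feature matrix
y = rng.integers(1, 11, size=N)                 # placeholder class labels 1..10
rf = RandomForestClassifier(n_estimators=20, max_features=12, oob_score=True).fit(X, y)
print(rf.oob_score_)                            # OOB error estimate (meaningless for random data)
```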

2.4 Classification and Regression Trees (CART)

CART is a classification method based on decision trees and uses binary recursive partitioning (Breiman et al. (1984)). First, the overall set including all training datasets is split into two subsets by using the best predictor of the output. This binary partitioning is recursively applied to the derived subsets until no further significant partitioning is found or the subsets become too small.
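A minimal CART sketch, assuming scikit-learn's DecisionTreeClassifier (an implementation of binary recursive partitioning) and placeholder data, is:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 160))          # placeholder feature matrix
y = rng.integers(1, 11, size=200)        # placeholder class labels

# binary recursive partitioning; splitting stops when nodes become too small
cart = DecisionTreeClassifier(min_samples_split=10, min_samples_leaf=5).fit(X, y)
print(cart.get_depth(), cart.get_n_leaves())
```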

2.5 B-ANN and B-CART

The boosting technique (Hastie et al. (2009)) has been used to enhance the performance of classifiers such as ANN and CART. The boosting algorithm works on the idea of combining several weak classifiers to make a single highly accurate classifier. There are many boosting algorithms; the main variation is their method of weighting the training datasets. AdaBoost is one of the most significant boosting algorithms. It calls a weak classifier repeatedly in a series of rounds. On each round, the weights of incorrectly classified datasets are increased and the weights of correctly classified datasets are decreased simultaneously.

To develop B-ANN, Pseudo-Loss AdaBoost (AdaBoost.M2) (Freund et al. (1997)) is used in this work.


Fig. 2. Accumulation of boosted classifiers (ANNs)

Suppose a training set D has N datasets (x_1, y_1), · · · , (x_N, y_N), where x_i is the i-th input and y_i is the corresponding target. The weight of x_i on round k is denoted as w_k(i). For round one, i.e. k = 1, identical weights are allotted to all datasets. These weighted datasets are fed to an ANN which gives hypothesis h_k(i). Then the pseudo-loss ε_k of h_k(i) is calculated:

ε_k = (1/2) Σ_{i=1}^{N} w_k(i) (1 − ∆h_k(i))    (1)

where ∆h_k(i) denotes the degree of the correct prediction of the target for the i-th dataset on round k. The weight of each dataset is updated through:

w_{k+1}(i) = (w_k(i) / z_k) β_k^{(1 + ∆h_k(i)) / 2}    (2)

where z_k is a normalization factor and β_k is a function of ε_k.

Using the updated weights, N training datasets are randomly resampled from D. In the new round, AdaBoost.M2 focuses on samples which are misclassified or hard to discriminate.

After several rounds, the weak classifiers, i.e. ANNs, are combined as shown in Fig. 2. The resulting final hypothesis h_fin is a weighted majority vote of all weak hypotheses:

h_fin(i) = arg max_y Σ_k log(1/β_k) h_k(i)    (3)

where log(1/β_k) represents the weight of the corresponding weak hypothesis. The target y is the label of the ground-fault cause in our application.
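A much simplified sketch of this boosting loop is given below. It uses the standard 0/1 misclassification update of AdaBoost with shallow decision trees as the weak learners, rather than the pseudo-loss ∆h_k(i) of AdaBoost.M2 or ANNs, so it only illustrates the reweighting and weighted-vote ideas behind Eqs. (1)-(3); the inputs X, y, and classes are placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, rounds=10):
    """Simplified AdaBoost: reweight misclassified samples and collect weak learners."""
    N = len(y)
    w = np.full(N, 1.0 / N)                          # identical weights on round one
    learners, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)    # weighted error (analogue of eps_k)
        if err >= 0.5 or err == 0:
            break
        beta = err / (1.0 - err)
        alpha = np.log(1.0 / beta)                   # weight of this weak hypothesis
        w *= np.where(pred == y, beta, 1.0)          # shrink weights of correctly classified samples
        w /= w.sum()                                 # normalisation (analogue of z_k)
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X, classes):
    """Weighted majority vote over all weak hypotheses; classes is e.g. np.unique(y)."""
    votes = np.zeros((len(X), len(classes)))
    for stump, alpha in zip(learners, alphas):
        pred = stump.predict(X)
        for j, c in enumerate(classes):
            votes[:, j] += alpha * (pred == c)
    return classes[np.argmax(votes, axis=1)]
```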

Boosting improves the classification performance, but boosted classifiers perform better only in binary classification. In order to overcome this deficiency, one-against-all (Rifkin et al. (2004)), pairwise (Hastie et al. (1998)), and binary hierarchical (Jun et al. (2009)) techniques are often used. Although AdaBoost.M2 can handle multiclass problems, the multiclass problem is split into binary problems in this work to improve the classification performance. The naive Bayes rule (Yang et al. (2002)) is used to split the multiclass problem into binary problems.

Table 2. Datasets

No   Ground fault cause                        Number of datasets
1    Contamination of high-voltage equip.      27
2    Contact of tree                           42
3    Crack of insulator                        34
4    Switch surge                              26
5    Breakdown of high-voltage equip.          52
6    Thunder                                   40
7    Rain                                      24
8    Insulation failure                        11
9    Artificial ground-fault experiment        16
10   Breakdown of 6kV cable                    20

2.6 Naive Bayes Classifier

The Bayesian classifier is a well-known probabilistic induction method. Each dataset is described by a vector f = (f_1, · · · , f_L) of L feature variables. Given training datasets with known classes c_j (j = 1, · · · , J), the Bayesian classifier learns the relation between the datasets and the corresponding classes. In order to classify a test dataset, the naive Bayes classifier initially calculates the prior probability p(c_j) of each class. Then the likelihood p(f | c_j) is calculated under the assumption that the features are conditionally independent given the class.

p(f | c_j) = ∏_{l=1}^{L} p(f_l | c_j)    (4)

Although this strong assumption is often violated in real applications, naive Bayes classifiers often outperform far more sophisticated classification techniques (Hastie et al. (2009)). Finally, the posterior probability p(c_j | f) is derived:

p(c_j | f) ∝ p(c_j) ∏_{l=1}^{L} p(f_l | c_j)    (5)

Naive Bayes classifiers have been used in many practical applications. They have significant advantages in terms of simplicity, learning speed, classification speed, and storage space (Yang et al. (2005)).
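A compact sketch corresponding to Eqs. (4)-(5), assuming Gaussian likelihoods p(f_l | c_j) for continuous features (an assumption not stated in the paper), is:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior p(c_j) and per-feature Gaussian likelihoods p(f_l | c_j)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    stds = np.array([X[y == c].std(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, stds

def predict_gaussian_nb(model, X):
    """Pick the class maximising p(c_j) * prod_l p(f_l | c_j), computed in the log domain."""
    classes, priors, means, stds = model
    log_post = []
    for p, m, s in zip(priors, means, stds):
        log_like = -0.5 * np.sum(((X - m) / s) ** 2 + np.log(2 * np.pi * s ** 2), axis=1)
        log_post.append(np.log(p) + log_like)
    return classes[np.argmax(np.column_stack(log_post), axis=1)]
```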

3. FEATURE VARIABLE SELECTION

3.1 Wavelet Analysis

A commercial ground-fault detection system records time-series data of voltage and current when a ground fault occurs. The data provided by an electrical safety inspection association consist of the ten major data groups listed in Table 2. Each group is related to a particular ground-fault cause. Each dataset includes four variables: current (I), voltage (V), dI, and dV. Here dI and dV are the differences of I and V, respectively. The above-mentioned classification methods cannot be directly applied to these time-series data because the number of sampling points, i.e. input variables for classifiers, is too large. For dimensionality reduction, feature variables are generated from the original time-series data. The discrete wavelet transform (DWT), Daubechies (db2) for 10 levels, is used to transform all datasets:

W(j, k) = Σ_t x(t) 2^{−j/2} ψ(2^{−j} t − k)    (6)


Fig. 3. Original current signal of the "contact of tree" fault (top) and reconstructed signals at ten transformed levels

where W(j, k) represents the transformed data signal, x(t) the original data signal, ψ the mother wavelet, j the scaling parameter, and k the shifting parameter. The ten transformed levels of the original signal, i.e. d1 to d10, are reconstructed separately from the detail coefficients by using the inverse wavelet transform, as shown in Fig. 3. These reconstructed signals are then subjected to calculation of their root mean square (rms), interquartile range (iqr), mean, and maximum (max). Then, based on the classification performance, combinations of these values were optimized, and rms/max, iqr and mean/max were adopted. The use of these three features, i.e. combinations, resulted in 120 inputs (4 original signals × 10 levels × 3 features) for the models.
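This feature extraction step might be sketched with the PyWavelets package as below; the exact definition of the max statistic and the reconstruction of each detail level by zeroing the other coefficients are assumptions made for illustration, and the input signal is a placeholder. The 30 values per signal times 4 signals give the 120 inputs mentioned above.

```python
import numpy as np
import pywt

def wavelet_features(x, wavelet="db2", levels=10):
    """rms/max, iqr, and mean/max of the detail signals d1..d10 reconstructed from a DWT."""
    coeffs = pywt.wavedec(x, wavelet, level=levels)    # [cA10, cD10, ..., cD1]
    features = []
    for i in range(1, levels + 1):                     # i = 1 -> cD10, ..., i = levels -> cD1
        only_detail = [np.zeros_like(c) for c in coeffs]
        only_detail[i] = coeffs[i]
        d = pywt.waverec(only_detail, wavelet)         # reconstructed detail signal
        rms = np.sqrt(np.mean(d ** 2))
        iqr = np.percentile(d, 75) - np.percentile(d, 25)
        mx = np.max(np.abs(d))                         # "max" taken as the peak magnitude (assumption)
        features += [rms / mx, iqr, np.mean(d) / mx]
    return np.array(features)                          # 30 values per signal (10 levels x 3 features)

x = np.random.randn(4096)                              # placeholder for I, V, dI, or dV
print(wavelet_features(x).shape)                       # (30,)
```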

3.2 Cepstrum Analysis

Real cepstrum coefficients of the datasets are calculated as shown in Fig. 4. The real cepstrum coefficients c_t of a signal x(t) are derived by calculating the natural logarithm of the magnitude of its Fourier transform z(p), followed by the inverse Fourier transform:

c_t = (1/T) Σ_{p=1}^{T} log|z(p)| ω^{−(t−1)(p−1)}    (7)

where T is the total number of sample points in the signal. The first ten cepstrum coefficients of each signal are used as feature variables. These coefficients extract valuable information from the signal, as demonstrated in Fig. 4, which compares the Fourier transform of the first 200 and the first 10 cepstrum coefficients. It is clear that the actual spectrum is approximated by using only 10 cepstrum coefficients. The cepstrum coefficients add a further 40 feature variables (4 original signals × 10 cepstrum coefficients), increasing the total to 160.

Fig. 4. Cepstrum coefficients of the faulty signal
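Equation (7) corresponds to the standard real cepstrum, i.e. the inverse DFT of the log-magnitude spectrum, which can be sketched with NumPy as follows; the signal is a placeholder.

```python
import numpy as np

def real_cepstrum(x, n_coeffs=10):
    """First n_coeffs real cepstrum coefficients: inverse FFT of log|FFT(x)|."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    cepstrum = np.real(np.fft.ifft(log_mag))
    return cepstrum[:n_coeffs]

x = np.random.randn(4096)                        # placeholder fault signal
print(real_cepstrum(x))                          # 10 feature variables per signal
```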

4. RESULTS AND DISCUSSION

A total of 292 datasets, listed in Table 2, were available in this study. Of these, 17 percent were reserved as testing datasets. The training and testing datasets were used to build models and to evaluate their classification performance, respectively. For each classification method, two types of models were built with different features: (a) features extracted through both wavelet and cepstrum analysis and (b) features extracted through wavelet analysis only.

The ANN model was trained by using the backpropagation algorithm in the MATLAB environment. The number of nodes in a hidden layer was optimized, and 25 was selected for both (a) and (b). There were 10 outputs corresponding to the major ground-fault causes listed in Table 2. The target output vector had values of either 1 (true) or 0 (false).
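The paper trains the ANN with backpropagation in MATLAB; a rough Python analogue with scikit-learn's MLPClassifier, using the reported 25 hidden nodes, the 17 percent test split, and placeholder data, might look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(292, 160))                 # placeholder feature matrix (case (a), 160 features)
y = rng.integers(1, 11, size=292)               # placeholder ground-fault cause labels 1..10

# 17 percent of the datasets reserved for testing, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.17, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(25,), max_iter=2000).fit(X_tr, y_tr)
print("success rate:", ann.score(X_te, y_te))   # fraction of correctly classified datasets
```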


Fig. 5. Overview of hierarchical binary classification using the naive Bayes technique

Table 3. Parameters for B-ANN

              Features (a)             Features (b)
Binary step   Hidden layers   Nodes    Hidden layers   Nodes
1             2               20       2               20
2             2               20       2               20
3             2               20       2               20
4             1               25       2               5
5             1               25       1               5
6             2               5        2               5
7             1               20       2               5
8             1               15       1               5
9             3               10       2               5

For k-NN, the number of nearest neighbors (k) was optimized and 3 was selected. The random forest R package was used in the MATLAB environment. Different combinations of the number of trees and the number of split variables were tested. The best classification performance was obtained by setting the number of trees to 20 for (a) and 15 for (b), while the number of split variables at each node was 12 for (a) and 8 for (b).

For B-ANN and B-CART, the naive Bayes technique was used to divide the training datasets into binary subsets in nine steps as shown in Fig. 5. At each step, various pairs of classes (causes) shown in Table 2 were tested to form the best separable binary groups. Initially, the best pair, classes 9 and 10, was selected to train the naive Bayes classifier. Then the remaining eight classes were classified into either of the two classes, i.e. 9 or 10, by using the trained classifier. The child classes were further divided into binary classes until each child node consisted of only one of the 10 classes. Then B-ANN and B-CART were applied to each of these binary classification steps, and at the end all these binary classifiers (B-ANN or B-CART) were accumulated. The number of training datasets varies at each binary step, thus the number of hidden layers and nodes was optimized accordingly. The optimal parameters of B-ANN are listed in Table 3.
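One splitting step of this hierarchy might be sketched as follows; the seed-pair selection and the majority-vote routing of the remaining classes are assumptions made for illustration, since the paper does not spell out these details beyond Fig. 5, and the data are placeholders.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def binary_split(X, y, seed_pair):
    """Route every class to one of the two seed classes using a naive Bayes model."""
    a, b = seed_pair
    mask = np.isin(y, seed_pair)
    nb = GaussianNB().fit(X[mask], y[mask])      # trained only on the seed classes
    groups = {a: [a], b: [b]}
    for c in np.unique(y):
        if c in seed_pair:
            continue
        side = nb.predict(X[y == c])             # classify all samples of class c
        winner = a if np.mean(side == a) >= 0.5 else b
        groups[winner].append(c)                 # assign class c to the majority side
    return groups

# placeholder data; the paper starts with classes 9 and 10 as the first seed pair
rng = np.random.default_rng(0)
X = rng.normal(size=(292, 160))
y = rng.integers(1, 11, size=292)
print(binary_split(X, y, seed_pair=(9, 10)))
```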

The success rates of the developed classifiers are shown in Table 4. The success rate is the fraction of correctly classified datasets to the total number of datasets. The overall success rates of k-NN, ANN, B-ANN, RF, CART and B-CART are 74, 88, 90, 78, 70, and 82 percent, respectively, in case (a) and 76, 84, 88, 76, 68, and 78 percent, respectively, in case (b). From these results, it is evident that the use of cepstrum coefficients improved the classification performance. Only in the case of k-NN did the diagnosis performance drop, from 76 to 74 percent, when cepstrum coefficients were used as feature variables. In both cases, B-ANN achieved the best performance among all classifiers.

A confusion matrix of the B-ANN results is shown in Table 5. The diagonal numbers represent the correctly classified datasets; the remaining numbers represent the misclassified datasets. In all six models, either group 5 is misclassified into group 10 or group 10 is misclassified into group 5. In this table, 2 out of 7 elements of group 5 are misclassified into group 10 in case (a), while 3 out of 4 elements of group 10 are misclassified into group 5 in case (b).

5. CONCLUSIONS

In the present work, a data-based fault diagnosis system of power cable systems was developed. Wavelet analysis and cepstrum analysis were used to generate new feature variables. The conventional features, i.e. rms, iqr, mean, and max, were derived from the signals reconstructed by wavelet analysis, and rms/max, iqr, and mean/max were used as features together with the first 10 cepstrum coefficients. These new features made the data-based fault diagnosis approach more effective. In addition, in order to achieve high diagnosis performance, six classification methods, i.e. k-NN, ANN, B-ANN, RF, CART and B-CART, were compared. The naive Bayes technique was used to mold the datasets into a binary hierarchical format; then B-ANN and B-CART were applied to the multiclass classification problem. The results of applying the developed ground-fault diagnosis system to real industrial data have shown that B-ANN with the naive Bayes classifier achieved the best diagnosis performance. The achieved classification performance of around 90 percent is suitable for real industrial application.

REFERENCES

Breiman, L., Friedman, J., Olshen, R., and Stone, C. J. (1984). Classification and regression trees, Chapman and Hall, New York.

Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32.

Chung, J., Powers, E. J., Grady, W. M., and Bhatt, S. C. (2002). Power disturbance classifier using a rule-based method and wavelet packet-based hidden Markov model, IEEE Trans. on Power Delivery, 17, 233-241.

Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 119-139.

Hastie, T. and Tibshirani, R. (1998). Classification by pairwise coupling, The Annals of Statistics, 26, 451-471.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, Second Edition, Springer, 337-386.


Table 4. Classification results of models

Output      Success rate (percent)
            k-NN        ANN         B-ANN       RF          CART        B-CART
            (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)
1           100   100   100   100   100   100   100   100   100   100   100   100
2            67    50    83    67    67    50    83    83    83    67    80    80
3            60    80    80    60   100   100    80    60    40    80   100   100
4            25    25    75    75    75   100    50    25    25     0    75   100
5            86    86    86    86    71   100    86    71    86    86   100    86
6           100   100   100   100   100   100    67   100    83    83    84    67
7           100   100   100   100   100   100   100   100   100   100   100   100
8           100   100   100   100   100   100   100   100   100   100   100   100
9            75   100   100   100   100   100   100   100    50    75   100   100
10            0     0    50    50   100    25     0     0     0     0     0     0
Overall      74    76    88    84    90    88    78    76    70    68    82    78

Table 5. Confusion matrix of B-ANN results

Actual         Predicted fault causes, features (a)    Predicted fault causes, features (b)
fault cause    1  2  3  4  5  6  7  8  9  10           1  2  3  4  5  6  7  8  9  10
1              6  0  0  0  0  0  0  0  0  0            6  0  0  0  0  0  0  0  0  0
2              0  4  0  0  2  0  0  0  0  0            0  3  1  0  2  0  0  0  0  0
3              0  0  5  0  0  0  0  0  0  0            0  0  5  0  0  0  0  0  0  0
4              0  0  0  3  0  1  0  0  0  0            0  0  0  4  0  0  0  0  0  0
5              0  0  0  0  5  0  0  0  0  2            0  0  0  0  7  0  0  0  0  0
6              0  0  0  0  0  6  0  0  0  0            0  0  0  0  0  6  0  0  0  0
7              0  0  0  0  0  0  5  0  0  0            0  0  0  0  0  0  5  0  0  0
8              0  0  0  0  0  0  0  3  0  0            0  0  0  0  0  0  0  3  0  0
9              0  0  0  0  0  0  0  0  4  0            0  0  0  0  0  0  0  0  4  0
10             0  0  0  0  0  0  0  0  0  4            0  0  0  0  3  0  0  0  0  1

Heydt, G. T. and Galli, A. W. (1997). Transient power quality problems analyzed using wavelets, IEEE Transactions on Power Delivery, 12(2), 908-915.

Ho, T. K. (1995). Random decision forests, Proceedings of the 3rd Int'l Conf. on Document Analysis and Recognition, Montreal, Canada, August 14-18, 278-282.

Ho, T. K. (1998). The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.

Jun, G. and Ghosh, J. (2009). Multi-class boosting with class hierarchies, Springer Lecture Notes in Computer Science, 55(19), 32-41.

Lee, J. S., Lee, C. H., Kim, J. O., and Nam, S. W. (1997). Classification of power quality disturbances using orthogonal polynomial approximation and bispectra, Electronics Letters, 33, 18.

Liao, Y. and Lee, J. B. (2004). A fuzzy-expert system for classifying power quality disturbances, Electrical Power and Energy Systems, 26, 199-205.

Mammone, A., Turchi, M., and Cristianini, N. (2009). Support vector machines, WIREs Computational Statistics, John Wiley and Sons, 283-289.

Rifkin, R. and Klautau, A. (2004). In defense of one-vs-all classification, Journal of Machine Learning Research, 5, 101-141.

Schapire, R. (1990). The strength of weak learnability, Machine Learning, 5, 197-227.

Uyar, M., Yildirim, S., and Gencoglu, M. T. (2008). An effective wavelet-based feature extraction method for classification of power quality disturbance signals, Electric Power Systems Research, 78, 1747-1755.

Vezhnevets, A. and Vezhnevets, V. (2005). Modest AdaBoost - teaching AdaBoost to generalize better, In Graphicon, 322-325.

Yang, Y. and Webb, G. I. (2002). A comparative study of discretization methods for naive-Bayes classifiers, The Pacific Rim Knowledge Acquisition Workshop, Tokyo, Japan, 159-173.

Yang, Z., Zhong, S., and Wright, R. (2005). Privacy preserving classification of customer data without loss of accuracy, Proceedings of the 5th SIAM International Conference on Data Mining, Newport Beach, California, 92-102.

Yeo, S. M., Kim, C. H., Hong, K. S., Lim, Y. B., Aggarwal, R. K., Johns, A. T., and Choi, M. S. (2003). A novel algorithm for fault classification in transmission lines using a combined adaptive network and fuzzy inference system, Electrical Power and Energy Systems, 25, 747-758.

Zadeh, H. K. and Aghaebrahimi, M. R. (2006). A novel approach to fault classification and fault location for medium voltage cables based on artificial neural network, International Journal of Computational Intelligence, 2, 2.

Zhao, W., Song, Y. H., and Min, Y. (2000). Wavelet analysis based scheme for fault detection and classification in underground power cable systems, Electric Power Systems Research, 53, 23-30.
