#### Introduction

An electroencephalogram (EEG) is a recording of the brain's electrical oscillations from the scalp using surface electrodes (1). Clinically, EEG is particularly helpful in epilepsy. Together with the clinical history and imaging studies, EEG findings help to establish the diagnosis and the type of epilepsy (1,2). It is also helpful in the treatment of the disease (3). Epilepsy is characterized by seizures that occur suddenly and unexpectedly. Most seizures are not witnessed by medical staff. When a seizure is not witnessed, or when it is described with atypical features, EEG helps to reveal whether the event was a seizure. In general, EEG signals are analysed visually by expert physicians. Visual recognition of epileptic waveforms is sometimes difficult and time consuming, especially for physicians without sufficient expertise (4). There are also inter-reader differences in visual analysis, which suggests that visual analysis alone may be insufficient. For this reason, new computer-based evaluation techniques have been developed and applied in healthy and diseased individuals (5-13). Most of these studies consist of two steps: feature extraction from the EEG signals, followed by classification of these features. In many studies, the signals were recorded during seizures and during normal periods, so the classification performance of these studies was quite high (9,11-21). In fact, visual assessment might be sufficient in such cases. This raises the question of whether advanced statistical techniques can clearly differentiate normal individuals from patients with epilepsy when there is no seizure. In this study, our aim was to classify normal individuals and patients with epilepsy using EEG data sets obtained from the archives of patients known to have epilepsy (without seizure activity at the time of recording) and of normal individuals.

#### Materials and Methods

The EEG data were obtained from the EEG laboratory of the Department of Neurology and Clinical Neurophysiology at Adnan Menderes University. The study was approved by the Adnan Menderes University Local Ethics Committee (protocol approval number: 2016/873). The EEG data were recorded with a Micromed EEG device (16 channels). The study group consisted of ten patients with epilepsy (5 male, 5 female; mean age 34±4 years) and ten normal individuals (5 male, 5 female; mean age 35±5 years). The EEG signals of the patients with epilepsy contained only seizure-free epochs. Nine-millimetre, round, gold-cup electrodes were placed according to the international 10-20 electrode placement system (Figure 1). The sampling frequency was 256 Hz. Reference montage (A1 and A2) measurements were derived from each channel, and the duration of each epoch was 30 seconds. In both groups, the EEG epochs were concatenated to obtain a single EEG record 300 seconds long. Thus each channel consisted of a total of 76800 samples, and for each of those channels, 30 rectangular windows, each consisting of 256 samples, were formed. In total, 600 EEG segments, 300 epileptic and 300 normal, were obtained. The EEG data were collected retrospectively.
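The segment counts above can be checked with a short sketch. It assumes that the 30 windows are formed per 30-second epoch (one 256-sample window per second), which is one reading of the description; the names `segment`, `epoch` and `group_record` are hypothetical, and the zero-valued signal is only a placeholder for real recordings.

```python
FS = 256                 # sampling frequency in Hz (from the text)
EPOCH_SECONDS = 30       # duration of one subject's epoch (from the text)
SUBJECTS_PER_GROUP = 10  # subjects per group (from the text)
WINDOW = 256             # samples per rectangular window (from the text)


def segment(signal, window=WINDOW):
    """Split a 1-D sequence into consecutive non-overlapping windows."""
    count = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(count)]


# Placeholder single-channel epoch; real data would come from the recordings.
epoch = [0.0] * (FS * EPOCH_SECONDS)

# Concatenating the epochs of all subjects in a group gives a 300 s record.
group_record = epoch * SUBJECTS_PER_GROUP

windows_per_epoch = len(segment(epoch))
segments_per_group = len(segment(group_record))
total_segments = 2 * segments_per_group  # epileptic + normal
```

Under this reading, each 30-second epoch yields 30 windows, each group yields 300 segments, and the two groups together yield the 600 segments reported.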

#### Extracting Features with Discrete Wavelet Transform

Most biological signals, such as EEG, are non-stationary. In other words, the amplitude, phase, and frequency of EEG signals change constantly. Various methods are used to analyse changes in the EEG signal (5). The wavelet transform (WT) is one of the most common methods used for time-frequency analysis of EEG signals. WT provides optimum time-frequency resolution over all frequency ranges (22). Therefore, it has been widely used to provide a quantitative measure of the frequency distribution of the EEG and to detect the presence of particular patterns (6).

WT analysis can be classified into two types: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). The CWT is obtained by projecting the signal onto functions created by scaling and shifting a mother wavelet function. The mother wavelet function is a prototype function used to generate wavelets (23). Accordingly, the CWT is defined as:

$$\mathrm{CWT}(s,\tau)=\frac{1}{\sqrt{s}}\int_{-\infty}^{\infty}x(t)\,\psi\!\left(\frac{t-\tau}{s}\right)dt \tag{1}$$

where x(t), ψ, s and τ denote the signal to be processed, the wavelet function, scaling and shifting parameters, respectively. The different window functions used for the transformation are derived by shifting and scaling the mother wavelet. The shifting parameter τ changes the position of the window function on the signal. Hence the window moves on the signal. The scale parameter s expands or contracts the window function. Large values of s are suitable for general views and small values are suitable for detailed views. 1/√s is the normalization multiplier that ensures that the energy is the same for all values of s (13,22).

Calculating the wavelet coefficients for every possible scale extracts redundant information from the signal. Moreover, it takes a long time (24). If the scaling and shifting parameters are chosen as powers of 2, the analysis becomes more efficient and faster (25). This method is called the DWT and can be defined as:

$$\mathrm{DWT}(j,k)=\frac{1}{\sqrt{2^{j}}}\int_{-\infty}^{\infty}x(t)\,\psi\!\left(\frac{t-2^{j}k}{2^{j}}\right)dt \tag{2}$$

where the parameters s and τ are replaced by 2^{j} and 2^{j}k, respectively.

In the DWT, the signal is decomposed into approximation and detail coefficients at the first level by using low-pass and high-pass filters (Figure 2). The approximation coefficients are then further decomposed into the next level of approximation and detail coefficients (26,27).

It is very important to determine the appropriate wavelet function and the level of decomposition. The level of decomposition is chosen based on the dominant frequency components of the signal (13).

In this study, the DWT was employed to decompose the EEG signals into different frequency bands using different wavelet functions. Owing to its high success, the Daubechies wavelet of order 4 (Db4) was used to construct the feature vectors (13,15). Since the EEG signals do not have any useful frequency components above 30 Hz, the number of decomposition levels was chosen to be 6. After decomposition, the D1-D6 detail coefficients and the A6 approximation coefficients were obtained (Figure 3).
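The six-level decomposition can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the periodic boundary extension, the function names `dwt_step` and `wavedec`, and the synthetic test window are all hypothetical choices, and the Db4 filter taps are the values commonly tabulated in wavelet libraries.

```python
import math

# Db4 low-pass (scaling) filter, 8 taps, as commonly tabulated.
DB4_LO = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# Quadrature-mirror high-pass filter: g[n] = (-1)^n * h[L-1-n].
DB4_HI = [(-1) ** n * DB4_LO[len(DB4_LO) - 1 - n] for n in range(len(DB4_LO))]


def dwt_step(x):
    """One DWT level with periodic extension (an assumption of this sketch).

    Filters the input with the low-pass and high-pass filters and keeps
    every second output, returning (approximation, detail).
    """
    n = len(x)
    approx, detail = [], []
    for k in range(n // 2):
        a = d = 0.0
        for i in range(len(DB4_LO)):
            sample = x[(2 * k + i) % n]
            a += DB4_LO[i] * sample
            d += DB4_HI[i] * sample
        approx.append(a)
        detail.append(d)
    return approx, detail


def wavedec(x, levels=6):
    """Decompose x into detail bands D1..D<levels> and approximation A<levels>."""
    bands = {}
    approx = list(x)
    for level in range(1, levels + 1):
        approx, detail = dwt_step(approx)
        bands[f"D{level}"] = detail
    bands[f"A{levels}"] = approx
    return bands


# Synthetic one-second window (256 samples); not real EEG data.
window = [math.sin(2 * math.pi * 10 * t / 256) + 0.5 * math.sin(2 * math.pi * 3 * t / 256)
          for t in range(256)]
bands = wavedec(window)
```

Because the filters are orthonormal and the extension is periodic, the total energy of the coefficients equals the energy of the input window, which is a convenient sanity check for any implementation.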

The extracted wavelet coefficients give a compact representation of the energy distribution of the EEG signal in time and frequency. To characterize the EEG signals, statistics over the set of wavelet coefficients were calculated (5, 6, 13, 27, 28). The following statistics were calculated from the wavelet coefficients:

1. Mean of the absolute values of the coefficients in each sub-band.

2. The minimum value of the coefficients in each sub-band (min(x_{i})).

3. The maximum value of the coefficients in each sub-band (max(x_{i})).

4. Standard deviation of the coefficients in each sub-band.

5. Entropy of the coefficients in each sub-band (S=∑p_{x}logp_{x}).

6. Energy of the coefficients in each sub-band (E=∑|x_{i}|^{2}).

Since frequency components above 30 Hz are of little use in epilepsy analysis, the features were extracted from the D4 (16-32 Hz), D5 (8-16 Hz) and D6 (4-8 Hz) detail coefficients and the A6 (0-4 Hz) approximation coefficients. Thus, 24 statistical features were obtained from each channel. In total, 384 features were obtained from the 16 channels and normalized to the range [0,1].
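The six statistics and the [0,1] normalization can be sketched as follows. Several details are assumptions of this sketch and are not stated in the text: the standard deviation uses the sample (n-1) form, p_{x} is taken as each coefficient's share of the sub-band energy, the entropy carries the conventional negative sign, and the normalization is a per-feature min-max scaling. The names `subband_features` and `min_max_normalize` are hypothetical.

```python
import math


def subband_features(coeffs):
    """Return [mean |x|, min, max, std, entropy, energy] for one sub-band."""
    n = len(coeffs)
    mean_abs = sum(abs(c) for c in coeffs) / n
    mean = sum(coeffs) / n
    # Assumption: sample standard deviation (divide by n - 1).
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / (n - 1))
    energy = sum(c * c for c in coeffs)
    # Assumption: p_x is the normalized squared coefficient, and the
    # entropy uses the conventional Shannon form -sum(p * log p).
    entropy = 0.0
    if energy > 0:
        for c in coeffs:
            p = c * c / energy
            if p > 0:
                entropy -= p * math.log(p)
    return [mean_abs, min(coeffs), max(coeffs), std, entropy, energy]


def min_max_normalize(column):
    """Scale one feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


# Toy coefficients, not real EEG data.
features = subband_features([1.0, -1.0, 1.0, -1.0])

# 6 statistics x 4 sub-bands (D4, D5, D6, A6) per channel, 16 channels.
FEATURES_PER_CHANNEL = 6 * 4
TOTAL_FEATURES = FEATURES_PER_CHANNEL * 16
```

The feature count follows directly from the description: 6 statistics for each of 4 sub-bands gives 24 features per channel, and 16 channels give 384 features per segment.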

#### Dimensionality Reduction of Features

#### Principal Component Analysis

Principal component analysis (PCA) is a transformation technique that reduces a p-dimensional data set containing correlated variables to a lower-dimensional space containing uncorrelated variables while preserving as much of the variability in the data set as possible. The variables obtained by the transformation are called the principal components of the original variables. The first principal component captures the maximum variance in the data set, and the others capture the remaining variance in decreasing order (29,30).

The number of principal components that can be obtained for p variables is at most p, and the principal components are formed as linear combinations of the variables (29). A linear combination of the elements of a random vector **x** can be expressed as:

$$\mathbf{y}_{1}=a_{11}x_{1}+a_{21}x_{2}+\dots+a_{p1}x_{p}=\mathbf{a}_{1}'\mathbf{x} \tag{3}$$

where a_{11}, a_{21},…,a_{p1} are the weighting coefficients of the weight vector **a**_{1} and **y**_{1} represents the first principal component. The variance of **y**_{1} depends on the norm and direction of **a**_{1}. As the norm of **a**_{1} increases, the variance of **y**_{1} also increases. Therefore, the maximum variance is sought under the constraint that the norm of **a**_{1} is 1. Under this constraint, the variance of the first principal component is expressed as (29):

$$\mathrm{Var}(\mathbf{y}_{1})=\mathbf{a}_{1}'\mathbf{C}_{x}\mathbf{a}_{1} \tag{4}$$

where **C**_{x}=E[**xx'**] denotes the covariance matrix. The vector **a**_{1} that maximizes Var(**y**_{1}) is obtained by calculating the eigenvectors **v**_{1},…,**v**_{p} corresponding to the eigenvalues λ_{1},…,λ_{p} (λ_{1}≥…≥λ_{p}) of **C**_{x}. The first principal component is expressed as (29):

$$\mathbf{y}_{1}=\mathbf{v}_{1}'\mathbf{x} \tag{5}$$

The second principal component, under the constraint that it is uncorrelated with the first principal component (E[**y**_{1}**y**_{2}]=0), is expressed as (29):

$$\mathbf{y}_{2}=\mathbf{v}_{2}'\mathbf{x} \tag{6}$$

In this way, the mth principal component, with 1≤m≤p and E[**y**_{k}**y**_{m}]=0 (k≠m), is expressed as (29):

$$\mathbf{y}_{m}=\mathbf{v}_{m}'\mathbf{x} \tag{7}$$

PCA has been frequently used in studies on epilepsy diagnosis with EEG signals (13, 31, 32).
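The link between the unit-norm weight vector and the leading eigenvector of the covariance matrix can be illustrated with a small sketch. Power iteration is used here only as a simple stand-in for a full eigendecomposition; it is not stated to be the method used in the study, and the function names and the toy data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of mean-centred observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_principal_axis(cov, iterations=200):
    """Approximate the unit-norm leading eigenvector by power iteration.

    Returns (a1, lambda1), where lambda1 = a1' C a1 is the variance of
    the first principal component.
    """
    p = len(cov)
    a = [1.0 / math.sqrt(p)] * p  # unit-norm starting vector
    for _ in range(iterations):
        b = [sum(cov[i][j] * a[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in b))
        a = [v / norm for v in b]  # re-impose the unit-norm constraint
    variance = sum(a[i] * sum(cov[i][j] * a[j] for j in range(p)) for i in range(p))
    return a, variance


# Toy data in which the second variable is exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov = covariance_matrix(data)
a1, lambda1 = first_principal_axis(cov)
```

For this toy data all the variance lies along the direction (1, 2), so the weight vector converges to that direction scaled to unit norm, and a single component captures the total variance.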

#### Independent Component Analysis

Independent component analysis (ICA) is a statistical method that tries to separate multiple randomly mixed signals without knowing the mixing mechanism. ICA assumes that each measured signal is a linear combination of independent signals. It linearly decomposes a multidimensional data vector into statistically independent components (5). The mixing model can be written as:

$$\mathbf{x}=\mathbf{A}\mathbf{s} \tag{8}$$

where x denotes the random vector whose elements are the mixtures x_{1},…,x_{n}, s denotes the vector of the original source signals with elements s_{1},…,s_{n}, and A denotes the mixing matrix with elements a_{ij} (33). In equation (8), neither A nor s is known. If a matrix W can be found as the inverse of A, the original source signals can be estimated. The estimated signals can be expressed as:

$$\mathbf{y}=\mathbf{W}\mathbf{x} \tag{9}$$

where y denotes the vector whose elements are the estimates of the original source signals (33). A number of algorithms have been developed for estimating W. One of these is the fast fixed-point algorithm (FastICA) developed by Hyvärinen (34). FastICA offers fast convergence, ease of application and reliable results. In this study, the FastICA algorithm was used to estimate W.
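The relation between the mixing and unmixing matrices can be illustrated with a toy two-source example. In this sketch the mixing matrix is hypothetical and assumed known, so W can be computed directly as its inverse; in practice A is unknown and an algorithm such as FastICA has to estimate W from the mixtures alone, which recovers the sources only up to scaling and ordering. The helper names and all numeric values are illustrative.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; assumes a non-zero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical mixing matrix A (unknown in a real ICA problem).
A = [[1.0, 0.5],
     [0.3, 1.0]]
W = inverse_2x2(A)  # ideal unmixing matrix, W = A^-1

# Each row is one time sample of the source vector s (toy values).
sources = [[1.0, -2.0], [0.5, 4.0], [-3.0, 0.25]]

mixtures = [matvec(A, s) for s in sources]    # x = A s
estimates = [matvec(W, x) for x in mixtures]  # y = W x
```

With the exact inverse, the estimates reproduce the sources up to floating-point error.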

#### Classification of Features

#### Linear Discriminant Analysis

The goal of linear discriminant analysis (LDA) is to derive a discriminant function that maximizes the difference between the groups. In LDA, the number of discriminant functions is determined by the number of groups. If there are two groups, one discriminant function is used. A discriminant function consists of a linear combination of predictors. The weights of the predictors are calculated such that the ratio of the between-class variance to the within-class variance is maximized. The discriminant function for two groups and p predictors is expressed as:

$$D=w_{0}+w_{1}X_{1}+w_{2}X_{2}+\dots+w_{p}X_{p} \tag{10}$$

where w_{0}, w_{i} and X_{i} (i = 1,…,p) denote the constant, the weights of the predictors and the predictors, respectively.
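For two groups, weights that maximize the ratio of between-class to within-class variance can be obtained from Fisher's solution, in which the weight vector is proportional to the inverse within-class scatter matrix applied to the difference of the class means. The sketch below applies this to two predictors. It assumes equal priors, so the constant places the decision threshold midway between the projected class means; the function names and toy data are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_2d(rows, mean):
    """2x2 within-class scatter matrix of one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda(class0, class1):
    """Return (w0, w) of the discriminant D = w0 + w1*X1 + w2*X2."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_2d(class0, m0), scatter_2d(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Assumption: equal priors, threshold at the midpoint of the class means.
    w0 = -sum(w[j] * (m0[j] + m1[j]) / 2.0 for j in range(2))
    return w0, w


def discriminant(w0, w, x):
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


# Toy, well-separated groups (not EEG features).
class0 = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
class1 = [[5.0, 6.0], [6.0, 5.0], [6.0, 7.0], [7.0, 6.0]]
w0, w = fisher_lda(class0, class1)
```

A new observation is assigned to the second group when D is positive and to the first group otherwise.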

#### Support Vector Machine

In machine learning, support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyse data used for classification and regression analysis (35).

SVM aims to find the best separating hyperplane (optimal hyperplane), the one with the maximum margin between the observations of the two classes. The basic support vector classifier for linearly separable data is shown in Figure 4, where w is the normal of the optimal hyperplane, b is the bias and x is the feature vector. The optimal hyperplane (**w**^{T}x+b=0) divides the space into two sets depending on the sign of **w**^{T}x+b (36).

SVM maps the data that cannot be separated linearly into a higher dimensional space in which they can be separated linearly by using an appropriate kernel function (Figure 5) (9, 35, 37).

In this study, the linear kernel: K(x_{i},x_{j})=x_{i}.x_{j} and the radial basis function (RBF) kernel: K(x_{i},x_{j} )=exp(-γ || x_{i}-x_{j}|| ^{2}) were used as the kernel functions.

The hyperparameters C and γ of the SVM classifier were determined through a grid search with 10-fold cross-validation, and the parameter values giving the highest accuracy were used in the SVM classifiers.
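The two kernel functions and the sign-based decision rule of the separating hyperplane can be written out directly. This is a minimal sketch: the function names, the value of γ and the hyperplane parameters are hypothetical, and the optimization that actually finds w, b and the support vectors is not shown.

```python
import math


def linear_kernel(xi, xj):
    """K(xi, xj) = xi . xj"""
    return sum(a * b for a, b in zip(xi, xj))


def rbf_kernel(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def hyperplane_side(w, b, x):
    """Return +1 or -1 according to the sign of w^T x + b."""
    return 1 if linear_kernel(w, x) + b >= 0 else -1


# Hypothetical values chosen only for illustration.
k_lin = linear_kernel([1.0, 2.0], [3.0, 4.0])
k_rbf_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
k_rbf_far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
side_a = hyperplane_side([1.0, -1.0], 0.0, [2.0, 1.0])
side_b = hyperplane_side([1.0, -1.0], 0.0, [1.0, 2.0])
```

The RBF kernel equals 1 for identical points and decays towards 0 as the distance grows; γ controls how quickly it decays, which is why it is tuned together with C.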

#### Results

The effects of PCA and ICA on classification performance, and the classification performances of LDA and SVM with linear and RBF kernels, were compared on the EEG data obtained from normal individuals and patients with epilepsy. Classifications were performed for three different feature matrices: (a) features without dimension reduction (384 features), (b) features reduced by PCA (34 features) and (c) features reduced by ICA (30 features). The eigenvalues-greater-than-one rule proposed by Kaiser (1960) was used to determine the number of components. These three feature matrices were classified by using LDA, SVM with linear kernel and SVM with RBF kernel. Sensitivity, specificity and accuracy rates were used as the performance measures of the classifiers. For all feature matrices used in the classifications, the data were randomly divided into two parts, 70% training (n=420) and 30% test (n=180), with the same observations assigned to each part for all feature matrices. The process performed in the application is given in Figure 6.
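The three performance measures and the split sizes can be sketched as follows. The labels below are toy values rather than study results; treating the epileptic class as the positive class (label 1) is an assumption, and the function name is hypothetical.

```python
def performance(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


# Toy labels: 1 = epileptic (assumed positive class), 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec, acc = performance(y_true, y_pred)

# 70% / 30% split of the 600 segments, using integer arithmetic.
N_TOTAL = 600
n_train = N_TOTAL * 70 // 100
n_test = N_TOTAL - n_train
```

Reporting sensitivity and specificity alongside accuracy shows whether errors fall mainly on the epileptic or the normal group, which accuracy alone does not reveal.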

In the training sets, SVM with RBF kernel reached its highest accuracy rate (96.2%) with the features without dimension reduction, and SVM with linear kernel likewise reached its highest rate (94.7%) with the features without dimension reduction; LDA reached its highest rate (79.8%) with the features reduced by PCA (Table 1).

In the test sets, the highest accuracy (88.9%) was obtained by SVM with RBF kernel without dimension reduction. The highest accuracy for SVM with linear kernel (82.2%) was obtained with the features reduced by PCA. The highest accuracy for LDA (78.9%) was obtained both with the features reduced by PCA and with those reduced by ICA (Table 2).

#### Discussion

A number of studies have been carried out to classify EEG signals for the diagnosis of epilepsy. A direct comparison of previous studies using EEG signals is difficult because of the variety of EEG data sets, wavelet types, decomposition levels and statistical features used in the classification process (15). Previously, many researchers used the same EEG data, which comprised five sets (named A-E) described by Andrzejak et al. (38). Sets A and B were obtained from normal individuals, and sets C-E were obtained from patients with epilepsy. Sets C and D covered seizure-free intervals, while set E covered seizure activity. Wavelet-based features obtained from these sets (set E and the others) were used to assess the performance of classifiers and to detect seizure activity. Accuracy rates of nearly or exactly 100% were obtained with different classification methods. Among them, Xie and Krishnan (39) used the k-nearest neighbour method, Kumar et al. (11) used an artificial neural network, and Das et al. (9) used SVM; all of these studies reported accuracy rates of 100%. In these sets, there were significant differences between the signals recorded during seizures and during normal periods, so a simple threshold value or even visual assessment would suffice without the need for complicated statistical methods. It was also difficult to say that any classification method was superior to the others. Unlike these studies, we did not include epileptic seizure activity in the EEG data of the patients with epilepsy. As a result, classifying our data was more difficult than classifying the previous data sets, and it showed the real discrimination ability of these methods.

Orhan et al. (40) classified wavelet-based features obtained from data sets A and D described by Andrzejak et al. (38) and obtained a 96% accuracy rate with a multilayer perceptron neural network model. Subasi and Ercelebi (41) used data sets similar to those used in this study. They classified wavelet-based features and obtained accuracy rates of 93.0% and 89.0% for an artificial neural network and logistic regression, respectively. In this study, we also classified wavelet-based features and obtained accuracy rates of 88.9%, 82.2% and 78.9% for SVM with RBF kernel, SVM with linear kernel and LDA, respectively. One of the most important reasons for the different accuracy rates among the studies was the use of different data sets. The proportion of epileptic abnormalities in the EEG data sets varied, which made it difficult to compare the classification studies.

There is still debate over whether linear or non-linear methods are more successful in classifying EEG signals. Garrett et al. (42) classified five different mental states using EEG signals and achieved accuracy rates of 66.0%, 69.4% and 72.0% for LDA, an artificial neural network and SVM with RBF kernel, respectively. They concluded that non-linear methods were more successful than linear methods for the classification of EEG signals. Lehmann et al. (43) performed a similar classification of EEG data obtained from patients with Alzheimer's disease and normal individuals, and obtained accuracy rates of 91.0% and 95.0% with LDA and SVM with RBF kernel, respectively. Supriya et al. (19) classified EEG data sets A and E [described by Andrzejak et al. (38)] and achieved accuracy rates of 86.87%, 99.25% and 100% with LDA, SVM with linear kernel and SVM with polynomial kernel, respectively. In this study, we achieved accuracy rates of 78.9%, 82.2% and 88.9% with LDA, SVM with linear kernel and SVM with RBF kernel, respectively. In this respect, our results are similar to those of previous studies.

Subasi and Gursoy (13) used PCA, ICA and LDA to reduce the dimension of the features obtained from EEG signals and compared their effects on classification success. They used SVM with RBF kernel to classify the reduced features and achieved accuracy rates of 98.7%, 99.5% and 100% for PCA, ICA and LDA, respectively. We also used SVM with RBF kernel to classify the reduced features and obtained the highest accuracy rate of 84.4% for ICA and 82.8% for PCA. Hence, our results are similar to those of Subasi and Gursoy (13). In addition, with dimension reduction, the performance of both LDA and SVM with linear kernel increased. The highest classification performance on all data sets was obtained with SVM with RBF kernel, which achieved its highest accuracy rate (88.9%) with the features without dimension reduction. The dimension reduction methods adversely affected its performance.

#### Conclusion

In this study, the classification performances of SVM and LDA, which are widely used for the computer-aided diagnosis of epilepsy, were compared by using wavelet-based features extracted from EEG signals. In addition, PCA and ICA were used to determine the effects of dimension reduction on classification success. The results showed that SVM with RBF kernel achieved the highest accuracy rate (88.9%) with the features without dimension reduction. The dimension reduction methods PCA and ICA improved the classification performance of LDA and SVM with linear kernel, but decreased the classification performance of SVM with RBF kernel. Consequently, with dimension reduction, LDA and SVM with linear kernel classify better.

Ethics

Ethics Committee Approval: Adnan Menderes University Ethics Committee (approval no: 2016/873).

Informed Consent: Informed consent was not obtained.

Peer-review: Externally peer-reviewed.

Authorship Contributions

Surgical and Medical Practices: N.K., Concept: M.T., H.Ö., Design: M.T., İ.K.Ö., H.Ö., Data Collection or Processing: N.K., M.T., H.Ö., Analysis or Interpretation: M.T., H.Ö., İ.K.Ö., Literature Search: M.T., H.Ö. Writing: H.Ö.

Conflict of Interest: No conflict of interest was declared by the authors.

Financial Disclosure: The authors declared that this study received no financial support.