Experimental datasets and algorithms

Datasets

For the experiments, 36 standard classification datasets with numeric values were selected from the KEEL-dataset repository.

Dataset

Records

Attributes

Classes

Imbalance Score

1

shuttle

57999

9

7

0.1143

2

ring

7400

20

2

0.9807

3

heart

270

13

2

0.8000

4

optdigits

5620

64

10

0.9858

5

thyroid

7200

21

3

0.1771

6

spambase

4597

57

2

0.6506

7

hepatitis

80

19

2

0.1940

8

vowel

990

13

11

1.0000

9

titanic

2201

3

4

0.5380

10

zoo

101

16

7

0.4364

11

appendicitis

106

7

2

0.2471

12

texture

5500

40

11

1.0000

13

segment

2310

19

7

1.0000

14

twonorm

7400

20

2

0.9984

15

australian

690

14

2

0.8016

16

phoneme

5404

5

2

0.4154

17

wine

178

13

3

0.7735

18

newthyroid

215

5

3

0.4302

19

mammographic

830

5

2

0.9438

20

movement_libras

360

90

15

1.0000

21

dermatology

358

34

6

0.5647

22

led7digit

500

7

10

0.8739

23

wisconsin

683

9

2

0.5383

24

spectfheart

267

44

2

0.2594

25

hayes-roth

160

4

3

0.6486

26

cleveland

297

13

5

0.4136

27

banana

5300

2

2

0.8126

28

satimage

6435

36

6

0.6479

29

tae

151

5

3

0.9613

30

contraceptive

1473

9

3

0.6645

31

marketing

6876

13

9

0.7003

32

monk-2

432

6

2

0.8947

33

glass

214

9

6

0.4079

34

bupa

345

6

2

0.7250

35

coil2000

9822

85

2

0.0634

36

page-blocks

5472

10

5

0.2135

Table 1. Datasets used in experiments.

All datasets used in this research are publicly available; thus, the results can be reproduced. Every dataset defines a supervised classification problem where each of the dataset’s examples is composed of several numerical attributes and the example’s class. To reliably evaluate the SEVQ algorithm, during selection we considered the instances, attributes and classes, and data collections were obtained from multiple domains. The datasets downloaded from the KEEL-dataset repository had already been partitioned by means of a 10-fold stratified cross-validation procedure.

Table 1 shows the details of the datasets selected for the experiments, including the name, counts of instances, attributes and classes (the latter being the number of possible values of the output variable), and an imbalance score of each dataset. Each imbalance score shown in the table represents the average of imbalances calculated for the respective pairs of classes.

Algorithms

In our study, we consider two groups of classifiers that were compared with the SEVQ algorithm. The first group contains traditional algorithms, most of which were implemented using the scikit-learn library, and the sole exception - the XGB method - was implemented using the xgboost library.

  • AdaBoostClassifier (AB) - The AdaBoost-SAMME implementation was used, which extends the original AdaBoost algorithm to the multiclass case without reducing it to multiple two-class problems. The parameter of the maximum number of estimators at which boosting was terminated was set to 50, and the learning rate shrank the contribution of each classifier by 1.

  • DecisionTreeClassifier (DT) is a nonparametric supervised learning method used for classification and regression. It creates a multiway tree finding for each node the categorical feature that will yield the largest information gain for categorical targets. The optimized version of the CART algorithm was used, and the maximum depth of the tree was set to 5.

  • GaussianNB (GNB) is an implementation of a Gaussian naive Bayes learning algorithm based on applying the Bayes’ theorem with the naive assumption of conditional independence between every pair of features given the value of the class variable. The portion of the largest variance of all features that is added to the variances for calculation stability was set to 1*10^(-9).

  • KNeighborsClassifier (KNN) is a simple, effective and nonparametric classification method. For each data record to be classified, its k nearest neighbors are retrieved, and they form its neighborhood. Majority voting among the data records in the neighborhood is usually used to decide the classification for that record with or without consideration of distance-based weights. The number of neighbors was set to 3.

  • MLPClassifier (MLP) is a multilayer perceptron classifier that optimizes the logarithm of the loss function using a stochastic gradient-based optimizer. The L2 penalty parameter was set to 1, and the maximum number of iterations was set to 10,000.

  • NearestCentroid (NC) is a classification method that calculates the centroids for each class and assigns to a given observation the label of the class of training samples with the centroid that is closest to the observation. The (Euclidean) distance metric was used.

  • QuadraticDiscriminantAnalysis (QDA) is a classifier with a quadratic decision boundary, generated by fitting the conditional densities of classes to the data and using the Bayes’ rule. Class proportions were inferred from the training data.

  • RandomForestClassifier (RF) - A random forest is a meta-estimator that fits several decision tree classifiers to various subsamples of the dataset and uses averaging to improve predictive accuracy and control overfitting. The number of trees in the forest was set to 10; the maximum depth of the tree was 5, and the number of features considered when looking for the best split was set to 1.

  • C-Support Vector Classification (SVC) is an implementation of SVM based on LibSVM with a radial basis kernel function. This method uses structural risk minimization, and the risk is minimized by maximizing the margin between the separating hyperplane and the data point closest to the hyperplane.

  • XGBoost (XGB) is a highly scalable decision tree ensemble based on gradient boosting. XGB builds an additive expansion of the objective function by minimizing a loss function. The maximum depth of each tree was set to 5, and the error evaluation for multiclass training was used.

The second group contains incremental algorithms implemented using the scikit-multiflow and NeuPy librariers , except for the SFAM algorithm with an implementation based on AIOpenLab’s code and our custom implementation of EVQ.

  • AdditiveExpertEnsembleClassifier (AEE) is a method for using any online learner for drifting concepts. The maximum number of estimators to hold was set to 5, weights were decreased by 0.8, and the weight of new experts as a ratio to the total ensemble weight was set to 0.1.

  • AdaptiveRandomForestClassifier (ARF) is an algorithm for classification of evolving data streams using the effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimization for different datasets. The parameter of the number of trees in the ensemble was set to 10, and the lambda value for bagging was 6.

  • DynamicWeightedMajorityClassifier (DWM) is an algorithm that copes with concept drift by training online learners of the ensemble, weighting such learners based on their performance, removing them also based on their performance, and adding new experts based on the global performance of the ensemble. The maximum number of estimators to be held was set to 5; the interval between expert removal, creation, and weight update was 50; weights were decreased by 0.5, and the minimum fraction of weight per model was 0.01.

  • ExtremelyFastDecisionTreeClassifier (EFDT) is a method that constructs a tree incrementally by seeking to select and perform a split as soon as there is confidence that the split will be useful, and reevaluating that decision, replacing the split if it subsequently becomes evident that a better split is available. The number of instances a leaf should observe between split attempts was 200, and the number of instances a node should observe before reevaluating the best split was 20.

  • HoeffdingAdaptiveTreeClassifier (HAT) is a method that monitors the performance of branches on the tree and replaces them with new branches when their accuracy decreases if the new branches are more accurate. The default parameter settings were used.

  • HoeffdingTreeClassifier (HT) is an incremental anytime decision tree induction algorithm capable of learning from massive data streams, assuming that the distribution of generated examples does not change over time. The implementation based on MOA was used. The number of instances a leaf should observe between split attempts was 200.

  • KNNClassifier (KNNI) is an incremental version of the KNN classifier that keeps track of the last maximum size of the window storing the last observed samples. The number of nearest neighbors was set to 5, and the maximum number of samples that could be stored in one leaf node was 30.

  • NaiveBayes (NB) is a classifier that provides classical Bayesian predictions while naively assuming that all inputs are independent. All attributes were numerical.

  • OzaBaggingClassifier (OB) is an ensemble learning method that improves the bagging ensemble method for the batch setting; this version of the method can effectively handle data streams. The size of the ensemble was set to 10. The KNN classifier with the ADWIN change detector was used as the base estimator.

  • SimplifiedFuzzyARTMAP (SFAM) is a simpler and faster variant of FAM that provides the same functionality as the original FAM but uses a simpler structure. The algorithm’s implementation was based on AIOpenLab’s code; the learning rate was set to 1 (fast learning); the regularization term was 0.0001; vigilance was 0.75, and a complement coding scheme for inputs was used.

  • EvolvingVectorQuantization (EVQ) - is a supervised version of the original VQ which is able to evolve new clusters on demand by comparing new incoming samples with already generated clusters. It includes the label information in the training process by introducing a hit matrix and extending the feature space, and it comes with a new weighted classification strategy. We used our custom implementation based on EVQ-Class (variant B) pseudocode included in with vigilance parameter value set to 0.2.

  • Learning Vector Quantization (LVQ) - is a neural network-based algorithm strictly meant for statistical classification or recognition. It defines class regions in the input data space by placing a subset of similarly labeled prototype vectors into each class region. The quantization regions are defined by mid-planes between neighboring prototype vectors. The default parameter settings were used.

  • Learning Vector Quantization 2 (LVQ2) - an improved version of the LVQ algorithm. The classification decision in this algorithm is identical to the LVQ. However, in learning two prototype vectors that are the nearest neighbors are updated simultaneously. The default parameter settings were used.

  • Learning Vector Quantization 2.1 (LVQ2.1) - an improved version of the LVQ2 algorithm that allows the mid plane be the closest prototype vector. The default parameter settings were used.

  • Learning Vector Quantization 3 (LVQ3) - an improved version of the LVQ2.1 algorithm, which was developed to overcome the reference vectors diverging drawback. The default parameter settings were used, except the step parameter, which was set to 0.01 to get better results.