Experimental datasets and algorithms¶
Datasets¶
For the experiments, 36 standard classification datasets with numeric values were selected from the KEEL-dataset repository.
Dataset |
Records |
Attributes |
Classes |
Imbalance Score |
|
|---|---|---|---|---|---|
1 |
shuttle |
57999 |
9 |
7 |
0.1143 |
2 |
ring |
7400 |
20 |
2 |
0.9807 |
3 |
heart |
270 |
13 |
2 |
0.8000 |
4 |
optdigits |
5620 |
64 |
10 |
0.9858 |
5 |
thyroid |
7200 |
21 |
3 |
0.1771 |
6 |
spambase |
4597 |
57 |
2 |
0.6506 |
7 |
hepatitis |
80 |
19 |
2 |
0.1940 |
8 |
vowel |
990 |
13 |
11 |
1.0000 |
9 |
titanic |
2201 |
3 |
4 |
0.5380 |
10 |
zoo |
101 |
16 |
7 |
0.4364 |
11 |
appendicitis |
106 |
7 |
2 |
0.2471 |
12 |
texture |
5500 |
40 |
11 |
1.0000 |
13 |
segment |
2310 |
19 |
7 |
1.0000 |
14 |
twonorm |
7400 |
20 |
2 |
0.9984 |
15 |
australian |
690 |
14 |
2 |
0.8016 |
16 |
phoneme |
5404 |
5 |
2 |
0.4154 |
17 |
wine |
178 |
13 |
3 |
0.7735 |
18 |
newthyroid |
215 |
5 |
3 |
0.4302 |
19 |
mammographic |
830 |
5 |
2 |
0.9438 |
20 |
movement_libras |
360 |
90 |
15 |
1.0000 |
21 |
dermatology |
358 |
34 |
6 |
0.5647 |
22 |
led7digit |
500 |
7 |
10 |
0.8739 |
23 |
wisconsin |
683 |
9 |
2 |
0.5383 |
24 |
spectfheart |
267 |
44 |
2 |
0.2594 |
25 |
hayes-roth |
160 |
4 |
3 |
0.6486 |
26 |
cleveland |
297 |
13 |
5 |
0.4136 |
27 |
banana |
5300 |
2 |
2 |
0.8126 |
28 |
satimage |
6435 |
36 |
6 |
0.6479 |
29 |
tae |
151 |
5 |
3 |
0.9613 |
30 |
contraceptive |
1473 |
9 |
3 |
0.6645 |
31 |
marketing |
6876 |
13 |
9 |
0.7003 |
32 |
monk-2 |
432 |
6 |
2 |
0.8947 |
33 |
glass |
214 |
9 |
6 |
0.4079 |
34 |
bupa |
345 |
6 |
2 |
0.7250 |
35 |
coil2000 |
9822 |
85 |
2 |
0.0634 |
36 |
page-blocks |
5472 |
10 |
5 |
0.2135 |
Table 1. Datasets used in experiments.
All datasets used in this research are publicly available; thus, the results can be reproduced. Every dataset defines a supervised classification problem where each of the dataset’s examples is composed of several numerical attributes and the example’s class. To reliably evaluate the SEVQ algorithm, during selection we considered the instances, attributes and classes, and data collections were obtained from multiple domains. The datasets downloaded from the KEEL-dataset repository had already been partitioned by means of a 10-fold stratified cross-validation procedure.
Table 1 shows the details of the datasets selected for the experiments, including the name, counts of instances, attributes and classes (the latter being the number of possible values of the output variable), and an imbalance score of each dataset. Each imbalance score shown in the table represents the average of imbalances calculated for the respective pairs of classes.
Algorithms¶
In our study, we consider two groups of classifiers that were compared with the SEVQ algorithm. The first group contains traditional algorithms, most of which were implemented using the scikit-learn library, and the sole exception - the XGB method - was implemented using the xgboost library.
AdaBoostClassifier (AB) - The AdaBoost-SAMME implementation was used, which extends the original AdaBoost algorithm to the multiclass case without reducing it to multiple two-class problems. The parameter of the maximum number of estimators at which boosting was terminated was set to 50, and the learning rate shrank the contribution of each classifier by 1.
DecisionTreeClassifier (DT) is a nonparametric supervised learning method used for classification and regression. It creates a multiway tree finding for each node the categorical feature that will yield the largest information gain for categorical targets. The optimized version of the CART algorithm was used, and the maximum depth of the tree was set to 5.
GaussianNB (GNB) is an implementation of a Gaussian naive Bayes learning algorithm based on applying the Bayes’ theorem with the naive assumption of conditional independence between every pair of features given the value of the class variable. The portion of the largest variance of all features that is added to the variances for calculation stability was set to 1*10^(-9).
KNeighborsClassifier (KNN) is a simple, effective and nonparametric classification method. For each data record to be classified, its k nearest neighbors are retrieved, and they form its neighborhood. Majority voting among the data records in the neighborhood is usually used to decide the classification for that record with or without consideration of distance-based weights. The number of neighbors was set to 3.
MLPClassifier (MLP) is a multilayer perceptron classifier that optimizes the logarithm of the loss function using a stochastic gradient-based optimizer. The L2 penalty parameter was set to 1, and the maximum number of iterations was set to 10,000.
NearestCentroid (NC) is a classification method that calculates the centroids for each class and assigns to a given observation the label of the class of training samples with the centroid that is closest to the observation. The (Euclidean) distance metric was used.
QuadraticDiscriminantAnalysis (QDA) is a classifier with a quadratic decision boundary, generated by fitting the conditional densities of classes to the data and using the Bayes’ rule. Class proportions were inferred from the training data.
RandomForestClassifier (RF) - A random forest is a meta-estimator that fits several decision tree classifiers to various subsamples of the dataset and uses averaging to improve predictive accuracy and control overfitting. The number of trees in the forest was set to 10; the maximum depth of the tree was 5, and the number of features considered when looking for the best split was set to 1.
C-Support Vector Classification (SVC) is an implementation of SVM based on LibSVM with a radial basis kernel function. This method uses structural risk minimization, and the risk is minimized by maximizing the margin between the separating hyperplane and the data point closest to the hyperplane.
XGBoost (XGB) is a highly scalable decision tree ensemble based on gradient boosting. XGB builds an additive expansion of the objective function by minimizing a loss function. The maximum depth of each tree was set to 5, and the error evaluation for multiclass training was used.
The second group contains incremental algorithms implemented using the scikit-multiflow and NeuPy librariers , except for the SFAM algorithm with an implementation based on AIOpenLab’s code and our custom implementation of EVQ.
AdditiveExpertEnsembleClassifier (AEE) is a method for using any online learner for drifting concepts. The maximum number of estimators to hold was set to 5, weights were decreased by 0.8, and the weight of new experts as a ratio to the total ensemble weight was set to 0.1.
AdaptiveRandomForestClassifier (ARF) is an algorithm for classification of evolving data streams using the effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimization for different datasets. The parameter of the number of trees in the ensemble was set to 10, and the lambda value for bagging was 6.
DynamicWeightedMajorityClassifier (DWM) is an algorithm that copes with concept drift by training online learners of the ensemble, weighting such learners based on their performance, removing them also based on their performance, and adding new experts based on the global performance of the ensemble. The maximum number of estimators to be held was set to 5; the interval between expert removal, creation, and weight update was 50; weights were decreased by 0.5, and the minimum fraction of weight per model was 0.01.
ExtremelyFastDecisionTreeClassifier (EFDT) is a method that constructs a tree incrementally by seeking to select and perform a split as soon as there is confidence that the split will be useful, and reevaluating that decision, replacing the split if it subsequently becomes evident that a better split is available. The number of instances a leaf should observe between split attempts was 200, and the number of instances a node should observe before reevaluating the best split was 20.
HoeffdingAdaptiveTreeClassifier (HAT) is a method that monitors the performance of branches on the tree and replaces them with new branches when their accuracy decreases if the new branches are more accurate. The default parameter settings were used.
HoeffdingTreeClassifier (HT) is an incremental anytime decision tree induction algorithm capable of learning from massive data streams, assuming that the distribution of generated examples does not change over time. The implementation based on MOA was used. The number of instances a leaf should observe between split attempts was 200.
KNNClassifier (KNNI) is an incremental version of the KNN classifier that keeps track of the last maximum size of the window storing the last observed samples. The number of nearest neighbors was set to 5, and the maximum number of samples that could be stored in one leaf node was 30.
NaiveBayes (NB) is a classifier that provides classical Bayesian predictions while naively assuming that all inputs are independent. All attributes were numerical.
OzaBaggingClassifier (OB) is an ensemble learning method that improves the bagging ensemble method for the batch setting; this version of the method can effectively handle data streams. The size of the ensemble was set to 10. The KNN classifier with the ADWIN change detector was used as the base estimator.
SimplifiedFuzzyARTMAP (SFAM) is a simpler and faster variant of FAM that provides the same functionality as the original FAM but uses a simpler structure. The algorithm’s implementation was based on AIOpenLab’s code; the learning rate was set to 1 (fast learning); the regularization term was 0.0001; vigilance was 0.75, and a complement coding scheme for inputs was used.
EvolvingVectorQuantization (EVQ) - is a supervised version of the original VQ which is able to evolve new clusters on demand by comparing new incoming samples with already generated clusters. It includes the label information in the training process by introducing a hit matrix and extending the feature space, and it comes with a new weighted classification strategy. We used our custom implementation based on EVQ-Class (variant B) pseudocode included in with vigilance parameter value set to 0.2.
Learning Vector Quantization (LVQ) - is a neural network-based algorithm strictly meant for statistical classification or recognition. It defines class regions in the input data space by placing a subset of similarly labeled prototype vectors into each class region. The quantization regions are defined by mid-planes between neighboring prototype vectors. The default parameter settings were used.
Learning Vector Quantization 2 (LVQ2) - an improved version of the LVQ algorithm. The classification decision in this algorithm is identical to the LVQ. However, in learning two prototype vectors that are the nearest neighbors are updated simultaneously. The default parameter settings were used.
Learning Vector Quantization 2.1 (LVQ2.1) - an improved version of the LVQ2 algorithm that allows the mid plane be the closest prototype vector. The default parameter settings were used.
Learning Vector Quantization 3 (LVQ3) - an improved version of the LVQ2.1 algorithm, which was developed to overcome the reference vectors diverging drawback. The default parameter settings were used, except the step parameter, which was set to 0.01 to get better results.