Getting Started

CDSC_AL: A Clustering-based Data Stream Classification framework using Active Learning

The file "Supplemental Result.pdf" includes the results of comparing CDSC-AL with semi-supervised methods using 5%, 15%, and 20% labeled data. It also includes the results of comparing supervised methods with CDSC-AL using 5%, 15%, and 20% labeled data, respectively.

Example Usage

There are two Python scripts, each with different settings for the benchmark data streams:

main_final_draft.py arranges the data streams so that they contain abrupt concept drift. Run this script on:

Synthetic-1, Synthetic-2, Sea, and Shuttle

main_final_draft4.py simulates data streams with gradual concept drift. Run this script on:

KDD Cup 99, Forest covtype, Gas Sensor Drift, MNIST, and CIFAR-10
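The difference between the two drift settings can be illustrated with a minimal NumPy sketch. It is not the code in main_final_draft.py or main_final_draft4.py: the helper names (chunk_stream, make_abrupt_stream, make_gradual_stream) are hypothetical, and it assumes a concept is either a group of class labels (abrupt case) or a separate pool of samples (gradual case).

```python
import numpy as np


def chunk_stream(X, y, chunk_size):
    """Yield consecutive (X, y) chunks of at most chunk_size samples."""
    for start in range(0, len(y), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]


def make_abrupt_stream(X, y, concepts):
    """Abrupt drift (hypothetical helper): place the samples of each concept
    back to back, so the class distribution switches instantly at each block
    boundary. `concepts` is a list of class-label groups, one per concept."""
    order = np.concatenate([np.flatnonzero(np.isin(y, group)) for group in concepts])
    return X[order], y[order]


def make_gradual_stream(X_old, y_old, X_new, y_new, seed=0):
    """Gradual drift (hypothetical helper): interleave two concepts so that the
    probability of drawing the next sample from the new concept rises linearly
    from 0 to 1 over the length of the stream."""
    rng = np.random.default_rng(seed)
    n = len(y_old) + len(y_new)
    i_old = i_new = 0
    order = []  # (source, index) pairs in stream order
    for t in range(n):
        p_new = t / max(n - 1, 1)
        new_left = i_new < len(y_new)
        old_left = i_old < len(y_old)
        # Take from the new concept with probability p_new, or when the old pool is empty.
        if new_left and (not old_left or rng.random() < p_new):
            order.append(("new", i_new))
            i_new += 1
        else:
            order.append(("old", i_old))
            i_old += 1
    X_out = np.array([X_new[i] if s == "new" else X_old[i] for s, i in order])
    y_out = np.array([y_new[i] if s == "new" else y_old[i] for s, i in order])
    return X_out, y_out


X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 2, 0, 1, 2])
X_ab, y_ab = make_abrupt_stream(X, y, concepts=[[0], [1, 2]])
print(y_ab.tolist())  # [0, 0, 1, 2, 1, 2]
```

Either stream can then be fed to a classifier chunk by chunk with chunk_stream; the chunk size is a free parameter in this sketch, not a value taken from the scripts.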

The two synthetic datasets (Synthetic-1 and Synthetic-2) were generated by the authors and are therefore included here. The remaining seven datasets can be found at the following links:

https://archive.ics.uci.edu/ml/index.php
http://users.rowan.edu/~polikar/nse.html

To run main_final_draft.py or main_final_draft4.py on a different dataset, change the dataset name on line 17.

On line 11, the global variable label_ratio lets users change the proportion of labeled data in each incoming data chunk.
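To make the meaning of label_ratio concrete, here is a minimal sketch of how a per-chunk labeling budget follows from it. The helper split_chunk is hypothetical, and it picks the labeled samples uniformly at random purely as a placeholder; it does not reproduce CDSC-AL's own active-learning query strategy.

```python
import numpy as np

label_ratio = 0.05  # e.g. 5% of each incoming chunk is labeled


def split_chunk(X_chunk, y_chunk, label_ratio, rng):
    """Split one chunk into labeled and unlabeled parts (hypothetical helper).

    The labeling budget is label_ratio * chunk size. Indices are drawn
    uniformly at random here as a stand-in for an active-learning query
    strategy; CDSC-AL's actual selection is not reproduced.
    """
    n = len(y_chunk)
    budget = int(round(label_ratio * n))
    queried = rng.choice(n, size=budget, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[queried] = True
    return X_chunk[mask], y_chunk[mask], X_chunk[~mask]


rng = np.random.default_rng(0)
X_chunk = rng.normal(size=(1000, 4))
y_chunk = rng.integers(0, 3, size=1000)
X_lab, y_lab, X_unlab = split_chunk(X_chunk, y_chunk, label_ratio, rng)
print(len(y_lab), len(X_unlab))  # 50 950
```

With a chunk of 1000 samples, label_ratio = 0.05 gives a budget of 50 labeled samples, leaving 950 unlabeled.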

Two different evaluation metrics are used:

BAcc1Hist: a vector of balanced classification accuracy values over the entire data stream
F1Hist: a vector of macro-averaged F1-scores over the entire data stream
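Both metrics are available in scikit-learn. The sketch below assumes, without confirmation from the scripts, that each history holds one score per evaluated chunk; record_chunk_metrics is a hypothetical helper name.

```python
from sklearn.metrics import balanced_accuracy_score, f1_score


def record_chunk_metrics(y_true, y_pred, bacc_hist, f1_hist):
    """Append one chunk's scores to the running metric histories (hypothetical helper)."""
    bacc_hist.append(balanced_accuracy_score(y_true, y_pred))  # mean of per-class recall
    f1_hist.append(f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-class F1


BAcc1Hist, F1Hist = [], []
record_chunk_metrics([0, 0, 1, 1], [0, 1, 1, 1], BAcc1Hist, F1Hist)
print(f"{BAcc1Hist[-1]:.3f} {F1Hist[-1]:.3f}")  # 0.750 0.733
```

Here class 0 has recall 0.5 and class 1 has recall 1.0, so balanced accuracy is 0.75; the per-class F1-scores are 2/3 and 0.8, giving a macro F1 of 11/15.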

Dependencies:

NumPy
Pandas
Scikit-learn
SciPy

Citation Format

If you use this project, please cite the following article:

Yan, Xuyang and Homaifar, Abdollah and Sarkar, Mrinmoy and Girma, Abenezer and Tunstel, Edward. "A Clustering-based framework for Classifying Data Streams." In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI2021).
