Skip to content

Latest commit

 

History

History
41 lines (25 loc) · 1.99 KB

README.md

File metadata and controls

41 lines (25 loc) · 1.99 KB

Getting Start

CDSC_AL: A Clustering-based Data Stream Classification framework using Active Learning

The "Supplemental Result.pdf" includes the results for comparison with semi-supervised methods using 5%, 15%, 20% labeled data. Also, the comparison results between supervised methods and CDSC-AL method with 5%, 15%, and 20% labeled data respectively.

Example Usage

There are two python codes with different settings for the benchmark data streams:

  1. The main_final_draft.py file is developed for arranging data streams to have abrupt drifts and run this code on
Synthetic-1, Synthetic-2, Sea, and Shuttle
  1. The main_final_draft4.py file is developed for simulating data streams with gradual concept drift and run this code on
KDD cup 99, Forest covtype, Gas Sensor Drift, MNIST, CiFAR-10

The two synthetic datasets (Synthetic-1 and Synthetic-2) are generated by the authors and thus we include them here. For the remaining seven datasets, it can found from the following links:

  1. https://archive.ics.uci.edu/ml/index.php

  2. http://users.rowan.edu/ ∼polikar/nse.html

To run the "main_final_draft.py" or "main_final_draft4.py" code with different datasets, go to line 17 to change the name of dataset.

In line 11, the global variable label_ratio allows for users to change the proportion of labeled data in each incoming data chunk.

Two different evaluation metrics are used:

  1. BAcc1Hist: A vector of the Balanced Classification Accuracy values for the entire data streams

  2. F1Hist: A vector of the Macro-average values of the F1-score for the entire data streams

Dependencies:

  • Numpy
  • Pandas
  • Scikit-learn
  • Scipy

Citation Format

For any use of this project, please refer to the following article:

  • Yan, Xuyang and Homaifar, Abdollah and Sarkar, Mrinmoy and Girma, Abenezer and Tunstel, Edward. "A Clustering-based framework for Classifying Data Streams." In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI2021).