PyDRO:

A Python reimplementation of the Distributional Random Oversampling method for binary text classification

This repo is a stand-alone (re)implementation of the Distributional Random Oversampling (DRO) method presented at SIGIR 2016. The original implementation was part of the JaTeCs framework for Java.

Distributional Random Oversampling (DRO) is an oversampling method that counters data imbalance in binary text classification. DRO generates random synthetic minority-class documents by exploiting the distributional properties of the terms in the collection. The variability introduced by oversampling is confined to a latent space; the original feature space is replicated and left untouched.
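To make the idea concrete, here is a minimal sketch of oversampling with an untouched original space plus a random latent space. It is not this repo's API and not necessarily the exact formulation of the paper: the function name `dro_sketch`, its parameters, the choice of one latent dimension per training document, and the multinomial sampling scheme are all assumptions made for illustration.

```python
import numpy as np


def dro_sketch(X, y, n_copies=3, n_trials=50, seed=0):
    """Toy illustration (hypothetical helper, not the PyDRO API).

    X: (n_docs, n_terms) non-negative term weights; y: binary labels,
    where label 1 is assumed to be the minority class.
    Returns the oversampled matrix [original features | latent features]
    and the matching labels.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_docs = X.shape[0]

    # p(term | doc): normalise each document row (guarding empty rows).
    row_sums = X.sum(axis=1, keepdims=True)
    p_term_given_doc = X / np.where(row_sums == 0, 1.0, row_sums)

    # p(doc | term): normalise each term column, then transpose.
    col_sums = X.sum(axis=0, keepdims=True)
    p_doc_given_term = (X / np.where(col_sums == 0, 1.0, col_sums)).T

    # Assumed latent distribution: a mixture over training documents,
    # weighted by the terms that occur in each document.
    p_latent = p_term_given_doc @ p_doc_given_term
    totals = p_latent.sum(axis=1, keepdims=True)
    # Renormalise; fall back to a uniform distribution for empty documents.
    p_latent = np.where(totals > 0,
                        p_latent / np.where(totals > 0, totals, 1.0),
                        1.0 / n_docs)

    rows, labels = [], []
    for i in range(n_docs):
        # Minority documents are replicated; majority documents appear once.
        copies = n_copies if y[i] == 1 else 1
        for _ in range(copies):
            # The original part is copied as-is; only the latent part is random.
            latent = rng.multinomial(n_trials, p_latent[i])
            rows.append(np.concatenate([X[i], latent]))
            labels.append(y[i])
    return np.vstack(rows), np.array(labels)


# Tiny example: 4 documents, 3 terms, one minority document.
X_toy = [[1, 0, 2], [0, 3, 0], [1, 1, 0], [0, 0, 4]]
y_toy = [1, 0, 0, 0]
X_over, y_over = dro_sketch(X_toy, y_toy)
print(X_over.shape)  # (6, 7): 3 minority copies + 3 majority docs, 3 + 4 features
```

In this sketch every copy of a minority document shares the same original features and differs only in its latent columns, which mirrors the statement that the variability stays in the latent space.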

The repo includes a main file that shows how to use DRO on Reuters-21578.

Reference:

@inproceedings{moreo2016distributional,
  title={Distributional Random Oversampling for Imbalanced Text Classification},
  author={Moreo, Alejandro and Esuli, Andrea and Sebastiani, Fabrizio},
  booktitle={SIGIR 2016, 39th ACM Conference on Research and Development in Information Retrieval, Pisa, IT},
  year={2016}
}
