Datasets for benchmarking strategies #8
…eprocessing of datasets on the UCI database
…UCI (and some others that are stored as R mat files or excel xls files)
…torage on some UCI datasets
…l dataset instead of just the train portion using concatdataset
Updated with some tests for … Unfortunately, I don't have the rights to access the secrets settings for the repo, so I can't finish the instructions there. I've added a reference documentation page. This could be followed by another tutorial at a later date.
Nice!
Modules to download and process datasets from online sources into torch.utils.data.Dataset instances, with additional attributes for (stratified) k-fold CV as described in the paper.

This incurs a few new dependencies, namely openpyxl, xlrd, and pyreadr, for processing the Excel and R data storage formats of the original raw data.

Also included are utility functions for transforming each of the datasets into datamanagers that have "cold" or "warm" label initialisations for benchmarking AL strategies on the datasets.
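For readers unfamiliar with the setup, here is a minimal sketch of the two ideas mentioned above: stratified k-fold splitting and "cold" versus "warm" label initialisation. It is not the code in this PR. The function names `stratified_kfold_indices` and `initial_labelled_indices` are hypothetical, and the sketch assumes that "cold" means starting with no labelled points and "warm" means starting with a small random labelled subset; the PR may define these differently. It uses only the standard library so it runs without torch, whereas the PR attaches this kind of information to Dataset instances.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions.

    Hypothetical helper for illustration; not part of the PR.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return [sorted(f) for f in folds]


def initial_labelled_indices(train_indices, mode, n_warm=10, seed=0):
    """Pick the initially labelled pool for an active learning run.

    Assumption: "cold" = no initial labels, "warm" = n_warm random labels.
    """
    if mode == "cold":
        return []
    if mode == "warm":
        rng = random.Random(seed)
        pool = list(train_indices)
        return sorted(rng.sample(pool, min(n_warm, len(pool))))
    raise ValueError(f"unknown mode: {mode!r}")


# Toy labels: 30 samples of class 0 and 20 samples of class 1.
labels = [0] * 30 + [1] * 20
folds = stratified_kfold_indices(labels, k=5)

# Use fold 0 as the test split and the remaining folds for training.
test_idx = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]

cold = initial_labelled_indices(train_idx, "cold")
warm = initial_labelled_indices(train_idx, "warm", n_warm=5)
```

With these toy labels every fold holds six samples of class 0 and four of class 1, so each held-out fold mirrors the overall 60/40 class balance.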
Includes tests for all the modules implemented.