@paulmorio
Collaborator

Modules to download and process datasets from online sources into torch.utils.data.Dataset instances, with additional attributes for (stratified) k-fold CV as described in the paper.
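
To illustrate the kind of (stratified) k-fold index attributes mentioned above, here is a minimal sketch in plain Python. The function name `stratified_kfold_indices` and its parameters are hypothetical and are not taken from this PR; the actual modules may compute their splits differently. The returned index lists could, for example, be passed to `torch.utils.data.Subset`.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold.

    Hypothetical helper for illustration only, not the PR's implementation.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled samples round-robin across the folds,
    # so every fold receives a near-equal share of every class.
    folds = [[] for _ in range(n_splits)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % n_splits].append(idx)

    # Fold k is the test set; all other folds together form the train set.
    splits = []
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits


# Toy example: 6 samples of class 0 and 4 samples of class 1, split in two.
labels = [0] * 6 + [1] * 4
splits = stratified_kfold_indices(labels, n_splits=2, seed=0)
```

Each test fold in the toy example keeps the 6:4 class ratio of the full label list, which is the property stratification is meant to preserve.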

This adds a few new dependencies, namely openpyxl, xlrd, and pyreadr, for processing the Excel and R data storage formats of the original raw data.

Also included are utility functions for transforming each of the datasets into datamanagers that have "cold" or "warm" label initialisations for benchmarking AL strategies on the datasets.
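
As a rough sketch of what "cold" and "warm" label initialisations could look like, the snippet below picks the initially labelled indices for a dataset. The definitions used here are assumptions made for illustration: "cold" is taken to mean a single randomly chosen labelled sample and "warm" a random 10% of the samples. The function name and parameters are hypothetical and do not come from this PR.

```python
import random


def initial_labelled_indices(n_samples, mode="cold", warm_fraction=0.1, seed=0):
    """Split range(n_samples) into (labelled, unlabelled) index lists.

    Assumed definitions (not confirmed by the PR):
      - "cold": exactly one randomly chosen sample starts labelled.
      - "warm": a random `warm_fraction` of the samples start labelled.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    if mode == "cold":
        n_labelled = 1
    elif mode == "warm":
        n_labelled = max(1, round(warm_fraction * n_samples))
    else:
        raise ValueError(f"Unknown mode: {mode!r}")

    labelled = sorted(indices[:n_labelled])
    unlabelled = sorted(indices[n_labelled:])
    return labelled, unlabelled


cold_labelled, cold_unlabelled = initial_labelled_indices(100, mode="cold")
warm_labelled, warm_unlabelled = initial_labelled_indices(100, mode="warm")
```

The two index lists are the sort of information a data manager would need in order to track which samples an AL strategy may query next.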

Includes tests for all the modules implemented.

…UCI (and some others that are stored as R mat files or excel xls files)
…l dataset instead of just the train portion using concatdataset
@paulmorio
Collaborator Author

Updated with some tests for uci_datasets. I looked into updating the coverage badge dynamically, and the best solution I've come across so far is this GitHub Action: https://github.com/marketplace/actions/dynamic-badges

Unfortunately, I don't have the rights to access the secrets settings for the repo, so I can't complete the setup instructions there.

I've added a reference documentation page. This could be followed by another tutorial at a later date.

thomasgaudelet
thomasgaudelet previously approved these changes May 9, 2022
a-pouplin
a-pouplin previously approved these changes May 11, 2022
@paulmorio paulmorio dismissed stale reviews from a-pouplin and thomasgaudelet via f6c0513 May 22, 2022 15:26
thomasgaudelet
thomasgaudelet previously approved these changes May 23, 2022
Contributor

@thomasgaudelet thomasgaudelet left a comment

Nice!

a-pouplin
a-pouplin previously approved these changes May 24, 2022
@paulmorio paulmorio dismissed stale reviews from a-pouplin and thomasgaudelet via 110cc48 May 25, 2022 12:09
@paulmorio paulmorio merged commit 86899d1 into main May 26, 2022
@thomasgaudelet thomasgaudelet deleted the datasets branch May 30, 2022 06:40

4 participants