[WIP] Adding loader for OrchideaSOL #547

Draft
wants to merge 4 commits into
base: master
from

lucaspbastos

[WIP] Adding loader for OrchideaSOL

Description

Please include the following information in the top-level docstring of the dataset's module, e.g. mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the size of the dataset (e.g. number of files and total duration in hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py
  • Add your module to docs/source/mirdata.rst and docs/source/quick_reference.rst
  • Run tests/test_full_dataset.py on your dataset.
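
The first two checklist items could be sketched roughly as follows. This is a minimal illustration, not the script from this pull request: the index layout (a "tracks" mapping from track ID to a [relative path, MD5 checksum] pair under an "audio" key), the "version" value, the .wav extension, and the use of the file stem as track ID are all assumptions, and the function names are hypothetical.

```python
import hashlib
import json
import os


def md5(file_path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    hasher = hashlib.md5()
    with open(file_path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def make_index(data_path, audio_ext=".wav"):
    """Map every audio file under data_path to [relative path, checksum]."""
    tracks = {}
    for root, _, files in sorted(os.walk(data_path)):
        for name in sorted(files):
            if not name.lower().endswith(audio_ext):
                continue
            full_path = os.path.join(root, name)
            # Store paths relative to the dataset root, with forward slashes.
            rel_path = os.path.relpath(full_path, data_path).replace(os.sep, "/")
            # Assumed ID scheme: the file stem. Stems that collide across
            # folders would overwrite each other and need a different scheme.
            track_id = os.path.splitext(name)[0]
            tracks[track_id] = {"audio": [rel_path, md5(full_path)]}
    # "version" is a placeholder value, not taken from the dataset.
    return {"version": "1.0", "tracks": tracks}


def write_index(data_path, index_path):
    """Build the index for data_path and save it as JSON at index_path."""
    with open(index_path, "w") as fhandle:
        json.dump(make_index(data_path), fhandle, indent=2)
```

Calling write_index with the path to a local copy of the dataset and a target such as mirdata/indexes/my_dataset_index.json would produce the index file mentioned above.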

If your dataset is not fully downloadable, there are a few extra steps you should follow:

  • Contact the mirdata organizers by opening an issue or PR so we can discuss how to proceed with the closed dataset.
  • Show that the version used to create the checksum is the "canonical" one, either by getting the version from the dataset creator, or by verifying equivalence with several other copies of the dataset.
  • Make sure someone has run pytest -s tests/test_full_dataset.py --local --dataset my_dataset once on your dataset locally and confirmed it passes
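
One way to check equivalence between copies of a dataset is to compare their checksums entry by entry. The sketch below assumes each copy has already been summarized as a dictionary with a "tracks" mapping from track ID to per-file [relative path, checksum] entries; that layout and the function name diff_indexes are assumptions made for illustration.

```python
def diff_indexes(index_a, index_b):
    """Report track IDs missing from one index or whose entries differ."""
    tracks_a = index_a["tracks"]
    tracks_b = index_b["tracks"]
    ids_a, ids_b = set(tracks_a), set(tracks_b)
    return {
        "only_in_a": sorted(ids_a - ids_b),
        "only_in_b": sorted(ids_b - ids_a),
        # Same ID in both copies, but a different path or checksum.
        "changed": sorted(t for t in ids_a & ids_b if tracks_a[t] != tracks_b[t]),
    }
```

If the two dictionaries are loaded with json.load from index files built on separate copies, three empty lists suggest the copies are equivalent.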

@harshpalan closed this on Oct 19, 2022
@harshpalan reopened this on Oct 19, 2022
@harshpalan
Collaborator

Hello @lucaspbastos, thank you for contributing to mirdata. If you could resolve the merge conflicts, we can help you with any errors in the scripts and get this pull request merged as soon as possible. Do let us know if you have any questions.

@guillemcortes added the "new loader" label (request to add a specific dataset loader) on Oct 31, 2023
Labels
new loader (request to add a specific dataset loader)
3 participants