Update M3DocVQA download - fix bugs & support parallel download & simplify splits #7

Merged: 1 commit into bloomberg:main on Feb 15, 2025

@j-min (Contributor) commented on Feb 15, 2025

Several of the M3DocVQA download scripts were outdated (e.g., #2).
This PR fixes the bug, adds support for parallel downloads, and simplifies file splitting: the document IDs of the train/dev splits are split first, instead of splitting the files after downloading the entire train/dev splits.
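A minimal sketch of the "split IDs first, then download in parallel" approach, assuming each document ID is already tagged with its split (`train` or `dev`). The names `group_ids_by_split`, `download_splits`, and the `download_fn(doc_id, split)` callback are hypothetical and are not taken from this PR's scripts; the actual fetching logic is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set, Tuple


def group_ids_by_split(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (doc_id, split) pairs into {split: [doc_id, ...]}, skipping duplicates."""
    groups: Dict[str, List[str]] = {}
    seen: Set[Tuple[str, str]] = set()
    for doc_id, split in pairs:
        if (doc_id, split) in seen:
            continue
        seen.add((doc_id, split))
        groups.setdefault(split, []).append(doc_id)
    return groups


def download_splits(
    pairs: Iterable[Tuple[str, str]],
    download_fn: Callable[[str, str], bool],  # hypothetical: returns True on success
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Download every document directly into its split; return {split: [failed ids]}."""
    # Decide split membership before any download, so no post-download splitting is needed.
    groups = group_ids_by_split(pairs)
    failures: Dict[str, List[str]] = {split: [] for split in groups}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every job up front so all splits share one worker pool.
        futures = {
            pool.submit(download_fn, doc_id, split): (doc_id, split)
            for split, ids in groups.items()
            for doc_id in ids
        }
        # Dicts keep insertion order, so failures are reported in input order.
        for future, (doc_id, split) in futures.items():
            try:
                ok = future.result()
            except Exception:
                ok = False  # treat a raised error the same as a reported failure
            if not ok:
                failures[split].append(doc_id)
    return failures
```

Because split membership is fixed before any network call, each file can be written straight into its split's directory, and the callback keeps the sketch independent of how documents are actually fetched.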

@oir This is the same as PR #3, but based on the current main branch to resolve conflicts.

@oir merged commit 29e6ac2 into bloomberg:main on Feb 15, 2025