Size or other cut-off points for building git-annex links #674

Open

emmetaobrien opened this issue Jul 14, 2021 · 10 comments

Assignees

Labels

Discussion Required

Collaborator

emmetaobrien commented Jul 14, 2021

Right now, our general policy when ingesting new datasets is to build a git-annex link to every file in a dataset by default, with a couple of specific exceptions (README.md and DATS.json). However, the utility of building links rather than just storing small files directly in GitHub is questionable. In tests with the microstructure_informed_connectomics dataset, which contains ~11,300 files, building git-annex links to every file took nearly twice as long as building links only to files larger than a 200 kB cut-off (estimated by manual examination of some subdirectories) and downloading the rest directly.

Do we want to consider size-based or other criteria for which files get git-annex links (such as storing all text files directly)?
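
To make the proposal concrete, here is a minimal sketch of how a size-based split could look. It is not the current ingestion code: the function name `partition_by_size`, the constants, and the choice to treat 200 kB as 200,000 bytes are all assumptions for illustration. Files strictly larger than the cut-off would get git-annex links; everything else, plus the two existing exceptions, would be stored directly.

```python
from pathlib import Path

# Hypothetical constants for illustration only.
ALWAYS_DIRECT = {"README.md", "DATS.json"}   # existing exceptions named above
SIZE_CUTOFF_BYTES = 200 * 1000               # assumed reading of "200kb"


def partition_by_size(root, cutoff=SIZE_CUTOFF_BYTES):
    """Split the files under `root` into (annexed, direct) lists.

    Files strictly larger than `cutoff` bytes go to `annexed`;
    smaller files and the always-direct exceptions go to `direct`.
    """
    root = Path(root)
    annexed, direct = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Skip repository internals such as .git/.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.name in ALWAYS_DIRECT or path.stat().st_size <= cutoff:
            direct.append(path)
        else:
            annexed.append(path)
    return annexed, direct
```

Calling `partition_by_size("path/to/dataset")` on a checkout returns the two lists, which could then be handed to whatever step builds the links. git-annex itself also has an `annex.largefiles` setting that accepts expressions such as `largerthan=200kb`, so the same rule might be expressible as configuration rather than code; whether that fits the crawler workflow here is untested.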

emmetaobrien added the Discussion Required label

emmetaobrien assigned jbpoline, samirdas and cmadjar

emmetaobrien mentioned this issue

Create sftp-crawler.pl #671

Merged

Collaborator

cmadjar commented Sep 24, 2021

This might be tricky for datasets that require third-party accounts since small files can still include data that should not be in the open.

For fully open datasets, I don't see the harm in doing that. @emmetaobrien maybe this could be added to next week's agenda, or the following week's if we run out of time, since we might be doing the roadmap planning?

Collaborator Author

emmetaobrien commented Sep 24, 2021

Indeed, I was only thinking of this as applying to open datasets.
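
The restriction to open datasets can be written down as a small rule. This is a sketch under assumptions, not an agreed policy: `should_annex`, its parameters, and the 200,000-byte default are hypothetical, and the existing README.md and DATS.json exceptions are left out for brevity.

```python
def should_annex(size_bytes, dataset_is_open, cutoff_bytes=200_000):
    """Return True if a file should get a git-annex link.

    Restricted datasets keep every file behind git-annex, because even
    small files may hold data that should not be public. Open datasets
    annex only files strictly larger than the cut-off.
    """
    if not dataset_is_open:
        return True
    return size_bytes > cutoff_bytes
```

With this rule, a 10-byte file in a restricted dataset is still annexed, while the same file in an open dataset would be stored directly.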

cmadjar added this to CONP winter 2022

github-actions bot commented Feb 22, 2022

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions bot added the Stale label

github-actions bot commented May 23, 2022

This issue was closed because it has been stalled for 3 months with no activity.

github-actions bot closed this as completed

github-actions bot moved this to Done in CONP winter 2022

cmadjar removed the Stale label

cmadjar reopened this

github-actions bot commented Oct 22, 2022

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions bot added the Stale label

github-actions bot commented Jan 21, 2023

This issue was closed because it has been stalled for 3 months with no activity.

github-actions bot closed this as not planned

emmetaobrien reopened this

github-actions bot removed the Stale label

github-actions bot commented Sep 16, 2023

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions bot added the Stale label

emmetaobrien removed the Stale label

github-actions bot commented Feb 16, 2024

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions bot added the Stale label

emmetaobrien removed the Stale label

github-actions bot commented Jul 16, 2024

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions bot added the Stale label

emmetaobrien removed the Stale label

emmetaobrien assigned GHPBZ

github-actions bot commented Dec 14, 2024

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions bot added the Stale label

emmetaobrien removed the Stale label
