Size or other cut-off points for building git-annex links #674
This might be tricky for datasets that require third-party accounts, since small files can still include data that should not be made public. For fully open datasets, I don't see the harm in doing that. @emmetaobrien, maybe this could go on the agenda for next week, or the week after if we don't have time because of the roadmap planning?
Indeed, I was only thinking of this as applying to open datasets.
This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.
This issue was closed because it has been stalled for 3 months with no activity.
Right now, our general policy when ingesting new datasets is to build a git-annex link to every file in a dataset by default, with a couple of specific exceptions (README.md and DATS.json). However, the utility of building links rather than storing small files directly in GitHub is questionable. In tests with the microstructure_informed_connectomics dataset, which contains ~11,300 files, building git-annex links to every file took nearly twice as long as building links only to files larger than a 200 KB cut-off (estimated by manually examining some subdirectories) and downloading the rest directly.
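A minimal sketch of the size-based split described above, in Python. It assumes "200 KB" means 200 * 1024 bytes, that the README.md and DATS.json exceptions apply only at the dataset's top level, and that only files strictly larger than the cut-off get annexed. The names `CUTOFF_BYTES`, `collect_sizes` and `split_by_size` are hypothetical and not part of any existing ingestion script.

```python
import sys
from pathlib import Path

# Hypothetical cut-off: 200 KB, assumed here to mean 200 * 1024 bytes.
CUTOFF_BYTES = 200 * 1024

# Top-level files that the current policy always stores directly in git.
ALWAYS_DIRECT = {"README.md", "DATS.json"}


def collect_sizes(root):
    """Map each regular file under root (relative POSIX path) to its size in bytes.

    Skips anything inside a .git directory. is_file() and stat() follow
    symlinks, so this is meant for a raw dataset before ingestion.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    }


def split_by_size(sizes, cutoff=CUTOFF_BYTES):
    """Return (annexed, direct): files to link via git-annex vs. store in git."""
    annexed, direct = [], []
    for path, size in sorted(sizes.items()):
        if path in ALWAYS_DIRECT or size <= cutoff:
            direct.append(path)
        else:
            annexed.append(path)
    return annexed, direct


if __name__ == "__main__":
    annexed, direct = split_by_size(collect_sizes(sys.argv[1]))
    print(f"{len(annexed)} files to annex, {len(direct)} to store directly")
```

Saved as a script, it takes the dataset root as its only argument. git-annex can also express a comparable rule on its own through the `annex.largefiles` setting (for example `largerthan=200kb`), which may be simpler than maintaining a custom split.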
Do we want to consider size-based or other criteria for which files get git-annex links (such as storing all text files directly)?
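One way to combine the two criteria is sketched below in Python: store every file that looks like text directly, and annex only binary files above the size cut-off. The text check is an assumed heuristic (no NUL bytes and valid UTF-8 in the first 8 KiB), similar in spirit to the NUL-byte check git uses; the names `SNIFF_BYTES`, `looks_like_text`, `should_annex` and `classify` are hypothetical.

```python
import codecs
from pathlib import Path

SNIFF_BYTES = 8192          # hypothetical: how much of each file to inspect
CUTOFF_BYTES = 200 * 1024   # hypothetical size cut-off (200 * 1024 bytes)


def looks_like_text(head):
    """Heuristic: no NUL bytes and valid UTF-8, tolerating a truncated last character."""
    if b"\x00" in head:
        return False
    # An incremental decoder with final=False does not fail on a multi-byte
    # character cut off at the end of the sniffed chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True


def should_annex(head, size, cutoff=CUTOFF_BYTES):
    """Annex only binary files larger than the cut-off; text always goes into git."""
    return size > cutoff and not looks_like_text(head)


def classify(path, cutoff=CUTOFF_BYTES):
    """Return "annex" or "git" for a single file on disk."""
    path = Path(path)
    with path.open("rb") as f:
        head = f.read(SNIFF_BYTES)
    return "annex" if should_annex(head, path.stat().st_size, cutoff) else "git"
```

Under this rule a very large text file (for example a multi-gigabyte CSV) would still be stored directly in git, and text in encodings other than UTF-8 would be treated as binary, so a real policy would likely need an upper size limit for text as well.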