Size or other cut-off points for building git-annex links #674

Open
emmetaobrien opened this issue Jul 14, 2021 · 10 comments

Comments

@emmetaobrien
Collaborator

Right now, our general policy when ingesting new datasets is to build a git-annex link to every file in a dataset by default, with a couple of specific exceptions (README.md and DATS.json). However, the utility of building links rather than storing small files directly in GitHub is questionable. In tests with the microstructure_informed_connectomics dataset, which contains ~11,300 files, building git-annex links to every file took nearly twice as long as building links only to files larger than a 200 kB cut-off (estimated by manually examining some subdirectories) and downloading the rest directly.

Do we want to consider size-based or other criteria for deciding which files get git-annex links (such as storing all text files directly)?
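
To make the question concrete, here is a minimal sketch of what a size-based rule might look like. It is not the existing ingestion code: the names `SIZE_CUTOFF_BYTES`, `ALWAYS_DIRECT`, `should_annex`, and `partition_dataset` are hypothetical, and reading "200 kB" as 200,000 bytes is an assumption.

```python
from pathlib import Path

# Assumption: the 200 kB cut-off from the test above, read as 200 * 1000 bytes.
SIZE_CUTOFF_BYTES = 200 * 1000

# Files the current policy already stores directly rather than linking.
ALWAYS_DIRECT = {"README.md", "DATS.json"}


def should_annex(path: Path, cutoff: int = SIZE_CUTOFF_BYTES) -> bool:
    """Return True if the file would get a git-annex link under a size rule."""
    if path.name in ALWAYS_DIRECT:
        return False
    return path.stat().st_size > cutoff


def partition_dataset(root: Path, cutoff: int = SIZE_CUTOFF_BYTES):
    """Split the files under root into (annexed, direct) lists of paths."""
    annexed, direct = [], []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Skip the repository's own metadata directory.
        if ".git" in path.relative_to(root).parts:
            continue
        (annexed if should_annex(path, cutoff) else direct).append(path)
    return annexed, direct
```

Calling `partition_dataset(Path("some_dataset"))` would return the files to link and the files to store directly, which makes it easy to count how many files fall on each side of a candidate cut-off before committing to one. Separately, git-annex has an `annex.largefiles` setting that accepts expressions such as `largerthan=200kb`; whether that fits our ingestion workflow would need checking.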

@cmadjar
Collaborator

cmadjar commented Sep 24, 2021

This might be tricky for datasets that require third-party accounts since small files can still include data that should not be in the open.

For fully open datasets, I don't see the harm in doing that. @emmetaobrien, maybe this could be added to the agenda for next week, or for the week after if we do not have time, since we might be doing the roadmap planning?

@emmetaobrien
Collaborator Author

Indeed, I was only thinking of this as applying to open datasets.

@github-actions

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

@github-actions github-actions bot added the Stale label Feb 22, 2022
@github-actions

This issue was closed because it has been stalled for 3 months with no activity.

@github-actions github-actions bot moved this to Done in CONP winter 2022 May 23, 2022
@cmadjar cmadjar removed the Stale label May 24, 2022
@cmadjar cmadjar reopened this May 24, 2022
@github-actions

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

@github-actions github-actions bot added the Stale label Oct 22, 2022
@github-actions

This issue was closed because it has been stalled for 3 months with no activity.

@github-actions github-actions bot closed this as not planned Jan 21, 2023
@emmetaobrien emmetaobrien reopened this Apr 17, 2023
@github-actions github-actions bot removed the Stale label Apr 18, 2023
@github-actions

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.
