bids dataset detection #1073

Closed
satra opened this issue Jul 18, 2022 · 7 comments

@satra
Member

satra commented Jul 18, 2022

transferring thought process to new issue from #1069

code finds BIDS datasets

shouldn't this only look up towards root from the current location (and obviously not look at all if allow any path is used)?

find dandiset.yaml to determine where to look for bids dataset_description.json files

also i'm assuming dandiset 000026 is the culprit behind this thinking, so any traversal could use a heuristic depth limit based on where dandiset.yaml is (as in, a dataset_description.json has to be either at the level of dandiset.yaml, or perhaps at most 2-3 directories down from it).

even with our hacked setup, there shouldn't be a need to traverse everything. i can't offhand think of other scenarios that violate this.

[000108] / {dandiset.yaml}, dataset_description.json
[000026] / dandiset.yaml
[000026] / [rawdata] / dataset_description.json
[000026] / [derivatives] / [EPIC] / dataset_description.json

and we could formalize this into our validator.
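A minimal sketch of the heuristic described above, not the actual dandi-cli code: walk up from the current location to find `dandiset.yaml`, then look for `dataset_description.json` only a few levels below it. The function names `find_dandiset_root` and `find_bids_roots` and the default `max_depth=3` are assumptions made for illustration.

```python
from pathlib import Path
from typing import List, Optional


def find_dandiset_root(start: Path) -> Optional[Path]:
    """Walk from `start` up towards the filesystem root and return the first
    directory containing dandiset.yaml, or None if no such directory exists."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "dandiset.yaml").is_file():
            return candidate
    return None


def find_bids_roots(dandiset_root: Path, max_depth: int = 3) -> List[Path]:
    """Return directories at most `max_depth` levels below `dandiset_root`
    (depth 0 is the root itself) that contain dataset_description.json."""
    found = []
    stack = [(dandiset_root, 0)]
    while stack:
        directory, depth = stack.pop()
        if (directory / "dataset_description.json").is_file():
            found.append(directory)
        # Hypothetical cut-off: never descend past max_depth levels.
        if depth < max_depth:
            for child in directory.iterdir():
                if child.is_dir():
                    stack.append((child, depth + 1))
    return sorted(found)
```

With the 000026 layout listed above, this would report `rawdata` and `derivatives/EPIC` without visiting anything deeper than three levels below `dandiset.yaml`.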

@yarikoptic
Member

generally BIDS is recursive through rawdata or sourcedata, so you could navigate through those subfolders to arbitrary depth, depending on how "deep" the case is.

@satra
Member Author

satra commented Jul 18, 2022

i don't think dataset_descriptions are ever recursive, are they?

@yarikoptic
Member

> i don't think dataset_descriptions are ever recursive, are they?

no but it could be e.g. [somedatasettop] / [rawdata] / [rawdata] / [rawdata] / dataset_description.json or [somedatasettop] / [derivatives] / some / [rawdata] / derivatives / another / [rawdata] / dataset_description.json and so on.

@yarikoptic
Member

I guess we could restrict the "search" for BIDS (sub)datasets to some pre-defined set of sub-directory paths, but for that we should start with a promise that the top of the dandiset must then be a BIDS dataset, as identified by having dataset_description.json, which ATM https://dandiarchive.org/dandiset/000026/draft/files?location= lacks. That is partially why there is currently this wild chase down all possible rabbit holes.
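One way such a restricted search could look, as a rough sketch only: follow just `rawdata`, `sourcedata`, and the immediate children of `derivatives`, and ignore every other directory. The set of names in `NESTING_DIRS` and the helpers `candidate_dirs` and `bids_datasets` are assumptions for illustration, not an existing API.

```python
from pathlib import Path
from typing import Iterator, List

# Assumed set of directory names under which a nested dataset may live.
NESTING_DIRS = ("rawdata", "sourcedata")


def candidate_dirs(top: Path) -> Iterator[Path]:
    """Yield `top` and every directory reachable from it by following only
    rawdata/, sourcedata/, or derivatives/<name>/ steps."""
    yield top
    for name in NESTING_DIRS:
        sub = top / name
        if sub.is_dir():
            yield from candidate_dirs(sub)
    derivatives = top / "derivatives"
    if derivatives.is_dir():
        # Each child of derivatives/ is treated as a possible dataset top.
        for pipeline in sorted(derivatives.iterdir()):
            if pipeline.is_dir():
                yield from candidate_dirs(pipeline)


def bids_datasets(top: Path) -> List[Path]:
    """Candidate directories that actually hold dataset_description.json."""
    return [d for d in candidate_dirs(top)
            if (d / "dataset_description.json").is_file()]
```

This still reaches paths like `rawdata/rawdata/rawdata` or `derivatives/some/rawdata/derivatives/another/rawdata`, but never enters unrelated subtrees. It does not guard against symlink loops.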

@satra
Member Author

satra commented Jul 19, 2022

for a dandiset if you start with first detecting dandiset.yaml, then everything else should flow from that. however, for 26, i would indeed like to reorganize that one into a better bids structure, but mostly it would be removing rawdata as a prefix from any asset that has it. derivatives would stay at the same place.

@jwodder added the BIDS label Jul 27, 2022

@TheChymera
Contributor

Since the last BIDS asset redesign, we traverse the directory tree until we find the topmost dataset_description.json assets. Can we consider this resolved?
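For illustration, a "topmost" search of this kind can be sketched as below. This is a hypothetical stand-alone version, not the code from the asset redesign; the function name `topmost_bids_roots` is an assumption.

```python
import os
from pathlib import Path
from typing import List


def topmost_bids_roots(root: Path) -> List[Path]:
    """Walk `root` top-down and collect every directory that contains
    dataset_description.json, without descending below such a directory."""
    roots = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "dataset_description.json" in filenames:
            roots.append(Path(dirpath))
            # Clearing dirnames in place tells os.walk to skip this subtree,
            # so nested dataset_description.json files are not reported.
            dirnames.clear()
        else:
            dirnames.sort()  # deterministic traversal order
    return roots
```

Pruning at the first hit means a nested dataset below an already detected one is treated as part of that topmost dataset rather than as a separate result.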

@yarikoptic
Copy link
Member

let's do so unless proven otherwise
