Patch datasets #814
Comments
Would it be an alternative to simply update the original dataset? However, without dedicated tooling, this would be way too challenging to handle for a majority of end users, IMHO. Do you (or others you know) want to develop tools for this and need support for it in the spec first?
The idea is transparency: it is a kind of derivative dataset, and you could choose another. It would not be difficult to write a tool that makes a view of a dataset with a patch applied.
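A minimal sketch of what such a view-building tool could look like. Nothing here exists or is defined by the spec; `build_view()` and the overlay rules (JSON sidecars shallow-merged, other files shadowed) are assumptions about what "applying a patch" could mean:

```python
import json
from pathlib import Path


def merge_json(source_file: Path, patch_file: Path, out_file: Path) -> None:
    """Shallow-merge a patch sidecar over a source sidecar (patch keys win)."""
    merged = json.loads(source_file.read_text()) if source_file.exists() else {}
    merged.update(json.loads(patch_file.read_text()))
    out_file.write_text(json.dumps(merged, indent=2))


def build_view(source: Path, patch: Path, view: Path) -> None:
    """Materialize a view of `source` with `patch` overlaid on top."""
    view.mkdir(parents=True, exist_ok=True)

    # 1. Mirror the source dataset into the view; symlinks keep this cheap.
    for src in source.rglob("*"):
        dest = view / src.relative_to(source)
        if src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
        else:
            dest.symlink_to(src.resolve())

    # 2. Overlay the patch. JSON sidecars are merged key-by-key; any other
    #    file simply shadows its source counterpart. Note there is no way
    #    to express deletion here, which is exactly the gap raised below.
    for pat in patch.rglob("*"):
        dest = view / pat.relative_to(patch)
        if pat.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
            continue
        if dest.is_symlink() or dest.exists():
            dest.unlink()
        if pat.suffix == ".json":
            merge_json(source / pat.relative_to(patch), pat, dest)
        else:
            dest.symlink_to(pat.resolve())
```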
Briefly mentioned with @christinerogers in the context of third-party annotations, where permissions to modify the annotated dataset are limited. If there's broader interest, this could be a discussion in Copenhagen. Not sure how much flexibility there is for adding topics. cc @melanieganz @CPernet
From a software point of view, I think this could best be represented as another derivative layer, but the concept/label for it is valuable.
Could you clarify what you mean by "another derivative layer"? I feel like you understood me, but I don't understand the distinction between this and my post.
We can discuss what we want, but annotation is not high on the list of priorities.
Note: for complete "patch" semantics, it should also reserve a way to "delete" any file/entry (in .tsv, .json, etc.), which IMHO wouldn't be trivial, if reasonably possible at all.
I agree that deleting entries is complicated, but I don't think we need an ultimate patch spec to have something useful. I think there's value in having a lightweight way of adding annotations to datasets that you don't necessarily control, and as much as I love DataLad, making mastery of it the only way to publish them seems like an unnecessary barrier. These datasets already exist; they are just poorly defined.

MRIQC, for example, has no BIDS-valid files, because it purely produces metadata and the data that are described exist elsewhere. Right now, people dump MRIQC results in a `derivatives/` directory.

Another case is NeuroScout. There, regressors are generated from the movies that subjects were watching and then stored in a bundle, which then needs to be indexed alongside a dataset containing the preprocessed BOLD data and any other confounds that might be desired. PyBIDS handles this fine, but this is largely thanks to the people who work on PyBIDS, and not due to there being a well-defined mechanism for combining related datasets.

If a third party wants to host BIDS annotations as a database, do they need to buy into the DataLad model just to publish an annotation?
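For illustration, a minimal sketch of the PyBIDS indexing mentioned above. The paths are hypothetical, but passing explicit paths to `derivatives=` is how `BIDSLayout` indexes out-of-tree derivative datasets today:

```python
from bids import BIDSLayout

# Index a raw dataset together with a separately hosted annotation dataset.
# PyBIDS expects the derivative to carry its own dataset_description.json.
layout = BIDSLayout("/data/ds000228", derivatives="/annotations/mriqc")

# Query across both datasets, e.g. all JSON metadata files contributed
# by the derivative (scope="derivatives" restricts to the overlay).
metadata_files = layout.get(scope="derivatives", extension=".json")
```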
In `dataset_description.json`, we currently have two `DatasetType`s: `"raw"` and `"derivative"`. Here I propose an additional type, `"patch"` (or similar). This is a particular kind of derivative that would have the interpretation that the data/metadata elements contained in the patch dataset should be added to or supersede those of the source dataset.

Consider the following use cases:

- Supplying metadata that a dataset is missing and that you lack permission to add upstream, such as fieldmaps' `IntendedFor`.

I'm sure there are others. Right now, this can be done in an ad hoc way, but a clear directive that this is how derived data is declared to augment/override raw data would help avoid inconsistencies among tools.
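A sketch of what a patch dataset's `dataset_description.json` might look like under this proposal. The `"patch"` value is of course hypothetical; `SourceDatasets` is an existing field that here points at the dataset being annotated, with a placeholder DOI:

```json
{
  "Name": "Third-party IntendedFor annotations for ds000001",
  "BIDSVersion": "1.6.0",
  "DatasetType": "patch",
  "SourceDatasets": [
    {"DOI": "doi:10.18112/openneuro.ds000001.v1.0.0"}
  ]
}
```

A consuming tool could then resolve the source dataset, index both, and apply the add/supersede rule to everything else in the patch.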
xref nipreps/mriqc#885 (comment)