Implement var- and obs- aligned multidimensional arrays (obsm, varm) #3
Problem
I think there is agreement that Multidimensional annotations may be created from an individual Consider an experiment with three data modalities:
A common analysis that stores results in
I see two requirements:
Possible Solutions
I see two, but am interested if clever individuals can do better.
1. Create
Thank you for laying out these options, @ambrosejcarr! I agree with all points you make. However, I overall arrive at favoring option 1 for the following reasons
I'm looking forward to hearing more opinions!
Alternative solution
I think there's a third option here, which is to not have I'm thinking of a structure like:
This could be expressed in a flexible way by defining subsets of observations and variables at the Something I like about this is you could also specifically express transcript to protein connections with:
I would also agree that complete provenance seems out of scope for the storage schema. Maybe what we're trying to do is express "associated with", not "derived from"?
Thanks very much for the thoughts, and @ivirshup for identifying the missing option. I was hoping there would be more ideas I missed.
Thanks both for picking up on this, and apologies for the lack of clarity in my initial write-up. What I was trying to express is an anticipated analysis toolchain use case, wherein they would want to be able to explain the provenance of objects to communicate to their end users how those objects should be used.
@falexwolf I find your points about creating flexibility to enable a downstream syntax discussion compelling. Option 1 is indeed constraining. Extending that line of thought, @ivirshup I wonder if your Option 3 has the same flaw as Option 1: it constrains use prematurely.
I'm tentatively comfortable with Option 1 (dataset & group level obsm), because I think @falexwolf's statement is correct:
... and Option 1 would enable the organization you suggest @ivirshup. What do you two think?
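To make the "dataset & group level obsm" idea concrete, here is a minimal sketch and not a proposed schema. Every name in it (`Dataset`, `Group`, `add_obsm`, the `rna` and `protein` groups, the array keys) is hypothetical and does not come from this thread or from any library, and plain nested lists stand in for real arrays. The only rule it encodes is that an obs-aligned array must have one row per observation of the level it is attached to.

```python
from dataclasses import dataclass, field


def _check_aligned(name, rows, n_obs):
    """Raise if the first dimension of `rows` does not match the number of observations."""
    if len(rows) != n_obs:
        raise ValueError(f"{name!r} has {len(rows)} rows, expected {n_obs}")


@dataclass
class Group:
    """Hypothetical per-modality container with its own observations and obsm."""
    obs_names: list
    obsm: dict = field(default_factory=dict)

    def add_obsm(self, key, rows):
        _check_aligned(key, rows, len(self.obs_names))
        self.obsm[key] = rows


@dataclass
class Dataset:
    """Hypothetical top-level container: dataset-wide obsm plus named groups."""
    obs_names: list
    groups: dict = field(default_factory=dict)
    obsm: dict = field(default_factory=dict)

    def add_obsm(self, key, rows):
        _check_aligned(key, rows, len(self.obs_names))
        self.obsm[key] = rows


cells = ["cell_0", "cell_1", "cell_2"]
ds = Dataset(obs_names=cells)
ds.groups["rna"] = Group(obs_names=cells)
ds.groups["protein"] = Group(obs_names=cells[:2])  # assumed: measured on a subset of cells

# Group-level array: aligned to the observations of one modality only.
ds.groups["rna"].add_obsm("pca", [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
# Dataset-level array: aligned to all observations.
ds.add_obsm("joint_embedding", [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In this sketch a three-row array cannot be attached to the two-cell `protein` group. Whether the top level should hold the union or the intersection of group observations is left open here, as it is in the discussion.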
Hi all, sorry I am late to the party, but this is a really interesting discussion. One question that I have is if the
Thus, having @ivirshup's option 3 available would be nice. Note that option 1 and 3 are not necessarily mutually exclusive either; we could still have a dataset-wide
@joshua-d-campbell, this is a case I was thinking of as well.
To me, yes. Some statistics/annotations will only be meaningful for some modalities, as you probably don't want to mix modalities when calculating "mean".
@ambrosejcarr option 1 could definitely be a superset of option 3. One big question I have about option 1 is: does the top level I would also really like to hear @gtca's thoughts on this. My understanding is that both MuData and MultiAssayExperiment have gone with something like option 1, but Danila would have more context here.
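As a small illustration of the point about "mean", the sketch below compares a per-modality mean with a mean pooled across modalities for a single observation. The modality names and all numbers are made up for illustration and are not taken from any dataset discussed here.

```python
from statistics import fmean

# Made-up measurements for one observation in two hypothetical modalities.
values = {
    "rna": [0.0, 2.0, 4.0],      # assumed: transcript counts
    "protein": [100.0, 300.0],   # assumed: protein counts on a different scale
}

# A mean computed within each modality keeps the scales separate.
per_modality_mean = {name: fmean(v) for name, v in values.items()}

# A mean pooled across modalities mixes the two scales into one number.
pooled_mean = fmean(x for v in values.values() for x in v)

print(per_modality_mean)
print(pooled_mean)
```

The pooled value describes neither modality well, which is one argument for letting some annotations live at the group level rather than only at the dataset level.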
I couldn't understand the bolded part of your use case description. Could you add a bit more detail?
We are imagining a specification that supports this, but also enables operations at the dataset level. I think this might be an answer to your question above, but I'm not sure and would like to hear more about the use case to know if there are gaps here. I think you might be highlighting one.
See |
Additionally, explore whether there are opportunities to specify metadata standards for derived analysis results (e.g. reduced dimensionality representations)
From @LTLA
From "open questions" in the gdoc version: