Version 1 #71

mortonjt · 2019-08-14T18:59:35Z

This is a dummy PR representing the new release of rhapsody (continuing from #70).

This PR is too large to be merged into master, so a new branch on the main repository will be opened. This is also likely too large for review.

The main contribution here is a refactor of mmvec to use PyTorch instead of TensorFlow.
This brings several benefits:

The installation process is much more streamlined now (PyTorch actively maintains its conda channel).
DataLoaders are now tightly coupled with biom, enabling fast data offloading to GPUs through PyTorch's multiprocessing scheme (which was also much easier to implement than in TensorFlow), so training is less memory intensive, faster and more scalable.
The model is more accurate now that it uses an alternating minimization scheme, and as a result it converges much faster (runtime is reduced from ~2 days to 3 hours).
Numerical issues have been avoided by returning all model parameters in log coordinates.
The code has been modularized into layers, enabling easier testing and debugging.
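
To illustrate why log coordinates help with numerical stability, here is a minimal sketch in plain Python. It is not the code in this branch; the function name `log_softmax` and the example scores are made up for illustration. The idea is that normalizing in log space avoids exponentiating large values directly.

```python
import math

def log_softmax(scores):
    """Return log-probabilities for a list of real-valued scores.

    Subtracting the maximum before exponentiating keeps every
    exponent <= 0, so math.exp cannot overflow.
    """
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]

# Illustrative scores: math.exp(1000.0) on its own raises OverflowError,
# but the log-space computation stays finite.
scores = [1000.0, 999.0, 0.0]
log_probs = log_softmax(scores)
```

Staying in log coordinates until a probability is actually needed means tiny probabilities are stored as large negative numbers rather than being rounded to zero.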

This branch can be installed via

pip install git+https://github.com/biocore/rhapsody.git@version1

Issues #32, #33, #40, #44, #54, #55 have been addressed here.

Before the version 1 release, we will want

Fix the remaining Travis errors.
A tutorial on how to link differentials obtained from differential abundance tools (e.g. songbird or aldex2).
A tutorial on how to build paired heatmaps (@nbokulich, any interest in this?)
Docker containers. @mwang87 already has Docker containers here for the TensorFlow version; we may be able to just adapt those.
Switch the main branch to version1, similar to what is being done with Emperor.

CC @fedarko, @ElDeveloper, @nbokulich, @mwang87 just for a heads up

mortonjt · 2019-08-14T20:20:16Z

See #72. Accidentally merged :(

mwang87 · 2019-08-19T15:45:40Z

Sounds good. I created an issue for myself (#73) and will try to get that done this week.

Ming

fedarko · 2019-08-19T20:58:20Z

@mortonjt Just to be clear: the FeatureData[Conditional] artifact is what could be used in Qurro in lieu of FeatureData[Differential], as mentioned in #60? Are there any considerations that need to be taken into account here, or is just visualizing the literal conditional probabilities in the rank plot ok?

From what I can tell this format looks like a normal dense TSV file, so the unzipped conditional data should work with standalone Qurro. However, FeatureData[Conditional] not being a part of q2-types yet means that I'd have to add some hacky code to get q2-qurro to recognize and accept inputs of this type (like we were doing before with the Songbird FeatureData[Differential] code).
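
As a rough sketch of what reading such a dense TSV could look like, using only the Python standard library: the column names, feature IDs, values and orientation (microbes as rows, metabolites as columns) below are made up for illustration, and the helper `read_dense_tsv` is hypothetical, not part of Qurro or rhapsody.

```python
import csv
import io

# Hypothetical file contents; a real unzipped artifact may be laid out differently.
tsv_text = (
    "featureid\tmetabolite_A\tmetabolite_B\n"
    "microbe_1\t0.7\t0.3\n"
    "microbe_2\t0.2\t0.8\n"
)

def read_dense_tsv(handle):
    """Parse a dense TSV into {row ID: {column ID: float value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]  # the first header cell labels the row-ID column
    table = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        table[row[0]] = {c: float(v) for c, v in zip(columns, row[1:])}
    return table

table = read_dense_tsv(io.StringIO(tsv_text))
```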

mortonjt · 2019-08-19T21:18:25Z

Possibly. The conditional probabilities that are output are ranks.

However, we'll need to be very careful about interpretation: these aren't two-sided ranks like the ones returned by Songbird. A small conditional probability doesn't imply a negative correlation (as a low rank would in Songbird), so only the high conditional probabilities are important.
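
To make the one-sided reading concrete, here is a small sketch with made-up numbers. The helper `top_metabolites` and the metabolite names are hypothetical. It keeps only the highest conditional probabilities for one microbe and says nothing about the low end.

```python
def top_metabolites(cond_probs, k=2):
    """Return the k (metabolite, probability) pairs with the highest probability."""
    ranked = sorted(cond_probs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Made-up conditional probabilities of each metabolite given one microbe.
cond_probs = {"met_A": 0.55, "met_B": 0.30, "met_C": 0.10, "met_D": 0.05}

top = top_metabolites(cond_probs, k=2)
# met_C and met_D are simply left out: a low value here is not treated
# as evidence of a negative association.
```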

I'm not crazy about the idea of having FeatureData[Conditional] feed into qurro atm -- we'll need a lot more search functionality / interactive visualizations to facilitate that. Those rank files can get huge, and it won't be ideal to make the user scroll through all possible pairs of microbe/metabolite interactions.

A more fruitful direction is to feed the OrdinationResults biplot into qurro the same way that qurro accepts deicode output. This can help narrow down the possible features based on the top PC axes. I haven't done this yet; the closest thing that I have is something like the following plots.

Another thing to consider is that both loadings could be important, so we may want a way to systematically "flip" the ordination, so that microbe loadings are points and metabolite loadings are arrows and vice-versa. I raised an issue under q2-diversity below - @nbokulich maybe this is more up your alley?

qiime2/q2-diversity#258
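
A minimal sketch of what "flipping" could mean, assuming a biplot is reduced to two sets of loadings. The `Biplot` class, the `flip` function and the coordinates are hypothetical stand-ins for illustration; this is not scikit-bio's OrdinationResults and not the code posted on the linked issue.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Biplot:
    points: dict  # feature ID -> coordinates drawn as points
    arrows: dict  # feature ID -> coordinates drawn as arrows

def flip(biplot):
    """Swap which set of loadings is drawn as points and which as arrows."""
    return replace(biplot, points=biplot.arrows, arrows=biplot.points)

# Made-up loadings on two axes.
original = Biplot(
    points={"microbe_1": (0.1, 0.4), "microbe_2": (-0.3, 0.2)},
    arrows={"metabolite_A": (0.5, -0.1)},
)
flipped = flip(original)
```

Flipping twice returns the original object, so the operation loses no information.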

nbokulich · 2019-08-19T21:34:06Z

> A tutorial on how to build paired heatmaps (@nbokulich, any interest in this?)

Sure, I can add this to the main README after I get the paired heatmaps ready (I am hoping to open a PR with that code this week).

> Another thing to consider is that both loadings could be important, so we may want a way to systematically "flip" the ordination, so that microbe loadings are points and metabolite loadings are arrows and vice-versa. I raised an issue under q2-diversity below - @nbokulich maybe this is more up your alley?

Interesting. I need to mull this over a little more, and I have some ideas that I will post on the issue in question. It looks like you already posted some code there, so I don't see how I am needed, but let's see how others respond on the q2-diversity issue.

bump

cdb514b

mortonjt mentioned this pull request Aug 14, 2019

Dataloader #70

Closed

mortonjt merged commit cdb514b into biocore:version1 Aug 14, 2019
