Version 1 #71

Merged
merged 1 commit into from
Aug 14, 2019

Conversation

mortonjt
Collaborator

This is a dummy PR, representing the new release of rhapsody (continuing on #70)

This PR is too large to be merged into master, so a new branch on the main repository will be opened. This is also likely too large for review.

The main contribution here is a refactor of mmvec using PyTorch instead of TensorFlow.
This has several benefits:

  • The installation process is much more streamlined now (PyTorch actively supports their conda channel).
  • DataLoaders have been tightly coupled with biom, enabling fast data offloading to GPUs using PyTorch's multiprocessing scheme (this was also much easier to implement than in TensorFlow), so it is less memory intensive, faster, and more scalable.
  • The model is more accurate now that it uses an alternating minimization scheme, and as a result it converges much faster (runtime is reduced from ~2 days to 3 hours).
  • Numerical issues have been avoided by returning all model parameters in log coordinates.
  • The code has been modularized into layers, enabling easier testing and debugging.
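
As a rough illustration of the log-coordinates point above (this is not the actual rhapsody code), the minimal plain-Python sketch below shows why keeping values in log space avoids underflow. The helper names `logsumexp` and `log_softmax` and the input scores are made up for this example.

```python
import math

def logsumexp(xs):
    """Compute log(sum(exp(x))) stably by factoring out the maximum."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_softmax(scores):
    """Return log-probabilities for a list of unnormalized scores."""
    z = logsumexp(scores)
    return [s - z for s in scores]

# Hypothetical unnormalized scores with a very wide dynamic range.
scores = [0.0, -800.0, -1600.0]
log_probs = log_softmax(scores)

# In log coordinates the small entries stay finite and distinct...
print(log_probs)
# ...whereas exponentiating them underflows to exactly zero.
print([math.exp(lp) for lp in log_probs])
```

The two smallest entries are indistinguishable after exponentiation, but their ordering and magnitude survive in log space.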

This branch can be installed via

pip install git+https://github.com/biocore/rhapsody.git@version1

Issues #32, #33, #40, #44, #54, #55 have been addressed here.

Before the version 1 release, we will want to:

  • Fix the remaining Travis errors.
  • Write a tutorial on how to link differentials obtained from differential abundance tools (e.g. songbird or aldex2).
  • Write a tutorial on how to build paired heatmaps (@nbokulich, any interest in this?).
  • Build Docker containers. @mwang87 already has Docker containers here for the TensorFlow version, so we may be able to just adapt those.
  • Switch the main branch to version1, similar to what is being done with Emperor.

CC @fedarko, @ElDeveloper, @nbokulich, @mwang87 just for a heads up

@mortonjt mortonjt mentioned this pull request Aug 14, 2019
@mortonjt mortonjt merged commit cdb514b into biocore:version1 Aug 14, 2019
@mortonjt
Collaborator Author

See #72. Accidentally merged :(

@mwang87

mwang87 commented Aug 19, 2019

Sounds good. I created issue #73 for myself and will try to get that done this week.

Ming

@fedarko

fedarko commented Aug 19, 2019

@mortonjt Just to be clear—the FeatureData[Conditional] artifact is what could be used in Qurro in lieu of FeatureData[Differential], as mentioned in #60? Are there any considerations that need to be taken into account here, or is just visualizing the literal conditional probabilities in the rank plot ok?

From what I can tell this format looks like a normal dense TSV file, so the unzipped conditional data should work with standalone Qurro. However, FeatureData[Conditional] not being a part of q2-types yet means that I'd have to add some hacky code to get q2-qurro to recognize and accept inputs of this type (like we were doing before with the Songbird FeatureData[Differential] code).
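
For reference, reading such a dense TSV outside of QIIME 2 could look roughly like the sketch below. The layout (a header row of metabolite IDs, one row per microbe), the IDs, and the `read_dense_tsv` helper are assumptions for illustration, not the confirmed FeatureData[Conditional] format.

```python
import csv
import io

# Hypothetical example contents; real files may be laid out differently.
EXAMPLE_TSV = (
    "featureid\tmetabolite_A\tmetabolite_B\n"
    "microbe_1\t0.5\t-1.2\n"
    "microbe_2\t-0.3\t2.0\n"
)

def read_dense_tsv(handle):
    """Parse a dense TSV into {row_id: {column_id: float}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    column_ids = header[1:]  # the first header cell labels the ID column
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_id, values = row[0], row[1:]
        table[row_id] = {c: float(v) for c, v in zip(column_ids, values)}
    return table

conditionals = read_dense_tsv(io.StringIO(EXAMPLE_TSV))
print(conditionals["microbe_2"]["metabolite_B"])
```

For a real file, the `io.StringIO` object would be replaced by an open file handle.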

@mortonjt
Collaborator Author

Possibly. The conditional probabilities that are output are ranks.

However, we'll need to be very careful about interpretation: these aren't two-sided ranks like the ones returned by Songbird. A small conditional probability doesn't imply negative correlation (as it would in Songbird), so only positive conditional probabilities are important.
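
One way to act on this one-sided interpretation is to keep only the strongest pairs and ignore the low end entirely. The sketch below is only an illustration: the `top_pairs` helper, the nested-dict layout, and the values are all hypothetical.

```python
import heapq

def top_pairs(conditionals, k=3):
    """Return the k (microbe, metabolite, value) triples with the largest values.

    Small values are dropped rather than read as negative associations.
    """
    triples = (
        (microbe, metabolite, value)
        for microbe, row in conditionals.items()
        for metabolite, value in row.items()
    )
    return heapq.nlargest(k, triples, key=lambda t: t[2])

# Made-up values for illustration only.
example = {
    "microbe_1": {"metabolite_A": 0.5, "metabolite_B": -1.2},
    "microbe_2": {"metabolite_A": -0.3, "metabolite_B": 2.0},
}
for microbe, metabolite, value in top_pairs(example, k=2):
    print(microbe, metabolite, value)
```

Using `heapq.nlargest` avoids sorting every microbe/metabolite pair when only a handful of top pairs is needed.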

I'm not crazy about the idea of having FeatureData[Conditional] feed into qurro atm -- we'll need a lot more search functionality / interactive visualizations to facilitate that. Those rank files can get huge, and it won't be ideal to make the user scroll through all possible pairs of microbe/metabolite interactions.

A more fruitful direction is to feed the OrdinationResults biplot into qurro the same way that qurro accepts deicode output. This can help narrow down the possible features based on the top PC axes. I haven't done this yet; the closest thing that I have is something like the following plots:

image

Another thing to consider is that both loadings could be important, so we may want a way to systematically "flip" the ordination, so that microbe loadings are points and metabolite loadings are arrows and vice-versa. I raised an issue under q2-diversity below - @nbokulich maybe this is more up your alley?

qiime2/q2-diversity#258
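
The "flip" described above could be sketched as follows. This uses a hypothetical minimal `Biplot` container rather than any real QIIME 2 or scikit-bio type, and the loadings are made up. With scikit-bio's `OrdinationResults`, the analogous operation would presumably exchange the `samples` and `features` tables, but that is an assumption that has not been checked here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Biplot:
    """Hypothetical minimal biplot: loadings keyed by feature ID."""
    points: dict  # drawn as points, e.g. microbe loadings
    arrows: dict  # drawn as arrows, e.g. metabolite loadings

def flip(biplot):
    """Swap which set of loadings is drawn as points versus arrows."""
    return Biplot(points=biplot.arrows, arrows=biplot.points)

# Made-up two-axis loadings for illustration only.
original = Biplot(
    points={"microbe_1": (0.1, -0.4), "microbe_2": (0.3, 0.2)},
    arrows={"metabolite_A": (-0.5, 0.6)},
)
flipped = flip(original)
print(sorted(flipped.points))
print(sorted(flipped.arrows))
```

Flipping twice returns the original layout, so either view can be derived from the other without storing both.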

@nbokulich
Contributor

nbokulich commented Aug 19, 2019

A tutorial on how to build paired heatmaps (@nbokulich , any interest in this?)

Sure, I can add this to the main README after I get the paired heatmaps ready (I am hoping to do a PR with that code this week).

Another thing to consider is that both loadings could be important, so we may want a way to systematically "flip" the ordination, so that microbe loadings are points and metabolite loadings are arrows and vice-versa. I raised an issue under q2-diversity below - @nbokulich maybe this is more up your alley?

Interesting. I need to mull this over a bit more, and I have some ideas that I will post on the issue in question. It looks like you already posted some code there, so I don't see how I am needed, but let's see how others respond on the q2-diversity issue.
