Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: transposing ranks for final output. #99

Merged
merged 13 commits into from
Oct 17, 2019
7 changes: 7 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
# mmvec changelog

## Version 1.0.1 (2019-10-17)
# Enhancements
- Ranks are transposed and viewable in qiime metadata tabulate [#99](https://github.com/biocore/mmvec/pull/99)

# Bug fixes
- Ranks are now calculated consistently between q2 and standalone cli [#99](https://github.com/biocore/mmvec/pull/99)

## Version 1.0.0 (2019-09-30)
# Enhancements
- Paired heatmaps are available [#89](https://github.com/biocore/mmvec/pull/89)
33 changes: 20 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
@@ -3,6 +3,8 @@
# mmvec
Neural networks for estimating microbe-metabolite interactions through their co-occurence probabilities.

![](https://github.com/biocore/mmvec/raw/master/img/mmvec.png "mmvec")

# Installation

MMvec can be installed via pypi as follows
@@ -45,7 +47,7 @@ More information can found under `mmvec --help`

# Qiime2 plugin

If you want to make this qiime2 compatible, install this in your
If you want to run this in a qiime environment, install this in your
qiime2 conda environment (see qiime2 installation instructions [here](https://qiime2.org/)) and run the following

```
@@ -76,26 +78,22 @@ qiime mmvec paired-omics \
--o-conditionals ranks.qza \
--o-conditional-biplot biplot.qza
```

In the results, there are two files, namely `results/conditional_biplot.qza` and `results/conditionals.qza`. The conditional biplot is a biplot representation the
conditional probability matrix so that you can visualize these microbe-metabolite interactions in an exploratory manner. This can be directly visualized in
Emperor as shown below. We also have the estimated conditional probability matrix given in `results/conditionals.qza`,
which an be unzip to yield a tab-delimited table via `unzip results/conditionals`. Each row can be ranked,
so the top most occurring metabolites for a given microbe can be obtained by identifying the highest co-occurrence probabilities for each microbe.

It is worth your time to investigate the logs (labeled under `logdir**`) that are deposited using Tensorboard.
The actual logfiles within this directory are labeled `events.out.tfevents.*` : more discussion on this later.

These log conditional probabilities can also be viewed directly with `qiime metadata tabulate`. This can be
created as follows

Tensorboard can be run via
```
tensorboard --logdir .
qiime metadata tabulate \
--m-input-file results/conditionals.qza \
--o-visualization conditionals-viz.qzv
```

You may need to tinker with the parameters to get readable tensorflow results, namely `--p-summary-interval`,
`--epochs` and `--batch-size`.

A description of these two graphs is outlined in the FAQs below.


Then you can run the following to generate a emperor biplot.

@@ -197,6 +195,14 @@ More information behind the actions and parameters can found under `qiime mmvec

3. More model parameters : The standalone script will return the bias parameters learned for each dataset (i.e. microbe and metabolite abundances). These are stored under the summary directory (specified by `--summary`) under the names `embeddings.csv`. This file will hold the coordinates for the microbes and metabolites, along with biases. There are 4 columns in this file, namely `feature_id`, `axis`, `embed_type` and `values`. `feature_id` is the name of the feature, whether it be a microbe name or a metabolite feature id. `axis` corresponds to the name of the axis, which either corresponds to a PC axis or bias. `embed_type` denotes if the coordinate corresponds to a microbe or metabolite. `values` is the coordinate value for the given `axis`, `embed_type` and `feature_id`. This can be useful for accessing the raw parameters and building custom biplots / ranks visualizations - this also has the advantage of requiring much less memory to manipulate.

It is also important to note that you don't have to explicitly chose - it is very doable to run the standalone version first, then import those output files into qiime2. Importing can be done as follows

```
qiime tools import --input-path <your ranks file> --output-path conditionals.qza --type FeatureData[Conditional]
qiime tools import --input-path <your ordination file> --output-path ordination.qza --type 'PCoAResults % ("biplot")'
```

**Q** : You mentioned that you can use GPUs. How can you do that??

**A** : This can be done by running `pip install tensorflow-gpu` in your environment. See details [here](https://www.tensorflow.org/install/gpu).
@@ -209,7 +215,7 @@ At the moment, these capabilities are only available for the standalone CLI due

**Q** : I'm confused, what is Tensorboard?

**A** : Tensorboard is a diagnostic tool that runs in a web browser. To open tensorboard, make sure you’re in the mmvec environment and cd into the folder you are running the script above from. Then run:
**A** : Tensorboard is a diagnostic tool that runs in a web browser - note that this is only explicitly supported in the standalone version of mmvec. To open tensorboard, make sure you’re in the mmvec environment and cd into the folder you are running the script above from. Then run:

```
tensorboard --logdir .
@@ -237,7 +243,8 @@ The x-axis is the number of iterations (meaning times the model is training acro

The y-axis is the average number of counts off for each feature. The model is predicting the sequence counts for each feature in the samples that were set aside for testing. So in the graph above it means that, on average, the model is off by ~0.75 intensity units, which is low. However, this is ABSOLUTE error not relative error (unfortunately we don't know how to compute relative errors because of the sparsity in these datasets).

You can also compare multiple runs with different parameters to see which run performed the best. If you are doing this, be sure to look at the `training-column` example make the testing samples consistent across runs.
You can also compare multiple runs with different parameters to see which run performed the best. Useful parameters to note are `--epochs` and `--batch-size`. If you are committed to fine-tuning parameters, be sure to look at the `training-column` example make the testing samples consistent across runs.


**Q** : What's up with the `--training-column` argument?

212 changes: 212 additions & 0 deletions examples/cf/check_rhamnolipids.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!ls"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[34mlatent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95\u001b[m\u001b[m\r\n",
"latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_embedding.txt\r\n",
"latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ordination.txt\r\n",
"latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt\r\n"
]
}
],
"source": [
"!ls testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Standalone check"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"fname = 'latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt'\n",
"ranks = pd.read_csv(f'testing/{fname}', sep='\\t', index_col=0)\n",
"microbe_metadata = pd.read_csv('microbe-metadata.txt', sep='\\t', index_col=0)\n",
"metabolite_metadata = pd.read_csv('metabolite-metadata.txt', sep='\\t', index_col=0)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"microbe_metadata = microbe_metadata.loc[ranks.columns]\n",
"i = microbe_metadata.Taxon.apply(lambda x: 'Pseudomonas' in x)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"pseudomonas = microbe_metadata.loc[i].index"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"metabolite_metadata = metabolite_metadata.dropna(subset=['expert_annotation'])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"19"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sum(ranks.loc[metabolite_metadata.index, pseudomonas[0]] > 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# qiime2 check"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
]
}
],
"source": [
"import qiime2\n",
"ranks = qiime2.Artifact.load('ranks.qza').view(pd.DataFrame)\n",
"microbe_metadata = pd.read_csv('microbe-metadata.txt', sep='\\t', index_col=0)\n",
"metabolite_metadata = pd.read_csv('metabolite-metadata.txt', sep='\\t', index_col=0)\n",
"microbe_metadata = microbe_metadata.loc[ranks.columns]\n",
"i = microbe_metadata.Taxon.apply(lambda x: 'Pseudomonas' in x)\n",
"pseudomonas = microbe_metadata.loc[i].index"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"metabolite_metadata = metabolite_metadata.dropna(subset=['expert_annotation'])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"19"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sum(ranks.loc[metabolite_metadata.index, pseudomonas[0]] > 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
6 changes: 3 additions & 3 deletions examples/cf/q2_run.sh
Original file line number Diff line number Diff line change
@@ -8,6 +8,7 @@ qiime mmvec paired-omics \
--p-learning-rate 1e-3 \
--o-conditionals ranks.qza \
--o-conditional-biplot biplot.qza \
--p-summary-interval 1 \
--verbose

qiime emperor biplot \
@@ -27,6 +28,5 @@ mmvec paired-omics \
--metabolite-file lcms_nt.biom \
--epochs 100 \
--learning-rate 1e-3 \
--summary-dir testing

qiime tools import --input-path testing/latent_dim_3_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt --output-path ranks.qza --type FeatureData[Conditional]
--summary-interval 1 \
--summary-dir summary
160 changes: 135 additions & 25 deletions examples/soils/check_soils.ipynb
Original file line number Diff line number Diff line change
@@ -11,6 +11,13 @@
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# standalone check"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -21,9 +28,9 @@
"output_type": "stream",
"text": [
"\u001b[34mlatent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95\u001b[m\u001b[m\r\n",
"latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_embedding.csv\r\n",
"latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_embedding.txt\r\n",
"latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ordination.txt\r\n",
"latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.csv\r\n"
"latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95_ranks.txt\r\n"
]
}
],
@@ -38,7 +45,7 @@
"outputs": [],
"source": [
"datadir = 'summarydir/latent_dim_1_input_prior_1.00_output_prior_1.00_beta1_0.90_beta2_0.95'\n",
"ranks = pd.read_csv(datadir + '_ranks.csv', index_col=0)"
"ranks = pd.read_csv(datadir + '_ranks.txt', index_col=0, sep='\\t')"
]
},
{
@@ -49,11 +56,12 @@
{
"data": {
"text/plain": [
"(2,3-dihydroxy-3-methylbutanoate) 0.000000\n",
"(2,5-diaminohexanoate) 0.030698\n",
"(3-hydroxypyridine) 0.030141\n",
"(3-methyladenine) 0.032587\n",
"(4-oxoproline) 0.036312\n",
"featureid\n",
"(2,3-dihydroxy-3-methylbutanoate) -5.243021\n",
"(2,5-diaminohexanoate) -1.290612\n",
"(3-hydroxypyridine) 0.002373\n",
"(3-methyladenine) 0.971289\n",
"(4-oxoproline) 2.978444\n",
"Name: rplo 1 (Cyanobacteria), dtype: float64"
]
},
@@ -63,7 +71,7 @@
}
],
"source": [
"ranks.loc['rplo 1 (Cyanobacteria)'].head()"
"ranks['rplo 1 (Cyanobacteria)'].head()"
]
},
{
@@ -84,8 +92,8 @@
"metadata": {},
"outputs": [],
"source": [
"idx = ranks.loc['rplo 1 (Cyanobacteria)'] > 0 \n",
"detected_molecules = set(ranks.columns[idx])"
"idx = ranks['rplo 1 (Cyanobacteria)'] > 0 \n",
"detected_molecules = set(ranks.index[idx])"
]
},
{
@@ -168,19 +176,20 @@
{
"data": {
"text/plain": [
"xanthine 0.030167\n",
"guanine 0.033443\n",
"(N6-acetyl-lysine) 0.031525\n",
"cytosine 0.033451\n",
"adenosine 0.038334\n",
"(3-methyladenine) 0.032587\n",
"4-guanidinobutanoate 0.031690\n",
"uracil 0.032330\n",
"hypoxanthine 0.032204\n",
"7-methyladenine 0.029252\n",
"N-acetylornithine 0.033029\n",
"succinate 0.033501\n",
"adenine 0.032958\n",
"featureid\n",
"cytosine 3.238725\n",
"xanthine 0.687712\n",
"N-acetylornithine 1.247421\n",
"uracil 1.778591\n",
"adenine 4.983674\n",
"(N6-acetyl-lysine) 4.423469\n",
"4-guanidinobutanoate 4.031901\n",
"guanine 3.107524\n",
"hypoxanthine 0.666798\n",
"7-methyladenine 0.302561\n",
"succinate 0.893106\n",
"(3-methyladenine) 0.971289\n",
"adenosine 4.995941\n",
"Name: rplo 1 (Cyanobacteria), dtype: float64"
]
},
@@ -190,7 +199,7 @@
}
],
"source": [
"ranks.loc['rplo 1 (Cyanobacteria)'].loc[microcoleus_metabolites]"
"ranks['rplo 1 (Cyanobacteria)'].loc[microcoleus_metabolites]"
]
},
{
@@ -202,6 +211,107 @@
"assert len(detected_molecules & microcoleus_metabolites) == 13"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# qiime2 check"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import qiime2\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/Users/jmorton/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
]
}
],
"source": [
"ranks = qiime2.Artifact.load('ranks.qza').view(pd.DataFrame)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"idx = ranks['rplo 1 (Cyanobacteria)'] > 0 \n",
"detected_molecules = set(ranks.index[idx])\n",
"assert len(detected_molecules & microcoleus_metabolites) == 13"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"featureid\n",
"cytosine 3.147861\n",
"xanthine 0.842640\n",
"N-acetylornithine 1.281711\n",
"uracil 1.990830\n",
"adenine 5.086781\n",
"(N6-acetyl-lysine) 4.530147\n",
"4-guanidinobutanoate 4.027770\n",
"guanine 3.129724\n",
"hypoxanthine 0.713017\n",
"7-methyladenine 0.492983\n",
"succinate 0.805413\n",
"(3-methyladenine) 1.006567\n",
"adenosine 4.987744\n",
"Name: rplo 1 (Cyanobacteria), dtype: float64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ranks['rplo 1 (Cyanobacteria)'].loc[microcoleus_metabolites]"
]
},
{
"cell_type": "code",
"execution_count": null,
Binary file added examples/soils/metabolites.biom
Binary file not shown.
Binary file added examples/soils/microbes.biom
Binary file not shown.
12 changes: 12 additions & 0 deletions examples/soils/run.sh
Original file line number Diff line number Diff line change
@@ -6,3 +6,15 @@ mmvec paired-omics\
--latent-dim 1 \
--learning-rate 1e-3 \
--epochs 3000

qiime tools import --input-path microbes.biom --output-path microbes.biom.qza --type FeatureTable[Frequency]
qiime tools import --input-path metabolites.biom --output-path metabolites.biom.qza --type FeatureTable[Frequency]

qiime mmvec paired-omics \
--i-microbes microbes.biom.qza \
--i-metabolites metabolites.biom.qza \
--p-epochs 100 \
--p-learning-rate 1e-3 \
--o-conditionals ranks.qza \
--o-conditional-biplot biplot.qza \
--verbose
Binary file added img/mmvec.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 1 addition & 1 deletion mmvec/__init__.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
from .heatmap import _heatmap_choices, _cmaps

__version__ = "1.0.0"
__version__ = "1.0.1"

__all__ = ['_heatmap_choices', '_cmaps']
12 changes: 11 additions & 1 deletion mmvec/multimodal.py
Original file line number Diff line number Diff line change
@@ -11,7 +11,7 @@ class MMvec(object):

def __init__(self, u_mean=0, u_scale=1, v_mean=0, v_scale=1,
batch_size=50, latent_dim=3,
learning_rate=0.1, beta_1=0.9, beta_2=0.95,
learning_rate=0.1, beta_1=0.8, beta_2=0.9,
clipnorm=10., device_name='/cpu:0', save_path=None):
""" Build a tensorflow model for microbe-metabolite vectors
@@ -205,6 +205,16 @@ def __call__(self, session, trainX, trainY, testX, testY):

tf.global_variables_initializer().run()

def ranks(self):
modelU = np.hstack(
(np.ones((self.U.shape[0], 1)), self.Ubias, self.U))
modelV = np.vstack(
(self.Vbias, np.ones((1, self.V.shape[1])), self.V))

res = np.hstack((np.zeros((self.U.shape[0], 1)), modelU @ modelV))
res = res - res.mean(axis=1).reshape(-1, 1)
return res

def fit(self, epoch=10, summary_interval=1000, checkpoint_interval=3600,
testX=None, testY=None):
""" Fits the model.
23 changes: 5 additions & 18 deletions mmvec/q2/_method.py
Original file line number Diff line number Diff line change
@@ -56,25 +56,12 @@ def paired_omics(microbes: biom.Table,

loss, cv = model.fit(epoch=epochs, summary_interval=summary_interval)

U, V = model.U, model.V
ranks = pd.DataFrame(model.ranks(), index=train_microbes_df.columns,
columns=train_metabolites_df.columns)

U_ = np.hstack(
(np.ones((model.U.shape[0], 1)),
model.Ubias.reshape(-1, 1), U)
)
V_ = np.vstack(
(model.Vbias.reshape(1, -1),
np.ones((1, model.V.shape[1])), V)
)

ranks = pd.DataFrame(
np.hstack((np.zeros((model.U.shape[0], 1)), U_ @ V_)),
index=train_microbes_df.columns,
columns=train_metabolites_df.columns)

ranks = ranks - ranks.mean(axis=1).values.reshape(-1, 1)
ranks = ranks - ranks.mean(axis=0)
u, s, v = svds(ranks, k=latent_dim)
u, s, v = svds(ranks - ranks.mean(axis=0), k=latent_dim)
ranks = ranks.T
ranks.index.name = 'featureid'
s = s[::-1]
u = u[:, ::-1]
v = v[::-1, :]
6 changes: 6 additions & 0 deletions mmvec/q2/_transformer.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import qiime2
import pandas as pd

from mmvec.q2 import ConditionalFormat
@@ -16,3 +17,8 @@ def _2(df: pd.DataFrame) -> ConditionalFormat:
ff = ConditionalFormat()
df.to_csv(str(ff), sep='\t', header=True, index=True)
return ff


@plugin.register_transformer
def _3(ff: ConditionalFormat) -> qiime2.Metadata:
return qiime2.Metadata.load(str(ff))
7 changes: 5 additions & 2 deletions mmvec/q2/tests/test_method.py
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@
class TestMMvec(unittest.TestCase):

def setUp(self):
np.random.seed(1)
res = random_multimodal(
num_microbes=8, num_metabolites=8, num_samples=150,
latent_dim=2, sigmaQ=2,
@@ -40,12 +41,14 @@ def setUp(self):
def test_fit(self):
np.random.seed(1)
tf.reset_default_graph()
latent_dim = 2
tf.set_random_seed(0)
latent_dim = 2
res_ranks, res_biplot = paired_omics(
self.microbes, self.metabolites,
epochs=1000, latent_dim=latent_dim
epochs=1000, latent_dim=latent_dim,
min_feature_count=1, learning_rate=0.1
)
res_ranks = clr_inv(res_ranks.T)
s_r, s_p = spearmanr(np.ravel(res_ranks), np.ravel(self.exp_ranks))

self.assertGreater(s_r, 0.5)
17 changes: 4 additions & 13 deletions mmvec/tests/test_multimodal.py
Original file line number Diff line number Diff line change
@@ -18,6 +18,7 @@
class TestMMvec(unittest.TestCase):
def setUp(self):
# build small simulation
np.random.seed(1)
res = random_multimodal(
num_microbes=8, num_metabolites=8, num_samples=150,
latent_dim=2, sigmaQ=2,
@@ -49,11 +50,6 @@ def test_fit(self):
coo_matrix(self.testX.values), self.testY.values)
model.fit(epoch=1000)

modelU = np.hstack(
(np.ones((model.U.shape[0], 1)), model.Ubias, model.U))
modelV = np.vstack(
(model.Vbias, np.ones((1, model.V.shape[1])), model.V))

U_ = np.hstack(
(np.ones((self.U.shape[0], 1)), self.Ubias, self.U))
V_ = np.vstack(
@@ -62,7 +58,7 @@ def test_fit(self):
u_r, u_p = spearmanr(pdist(model.U), pdist(self.U))
v_r, v_p = spearmanr(pdist(model.V.T), pdist(self.V.T))

res = softmax(np.hstack((np.zeros((d1, 1)), modelU @ modelV)))
res = softmax(model.ranks())
exp = softmax(np.hstack((np.zeros((d1, 1)), U_ @ V_)))
s_r, s_p = spearmanr(np.ravel(res), np.ravel(exp))

@@ -108,16 +104,11 @@ def test_soils(self):
coo_matrix(self.testX.values), self.testY.values)
model.fit(epoch=1000)

modelU = np.hstack(
(np.ones((model.U.shape[0], 1)), model.Ubias, model.U))
modelV = np.vstack(
(model.Vbias, np.ones((1, model.V.shape[1])), model.V))

ranks = pd.DataFrame(
np.hstack((np.zeros((d1, 1)), modelU @ modelV)),
model.ranks(),
index=self.microbes.ids(axis='observation'),
columns=self.metabolites.ids(axis='observation'))
ranks = ranks - ranks.mean(axis=1).values.reshape(-1, 1)

microcoleus_metabolites = [
'(3-methyladenine)', '7-methyladenine', '4-guanidinobutanoate',
'uracil', 'xanthine', 'hypoxanthine', '(N6-acetyl-lysine)',
9 changes: 4 additions & 5 deletions scripts/mmvec
Original file line number Diff line number Diff line change
@@ -189,15 +189,14 @@ def paired_omics(microbe_file, metabolite_file,
df.to_csv(embeddings_file, sep='\t')

# Save to a ranks file
ranks = pd.DataFrame(model.U @ V.T, index=train_microbes_df.columns,
ranks = pd.DataFrame(model.ranks(), index=train_microbes_df.columns,
columns=train_metabolites_df.columns)
ranks = ranks - ranks.mean(axis=1).values.reshape(-1, 1)

u, s, v = svds(ranks - ranks.mean(axis=0), k=latent_dim)
ranks = ranks.T
ranks.index.name = 'featureid'
ranks.to_csv(ranks_file, sep='\t')

# Save to an ordination file
ranks = ranks - ranks.mean(axis=0)
u, s, v = svds(ranks, k=latent_dim)
s = s[::-1]
u = u[:, ::-1]
v = v[::-1, :]