Large aggregation (#192)
* Squashed commit of the following:

commit 8564eb0
Merge: f6ef62c 07cc7d8
Author: Kendell Clement <k.clement.dev@gmail.com>
Date:   Tue Jan 11 16:20:15 2022 -0500

    Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

commit 07cc7d8
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:29:59 2021 -0700

    Fix bug in `find_indels_substitutions`

    This bug occurred when there was a deletion at the end of a sequence, and was
    thus not properly accounted for.

commit f6ef62c
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:29:59 2021 -0700

    Fix bug in `find_indels_substitutions`

    This bug occurred when there was a deletion at the end of a sequence, and was
    thus not properly accounted for.

commit 7212f87
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:26:17 2021 -0700

    Add a unit test for `find_indels_substitutions`

    This unit test checks for deletions at the end of a sequence, which are
    inherently outside of the include_indx_set window.

commit d50b4e9
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:03:22 2021 -0700

    Fix a bug in `find_indels_substitutions`

    The bug that this commit fixes is when an insertion occurs at the edge of the
    include indexes. The trouble with this earlier was that it was using the `idx`
    to calculate the size of the insertion, but the `idx` wasn't being incremented
    anymore because it was outside of the include window.

commit 4db066f
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:01:39 2021 -0700

    Add test case for `find_indels_substitutions`

    This test case is extracted from the CRISPRessoBatch integration test and
    provides an example where there is an insertion at the edge of the include
    index.

commit 3b3a741
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 11:37:07 2021 -0700

    Fix bug in CRISPRessoCompare where sample names were not properly set

    This was a place where it was (partially) missed during the crispresso2_info
    object refactoring.

commit e9f5eff
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:26:17 2021 -0700

    Add a unit test for `find_indels_substitutions`

    This unit test checks for deletions at the end of a sequence, which are
    inherently outside of the include_indx_set window.

commit d4d45a9
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:03:22 2021 -0700

    Fix a bug in `find_indels_substitutions`

    The bug that this commit fixes is when an insertion occurs at the edge of the
    include indexes. The trouble with this earlier was that it was using the `idx`
    to calculate the size of the insertion, but the `idx` wasn't being incremented
    anymore because it was outside of the include window.

commit 13f00bb
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 15:01:39 2021 -0700

    Add test case for `find_indels_substitutions`

    This test case is extracted from the CRISPRessoBatch integration test and
    provides an example where there is an insertion at the edge of the include
    index.

commit 659ae34
Author: Cole Lyman <cole@colelyman.com>
Date:   Fri Dec 10 11:37:07 2021 -0700

    Fix bug in CRISPRessoCompare where sample names were not properly set

    This was a place where it was (partially) missed during the crispresso2_info
    object refactoring.

* Add parameter `--suppress_batch_summary_plots`

When many runs are processed together, batch summary plots may fail because they are too large for matplotlib to render. The `--suppress_batch_summary_plots` parameter still allows individual runs to be plotted, but suppresses the batch summary plots that may otherwise be too big.

* Pep formatting cleanup

* Add summary nucleotide plots to aggregate

* Aggregate plots are paginated

* Update CRISPRessoAggregateCORE.py

Remove max sample limit for plotting

* Add --max_samples_per_summary_plot to CRISPRessoAggregate

Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.
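
A minimal sketch of the pagination idea, not the code from this commit: the sample list is split into consecutive pages of at most `max_samples_per_summary_plot` entries, and each page would go to its own PDF. The helper name `paginate_samples` and the sample names are hypothetical.

```python
def paginate_samples(sample_names, max_samples_per_summary_plot):
    """Split sample names into consecutive pages of at most N samples each."""
    if max_samples_per_summary_plot < 1:
        raise ValueError('max_samples_per_summary_plot must be at least 1')
    return [
        sample_names[start:start + max_samples_per_summary_plot]
        for start in range(0, len(sample_names), max_samples_per_summary_plot)
    ]


# Hypothetical input: 320 samples plotted 150 per page.
sample_names = ['sample_{0}'.format(i) for i in range(320)]
pages = paginate_samples(sample_names, 150)
page_sizes = [len(page) for page in pages]
```

With 320 samples and a page size of 150, this yields three pages, the last one holding the remaining 20 samples.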

* Add plotly function to plot an interactive heatmap

* Fix deprecated numpy type to suppress warning

* Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

These heatmaps are interactive (zoomable and pannable) and show for each sample
the percentage of insertions, substitutions, and deletions.

* Add the heatmap summaries to the CRISPRessoAggregate report

* Update Bootstrap to 5.1.3

This is mainly so that we can use the fullscreen modal functionality in this version.

* Move the plotly heatmaps to a Bootstrap modal

* Fix bug where plots were not filling up entire modal.

I have tried countless different ways for this to work, and this is the best
that I can come up with. After the modal is opened it triggers the plot to
resize, and then for some reason you need to trigger the resize event. I think
this is because a `div` changing size won't actually trigger the resizing of the
plot (and neither will just calling `Plotly.Plots.resize`...?!).

* Update the axis labels and add autosize to plotly heatmaps

I'm pretty sure the autosize doesn't do anything, but it is there for good
measure.

* Abandon attempts to make plots fullscreen

This includes removing the Bootstrap modal (two out of the three plots would
resize properly and I couldn't figure out a way to have the plot displayed
outside of the modal). I have left in some javascript to make the plot
fullscreen, but I couldn't get the formatting quite right and the plot wasn't
much bigger in the fullscreen version because there was a ton of space between
the plot and the heatmap. If some brave soul would like to tackle it, feel free!

* Rename and refactor how plot data is passed around

I have consolidated how the plot data is passed around, so that now you can pass
in only one dict with all of the information instead of 4 or 5 separate
parameters. I also renamed the `heatmap_plot_*` to
`allele_modification_heatmap_*`.
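
A rough illustration of the single-dict pattern, under the assumption that the dict keys match the plotting function's parameter names (`sample_values`, `sample_sgRNA_intervals`, `plot_path`, `title`, as in the functions added to `CRISPRessoPlot.py`). `describe_plot` is a hypothetical stand-in that returns a string instead of drawing anything.

```python
def describe_plot(sample_values, sample_sgRNA_intervals, plot_path, title):
    """Hypothetical stand-in for a plotting function; summarizes its inputs."""
    return '{0}: {1} samples -> {2}'.format(title, len(sample_values), plot_path)


# One dict carries everything the plotting function needs.
plot_input = {
    'sample_values': [[0.0, 1.5], [0.2, 3.0]],
    'sample_sgRNA_intervals': [[(0, 1)], [(0, 1)]],
    'plot_path': 'Insertions.html',
    'title': 'Insertions',
}

# Unpacking the dict replaces passing 4 or 5 separate arguments.
summary = describe_plot(**plot_input)
```

Keeping the arguments in one dict also makes it straightforward to hand the same payload to a worker process later.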

* Implement the line plot version of the modification percentages

This also includes correctly resizing the plot when the line plot tab is
selected!

* Change default `max_samples_per_summary_plot` to be 150 instead of 250

* Remove extra assignments of `this_number_samples` and suppress plot

The plot that is suppressed is the large nucleotide quilt when there is a large
number of samples. Is it okay to suppress this plot @kclem?

* Implement parallel plotting in CRISPRessoAggregate

* Fix sample indexing error and heatmap scaling for large number of samples

* Add plotly requirement to setup.py

* Remove space around vertical barcharts

* Add scrollbar to long images in multiReport

* Fill in default (empty) values to allele modification plots

When not running CRISPRessoAggregate, default values for the
`allele_modification_heatmap_plot` and `allele_modification_line_plot`
dictionaries will be set so that the template can be properly rendered.
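
One way this kind of defaulting could look, as a sketch only: the key names follow the `allele_modification_*` naming above, while the inner fields (`names`, `htmls`) and the helper name are hypothetical and may not match what the report template actually expects.

```python
import copy

# Hypothetical empty structure; the real fields used by the template may differ.
EMPTY_PLOT_DATA = {'names': [], 'htmls': {}}


def fill_allele_modification_defaults(report_data):
    """Ensure both allele modification entries exist before rendering."""
    for key in (
        'allele_modification_heatmap_plot',
        'allele_modification_line_plot',
    ):
        # setdefault only fills missing keys and leaves existing values alone.
        report_data.setdefault(key, copy.deepcopy(EMPTY_PLOT_DATA))
    return report_data


report_data = fill_allele_modification_defaults({'report_name': 'example'})
```

Each key gets its own copy of the empty structure, so later appends to one entry do not leak into the other.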

* Include CRISPRessoBatch in the refactor of how summary_plot dicts are handled

* Update dockerfile for new docker

* minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

* Allow for flexible parsing of quant window coordinates

* CRISPRessoPooled debug flash command, fix pep formatting

* Set flexiguide homology parameter type to int

* Coerce ints in batch file checking (#200)

* Batch type coerce and r2 file check

* Revert "Batch type coerce and r2 file check"

This reverts commit f917366.

* Coerce int values

* Handle multiple qwcs in batch mode

If multiple quantification window coordinates (qwcs) were provided in batch mode, a parsing error would occur. This commit fixes that bug.

* Fix bug from old pandas for int cols

Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

* Create allele modification heatmaps and line plots in CRISPRessoBatch

* Add allele modification heatmaps and line plots to CRISPRessoBatch

* Make all plots in CRISPRessoBatch run in parallel

* Make `--suppress_batch_summary_plots` store true

Also, only open and shutdown the process pool when necessary.
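
The pool handling described here might be sketched as follows. This is an assumption-laden illustration, not the CRISPRessoBatch code: `run_plots`, `n_processes`, `suppress_plots`, and `fake_plot` are hypothetical names, and the process pool is created only when there is work to do and more than one process is requested.

```python
from concurrent.futures import ProcessPoolExecutor


def run_plots(plot_jobs, n_processes=1, suppress_plots=False):
    """Run (function, kwargs) plot jobs, opening a pool only when needed."""
    if suppress_plots or not plot_jobs:
        # Nothing to draw, so no pool is ever created.
        return []
    if n_processes <= 1:
        # Serial path: call each plotting function directly.
        return [plot_fn(**kwargs) for plot_fn, kwargs in plot_jobs]
    # Parallel path: the context manager shuts the pool down on exit.
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        futures = [pool.submit(plot_fn, **kwargs) for plot_fn, kwargs in plot_jobs]
        return [future.result() for future in futures]


def fake_plot(title):
    """Hypothetical plotting function; returns a name instead of drawing."""
    return title.lower()


jobs = [(fake_plot, {'title': 'Insertions'}), (fake_plot, {'title': 'Deletions'})]
serial_results = run_plots(jobs, n_processes=1)
suppressed_results = run_plots(jobs, n_processes=4, suppress_plots=True)
```

Functions submitted to a process pool must be picklable, which in practice means defining them at module level.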

* Add blank values for allele_modification entries when not present

Co-authored-by: Kendell Clement <k.clement.dev@gmail.com>
Co-authored-by: dharjanto <dewi.harjanto@gmail.com>
Co-authored-by: Samuel Nichols <Snic9004@gmail.com>
4 people authored May 4, 2022
1 parent f67376f commit 62900e9
Showing 8 changed files with 1,109 additions and 236 deletions.
391 changes: 320 additions & 71 deletions CRISPResso2/CRISPRessoAggregateCORE.py

Large diffs are not rendered by default.

400 changes: 306 additions & 94 deletions CRISPResso2/CRISPRessoBatchCORE.py

Large diffs are not rendered by default.

86 changes: 86 additions & 0 deletions CRISPResso2/CRISPRessoPlot.py
@@ -13,6 +13,7 @@
import matplotlib.patches as patches
import matplotlib.cm as cm
import matplotlib.gridspec as gridspec
import plotly.express as px
from collections import defaultdict
from copy import deepcopy
import re
@@ -3261,6 +3262,7 @@ def plot_unmod_mod_pcts(fig_filename_root,df_summary_quantification,save_png,cut

    #if there are rows..
    if df.shape[0] > 0:
        ax.set_ylim(-0.5, df.shape[0]-0.5)
        max_val = max(df['Reads_total'])
        space_val = max_val*0.02
        pct_labels = []
@@ -3300,6 +3302,8 @@ def plot_reads_total(fig_filename_root,df_summary_quantification,save_png,cutoff
    names = [((name[:20] + "..") if len(name) > 18 else name) for name in df['Name'].values]
    ax.set_yticks(xs)
    ax.set_yticklabels(names)
    if df.shape[0] > 0:
        ax.set_ylim(-0.5, df.shape[0]-0.5)
    if df['Reads_total'].max() > 100000:
        ax.ticklabel_format(style='sci', axis='x', scilimits=(0, 0))
    if cutoff is not None:
@@ -3710,3 +3714,85 @@ def plot_quantification_positions(
)

plt.close(fig)


def plot_allele_modification_heatmap(
    sample_values, sample_sgRNA_intervals, plot_path, title,
):
    fig = px.imshow(
        sample_values,
        labels={
            'x': 'Amplicon Nucleotide (Position)',
            'y': 'Sample (Index)',
            'color': '{0} (%)'.format(title),
        },
        aspect='auto',
    )
    for sample_id, sgRNA_intervals in zip(
        range(sample_values.shape[0]), sample_sgRNA_intervals,
    ):
        for sgRNA_interval in sgRNA_intervals:
            fig.add_shape(
                type='rect',
                x0=sgRNA_interval[0],
                y0=sample_id - 0.5,
                x1=sgRNA_interval[1],
                y1=sample_id + 0.5,
                line={'color': 'Black'},
            )

    fig.update_layout(
        autosize=True,
    )
    fig['layout']['yaxis']['scaleanchor'] = 'x'
    fig['layout']['yaxis']['gridcolor'] = 'rgba(0, 0, 0, 0)'
    fig['layout']['xaxis']['gridcolor'] = 'rgba(0, 0, 0, 0)'
    return fig.write_html(
        plot_path,
        config={
            'responsive': True,
            'displaylogo': False,
        },
        include_plotlyjs='cdn',
        full_html=False,
        div_id='allele-modification-heatmap-{0}'.format(title.lower()),
    )


def plot_allele_modification_line(
    sample_values, sample_sgRNA_intervals, plot_path, title,
):
    fig = px.line(sample_values.transpose())
    sgRNA_intervals = set(
        tuple(sgRNA_interval)
        for sample_sgRNA_interval in sample_sgRNA_intervals
        for sgRNA_interval in sample_sgRNA_interval
    )
    for sgRNA_interval in sgRNA_intervals:
        fig.add_shape(
            type='rect',
            x0=sgRNA_interval[0],
            y0=0,
            x1=sgRNA_interval[1],
            y1=0.5,
            fillcolor='Gray',
            opacity=0.2,
            line={'color': 'gray'},
        )

    fig.update_layout(
        autosize=True,
        xaxis_title='Amplicon Nucleotide (Position)',
        yaxis_title='{0} (%)'.format(title),
        legend_title='Samples',
    )
    return fig.write_html(
        plot_path,
        config={
            'responsive': True,
            'displaylogo': False,
        },
        include_plotlyjs='cdn',
        full_html=False,
        div_id='allele-modification-line-{0}'.format(title.lower()),
    )