Passing sgRNA sequences to regular and Batch D3 plots (#73)
* Sam/try plots (#71)

* Fix batch mode pandas warning. (#70)

* refactor to call method on DataFrame, rather than Series.
Removes warning.

* Fix pandas future warning in CRISPRessoWGS

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* Functional

* Cole/fix status file name (#69)

* Update config file logging messages

This removes printing the exception (which is essentially a duplicate),
and adds a condition if no config file was provided. Also changes `json`
to `config` so that it is more clear.

* Fix divide by zero when no amplicons are present in Batch mode

* Don't append file_prefix to status file name

* Place status files in output directories

* Update tests branch for file_prefix addition

* Load D3 and Plotly figures with CRISPRessoPro when there are multiple amplicons

* Update batch

* Fix bug in CRISPRessoCompare when pointing to report data with file_prefix

Before this fix, when a file_prefix was used, the second compared run
was not displayed as data in the first figure of the report.

* Import CRISPRessoPro instead of importing the version

When installed via conda, the version is not available

* Remove `get_amplicon_output` unused function from CRISPRessoCompare

Also remove unused argparse import

* Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests

* Allow for matching of multiple guides in the same amplicon

* Fix pandas FutureWarning

* Change test branch back to master

---------

Co-authored-by: Sam <snic9004@gmail.com>

* Try catch all futures

* Fix test fail plots

* Point test to try-plots

* Fix d3 not showing and plotly mixing with matplotlib

* Use logger for warnings and debug statements

* Point tests back at master

---------

Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Cole Lyman <cole@colelyman.com>

* Sam/fix plots (#72)

* Fix batch mode pandas warning. (#70)

* refactor to call method on DataFrame, rather than Series.
Removes warning.

* Fix pandas future warning in CRISPRessoWGS

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* Functional

* Cole/fix status file name (#69)

* Update config file logging messages

This removes printing the exception (which is essentially a duplicate),
and adds a condition if no config file was provided. Also changes `json`
to `config` so that it is more clear.

* Fix divide by zero when no amplicons are present in Batch mode

* Don't append file_prefix to status file name

* Place status files in output directories

* Update tests branch for file_prefix addition

* Load D3 and Plotly figures with CRISPRessoPro when there are multiple amplicons

* Update batch

* Fix bug in CRISPRessoCompare when pointing to report data with file_prefix

Before this fix, when a file_prefix was used, the second compared run
was not displayed as data in the first figure of the report.

* Import CRISPRessoPro instead of importing the version

When installed via conda, the version is not available

* Remove `get_amplicon_output` unused function from CRISPRessoCompare

Also remove unused argparse import

* Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests

* Allow for matching of multiple guides in the same amplicon

* Fix pandas FutureWarning

* Change test branch back to master

---------

Co-authored-by: Sam <snic9004@gmail.com>

* Try catch all futures

* Fix test fail plots

* Fix d3 not showing and plotly mixing with matplotlib

---------

Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Cole Lyman <cole@colelyman.com>

* Remove token from integration tests file

* Provide sgRNA_sequences to plot_nucleotide_quilt plots

* Passing sgRNA_sequences to plot

* Refactor check for determining when to use CRISPRessoPro or matplotlib for Batch plots

* Add max-height to Batch report samples

* Change testing branch

* Fix wrong check for large Batch plots

* Update integration_tests.yml to point back at master

---------

Co-authored-by: Samuel Nichols <Snic9004@gmail.com>
Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Cole Lyman <cole@colelyman.com>
4 people authored May 9, 2024
1 parent 1c50427 commit 64ef72e
Showing 5 changed files with 89 additions and 11 deletions.
45 changes: 36 additions & 9 deletions CRISPResso2/CRISPRessoBatchCORE.py
@@ -49,6 +49,33 @@ def check_library(library_name):
np = check_library('numpy')


+def should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff=300):
+    """Determine if large plots should be plotted.
+
+    Parameters
+    ----------
+    num_rows : int
+        Number of rows in the dataframe.
+    c2pro_installed : bool
+        Whether CRISPRessoPro is installed.
+    use_matplotlib : bool
+        Whether to use matplotlib when CRISPRessoPro is installed, i.e. value
+        of `--use_matplotlib`.
+    large_plot_cutoff : int, optional
+        Number of samples at which to not plot large plots with matplotlib.
+        Note that each sample has 6 rows in the dataframe. Defaults to 300.
+
+    Returns
+    -------
+    bool
+        Whether to plot large plots.
+    """
+    return (
+        (not use_matplotlib and c2pro_installed)
+        or (num_rows / 6) < large_plot_cutoff
+    )
+
+
def main():
try:
start_time = datetime.now()
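A minimal sketch of how the two call styles in this diff behave. The body of `should_plot_large_plots` is copied from the hunk above so the snippet runs on its own. The quilt plots pass the real `C2PRO_INSTALLED` flag, while the base-editor conversion maps pass `False`, so only the matplotlib size cutoff governs them. The 3000-row size (500 samples) is an illustrative value, not taken from the source.

```python
def should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff=300):
    """Copy of the predicate added to CRISPRessoBatchCORE.py in this commit."""
    return (
        (not use_matplotlib and c2pro_installed)
        or (num_rows / 6) < large_plot_cutoff
    )


# Illustrative size: 3000 rows = 500 samples, above the default cutoff of 300.
num_rows = 3000

for c2pro_installed in (True, False):
    for use_matplotlib in (True, False):
        # Quilt plots pass the real CRISPRessoPro flag.
        quilt = should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib)
        # Base-editor conversion maps always pass False instead.
        conversion_map = should_plot_large_plots(num_rows, False, use_matplotlib)
        print(f"c2pro={c2pro_installed} use_matplotlib={use_matplotlib} "
              f"quilt={quilt} conversion_map={conversion_map}")
```

At this size, only the CRISPRessoPro run without `--use_matplotlib` produces the quilt plot. The conversion map is skipped in every combination, because passing `False` leaves only the size cutoff in effect.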
@@ -395,8 +422,6 @@ def main():
crispresso2_info['results']['general_plots']['allele_modification_line_plot_labels'] = {}
crispresso2_info['results']['general_plots']['allele_modification_line_plot_datas'] = {}

-large_plot_cutoff = 300
-
percent_complete_start, percent_complete_end = 90, 99
if all_amplicons:
percent_complete_step = (percent_complete_end - percent_complete_start) / len(all_amplicons)
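The `if all_amplicons:` guard in this hunk keeps the progress-step calculation from dividing by `len(all_amplicons)` when it is zero. It presumably corresponds to the "divide by zero when no amplicons are present in Batch mode" fix listed in the commit message. A minimal sketch of the pattern follows. `progress_step` is a hypothetical helper, and returning `0` for an empty batch is an assumption, since the `else` path is not shown in this hunk.

```python
def progress_step(all_amplicons, start=90, end=99):
    """Hypothetical helper: percent-complete increment per amplicon.

    Mirrors the guarded division in CRISPRessoBatchCORE.main(); returning 0
    for an empty batch is an assumption made for this sketch.
    """
    if all_amplicons:
        return (end - start) / len(all_amplicons)
    return 0  # no amplicons: nothing to step through, and no ZeroDivisionError


print(progress_step(["amplicon_A", "amplicon_B", "amplicon_C"]))  # 3.0
print(progress_step([]))  # 0
```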
@@ -580,7 +605,7 @@ def main():
sub_modification_percentage_summary_filename = _jp(amplicon_plot_name + 'Modification_percentage_summary_around_sgRNA_'+sgRNA+'.txt')
sub_modification_percentage_summary_df.to_csv(sub_modification_percentage_summary_filename, sep='\t', index=None)

-if not args.suppress_plots and not args.suppress_batch_summary_plots and (nucleotide_percentage_summary_df.shape[0] / 6) < large_plot_cutoff:
+if not args.suppress_plots and not args.suppress_batch_summary_plots and should_plot_large_plots(sub_nucleotide_percentage_summary_df.shape[0], C2PRO_INSTALLED, args.use_matplotlib):
# plot for each guide
# show all sgRNA's on the plot
sub_sgRNA_intervals = []
@@ -614,6 +639,7 @@ def main():
'fig_filename_root': f'{this_window_nuc_pct_quilt_plot_name}.json' if not args.use_matplotlib and C2PRO_INSTALLED else this_window_nuc_pct_quilt_plot_name,
'save_also_png': save_png,
'sgRNA_intervals': sub_sgRNA_intervals,
+'sgRNA_sequences': consensus_guides,
'quantification_window_idxs': include_idxs,
'custom_colors': custom_config['colors'],
}
@@ -628,7 +654,7 @@ def main():
crispresso2_info['results']['general_plots']['summary_plot_labels'][plot_name] = 'Composition of each base around the guide ' + sgRNA + ' for the amplicon ' + amplicon_name
crispresso2_info['results']['general_plots']['summary_plot_datas'][plot_name] = [('Nucleotide frequencies', os.path.basename(nucleotide_frequency_summary_filename)), ('Modification frequencies', os.path.basename(modification_frequency_summary_filename))]

-if args.base_editor_output and (sub_nucleotide_percentage_summary_df.shape[0] / 6) < large_plot_cutoff:
+if args.base_editor_output and should_plot_large_plots(sub_nucleotide_percentage_summary_df.shape[0], False, args.use_matplotlib):
this_window_nuc_conv_plot_name = _jp(amplicon_plot_name + 'Nucleotide_conversion_map_around_sgRNA_'+sgRNA)
conversion_map_input = {
'nuc_pct_df': sub_nucleotide_percentage_summary_df,
@@ -656,14 +682,15 @@ def main():
]
# done with per-sgRNA plots

-if not args.suppress_plots and not args.suppress_batch_summary_plots: # plot the whole region
+if not args.suppress_plots and not args.suppress_batch_summary_plots and should_plot_large_plots(nucleotide_percentage_summary_df.shape[0], C2PRO_INSTALLED, args.use_matplotlib): # plot the whole region
this_nuc_pct_quilt_plot_name = _jp(amplicon_plot_name.replace('.', '') + 'Nucleotide_percentage_quilt')
nucleotide_quilt_input = {
'nuc_pct_df': nucleotide_percentage_summary_df,
'mod_pct_df': modification_percentage_summary_df,
'fig_filename_root': f'{this_nuc_pct_quilt_plot_name}.json' if not args.use_matplotlib and C2PRO_INSTALLED else this_nuc_pct_quilt_plot_name,
'save_also_png': save_png,
'sgRNA_intervals': consensus_sgRNA_intervals,
+'sgRNA_sequences': consensus_guides,
'quantification_window_idxs': include_idxs,
'custom_colors': custom_config['colors'],
}
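Each quilt-plot input picks its output name with the same condition that `should_plot_large_plots` checks first. When CRISPRessoPro is installed and `--use_matplotlib` is not set, the root gets a `.json` suffix (presumably the data for the interactive CRISPRessoPro figure); otherwise the bare root is passed on. Below is a sketch of that choice as a standalone function. The name `fig_filename_root_for` is hypothetical, and the plot name is illustrative.

```python
def fig_filename_root_for(plot_name, c2pro_installed, use_matplotlib):
    """Hypothetical helper mirroring the inline conditional in the Batch quilt inputs."""
    if not use_matplotlib and c2pro_installed:
        # CRISPRessoPro handles the figure, so the root names a .json file.
        return f"{plot_name}.json"
    # Otherwise the bare root is passed on for the matplotlib path.
    return plot_name


print(fig_filename_root_for("Nucleotide_percentage_quilt", True, False))  # Nucleotide_percentage_quilt.json
print(fig_filename_root_for("Nucleotide_percentage_quilt", True, True))   # Nucleotide_percentage_quilt
```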
@@ -679,7 +706,7 @@ def main():
crispresso2_info['results']['general_plots']['summary_plot_titles'][plot_name] = ''
crispresso2_info['results']['general_plots']['summary_plot_labels'][plot_name] = 'Composition of each base for the amplicon ' + amplicon_name
crispresso2_info['results']['general_plots']['summary_plot_datas'][plot_name] = [('Nucleotide frequencies', os.path.basename(nucleotide_frequency_summary_filename)), ('Modification frequencies', os.path.basename(modification_frequency_summary_filename))]
-if args.base_editor_output and (sub_nucleotide_percentage_summary_df.shape[0] / 6) < large_plot_cutoff:
+if args.base_editor_output and should_plot_large_plots(nucleotide_percentage_summary_df.shape[0], False, args.use_matplotlib):
this_nuc_conv_plot_name = _jp(amplicon_plot_name + 'Nucleotide_conversion_map')
conversion_map_input = {
'nuc_pct_df': nucleotide_percentage_summary_df,
@@ -706,7 +733,7 @@ def main():
crispresso2_info['results']['general_plots']['summary_plot_datas'][plot_name] = [('Nucleotide frequencies', os.path.basename(nucleotide_frequency_summary_filename)), ('Modification frequencies', os.path.basename(modification_frequency_summary_filename))]

else: # guides are not the same
-if not args.suppress_plots and not args.suppress_batch_summary_plots:
+if not args.suppress_plots and not args.suppress_batch_summary_plots and should_plot_large_plots(nucleotide_percentage_summary_df.shape[0], C2PRO_INSTALLED, args.use_matplotlib):
this_nuc_pct_quilt_plot_name = _jp(amplicon_plot_name.replace('.', '') + 'Nucleotide_percentage_quilt')
nucleotide_quilt_input = {
'nuc_pct_df': nucleotide_percentage_summary_df,
@@ -724,7 +751,7 @@ def main():
nuc_pct_quilt_plot_names.append(plot_name)
crispresso2_info['results']['general_plots']['summary_plot_labels'][plot_name] = 'Composition of each base for the amplicon ' + amplicon_name
crispresso2_info['results']['general_plots']['summary_plot_datas'][plot_name] = [('Nucleotide frequencies', os.path.basename(nucleotide_frequency_summary_filename)), ('Modification frequencies', os.path.basename(modification_frequency_summary_filename))]
-if args.base_editor_output and (sub_nucleotide_percentage_summary_df.shape[0] / 6) < large_plot_cutoff:
+if args.base_editor_output and should_plot_large_plots(nucleotide_percentage_summary_df.shape[0], False, args.use_matplotlib):
this_nuc_conv_plot_name = _jp(amplicon_plot_name + 'Nucleotide_percentage_quilt')
conversion_map_input = {
'nuc_pct_df': nucleotide_percentage_summary_df,
@@ -745,7 +772,7 @@ def main():
crispresso2_info['results']['general_plots']['summary_plot_datas'][plot_name] = [('Nucleotide frequencies', os.path.basename(nucleotide_frequency_summary_filename)), ('Modification frequencies', os.path.basename(modification_frequency_summary_filename))]

# allele modification frequency heatmap and line plots
-if C2PRO_INSTALLED and not args.use_matplotlib and not args.suppress_plots and not args.suppress_batch_summary_plots and (nucleotide_percentage_summary_df.shape[0] / 6) < large_plot_cutoff:
+if C2PRO_INSTALLED and not args.use_matplotlib and not args.suppress_plots and not args.suppress_batch_summary_plots:
if guides_all_same:
sgRNA_intervals = [consensus_sgRNA_intervals] * modification_frequency_summary_df.shape[0]
else:
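When all guides are the same, the hunk above builds one interval list per row with `[consensus_sgRNA_intervals] * modification_frequency_summary_df.shape[0]`. Python list repetition copies references, so every row shares the same inner list object. This is harmless as long as the plotting code only reads the intervals. If anything downstream were to mutate one row's entry in place, per-row copies would be needed. A small demonstration with illustrative interval values and row count:

```python
# Illustrative consensus guide intervals (start, end) and row count.
consensus_sgRNA_intervals = [(10, 29)]
n_rows = 3

shared = [consensus_sgRNA_intervals] * n_rows
independent = [list(consensus_sgRNA_intervals) for _ in range(n_rows)]

# List repetition stores the same object in every slot.
print(all(row is consensus_sgRNA_intervals for row in shared))  # True

# Mutating one slot in place shows up in every slot of `shared` ...
shared[0].append((40, 59))
print(shared[2])  # [(10, 29), (40, 59)]

# ... but not in the per-row copies, which were taken before the append.
print(independent[2])  # [(10, 29)]
```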
7 changes: 7 additions & 0 deletions CRISPResso2/CRISPRessoCORE.py
@@ -3785,6 +3785,7 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
'sgRNA_intervals': sgRNA_intervals,
'sgRNA_names': sgRNA_names,
'sgRNA_mismatches': sgRNA_mismatches,
+'sgRNA_sequences': sgRNA_sequences,
'quantification_window_idxs': include_idxs_list,
'custom_colors': custom_config["colors"],
}
@@ -3833,6 +3834,7 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
'sgRNA_intervals': new_sgRNA_intervals,
'sgRNA_names': sgRNA_names,
'sgRNA_mismatches': sgRNA_mismatches,
+'sgRNA_sequences': [sgRNA],
'quantification_window_idxs': new_include_idx,
'custom_colors': custom_config["colors"],
}
@@ -4184,6 +4186,7 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
sgRNA_intervals = refs[ref_names_for_hdr[0]]['sgRNA_intervals']
sgRNA_names = refs[ref_names_for_hdr[0]]['sgRNA_names']
sgRNA_mismatches = refs[ref_names_for_hdr[0]]['sgRNA_mismatches']
+sgRNA_sequences = refs[ref_names_for_hdr[0]]['sgRNA_sequences']
# include_idxs_list = refs[ref_names_for_hdr[0]]['include_idxs']
include_idxs_list = [] # the quantification windows may be different between different amplicons

@@ -4204,6 +4207,7 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
'quantification_window_idxs': include_idxs_list,
'sgRNA_names': sgRNA_names,
'sgRNA_mismatches': sgRNA_mismatches,
+'sgRNA_sequences': sgRNA_sequences,
'custom_colors': custom_config["colors"],
}
debug('Plotting HDR nucleotide quilt')
@@ -4789,6 +4793,7 @@ def get_scaffold_len(row, scaffold_start_loc, scaffold_seq):
sgRNA_intervals = refs[ref_names[0]]['sgRNA_intervals']
sgRNA_names = refs[ref_names[0]]['sgRNA_names']
sgRNA_mismatches = refs[ref_names[0]]['sgRNA_mismatches']
+sgRNA_sequences = refs[ref_names[0]]['sgRNA_sequences']
include_idxs_list = refs[ref_names[0]]['include_idxs']

plot_root = _jp('11a.Prime_editing_nucleotide_percentage_quilt')
@@ -4801,6 +4806,7 @@ def get_scaffold_len(row, scaffold_start_loc, scaffold_seq):
'sgRNA_intervals': sgRNA_intervals,
'sgRNA_names': sgRNA_names,
'sgRNA_mismatches': sgRNA_mismatches,
+'sgRNA_sequences': sgRNA_sequences,
'quantification_window_idxs': include_idxs_list,
'custom_colors': custom_config['colors']
}
@@ -4861,6 +4867,7 @@ def get_scaffold_len(row, scaffold_start_loc, scaffold_seq):
'sgRNA_intervals': new_sgRNA_intervals,
'sgRNA_names': sgRNA_names,
'sgRNA_mismatches': sgRNA_mismatches,
+'sgRNA_sequences': [sgRNA],
'quantification_window_idxs': new_include_idx,
'custom_colors': custom_config['colors']
}
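Across these `CRISPRessoCORE.py` hunks, the whole-amplicon quilt inputs receive every guide through `sgRNA_sequences`. The per-guide inputs built inside the loop instead receive a one-element list, `[sgRNA]`. A simplified sketch of that shape follows. `build_sgRNA_sequence_inputs` and the guide sequences are hypothetical, and the real input dictionaries carry more keys (dataframes, intervals, names, mismatches, colors) than shown here.

```python
def build_sgRNA_sequence_inputs(sgRNA_sequences):
    """Hypothetical helper: which guide sequences each quilt plot receives."""
    # Whole-amplicon plot: every guide sequence is passed through.
    whole_amplicon = {"sgRNA_sequences": sgRNA_sequences}
    # Per-guide plots: each plot is given only its own guide.
    per_guide = [{"sgRNA_sequences": [sgRNA]} for sgRNA in sgRNA_sequences]
    return whole_amplicon, per_guide


# Illustrative 20-nt guides, not taken from the source.
guides = ["GGCCCAGACTGAGCACGTGA", "ATGCTGAACTACGATCGTCA"]
whole, per_guide = build_sgRNA_sequence_inputs(guides)
print(whole["sgRNA_sequences"])
print([d["sgRNA_sequences"] for d in per_guide])
```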
2 changes: 1 addition & 1 deletion CRISPResso2/CRISPRessoReports/templates/batchReport.html
@@ -58,7 +58,7 @@
<h5 id="CRISPResso2_Batch_Output">{{report_name}}</h5>
</div>
<div class='card-body p-0'>
-<div class="list-group list-group-flush">
+<div class="list-group list-group-flush" style="max-height: 25vh; overflow-y: scroll;">
{% for run_name in run_names %}
<a href="{{sub_html_files[run_name]}}" class="list-group-item list-group-item-action" id="{{run_name}}">{{run_name}}</a>
{% endfor %}
2 changes: 1 addition & 1 deletion (file name not captured)
@@ -9,7 +9,7 @@ <h5 id="failed_runs" class="mb-0 text-white">Failed Runs</h5>
</div>

<div class='card-body p-0'>
-<div class="list-group list-group-flush">
+<div class="list-group list-group-flush" style="max-height: 25vh; overflow-y: scroll;">
{% for failed_run in failed_runs %}
{# Toggle the description visibility on click #}
<a href="javascript:void(0)" class="list-group-item list-group-item-action failed-run-name bg-light text-dark"
44 changes: 44 additions & 0 deletions tests/unit_tests/test_CRISPRessoBatchCORE.py
@@ -0,0 +1,44 @@
+from CRISPResso2 import CRISPRessoBatchCORE
+
+
+
+def test_should_plot_large_plots():
+    num_rows = 60
+    c2pro_installed = False
+    use_matplotlib = False
+    large_plot_cutoff = 300
+    assert CRISPRessoBatchCORE.should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff)
+
+
+def test_should_plot_large_plots_c2pro_installed_use_matplotlib_small():
+    num_rows = 60
+    c2pro_installed = True
+    use_matplotlib = True
+    large_plot_cutoff = 300
+    assert CRISPRessoBatchCORE.should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff)
+
+
+def test_should_plot_large_plots_c2pro_installed():
+    num_rows = 6000
+    c2pro_installed = True
+    use_matplotlib = False
+    large_plot_cutoff = 300
+    assert CRISPRessoBatchCORE.should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff)
+
+
+def test_should_plot_large_plots_c2pro_installed_use_matplotlib_large():
+    num_rows = 6000
+    c2pro_installed = True
+    use_matplotlib = True
+    large_plot_cutoff = 300
+    assert not CRISPRessoBatchCORE.should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff)
+
+
+def test_should_plot_large_plots_c2pro_not_installed_use_matplotlib():
+    num_rows = 6000
+    c2pro_installed = False
+    use_matplotlib = True
+    large_plot_cutoff = 300
+    assert not CRISPRessoBatchCORE.should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff)
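The tests above use 60 rows (10 samples) and 6000 rows (1000 samples), well on either side of the cutoff. Because the predicate applies a strict `<` to `num_rows / 6`, the cutoff itself is excluded: 1794 rows (299 samples) is still plotted with matplotlib, while 1800 rows (exactly 300 samples) is not. Below is a sketch of such boundary tests (hypothetical additions, not in this commit), written against a local copy of the function so that it runs without CRISPResso2 installed.

```python
def should_plot_large_plots(num_rows, c2pro_installed, use_matplotlib, large_plot_cutoff=300):
    """Local copy of CRISPRessoBatchCORE.should_plot_large_plots from this commit."""
    return (
        (not use_matplotlib and c2pro_installed)
        or (num_rows / 6) < large_plot_cutoff
    )


def test_should_plot_large_plots_just_below_cutoff():
    # 1794 / 6 = 299 samples, below the cutoff of 300.
    assert should_plot_large_plots(1794, False, True, 300)


def test_should_plot_large_plots_at_cutoff():
    # 1800 / 6 = 300 samples; the strict < excludes the cutoff itself.
    assert not should_plot_large_plots(1800, False, True, 300)


def test_should_plot_large_plots_at_cutoff_with_c2pro():
    # CRISPRessoPro without --use_matplotlib ignores the size cutoff.
    assert should_plot_large_plots(1800, True, False, 300)


if __name__ == "__main__":
    test_should_plot_large_plots_just_below_cutoff()
    test_should_plot_large_plots_at_cutoff()
    test_should_plot_large_plots_at_cutoff_with_c2pro()
```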

