Mckay/plot amino acids #87

Open
wants to merge 89 commits into
base: master
89 commits
fefdfda
added amino acid dict, and notebook changes
mbowcut2 Jul 1, 2024
94ec951
new plot functions
mbowcut2 Jul 2, 2024
040f111
added mixed mode integration tests
mbowcut2 Jul 2, 2024
d4694be
woops
mbowcut2 Jul 2, 2024
70fe5b6
map seq to amino acids
mbowcut2 Jul 3, 2024
87dfccd
prep amino acid func
mbowcut2 Jul 3, 2024
ea8dff1
added color functions. don't work yet.
mbowcut2 Jul 3, 2024
6d14f52
updated notebook
mbowcut2 Jul 3, 2024
9e8bb18
added colors for amino acid plot
mbowcut2 Jul 11, 2024
b98ada3
added tests for amino acid function
mbowcut2 Jul 11, 2024
877908b
removed erroneous 2
mbowcut2 Jul 11, 2024
51fdeb5
added plot fig saving
mbowcut2 Jul 11, 2024
5b5e6d3
plot notebook
mbowcut2 Jul 12, 2024
5af8833
moved amino acid function to shared
mbowcut2 Jul 12, 2024
422116c
updated notebook
mbowcut2 Jul 12, 2024
3dcf2eb
Pin versions of numpy and matplotlib in CI environment (#84) (#452)
Colelyman Jul 9, 2024
567c6eb
changes for pooled mixed-mode default (#83)
mbowcut2 Jul 12, 2024
4682b57
9a plotting
mbowcut2 Jul 15, 2024
ac1b886
spaces
mbowcut2 Jul 15, 2024
d389eb5
removed amino acid '-'
mbowcut2 Jul 15, 2024
3ba2605
plotting
mbowcut2 Jul 15, 2024
63fc53f
get coding sequence with all exon positions
mbowcut2 Jul 15, 2024
cd50356
cleaned up plotting notebook
mbowcut2 Jul 15, 2024
6a6e4b0
formatted caption
mbowcut2 Jul 16, 2024
37b8e4f
added plot 9a to the report
mbowcut2 Jul 16, 2024
8fd1845
removed empty lines, comment
mbowcut2 Jul 16, 2024
b1cee5f
added coding seq to caption
mbowcut2 Jul 16, 2024
0e466f3
-tests
mbowcut2 Jul 17, 2024
47c1528
align AA sequences, then trim to match ref length
mbowcut2 Jul 18, 2024
d827373
updated notebook
mbowcut2 Jul 18, 2024
3e68b8e
added unit test for BLOSUM62
mbowcut2 Jul 18, 2024
7b2ca30
added amino_acid_cut_point for dashed line plotting
mbowcut2 Jul 18, 2024
0d9d96c
udpated notebook
mbowcut2 Jul 18, 2024
968008f
refactored colors function for custom_colors usage
mbowcut2 Jul 22, 2024
b6d095a
added color scheme to default config
mbowcut2 Jul 22, 2024
2468479
added scheme to default style
mbowcut2 Jul 22, 2024
2684354
updated notebook
mbowcut2 Jul 22, 2024
c03498a
Cole/update args (#85) (#456)
Colelyman Jul 18, 2024
897752d
Asymmetrical cut point (#457)
kclem Jul 18, 2024
1e9ca86
D3-Enhancements (#78)
trevormartinj7 Jul 22, 2024
def7c10
added mixed mode integration tests
mbowcut2 Jul 2, 2024
989c56f
woops
mbowcut2 Jul 2, 2024
ba03b85
added mixed mode integration tests
mbowcut2 Jul 2, 2024
683f634
woops
mbowcut2 Jul 2, 2024
4ab2522
moved plot out of sgRNA loop
mbowcut2 Jul 23, 2024
f67a5ec
todo
mbowcut2 Jul 23, 2024
d6e38a2
refactored get_df function without plot cutpoint.
mbowcut2 Jul 23, 2024
c166eb4
set fig width for 9a
mbowcut2 Jul 30, 2024
cc6740c
removed comments, breakpoint
mbowcut2 Jul 30, 2024
1d66672
added stop codon to legend
mbowcut2 Jul 30, 2024
6e6610b
code seq loop
mbowcut2 Jul 30, 2024
c35bfa6
change filename to include coding seq
mbowcut2 Jul 31, 2024
2f82252
wording change
mbowcut2 Aug 20, 2024
0704793
set width based on sequence length
mbowcut2 Aug 20, 2024
e2ebf07
store coding_seqs for plots
mbowcut2 Aug 20, 2024
3007f8c
remove test
mbowcut2 Aug 20, 2024
7d9029a
fix multiple code positions for dataframe
mbowcut2 Aug 21, 2024
db65cf1
comment out sngrna
mbowcut2 Aug 21, 2024
7d4f109
use gap incentive for AA align
mbowcut2 Aug 21, 2024
14139c3
Mckay/halt on plot fail (#103)
mbowcut2 Oct 2, 2024
7e1a55d
Matplotlib Compatibility Fix (#464)
Colelyman Aug 1, 2024
033924b
Cache conda packages in GIthub Actions (#92) (#466)
Colelyman Aug 1, 2024
1829db0
Replace zcat (#94) (#468)
Colelyman Aug 8, 2024
32cf038
Cache read merging step in CRISPRessoPooled on no_rerun (#467)
kclem Aug 9, 2024
7417c1d
Version bump to 2.3.2 (breaks tests because of version change)
kclem Aug 9, 2024
5a41c23
Fix CRISPRessoAggregate bug and other improvements (#95) (#470)
Colelyman Aug 15, 2024
6cc0a4e
Display percentages in the CLI output (#88) (#473)
Colelyman Aug 15, 2024
d1f4475
No pool (#79) (#474)
Colelyman Aug 15, 2024
288291e
Round percentage complete in CLI and add initial 0% complete (#100) (…
Colelyman Aug 22, 2024
ed3fec4
Reduce memory usage for allele plots (#478)
Colelyman Aug 22, 2024
588f0df
Mckay/c2pro reports test (#99) (#479)
Colelyman Aug 23, 2024
7358624
Read Alignment Parallelization (#98) (#480)
trevormartinj7 Aug 27, 2024
95991f1
Add `all_deletion_coordinates` to be returned by `find_indels_substit…
Colelyman Sep 4, 2024
8eb3f47
Move CRISPRessoPro version report to debug
kclem Sep 17, 2024
17b3e7b
CRISPRessoPooled fail on empty amplicon file.
kclem Sep 25, 2024
0b5ca99
Add flexiguide alignment parameters (#107) (#491)
Colelyman Sep 30, 2024
4070cb9
Mckay/halt on plot fail (#103)
mbowcut2 Oct 2, 2024
657331c
fixed loading custom aa colors
mbowcut2 Oct 14, 2024
f492828
D3-Enhancements (#78) (#459)
Colelyman Aug 1, 2024
7a9a07c
Reduce memory usage for allele plots (#478)
Colelyman Aug 22, 2024
76b1db2
Mckay/c2pro reports test (#99) (#479)
Colelyman Aug 23, 2024
b5eefa4
D3-Enhancements (#78) (#459)
Colelyman Aug 1, 2024
3064c8a
Reduce memory usage for allele plots (#478)
Colelyman Aug 22, 2024
4bfd4c0
Mckay/c2pro reports test (#99) (#479)
Colelyman Aug 23, 2024
96513d9
finishing merge
mbowcut2 Oct 14, 2024
7930928
lighter grey for - char
mbowcut2 Oct 16, 2024
d7fe86e
save amino acid df
mbowcut2 Oct 18, 2024
1e74033
Add BLOSUM62 to MANIFEST.in so that it is installed with the package
Colelyman Oct 23, 2024
ee0bcff
wording change
mbowcut2 Oct 23, 2024
27 changes: 27 additions & 0 deletions CRISPResso2/BLOSUM62
@@ -0,0 +1,27 @@
#
# This image is the BLOSUM substitution matrix, which indicates the substitution score for replacing one amino acid with any other amino acid.
#
#
# Lowest score = -4, Highest score = 11
#
A R N D C Q E G H I L K M F P S T W Y V
A 4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0
R -1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3
N -2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3
D -2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3
C 0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2
E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2
G 0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3
H -2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3
L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1
K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2
M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1
F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2
S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1
V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
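The PR's own loader for this file is not shown in the diff, so the following is only a minimal sketch of how a whitespace-delimited matrix in this layout could be read into a nested dictionary. The function name `read_substitution_matrix` and the three-residue sample are hypothetical and are not part of CRISPResso2.

```python
import io


def read_substitution_matrix(handle):
    """Parse a whitespace-delimited substitution matrix into a dict of dicts.

    Hypothetical helper for illustration; not the loader used by the PR.
    Lines starting with '#' are comments, the first data line lists the
    column labels, and every later line is a row label followed by scores.
    """
    columns = None
    scores = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if columns is None:
            columns = fields  # header row: one label per column
            continue
        row_label, values = fields[0], fields[1:]
        if len(values) != len(columns):
            raise ValueError('Row %s has %d scores, expected %d'
                             % (row_label, len(values), len(columns)))
        scores[row_label] = {col: int(v) for col, v in zip(columns, values)}
    return scores


# A three-residue excerpt of the matrix above, used as sample input.
sample = io.StringIO(
    "# excerpt\n"
    "   A  R  N\n"
    "A  4 -1 -2\n"
    "R -1  5  0\n"
    "N -2  0  6\n"
)
matrix = read_substitution_matrix(sample)
print(matrix['A']['R'])
```

A nested dictionary keeps lookups readable (`matrix[ref_aa][observed_aa]`); a flat dictionary keyed by residue pairs would work equally well.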
2 changes: 2 additions & 0 deletions CRISPResso2/CRISPRessoAggregateCORE.py
@@ -71,6 +71,7 @@ def main():

parser.add_argument('--debug', help='Show debug messages', action='store_true')
parser.add_argument('-v', '--verbosity', type=int, help='Verbosity level of output to the console (1-4), 4 is the most verbose', default=3)
parser.add_argument('--halt_on_plot_fail', action="store_true", help="Halt execution if a plot fails to generate")

# CRISPRessoPro params
parser.add_argument('--use_matplotlib', action='store_true',
@@ -131,6 +132,7 @@ def main():
num_processes=n_processes,
process_pool=process_pool,
process_futures=process_futures,
halt_on_plot_fail=args.halt_on_plot_fail,
)

#glob returns paths including the original prefix
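As a small illustration of the new flag, the sketch below rebuilds only the `--halt_on_plot_fail` argument on a throwaway parser (not the real CRISPRessoAggregate parser). Because it uses `action="store_true"`, the value is `False` unless the flag is passed.

```python
import argparse

# Throwaway parser holding only the flag added in this hunk.
parser = argparse.ArgumentParser()
parser.add_argument('--halt_on_plot_fail', action='store_true',
                    help='Halt execution if a plot fails to generate')

default_args = parser.parse_args([])                      # flag omitted
strict_args = parser.parse_args(['--halt_on_plot_fail'])  # flag present

print(default_args.halt_on_plot_fail, strict_args.halt_on_plot_fail)
```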
1 change: 1 addition & 0 deletions CRISPResso2/CRISPRessoBatchCORE.py
@@ -400,6 +400,7 @@ def main():
num_processes=n_processes_for_batch,
process_futures=process_futures,
process_pool=process_pool,
halt_on_plot_fail=args.halt_on_plot_fail,
)

window_nuc_pct_quilt_plot_names = []
51 changes: 51 additions & 0 deletions CRISPResso2/CRISPRessoCORE.py
@@ -1606,6 +1606,9 @@ def rreplace(s, old, new):
raise CRISPRessoShared.NTException('The coding sequence contains bad characters:%s' % ' '.join(wrong_nt))

coding_seqs.append(exon_seq)

if len(coding_seqs) > 0:
crispresso2_info['running_info']['coding_seqs'] = coding_seqs

####SET REFERENCES TO COMPARE###
ref_names = [] #ordered list of names
@@ -3746,6 +3749,7 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
num_processes=n_processes,
process_pool=process_pool,
process_futures=process_futures,
halt_on_plot_fail=args.halt_on_plot_fail,
)
###############################################################################################################################################
### FIGURE 1: Alignment
@@ -4551,6 +4555,11 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
crispresso2_info['results']['refs'][ref_name]['plot_9_roots'] = []
crispresso2_info['results']['refs'][ref_name]['plot_9_captions'] = []
crispresso2_info['results']['refs'][ref_name]['plot_9_datas'] = []

crispresso2_info['results']['refs'][ref_name]['plot_9a_roots'] = []
crispresso2_info['results']['refs'][ref_name]['plot_9a_captions'] = []
crispresso2_info['results']['refs'][ref_name]['plot_9a_datas'] = []

crispresso2_info['results']['refs'][ref_name]['allele_frequency_files'] = []

crispresso2_info['results']['refs'][ref_name]['plot_10d_roots'] = []
@@ -4774,6 +4783,44 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
crispresso2_info['results']['refs'][ref_name]['plot_10g_captions'].append("Figure 10g: Non-reference base counts. For target nucleotides in the plotting window, this plot shows the number of non-reference (non-" + args.conversion_nuc_from + ") bases. The number of each target base is annotated on the reference sequence at the bottom of the plot.")
crispresso2_info['results']['refs'][ref_name]['plot_10g_datas'].append([('Nucleotide frequencies at ' + args.conversion_nuc_from +'s', os.path.basename(quant_window_sel_nuc_freq_filename))])


if refs[ref_name]['contains_coding_seq']:
for i, coding_seq in enumerate(coding_seqs):
fig_filename_root = _jp('9a.'+ref_plot_name+'amino_acid_table_around_'+coding_seq)
coding_seq_amino_acids = CRISPRessoShared.get_amino_acids_from_nucs(coding_seq)
amino_acid_cut_point = (cut_point - refs[ref_name]['exon_positions'][0] + 1)// 3
df_to_plot = CRISPRessoShared.get_amino_acid_dataframe(
df_alleles.loc[df_alleles['Reference_Name'] == ref_name],
refs[ref_name]['exon_intervals'][i][0],
len(coding_seq_amino_acids),
os.path.join(_ROOT, "BLOSUM62"),
amino_acid_cut_point)

plot_9a_input = {
'reference_seq': coding_seq_amino_acids,
'df_alleles': df_to_plot,
'fig_filename_root': fig_filename_root,
'custom_colors': custom_config["colors"],
'MIN_FREQUENCY': args.min_frequency_alleles_around_cut_to_plot,
'MAX_N_ROWS': args.max_rows_alleles_around_cut_to_plot,
'SAVE_ALSO_PNG': save_png,
'plot_cut_point': plot_cut_point,
'sgRNA_intervals': new_sgRNA_intervals,
'sgRNA_names': sgRNA_names,
'sgRNA_mismatches': sgRNA_mismatches,
'annotate_wildtype_allele': args.annotate_wildtype_allele,
'cut_point': amino_acid_cut_point,
}

amino_acid_filename = _jp(ref_plot_name+'amino_acid_table_for_'+coding_seq+'.txt')
df_to_plot.to_csv(amino_acid_filename, sep='\t', header=True, index=True)

debug('Plotting amino acids for {0}'.format(ref_name))
plot(CRISPRessoPlot.plot_amino_acid_table, plot_9a_input)
crispresso2_info['results']['refs'][ref_name]['plot_9a_roots'].append(os.path.basename(fig_filename_root))
crispresso2_info['results']['refs'][ref_name]['plot_9a_captions'].append(
"Figure 9a: Visualization of the distribution of identified amino acids based on the coding sequence (" + coding_seq+"). The vertical dashed line indicates the predicted cleavage site.")
crispresso2_info['results']['refs'][ref_name]['plot_9a_datas'].append([('Amino Acid table', os.path.basename(amino_acid_filename))])
info('Done!')

#END GUIDE SPECIFIC PLOTS
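The hunk above relies on two ideas: translating the coding sequence into amino acids, and converting the nucleotide cut point into an amino acid index with `(cut_point - exon_start + 1) // 3`. The implementation of `CRISPRessoShared.get_amino_acids_from_nucs` is not part of this diff, so the sketch below is a stand-in built on the standard genetic code. The names `translate_nucs` and `amino_acid_index` are hypothetical, and dropping a trailing partial codon is an assumption, not confirmed behavior of the PR.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = 'TCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W'
               'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR'
               'VVVVAAAADDEEGGGG')
CODON_TABLE = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_nucs(nucs):
    """Translate a nucleotide string codon by codon ('*' marks a stop).

    Hypothetical stand-in; a trailing partial codon is dropped (assumption).
    """
    nucs = nucs.upper()
    usable = len(nucs) - len(nucs) % 3
    return ''.join(CODON_TABLE[nucs[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_index(cut_point, exon_start):
    """Mirror the diff's arithmetic: (cut_point - exon_start + 1) // 3."""
    return (cut_point - exon_start + 1) // 3


protein = translate_nucs('ATGGCCTAA')
print(protein)                   # ATG -> M, GCC -> A, TAA -> stop
print(amino_acid_index(50, 40))  # (50 - 40 + 1) // 3
```

The integer division places the dashed line at the boundary of the codon that contains the cut, which is why the plot can reuse a single integer for `cut_point`.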
@@ -5174,6 +5221,10 @@ def get_scaffold_len(row, scaffold_start_loc, scaffold_seq):
print_stacktrace_if_debug()
error('Filtering error, please check your input.\n\nERROR: %s' % e)
sys.exit(13)
except CRISPRessoShared.PlotException as e:
print_stacktrace_if_debug()
error(e)
sys.exit(14)
except Exception as e:
print_stacktrace_if_debug()
error('Unexpected error, please check your input.\n\nERROR: %s' % e)
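The new handler only takes effect because it sits before the generic `except Exception` clause. The sketch below illustrates that ordering with a stand-in `PlotException` (assumed here to subclass `Exception`; its real definition in `CRISPRessoShared` is not shown in this diff). The exit code 14 comes from the hunk above, while the code used for unexpected errors is a placeholder.

```python
class PlotException(Exception):
    """Stand-in for CRISPRessoShared.PlotException (assumed base class)."""


PLOT_FAILURE_EXIT_CODE = 14  # value used in the hunk above
UNEXPECTED_EXIT_CODE = 1     # placeholder; the real value is not shown


def exit_code_for(step):
    """Run `step` and map any exception to an exit code."""
    try:
        step()
    except PlotException:
        # Must come first: PlotException is also an Exception.
        return PLOT_FAILURE_EXIT_CODE
    except Exception:
        return UNEXPECTED_EXIT_CODE
    return 0


def failing_plot():
    raise PlotException('There was an error generating plot demo.')


def failing_other():
    raise RuntimeError('boom')


print(exit_code_for(failing_plot))
print(exit_code_for(failing_other))
print(exit_code_for(lambda: None))
```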
10 changes: 9 additions & 1 deletion CRISPResso2/CRISPRessoMultiProcessing.py
@@ -16,6 +16,9 @@
import pandas as pd
import traceback

from CRISPResso2.CRISPRessoShared import PlotException


def get_max_processes():
return mp.cpu_count()

@@ -284,7 +287,7 @@ def run_parallel_commands(commands_arr, n_processes=1, descriptor='CRISPResso2',
pool.join()


-def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool):
+def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool, halt_on_plot_fail):
"""Run a plot in parallel if num_processes > 1, otherwise in serial.

Parameters
@@ -299,6 +302,8 @@ def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool)
The list of futures that submitting the parallel job will return.
process_pool: ProcessPoolExecutor or ThreadPoolExecutor
The pool to submit the job to.
halt_on_plot_fail: bool
If True, an exception will be raised if the plot fails

Returns
-------
@@ -311,5 +316,8 @@ def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool)
else:
plot_func(**plot_args)
except Exception as e:
if halt_on_plot_fail:
logger.critical(f"Plot error, halting execution \n")
raise PlotException(f'There was an error generating plot {plot_func.__name__}.')
logger.warn(f"Plot error {e}, skipping plot \n")
logger.debug(traceback.format_exc())
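To make the two behaviors concrete, here is a reduced sketch of the serial path only. It is not the PR's `run_plot`: the process pool and futures are omitted, `PlotException` is a local stand-in, and `run_plot_serial` and `broken_plot` are hypothetical names. It also departs from the diff in two small ways, using `logger.warning` (the non-deprecated spelling of `logger.warn`) and chaining the original error with `from e`.

```python
import logging
import traceback

logger = logging.getLogger(__name__)


class PlotException(Exception):
    """Stand-in for CRISPRessoShared.PlotException."""


def run_plot_serial(plot_func, plot_args, halt_on_plot_fail):
    """Run one plot; either skip a failure or escalate it."""
    try:
        plot_func(**plot_args)
    except Exception as e:
        if halt_on_plot_fail:
            logger.critical('Plot error, halting execution')
            raise PlotException(
                f'There was an error generating plot {plot_func.__name__}.'
            ) from e
        logger.warning(f'Plot error {e}, skipping plot')
        logger.debug(traceback.format_exc())


def broken_plot(title):
    """Hypothetical plot function that always fails."""
    raise ValueError(f'cannot draw {title}')


# Default behavior: the failure is logged and execution continues.
run_plot_serial(broken_plot, {'title': 'demo'}, halt_on_plot_fail=False)

# With the flag set, the failure surfaces as a PlotException.
try:
    run_plot_serial(broken_plot, {'title': 'demo'}, halt_on_plot_fail=True)
except PlotException as e:
    halted_message = str(e)
    print(halted_message)
```

Chaining with `from e` keeps the original traceback attached to the `PlotException`, which helps when the message itself names only the plot function.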