Add new pVACsplice tool #911

Merged
merged 226 commits into from
Dec 13, 2024
f91f17f
integrating pvacsplice with pvacseq
mrichters Jan 24, 2022
401fdca
modifying files to run PvacsplicePipeline()
mrichters Jan 24, 2022
dd6844d
update splice pipeline
mrichters Mar 9, 2022
8aab576
add pvacsplice docs
mrichters Mar 9, 2022
d64207b
add class to input_file_converter.py
mrichters Mar 10, 2022
f5327b1
recent updates
mrichters Apr 22, 2022
2e2aba9
make pvacsplice compatible with multiple epitope lengths
mrichters Apr 28, 2022
194d68b
add coverage filter
mrichters Apr 28, 2022
38d4960
update splice pipeline & add FastaVariant
mrichters Jun 7, 2022
67a49bd
reformat for indels when combining in pvacsplice
atwollam Jun 9, 2022
8c6dff4
Merge branch 'staging' into staging
mrichters Jun 10, 2022
d2af88d
Merge pull request #4 from griffithlab/staging
mrichters Jun 10, 2022
bb9e65b
Merge pull request #5 from atwollam/splice_old_staging_indref
mrichters Jun 10, 2022
d950132
make FastaVariant compatible
mrichters Jun 10, 2022
f09a951
Merge branch 'staging' of https://github.com/mrichters/pVACtools into…
mrichters Jun 10, 2022
599dfb8
convert variant coordinates to SNV format to match RegTools
mrichters Jun 15, 2022
e74d9bd
Update filtering junctions stategy
mrichters Jun 16, 2022
6c2560f
Remove debugging from filter_regtools_results.py
mrichters Jun 16, 2022
2e666f1
finalize filter junctions and anotate/combine variants steps in splic…
mrichters Jun 20, 2022
7aa2ce9
retain strand change from str to int in merged_df[strand]
mrichters Jun 28, 2022
391825d
add column "fasta index" to combined file for peptide identification
mrichters Jun 28, 2022
909988b
add tsl to junction pipeline parameters
mrichters Jun 28, 2022
25e1726
modify to catch previous filtering errors
mrichters Jun 28, 2022
c28000d
de-duplicate peptide kmers in fasta output
mrichters Jul 6, 2022
e04c73d
create tmp dir and add files not needed in final output
mrichters Jul 7, 2022
a6d2798
Merge remote-tracking branch 'origin/staging' into pvacsplice
mrichters Jul 8, 2022
960ea73
make final edits to splice pipeline & updated output_parser.py
mrichters Jul 8, 2022
9003d64
pvacsplice testing
atwollam Jul 16, 2022
738e3ad
fix combined report error if only 1 epitope length specified
mrichters Jul 19, 2022
61f3e06
change final combined splice file suffix to _junctions.tsv
mrichters Jul 19, 2022
554118f
reset pybiomart host to http://www.ensembl.org
mrichters Jul 19, 2022
5a37f22
delete tmp directory after pipeline is run
mrichters Jul 19, 2022
701b6bb
update Best/Median Score header names to include IC50
mrichters Jul 19, 2022
7097069
add ensembl_version to positional arguments
mrichters Jul 19, 2022
9d6c394
Merge remote-tracking branch 'origin/staging' into staging
mrichters Jul 19, 2022
000db27
add junction anchor to missing coordinate log
mrichters Jul 19, 2022
9fc9526
write log messages and skip steps for already created files
mrichters Jul 20, 2022
bcc2e4e
remove test print statement
mrichters Jul 20, 2022
bba3cb3
reformat parser to run main program
mrichters Jul 20, 2022
db9de9d
change junctions file name in tsv_file_path()
mrichters Jul 20, 2022
7fb90aa
more log formatting
mrichters Jul 20, 2022
2bd7d95
create separate MHC reports for each length and combine at the end
mrichters Jul 20, 2022
ec18629
add newline at end of file
mrichters Jul 20, 2022
7165d82
add kmer index file to retrieve variant info per kmer in fasta
mrichters Jul 22, 2022
f98bf53
Merge branch 'pvacsplice' into pvacsplice
mrichters Jul 22, 2022
b2721f0
Merge pull request #6 from atwollam/pvacsplice
mrichters Jul 22, 2022
8316eca
Merge remote-tracking branch 'mrichters/pvacsplice' into staging
mrichters Jul 22, 2022
b23853f
reset ensembl host to http://www.ensembl.org/
mrichters Jul 22, 2022
b02915e
add pyfaidx to requirements
mrichters Jul 22, 2022
107df90
move pvacsplice scripts to lib
mrichters Jul 22, 2022
aa36d8e
add combine_reports_epitope_lengths() to run.py to combine all epitop…
mrichters Jul 22, 2022
ecd7558
changed file paths to correspond to move to lib
mrichters Jul 22, 2022
62026b5
combine $sample_name.all_epitopes.tsv for all lengths
mrichters Aug 9, 2022
7eb03ff
add sample_name to fasta_to_kmers parameters
mrichters Aug 9, 2022
874d36f
change fasta name to match pVACtools: $sample_name.$epitope_length.fa
mrichters Aug 9, 2022
c40acdd
updated code in pvacsplice according to recent updates in griffithlab…
mrichters Aug 9, 2022
5b84b33
remove pvacsplice-specific filters where not needed
mrichters Aug 10, 2022
dab5168
remove class I/II epitope lengths if there are no class I/II alleles
mrichters Aug 24, 2022
504f6d0
create a pvacsplice-specific IC50 sort method
mrichters Aug 24, 2022
ef0e023
update to create kmer index file and stop duplicate entries in fasta
mrichters Oct 6, 2022
20e9bf5
finalize output file contents
mrichters Oct 7, 2022
4854b66
check to make sure fasta creation is complete
mrichters Oct 7, 2022
2d8cc03
finalize correct variables in run.py for use in pipeline.py
mrichters Oct 7, 2022
82e1b36
Merge pull request #8 from griffithlab/staging
mrichters Oct 7, 2022
3d25e23
update tsv_index to include start position - now all are unique keys
mrichters Oct 7, 2022
6e28d57
add ensembl_version to pvacsplice required arguments
mrichters Oct 7, 2022
4738985
Merge branch 'staging' of https://github.com/mrichters/pVACtools into…
mrichters Oct 7, 2022
bb2e8e0
create pvacsplice-specific filter based on IC50 score
mrichters Oct 22, 2022
90cde99
add function to combine all_epitopes files for all epitope lengths
mrichters Oct 22, 2022
8572c8a
make pvacsplice-specific classes/edits
mrichters Oct 22, 2022
73f8d45
create pvacsplice-specific class
mrichters Oct 22, 2022
0f9dcde
finalize kmer discovery pipeline
mrichters Oct 22, 2022
c438a48
add required parameter - gtf file used in regtools analysis
mrichters Oct 22, 2022
289b1fd
Merge pull request #9 from griffithlab/staging
mrichters Oct 22, 2022
b52e36f
adding frameshift/NMD information to final output
mrichters Oct 25, 2022
dcbdb70
pulling recent changes to pvacsplice
mrichters Oct 25, 2022
879cc49
return combined_df instead of writing the file
mrichters Oct 25, 2022
711a1df
add frameshift info to all_epitopes.tsv and use personalized fasta fo…
mrichters Oct 25, 2022
805b7aa
removed dropna() from self.final_combined_df
mrichters Oct 25, 2022
7ed8220
final changes to detect frameshift junctions and add gtf_path input t…
mrichters Oct 25, 2022
10fc14e
print index for each junction
mrichters Oct 25, 2022
a69773c
modified how to combine files within mhc class and both classes together
mrichters Nov 16, 2022
57088d5
removed max tsl parameter from aggregate_all_epitopes class because t…
mrichters Nov 16, 2022
6c96667
modified how to combine files within mhc class and both classes together
mrichters Nov 16, 2022
10a5f06
return filtered df for input into combine_inputs.py directly
mrichters Nov 16, 2022
1581109
added support for input gtf file and reporting frameshift events in t…
mrichters Nov 16, 2022
592d7f9
added support for input gtf file and reporting frameshift events in t…
mrichters Nov 16, 2022
e635c18
convert gtf file into df for transcript/CDS coordinates
mrichters Nov 16, 2022
5390ec9
using this script as a test for gtfparse / input gtf file
mrichters Nov 16, 2022
e4975a3
added gtf_file to pvacsplice required arguments
mrichters Nov 16, 2022
fbe71cd
add ".execute" to call aggregate_all_epitopes from this module
mrichters Nov 16, 2022
8461fca
removed variant conversion logic, I will match variants by start or s…
mrichters Nov 16, 2022
3e64bed
Merge branch 'staging' into staging
mrichters Nov 16, 2022
4e1ea2a
Merge pull request #10 from griffithlab/staging
mrichters Nov 16, 2022
4e960a5
requirements.txt throwing an error w/ pyfaidx==0.6.4 so moving this l…
mrichters Nov 21, 2022
9a071f3
fixed matching vcf coordinates back to junction variant coordinates (…
mrichters Nov 21, 2022
50bff00
add warning if a transcript or variant in junctions file is not prese…
mrichters Nov 21, 2022
0fa3cd4
deduplicated anchor type from warning: junction coor missing
mrichters Nov 21, 2022
b8da90f
add dropna() to combined junctions df to remove junctions with no wt …
mrichters Nov 21, 2022
f5344fa
filter vcf transcripts by ENST prefix first
mrichters Nov 22, 2022
2f5c7cd
final pipeline changes before testing
mrichters Dec 19, 2022
ba313cd
removed sample name from inputs and converted transcript_version in j…
mrichters Jan 10, 2023
9ecfa9e
dropped exon_number from df since NA vs. blank cells create error
mrichters Jan 10, 2023
3b5ca5d
added optional save_gtf df to file (option -g)
mrichters Jan 10, 2023
06c2d37
removed mods to combined.tsv in junction_to_fasta - was creating a te…
mrichters Jan 10, 2023
298cae5
added tsl filter to gtf dataframe in load_gtf_data.py
mrichters Jan 10, 2023
a5e454b
add testing scripts for combine_inputs and junction_to_fasta
mrichters Jan 10, 2023
d5c93a7
add optional anchor types parameter
mrichters Jan 11, 2023
6b69293
modify test scripts to match current test_data
mrichters Jan 26, 2023
3b693ca
modify splice_pipeline modules as needed based on testing
mrichters Jan 26, 2023
d970ffa
add pvacsplice to pVACtools setup
mrichters Jan 26, 2023
24e2829
update test scripts until all are passing
mrichters Feb 2, 2023
852950c
update splice pipeline modules to pass tests
mrichters Feb 2, 2023
5c5b2b4
add test_data results/ folder
mrichters Feb 2, 2023
1bb18ec
add pvacsplice test_data input files
mrichters Feb 6, 2023
14faf7e
gzip test_data fasta input to avoid github size limits
mrichters Feb 6, 2023
bf1b22e
Merge remote-tracking branch 'origin/staging' into staging
mrichters Feb 6, 2023
78aa46a
Merge branch 'staging' into staging
mrichters Feb 6, 2023
17d612a
Merge pull request #11 from griffithlab/staging
mrichters Feb 6, 2023
8ddcd53
Merge branch 'staging' of https://github.com/mrichters/pVACtools into…
mrichters Feb 6, 2023
8a5da15
update pvacsplice testing scripts
mrichters Feb 7, 2023
92be611
update pvacsplice modules based on testing scripts
mrichters Feb 7, 2023
ef5cf6e
final testing changes
mrichters Feb 10, 2023
da90a59
removed gene_name and gene_id from merging dfs in combine_inputs.py; …
mrichters Feb 15, 2023
cfb6dbe
fixed typo in setup.py
mrichters Feb 15, 2023
c92c9f7
modified this method to filter by tsl (instead of including in filter…
mrichters Feb 27, 2023
0118b8e
modified scripts based on local pvacsplice testing - HCC1395
mrichters Feb 27, 2023
dc81cdd
Merge branch 'glab-staging' into staging
mrichters Feb 28, 2023
1b7c44e
Merge branch 'glab-staging' into pvacsplice
mrichters Feb 28, 2023
e8173bb
Merge branch 'staging' into pvacsplice
mrichters Feb 28, 2023
8d4ef88
update PvacSplice aggregate epitopes class to reflect recent upstream…
mrichters Feb 28, 2023
efdf078
corrected errors from previous commit
mrichters Feb 28, 2023
73717e2
changed object attribute name (self.gtf_df to self.gtf_data) to avoid…
mrichters Feb 28, 2023
cc400c8
added pVACsplice to --file_type choices
mrichters Feb 28, 2023
fc52169
Fix failing tests
susannasiebert Mar 6, 2023
db71e76
Merge branch 'griffithlab:staging' into staging
mrichters Mar 8, 2023
4cc0176
add print_log.py to check for correct pvacsplice inputs
mrichters Mar 9, 2023
0ef96ee
remove .idea pycharm files
mrichters Mar 9, 2023
b355227
add pvacsplice class for post processing
mrichters Mar 12, 2023
7f429d9
format pvacsplice to run aggregate_all_epitopes
mrichters Mar 12, 2023
8611abb
add print_log() method from pipeline to pvacsplice run.py to generate…
mrichters Mar 12, 2023
089e0c1
add file exists filters to skip already created files and modifiy pri…
mrichters Mar 12, 2023
210373e
add post_processing step to pvacsplice execute()
mrichters Mar 12, 2023
fcd1098
add check in output_parser to prevent ValueError if no normal sample …
mrichters Mar 12, 2023
3fcf4cf
Merge branch 'griffithlab:staging' into staging
mrichters Mar 12, 2023
6eff751
add fasta_size is an even number check
mrichters Mar 12, 2023
1570d1c
reformat run_argument_parser.py to make shared/unique args easier to …
mrichters Mar 12, 2023
2137af5
Merge branch 'working' into staging
mrichters Mar 12, 2023
8481a76
remove duplicate allele parameter in pvacsplice method
mrichters Mar 12, 2023
628c532
modify print statements in splice pipeline
mrichters Mar 13, 2023
c413165
Merge branch 'griffithlab:staging' into staging
mrichters Mar 19, 2023
e88c3b8
Merge branch 'griffithlab:staging' into staging
mrichters Apr 10, 2023
50cef08
Fix tests
susannasiebert Apr 12, 2023
53aea3b
Remove support for python 3.6
susannasiebert Apr 12, 2023
0297741
add .idea/ pycharm folder to .gitignore
mrichters Apr 12, 2023
84733b3
add files exist checks for splice pipeline
mrichters Apr 12, 2023
82bed15
Merge branch 'griffithlab:staging' into staging
mrichters Apr 13, 2023
a664eb7
Apply suggestions from code review
mrichters Apr 13, 2023
b9c0908
update combine reports functions
mrichters Apr 13, 2023
a9530b1
remove redundant parentheses
mrichters Apr 13, 2023
b97510d
remove print statement
mrichters Apr 13, 2023
b9ff855
change superclass for PvacspliceAllEpitopes to PvacbindAllEpitopes
mrichters Apr 13, 2023
bc4b8bd
run coverage filter after pvacsplice neoag prediction
mrichters Apr 13, 2023
4db1375
Merge branch 'staging' into working
mrichters Apr 13, 2023
395ad84
do not write kmer_index to file
mrichters Apr 14, 2023
fd075dd
reorganize output_dirs in pipeline
mrichters Apr 14, 2023
e477e8b
reorganize output_dirs in splice pipeline
mrichters Apr 14, 2023
b3df3c5
remove kmer index call and change pipeline output dir to junctions_dir
mrichters Apr 14, 2023
1f87d50
reformat PvacspliceAggregateAllEpitopes __init__()
mrichters Apr 14, 2023
3344429
update post processor to run once in run.py with combined epitopes le…
mrichters Apr 19, 2023
142ca96
remove saving kmer_index df to file by default
mrichters Apr 19, 2023
c0e9f0a
Handle flurry state correctly when running the PostProcessor outside …
susannasiebert Apr 19, 2023
a073f62
Pin latest version of varcode that has no pyvcf dependency and remove…
susannasiebert Apr 19, 2023
3e0f26b
Mock netchop and netmhcstabpan API calls during pVACbind tests
susannasiebert Apr 14, 2023
69f477f
Remove unused code
susannasiebert Apr 14, 2023
ebe895d
Fix test class name
susannasiebert Apr 14, 2023
6f4eead
Format peptide columns in various tables in monospace font
susannasiebert Apr 10, 2023
61dae17
Merge branch 'griffithlab:staging' into staging
mrichters Apr 24, 2023
908f8ce
Fix load_gtf_data tests
mrichters Apr 25, 2023
65f23f3
Add min_fold_change = None to post processing params
mrichters Apr 25, 2023
77ee430
Merge remote-tracking branch 'refs/remotes/origin/staging' into staging
mrichters Apr 25, 2023
c47e481
Merge branch 'staging' into staging
susannasiebert May 2, 2023
be48d7a
Update pvactools/lib/run_argument_parser.py
mrichters May 16, 2023
591a2fc
Update pvactools/lib/run_argument_parser.py
mrichters May 16, 2023
06e9617
Update pvactools/lib/run_argument_parser.py
mrichters May 16, 2023
80826bb
Update pvactools/lib/fasta_to_kmers.py
mrichters May 16, 2023
d51a08c
Update pvactools/lib/fasta_to_kmers.py
mrichters May 16, 2023
4b699bd
removed 'get_flurry_state' from PvacsplicePipeline
mrichters May 25, 2023
567607a
Merge remote-tracking branch 'origin/staging' into staging
mrichters May 26, 2023
962d95d
correct any syntax errors
mrichters May 31, 2023
d8f4faf
Change variable naming to fix error in accessing the parameters after…
susannasiebert Jul 7, 2023
3a0b8b2
Fix some edge case errors
susannasiebert Jul 7, 2023
ebb56f9
Merge remote-tracking branch 'origin' into pvacsplice
susannasiebert Jul 7, 2023
4bb19cf
Merge remote-tracking branch 'origin/master' into pvacsplice
susannasiebert Jan 22, 2024
56eb410
Merge remote-tracking branch 'origin/hotfix' into pvacsplice
susannasiebert Jan 22, 2024
57c38ff
Pin pandas to version 2.0.0
susannasiebert Jan 22, 2024
c16b424
Merge remote-tracking branch 'origin/staging' into mrichters_staging
susannasiebert Apr 10, 2024
7d488a5
Add support for reference proteome similarity step to pVACsplice
susannasiebert Apr 10, 2024
f29a498
Add support for NetChop to pVACsplice
susannasiebert Apr 11, 2024
f45c19e
Add support for NetMHCstabpan to pVACsplice
susannasiebert Apr 11, 2024
1f7d2b6
Enable creation of combined reports and log file writing
susannasiebert Apr 12, 2024
8f9147e
Correctly set the variant index to group epitopes from the same varia…
susannasiebert Apr 12, 2024
ff1206d
Don't pre-filter on TSL but run the TSL filter instead
susannasiebert Apr 15, 2024
4630954
Pre-filter transcripts on new --biotypes parameter for processing in …
susannasiebert Apr 16, 2024
475ee12
Ensure that problematic amino acid identification works for pVACsplice
susannasiebert Apr 16, 2024
5871c98
Update pVACsplice aggregate report tiering, best peptide selection, a…
susannasiebert Apr 17, 2024
bde991b
Add standalone filter commands to pVACsplice
susannasiebert Apr 24, 2024
131619a
Add standalone helper commands
susannasiebert May 10, 2024
fe01c71
Remove unnecessary fields for pVACsplice VCF parsing
susannasiebert May 16, 2024
0fbaff4
Enable pass-only filter in pVACsplice
susannasiebert May 17, 2024
c216c60
Also apply biotype prefiltering to the PvacspliceVcfConverter
susannasiebert May 17, 2024
e144a54
Implement standalone pvacsplice generate_protein_fasta_command
susannasiebert May 21, 2024
3aa9139
Update mock calls for pVACseq tests
susannasiebert May 22, 2024
c2b73d3
Fix issue with pVACsplice calculate_reference_proteome_similarity step
susannasiebert May 22, 2024
5b7f999
Add pipeline tests for pVACsplice
susannasiebert May 22, 2024
2577663
Removed unused code
susannasiebert May 23, 2024
7e27f87
Create pVACsplice example data
susannasiebert May 24, 2024
9a0832a
Fix minor bugs
susannasiebert May 24, 2024
aa4e0e6
Remove pvacsplice install_vep_plugin command since VEP plugin annotat…
susannasiebert May 29, 2024
ff3ac39
Add example data files to setup.py
susannasiebert May 30, 2024
7e64160
Update pVACsplice documentation
susannasiebert May 30, 2024
678f490
Fix problem with docs formatting
susannasiebert Jun 21, 2024
80ffee7
Merge branch 'staging' into staging
susannasiebert Aug 21, 2024
1163c7e
add suggestions to pvacsplice documetation
mhoang22 Dec 10, 2024
d67de9b
edit pvacsplice documentation: add description for Junction Name and …
mhoang22 Dec 10, 2024
20a1f81
Merge remote-tracking branch 'origin/staging' into pvacsplice
susannasiebert Dec 12, 2024
Binary file added .DS_Store
Binary file not shown.
1 change: 1 addition & 0 deletions .gitignore
@@ -24,3 +24,4 @@ build/
dist/
.DS_Store
docs/.DS_Store
.idea/
4 changes: 4 additions & 0 deletions docs/index.rst
@@ -13,6 +13,9 @@ tools:
**pVACfuse**
A tool for detecting neoantigens resulting from gene fusions.

**pVACsplice**
A tool for detecting neoantigens resulting from splice site variants.

**pVACvector**
A tool designed to aid specifically in the construction of DNA-based
cancer vaccines.
@@ -35,6 +38,7 @@ Contents
pvacseq
pvacbind
pvacfuse
pvacsplice
pvacvector
pvacview

13 changes: 5 additions & 8 deletions docs/pvacfuse/output_files.rst
@@ -22,15 +22,8 @@ created):

* - File Name
- Description
* - ``<sample_name>.tsv``
- An intermediate file with variant and transcript information parsed from the input file(s).
* - ``<sample_name>.tsv_<chunks>`` (multiple)
- The above file but split into smaller chunks for easier processing with IEDB.
* - ``<sample_name>.fasta``
- A fasta file with mutant peptide subsequences for all
processable fusion combinations.
* - ``<sample_name>.net_chop.fa``
- A fasta file with mutant peptide subsequences specific for use in running the net_chop tool.
- A fasta file with mutant peptide subsequences for each fusion.
* - ``<sample_name>.all_epitopes.tsv``
- A list of all predicted epitopes and their binding affinity scores, with
additional variant information from the ``<sample_name>.tsv``.
@@ -43,6 +36,10 @@ created):
* - ``<sample_name>.all_epitopes.aggregated.tsv.reference_matches`` (optional)
- A file outlining details of reference proteome matches

Additionally, each folder will contain subfolders, one for each selected
epitope length, that contain intermediate files specific to each
epitope length.

Filters applied to the filtered.tsv file
----------------------------------------

9 changes: 7 additions & 2 deletions docs/pvacseq/optional_downstream_analysis_tools.rst
@@ -35,6 +35,9 @@ section of the documentation on how to create this VCF.

The output may be limited to PASS variants only by setting the ``--pass-only``
flag and to mutant sequences by setting the ``--mutant-only`` flag.
Additionally, variants can be limited to specific transcript biotypes
using the ``--biotypes`` parameter, which is set to only include ``protein_coding``
transcripts by default.
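A minimal sketch of how the PASS-only and biotype restrictions described above could be combined when selecting transcripts. The `TranscriptRecord` class and `select_records` function are hypothetical illustrations, not pVACtools' internal API, and the sketch assumes that only the literal FILTER value `PASS` counts as passing.

```python
from dataclasses import dataclass


@dataclass
class TranscriptRecord:
    # Hypothetical, simplified record; pVACtools parses this information from an annotated VCF.
    transcript_id: str
    variant_filter: str  # VCF FILTER value, e.g. "PASS" or "LowQual"
    biotype: str         # transcript biotype, e.g. "protein_coding"


def select_records(records, pass_only=False, biotypes=("protein_coding",)):
    """Keep records that satisfy the PASS-only and biotype restrictions."""
    allowed = set(biotypes)
    kept = []
    for rec in records:
        # Assumption: only the literal "PASS" value counts as passing.
        if pass_only and rec.variant_filter != "PASS":
            continue
        # Drop transcripts whose biotype was not requested.
        if rec.biotype not in allowed:
            continue
        kept.append(rec)
    return kept
```

With the default biotype list, a `lncRNA` transcript is dropped even if its variant passed; setting `pass_only=True` additionally drops variants with any other FILTER value.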

The output can be further limited to only certain variants by providing
a pVACseq report file to the ``--input-tsv`` argument. Only the peptide sequences for the epitopes in the TSV
@@ -93,7 +96,8 @@ TSV. In its output, it adds to the TSV 3 columns: Best Cleavage Position, Best
Cleavage Sites list. Typically this step is done in the pVACseq run pipeline for the filtered output TSV
when specified. This tool provides a way to manually run this on pVACseq's generated filtered/all_epitopes
TSV files so that you can add this information when not present if desired.
You can view more about these columns for pVACseq in

You can view more information about these columns for pVACseq in
the :ref:`output file documentation <all_ep_and_filtered>`.

NetMHCStab Predict Stability
@@ -106,7 +110,8 @@ filtered/all_epitopes TSV. In its output, it adds to the TSV 4 columns: Predict
Stability Rank, and NetMHCStab Allele. Typically this step is done in the pVACseq run pipeline for the
filtered output TSV when specified. This tool provides a way to manually run this on pVACseq's generated
filtered/all_epitopes TSV files so that you can add this information when not present if desired.
You can view more about these columns for pVACseq in

You can view more information about these columns for pVACseq in
the :ref:`output file documentation <all_ep_and_filtered>`.

Identify Problematic Amino Acids
16 changes: 16 additions & 0 deletions docs/pvacsplice.rst
@@ -0,0 +1,16 @@
pVACsplice
==========

pVACsplice predicts neoantigens for novel junctions created from tumor-specific alternative splicing patterns.

.. toctree::
   :glob:

   pvacsplice/features
   pvacsplice/input_file_prep
   pvacsplice/getting_started
   pvacsplice/run
   pvacsplice/output_files
   pvacsplice/filter_commands
   pvacsplice/additional_commands
   pvacsplice/optional_downstream_analysis_tools
27 changes: 27 additions & 0 deletions docs/pvacsplice/additional_commands.rst
@@ -0,0 +1,27 @@
.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
    :align: right
    :alt: pVACseq logo

Additional Commands
===================

To make using pVACsplice easier, several convenience methods are included in the package.

.. _pvacsplice_example_data:

Download Example Data
---------------------

.. program-output:: pvacsplice download_example_data -h

.. _pvacsplice_valid_alleles:

List Valid Alleles
------------------

.. program-output:: pvacsplice valid_alleles -h

List Allele-Specific Cutoffs
----------------------------

.. program-output:: pvacsplice allele_specific_cutoffs -h
140 changes: 140 additions & 0 deletions docs/pvacsplice/features.rst
@@ -0,0 +1,140 @@
.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
    :align: right
    :alt: pVACsplice logo

Features
========

**Splice Site Analysis**

pVACsplice offers epitope binding predictions for splice site variants
predicted by RegTools.
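Because a novel junction changes the protein sequence from some residue onward, each epitope length yields several candidate peptides that overlap the altered region. The hypothetical helper below sketches that enumeration, assuming the altered residues occupy the 0-based half-open range `[alt_start, alt_end)` of the mutant protein. It is an illustration only, not pVACsplice's own k-mer generation code.

```python
def junction_kmers(mt_protein, alt_start, alt_end, lengths):
    """Return unique k-mers of mt_protein that overlap residues [alt_start, alt_end)."""
    kmers = set()
    for k in lengths:
        # A window [i, i + k) overlaps [alt_start, alt_end) iff i < alt_end and i + k > alt_start.
        first = max(0, alt_start - k + 1)
        last = min(len(mt_protein) - k, alt_end - 1)
        for i in range(first, last + 1):
            kmers.add(mt_protein[i:i + k])
    # De-duplicate across lengths and return in a stable order.
    return sorted(kmers)
```

For example, if only residue `E` of `ABCDEFGH` is altered, the 3-mers `CDE`, `DEF`, and `EFG` all contain it; windows longer than the protein produce nothing.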

**No local install of epitope prediction software needed**

pVACsplice utilizes the IEDB RESTful web interface. This means that none of the underlying prediction software, like NetMHC, needs to be installed locally.

.. warning::
   We only recommend using the RESTful API for small requests. If you use the
   RESTful API to process large VCFs or to make predictions for many alleles,
   epitope lengths, or prediction algorithms, you might overload their system.
   This can result in the blacklisting of your IP address by IEDB, causing
   403 errors when trying to use the RESTful API. In that case please open
   a ticket with `IEDB support <http://help.iedb.org/>`_ to have your IP
   address removed from the IEDB blacklist.

**Support for local installation of the IEDB Analysis Resources**

pVACsplice provides the option of using a local installation of the IEDB MHC
`class I <http://tools.iedb.org/mhci/download/>`_ and `class II <http://tools.iedb.org/mhcii/download/>`_
binding prediction tools.

.. warning::
   Using a local IEDB installation is strongly recommended for larger datasets
   or when making predictions for many alleles, epitope lengths, or
   prediction algorithms. More information on how to install IEDB locally can
   be found on the :ref:`Installation <iedb_install>` page (note: the pvactools
   docker image now contains IEDB).

**MHC Class I and Class II predictions**

Both MHC Class I and Class II predictions are supported. Simply choose the desired
prediction algorithms and HLA alleles during processing and Class I and Class II
prediction results will be written to their own respective subdirectories in your
output directory. pVACsplice supports binding affinity algorithms as well as elution
algortihms.

By using the IEDB RESTful web interface, pVACsplice leverages their extensive support of different prediction algorithms.

In addition to IEDB-supported prediction algorithms, we've also added support
for `MHCflurry <http://www.biorxiv.org/content/early/2017/08/09/174243>`_ and
`MHCnuggets <http://karchinlab.org/apps/appMHCnuggets.html>`_.

================================================= ======= ========================
MHC Class I Binding Affinity Prediction Algorithm Version Supports Percentile Rank
================================================= ======= ========================
NetMHCpan                                         4.1     yes
NetMHC                                            4.0     yes
NetMHCcons                                        1.1     yes
PickPocket                                        1.1     yes
SMM                                               1.0     yes
SMMPMBEC                                          1.0     yes
MHCflurry                                                 yes
MHCnuggets                                                no
================================================= ======= ========================

================================================== ======= ========================
MHC Class II Binding Affinity Prediction Algorithm Version Supports Percentile Rank
================================================== ======= ========================
NetMHCIIpan                                        4.1     yes
SMMalign                                           1.1     yes
NNalign                                            2.3     yes
MHCnuggets                                                 no
================================================== ======= ========================

======================================== ======= ========================
MHC Class I Elution Prediction Algorithm Version Supports Percentile Rank
======================================== ======= ========================
NetMHCpanEL                              4.1     yes
MHCflurryEL                                      | Processing Score: no;
                                                 | Presentation Score: yes
BigMHC_EL                                        no
======================================== ======= ========================

========================================= ======= ========================
MHC Class II Elution Prediction Algorithm Version Supports Percentile Rank
========================================= ======= ========================
NetMHCIIpanEL                             4.1     yes
========================================= ======= ========================

=============================================== ======= ========================
MHC Class I Immunogenicity Prediction Algorithm Version Supports Percentile Rank
=============================================== ======= ========================
BigMHC_IM                                               no
DeepImmuno                                              no
=============================================== ======= ========================

**Comprehensive filtering**

Automatic filtering on the binding affinity IC50 (nM) value narrows down the results to only include
"good" candidate peptides. The binding filter threshold can be adjusted by the user for each
pVACsplice run. pVACsplice also supports filtering on allele-specific binding thresholds,
as recommended by `IEDB <https://help.iedb.org/hc/en-us/articles/114094151811-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions>`_,
as well as on percentile ranks.
Users can apply additional binding affinity filtering manually by running the
:ref:`standalone binding filter <pvacsplice_filter_commands>` on the filtered result file
to narrow down the candidate epitopes even further, or on the unfiltered
all_epitopes file to apply different cutoffs.

Readcount and expression data are extracted from an annotated VCF to automatically filter with
adjustable thresholds on depth, VAF, and/or expression values. The user can also manually run
the :ref:`standalone coverage filter <pvacsplice_filter_commands>` to further narrow down their results
from the filtered output file.
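As a sketch of threshold-based coverage filtering, the function below checks one report row against tumor DNA depth, tumor DNA VAF, normal VAF, and gene expression cutoffs. The column names and default cutoffs are assumptions chosen for illustration, and missing (`NA`) values are skipped rather than failed, which may differ from the standalone filter's actual behavior.

```python
def _value(row, column):
    """Return the column as a float, or None if it is absent, empty, or 'NA'."""
    raw = row.get(column)
    if raw in (None, "", "NA"):
        return None
    return float(raw)


def passes_coverage(row, min_tumor_dna_depth=10, min_tumor_dna_vaf=0.25,
                    max_normal_vaf=0.02, min_expression=1.0):
    """Hypothetical coverage check; column names and cutoffs are illustrative."""
    checks = [
        ("Tumor DNA Depth", lambda v: v >= min_tumor_dna_depth),
        ("Tumor DNA VAF", lambda v: v >= min_tumor_dna_vaf),
        ("Normal VAF", lambda v: v <= max_normal_vaf),  # upper bound: normal should lack the variant
        ("Gene Expression", lambda v: v >= min_expression),
    ]
    for column, ok in checks:
        value = _value(row, column)
        # Missing values are skipped (an assumption); present values must meet the cutoff.
        if value is not None and not ok(value):
            return False
    return True
```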

pVACsplice will filter on the transcript support level to only keep high-confidence
transcripts of level 1. This filter can also be run :ref:`standalone
<pvacsplice_filter_commands>`.
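The transcript support level check reduces to comparing a level against a maximum. In this hypothetical sketch, non-numeric values such as `NA` are rejected unless `keep_unknown` is set. That handling is an assumption for illustration, not a description of how pVACsplice treats such values.

```python
def passes_tsl(tsl_value, max_tsl=1, keep_unknown=False):
    """Return True if a Transcript Support Level value is at or below max_tsl."""
    try:
        level = int(tsl_value)
    except (TypeError, ValueError):
        # Non-numeric values such as "NA" carry no level; keeping or dropping them is a choice.
        return keep_unknown
    return level <= max_tsl
```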

As a last filtering step, pVACsplice applies the top score filter to only keep the top scoring epitope
for each variant. As with all previous filters, this filter can also be run
:ref:`standalone <pvacsplice_filter_commands>`. Please also see that section for more
details about how the top scoring epitope is determined.
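A simplified sketch of a top score filter: keep one row per variant, choosing the row with the lowest median IC50. The real filter applies the more detailed criteria described in the standalone filter documentation; the `variant_key` callable and the `Index`/`Epitope` fields used in the tests are hypothetical.

```python
def top_scoring(rows, variant_key, score_column="Median MT IC50 Score"):
    """Return the lowest-scoring row for each variant, in first-seen variant order."""
    best = {}
    for row in rows:
        key = variant_key(row)
        score = float(row[score_column])
        # Lower IC50 means stronger predicted binding, so keep the minimum.
        if key not in best or score < float(best[key][score_column]):
            best[key] = row
    return list(best.values())
```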

**NetChop and NetMHCstab integration**

Cleavage position predictions are added with optional processing through NetChop.

Stability predictions can be added if desired by the user. These predictions are obtained via NetMHCstabpan.

**Reference proteome similarity analysis**

This optional feature will search for an epitope in the reference proteome
using BLAST or a reference proteome FASTA file to determine if the epitope occurs elsewhere in the proteome and
is, therefore, not tumor-specific.
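The core idea of the FASTA-based check can be sketched as an exact substring search of an epitope against every reference protein. This is a simplification: it ignores BLAST entirely and only matches the exact epitope string. `parse_fasta` and `reference_matches` are hypothetical helpers.

```python
def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def reference_matches(epitope, proteome):
    """Return headers of reference proteins that contain the epitope verbatim."""
    return [name for name, seq in proteome.items() if epitope in seq]
```

Joining wrapped lines before searching matters: an epitope can span a line break in the FASTA file.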

**Problematic amino acids**

This optional feature allows users to specify a list of amino acids that would
be considered problematic to occur either everywhere or at specific positions
in a neoepitope. This can be useful when certain amino acids would be
problematic during peptide manufacturing.
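A sketch of a problematic amino acid check. The entry format is an assumption for illustration. A plain string such as `C` is problematic anywhere in the epitope. `G:2` flags a residue at a 1-based position counted from the N-terminus, and `P:-1` counts from the C-terminus.

```python
def is_problematic(epitope, specs):
    """Return True if any spec (hypothetical format) matches the epitope."""
    for spec in specs:
        if ":" in spec:
            residues, position = spec.split(":")
            pos = int(position)
            # Positive positions are 1-based from the N-terminus; negative ones count from the C-terminus.
            index = pos - 1 if pos > 0 else len(epitope) + pos
            if 0 <= index < len(epitope) and epitope[index] in residues:
                return True
        elif spec in epitope:
            # Position-free entry: problematic wherever it occurs.
            return True
    return False
```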
110 changes: 110 additions & 0 deletions docs/pvacsplice/filter_commands.rst
@@ -0,0 +1,110 @@
.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
    :align: right
    :alt: pVACseq logo

.. _pvacsplice_filter_commands:

Filtering Commands
==================

pVACsplice currently offers four filters: a binding filter, a coverage filter,
a transcript support level filter, and a top score filter.

These filters are always run automatically as part
of the pVACsplice pipeline using default cutoffs.

All filters can also be run manually on the filtered.tsv file to narrow the results down further,
or they can be run on the all_epitopes.tsv file to apply different filtering thresholds.

The binding filter is used to remove neoantigen candidates that do not meet desired peptide:MHC binding criteria.
The coverage filter is used to remove variants that do not meet desired read count and VAF criteria (in normal DNA
and tumor DNA/RNA). The transcript support level filter is used to remove variant annotations based on low quality
transcript annotations. The top score filter is used to select the most promising peptide candidate for each variant.
Multiple candidate peptides from a single somatic variant can be caused by multiple peptide lengths, registers, HLA alleles,
and transcript annotations.

Further details on each of these filters are provided below.

.. note::

The default values for filtering thresholds are suggestions only. While they are based on review of the literature
and consultation with our clinical and immunology colleagues, your specific use case will determine the appropriate values.

Binding Filter
--------------

.. program-output:: pvacsplice binding_filter -h

The binding filter removes variants that don't pass the chosen binding threshold.
The user can choose whether to apply this filter to the ``lowest`` or the ``median`` binding
affinity score by setting the ``--top-score-metric`` flag. The ``lowest`` binding
affinity score is recorded in the ``Best MT IC50 Score`` column and represents the lowest
IC50 score of all prediction algorithms that were picked during the previous pVACsplice run.
The ``median`` binding affinity score is recorded in the ``Median MT IC50 Score`` column and
corresponds to the median IC50 score of all prediction algorithms used to create the report.
By default, the binding filter runs on the ``median`` binding affinity.
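
The choice of score column can be sketched in Python as follows. This is a hypothetical
illustration, not the pVACsplice code: rows are plain dictionaries keyed by the report's column
names, the cutoff is assumed to be inclusive, and ``NA`` handling is omitted.

.. code-block:: python

   from typing import Dict, List

   # Report columns, keyed by the value passed to --top-score-metric.
   SCORE_COLUMNS = {
       "lowest": "Best MT IC50 Score",
       "median": "Median MT IC50 Score",
   }


   def binding_filter(rows: List[Dict[str, float]], binding_threshold: float,
                      top_score_metric: str = "median") -> List[Dict[str, float]]:
       """Keep rows whose selected IC50 score (nM) is at or below the threshold."""
       column = SCORE_COLUMNS[top_score_metric]
       return [row for row in rows if float(row[column]) <= binding_threshold]


   rows = [
       {"Best MT IC50 Score": 40.0, "Median MT IC50 Score": 620.0},
       {"Best MT IC50 Score": 300.0, "Median MT IC50 Score": 450.0},
   ]
   print(len(binding_filter(rows, 500)))            # 1 (median: only 450 passes)
   print(len(binding_filter(rows, 500, "lowest")))  # 2 (best: 40 and 300 pass)

The first row passes under ``lowest`` but fails under ``median``, which is why the choice of
``--top-score-metric`` can change the filtered results.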

When the ``--allele-specific-binding-thresholds`` flag is set, binding cutoffs specific to each
prediction's HLA allele are used instead of the value set via the ``--binding-threshold`` parameter.
For HLA alleles where no allele-specific binding threshold is available, the
binding threshold is used as a fallback. The alleles that have allele-specific
thresholds, along with the values of those thresholds, can be printed by executing
the ``pvacsplice allele_specific_cutoffs`` command.
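
The fallback logic described above amounts to a dictionary lookup with a default. In this
sketch, ``cutoff_for`` is a hypothetical helper and the allele cutoff in the usage example is a
made-up value; run ``pvacsplice allele_specific_cutoffs`` to see the real ones.

.. code-block:: python

   from typing import Dict


   def cutoff_for(allele: str, binding_threshold: float,
                  allele_cutoffs: Dict[str, float],
                  allele_specific: bool) -> float:
       """Return the IC50 cutoff (nM) to apply to a prediction for this allele."""
       if allele_specific:
           # Fall back to the global threshold when the allele has no specific cutoff.
           return allele_cutoffs.get(allele, binding_threshold)
       return binding_threshold


   example_cutoffs = {"HLA-A*02:01": 255.0}  # made-up value for illustration
   print(cutoff_for("HLA-A*02:01", 500.0, example_cutoffs, True))  # 255.0
   print(cutoff_for("HLA-B*07:02", 500.0, example_cutoffs, True))  # 500.0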

In addition to being able to filter on the IC50 score columns, the binding
filter also offers the ability to filter on the percentile score using the
``--percentile-threshold`` parameter. When the ``--top-score-metric`` is set
to ``lowest``, this threshold is applied to the ``Best MT Percentile`` column. When
it is set to ``median``, the threshold is applied to the ``Median MT
Percentile`` column.

By default, entries with ``NA`` values will be included in the output. This
behavior can be turned off by using the ``--exclude-NAs`` flag.
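
The percentile threshold and the default treatment of ``NA`` values could be combined as
sketched below. This is an illustrative sketch that assumes an inclusive cutoff and ``NA``
stored as the literal string ``NA``; ``percentile_filter`` and its ``exclude_nas`` argument are
hypothetical stand-ins for the ``--percentile-threshold`` and ``--exclude-NAs`` options.

.. code-block:: python

   import math
   from typing import Dict, List, Union

   Value = Union[float, str, None]

   PERCENTILE_COLUMNS = {
       "lowest": "Best MT Percentile",
       "median": "Median MT Percentile",
   }


   def is_na(value: Value) -> bool:
       """Treat None, the string "NA" and float NaN as missing."""
       return value is None or value == "NA" or (isinstance(value, float) and math.isnan(value))


   def percentile_filter(rows: List[Dict[str, Value]], percentile_threshold: float,
                         top_score_metric: str = "median",
                         exclude_nas: bool = False) -> List[Dict[str, Value]]:
       column = PERCENTILE_COLUMNS[top_score_metric]
       kept = []
       for row in rows:
           value = row[column]
           if is_na(value):
               # NA rows are kept unless exclusion is requested (cf. --exclude-NAs).
               if not exclude_nas:
                   kept.append(row)
           elif float(value) <= percentile_threshold:
               kept.append(row)
       return kept


   rows = [{"Median MT Percentile": 0.5}, {"Median MT Percentile": 3.0},
           {"Median MT Percentile": "NA"}]
   print(len(percentile_filter(rows, 2.0)))                    # 2 (0.5 and NA)
   print(len(percentile_filter(rows, 2.0, exclude_nas=True)))  # 1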

Coverage Filter
---------------

.. program-output:: pvacsplice coverage_filter -h

If the pVACsplice input VCF contains readcount and/or expression annotations, then the coverage filter
can be run again on the filtered.tsv report file to narrow down the results even further.
You can also run this filter again on the all_epitopes.tsv report file to apply different cutoffs.

The general goal of this filter is to limit variants for neoepitope prediction to those
with good read support and/or to remove possible subclonal variants. In some cases, the input
VCF may have already been filtered in this fashion. This filter also allows for the removal of
variants that do not have sufficient evidence of RNA expression.
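
A simplified version of these checks is sketched below. The column names and the cutoff values
in ``CoverageCutoffs`` are assumptions chosen for illustration, not documented pVACsplice
defaults; see the ``pvacsplice coverage_filter -h`` output above for the actual parameters.
Annotations missing from a row are skipped, analogous to ``NA`` entries being kept by default.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, Dict, List, Optional, Tuple


   @dataclass
   class CoverageCutoffs:
       # Illustrative values only; check `pvacsplice coverage_filter -h` for real defaults.
       min_normal_depth: int = 5
       max_normal_vaf: float = 0.02
       min_tumor_dna_depth: int = 10
       min_tumor_dna_vaf: float = 0.25
       min_tumor_rna_depth: int = 10
       min_tumor_rna_vaf: float = 0.25
       min_expression: float = 1.0


   def passes_coverage(row: Dict[str, Optional[float]], c: CoverageCutoffs) -> bool:
       """Return False if any available annotation fails its cutoff (hypothetical columns)."""
       checks: List[Tuple[str, Callable[[float], bool]]] = [
           ("Normal Depth", lambda v: v >= c.min_normal_depth),
           # A high VAF in the normal sample suggests a germline variant.
           ("Normal VAF", lambda v: v <= c.max_normal_vaf),
           ("Tumor DNA Depth", lambda v: v >= c.min_tumor_dna_depth),
           ("Tumor DNA VAF", lambda v: v >= c.min_tumor_dna_vaf),
           ("Tumor RNA Depth", lambda v: v >= c.min_tumor_rna_depth),
           ("Tumor RNA VAF", lambda v: v >= c.min_tumor_rna_vaf),
           ("Gene Expression", lambda v: v >= c.min_expression),
       ]
       for column, ok in checks:
           value = row.get(column)
           # Missing annotations are skipped rather than failed.
           if value is not None and not ok(value):
               return False
       return True


   sample = {"Normal Depth": 30, "Normal VAF": 0.0, "Tumor DNA Depth": 50,
             "Tumor DNA VAF": 0.4, "Tumor RNA Depth": 20, "Tumor RNA VAF": 0.35,
             "Gene Expression": 12.0}
   print(passes_coverage(sample, CoverageCutoffs()))                         # True
   print(passes_coverage({**sample, "Normal VAF": 0.1}, CoverageCutoffs()))  # False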

For details on how to prepare input VCFs that contain all of these annotations, refer to
the :ref:`pvacsplice_prerequisites_label` section.

By default, entries with ``NA`` values will be included in the output. This
behavior can be turned off by using the ``--exclude-NAs`` flag.

Transcript Support Level Filter
-------------------------------

.. program-output:: pvacsplice transcript_support_level_filter -h

This filter is used to eliminate variant annotations based on poorly-supported transcripts. By default,
only transcripts with a `transcript support level (TSL) <https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl>`_
of <=1 are kept. This threshold can be adjusted using the ``--maximum-transcript-support-level``
parameter.

By default, entries with ``Not Supported`` values will be included in the output. These occur if VEP was run
without the ``--tsl`` flag or if data is aligned to GRCh37 or older.
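
The filter logic can be sketched as follows, assuming a hypothetical ``Transcript Support Level``
column that holds either an integer level or the string ``Not Supported``. The
``keep_not_supported`` argument is a hypothetical toggle that models the default behavior
described above; it is not a documented pVACsplice option.

.. code-block:: python

   from typing import Dict, List, Union

   Row = Dict[str, Union[int, str]]


   def tsl_filter(rows: List[Row], maximum_transcript_support_level: int = 1,
                  keep_not_supported: bool = True) -> List[Row]:
       """Keep rows whose TSL is at or below the maximum (TSL 1 is the best support)."""
       kept = []
       for row in rows:
           tsl = row["Transcript Support Level"]
           if tsl == "Not Supported":
               # TSL unavailable (e.g. VEP run without --tsl); kept by default.
               if keep_not_supported:
                   kept.append(row)
           elif int(tsl) <= maximum_transcript_support_level:
               kept.append(row)
       return kept


   rows = [{"Transcript Support Level": 1}, {"Transcript Support Level": 3},
           {"Transcript Support Level": "Not Supported"}]
   print(len(tsl_filter(rows)))     # 2
   print(len(tsl_filter(rows, 3)))  # 3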

Top Score Filter
----------------

.. program-output:: pvacsplice top_score_filter -h

This filter picks the top epitope for each splice site variant. The top epitope is
determined by first selecting epitopes with no Problematic Positions
and, among those, selecting the one with the lowest median or best MT IC50 score.

By default the ``--top-score-metric`` option is set to ``median`` which will apply this
filter to the ``Median MT IC50 Score`` column. If the ``--top-score-metric``
option is set to ``lowest``, the ``Best MT IC50 Score`` column is used
instead.
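
The selection rule can be sketched as follows. This is a hypothetical illustration rather than
the pVACsplice implementation: the ``Variant`` grouping column is assumed, a ``None``, ``NA``
or empty ``Problematic Positions`` value is assumed to mean that nothing was flagged, and
falling back to all candidates when every epitope for a variant is flagged is an assumption not
stated above.

.. code-block:: python

   from typing import Dict, List

   SCORE_COLUMNS = {"lowest": "Best MT IC50 Score", "median": "Median MT IC50 Score"}


   def has_problematic_positions(row: Dict) -> bool:
       # Assumption: "None", "NA" or an empty value means nothing was flagged.
       return str(row.get("Problematic Positions", "None")) not in ("None", "NA", "")


   def top_score_filter(rows: List[Dict], top_score_metric: str = "median",
                        variant_column: str = "Variant") -> List[Dict]:
       """Return one top-scoring row per variant (hypothetical sketch)."""
       score_column = SCORE_COLUMNS[top_score_metric]
       groups: Dict[str, List[Dict]] = {}
       for row in rows:
           groups.setdefault(row[variant_column], []).append(row)
       best = []
       for candidates in groups.values():
           clean = [r for r in candidates if not has_problematic_positions(r)]
           # Assumed fallback: if every candidate is flagged, consider all of them.
           pool = clean or candidates
           best.append(min(pool, key=lambda r: float(r[score_column])))
       return best


   rows = [
       {"Variant": "v1", "Median MT IC50 Score": 50.0, "Problematic Positions": "2"},
       {"Variant": "v1", "Median MT IC50 Score": 120.0, "Problematic Positions": "None"},
       {"Variant": "v2", "Median MT IC50 Score": 300.0, "Problematic Positions": "None"},
       {"Variant": "v2", "Median MT IC50 Score": 200.0, "Problematic Positions": "None"},
   ]
   print([r["Median MT IC50 Score"] for r in top_score_filter(rows)])  # [120.0, 200.0]

For ``v1``, the epitope with the lower score (50.0) loses to the one scoring 120.0 because it
has a problematic position.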