Replace mapping with de novo assembly #442
Comments
@jeff-k has been using de novo assembly to look at several samples that got strange results with the current MiCall pipeline. It looks like one advantage of the technique will be that we can distinguish between these two scenarios:
With the current MiCall pipeline, both of those scenarios just look like lousy mapping with gaps in coverage. We propose this plan for a full MiCall pipeline that includes de novo assembly:
As you can see, this just affects the …
One risk is that the de novo assembly step might be much slower in some cases than the remap step is currently.
Also convert utils/dd.py to Python 3.
Remove G2P step for now. It needs to be added back. Also switch from Python 3.4 to 3.6.
This version is just a copy of the original from the iva project. That should make it easier to apply patches if the iva project makes changes.
For now, just pick the most common length of merged pairs, and calculate a simple consensus sequence from all of those. Import IVA, instead of making a subprocess call. Move the Blast database to a new folder, and construct it out of the projects configuration data.
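The "most common length, simple consensus" step described in this commit could look roughly like the sketch below. It is not the MiCall code: the function name `simple_consensus` and the plain list-of-strings input are assumptions made for illustration, and ties between bases are resolved arbitrarily by first occurrence.

```python
from collections import Counter


def simple_consensus(merged_reads):
    """Hypothetical helper: consensus of the merged pairs that share the most common length."""
    reads = list(merged_reads)
    if not reads:
        return ''
    # Pick the most common length among the merged pairs.
    length_counts = Counter(len(read) for read in reads)
    common_length = length_counts.most_common(1)[0][0]
    selected = [read for read in reads if len(read) == common_length]
    # At each position, keep the most frequent base across the selected reads.
    consensus = []
    for column in zip(*selected):
        consensus.append(Counter(column).most_common(1)[0][0])
    return ''.join(consensus)
```

Restricting to a single length means the reads can be compared column by column without any alignment, which is presumably why it is a reasonable placeholder before a real assembler is wired in.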
They caused a TypeError in iva assembly.
Also add merge-mates to Singularity.
Seems to use a lot of memory, and needs split count set carefully.
Add an image comparison to the unit tests.
Also stop merging contigs into the full-genome reference.
Include contigs with merged seeds in genome coverage diagram.
Include position offset in BLAST results.
Some of them stopped working when we stopped merging contigs into a full-genome reference.
Also change consensus comparison in release tests. Merging multiple contigs loses the original nucleotide positions.
We currently use bowtie2 to map reads to a large set of reference sequences. For most samples, it works well. However, we have had some problems with reference drift (#290), calling HCV subtypes (#436), insertion and deletion positions (#398), and samples that produce different results when you rerun the mapping (#405).
We'd like to experiment with using de novo assembly instead of mapping.
- Smith-Waterman Gotoh to align all the contigs onto the references
- use Smith-Waterman to align all the primers onto all the contigs. Moved to Display primers on contig coverage diagram #478.
- … cascade.csv
- use denovo pipeline as a backup for denovo combined pipeline in Kive watcher
- … nuc_detail.csv combine. Not needed after embedding contigs in ref.
- … amino_detail.csv and nuc_detail.csv by seed groups, not by seeds. For example, sample 1693-1IN2C2-HIV_S16 from the 09-Aug-2019.M01841 run.
- … contig_coverage files to genome_coverage, and produce them from both the denovo version and the mapped version
- … genome_coverage.svg files in a separate folder from the other coverage maps
- de novo assembly is very slow for some samples. IVA seems better than savage.
- should G2P continue to use merged reads, or should it switch to aligned reads? Moved to Adapt G2P to de novo assembly? #481.
- should we try to report V3LOOP overlap again? Moved to Adapt G2P to de novo assembly? #481.
- … micall_basespace.py
- should we make the contig coverage diagram match what we used to cut up the gene regions? 73051ANS5A1-HCV-NS5a_S89 from 15-Jul-2016.M01841 is an example where they don't match. Moved to Make coverage maps consistent with contigs coverage plot #479.
- use BLAST results to assemble contigs into a full reference? Haven't found any clear cases where they should be combined. HIV3428P100IN200-C19-HIV-S51 from 20 Sep 2019 run is the closest, but it looks like one contig has primer at the end. Samples HIV0887-P2D21-HIV_S3 and HIV0887-P2C12-HIV_S32 from 30 Aug 2019 look even better, but have very little overlap. Some of the HCV samples look more promising: 73060A-HCV_S46 from 15 Jul 2016, for example. Moved to issue Merge assembled contigs? #484.
- … amino_details.csv and combination into amino.csv
- deal with HIV references that don't reach 5' and 3' ends or bring back the refs we removed. Moved to issue Merge assembled contigs? #484.
- check for similar problems with other seed groups. Moved to issue Merge assembled contigs? #484.
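For readers unfamiliar with the alignment step named in the first task, here is a minimal sketch of local alignment of a contig onto a reference. It uses plain Smith-Waterman with a linear gap penalty, not Gotoh's affine-gap variant, and it is not the MiCall implementation. The function name `local_align` and the scoring values are assumptions chosen for illustration.

```python
def local_align(contig, reference, match=2, mismatch=-1, gap=-2):
    """Hypothetical helper: return (score, contig_start, ref_start), 0-based.

    Plain Smith-Waterman with a linear gap penalty.
    """
    rows = len(contig) + 1
    cols = len(reference) + 1
    scores = [[0] * cols for _ in range(rows)]
    best_score = 0
    best_pos = (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if contig[i - 1] == reference[j - 1] else mismatch
            score = max(0,
                        scores[i - 1][j - 1] + pair,  # align the two bases
                        scores[i - 1][j] + gap,       # gap in the reference
                        scores[i][j - 1] + gap)       # gap in the contig
            scores[i][j] = score
            if score > best_score:
                best_score = score
                best_pos = (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and scores[i][j] > 0:
        pair = match if contig[i - 1] == reference[j - 1] else mismatch
        if scores[i][j] == scores[i - 1][j - 1] + pair:
            i -= 1
            j -= 1
        elif scores[i][j] == scores[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best_score, i, j
```

The returned `ref_start` is the kind of position offset that lets a contig be placed on a reference coordinate system. The quadratic table makes this sketch impractical for full genomes, where a seeded search such as BLAST is the usual choice.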