Add reference sequences for SARS-CoV-2 #549

donkirkby · 2020-03-27T20:57:30Z

Support SARS-CoV-2 samples by adding the seed reference and other details.

- Add seed reference.
- Add coordinate references for gene regions, plus key positions if needed.
- Add landmarks for genome coverage map, based on gene regions.
- Handle duplicated base 13468.
- Add conseq_all.csv with a minimum coverage of 1, and only MAX cutoff.
- Compare consensus sequences with published results for the same samples.
- ~~Handle duplicated base when aligning deletions.~~ Split to Align deletions after frame shift #558.
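
For reference, the duplicated base comes from the -1 ribosomal frameshift in orf1ab: the GenBank annotation for NC_045512.2 joins 266..13468 with 13468..21555, so base 13468 is read twice. Here is a minimal sketch of building a translatable coding sequence under that assumption. `duplicate_base` is a hypothetical helper for illustration, not how MiCall handles it.

```python
def duplicate_base(seq: str, position: int) -> str:
    """Return seq with the base at a 1-based position repeated once.

    Hypothetical helper: it mimics a -1 frameshift by reading one base twice,
    so the downstream codons shift into the new reading frame.
    """
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    # seq[:position] ends with the base at `position`;
    # seq[position - 1:] starts with that same base again.
    return seq[:position] + seq[position - 1:]


# Toy example: repeat the second base of a short sequence.
toy = duplicate_base("ACGT", 2)  # "ACCGT"

# For orf1ab, assuming `genome` holds the full reference and the region
# starts at base 266, the duplicated base sits at 13468 - 266 + 1 within it:
# orf1ab_cds = duplicate_base(genome[265:21555], 13468 - 266 + 1)
```

The result is one base longer than the region it was cut from, which is why region lengths and coordinate maps need to account for the extra position.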

ArtPoon · 2020-03-27T21:02:51Z

I already did this for the BC CDC - want the JSON?

donkirkby · 2020-03-27T21:30:19Z

Thanks, Art. Much appreciated!

donkirkby · 2020-04-03T23:46:30Z

Related conversation at PoonLab#42.

rhliang · 2020-04-08T17:47:03Z

Note to self: look at

line 827 of aln2counts.py (this is the get_consensus_rows method)
line 373 (where it calls self.write_consensus)
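
As a reminder of what the conseq_all.csv task asks for, here is a rough sketch of a MAX-cutoff consensus with a minimum coverage of 1: take the most common nucleotide wherever at least one read covers the position. The function name, the input shape (one dict of counts per position), and the `x` placeholder are assumptions for illustration. This is not the code in aln2counts.py.

```python
from collections import Counter


def max_consensus(position_counts, min_coverage=1, placeholder="x"):
    """Build a consensus from per-position nucleotide counts.

    position_counts: list of dicts such as {"A": 3, "C": 1}, one per position.
    Positions below min_coverage get the placeholder character
    (the "x" is an assumed convention, not confirmed from the source).
    """
    bases = []
    for counts in position_counts:
        coverage = sum(counts.values())
        if coverage == 0 or coverage < min_coverage:
            bases.append(placeholder)
            continue
        # MAX cutoff: keep only the single most common nucleotide.
        base, _ = Counter(counts).most_common(1)[0]
        bases.append(base)
    return "".join(bases)


example = max_consensus([{"A": 3, "C": 1}, {}, {"G": 1}])  # "AxG"
```

With a minimum coverage of 1, a single read is enough to call a base, so mismatches in low-coverage regions are expected when comparing against published sequences.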

donkirkby · 2020-04-10T14:38:00Z

List of samples to download and the toolkit to download with.

Find more samples from SRA by searching for "Severe acute respiratory syndrome-related coronavirus"[orgn:__txid694009]. You can filter by platform, and there are currently 466 Illumina records.
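
If you would rather script that search than use the web page, the same term can be passed to NCBI's E-utilities `esearch` endpoint. This is only a sketch that builds the request URL. `build_sra_search_url` is a hypothetical helper, the `retmax` default is arbitrary, and it doesn't include the platform filter.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Same search term as in the comment above.
TERM = '"Severe acute respiratory syndrome-related coronavirus"[orgn:__txid694009]'


def build_sra_search_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL that queries the SRA database for term."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_sra_search_url(TERM)
print(url)
```

Fetching that URL returns a list of record IDs rather than SRR run accessions, so a follow-up request is still needed to get the run numbers.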

donkirkby · 2020-04-15T19:00:10Z

Comparing consensus sequences to published results for my first 12 samples:

| SRA accession | GISAID accession |
| --- | --- |
| SRR10903401 | |
| SRR10903402 | |
| SRR11092056 | |
| SRR11092057 | |
| SRR11092058 | |
| SRR11092064 | |
| SRR11140744 | |
| SRR11140746 | EPI_ISL_408670 |
| SRR11140748 | |
| SRR11140750 | |
| SRR11177792 | |
| SRR11314339 | |

ArtPoon · 2020-04-15T19:02:02Z

You can see our work in progress here (with SRR numbers matched to published accession numbers):
https://github.com/PoonLab/sam2conseq/wiki

I'd be really curious to see what you get!

donkirkby · 2020-04-15T19:05:03Z

Thanks, @ArtPoon, that will save me a bunch of time. I'll let you know what we find.

donkirkby · 2020-04-21T20:34:36Z

Here's a summary of our results compared to the matches identified by @ArtPoon. (How did you find those, Art? ~~Any suggestions on where to find EPI_ISL_408670? It doesn't show up when I search NCBI.~~)

| Run | Compared to | Differences |
| --- | --- | --- |
| SRR10903401-SARS_S1 | MN988669.1 | Very good: 12 mismatches in the first 24 bases under low coverage, and 21 extra A's at the end out of 29881. |
| SRR10903402-SARS_S2 | MN988668.1 | Almost perfect: 21 extra A's at the end out of 29881. |
| SRR11092056-SARS_S3 | MN996530 | Bad: 899 mismatches, 17761 missing, and 217 added out of 29854. |
| SRR11092057-SARS_S4 | MN996528.1 | Very good: 4 mismatches, 33 missing, and 12 added out of 29891. Missing 14 at the start, a gap of 15 with no coverage at 5397, plus 4 single gaps of no coverage within 20 bases. The mismatches are all in low coverage, 3 are mixtures when coverage is 2. 12 extra A's at the end. |
| SRR11092058-SARS_S5 | MN996527.1 | Bad: lots of sections with no coverage. 38 mismatches, 7606 missing, and 26 added out of 29825. |
| SRR11092064-SARS_S6 | MN996531.1 | Bad: lots of sections with no coverage. 24 mismatches, 4667 missing, and 33 added out of 29857. |
| SRR11140744-SARS_S7 | EPI_ISL_408670 | Almost perfect: 28 missing from the start, and poly-A tail replaced with ACAGATATATACGCC out of 29879. |
| SRR11140746-SARS_S8 | EPI_ISL_408670 | Almost perfect: poly-A tail replaced with AATAWMAACAAACAGAGCCTAAAAAGGACAAAA4 out of 29879. |
| SRR11140748-SARS_S9 | EPI_ISL_408670 | Almost perfect: 6 missing from poly-A tail out of 29879. |
| SRR11140750-SARS_S10 | EPI_ISL_408670 | Almost perfect: 9 missing from the start, and poly-A tail replaced with ACAATTGCAACAATC out of 29879. |
| SRR11177792-SARS_S11 | MT072688 | Almost perfect: 57 added out of 29811. A few added to start, most added at end: AGTGCTGAG + poly-A tail. |
| SRR11314339-SARS_S12 | MT192765 | Almost perfect: 38 added out of 29829. A few added to start, most added at end: CCATGTGATTTTAATAG + poly-A tail. |
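
To make the three counts concrete, here is a small sketch of how mismatches, missing bases, and added bases could be tallied. It assumes the two sequences have already been aligned to the same length with `-` for gaps. `summarize_alignment` is a hypothetical name, and this is not the logic of conseq_compare.py.

```python
def summarize_alignment(aligned_consensus: str, aligned_published: str) -> dict:
    """Count differences between two pre-aligned sequences.

    Both inputs must have the same length, with '-' marking alignment gaps.
    """
    if len(aligned_consensus) != len(aligned_published):
        raise ValueError("aligned sequences must have the same length")
    counts = {"mismatches": 0, "missing": 0, "added": 0}
    for ours, theirs in zip(aligned_consensus, aligned_published):
        if ours == theirs:
            continue
        if ours == "-":
            counts["missing"] += 1   # published base with nothing in ours
        elif theirs == "-":
            counts["added"] += 1     # our base with nothing in published
        else:
            counts["mismatches"] += 1  # mixture codes like W or M land here too
    # The "out of N" figure: length of the published sequence without gaps.
    counts["published_length"] = len(aligned_published.replace("-", ""))
    return counts


summary = summarize_alignment("--GTTCGTACAA", "ACGTACGTAC--")
# {'mismatches': 1, 'missing': 2, 'added': 2, 'published_length': 10}
```

In the toy example, the two leading gaps are counted as missing from the start, and the two trailing A's are counted as added, like the extra poly-A bases in the table.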

donkirkby · 2020-04-21T20:50:25Z

I forgot that I already had EPI_ISL_408670 downloaded. It came from GISAID. (Click on Browse, and search by accession id.)

ArtPoon · 2020-04-21T21:16:55Z

I queried the SRR number in the NCBI SRA database to get the sample description and then searched for a similar description in the GISAID annotations. Not perfect, I know.

donkirkby · 2020-04-21T23:24:07Z

Closing issue, now that basic reporting is working.

donkirkby added the enhancement label Mar 27, 2020

donkirkby added this to the 7.13 milestone Mar 27, 2020

donkirkby changed the title ~~Add reference sequences for COVID-19~~ Add reference sequences for SARS-CoV-2 Apr 3, 2020

donkirkby added a commit that referenced this issue Apr 3, 2020

Add SARS-CoV-2 sequences, as part of #549.

1d1bb9b

Support FASTQ files from the Sequence Read Archive.

donkirkby mentioned this issue Apr 3, 2020

Consider switching from cutadapt to pTrimmer #552

Closed

6 tasks

donkirkby added a commit that referenced this issue Apr 9, 2020

Handle duplicated nucleotide in SARS-CoV-2, for #549.

987cbb5

Add microtest sample for SARS. Extend orf1a to orf1ab.

donkirkby added a commit that referenced this issue Apr 10, 2020

Add nsp gene regions, as part of #549.

be884aa

donkirkby mentioned this issue Apr 15, 2020

Test more SARS-CoV-2 samples #555

Open

rhliang pushed a commit that referenced this issue Apr 15, 2020

Added methods to write a conseq_all.csv file as per #549.

ecf3bd0

rhliang pushed a commit that referenced this issue Apr 15, 2020

aln2counts.py now outputs a conseq_all.csv file as per #549.

7dd0bc6

donkirkby added a commit that referenced this issue Apr 16, 2020

Fix region lengths in project_scoring.json, as part of #549.

7ea42f8

donkirkby mentioned this issue Apr 17, 2020

Consider replacing Gotoh alignment algorithm #556

Open

3 tasks

donkirkby added a commit that referenced this issue Apr 20, 2020

Start comparing consensus sequences to published results.

090d9c4

Part of #549.

donkirkby added a commit that referenced this issue Apr 21, 2020

Add summary to conseq_compare.py as part of #549.

ce80885

donkirkby mentioned this issue Apr 21, 2020

Align deletions after frame shift #558

Open

donkirkby closed this as completed Apr 21, 2020

donkirkby mentioned this issue Apr 28, 2020

Docker version freezes #561

Closed

donkirkby added a commit that referenced this issue Aug 18, 2020

Add conseq_all.csv to micall_kive.py, for #549.

c0a6682

donkirkby added a commit that referenced this issue Aug 18, 2020

Add conseq_all.csv to microtests, for #549.

34eaa26
