Merge pull request #219 from nextstrain/mpox-update-2024-07
Add mpox clade I dataset
corneliusroemer authored Aug 1, 2024
2 parents d5cdf78 + 4a70e42 commit 359b1c8
Showing 18 changed files with 33,327 additions and 18,550 deletions.
1 change: 1 addition & 0 deletions data/nextstrain/collection.json
@@ -38,6 +38,7 @@
"nextstrain/rsv/a/EPI_ISL_412866",
"nextstrain/rsv/b/EPI_ISL_1653999",
"nextstrain/mpox/all-clades",
"nextstrain/mpox/clade-i",
"nextstrain/mpox/clade-iib",
"nextstrain/mpox/lineage-b.1",
"nextstrain/flu/h3n2/pb1",
3 changes: 3 additions & 0 deletions data/nextstrain/mpox/clade-i/CHANGELOG.md
@@ -0,0 +1,3 @@
## Unreleased

Initial release of this dataset.
28 changes: 28 additions & 0 deletions data/nextstrain/mpox/clade-i/README.md
@@ -0,0 +1,28 @@
# Nextclade dataset for "Mpox virus (Clade I)"

| Key | Value |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| authors | [Cornelius Roemer](https://neherlab.org), [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
| data source            | GenBank                                                                                                                                                                                |
| workflow | [github.com/nextstrain/mpox/nextclade](https://github.com/nextstrain/mpox/nextclade) |
| nextclade dataset path | nextstrain/mpox/clade-i |
| reference | [DQ011155.1](https://www.ncbi.nlm.nih.gov/nuccore/DQ011155.1), isolate `Zaire_1979-005`, an early complete clade I sequence |
| annotation | based on [DQ011155.1](https://www.ncbi.nlm.nih.gov/nuccore/DQ011155.1), but with genes called by modern names (OPGXXX) |
| clade definitions | [github.com/mpxv-lineages/lineage-designation](https://github.com/mpxv-lineages/lineage-designation) |
| related datasets | Mpox virus (All clades): `nextstrain/mpox/all-clades`<br>Mpox virus (clade IIb) `nextstrain/mpox/clade-iib`<br>Mpox virus (Lineage B.1 within clade IIb) `nextstrain/mpox/lineage-b.1` |

## Scope of this dataset

This dataset is for Mpox viruses of clade I (Ia and Ib). A broader dataset for all clades I, IIa and IIb is available under `nextstrain/mpox/all-clades`.

## Reference sequence and reference tree

The reference used in this dataset is [DQ011155.1](https://www.ncbi.nlm.nih.gov/nuccore/DQ011155.1), an early complete clade I sequence (Isolate `Zaire_1979-005`).

This is in contrast to the reference used in the other Nextclade mpox datasets, which use a clade IIb reference sequence.

The reference tree consists of all good-quality clade I sequences available in GenBank at the time of dataset creation (with identical sequences deduplicated to a single representative), as well as three outgroup genomes (a reconstructed ancestor of all clades, and one sequence each for clade IIa and clade IIb).

## Further reading

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
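
The README above mentions that identical sequences in the reference tree are deduplicated to one. The commit does not show how the workflow does this, so the following is only a minimal sketch of the idea, assuming sequences are compared as exact, case-insensitive strings. The function name `deduplicate` and the record names are hypothetical.

```python
def deduplicate(records):
    """Keep the first record for each distinct sequence (case-insensitive).

    `records` is a list of (name, sequence) tuples. This is an illustrative
    helper, not the code used by the nextstrain/mpox workflow.
    """
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


# Hypothetical toy records; real input would be full genomes.
records = [
    ("sample_A", "ACGTACGT"),
    ("sample_B", "acgtacgt"),  # same sequence as sample_A, different case
    ("sample_C", "ACGTACGA"),
]
unique_records = deduplicate(records)
print([name for name, _ in unique_records])
```

Keeping the first occurrence is an arbitrary choice here; a real pipeline might prefer the record with the most complete metadata.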
381 changes: 381 additions & 0 deletions data/nextstrain/mpox/clade-i/genome_annotation.gff3

Large diffs are not rendered by default.

75 changes: 75 additions & 0 deletions data/nextstrain/mpox/clade-i/pathogen.json
@@ -0,0 +1,75 @@
{
"alignmentParams": {
"excessBandwidth": 100,
"terminalBandwidth": 300,
"allowedMismatches": 8,
"windowSize": 40,
"minSeedCover": 0.1,
"gapAlignmentSide": "left"
},
"attributes": {
"name": "Mpox virus (Clade I)",
"reference accession": "DQ011155.1",
"reference name": "Zaire_1979-005"
},
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
},
"deprecated": false,
"enabled": true,
"experimental": false,
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"official": true,
"qc": {
"frameShifts": {
"enabled": true,
"ignoredFrameShifts": [
],
"scoreWeight": 20
},
"missingData": {
"enabled": true,
"missingDataThreshold": 20000,
"scoreBias": 1000
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 40
},
"privateMutations": {
"cutoff": 50,
"enabled": true,
"typical": 5,
"weightLabeledSubstitutions": 6,
"weightReversionSubstitutions": 6,
"weightUnlabeledSubstitutions": 1
},
"snpClusters": {
"clusterCutOff": 5,
"enabled": true,
"scoreWeight": 10,
"windowSize": 1000
},
"stopCodons": {
"enabled": true,
"ignoredStopCodons": [
],
"scoreWeight": 40
}
},
"schemaVersion": "3.0.0",
"shortcuts": [
],
"version": {
"tag": "unreleased"
}
}
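
As a reading aid for the `qc` section of the `pathogen.json` above, the following sketch lists each enabled QC rule together with its settings. It embeds a copy of the values from the diff so that it runs on its own; the helper name `enabled_qc_rules` is hypothetical and is not part of Nextclade.

```python
import json

# Copy of the "qc" section from the pathogen.json shown above.
PATHOGEN_JSON = """
{
  "qc": {
    "frameShifts": {"enabled": true, "ignoredFrameShifts": [], "scoreWeight": 20},
    "missingData": {"enabled": true, "missingDataThreshold": 20000, "scoreBias": 1000},
    "mixedSites": {"enabled": true, "mixedSitesThreshold": 40},
    "privateMutations": {
      "cutoff": 50, "enabled": true, "typical": 5,
      "weightLabeledSubstitutions": 6,
      "weightReversionSubstitutions": 6,
      "weightUnlabeledSubstitutions": 1
    },
    "snpClusters": {"clusterCutOff": 5, "enabled": true, "scoreWeight": 10, "windowSize": 1000},
    "stopCodons": {"enabled": true, "ignoredStopCodons": [], "scoreWeight": 40}
  }
}
"""


def enabled_qc_rules(pathogen):
    """Return {rule_name: settings} for every QC rule with "enabled": true.

    The "enabled" flag itself is dropped from the returned settings.
    """
    return {
        name: {key: value for key, value in rule.items() if key != "enabled"}
        for name, rule in pathogen.get("qc", {}).items()
        if rule.get("enabled", False)
    }


rules = enabled_qc_rules(json.loads(PATHOGEN_JSON))
for name, settings in rules.items():
    print(name, settings)
```

To inspect the real file instead, replace `json.loads(PATHOGEN_JSON)` with `json.load(open(path))` for a local copy of `pathogen.json`.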
2 changes: 2 additions & 0 deletions data/nextstrain/mpox/clade-i/reference.fasta

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions data/nextstrain/mpox/clade-i/sequences.fasta

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data/nextstrain/mpox/clade-i/tree.json

Large diffs are not rendered by default.

45 changes: 45 additions & 0 deletions data_output/index.json
@@ -1748,6 +1748,51 @@
}
}
},
{
"path": "nextstrain/mpox/clade-i",
"enabled": true,
"attributes": {
"name": "Mpox virus (Clade I)",
"reference accession": "DQ011155.1",
"reference name": "Zaire_1979-005"
},
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"capabilities": {
"clades": 4,
"qc": [
"frameShifts",
"missingData",
"mixedSites",
"privateMutations",
"snpClusters",
"stopCodons"
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
}
],
"version": {
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
}
},
{
"path": "nextstrain/mpox/clade-iib",
"shortcuts": [