MongoDB Variant Storage
The _id key must allow results to be sorted using indexes. To do this, the key must be lexicographically sortable. This key is used in both the variant and stage collections.
The key is a concatenation of chromosome, position, reference and alternate, separated by colons:
CHR:POS:REF:ALT
Where:
- CHR is prefixed with a space (" ") if the chromosome is a single digit, so that it sorts correctly against two-digit chromosomes.
- POS is left-padded with spaces to a width of 10 characters.
- REF and ALT are replaced by the SHA1 of the original allele if it is longer than Variant#SV_THRESHOLD.
Example:
Variant | _id |
---|---|
22 156 A T | 22:.......156:A:T |
3 56789 CACA - | .3:.....56789:CACA: |
X 68432 - GCC | X:.....68432::GCC |
* Spaces have been replaced with dots.
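The key layout described above can be sketched in Python as follows. This is a minimal illustration, not OpenCGA's implementation: the function names are hypothetical, the value of `SV_THRESHOLD` is an assumed placeholder (the source only names `Variant#SV_THRESHOLD`), and encoding the SHA1 as a hex string is also an assumption. Empty alleles (shown as `-` in the table) are passed as empty strings.

```python
import hashlib

# Assumed placeholder: the source names Variant#SV_THRESHOLD but not its value.
SV_THRESHOLD = 50


def encode_allele(allele: str) -> str:
    """Return the allele unchanged, or its SHA1 hex digest if it is too long."""
    if len(allele) > SV_THRESHOLD:
        # Hex encoding is an assumption; the source only says "a SHA1".
        return hashlib.sha1(allele.encode("utf-8")).hexdigest()
    return allele


def build_variant_id(chromosome: str, position: int,
                     reference: str, alternate: str) -> str:
    """Build a CHR:POS:REF:ALT key that sorts lexicographically."""
    # Single-digit chromosomes get a leading space so " 3" sorts before "22".
    if len(chromosome) == 1 and chromosome.isdigit():
        chromosome = " " + chromosome
    # Right-align the position in a 10-character field, padded with spaces.
    padded_position = str(position).rjust(10)
    return ":".join([chromosome, padded_position,
                     encode_allele(reference), encode_allele(alternate)])


# Reproduce the table above, showing spaces as dots.
for args in [("22", 156, "A", "T"), ("3", 56789, "CACA", ""), ("X", 68432, "", "GCC")]:
    print(build_variant_id(*args).replace(" ", "."))
```

Because both the chromosome and the position are padded to a fixed width, plain string comparison orders the keys by chromosome and then by position, which is what lets an index on _id serve sorted results.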
Each document in the stage collection represents a variant. Depending on whether the variant has already been moved to the Variants collection, the document will also contain the compressed variant.
The variant information is grouped by study, where the key is the studyId. Each file is stored inside the study with the fileId as key. It may happen that a file has more than one entry for the same variant. These duplicated variants are stored as an array of variants.
{
    "_id" : "22:    123456:A:T",
    "end" : 123456,
    "ref" : "A",
    "alt" : "T",
    "3" : {
        "4" : ["BinData"]
    }
}
Once the file is moved to the Variants collection, the content is removed (set to null) and a new flag "new : false" is added.
{
    "_id" : "22:    123456:A:T",
    "end" : 123456,
    "ref" : "A",
    "alt" : "T",
    "3" : {
        "4" : null,
        "new" : false
    }
}
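The transition between the two stage documents above can be modelled in memory as follows. This is only a sketch of the document shape: `mark_file_moved` is a hypothetical helper, and how OpenCGA actually issues the update against MongoDB is not described in the source.

```python
import copy


def mark_file_moved(stage_doc: dict, study_id: str, file_id: str) -> dict:
    """Return a copy of a stage document with one file marked as moved."""
    updated = copy.deepcopy(stage_doc)
    study = updated[study_id]
    study[file_id] = None   # content is removed (set to null)
    study["new"] = False    # flag added once the file has been moved
    return updated


# Stage document for study 3, file 4, mirroring the example above.
stage_doc = {
    "_id": "22:" + "123456".rjust(10) + ":A:T",
    "end": 123456,
    "ref": "A",
    "alt": "T",
    "3": {"4": ["BinData"]},
}

moved = mark_file_moved(stage_doc, "3", "4")
print(moved["3"])
```

Returning a copy keeps the original document intact, which makes the before and after states easy to compare.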
A document in the Variants collection has the following structure:
{
    "_id" : "22:    123456:A:T",
    "chromosome" : "22",
    "start" : 123456,
    "end" : 123456,
    "reference" : "A",
    "alternate" : "T",
    "length" : 1,
    "type" : "SNV",
    "_at" : {
        "chunkIds" : [
            "22_123_1k",
            "22_12_10k"
        ]
    },
    "studies" : [
        {
            "sid" : 3,
            "gt": {
                "0|1" : [54, 78, 254, 623],
                "1|1" : [84, 89, 156],
                "?/?" : [110,111,112,113,114,115,116,117,118,119,120]
            },
            "files" : [
                {
                    "fid" : 4,
                    "attrs" : {}
                }, {
                    "fid" : 5,
                    "attrs" : {}
                }
            ]
        }
    ],
    "stats" : [ {
        "sid" : 3,
        "cid": 6,
        "maf": 0.00638977624475956,
        "mgf": 0,
        "mafAl": "T",
        "mgfGf": "1|1",
        "missAl": 0,
        "missGt": 0,
        "numGt": {
            "0/0" : 562,
            "1|1" : 3,
            "0|1" : 4
        }
    } ],
    "annotation" : [ {
        "id" : "?",
        "ct" : [
            {
                "so" : [ 1628 ]
            } , {
                "so" : [ 1566 ]
            }
        ],
        "cr_score" : [
            {
                "sc" : 0.8619999885559082,
                "src" : "gerp"
            } , {
                "sc" : 0.004999999888241291,
                "src" : "phastCons"
            } , {
                "sc" : 0.11299999803304672,
                "src" : "phylop"
            }
        ],
        "popFq" : [
            {
                "study" : "1000GENOMES_phase_3",
                "pop" : "ALL",
                "refFq" : 0.9986000061035156,
                "altFq" : 0.0006000000284984708
            } , {
                "study" : "1000GENOMES_phase_3",
                "pop" : "EAS",
                "refFq" : 0.9970200061798096,
                "altFq" : 0.0029800001066178083
            } , {
                "study" : "1000GENOMES_phase_3",
                "pop" : "EUR",
                "refFq" : 0.998009979724884,
                "altFq" : 0
            }
        ],
        ...
    } ],
    "customAnnotation" : {
    }
}
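The chunkIds values in the example (22_123_1k and 22_12_10k for position 123456) appear to follow the pattern chromosome, position divided by the chunk size, and the chunk size in thousands. The source does not state this rule, so the sketch below is an inference from the example only; `chunk_ids` and its default chunk sizes are hypothetical.

```python
def chunk_ids(chromosome: str, position: int, chunk_sizes=(1000, 10000)) -> list:
    """Derive chunk identifiers for a position (rule inferred from the example)."""
    return [
        # Integer division gives the index of the chunk containing the position.
        f"{chromosome}_{position // size}_{size // 1000}k"
        for size in chunk_sizes
    ]


print(chunk_ids("22", 123456))
```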
OpenCGA is an open source project and it is freely available.