Copied this over from #131.
Once the beacon integration goes live and sees some use, the limitations described here should be revisited.
The import method uses the following INFO fields (a sketch of the fallback logic follows the list):

- `AN` maps to `callCount` in the beacon DB, with `2 * num_called` as a fallback (`num_called` is calculated from the VCF).
- `AF` maps to `frequency` in the beacon DB, with `AC / AN` as a fallback.
- `VT` maps to `varianttype` in the beacon DB. The database field is nullable, so the import works fine without it.
- `AC` maps to `alleleCount` in the beacon DB and breaks the import when missing (for this dataset), so I added a line making `AC` required.
- There is an option `min_ac` for filtering out variants that were seen fewer than a minimum number of times (1 by default). I currently set this to 0; setting it to anything higher than 0 will also break the import for records that lack `VT` (and maybe others too).
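For illustration, a minimal sketch of that fallback logic, assuming a cyvcf2-style record whose `INFO` behaves like a mapping (hypothetical names; not the actual beacon-python importer code):

```python
# Hypothetical sketch of the INFO-field fallbacks described above;
# `record` stands for a parsed VCF record, not the real importer code.
def extract_info(record, num_called):
    info = record.INFO
    an = info.get("AN")
    call_count = an if an is not None else 2 * num_called  # AN -> callCount
    allele_count = info["AC"]  # AC -> alleleCount; required, KeyError if absent
    af = info.get("AF")
    frequency = af if af is not None else allele_count / call_count  # AF -> frequency
    variant_type = info.get("VT")  # VT -> varianttype; nullable column
    return call_count, allele_count, frequency, variant_type
```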
The import is a bit "python-esque" 😅
It has an `_unpack` method that reads the INFO fields into nested lists.
While inserting variants, list entries are accessed by index alone, leading to `IndexError` ("index out of bounds") exceptions whenever something is not set.
There is a `try`/`except` block around the whole `for variant in variants` loop that catches these exceptions, cancelling imports 1000 variants at a pop.
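Roughly, the failure mode looks like this (simplified, hypothetical sketch; `_unpack`, `db`, and the field layout are stand-ins, not the actual beacon-python code):

```python
# Simplified sketch of the batching behaviour described above.
BATCH_SIZE = 1000

def _unpack(variant):
    """Read INFO fields into nested lists, mirroring the importer."""
    info = variant["INFO"]
    return [info.get("AN", []), info.get("AF", []), info.get("AC", [])]

def insert_variants(db, variants):
    for start in range(0, len(variants), BATCH_SIZE):
        batch = variants[start:start + BATCH_SIZE]
        try:
            for variant in batch:
                an, af, ac = _unpack(variant)
                # positional access: raises IndexError when AC is absent
                db.insert(variant, allele_count=ac[0])
        except IndexError:
            # a single malformed record discards the whole 1000-variant batch
            continue
```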
There is also a whole other block that reads the `SVTYPE` and `MATEID` INFO fields; I just never had any data with `variant.is_sv == True`.
On another note, the same variant is never added twice due to `ON CONFLICT (datasetId, chromosome, start, reference, alternate) DO NOTHING`.
In an ideal world we would increment sample and allele counts and recalculate the allele frequency.
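For example, the `DO NOTHING` could become an upsert along these lines (untested sketch; the table name and extra columns are assumptions, not the real beacon-python schema):

```python
# Untested sketch of an upsert that would increment the counts and
# recalculate the frequency; table/column names are assumed.
UPSERT_SQL = """
INSERT INTO beacon_data_table
    (datasetId, chromosome, start, reference, alternate,
     sampleCount, alleleCount, callCount, frequency)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
ON CONFLICT (datasetId, chromosome, start, reference, alternate)
DO UPDATE SET
    sampleCount = beacon_data_table.sampleCount + EXCLUDED.sampleCount,
    alleleCount = beacon_data_table.alleleCount + EXCLUDED.alleleCount,
    callCount   = beacon_data_table.callCount   + EXCLUDED.callCount,
    frequency   = (beacon_data_table.alleleCount + EXCLUDED.alleleCount)::float
                / (beacon_data_table.callCount   + EXCLUDED.callCount)
"""
```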
But I'd argue that it's not that big of a deal, since the datasets uploaded by users are arbitrary and allele frequency across this data therefore has little meaning anyway.
TL;DR:
Had to add the `AC` INFO field as a requirement.
The import routine that comes with beacon-python was written for a specific kind of dataset. It does the job for now, but if the feature sees some use we will write our own importer to handle all kinds of data (as suggested in the docs).