TRGTdb #6
mainly placeholder stuff
leveraging bit symmetry to enable decoding the complement strand
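As a rough illustration of how a bit symmetry can make the other strand cheap to decode, here is a minimal sketch. It assumes a 2-bit encoding with A=00, C=01, G=10, T=11, which is a hypothetical choice and not necessarily the scheme used in this PR. Under that assumption, flipping both bits of a code swaps A with T and C with G, so the reverse complement is the flipped codes read in reverse order. The names `ENCODE`, `DECODE`, `encode`, and `decode` are illustrative only.

```python
from __future__ import annotations

# Assumed 2-bit encoding (hypothetical; the PR's actual scheme may differ).
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {code: base for base, code in ENCODE.items()}


def encode(seq: str) -> list[int]:
    """Map each base to its 2-bit code."""
    return [ENCODE[base] for base in seq.upper()]


def decode(codes: list[int], complement: bool = False) -> str:
    """Decode 2-bit codes; with complement=True, decode the reverse complement."""
    if complement:
        # XOR with 0b11 flips both bits (A<->T, C<->G); reversing the order
        # yields the opposite strand read 5' to 3'.
        codes = [code ^ 0b11 for code in reversed(codes)]
    return "".join(DECODE[code] for code in codes)
```

With this layout no second lookup table is needed for the opposite strand; the same stored codes serve both orientations.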
By creating the database as a collection of independent parquet files, we (1) ensure we're getting the columnar compression and (2) enable fast de-identification by simply removing `trgtdb/sample*.pq`
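The de-identification step described here could look roughly like the sketch below. It assumes that per-sample tables are files matching `sample*.pq` directly inside the database directory, as the path pattern suggests; the function name `deidentify` and the other file names used in the usage note are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def deidentify(db_dir: str) -> list[str]:
    """Delete per-sample parquet files from a database directory.

    Assumes per-sample tables are named sample*.pq directly inside db_dir.
    Returns the names of the files that were removed.
    """
    removed = []
    for path in sorted(Path(db_dir).glob("sample*.pq")):
        path.unlink()  # only per-sample files are touched; other tables stay
        removed.append(path.name)
    return removed
```

Calling `deidentify("trgtdb")` would leave any non-sample tables (for example, hypothetical `locus.pq` or `allele.pq` files) in place, since each table lives in its own file.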
Need to do lots of testing now and I need to clean up the code
It runs. Now gotta check if it's valid
should be able to `trgt db create -o consolidated.tdb input1.vcf input2.vcf.gz input3.tdb ...` where we consolidate into a new location. Separately there is the command `append dest.tdb input.[tdb|vcf]` where we consolidate into an existing database.
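To make the two entry points concrete, here is a minimal `argparse` sketch of a command layout matching the description above. It is not the PR's actual parser: the long option `--output`, the destination names, and the function `build_parser` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser with 'create' and 'append' subcommands."""
    parser = argparse.ArgumentParser(prog="trgt db")
    sub = parser.add_subparsers(dest="command", required=True)

    # create: consolidate any mix of inputs into a new location
    create = sub.add_parser("create", help="consolidate inputs into a new database")
    create.add_argument("-o", "--output", required=True, help="new .tdb location")
    create.add_argument("inputs", nargs="+", help=".vcf, .vcf.gz, or .tdb inputs")

    # append: consolidate one input into an existing database
    append = sub.add_parser("append", help="add an input to an existing database")
    append.add_argument("dest", help="existing .tdb to extend")
    append.add_argument("input", help=".tdb or .vcf to add")
    return parser
```

Keeping `create` and `append` as separate subcommands makes it explicit whether a run writes to a fresh location or modifies a database in place.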
getting commands to work and filling out library. Need to start making functional tests on this to ensure it's working. Then can move on to multi-sample vcf/tdb. Then will be finished with alpha
simpler allele table building now handles multi sample vcfs, tdb
To test if letting gzip/parquet do the compression helps
parquet was truncating it for some reason
It will make it a tiny bit smaller, but something weird is happening during consolidation
it compresses better, but the array is unhashable.
can't assume it works, yet
Now time to clean
removing debug cruft from what is now jaccard new query
locus_ji has parameters on the query. Still need to figure out how to best expose query parameters to the CLI
Made the allele count queries a little more useful with sample subsetting
cleaning tdb_tutorial - might want to remove that for Introduction.ipynb
This change needs to be tested. Also updating the notebook formerly known as ProbandOnly
Making it a little cleaner and correcting an experiment
Hi @ACEnglish, I ran into an issue running the following command:
The error message is
Do you have any preprocessing step for importing trgt output to trgtdb? My trgt command is
Database tool has been refactored and placed into a repository at https://github.com/ACEnglish/tdb. @zqfang - Please try from that repository and if the error still happens, open a ticket there.
Adding code for converting a TRGT output VCF into a database. See `tdb_tutorial.md` for usage details.

TODOs:
- `truvari.vcf2df`. Until v4.0 is released, truvari will need to be manually installed. After truvari v4.0 is cut, we can simply uncomment the line from `trgt/setup.py` that installs it (line 31).
- `trgt.database.dbutils.pull_saps` assumes the allele length range is stored in the VCF as `FORMAT/ALLR`. However, trgt v0.3.4 writes `FORMAT/ALCI`. Therefore, this code isn't compatible with trgt v0.3.4.
- `trgt.__main__` for wrapping the trgt main executable. If we want to distribute trgt with a single command line interface (e.g. `trgt run`, `trgt viz`, `trgt db`), we'll need to place the executables into the repository, update `MANIFEST.in` to package those executables, and then make external calls from `run_main` (e.g. `Popen(os.path.join(trgt.__file__, 'bin', 'trgt'))`).
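For the last TODO, a wrapper along these lines might work. This is a sketch under assumptions, not the PR's implementation: only `run_main` and the use of `Popen` come from the description above, while the signature of `run_main` and the helper `bundled_executable` are hypothetical. One detail worth noting is that a package's `__file__` points at its `__init__.py`, so the sketch takes the containing directory before appending `bin`.

```python
from __future__ import annotations

import os
import subprocess


def bundled_executable(package_file: str, name: str = "trgt") -> str:
    """Return <package dir>/bin/<name> given a package's __file__.

    __file__ is the path to the package's __init__.py, so its directory
    is taken first. The 'bin' layout is an assumption from the TODO above.
    """
    package_dir = os.path.dirname(os.path.abspath(package_file))
    return os.path.join(package_dir, "bin", name)


def run_main(executable: str, args: list[str]) -> int:
    """Run the executable with args, inheriting stdio; return its exit code."""
    # Passing a list (not a string) avoids shell quoting issues.
    proc = subprocess.Popen([executable, *args])
    return proc.wait()
```

A `trgt.__main__` entry point could then call something like `sys.exit(run_main(bundled_executable(trgt.__file__), sys.argv[1:]))`, so the wrapper's exit status mirrors the wrapped binary's.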