study add
Use the biograph vdb study add command to add variants to a study. Variant entries may come from imported VCFs or from other studies. Any number of sample names, aids, or wildcards may be specified. If a sample name is specified, all VCFs matching that sample name will be added. Use aids to select specific VCFs.
$ biograph vdb study add my_study HG002 HG003 9a174215-8fc5-4c6f-bc4b-134654f65b99
Matching VCFs:
HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
HG003: a98626d6-758b-468d-add9-fbfbac47d207
HG004: 9a174215-8fc5-4c6f-bc4b-134654f65b99
Adding 13,538,956 variants from 3 VCFs to study my_study
All variants in a study must be called against the same genetic reference. Attempting to add variants that were not called on the same reference will cause biograph to exit with an error.
$ biograph vdb study add my_study HG002_hs37d5
Matching VCFs:
HG002_hs37d5: 7df5a719-47ed-4cb8-be1b-b4c09864fc6f
Study my_study uses reference grch38, but the specified VCFs use hs37d5.
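The reference check described above can be pictured as a guard that runs before any variants are copied. The following is a minimal sketch, not biograph's implementation: the function name check_reference and the dictionary layout are assumptions made for illustration, and only the error text is taken from the output shown above.

```python
def check_reference(study_name, study_ref, vcf_refs):
    """Raise ValueError if any VCF uses a reference other than the study's.

    vcf_refs maps sample name -> reference name (hypothetical layout).
    """
    mismatched = sorted({ref for ref in vcf_refs.values() if ref != study_ref})
    if mismatched:
        raise ValueError(
            f"Study {study_name} uses reference {study_ref}, "
            f"but the specified VCFs use {', '.join(mismatched)}."
        )


# Matching references pass silently.
check_reference("my_study", "grch38", {"HG002": "grch38", "HG003": "grch38"})

# A mismatched reference is rejected before anything is added.
try:
    check_reference("my_study", "grch38", {"HG002_hs37d5": "hs37d5"})
    message = None
except ValueError as err:
    message = str(err)
```

Checking every requested VCF up front means a single mismatched sample stops the whole request, which matches the all-or-nothing behavior described above.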
Each aid may only be added once.
$ biograph vdb study add my_study 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
Matching VCFs:
HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
The following VCFs are already in this study at checkpoint 1 and will be skipped:
HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
Nothing left to import.
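The skip behavior above amounts to partitioning the matched VCFs by whether their aid is already in the study. This is an illustrative sketch only: the helper name split_new_and_existing and its data structures are hypothetical, and nothing here reflects how biograph stores studies internally.

```python
def split_new_and_existing(matched, already_in_study):
    """Partition matched VCFs into those to add and those to skip.

    matched: dict of aid -> sample name for VCFs matching the request.
    already_in_study: set of aids the study already contains.
    """
    to_add = {aid: name for aid, name in matched.items()
              if aid not in already_in_study}
    skipped = {aid: name for aid, name in matched.items()
               if aid in already_in_study}
    return to_add, skipped


matched = {"2e7a0129-13a5-44b6-8594-2fc2e6c80e6c": "HG002"}
existing = {"2e7a0129-13a5-44b6-8594-2fc2e6c80e6c"}

to_add, skipped = split_new_and_existing(matched, existing)
status = "Nothing left to import." if not to_add else f"Adding {len(to_add)} VCFs"
```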
Use the * wildcard to match part of a sample name. Use single quotes (') around any string containing a * to prevent the shell from interpreting it as a local filesystem wildcard match.
# this vdb has thousands of VCFs available
$ biograph vdb vcf list
sample_name  imported_on          refname  variant_count  aid                                   description
HGDP00001    2021-04-26 17:17:07  hs37d5   3846530        9663fc7b-e72b-468d-9aed-342fb6e72ed3
HGDP00003    2021-04-26 17:17:14  hs37d5   4494292        0aa83f9b-6c6c-495d-94f1-3b12c45291e9
HGDP00005    2021-04-26 17:17:22  hs37d5   3855632        8bdaa425-f442-493c-be2a-077b3559894b
...
HGDP01418    2021-04-26 20:04:19  hs37d5   4672976        d602eebd-3914-4980-b8e7-04040a1b9174
HGDP01419    2021-04-26 20:04:28  hs37d5   4650929        616700ee-b23a-44b0-9b4a-d16e7f6e4f07
$ biograph vdb study add hgdp_sample 'HGDP012*3'
Matching VCFs:
HGDP01213: 9080bc7b-08db-4674-ab2f-f8b82441ae6e
HGDP01233: d2ed8a1d-6dd2-455b-ba7b-c2973bbf2a72
HGDP01243: 26f2763b-ec41-4959-99b2-505fb0004216
HGDP01263: 83d7acd4-595e-4394-9466-efca354da852
HGDP01273: 8880fef5-19d0-4207-ad8d-bb73032b9e2f
HGDP01283: 3f606e13-380c-4ac0-98e2-7cd8dd9bdb77
HGDP01293: 81d97984-c9b7-4fe6-9f44-2d603cfce28e
Adding 31,310,134 variants from 7 VCFs to study hgdp_sample
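To preview which sample names a pattern such as 'HGDP012*3' would select, the matching can be approximated with Python's standard fnmatch module. This is an approximation under an assumption: biograph's own matching code is not shown here, and fnmatch also understands ? and [...] patterns, which biograph may not support. The helper name match_samples is hypothetical.

```python
from fnmatch import fnmatchcase


def match_samples(pattern, sample_names):
    """Return the sample names matching a shell-style wildcard pattern."""
    return [name for name in sample_names if fnmatchcase(name, pattern)]


# A few sample names taken from the listings above.
samples = ["HGDP00001", "HGDP01213", "HGDP01233", "HGDP01293", "HGDP01418"]

matches = match_samples("HGDP012*3", samples)
# matches == ["HGDP01213", "HGDP01233", "HGDP01293"]
```

The * stands for any run of characters, so the pattern keeps names that start with HGDP012 and end in 3.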
To add all available VCFs to a study, specify '*'. Note that this will only work if all imported VCFs use the same reference.
$ biograph vdb study add my_study '*'
Matching VCFs:
HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
HG003: a98626d6-758b-468d-add9-fbfbac47d207
HG004: 9a174215-8fc5-4c6f-bc4b-134654f65b99
Adding 13,538,956 variants from 3 VCFs to study my_study
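When command lines like these are generated by a script rather than typed by hand, the quoting can be delegated to Python's standard shlex module, which wraps any argument containing * in single quotes. The sketch below only builds the command string and does not run biograph; the argument values are taken from the examples in this page.

```python
import shlex

# Arguments as a plain list; the wildcards are not quoted yet.
args = ["biograph", "vdb", "study", "add", "my_study", "HG00*", "NA*3"]

# shlex.join quotes each argument that the shell could otherwise expand.
command_line = shlex.join(args)
# command_line == "biograph vdb study add my_study 'HG00*' 'NA*3'"

# Splitting the quoted string recovers the original arguments.
round_trip = shlex.split(command_line)
```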
You can import variants from a different study using --from. This is useful when exploring different grouping or filtering strategies without changing the original study. Use the '*' wildcard to import everything, or specify sample names, aids, and wildcards as usual.
$ biograph vdb study add my_study --from another_study '*'
When using --from, the most recent checkpoint is used by default. To select a different checkpoint, include --checkpoint as well. This makes it easy to import samples from an existing study as they were before filtering was applied.
# import pre-filtered variants for HG005 from another_study
$ biograph vdb study add my_study --from another_study --checkpoint 1 HG005
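For scripted workflows, the --from and --checkpoint options can be assembled programmatically. The helper below is a hypothetical sketch, not part of biograph. It uses only the options documented on this page, and it assumes that --checkpoint is meaningful only together with --from.

```python
def study_add_args(study_name, samples, src_study=None, checkpoint=None):
    """Build the argument list for a 'biograph vdb study add' invocation."""
    if not samples:
        raise ValueError("at least one sample name, aid, or wildcard is required")
    if checkpoint is not None and src_study is None:
        # Assumption: --checkpoint only applies when copying with --from.
        raise ValueError("--checkpoint requires --from")

    args = ["biograph", "vdb", "study", "add", study_name]
    if src_study is not None:
        args += ["--from", src_study]
    if checkpoint is not None:
        args += ["--checkpoint", str(checkpoint)]
    return args + list(samples)


# Same request as the HG005 example above.
args = study_add_args("my_study", ["HG005"], src_study="another_study", checkpoint=1)
```

A list like this can be passed to subprocess.run(args, check=True). No shell is involved in that case, so wildcards such as '*' need no extra quoting.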
$ biograph vdb study add --help
usage: biograph vdb study add [-h] [--from SRC_STUDY]
                              [--checkpoint CHECKPOINT]
                              study_name sample [sample ...]
Add variants to a study
Specify a VCF id or sample name to include all of its variants.
Wildcard matching * is applied to match multiple sample names.
To copy variants from the most recent checkpoint of an existing study,
use --from and specify one or more sample names with optional wildcards.
Use --checkpoint to select an older checkpoint in the study.
To remove VCFs from a study, use the 'filter' or 'revert' study commands.
All variants in a study must be called against the same reference.
Examples:
# Add a specific VCF id
$ biograph vdb study add my_study 0d1da4fa-778d-4d1d-9700-45f56acba576
# Sample name
$ biograph vdb study add my_study HG002
# Wildcard match. Wrap in '' to avoid accidental shell glob matching.
$ biograph vdb study add my_study 'HG00*' 'NA*3'
# Copy all variants from an existing study at the most recent checkpoint
$ biograph vdb study add my_study --from another_study '*'
# Copy sample HG003 from an existing study at a specific checkpoint
$ biograph vdb study add my_study --from another_study --checkpoint 3 'HG003'
positional arguments:
  study_name            Name of the study
  sample                VCF Sample name or aid to add
optional arguments:
  -h, --help            show this help message and exit
  --from SRC_STUDY      Look for samples in this study
  --checkpoint CHECKPOINT
                        When using --from, copy variants from this checkpoint
                        (default: most recent)