
study add

Rob Flickenger edited this page Aug 9, 2021 · 2 revisions

Use the biograph vdb study add command to add variants to a study. Variant entries may come from imported VCFs or from other studies. Any number of sample names, aids, or wildcards may be specified. If a sample name is specified, all VCFs matching that sample name will be added. Use aids to select individual VCFs.

$ biograph vdb study add my_study HG002 HG003 9a174215-8fc5-4c6f-bc4b-134654f65b99
Matching VCFs:
  HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
  HG003: a98626d6-758b-468d-add9-fbfbac47d207
  HG004: 9a174215-8fc5-4c6f-bc4b-134654f65b99

Adding 13,538,956 variants from 3 VCFs to study my_study

All variants in a study must be called against the same genetic reference. Attempting to add variants that were not called on the same reference will cause biograph to exit with an error.

$ biograph vdb study add my_study HG002_hs37d5
Matching VCFs:
  HG002_hs37d5: 7df5a719-47ed-4cb8-be1b-b4c09864fc6f

Study my_study uses reference grch38, but the specified VCFs use hs37d5.

Each aid may only be added to a study once. VCFs that are already in the study are skipped.

$ biograph vdb study add my_study 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
Matching VCFs:
  HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c

The following VCFs are already in this study at checkpoint 1 and will be skipped:
  HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c

Nothing left to import.

Wildcards

Use the * wildcard to match part of a sample name. Wrap any string containing a * in single quotes (') so that the shell passes it to biograph unchanged instead of expanding it against local filenames.
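The effect of quoting can be seen without running biograph at all. The following is a minimal sketch that uses only standard shell tools; the scratch directory and the file names in it are made up for illustration.

```shell
# Work in a scratch directory so the demo does not touch real files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical local files whose names happen to match the pattern.
touch HGDP01213.vcf HGDP01223.txt

# Unquoted: the shell replaces the pattern with matching local filenames.
unquoted=$(echo HGDP012*)

# Single-quoted: the pattern reaches the command exactly as typed.
quoted=$(echo 'HGDP012*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

In the unquoted case the command would receive local filenames rather than the wildcard, which is why the quoted form is the one to use.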

# this vdb has thousands of VCFs available
$ biograph vdb vcf list
sample_name          imported_on          refname    variant_count aid                                  description
HGDP00001            2021-04-26 17:17:07  hs37d5     3846530       9663fc7b-e72b-468d-9aed-342fb6e72ed3
HGDP00003            2021-04-26 17:17:14  hs37d5     4494292       0aa83f9b-6c6c-495d-94f1-3b12c45291e9
HGDP00005            2021-04-26 17:17:22  hs37d5     3855632       8bdaa425-f442-493c-be2a-077b3559894b
...
HGDP01418            2021-04-26 20:04:19  hs37d5     4672976       d602eebd-3914-4980-b8e7-04040a1b9174
HGDP01419            2021-04-26 20:04:28  hs37d5     4650929       616700ee-b23a-44b0-9b4a-d16e7f6e4f07

$ biograph vdb study add hgdp_sample 'HGDP012*3'
Matching VCFs:
  HGDP01213: 9080bc7b-08db-4674-ab2f-f8b82441ae6e
  HGDP01233: d2ed8a1d-6dd2-455b-ba7b-c2973bbf2a72
  HGDP01243: 26f2763b-ec41-4959-99b2-505fb0004216
  HGDP01263: 83d7acd4-595e-4394-9466-efca354da852
  HGDP01273: 8880fef5-19d0-4207-ad8d-bb73032b9e2f
  HGDP01283: 3f606e13-380c-4ac0-98e2-7cd8dd9bdb77
  HGDP01293: 81d97984-c9b7-4fe6-9f44-2d603cfce28e

Adding 31,310,134 variants from 7 VCFs to study hgdp_sample
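To preview which sample names a wildcard is likely to select before running the command, the pattern can be tried against a list of names in the shell. This is only a sketch: it assumes that biograph's * behaves like a shell glob (matching any run of characters), which this page does not state explicitly. The sample names are taken from the listings above.

```shell
# Pattern to preview. Assumption: biograph treats * like a shell glob.
pattern='HGDP012*3'

matches=''
for name in HGDP01213 HGDP01233 HGDP01418 HGDP01419; do
  case "$name" in
    # Left unquoted on purpose so that * acts as a wildcard in the case pattern.
    $pattern) matches="$matches $name" ;;
  esac
done

# Drop the leading space added by the loop.
matches=${matches# }
echo "$matches"
```

The output of the real command remains the authority on what matched; the "Matching VCFs" list it prints shows exactly which VCFs will be added.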

To add all available VCFs to a study, specify '*'. Note that this will only work if all imported VCFs use the same reference.

$ biograph vdb study add my_study '*'
Matching VCFs:
  HG002: 2e7a0129-13a5-44b6-8594-2fc2e6c80e6c
  HG003: a98626d6-758b-468d-add9-fbfbac47d207
  HG004: 9a174215-8fc5-4c6f-bc4b-134654f65b99

Adding 13,538,956 variants from 3 VCFs to study my_study
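Because '*' only works when every imported VCF uses the same reference, it can help to count the references in use first. The sketch below runs awk over a few rows copied from the biograph vdb vcf list output shown earlier. It assumes the column layout of that listing, where imported_on splits into a date and a time so that refname is the fourth whitespace-separated field; other versions may format the listing differently.

```shell
# Rows copied from the `biograph vdb vcf list` example earlier on this page.
listing='sample_name          imported_on          refname    variant_count aid                                  description
HGDP00001            2021-04-26 17:17:07  hs37d5     3846530       9663fc7b-e72b-468d-9aed-342fb6e72ed3
HGDP00003            2021-04-26 17:17:14  hs37d5     4494292       0aa83f9b-6c6c-495d-94f1-3b12c45291e9
HGDP00005            2021-04-26 17:17:22  hs37d5     3855632       8bdaa425-f442-493c-be2a-077b3559894b'

# Skip the header row, then count VCFs per reference name (field 4).
refs=$(printf '%s\n' "$listing" |
  awk 'NR > 1 { count[$4]++ } END { for (r in count) print r, count[r] }')

echo "$refs"
```

Against a live database, the same awk filter could be fed from biograph vdb vcf list directly. More than one line of output would mean that the imported VCFs use more than one reference, in which case '*' would not work.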

Adding variants from another study

You can import variants from a different study using --from. This is useful when exploring different grouping or filtering strategies without changing the original study. Use the '*' wildcard to import everything, or specify sample names, aids, and wildcards as usual.

$ biograph vdb study add my_study --from another_study '*'

When using --from, the most recent checkpoint is used by default. To select a different checkpoint, include --checkpoint as well. This makes it easy to import samples from an existing study as they were at a checkpoint taken before filtering was applied.

# import pre-filtered variants for HG005 from another_study
$ biograph vdb study add my_study --from another_study --checkpoint 1 HG005

Getting more help

$ biograph vdb study add --help
usage: biograph vdb study add [-h] [--from SRC_STUDY]
                              [--checkpoint CHECKPOINT]
                              study_name sample [sample ...]

Add variants to a study

Specify a VCF id or sample name to include all of its variants.
Wildcard matching * is applied to match multiple sample names.

To copy variants from the most recent checkpoint of an existing study,
use --from and specify one or more sample names with optional wildcards.
Use --checkpoint to select an older checkpoint in the study.

To remove VCFs from a study, use the 'filter' or 'revert' study commands.

All variants in a study must be called against the same reference.

Examples:

 # Add a specific VCF id
 $ biograph vdb study add my_study 0d1da4fa-778d-4d1d-9700-45f56acba576

 # Sample name
 $ biograph vdb study add my_study HG002

 # Wildcard match. Wrap in '' to avoid accidental shell glob matching.
 $ biograph vdb study add my_study 'HG00*' 'NA*3'

 # Copy all variants from an existing study at the most recent checkpoint
 $ biograph vdb study add my_study --from another_study '*'

 # Copy sample HG003 from an existing study at a specific checkpoint
 $ biograph vdb study add my_study --from another_study --checkpoint 3 'HG003'

positional arguments:
  study_name            Name of the study
  sample                VCF Sample name or aid to add

optional arguments:
  -h, --help            show this help message and exit
  --from SRC_STUDY      Look for samples in this study
  --checkpoint CHECKPOINT
                        When using --from, copy variants from this checkpoint
                        (default: most recent)