import plink-ng as a git dependency #16

Closed
wants to merge 244 commits into from
d40ec8e
Simulation framework and classes
mlamkin7 Jan 22, 2022
e94ea45
added GeneticMarker class
mlamkin7 Jan 24, 2022
5a8ad8a
Completed Classes
mlamkin7 Jan 26, 2022
41a4a76
Restructure of repo
mlamkin7 Feb 8, 2022
17e724f
Refactor repository
mlamkin7 Feb 9, 2022
60472da
Completed Simulator (Needs to be tested)
mlamkin7 Feb 9, 2022
f14ab6a
Merged haptools updates
mlamkin7 Feb 9, 2022
fb7c4c9
Fixed formatting
mlamkin7 Feb 9, 2022
e44bacc
Increased runtime by precalculating randomized events in the simulati…
mlamkin7 Feb 13, 2022
bb8afae
Hacked together karyogram code
mlamkin7 Feb 13, 2022
38b091a
Hacked together code
mlamkin7 Feb 13, 2022
9eba3b5
initiating haptools readme pages
Feb 14, 2022
4d73de8
fixing minor readme typos
Feb 14, 2022
2c9da55
minor readme changes
Feb 14, 2022
f65204b
adding simgenotypes readme
Feb 14, 2022
06fcaaa
Update README.md
mlamkin7 Feb 15, 2022
e41bfee
Update README.md
mlamkin7 Feb 15, 2022
058b0e8
Update README.md
mlamkin7 Feb 15, 2022
2c99a25
Update README.md
mlamkin7 Feb 15, 2022
ec872bb
Update README.md
mlamkin7 Feb 15, 2022
94fa0fe
Update README.md
mlamkin7 Feb 15, 2022
b5d1bc2
Merge pull request #14 from aryarm/mgymrek-docs-admixsim
mlamkin7 Feb 27, 2022
3f8f662
update file locations and fixed cM -> M
mlamkin7 Feb 27, 2022
663af36
Merge remote-tracking branch 'origin/admix-sim' into admix-sim
mlamkin7 Feb 27, 2022
453075a
removed snakemake
mlamkin7 Mar 1, 2022
8c84401
update poetry to v1.2
aryarm Mar 6, 2022
4563bde
add pgenlib as a dependency
aryarm Mar 7, 2022
87a7a5e
oops - specify it as a tag instead of rev
aryarm Mar 7, 2022
1e9c929
Merge pull request #17 from aryarm/admix-sim
gymreklab Mar 15, 2022
18786eb
adding haplotype classes
Mar 15, 2022
ee62fa0
correct version of simgenotype
mlamkin7 Mar 16, 2022
d54ba73
handle SNP haplotype finding in transform
Mar 16, 2022
884225f
initial simulator code
Mar 16, 2022
b12f04a
adding simphenotype readme
Mar 16, 2022
1a4747c
adding simphenotype readme
Mar 16, 2022
07aac6a
adding simphenotype readme
Mar 16, 2022
0b1099b
adding simphenotype readme
Mar 16, 2022
0ce8574
update dependencies in lock
aryarm Apr 1, 2022
f33a6c3
install pandas
aryarm Apr 2, 2022
482db62
decide that we actually dont need pandas, after all
aryarm Apr 2, 2022
a8d5af8
update black to fix incompatibility with click
aryarm Apr 2, 2022
8c042fb
add covariates class to data module
aryarm Apr 2, 2022
a50b90c
separate python standard library dependencies from external dependencies
aryarm Apr 2, 2022
0756290
allow for filtering multiallelic variants in genotypes class
aryarm Apr 2, 2022
dc7cad0
warn user if the data module loaded zero variants
aryarm Apr 2, 2022
c14fd91
add md file to the docs via steps in #18
aryarm Apr 2, 2022
ebaf7ed
reference md files external from docs/
aryarm Apr 2, 2022
6bd5ea5
remove header b/c it already appears in the README
aryarm Apr 8, 2022
f71b853
support gz and bz2 file extensions (see #19)
aryarm Apr 9, 2022
435ef2c
fmt with black
aryarm Apr 10, 2022
e1b4e17
update and lock dependencies
aryarm Apr 11, 2022
3454ad6
add matplotlib dep (resolves #22)
aryarm Apr 11, 2022
427f030
remove pytabix dependency and use pysam instead
aryarm Apr 11, 2022
a5877a9
separate imports into blocks
aryarm Apr 11, 2022
a8c263c
move mpl outside of docs dep section
aryarm Apr 11, 2022
e290f2e
Merge pull request #21 from gymrek-lab/feat/add-md-docs
gymreklab Apr 13, 2022
7827e3d
reduce memory in Genotypes.read (see #19)
aryarm Apr 13, 2022
18f5a97
use logging instead of assertions in data module (see #19)
aryarm Apr 13, 2022
47f61e6
add iterate function to data classes (see #19)
aryarm Apr 13, 2022
81be2be
convert assertions in phens and covars classes to logs
aryarm Apr 13, 2022
d001916
fmt with black
aryarm Apr 13, 2022
133777b
switch to using namedtuple in iterate function
aryarm Apr 13, 2022
ed28bd9
fmt with black
aryarm Apr 13, 2022
af26af6
Merge pull request #20 from gymrek-lab/feat/multi-allelic
aryarm Apr 13, 2022
cb3f43b
Merge pull request #24 from gymrek-lab/fix/dependencies
aryarm Apr 13, 2022
30f3f75
resolve merge conflicts from haplotype_classes
aryarm Apr 13, 2022
d059079
move files out of directories
aryarm Apr 13, 2022
af7e5c3
resolve docs after moving files
aryarm Apr 13, 2022
71faf08
remove conflict markers in pyproject
aryarm Apr 13, 2022
71c2448
Merge pull request #27 from gymrek-lab/ref/directories
aryarm Apr 13, 2022
6ade14c
VCF updates
mlamkin7 Apr 13, 2022
c6c66e5
fixing relative import
Apr 13, 2022
7ff1b6b
Start of VCF implementation
mlamkin7 Apr 13, 2022
b79afac
fixing relative imports
Apr 13, 2022
f5ca258
fixed merge conflict
mlamkin7 Apr 13, 2022
39de906
Merge branch 'karyogram' into feat/vcf_output
mlamkin7 Apr 13, 2022
0a32585
Merge pull request #28 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 13, 2022
2365418
VCF and HAP file examples
s041629 Apr 14, 2022
dc3c3ce
copy variant module from happler
aryarm Apr 14, 2022
3d033d9
cleaning up karyogram code
Apr 14, 2022
aa2f682
Merge branch 'karyogram' of https://github.com/gymrek-lab/haptools in…
Apr 14, 2022
b8c74e2
Added specifying chromosome functionality to simulating genotypes.
mlamkin7 Apr 14, 2022
3ea4e83
DAT example files
s041629 Apr 14, 2022
5600ba5
Merge pull request #29 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 14, 2022
89cc333
Example DAT files
s041629 Apr 14, 2022
5a11f98
Merge pull request #31 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
420903d
cleaning up karyogram code
Apr 14, 2022
769a106
Merge pull request #32 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
9a0d40c
Admixed individuals from the US
s041629 Apr 14, 2022
9a0fa20
start on work for haplotype parser
aryarm Apr 14, 2022
93e28a4
solving karyogram color issues when color not specified
Apr 14, 2022
e06dba6
Fixed error with regex parsing of chromosomes
mlamkin7 Apr 14, 2022
36ca43e
Merge pull request #34 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 14, 2022
2b88859
adding centromeres
Apr 14, 2022
fcf2b1e
cleaning up karyogram user interface
Apr 14, 2022
b13a5d7
cleaning up karyogram user interface
Apr 14, 2022
1f3b67d
Merge pull request #35 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
7bcd48a
catching user errors in karyogram
Apr 14, 2022
6f36202
Merge pull request #36 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
84873cd
fixing broken links in readme to docs that moved
Apr 14, 2022
c789595
adding karyogram docs page
Apr 14, 2022
4b4cfa6
adding karyogram docs page
Apr 14, 2022
6834440
adding test karyogram image
Apr 14, 2022
b73ab38
adding test karyogram image
Apr 14, 2022
84045b2
typos in karyogram docs
Apr 14, 2022
e8ccbff
typos in karyogram docs
Apr 14, 2022
92e42a2
adding example breakpoints
Apr 14, 2022
3644aed
Merge pull request #37 from gymrek-lab/docs
gymreklab Apr 14, 2022
8dc7bd0
Fixed floating point error
mlamkin7 Apr 15, 2022
1391b0c
continue implementing Haplotypes.read and Haplotypes.iterate methods
aryarm Apr 17, 2022
d488acd
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Apr 19, 2022
c51467b
create Haplotype and Variant classes for storing lines from .haps files
aryarm Apr 19, 2022
c8428fa
create specific section in docs for file formats
aryarm Apr 19, 2022
2015215
fix issues with commands not appearing in toc of docs
aryarm Apr 19, 2022
1879e04
add docs for .hap haplotypes file format
aryarm Apr 19, 2022
0a2af60
copy variant module from happler
aryarm Apr 14, 2022
7c8d182
start on work for haplotype parser
aryarm Apr 14, 2022
99071f8
continue implementing Haplotypes.read and Haplotypes.iterate methods
aryarm Apr 17, 2022
c5500fe
create Haplotype and Variant classes for storing lines from .haps files
aryarm Apr 19, 2022
32bd815
create specific section in docs for file formats
aryarm Apr 19, 2022
600e032
fix issues with commands not appearing in toc of docs
aryarm Apr 19, 2022
8cb274a
add docs for .hap haplotypes file format
aryarm Apr 19, 2022
dc63ed2
Merge branch 'feat/haplotypes' of github.com:gymrek-lab/haptools into…
aryarm Apr 19, 2022
8f856b0
rename hap data files
aryarm Apr 19, 2022
91856b4
create new example hap files with beta added
aryarm Apr 19, 2022
1f24949
Added pulse events
mlamkin7 Apr 19, 2022
8a71d2b
Merge pull request #39 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 19, 2022
5aa0deb
change allele to str in hap format spec
aryarm Apr 19, 2022
2dc7a74
fixed error in arguments for simulate gt
mlamkin7 Apr 19, 2022
465f6ec
Merge pull request #40 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 19, 2022
f5fea3a
Update cuba.dat
gymreklab Apr 19, 2022
ade8b0d
Update README.md
gymreklab Apr 19, 2022
a62a03b
correct type-hinting of return of Haplotypes.iterate
aryarm Apr 19, 2022
e06b7f9
renaming sim_admixture to sim_genotypes to match cmd name
Apr 19, 2022
a784e6b
use fname property in Haplotypes.write
aryarm Apr 19, 2022
b324fe1
updating help messages for simgenotype
Apr 19, 2022
64c92af
renaming sim_genotype
Apr 19, 2022
5175676
Merge pull request #41 from gymrek-lab/simgenotypes-docs
gymreklab Apr 19, 2022
cf82d4f
start handling extras in Haplotypes class
aryarm Apr 21, 2022
555deba
store variants as tuple intead of list in Haplotype class
aryarm Apr 21, 2022
ec69ae7
rewrite from_hap_spec to automatically use properties from subclasses
aryarm Apr 21, 2022
0eff78d
define new haplotype class for haptools
aryarm Apr 21, 2022
41b104e
updated chroms default values
mlamkin7 Apr 21, 2022
d8bb990
Merge pull request #42 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 21, 2022
0973654
Update simgenotype.md
mlamkin7 Apr 21, 2022
5ba8f78
check header lines in Haplotypes.read
aryarm Apr 21, 2022
54a0617
add docs for usage of the .hap file
aryarm Apr 21, 2022
4f7c7fa
fmt with black
aryarm Apr 21, 2022
b196736
rebuild api docs with haplotypes.py
aryarm Apr 21, 2022
3e2a426
add examples for Haplotypes class
aryarm Apr 22, 2022
7e86aaf
validate that all extras are there in Haplotypes.check_ex_header
aryarm Apr 22, 2022
6232929
make _fmt a private field
aryarm Apr 23, 2022
62ab36b
convert iterate to __iter__ in data module
aryarm Apr 23, 2022
b369539
add more examples and docs to haplotypes class
aryarm Apr 23, 2022
f2fe5ac
add example hap files to docs
aryarm Apr 23, 2022
effb035
create smaller hap example files
aryarm Apr 23, 2022
b3d05ff
add HaplotypeTests class to testing module
aryarm Apr 23, 2022
fb27999
call __iter__ from read in Haplotypes class
aryarm Apr 23, 2022
3cca45f
use basic.hap in haplotypes examples
aryarm Apr 23, 2022
054a01b
add indexed basic hap and test example.hap.gz
aryarm Apr 23, 2022
5080728
test Haplotypes.write() method
aryarm Apr 23, 2022
e2bf695
add header lines to example.hap
aryarm Apr 23, 2022
4480b19
reformat with black -- oops
aryarm Apr 24, 2022
6d3b598
require sorting of line type symbols for indexed hap files
aryarm Apr 24, 2022
68b3047
Delete nohup.out
mlamkin7 Apr 28, 2022
f0e173c
Delete nohup.out
mlamkin7 Apr 28, 2022
394fa20
Updated documentation
mlamkin7 May 5, 2022
c3074c7
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 May 5, 2022
b5ca354
Merge branch 'feat/vcf_output' of https://github.com/gymrek-lab/hapto…
mlamkin7 May 5, 2022
6a7c5ee
add Extra object encoding extra fields in Haplotypes module
aryarm May 9, 2022
2edcb52
revise hap test data files to pass tests
aryarm May 9, 2022
122f062
Merge branch 'feat/haplotypes' of github.com:gymrek-lab/haptools into…
aryarm May 9, 2022
68ba119
add docs for new extra field declarations in header
aryarm May 9, 2022
001a28e
Preallocate np array when loading genotypes
aryarm May 9, 2022
daaeadf
retest genotypes module after changes
aryarm May 10, 2022
eb641c0
create transform subcommand
aryarm May 10, 2022
95a1619
create TestGenotypes class in testing module
aryarm May 10, 2022
2e33f4a
test variant selection in Genotypes class
aryarm May 10, 2022
e54143a
refmt with black
aryarm May 10, 2022
60bda2b
create Data.unset() to check if data is unset
aryarm May 10, 2022
56ea690
add variants param to Genotypes.load()
aryarm May 10, 2022
e830119
output from a file path in transform subcommand
aryarm May 11, 2022
c1b55ff
create Genotypes class that also stores REF/ALT
aryarm May 11, 2022
db74659
create Haplotype.transform function
aryarm May 11, 2022
b72d1d3
create Haplotypes.transform function and add tests
aryarm May 11, 2022
e084ea8
write Haplotypes to a VCF
aryarm May 11, 2022
2c1dc3c
refmt with black and get rid of HaplotypesGT class
aryarm May 11, 2022
9e83254
clean up transform docs
aryarm May 11, 2022
6bad9d8
warn against importing at the top of __main__
aryarm May 11, 2022
4384cb8
clean up duplicated code in Genotypes class
aryarm May 13, 2022
259aaee
add Genotypes._prephased attr to ignore phasing while debugging
aryarm May 13, 2022
e72f2d3
allow for discarding samples that are missing genotypes
aryarm May 13, 2022
13c06e7
add more docs and messages to Genotypes and Haplotypes classes
aryarm May 13, 2022
1410315
require GenotypeRefAlt instance as input to Haplotypes.transform
aryarm May 13, 2022
8b502d1
Incomplete VCF output
mlamkin7 May 14, 2022
8ccb7d2
refmt with black
aryarm May 14, 2022
75e75be
prelim code for other gts readers
aryarm May 14, 2022
34a839d
Merge pull request #45 from gymrek-lab/feat/transform
aryarm May 14, 2022
9b4393a
Merge branch 'feat/haplotypes' into main
aryarm May 14, 2022
d092a1e
fix TypeError in sim_genotype._write_vcf
aryarm May 18, 2022
95ddb6e
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Jun 9, 2022
f1432d4
need to stash changes
mlamkin7 Jun 9, 2022
13af0a5
fixed merging issues
mlamkin7 Jun 9, 2022
09ba1ed
Completed output vcf
mlamkin7 Jun 15, 2022
5efd815
Completed Output VCF and one test case
mlamkin7 Jun 16, 2022
6c81f8d
Added validation to input files for sim genotype
mlamkin7 Jun 17, 2022
f1a2f12
Merge pull request #53 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 27, 2022
229b285
Added 1000genomes sampleinfo file for input.
mlamkin7 Jun 29, 2022
e4aa6e2
Validation of map files' fields during execution.
mlamkin7 Jun 29, 2022
49998d8
Updated with Arya's recommendations on pull request
mlamkin7 Jun 29, 2022
738eaeb
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Jun 29, 2022
8a931f9
Updated docs to include simgenotypes
mlamkin7 Jun 29, 2022
a760af3
Fixed bug with isfile instead of isdir
mlamkin7 Jun 30, 2022
07e51b9
Merge pull request #56 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
09a757c
Updated Example in __main__.py
mlamkin7 Jun 30, 2022
8f91078
Update simgenotype.md
mlamkin7 Jun 30, 2022
8e95499
Update simgenotype.md
mlamkin7 Jun 30, 2022
2b5e501
Update simgenotype.md
mlamkin7 Jun 30, 2022
d525dce
Update simgenotype.md
mlamkin7 Jun 30, 2022
4bdece4
Update simgenotype.md
mlamkin7 Jun 30, 2022
80ce432
Working example.
mlamkin7 Jun 30, 2022
54ed5a4
working example
mlamkin7 Jun 30, 2022
d76c013
Merge pull request #57 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
703d36e
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Jun 30, 2022
062afdc
Merge pull request #58 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
f021d3a
minor cleanups to simgenotype output messages
Jun 30, 2022
9e63e06
Merge pull request #59 from gymrek-lab/test-simgenotype
mlamkin7 Jun 30, 2022
d4b10ad
stashing changes
mlamkin7 Jun 30, 2022
c011f60
Fixed merge conflicts and added minor changes
mlamkin7 Jun 30, 2022
48363a1
Updated 1000 genomes documentation under formats
mlamkin7 Jun 30, 2022
3d99787
fixed simgenotype --help command output
mlamkin7 Jun 30, 2022
f17edd5
Merge pull request #60 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
ac5aa13
adding apoe haplotype example
Jul 1, 2022
db2aa61
changing apoe4 to hg19
Jul 1, 2022
7e32b51
updating apoe4 example in transform docs
Jul 1, 2022
d35c1ec
Merge pull request #61 from gymrek-lab/test-transform
gymreklab Jul 1, 2022
0dee04c
Added SAMPLE Format Field
mlamkin7 Jul 2, 2022
a26a01f
Merge pull request #62 from gymrek-lab/feat/vcf_output
mlamkin7 Jul 2, 2022
c1fabfb
Fixed issue in test_outputvcf.py
mlamkin7 Jul 2, 2022
3acdbc1
Merge pull request #63 from gymrek-lab/feat/vcf_output
mlamkin7 Jul 2, 2022
295c40b
update poetry to v1.2
aryarm Mar 6, 2022
c595077
add pgenlib as a dependency
aryarm Mar 7, 2022
5343191
oops - specify it as a tag instead of rev
aryarm Mar 7, 2022
803f8e0
update pyproject to poetry-core >=1.1.0b1
aryarm Jul 9, 2022
11 changes: 10 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,4 +9,13 @@ __pycache__
# pytest cache
.pytest_cache
# poetry
dist/
dist/

# OSX
*.DS_Store*

# Test output files
test.par
test.phen
example_simgenotype.bp
example_simgenotype.vcf
28 changes: 26 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,31 @@
Please wait until we have published our first tagged release before using our code.

# haptools
Simulate phenotypes for fine-mapping. Use real variants to simulate real, biological LD patterns.
The Snakemake pipeline in the `snakemake/` directory uses the results of the simulation to test several fine-mapping methods, including FINEMAP and SuSiE.

Haptools is a collection of tools for simulating and analyzing genotypes and phenotypes while taking into account haplotype information. It is particularly designed for analysis of individuals with admixed ancestries, although the tools can also be used for non-admixed individuals.

Homepage: https://haptools.readthedocs.io/

## Installation

UNDER CONSTRUCTION

## Haptools utilities

Haptools consists of multiple utilities listed below. Click on a utility to see more detailed usage information.

* [`haptools simgenotype`](docs/commands/simgenotype.md): Simulate genotypes for admixed individuals under user-specified demographic histories.

* [`haptools simphenotype`](docs/commands/simphenotype.md): Simulate a complex trait, taking into account local ancestry- or haplotype- specific effects. `haptools simphenotype` takes as input a VCF file and outputs simulated phenotypes for each sample.

* [`haptools karyogram`](docs/commands/karyogram.md): Visualize a "chromosome painting" of local ancestry labels based on breakpoints output by `haptools simgenotype`.

Outputs produced by these utilities are compatible with each other. For example,
`haptools simgenotype` outputs a VCF file with local ancestry information annotated for each variant. The output VCF file can be used as input to `haptools simphenotype` to simulate phenotype information. `haptools simgenotype` also outputs a list of local ancestry breakpoints, which can be visualized using `haptools karyogram`.

## Contributing

If you are interested in contributing to `haptools`, please get in touch by submitting a GitHub issue or contacting us at mlamkin@ucsd.edu.



23 changes: 23 additions & 0 deletions docs/api/haptools.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,3 +36,26 @@ haptools.data.phenotypes module
:undoc-members:
:show-inheritance:

haptools.data.covariates module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: haptools.data.covariates
:members:
:undoc-members:
:show-inheritance:

haptools.data.haplotypes module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: haptools.data.haplotypes
:members:
:undoc-members:
:show-inheritance:

haptools.sim_genotype module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: haptools.sim_genotype
:members:
:undoc-members:
:show-inheritance:
36 changes: 36 additions & 0 deletions docs/commands/karyogram.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Haptools karyogram

`haptools karyogram` takes as input a breakpoints file (e.g. as output by `haptools simgenotype`) and a sample name, and plots a karyogram depicting local ancestry tracks.


## Basic usage

```
haptools karyogram \
--bp tests/data/5gen.bp \
--sample Sample_1 \
--out karyogram.png
```

See details of the breakpoints file [here](simgenotype.md). If you specify `--sample $SAMPLE`, the breakpoints file must have breakpoints for `${SAMPLE}_1` and `${SAMPLE}_2` (the two haplotypes of `$SAMPLE`).

## Additional options

You may also specify the following options:

* `--centromeres <FILE>`: path to a file describing the locations of chromosome ends and centromeres. An example file is given here: `tests/data/centromeres_hg19.txt`. The columns are: chromosome, chrom_start, centromere, chrom_end. For acrocentric chromosomes, the centromere field is omitted. This file format was taken from [here](https://github.com/armartin/ancestry_pipeline).
* `--colors "pop1:color1,pop2:color2..."`: You can optionally specify which colors should be used for each population. If colors are not given, the script chooses reasonable defaults.
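
The `--colors` value is a comma-separated list of `population:color` pairs. As a rough illustration of that syntax only, the sketch below splits such a string into a dictionary. The function name `parse_colors` is hypothetical and is not taken from the haptools source.

```python
from typing import Dict


def parse_colors(spec: str) -> Dict[str, str]:
    """Split a 'pop1:color1,pop2:color2' string into {population: color}.

    Hypothetical helper for illustration; haptools may parse this differently.
    """
    colors: Dict[str, str] = {}
    for pair in spec.split(","):
        # partition() returns an empty separator if ":" is missing
        pop, sep, color = pair.partition(":")
        pop, color = pop.strip(), color.strip()
        if not sep or not pop or not color:
            raise ValueError(f"expected 'population:color', got {pair!r}")
        colors[pop] = color
    return colors


colors = parse_colors("CEU:blue,YRI:red")
print(colors)
```

Raising on a malformed pair, rather than skipping it, keeps a typo in the option from silently falling back to a default color.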

## Example command

The following example can be run with files in this repository:

```
haptools karyogram --bp tests/data/5gen.bp --sample Sample_1 \
--out test_karyogram.png --centromeres tests/data/centromeres_hg19.txt \
--colors 'CEU:blue,YRI:red'
```

This will output a file `test_karyogram.png`. The example is shown below.

![Example karyogram](../images/test_karyogram.png)
4 changes: 4 additions & 0 deletions docs/commands/karyogram.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
.. _subcommands-karyogram:

.. include:: karyogram.md
:parser: myst_parser.sphinx_
87 changes: 87 additions & 0 deletions docs/commands/simgenotype.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# simgenotype

`haptools simgenotype` takes as input a reference set of haplotypes in VCF format and a user-specified admixture model. It outputs a VCF file with simulated genotype information for admixed genotypes, as well as a breakpoints file that can be used for visualization.

## Basic usage

```
haptools simgenotype \
--invcf REFVCF \
--sample_info SAMPLEINFOFILE \
--model MODELFILE \
--map GENETICMAP \
--out OUTPREFIX
```

Detailed information about each option, and example commands using publicly available files, are shown below.

## Detailed usage

`--invcf` - Input VCF file used to simulate specific haplotypes for resulting samples
`--sample_info` - File used to map samples in `REFVCF` to populations found in `MODELFILE`
`--model` - Parameters for simulating admixture across generations
`--map` - .map file used to determine recombination events during the simulation
`--out` - Output prefix of the structure `/path/to/output` which results in the vcf file `output.vcf.gz` and breakpoints file `output.bp`

## File formats

Model Format

Structure of model.dat file

`num_samples` - Total number of samples to be output by the simulator (`num_samples*2` haplotypes)
`num_generations` - Number of generations to simulate admixture, must be > 0
`*_freq` - Frequency of populations to be present in the simulated samples

```
{num_samples} Admixed Pop_label1 Pop_label2 ... Pop_labeln
{num_generations} {admixed_freq} {pop_label1_freq} {pop_label2_freq} ... {pop_labeln_freq}
```

Example model.dat file

```
40 Admixed CEU YRI
6 0 0.2 0.8
```
Simulating 6 generations in this case means that the first generation is drawn with population frequencies `Admixed=0, CEU=0.2, YRI=0.8`, and each of the remaining generations (2-6) has population frequencies `Admixed=1, CEU=0, YRI=0`.
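
To make the layout of the model file concrete, here is a minimal sketch of a reader for it, assuming whitespace-separated fields exactly as in the example above. The names `ModelEvent` and `parse_model` are hypothetical and are not part of haptools.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ModelEvent:
    """One non-header line: a generation count and one frequency per population."""
    num_generations: int
    fractions: Dict[str, float]


def parse_model(text: str) -> Tuple[int, List[str], List[ModelEvent]]:
    """Parse model.dat text into (num_samples, population labels, events).

    Hypothetical helper for illustration; not the haptools implementation.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    num_samples = int(header[0])
    labels = header[1:]  # e.g. ["Admixed", "CEU", "YRI"]
    events: List[ModelEvent] = []
    for row in rows:
        generations = int(row[0])
        if generations <= 0:
            raise ValueError("num_generations must be > 0")
        freqs = [float(x) for x in row[1:]]
        if len(freqs) != len(labels):
            raise ValueError("expected one frequency per population label")
        events.append(ModelEvent(generations, dict(zip(labels, freqs))))
    return num_samples, labels, events


num_samples, labels, events = parse_model("40 Admixed CEU YRI\n6 0 0.2 0.8\n")
print(num_samples, labels, events[0])
```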

Map Format

`chr` - chromosome of coordinate (1-22, X)
`var` - variant identifier
`pos cM` - Position in centimorgans
`pos bp` - Base pair coordinate

```
{chr}\t{var}\t{pos cM}\t{pos bp}
```
Beagle Genetic Maps used in simulation (GRCh38): http://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/
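
The four tab-separated columns of a map line can be read as in the sketch below. `MapEntry` and `parse_map` are hypothetical names, and the two sample lines are made-up values chosen only to show the layout.

```python
from typing import List, NamedTuple


class MapEntry(NamedTuple):
    chrom: str     # chromosome (1-22, X), kept as a string so "X" is allowed
    var: str       # variant identifier
    pos_cm: float  # position in centimorgans
    pos_bp: int    # base pair coordinate


def parse_map(text: str) -> List[MapEntry]:
    """Parse '{chr}\\t{var}\\t{pos cM}\\t{pos bp}' lines (illustrative only)."""
    entries: List[MapEntry] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, var, pos_cm, pos_bp = line.split("\t")
        entries.append(MapEntry(chrom, var, float(pos_cm), int(pos_bp)))
    return entries


# Made-up example lines, not copied from a real genetic map
example_map = "1\t.\t0.000000\t55550\n1\t.\t0.080572\t82571\n"
entries = parse_map(example_map)
print(entries[0])
```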


Outfile Format

`Sample Header` - Name of sample following the structure `Sample_{number}_{hap}`, e.g. `Sample_10_1` for sample number 10, haplotype 1
`pop` - Population label corresponding to the index of the population in the dat file. In the example above, CEU = 1 and YRI = 2.
`chr` - chromosome (1-22, X)

```
Sample Header
{pop}\t{chr}\t{pos bp}
...
Sample Header 2
...
```
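
The breakpoints layout above (a sample header line followed by tab-separated segment lines) could be read along the following lines. This is a sketch under stated assumptions: a line without any tab is treated as a sample header, only the three documented columns are used, and the name `parse_breakpoints` and the example values are invented for illustration.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, str, int]  # (pop, chr, pos bp)


def parse_breakpoints(text: str) -> Dict[str, List[Segment]]:
    """Group breakpoint lines under their sample header (illustrative only)."""
    segments: Dict[str, List[Segment]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) == 1:
            # Assumption: header lines such as "Sample_1_1" contain no tabs
            current = fields[0]
            segments[current] = []
        else:
            if current is None:
                raise ValueError("breakpoint line found before any sample header")
            if len(fields) < 3:
                raise ValueError(f"expected pop, chr and pos bp, got {line!r}")
            # pop is kept as a string; any columns past the third are ignored
            segments[current].append((fields[0], fields[1], int(fields[2])))
    return segments


# Made-up example content that follows the documented layout
example_bp = (
    "Sample_1_1\n1\t1\t59423086\n2\t1\t248956422\n"
    "Sample_1_2\n2\t1\t248956422\n"
)
breakpoints = parse_breakpoints(example_bp)
print(sorted(breakpoints))
```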

## Examples

Example Command
```
haptools simgenotype \
--model ./tests/data/outvcf_gen.dat \
--mapdir ./tests/data/map/ \
--chroms 1,2 \
--invcf ./tests/data/outvcf_test.vcf \
--sample_info ./tests/data/outvcf_info.tab \
--out ./tests/data/example_simgenotype
```
4 changes: 4 additions & 0 deletions docs/commands/simgenotype.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
.. _commands-simgenotype:

.. include:: simgenotype.md
:parser: myst_parser.sphinx_
76 changes: 76 additions & 0 deletions docs/commands/simphenotype.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# simphenotype

Haptools simphenotype simulates a complex trait, taking into account haplotype- or local-ancestry- specific effects as well as traditional variant-level effects. It takes causal effects and genotypes as input and outputs simulated phenotypes.

Usage is modeled based on the [GCTA GWAS Simulation](https://yanglab.westlake.edu.cn/software/gcta/#GWASSimulation) utility.

## Usage

Below is a basic `haptools simphenotype` command:

```
haptools simphenotype \
--vcf <gzipped vcf file> \
--hap <gzipped hap file> \
--out <outprefix> \
< --simu-qt | --simu-cc > \
[simulation options]
```

Required parameters:

* `--vcf <string>`: A bgzipped, tabix-indexed, phased VCF file. If you are simulating local-ancestry effects, the VCF file must contain the `FORMAT/LA` tag included in output of `haptools simgenotype`. See [haptools file formats](../../docs/formats/inputs.rst) for more details.

* `--hap <string>`: A bgzipped, tabix-indexed HAP file, which specifies causal effects. This is a custom format described in more detail in [haptools file formats](../../docs/formats/haplotypes.rst). The HAP format enables flexible specification of a range of effect types including traditional variant-level effects, haplotype-level effects, associations with repeat lengths at short tandem repeats, and interaction of these effects with local ancestry labels. See [Examples](#examples) below for detailed examples of how to specify effects.

* `--out <string>`: Prefix to name output files.

* `--simu-qt` or `--simu-cc`: Indicates whether to simulate a quantitative or a case-control trait. One of these two options must be specified.

Additional parameters:

* `--simu-rep <int>`: Number of phenotypes to simulate. Default: 1.

* `--simu-hsq <float>`: Trait heritability. Default: 0.1.

* `--simu-k <float>`: Disease prevalence (for case-control). Default: 0.1.

## Output files

`haptools simphenotype` outputs the following files:

* `<outprefix>.phen`: Based on the phenotype files accepted by [plink](https://www.cog-genomics.org/plink/1.9/input#pheno). Tab-delimited file with one row per sample. The first and second columns give the sample ID. The third column gives the simulated phenotype. If `--simu-rep` was set to greater than 1, additional columns are output for each simulated trait. Example file:

```
HG00096 HG00096 0.058008375906919506
HG00097 HG00097 0.03472768471423458
HG00099 HG00099 -0.20850127859705808
HG00100 HG00100 -0.21206803352471154
HG00101 HG00101 0.3157913763428323
```

* `<outprefix>.par`: summarizes the frequencies and effects of simulated haplotypes. The columns are: haplotype ID (from the HAP file), haplotype frequency, and effect. Example file:

```
Haplotype Frequency Effect
H-001 0.6 -0.2
```
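
As a rough sketch of how the `.phen` layout described above might be consumed downstream, the snippet below maps each sample ID to its list of simulated phenotype values. It splits on any whitespace so that it works on the example as rendered here; `parse_phen` is a hypothetical name and not part of haptools.

```python
from typing import Dict, List


def parse_phen(text: str) -> Dict[str, List[float]]:
    """Map sample ID -> phenotype values, one value per simulated trait.

    Illustrative only: the first two columns are both the sample ID and
    every remaining column is read as a phenotype.
    """
    phenotypes: Dict[str, List[float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got {line!r}")
        _first_id, sample_id, *values = fields
        phenotypes[sample_id] = [float(v) for v in values]
    return phenotypes


example_phen = (
    "HG00096 HG00096 0.058008375906919506\n"
    "HG00097 HG00097 0.03472768471423458\n"
    "HG00099 HG00099 -0.20850127859705808\n"
)
phen = parse_phen(example_phen)
print(len(phen), "samples loaded")
```

Returning a list per sample means the same reader handles files written with `--simu-rep` greater than 1, where each extra column is one more simulated trait.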

<a name="examples"></a>
## Examples

#### Simulate a single haplotype-effect based on a 2 SNP haplotype:

```
haptools simphenotype \
--vcf tests/data/simple.vcf.gz \
--hap tests/data/simple.hap.gz \
--out test \
--simu-qt --simu-hsq 0.8 --simu-rep 1
```

based on this HAP file (available in `tests/data`)

```
H-001 1 10114 10116 1:10114:T:C,1:10116:A:G T,G * -0.2
```
4 changes: 4 additions & 0 deletions docs/commands/simphenotype.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
.. _commands-simphenotype:

.. include:: simphenotype.md
:parser: myst_parser.sphinx_
39 changes: 39 additions & 0 deletions docs/commands/transform.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
.. _commands-transform:


transform
=========

Transform a set of genotypes via a list of haplotypes. Create a new VCF containing haplotypes instead of variants.

The ``transform`` command takes as input a set of genotypes in VCF and a list of haplotypes (specified as a :doc:`.hap file </formats/haplotypes>`) and outputs a set of haplotype "genotypes" in VCF.

Usage
~~~~~
.. code-block:: bash

haptools transform \
--region TEXT \
--sample SAMPLE \
--samples-file FILENAME \
--output PATH \
--verbosity [CRITICAL|ERROR|WARNING|INFO|DEBUG|NOTSET] \
GENOTYPES HAPLOTYPES

Examples
~~~~~~~~
.. code-block:: bash

haptools transform tests/data/example.vcf.gz tests/data/example.hap.gz | less

.. code-block:: bash

haptools transform -o output.vcf.gz -s NA12878 tests/data/apoe.vcf.gz tests/data/apoe4.hap

Detailed Usage
~~~~~~~~~~~~~~

.. click:: haptools.__main__:main
:prog: haptools
:show-nested:
:commands: transform
7 changes: 5 additions & 2 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,8 @@
# -- Project information -----------------------------------------------------

project = "haptools"
copyright = "2021, Arya Massarat, Michael Lamkin"
author = "Arya Massarat, Michael Lamkin"
copyright = "2021, Michael Lamkin, Arya Massarat"
author = "Michael Lamkin, Arya Massarat"


# -- General configuration ---------------------------------------------------
Expand All @@ -34,6 +34,7 @@
"sphinx_rtd_theme",
"numpydoc",
"sphinx_click",
"myst_parser",
]

# Add any paths that contain templates here, relative to this directory.
Expand All @@ -47,6 +48,8 @@
# -- Extension configuration -------------------------------------------------
autosummary_generate = True
numpydoc_show_class_members = False
# allow for both rst and md syntax
source_suffix = ['.rst']

# -- Options for HTML output -------------------------------------------------

Expand Down
10 changes: 0 additions & 10 deletions docs/executing/inputs.rst

This file was deleted.
