Make up to date with VertebrateResequencing / pbwt ? #50

Open
wants to merge 40 commits into
base: master

Conversation

CholoTook

Perhaps it's my anal-retentive personality type, but the fact that you've diverged really bugs me.

If you don't want to merge changes, these should be separate projects, no?

Cheers,

mcshane and others added 30 commits April 22, 2015 16:48
bugfix for reading and writing samples files
add a command -log to allow logging to a file
VCF/BCF files can now be opened via libcurl/irods if htslib is configured for that
* ploidy I/O should now actually work
* VCF output now treats missing data correctly
* imputeMissing now called on the merge of the input data and frame prior to imputation
* still some I/O work to do with missing/dosage and subsetting sites and samples
* add `-removeSamples` option
* when subsetting samples and sites, also update `missing` for these subsets
* fix bug in `pbwtRemoveSites` where it would not print out
  sites beyond the last site in the `removeSites` file
* add tests
It was not practical. For a large input cohort most sites will
have at least one sample with missing data and the given implementation
would fall over
revert whitespace changes to clean up diff
rename `sample()` to `pbwtSample()` to indicate that it indexes
into the `p->samples` array rather than the global
`sampleDict`. `sample()` is then used to look up entries in
`sampleDict`.
reading the `.samples` file will now guess `isX` and `isY` by comparing
the number of samples with the number of haplotypes.

`pbwtReadVcfGT` requires the command line `-X` and `-Y` options to be set
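
As an illustration only, here is one way such a guess could be made from the two counts. The thresholds, the `ChromGuess` type, and the function name `guessChromType` are assumptions made for this sketch and are not taken from the pbwt source; the rule actually used in the commit may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type for this sketch; not a pbwt structure. */
typedef struct {
  bool isX;
  bool isY;
} ChromGuess;

/* Guess the chromosome type from sample and haplotype counts.
 * Assumed rule (not confirmed against pbwt):
 *   nHap == 2 * nSamples            -> every sample diploid: neither X nor Y
 *   nSamples <= nHap < 2 * nSamples -> some samples haploid: treat as X
 *   nHap < nSamples                 -> some samples carry no haplotype: treat as Y
 */
ChromGuess guessChromType(int nSamples, int nHap)
{
  ChromGuess g = { false, false };
  assert(nSamples > 0 && nHap > 0 && nHap <= 2 * nSamples);
  if (nHap == 2 * nSamples) return g;   /* fully diploid */
  if (nHap >= nSamples) g.isX = true;   /* mix of haploid and diploid samples */
  else g.isY = true;                    /* fewer haplotypes than samples */
  return g;
}
```

Counts alone cannot separate every case (an all-haploid cohort, for example, looks the same on X and Y), so a guess like this is a fallback rather than a substitute for explicit flags.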
not entirely sure what state this is in...
Because a static sample dictionary is used and two sample sets are read (once
for input VCF and once for the reference panel), the sample indexes were
incorrect.
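
A minimal sketch of how this kind of bug can arise, assuming a dictionary that hands out indices in insertion order. The names `gDict`, `dictAdd`, `loadSamples` and the sample names used below are hypothetical and do not reflect pbwt's actual dictionary code; the point is only that a global index stops matching a sample's position within its own file once a second set is loaded.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAMES 16

/* Hypothetical global dictionary: name -> index in insertion order. */
static const char *gDict[MAX_NAMES];
static int gDictSize = 0;

/* Return the index of name, adding it if it is not yet present. */
int dictAdd(const char *name)
{
  int i;
  for (i = 0; i < gDictSize; ++i)
    if (strcmp(gDict[i], name) == 0) return i;
  assert(gDictSize < MAX_NAMES);
  gDict[gDictSize] = name;
  return gDictSize++;
}

/* Load one sample set, recording each sample's global index by position. */
void loadSamples(const char **names, int n, int *globalIndex)
{
  int i;
  for (i = 0; i < n; ++i) globalIndex[i] = dictAdd(names[i]);
}
```

With an input set {in1, in2} loaded first and a reference set {ref1, in1, ref2} loaded second, the reference samples at positions 0, 1, 2 receive global indices 2, 0, 3. Code that treats the global index as a position within the reference set then reads the wrong sample; a per-set array, in the spirit of the `p->samples` array mentioned in the rename commit, keeps the two numberings apart.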
for the following command:
valgrind ./pbwt -log /dev/null -readVcfGT test/refImpute.in.vcf -referenceImpute xxx/OMNI -writeVcf /dev/null
…eferenceImpute

The kOld iterator is incremented before it is used, so the >= comparisons
selected incorrect (non-overlapping) segments.

This commit also replaces matchSequencesSweepSparse() with
matchSequencesSweep(); this may not be desired, not sure.
pd3 and others added 10 commits January 30, 2017 10:52
Test whether the assumption is true rather than inserting hard
values to notify us about possible problems.
by avoiding loading reference panel samples, thus there is no need
to store different sets of samples in a global hash.

This replaces the previous commits 017fdc9, f682390, and 1e62967.
…g, correct order of updating cursor; in paintAncestryMatrix() free a couple more arrays to fix memory leak
@CholoTook
Author

Gah... there's clearly too much here... Let's look at just the 13 commits in this fork that aren't in the VertebrateResequencing fork...

@richarddurbin
Owner

richarddurbin commented Mar 8, 2022 via email

@CholoTook
Author

So... git show verts/production..durbo/master

Looks OK... git rebase durbo/master

@CholoTook
Author

CholoTook commented Mar 8, 2022 via email

4 participants