Make up to date with VertebrateResequencing / pbwt ? #50

CholoTook · 2022-03-08T14:47:51Z

Perhaps it's my anal-retentive personality type, but the fact that you've diverged really bugs me.

If you don't want to merge changes, these should be separate projects, no?

Cheers,


…eferenceImpute The kOld iterator is incremented before it is used, so the >= equalities were selecting incorrect (non-overlapping) segments. In this commit matchSequencesSweepSparse() is also replaced by matchSequencesSweep(); this may not be desired, not sure.


CholoTook · 2022-03-08T14:52:08Z

Gah... there's clearly too much here... Let's look at just the 13 commits in this fork that aren't in the VertebrateResequencing fork...

richarddurbin · 2022-03-08T15:03:10Z

Hi Dan. Sorry, I just picked up this thread and will look at this.

Richard

CholoTook · 2022-03-08T15:05:19Z

So... `git show verts/production..durbo/master`

Looks OK... `git rebase durbo/master`

CholoTook · 2022-03-08T15:11:40Z

Thanks so much, I got confused quite quickly trying to merge the two.

```
git rebase durbo/master
First, rewinding head to replay your work on top of it...
Applying: set htslib=1.2.1, pbwt=3.1 and link against bsd
Using index info to reconstruct a base tree...
A makefile
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): makefile deleted in HEAD and modified in set htslib=1.2.1, pbwt=3.1 and link against bsd. Version set htslib=1.2.1, pbwt=3.1 and link against bsd of makefile left in tree.
error: Failed to merge in the changes.
Patch failed at 0001 set htslib=1.2.1, pbwt=3.1 and link against bsd
hint: Use 'git am --show-current-patch' to see the failed patch
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
```


mcshane and others added 30 commits April 22, 2015 16:48

set htslib=1.2.1, pbwt=3.1 and link against bsd

baffe47

change allele dosage annotation from AD to ADS to avoid collision with GATK allele depth

6662992

Merge pull request #1 from mcshane/feature/read_write_samples

f490a8e

bugfix for reading and writing samples files

Merge pull request #2 from mcshane/feature/log

8b1a95d

add a command -log to allow logging to a file

update referenceImpute test files so that the test actually does something

a0ad99b

first pass at adding support for chrX and chrY

f80da84

makefile addition to allow linking against htslib-with plugins

257a60d

vcf/bcf opening can now do so via libcurl/irods if htslib configured for that

Merge branch 'master' of github.com:richarddurbin/pbwt into feature/ploidy

85a0a90

further ploidy updates

42200ae

* ploidy I/O should now actually work
* VCF output now treats missing data correctly
* imputeMissing now called on the merge of the input data and frame prior to imputation
* still some I/O work todo with missing/dosage and subsetting sites and samples

subsetting fixes

41afc51

* add `-removeSamples` option
* when subsetting samples and sites, also update `missing` for these subsets
* fix bug in `pbwtRemoveSites` where it would not print out sites beyond the last site in the `removeSites` file
* add tests
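
The `pbwtRemoveSites` fix in this commit is about tail handling when two sorted lists are walked in step: once the removal list runs out, the remaining sites still have to be emitted. A minimal sketch of that pattern follows, assuming sorted integer positions. The function `keep_sites` and its parameters are hypothetical and are not taken from the pbwt source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not pbwt code: copy every position in
   sites[] that does not appear in remove[] into out[]. Both inputs
   must be sorted ascending. Returns the number of positions kept. */
size_t keep_sites(const int *sites, size_t n_sites,
                  const int *remove, size_t n_remove,
                  int *out)
{
    assert(out != NULL);
    size_t r = 0, kept = 0;
    for (size_t s = 0; s < n_sites; ++s) {
        /* advance the removal cursor past positions smaller than this site */
        while (r < n_remove && remove[r] < sites[s])
            ++r;
        /* when r == n_remove the removal list is exhausted; the outer loop
           keeps running so the remaining sites are still emitted */
        if (r < n_remove && remove[r] == sites[s])
            continue;   /* site is listed for removal */
        out[kept++] = sites[s];
    }
    return kept;
}
```

The outer loop is driven by the sites, not by the removal list, so sites that lie beyond the last removal entry fall through to the copy step rather than being dropped.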

revert the recent imputeMissing addition to referenceImpute

bd9a8d3

It was not practical. For a large input cohort most sites will have at least one sample with missing data and the given implementation would fall over

include removeSamples in the list of options

1d940e4

referenceImpute now records sites which were typed

f65287d

reverse logic of Site->typed to Site->isImputed

c0c6ae7

code cleanup

61971e5

revert whitespace changes to clean up diff

clean up sample handling

a880d7e

rename `sample()` to `pbwtSample()` to indicate it indexes into the `p->samples` array rather than the global `sampleDict`. `sample()` is then used to look up in `sampleDict`.

remove need for global isX and isY variables

ecd9fbc

reading the `.samples` file will now guess `isX` and `isY` by comparing the number of samples with the number of haplotypes. `pbwtReadVcfGT` requires the command line `-X` and `-Y` option to be set

remove some code duplication

23b7f58

in progress: changes to metadata handling

7664577

progress commit

f11ff51

not entirely sure what state this is in...

Fix in -X -loadSamples functionality

1e62967

Because a static sample dictionary is used and two sample sets are read (once for input VCF and once for the reference panel), the sample indexes were incorrect.

Prevent segfault if empty file was given

9d20e91

Merge pull request #1 from pd3/feature/ploidy

968380f

Feature/ploidy

Merge remote-tracking branch 'feature/ploidy' into production

9e6f79a

Clean memory leaks

c5cf324

for the following command: valgrind ./pbwt -log /dev/null -readVcfGT test/refImpute.in.vcf -referenceImpute xxx/OMNI -writeVcf /dev/null

Tentative fix for dosages of typed genotypes, force them to {0,1}

8ce3e4d

The samples array is padded with an empty name

f682390

Prevent segfault if no arguments were given

0bf5255

Fix a typo: iterate over all ALT alleles, do not get stuck on the first

e74f522

pd3 and others added 10 commits January 30, 2017 10:52

Merge remote-tracking branch 'pd3/feature/ploidy/feature/ploidy' into production

832335e

Sanity check for {0,1} dosages at typed markers

56e7f61

Test whether the assumption is true rather than inserting hard values to notify us about possible problems.
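
The difference between this commit and the earlier "force them to {0,1}" fix can be sketched as a check that reports violations and leaves the data alone. This is only an illustration under assumptions: `check_typed_dosages`, the `is_typed` flag array, and the choice to warn on stderr are hypothetical and do not reflect pbwt's actual data structures or reporting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical illustration, not pbwt code: count typed genotypes whose
   dosage is neither 0 nor 1 and report each on stderr, leaving the
   values untouched instead of overwriting them. */
size_t check_typed_dosages(const double *dosage, const int *is_typed, size_t n)
{
    assert(n == 0 || (dosage != NULL && is_typed != NULL));
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!is_typed[i])
            continue;   /* imputed entries may take fractional values */
        if (dosage[i] != 0.0 && dosage[i] != 1.0) {
            fprintf(stderr, "warning: typed genotype %zu has dosage %g\n",
                    i, dosage[i]);
            ++bad;
        }
    }
    return bad;
}
```

A non-zero return value signals that the {0,1} assumption does not hold for the input, which silently forcing the values would have hidden.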

Bump minor version to 3.1

4705410

Similarly to f682390, the samples array is padded with an empty name

017fdc9

Fix the -X -loadSamples functionality

bf6ebe2

by avoiding loading reference panel samples, thus there is no need to store different sets of samples in a global hash. This replaces the previous commits 017fdc9, f682390, and 1e62967.

in pbwtImpute3() disable sparse, correct stop point for match counting, correct order of updating cursor; in paintAncestryMatrix() free a couple more arrays to fix memory leak

00e0b52

comment changes plus framework for imputing missing to major allele

ebb9d13

Fill missing target genotypes intersecting with reference panel with major allele [temporary solution]

dca08a0

Updated tests

9a7a038

Remove forgotten debugging messages

f09141f
