Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance optimizations for GenotypeGVCFs #1957

Merged
merged 1 commit into from
Jul 6, 2016
Merged

Conversation

akiezun
Copy link
Contributor

@akiezun akiezun commented Jun 30, 2016

part of work tracked here #1608

@lbergelson please review. The corresponding protected branch is https://github.com/broadinstitute/gatk-protected/tree/ak_ggvcf_performance

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.008%) to 81.686% when pulling 321c561 on ak_ggvcf_performance into d0db982 on master.

final double[] dependentSetLog10Likelihoods = dependentSet.getLog10Likelihoods();
final int[] counts = targetSet.getACcounts().getCounts();

for ( int j = (totalK/2)-1, n = targetSetLog10Likelihoods.length; j < n; j++ ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If totalK is 0 (or less) this produces a different result than before. I'm not sure if that's possible or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. i think totalK may be 0 sometimes. I'll leave this as for ( int j = 1, n = targetSetLog10Likelihoods.length; j < n; j++ ) {

@lbergelson
Copy link
Member

@akiezun minor comments

@lbergelson lbergelson assigned akiezun and unassigned lbergelson Jul 5, 2016
@coveralls
Copy link
Collaborator

coveralls commented Jul 6, 2016

Coverage Status

Coverage increased (+0.0008%) to 81.699% when pulling 3d63672 on ak_ggvcf_performance into 1637de1 on master.

@akiezun akiezun assigned lbergelson and unassigned akiezun Jul 6, 2016
@akiezun
Copy link
Contributor Author

akiezun commented Jul 6, 2016

back to @lbergelson for second review. log10 cache is now static (and synchronized at expansion only) as in gatk3 - it looks faster (no map lookup that would be needed in the thread local solution)

@lbergelson
Copy link
Member

@akiezun My only concern now is that someone takes log10 of a very large number triggering a massive and slow cache expansion. This caching scheme is good for clustered queries of small values, but terrible for sparse large queries. Is that a case we need to consider?

@akiezun
Copy link
Contributor Author

akiezun commented Jul 6, 2016

The access pattern I saw is that the queries start small and creep up to
tens or hundreds of thousands (Order of the queried numbers). With the
exponential cache expansion, the cache quickly catches up. So it's fine.
We'll revise when needed.

On Wednesday, July 6, 2016, Louis Bergelson notifications@github.com
wrote:

@akiezun https://github.com/akiezun My only concern now is that someone
takes log10 of a very large number triggering a massive and slow cache
expansion. This caching scheme is good for clustered queries of small
values, but terrible for sparse large queries. Is that a case we need to
consider?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1957 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AB5rL32j0zFsPo1VQ4T6gxnEHvjpYZ02ks5qS_VhgaJpZM4JCXE9
.

Sent from Gmail Mobile

@lbergelson
Copy link
Member

👍

@lbergelson lbergelson removed their assignment Jul 6, 2016
@lbergelson lbergelson merged commit 575984f into master Jul 6, 2016
@lbergelson lbergelson deleted the ak_ggvcf_performance branch July 6, 2016 19:08
magicDGS pushed a commit to bioinformagik/gatk that referenced this pull request Jul 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants