Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yf fix GenotypeAlleleCount's toString #6376

Merged
merged 5 commits into from
Jan 20, 2020

Conversation

yfarjoun
Copy link
Contributor

GenotypeAlleleCount's toString function was broken, leading to a misleading debugging information showing up in the debugger

This PR fixes the bug and adds tests for toString output.

@yfarjoun yfarjoun requested a review from vruano January 15, 2020 14:00
@yfarjoun yfarjoun changed the title Yf fix genotype allele count to string Yf fix GenotypeAlleleCount's toString Jan 15, 2020
@@ -371,13 +371,12 @@ public String toUnphasedGenotypeString() {
return "";
}
final StringBuilder sb = new StringBuilder(distinctAlleleCount * 3);
for (int i = 0; i < distinctAlleleCount; i += 2) {
for (int i = 0; i < distinctAlleleCount * 2; i += 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you avoid the multiplication * 2 at each iteration? Don't know if the VM will optimize this out. I wonder whether you could compare against sortedAlleleCounts.length but need to see the rest of the code to confirm.... too lazy

If you change this loop consider to skip the +2 increment on i and simple increase as it is need to access the index and count:

for (int i = 0; i < stop; ) {
    int index = sortedAlleleCounts[i++];
    int count = sortedAlleleCounts[i++];
    ....
}

@@ -371,13 +371,12 @@ public String toUnphasedGenotypeString() {
return "";
}
final StringBuilder sb = new StringBuilder(distinctAlleleCount * 3);
for (int i = 0; i < distinctAlleleCount; i += 2) {
for (int i = 0; i < distinctAlleleCount * 2; i += 2) {
final int alleleIndex = sortedAlleleCounts[i];
final int alleleCount = sortedAlleleCounts[i + 1];
for (int j = 0; j < alleleCount; j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since counts is guaranteed not to be 0, then the string composition could be rather:

sb.append(index);
for (int j = 1; j < count; j++) {
    sb.append('/').append(index);
}
return sb.toString();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, you need to add a / after every allele, not only the alleles with counts >1. think about 0/1, whose sortedAlleleCounts array is (0,1,1,1)....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index is 0 and 1, count is 1 and 1. ... (index=0, count=1) ... then (index=1, count=1). count is always 1 or larger or that is my understanding....

at least in the example that you are giving count for both alleles (0 and 1) is 1 (the second number of the pair. (0, 1, 1, 1) is decomposed into (0, 1) and (1, 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case you could leave the code as it is... just noticed this optimization opportunity near your change so I suggested the edit but it has no concern with fix, so.

Yossi Farjoun added 2 commits January 20, 2020 17:56
- made tests more explicit
- added some more tests
@yfarjoun
Copy link
Contributor Author

done.

@@ -371,13 +371,12 @@ public String toUnphasedGenotypeString() {
return "";
}
final StringBuilder sb = new StringBuilder(distinctAlleleCount * 3);
for (int i = 0; i < distinctAlleleCount; i += 2) {
for (int i = 0; i < distinctAlleleCount * 2; i += 2) {
final int alleleIndex = sortedAlleleCounts[i];
final int alleleCount = sortedAlleleCounts[i + 1];
for (int j = 0; j < alleleCount; j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index is 0 and 1, count is 1 and 1. ... (index=0, count=1) ... then (index=1, count=1). count is always 1 or larger or that is my understanding....

at least in the example that you are giving count for both alleles (0 and 1) is 1 (the second number of the pair. (0, 1, 1, 1) is decomposed into (0, 1) and (1, 1)

@@ -371,13 +371,12 @@ public String toUnphasedGenotypeString() {
return "";
}
final StringBuilder sb = new StringBuilder(distinctAlleleCount * 3);
for (int i = 0; i < distinctAlleleCount; i += 2) {
for (int i = 0; i < distinctAlleleCount * 2; i += 2) {
final int alleleIndex = sortedAlleleCounts[i];
final int alleleCount = sortedAlleleCounts[i + 1];
for (int j = 0; j < alleleCount; j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case you could leave the code as it is... just noticed this optimization opportunity near your change so I suggested the edit but it has no concern with fix, so.

@yfarjoun yfarjoun merged commit e8bccd9 into master Jan 20, 2020
@yfarjoun yfarjoun deleted the yf_fix_GenotypeAlleleCount_toString branch January 20, 2020 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants