-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve pod SHA256 values #2262
Conversation
// Add legacy prometheus annotations. These have to remain, as they are included in the computation | ||
// of the pod sha256 hash, and removing them will cause pods to roll when customers upgrade to new | ||
// versions of the operator. | ||
AnnotationHelper.annotateForPrometheus(metadata, "/wls-exporter", getMetricsPort()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line may cause a NPE on unboxing a null Integer if both getListenPort() and getSslPort() is null in getMetricsPort().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True - of course, such a domain would be pretty worthless. IMO the logic is wrong in any case, but I didn't want to change it here - all I have done is refactored the existing logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like adding defensive code for invalid configurations, but the ugly part of this NPE is that when it happens it is hard to figure out what is going on. It would be helpful if we could at least detect the situation and log a warning message, instead of letting NPE occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a change that should catch that.
return metadata; | ||
} | ||
|
||
private Integer getMetricsPort() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure if the NPE will happen on return or on the method that calls this, although I would think it is latter. It is probably safer to catch the exception on the caller, or assign the port to a variable in this method, and check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should happen on the unboxing - so I should ensure that happens here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, even that doesn't help. That kind of domain error causes problems all over the place. If you catch it here, it pops up elsewhere. This PR is not the place to fix that preexisting problem, I think. The code here does not cause the problem, and fixing it will require more effort and is out of scope. I am going to revert my last change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a simple class to test this, and unboxing occurred on the method call where null Integer value is assigned to an int parameter. In other words it does not happen in the return statement here and the try/catch should be on the calling method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public class Test {
public static void main(String[] args) {
try {
printValue(getValue());
} catch (Throwable t) {
System.out.println("t2 = " + t);
}
}
static Integer getValue() {
Integer value = null;
try {
return value;
} catch (Throwable t) {
System.out.println("t = " + t);
throw t;
}
}
static void printValue(int i) {
System.out.println("i = " + i);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the output is
t2 = java.lang.NullPointerException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably better to use a local variable to check the value. If it is null, log and return zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can reproduce the problem. I can also reproduce it before this change. I can prevent getMetricsPort() from returning a null, which causes the problem later - and then the NPE happens in getLocalAdminProtocolChannelPort
.
I agree that there is a problem - but that problem has nothing to do with the PR, and should be addressed elsewhere. Could you please file a bug report or a GitHub issue so we can track it?
The basic problem is the logic by which we select the port for admin protocol. It should be done in one place, and it is done in multiple places. Worse, a change was made in 3.1.4 to the logic now served by getMetricsPort which cannot be changed, as it would cause pods to roll on upgrade if we do.
The best thing would probably be to catch the error in the domain long before we get here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that the logic of choosing a port to use is not clean and needs to be revisited.
I was glad that this particular section was removed by a previous PR, and was just concerned that this code is now added back (to prevent pods to roll) so the issue came back.
We can certainly handle this in a different jira.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to complete the current discussion. It is pretty rare that getLocalAdminProtocolChannelPort returns null since it looks into every port that is configured (admin port, listen port, SSL port, and NAPs). But getMetricsPort only looks for listen port and SSL port.
This reverts commit edaf31e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We can track the NPE on unboxing and revisiting the logic of choosing a port for Prometheus in a different jira task.
This changes verifies some cases of SHA256 hash computations and ensures that they don't change over time. The hash values used as references are those generated by operator version 3.1.0.