-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OWLS-90180 Optimize detection of when to run introspection #2430
Conversation
} | ||
|
||
private Long getStartTime(Packet packet) { | ||
return Optional.ofNullable((Long) packet.get(JobHelper.START_TIME)).orElse(0L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably ok, but if START_TIME is not set, then the elapsed time will be very long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is for the operator restart use case where the introspector job is started before the operator is restarted. The START_TIME will not be set in the packet in this case. It seems the elapsed time is only used to log a FINE level log message and I'm not sure if it's worth storing the START_TIME in the introspector config map if it's only used for logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this check by setting the START_TIME to job creation time for operator restart use-case.
…case of operator restart use case.
private Long getJobLazyDeletionTime(V1Job domainIntrospectorJob) { | ||
int retryIntervalSeconds = TuningParameters.getInstance().getMainTuning().domainPresenceRecheckIntervalSeconds; | ||
return Optional.ofNullable(domainIntrospectorJob.getMetadata()) | ||
.map(m -> m.getCreationTimestamp()).map(t -> t.toInstant().toEpochMilli()).orElse(0L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you converting these OffsetDateTime instances to epoch millis? Why not return an OffsetDateTime and use the isBefore() or isAfter() comparison methods. Similarly, the class has built-in support for adding seconds or millis, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, made the changes to use OffsetDateTime, isAfter() and plus() methods instead of converting it to epoch millis in fedcece .
new DomainProcessorImpl.IntrospectionRequestStep(info), | ||
createDomainIntrospectorJobStep(getNext())), packet); | ||
} else { | ||
packet.putIfAbsent(START_TIME, System.currentTimeMillis()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
START_TIME could be an instance of OffsetDateTime.now(). There isn't a good reason to convert to a Long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in fedcece .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The changes in this PR are to address the use case where the operator starts a long-running introspector job and then the operator dies and is restarted – the newly restarted operator finds the existing introspector job and waits for those results rather than deleting this job and replacing it. Previously, in this use case, the operator deleted the existing job and replaced it with a new job.
For the case when the operator restarts and finds a failed job with a SEVERE message, it checks if the current time is greater than the sum of job creation time and the
domainPresenceRecheckIntervalSeconds
interval. If so, it deletes the failed job and replaces it with a new job.Integration test results - https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/5446/