OWLS-90180 Optimize detection of when to run introspection #2430

Merged
rjeberhard merged 11 commits into main from owls_90180_optimize_intospector_run on Jul 8, 2021

Member

@ankedia ankedia commented Jun 28, 2021

The changes in this PR are to address the use case where the operator starts a long-running introspector job and then the operator dies and is restarted – the newly restarted operator finds the existing introspector job and waits for those results rather than deleting this job and replacing it. Previously, in this use case, the operator deleted the existing job and replaced it with a new job.

For the case when the operator restarts and finds a failed job with a SEVERE message, it checks if the current time is greater than the sum of job creation time and the domainPresenceRecheckIntervalSeconds interval. If so, it deletes the failed job and replaces it with a new job.
Integration test results - https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/5446/
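A minimal sketch of the restart decision described above, for readers who want the branches spelled out. The `JobView` type, the `Action` names, and the "leave the failed job in place until the interval has passed" branch are assumptions made for illustration; they are not operator classes, and the real change works on `V1Job` objects and operator steps.

```java
import java.time.OffsetDateTime;

// Illustrative sketch only; these types are hypothetical stand-ins, not operator classes.
class IntrospectorRestartSketch {
  enum Action { CREATE_JOB, WAIT_FOR_RESULTS, LEAVE_FAILED_JOB, REPLACE_FAILED_JOB }

  /** Minimal view of an existing introspector job (hypothetical). */
  static final class JobView {
    final OffsetDateTime creationTime;
    final boolean failedWithSevere;

    JobView(OffsetDateTime creationTime, boolean failedWithSevere) {
      this.creationTime = creationTime;
      this.failedWithSevere = failedWithSevere;
    }
  }

  /** Decides what a freshly restarted operator does with the job it finds, if any. */
  static Action decide(JobView existingJob, OffsetDateTime now, long recheckIntervalSeconds) {
    if (existingJob == null) {
      return Action.CREATE_JOB; // nothing to reuse
    }
    if (!existingJob.failedWithSevere) {
      return Action.WAIT_FOR_RESULTS; // reuse the job instead of deleting it
    }
    // Failed job: replace it only once creation time + recheck interval has passed.
    OffsetDateTime replaceAfter = existingJob.creationTime.plusSeconds(recheckIntervalSeconds);
    return now.isAfter(replaceAfter) ? Action.REPLACE_FAILED_JOB : Action.LEAVE_FAILED_JOB;
  }
}
```

With a 120-second interval, a failed job created 60 seconds ago is left alone, while the same job seen 121 seconds after creation is replaced.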

}

private Long getStartTime(Packet packet) {
  return Optional.ofNullable((Long) packet.get(JobHelper.START_TIME)).orElse(0L);
Member

It's probably OK, but if START_TIME is not set, the elapsed time will be very long.

Member Author

This check is for the operator restart use case where the introspector job is started before the operator is restarted. The START_TIME will not be set in the packet in this case. It seems the elapsed time is only used to log a FINE level log message and I'm not sure if it's worth storing the START_TIME in the introspector config map if it's only used for logging.

Member Author

I have removed this check by setting START_TIME to the job creation time for the operator restart use case.
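To make the concern in this thread concrete, here is a rough sketch of the two behaviors. A plain `Map` stands in for the operator's `Packet`, and the key name and method names are hypothetical; only the `orElse(0L)` default and the "use the job creation time" fallback come from the discussion.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; a Map stands in for the operator's Packet.
class StartTimeFallbackSketch {
  static final String START_TIME = "startTime"; // hypothetical key name

  /** Mirrors the reviewed default: a missing START_TIME is treated as 0 (the epoch). */
  static long elapsedMillisWithZeroDefault(Map<String, Object> packet, long nowMillis) {
    long start = Optional.ofNullable((Long) packet.get(START_TIME)).orElse(0L);
    return nowMillis - start; // equals nowMillis when START_TIME is missing
  }

  /** Seeds START_TIME from the job creation time when it is missing, then measures. */
  static long elapsedMillisWithCreationFallback(
      Map<String, Object> packet, long jobCreationMillis, long nowMillis) {
    packet.putIfAbsent(START_TIME, jobCreationMillis);
    long start = (Long) packet.get(START_TIME);
    return nowMillis - start;
  }
}
```

With an empty packet, the first method reports the full time since the epoch, which is the "very long" elapsed time the reviewer pointed out; the second reports the time since the job was created.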

private Long getJobLazyDeletionTime(V1Job domainIntrospectorJob) {
  int retryIntervalSeconds = TuningParameters.getInstance().getMainTuning().domainPresenceRecheckIntervalSeconds;
  return Optional.ofNullable(domainIntrospectorJob.getMetadata())
      .map(m -> m.getCreationTimestamp()).map(t -> t.toInstant().toEpochMilli()).orElse(0L)
Member

Why are you converting these OffsetDateTime instances to epoch millis? Why not return an OffsetDateTime and use the isBefore() or isAfter() comparison methods? The class also has built-in support for adding seconds, millis, and so on.

Member Author

Done. I changed the code to use OffsetDateTime with the isAfter() and plus() methods instead of converting to epoch millis, in fedcece.
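For comparison, a sketch of what the suggested style might look like next to the epoch-millis style. This is not the code from fedcece; the class and method names are hypothetical, and treating a missing creation timestamp as "already past the deletion time" is an assumption that mirrors the `orElse(0L)` default in the reviewed snippet.

```java
import java.time.OffsetDateTime;
import java.util.Optional;

// Illustrative sketch only; not the operator's implementation.
class LazyDeletionTimeSketch {
  /** OffsetDateTime style: add the interval directly and keep the result as a date-time. */
  static OffsetDateTime lazyDeletionTime(OffsetDateTime creationTimestamp, long retryIntervalSeconds) {
    return Optional.ofNullable(creationTimestamp)
        .map(t -> t.plusSeconds(retryIntervalSeconds))
        .orElse(OffsetDateTime.MIN); // assumption: no timestamp means "already eligible"
  }

  static boolean isPastLazyDeletionTime(
      OffsetDateTime creationTimestamp, long retryIntervalSeconds, OffsetDateTime now) {
    return now.isAfter(lazyDeletionTime(creationTimestamp, retryIntervalSeconds));
  }

  /** Epoch-millis style, shown only to contrast with the version above. */
  static boolean isPastLazyDeletionTimeMillis(
      OffsetDateTime creationTimestamp, long retryIntervalSeconds, OffsetDateTime now) {
    long creationMillis = Optional.ofNullable(creationTimestamp)
        .map(t -> t.toInstant().toEpochMilli())
        .orElse(0L);
    return now.toInstant().toEpochMilli() > creationMillis + retryIntervalSeconds * 1000L;
  }
}
```

Both versions give the same answer for whole-second inputs; the OffsetDateTime version avoids the manual seconds-to-millis conversion.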

        new DomainProcessorImpl.IntrospectionRequestStep(info),
        createDomainIntrospectorJobStep(getNext())), packet);
  } else {
    packet.putIfAbsent(START_TIME, System.currentTimeMillis());
Member

START_TIME could be an OffsetDateTime obtained from OffsetDateTime.now(). There isn't a good reason to convert it to a Long.

Member Author

Done in fedcece.
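A small sketch of the idea in this thread: store START_TIME as an `OffsetDateTime` and derive the elapsed time with `Duration.between`. As before, a `Map` stands in for the operator's `Packet`, the key and method names are hypothetical, and returning `Duration.ZERO` for a missing value is an assumption rather than the behavior in fedcece.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.Map;

// Illustrative sketch only; a Map stands in for the operator's Packet.
class StartTimeAsOffsetDateTimeSketch {
  static final String START_TIME = "startTime"; // hypothetical key name

  /** Records the start time once; later calls keep the first value. */
  static void recordStartTime(Map<String, Object> packet, OffsetDateTime now) {
    packet.putIfAbsent(START_TIME, now);
  }

  /** Elapsed time since the recorded start, or zero if none was recorded (assumption). */
  static Duration elapsed(Map<String, Object> packet, OffsetDateTime now) {
    OffsetDateTime start = (OffsetDateTime) packet.get(START_TIME);
    return start == null ? Duration.ZERO : Duration.between(start, now);
  }
}
```

In real code the `now` argument would typically come from `OffsetDateTime.now()`; it is passed in here so the behavior is easy to check.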

Member

@jshum2479 jshum2479 left a comment

LGTM

@rjeberhard rjeberhard merged commit 49a73b7 into main Jul 8, 2021
@ankedia ankedia deleted the owls_90180_optimize_intospector_run branch July 12, 2021 18:04