Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OWLS-87753 - Fix to start all managed server pods when watch event notifications not received in jumbo k8s cluster #2188

Merged
merged 4 commits into from
Feb 11, 2021

Conversation

ankedia
Copy link
Member

@ankedia ankedia commented Feb 10, 2021

OWLS-87753- Fix for starting managed servers when watch event notifications are missed.

This change creates waiters for each managed server pod in ManagedServerUpIterator step. These fibers are started in parallel with fibers that schedule and actually start the managed server pods. The waiter fibers invoke WaitForReady step for the corresponding managed server. At the time when WaitFotReady is invoked, managed server pods would not have been created. Hence it passes resourceName to the WaitForReady step instead of the initial resource. The managed server pod state is updated periodically in the DomainPresenceInfo when the callback onSuccess is invoked (every 5 seconds) in waitForReady step. The createReadAndIfReadyCheckStep uses domain-id and namespace from the packet instead of getting it from the initial resource.

The link for integration test results is at - https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/4024/ . There are 5 failures but they seem unrelated to this change.

return createReadAsyncStep(getName(), getNamespace(), getDomainUid(), resumeIfReady(callback));
} else {
DomainPresenceInfo info = packet.getSpi(DomainPresenceInfo.class);
return createReadAsyncStep(resourceName, Optional.ofNullable(info).map(i -> i.getNamespace()).orElse(null),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add check to ensure resourceName is not null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resourceName and initialResource are final variables and are initialized in the constructor. I'm assuming that either initialResource or resourceName will be non-null for the resource that needs to be checked. I can add a non-null check to be safe but I wasn't sure if that's going to be necessary. Thanks.

@alai8
Copy link
Member

alai8 commented Feb 10, 2021

Will there be new tests added for this change? If not in this PR, please file a jira so we won't forget to add tests. Thanks

@ankedia
Copy link
Member Author

ankedia commented Feb 11, 2021

Will there be new tests added for this change? If not in this PR, please file a jira so we won't forget to add tests. Thanks
Yes, I will file JIRA for adding unit and integration tests. It's a bit hard to unit test this change because the issue is caused by missed events but I'll look into it as time permits.

Copy link
Member

@alai8 alai8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the latest change address the case if resourceName is null. But it is a minor issue.

packet.clone(),
null);
}

private Step createReadAndIfReadyCheckStep(Callback callback) {
return createReadAsyncStep(getName(), getNamespace(), getDomainUid(), resumeIfReady(callback));
private Step createReadAndIfReadyCheckStep(Callback callback, Packet packet) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't follow our Step/Packet pattern. The Step is typically stateless or, at least, can be used for many different calls. The Packet is given on the Step.apply call. The logic in the else should happen in the Step.apply().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the suggested change, please let me know if I missed something. Thanks.

Copy link
Member

@rjeberhard rjeberhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the behavior in the "else" in to a small Step class that returns the correct next Step from its apply() method. This way, you don't need to pass the Packet as a parameter.

@rjeberhard rjeberhard merged commit 44dbf2a into release/3.1.3 Feb 11, 2021
rjeberhard added a commit that referenced this pull request Feb 12, 2021
* Update for 3.1.3

* keep namespaces processed in batches (#2189)

* OWLS-87753 - Fix to start all managed server pods when watch event notifications not received in jumbo k8s cluster (#2188)

* Fix for OWLS_87753 to start managed server waiters in case watch event notifications are not received

* Check for null initialResource

* PR review comment - added null check for resourceName

* Address PR review comment.

* Release note update

* Spelling and grammar updates

Co-authored-by: Dongbo Xiao <dongbo.xiao@oracle.com>
Co-authored-by: Anil Kedia <37935279+ankedia@users.noreply.github.com>
@ankedia ankedia deleted the owls_87753_startup_fix branch March 3, 2021 21:50
rjeberhard pushed a commit to rjeberhard/weblogic-kubernetes-operator that referenced this pull request Apr 14, 2023
…tifications not received in jumbo k8s cluster (oracle#2188)

* Fix for OWLS_87753 to start managed server waiters in case watch event notifications are not received

* Check for null initialResource

* PR review comment - added null check for resourceName

* Address PR review comment.
rjeberhard pushed a commit that referenced this pull request Apr 19, 2023
…tifications not received in jumbo k8s cluster (#2188)

* Fix for OWLS_87753 to start managed server waiters in case watch event notifications are not received

* Check for null initialResource

* PR review comment - added null check for resourceName

* Address PR review comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants