fix: remove conflicting tls proxy secret generator. #567

Merged: HumairAK merged 1 commit into opendatahub-io:main on Feb 7, 2024

Contributor

@HumairAK HumairAK commented Feb 7, 2024

The issue resolved by this Pull Request:

Resolves https://issues.redhat.com/browse/RHOAIENG-2798

Description of your changes:

This fixes a race condition introduced by a workaround of ours. KFP exposes a service that is backed by the apiserver pod. In Data Science Pipelines, we want this service name to be dynamic: ds-pipeline-{dspa-name}. In KFP v2, the running pipeline pods call back to the apiserver through its service, and they rely on the hardcoded service name ml-pipeline (kubeflow/pipelines#9689). To temporarily work around this hardcoding, we expose both services: our dynamically named one and the hardcoded ml-pipeline one. (Both services point to the same apiserver pod.)

We use the OpenShift Service CA operator to generate the certs that we mount into the oauth proxy sidecar for the apiserver. This cert generation is triggered by an annotation on the service. When we created the YAML for the hardcoded ml-pipeline duplicate service, we mistakenly duplicated the annotation as well. This caused a race condition: sometimes the Service CA operator would see the ml-pipeline service first, leaving our dynamically named service in an error state (message: secret dspa-master/ds-pipelines-proxy-tls-{dspa-name} does not have corresponding service UID). The workaround ml-pipeline service doesn't need the annotation, so we just remove it here.

see https://issues.redhat.com/browse/RHOAIENG-2798 for more details
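
The duplicated annotation described above can be spotted on a live cluster with a small sketch like the one below. It is not part of this PR. It assumes the annotation key is `service.beta.openshift.io/serving-cert-secret-name` (the key documented for the Service CA operator; the manifests in this repo may use a different variant), and the function names and the namespace argument are hypothetical.

```shell
# Sketch only: list services that ask the Service CA operator for a cert,
# and report any secret name requested by more than one service.
# ASSUMPTION: the annotation key below is the one used by the manifests.

# Print "<service><TAB><secret>" for every service in namespace $1
# that carries the serving-cert annotation.
services_requesting_cert() {
  oc get services -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.service\.beta\.openshift\.io/serving-cert-secret-name}{"\n"}{end}' |
    awk -F'\t' '$2 != ""'
}

# Print every secret name that is requested by more than one service
# in namespace $1. Any output here indicates the conflict from this PR.
conflicting_cert_secrets() {
  services_requesting_cert "$1" | cut -f2 | sort | uniq -d
}
```

Running `conflicting_cert_secrets <namespace>` before this change would be expected to print the proxy TLS secret name, since both ds-pipeline-{dspa-name} and ml-pipeline requested it. After the change it should print nothing.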

Testing instructions

  1. Deploy a DSPA, visit the API server route, and ensure you hit the oauth proxy. Try this a couple of times.
  2. Inspect the ds-pipelines-proxy-tls-sample secret and ensure that its owner is the service named ds-pipeline-<dspa_name>.
  3. The ml-pipeline service is used by actual pipeline executions (due to hardcoded names in upstream KFP), so run one or two pipelines to confirm they remain unaffected. They should run to completion successfully (assuming no user code errors).
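
Step 2 can be scripted roughly as follows. This is a hedged sketch, not something shipped with the PR. It assumes the Service CA operator records the owning service in the secret's first owner reference, and the function names and namespace argument are hypothetical.

```shell
# Sketch only: check which service owns the generated proxy TLS secret.
# ASSUMPTION: the owning service appears as ownerReferences[0] on the secret.

# Print the name of the first owner reference on secret $1 in namespace $2.
secret_owner() {
  oc get secret "$1" -n "$2" -o jsonpath='{.metadata.ownerReferences[0].name}'
}

# Succeed only if secret $1 in namespace $2 is owned by service $3.
assert_secret_owner() {
  owner="$(secret_owner "$1" "$2")"
  if [ "$owner" = "$3" ]; then
    echo "ok: $1 is owned by $owner"
  else
    echo "unexpected owner for $1: '$owner' (wanted '$3')" >&2
    return 1
  fi
}
```

For a DSPA named sample, `assert_secret_owner ds-pipelines-proxy-tls-sample <namespace> ds-pipeline-sample` should succeed. If the reported owner is ml-pipeline, the conflict is still present.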

Checklist

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Contributor

openshift-ci bot commented Feb 7, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from humairak. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dsp-developers
Contributor

A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-567
An OCP cluster where you are logged in as cluster admin is required.

To use this image run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/567/head
git checkout -b pullrequest 71a2eeaeaf3485cf31eccc2fbb4c9bc617d5044f
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-567"

More instructions here on how to deploy and test a Data Science Pipelines Application.

Member

@harshad16 harshad16 left a comment

lgtm 👍
good catch

@gregsheremeta
Contributor

I took a stab at putting necessary context in the PR description. Let me know if I got anything wrong.

@gregsheremeta
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Feb 7, 2024
@HumairAK HumairAK merged commit a8a25fb into opendatahub-io:main Feb 7, 2024
5 of 6 checks passed