Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI multicluster + OIDC environment #2619

Merged
merged 45 commits into from
Apr 5, 2021
Merged

Add CI multicluster + OIDC environment #2619

merged 45 commits into from
Apr 5, 2021

Conversation

antgamdia
Copy link
Contributor

Description of the change

(note: this PR supersedes #2610. It's better to split the multicluster environment and the general improvements)

This PR updates our integration tests to use the multicluster environment so that we can detect issues and errors (as proposed in #2376).

The environment is based on the one we already use for dev purposes, that is: dex/openldap + two kind clusters.
There are, also, a couple of tweaks since the integration tests run inside a container; therefore, the ingress URL is not available and there are some issues related to the cookies.

Benefits

Testing Kubeapps in multicluster environments with OIDC.

Possible drawbacks

As the complexity of the CI increases, the flakiness of the overall CI systems might also increase.

Applicable issues

Additional information

This PR depends upon #2618 to be merged.

As an improvement, we can also consider including the OIDC via Pinniped. It shouldn't be a big deal since all the multicluster effort is already done in this PR.

@antgamdia antgamdia changed the title Multicluster ci Add CI multicluster + OIDC environment Mar 29, 2021
Base automatically changed from minor-dry-ci-fixes to master March 29, 2021 15:51
@antgamdia antgamdia changed the base branch from master to dependabot/npm_and_yarn/dashboard/ajv-8.0.1 March 29, 2021 15:55
@antgamdia antgamdia changed the base branch from dependabot/npm_and_yarn/dashboard/ajv-8.0.1 to master March 29, 2021 15:55
);

// wait for the loading msg to disappear
await page.waitForFunction(() => !document.querySelector(".margin-t-xxl"));
await page.waitForFunction(() => !document.querySelector(".margin-t-xxl cds-progress-circle"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to use the class margin-t-xxl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this not the best way of doing this though, the progress circle may appear and disappear if things are re-requested.

The expect(page).toClick(... should already wait for the loading gif to disappear

Copy link
Contributor

@andresmgot andresmgot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @antgamdia. This is looking good, just some minor comments.

page,
process.env.USE_MULTICLUSTER_OIDC_ENV,
"/",
process.env.ADMIN_TOKEN,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not add this process.env.ADMIN_TOKEN (just to make clear this test won't work with a token)

Comment on lines 20 to 21
// wait for the loading msg to disappear
await page.waitForFunction(() => !document.querySelector("cds-progress-circle"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless there is a reason that waiting for the button doesn't work I won't add these kind of waits, it just makes the tests more difficult to maintain.

Comment on lines 43 to 50
try {
// TODO(andresmgot): Remove this line once 2.3 is released
await expect(page).toClick("cds-button", { text: "Add new credentials" });
} catch(e) {
await expect(page).toClick(".btn-info-outline", { text: "Add new credentials" });
} catch (e) {
await expect(page).toClick(".btn-info-outline", {
text: "Add new credentials",
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this try-catch now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When testing in my fork I didn't build the new CI images (for saving time), and I would still need these lines. Now they can be safely removed, thanks for the reminder!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you forgot to remove the try-catch?

Copy link
Contributor Author

@antgamdia antgamdia Apr 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I removed them indeed. However, the CI started failing since the selectors weren't been found, so I reverted the change.

https://app.circleci.com/pipelines/github/kubeapps/kubeapps/2614/workflows/dee98a52-3d20-47a6-9a76-a1fa16351eaf/jobs/42596/artifacts

 FAIL  use-cases/operator-deployment.js (108.546 s)

 RUNS  ...
  ● Deploys an Operator

    TimeoutError: Element div.modal-dialog.modal-md > div > div.modal-body > div > div > cds-button:nth-child(2) (text: "Delete") not found

      waiting for function failed: timeout 30000ms exceeded

But, since the tests were failing because of other reasons, I'm gonna trigger again a new build and let's see what happens.
If it is still failing I guess it will be something related to the built image.

Comment on lines 61 to 67
try {
// TODO(andresmgot): Remove this line once 2.3 is released
await expect(page).toClick(".secondary-input cds-button", { text: "Submit" });
} catch(e) {
await expect(page).toClick(".secondary-input cds-button", {
text: "Submit",
});
} catch (e) {
await expect(page).toClick(".btn-info-outline", { text: "Submit" });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, we can remove the try-catch

Comment on lines 14 to 19
// wait until load
await page.evaluate(() => {
[...document.querySelectorAll(".kubeapps-dropdown-header")].find(element =>
element.outerText.includes("Current Context"),
);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this better waits for the text "apache" to appear

Comment on lines 16 to 17
// wait for the loading msg to disappear
await page.waitForFunction(() => !document.querySelector("cds-progress-circle"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, wait for "prometheus" (not sure if the timeout below works).

Comment on lines 27 to 41
const isAlreadyDeployed = await page.evaluate(
() => document.querySelector("cds-button[disabled]") !== null,
);

await utils.retryAndRefresh(page, 2, async () => {
// The CSV takes a bit to get populated
await expect(page).toMatch("Installed");
});
if (!isAlreadyDeployed) {
// Deploy the Operator
await expect(page).toClick("cds-button", { text: "Deploy" });

await utils.retryAndRefresh(page, 4, async () => {
// The CSV takes a bit to get populated
await expect(page).toMatch("Installed", { timeout: 10000 });
});
} else {
console.log("Warning: the operator has already been deployed");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if you already replied, why the isAlreadyDeployed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep (sorry for dropping the former PR, but I wanted to split the changes for ease the review process)
Check it at #2610 (comment)
Since uninstalling the operator is not that easy, I left this if for those cases where we are executing the tests locally. I don't mind removing it if you want, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be fine (if for some reason this is bypassing the operator installation, it should fail when creating an instance).

@@ -37,24 +47,31 @@ test("Deploys an Operator", async () => {
await expect(page).toMatch("Operators", { timeout: 10000 });

// Filter out charts to search only for the prometheus operator
await expect(page).toClick("label", { text: "Operators" });
await expect(page).toClick("label", { text: "Operators", timeout: 10000 });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 351 to 352
testsToIgnore=("operator-deployment.js" "${testsToIgnore[@]}")
testsToIgnore=("add-multicluster-deployment.js" "${testsToIgnore[@]}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use a single line for this

@@ -339,6 +368,10 @@ kubectl create serviceaccount kubeapps-view -n kubeapps
kubectl create role view-secrets --verb=get,list,watch --resource=secrets
kubectl create rolebinding kubeapps-view-secret --role view-secrets --serviceaccount kubeapps:kubeapps-view
kubectl create clusterrolebinding kubeapps-view --clusterrole=view --serviceaccount kubeapps:kubeapps-view
## Create view user (oidc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the admin user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already created with cluster-admin role with the default RBAC (kubeapps-operator@example.com).
This step is just modifying the kubeapps-user@example.com, but we can perform this step in the same place as the operator user for cohesion if you may prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yes, better to have them in the same place.

Comment on lines +3 to +6
class CustomSequencer extends Sequencer {
sort(tests) {
const copyTests = Array.from(tests);
return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the tests use to be executed in alphabetical order, there is no actual guarantee of that. It was causing a failure in the multicluster test because the secret created in the secret-repo test wasn't being found.
This sequencer just ensures this alphabetical order (I've pushed the integration img with this dep as well)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are saying that creating a private repository causes an error in the multicluster scenario? (That sounds like a bug we should fix). If that's the case, can you open an issue for it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't yet support app repos (let alone secret ones) on other clusters (#1982) because the interface to sync with the asset svc is via the DB (#2394) (left over from the read-only monocular).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, I always forget about this issue.
Perhaps we should name the tests like "01-xxxx.js", "00-xxx.js" just to make clear that they are being executed in order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that private repositories are not supported in the second cluster but what I don't understand is why the order of the tests affects the result. I thought that creating a private repo in the default cluster caused an error when trying to deploy in the second cluster but that may not be it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, ok, I see your point now and it may be an issue as you said:

When creating a private repo first, we are creating the docker credentials. Somehow, when deploying in the second cluster, it tries to retrieve these credentials (not available, though), and fails, therefore.

image

Let me replicate the issue in my dev environment and will file an issue with the details.

However, with regards to this PR, I think (for the sake of replicability) to still define an explicit order during the test suite execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to reproduce this issue in my dev environment. I set up the multicluster env and: 1) tried to manually create the private repo and then a new deployment; 2) run the e2e test directly from yarn start ....
In both cases, the credentials are being properly stored and the app deploys seamlessly as well.

That's odd... :S

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably added a namespaced private repository. If you create it in the kubeapps namespace, you are likely able to reproduce the issue.

@@ -41,12 +41,21 @@ k8s_wait_for_deployment() {

debug "Waiting for deployment ${deployment} to be successfully rolled out..."
# Avoid to exit the function if the rollout fails
silence kubectl rollout status --namespace "$namespace" deployment "$deployment" || exit_code=$?
silence kubectl rollout status --namespace "$namespace" deployment "$deployment" -w --timeout=60s || exit_code=$?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are not strictly necessary (the main error was I didn't pass the db password 🤦 ), but I think they make the function work as expected, I mean, performing an actual waiting, rather than digesting the exit_code for logging purposes.

@@ -169,13 +174,34 @@ installOrUpgradeKubeapps() {
--set kubeops.replicaCount=1 \
--set assetsvc.replicaCount=1 \
--set dashboard.replicaCount=1 \
--set postgresql.replication.enabled=false \
--set postgresql.postgresqlPassword=password \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mental note: don't forget this flag when upgrading kubeapps

@antgamdia
Copy link
Contributor Author

LGTM when you fix the current error.

Despite my sorrow and regardless of how I want much to press the merge button :P, I would need another +1 just to make sure the new changes are ok with you as well.

@absoludity
Copy link
Contributor

LGTM when you fix the current error.

Despite my sorrow and regardless of how I want much to press the merge button :P, I would need another +1 just to make sure the new changes are ok with you as well.

Hah! Andres can merge it if happy with the changes (I thought it might have just been a few, but I see lots of commits, so I'll let Andres continue it). Enjoy your break o/

Copy link
Contributor

@andresmgot andresmgot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @antgamdia. I have some (minor) comments

Comment on lines +3 to +6
class CustomSequencer extends Sequencer {
sort(tests) {
const copyTests = Array.from(tests);
return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are saying that creating a private repository causes an error in the multicluster scenario? (That sounds like a bug we should fix). If that's the case, can you open an issue for it?

Comment on lines 43 to 50
try {
// TODO(andresmgot): Remove this line once 2.3 is released
await expect(page).toClick("cds-button", { text: "Add new credentials" });
} catch(e) {
await expect(page).toClick(".btn-info-outline", { text: "Add new credentials" });
} catch (e) {
await expect(page).toClick(".btn-info-outline", {
text: "Add new credentials",
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you forgot to remove the try-catch?

Comment on lines 259 to 262
# TODO(agamez): Remove these lines in the next version
kubectl delete clusterrole -n kubapps kubeapps:controller:kubeops-ns-discovery-kubeapps kubeapps:controller:kubeops-operators-kubeapps kubeapps:kubeapps:apprepositories-read kubeapps:kubeapps:apprepositories-refresh kubeapps:kubeapps:apprepositories-write || true
kubectl delete clusterrolebinding -n kubapps kubeapps:controller:kubeapps:apprepositories-read kubeapps:controller:kubeops-ns-discovery-kubeapps || true
kubectl delete apprepositories.kubeapps.com -n kubeapps bitnami || true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure you should need these?

The clusterrole/clusterrolebinding should be adopted by the new version and the apprepository should not exist because we are using --set apprepository.initialRepos=null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I think you saw a former commit, 7086d6f should fix it.
Thanks for the heads-up anyway :)

kubectl get pods --namespace "$namespace"
sleep 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't add sleeps this long. I don't think we need to retry here.

Copy link
Contributor

@andresmgot andresmgot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just don't understand very well why the tests need to be ordered.

Comment on lines +3 to +6
class CustomSequencer extends Sequencer {
sort(tests) {
const copyTests = Array.from(tests);
return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that private repositories are not supported in the second cluster but what I don't understand is why the order of the tests affects the result. I thought that creating a private repo in the default cluster caused an error when trying to deploy in the second cluster but that may not be it.

@antgamdia antgamdia merged commit b29a824 into master Apr 5, 2021
@antgamdia antgamdia deleted the multicluster-ci branch April 5, 2021 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI doesn't test deployment to other clusters so it breaks.
3 participants