-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CI multicluster + OIDC environment #2619
Conversation
); | ||
|
||
// wait for the loading msg to disappear | ||
await page.waitForFunction(() => !document.querySelector(".margin-t-xxl")); | ||
await page.waitForFunction(() => !document.querySelector(".margin-t-xxl cds-progress-circle")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to use the class margin-t-xxl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this not the best way of doing this though, the progress circle may appear and disappear if things are re-requested.
The expect(page).toClick(...
should already wait for the loading gif to disappear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @antgamdia. This is looking good, just some minor comments.
page, | ||
process.env.USE_MULTICLUSTER_OIDC_ENV, | ||
"/", | ||
process.env.ADMIN_TOKEN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not add this process.env.ADMIN_TOKEN
(just to make clear this test won't work with a token)
// wait for the loading msg to disappear | ||
await page.waitForFunction(() => !document.querySelector("cds-progress-circle")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless there is a reason that waiting for the button doesn't work I won't add these kind of waits, it just makes the tests more difficult to maintain.
try { | ||
// TODO(andresmgot): Remove this line once 2.3 is released | ||
await expect(page).toClick("cds-button", { text: "Add new credentials" }); | ||
} catch(e) { | ||
await expect(page).toClick(".btn-info-outline", { text: "Add new credentials" }); | ||
} catch (e) { | ||
await expect(page).toClick(".btn-info-outline", { | ||
text: "Add new credentials", | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this try-catch now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing in my fork I didn't build the new CI images (for saving time), and I would still need these lines. Now they can be safely removed, thanks for the reminder!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you forgot to remove the try-catch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I removed them indeed. However, the CI started failing since the selectors weren't been found, so I reverted the change.
FAIL use-cases/operator-deployment.js (108.546 s)
RUNS ...
● Deploys an Operator
TimeoutError: Element div.modal-dialog.modal-md > div > div.modal-body > div > div > cds-button:nth-child(2) (text: "Delete") not found
waiting for function failed: timeout 30000ms exceeded
But, since the tests were failing because of other reasons, I'm gonna trigger again a new build and let's see what happens.
If it is still failing I guess it will be something related to the built image.
try { | ||
// TODO(andresmgot): Remove this line once 2.3 is released | ||
await expect(page).toClick(".secondary-input cds-button", { text: "Submit" }); | ||
} catch(e) { | ||
await expect(page).toClick(".secondary-input cds-button", { | ||
text: "Submit", | ||
}); | ||
} catch (e) { | ||
await expect(page).toClick(".btn-info-outline", { text: "Submit" }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, we can remove the try-catch
// wait until load | ||
await page.evaluate(() => { | ||
[...document.querySelectorAll(".kubeapps-dropdown-header")].find(element => | ||
element.outerText.includes("Current Context"), | ||
); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this better waits for the text "apache" to appear
// wait for the loading msg to disappear | ||
await page.waitForFunction(() => !document.querySelector("cds-progress-circle")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, wait for "prometheus" (not sure if the timeout below works).
const isAlreadyDeployed = await page.evaluate( | ||
() => document.querySelector("cds-button[disabled]") !== null, | ||
); | ||
|
||
await utils.retryAndRefresh(page, 2, async () => { | ||
// The CSV takes a bit to get populated | ||
await expect(page).toMatch("Installed"); | ||
}); | ||
if (!isAlreadyDeployed) { | ||
// Deploy the Operator | ||
await expect(page).toClick("cds-button", { text: "Deploy" }); | ||
|
||
await utils.retryAndRefresh(page, 4, async () => { | ||
// The CSV takes a bit to get populated | ||
await expect(page).toMatch("Installed", { timeout: 10000 }); | ||
}); | ||
} else { | ||
console.log("Warning: the operator has already been deployed"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if you already replied, why the isAlreadyDeployed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep (sorry for dropping the former PR, but I wanted to split the changes for ease the review process)
Check it at #2610 (comment)
Since uninstalling the operator is not that easy, I left this if
for those cases where we are executing the tests locally. I don't mind removing it if you want, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be fine (if for some reason this is bypassing the operator installation, it should fail when creating an instance).
@@ -37,24 +47,31 @@ test("Deploys an Operator", async () => { | |||
await expect(page).toMatch("Operators", { timeout: 10000 }); | |||
|
|||
// Filter out charts to search only for the prometheus operator | |||
await expect(page).toClick("label", { text: "Operators" }); | |||
await expect(page).toClick("label", { text: "Operators", timeout: 10000 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only timeout
option working is for the toMatch
function:
script/e2e-test.sh
Outdated
testsToIgnore=("operator-deployment.js" "${testsToIgnore[@]}") | ||
testsToIgnore=("add-multicluster-deployment.js" "${testsToIgnore[@]}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just use a single line for this
script/e2e-test.sh
Outdated
@@ -339,6 +368,10 @@ kubectl create serviceaccount kubeapps-view -n kubeapps | |||
kubectl create role view-secrets --verb=get,list,watch --resource=secrets | |||
kubectl create rolebinding kubeapps-view-secret --role view-secrets --serviceaccount kubeapps:kubeapps-view | |||
kubectl create clusterrolebinding kubeapps-view --clusterrole=view --serviceaccount kubeapps:kubeapps-view | |||
## Create view user (oidc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the admin user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already created with cluster-admin role with the default RBAC (kubeapps-operator@example.com
).
This step is just modifying the kubeapps-user@example.com
, but we can perform this step in the same place as the operator user for cohesion if you may prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yes, better to have them in the same place.
class CustomSequencer extends Sequencer { | ||
sort(tests) { | ||
const copyTests = Array.from(tests); | ||
return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the tests use to be executed in alphabetical order, there is no actual guarantee of that. It was causing a failure in the multicluster test because the secret created in the secret-repo test wasn't being found.
This sequencer just ensures this alphabetical order (I've pushed the integration img with this dep as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are saying that creating a private repository causes an error in the multicluster scenario? (That sounds like a bug we should fix). If that's the case, can you open an issue for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, I always forget about this issue.
Perhaps we should name the tests like "01-xxxx.js", "00-xxx.js" just to make clear that they are being executed in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that private repositories are not supported in the second cluster but what I don't understand is why the order of the tests affects the result. I thought that creating a private repo in the default cluster caused an error when trying to deploy in the second cluster but that may not be it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, ok, I see your point now and it may be an issue as you said:
When creating a private repo first, we are creating the docker credentials. Somehow, when deploying in the second cluster, it tries to retrieve these credentials (not available, though), and fails, therefore.
Let me replicate the issue in my dev environment and will file an issue with the details.
However, with regards to this PR, I think (for the sake of replicability) to still define an explicit order during the test suite execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to reproduce this issue in my dev environment. I set up the multicluster env and: 1) tried to manually create the private repo and then a new deployment; 2) run the e2e test directly from yarn start ...
.
In both cases, the credentials are being properly stored and the app deploys seamlessly as well.
That's odd... :S
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably added a namespaced private repository. If you create it in the kubeapps
namespace, you are likely able to reproduce the issue.
@@ -41,12 +41,21 @@ k8s_wait_for_deployment() { | |||
|
|||
debug "Waiting for deployment ${deployment} to be successfully rolled out..." | |||
# Avoid to exit the function if the rollout fails | |||
silence kubectl rollout status --namespace "$namespace" deployment "$deployment" || exit_code=$? | |||
silence kubectl rollout status --namespace "$namespace" deployment "$deployment" -w --timeout=60s || exit_code=$? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are not strictly necessary (the main error was I didn't pass the db password 🤦 ), but I think they make the function work as expected, I mean, performing an actual waiting, rather than digesting the exit_code for logging purposes.
@@ -169,13 +174,34 @@ installOrUpgradeKubeapps() { | |||
--set kubeops.replicaCount=1 \ | |||
--set assetsvc.replicaCount=1 \ | |||
--set dashboard.replicaCount=1 \ | |||
--set postgresql.replication.enabled=false \ | |||
--set postgresql.postgresqlPassword=password \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mental note: don't forget this flag when upgrading kubeapps
Despite my sorrow and regardless of how I want much to press the merge button :P, I would need another +1 just to make sure the new changes are ok with you as well. |
Hah! Andres can merge it if happy with the changes (I thought it might have just been a few, but I see lots of commits, so I'll let Andres continue it). Enjoy your break o/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @antgamdia. I have some (minor) comments
class CustomSequencer extends Sequencer { | ||
sort(tests) { | ||
const copyTests = Array.from(tests); | ||
return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are saying that creating a private repository causes an error in the multicluster scenario? (That sounds like a bug we should fix). If that's the case, can you open an issue for it?
try { | ||
// TODO(andresmgot): Remove this line once 2.3 is released | ||
await expect(page).toClick("cds-button", { text: "Add new credentials" }); | ||
} catch(e) { | ||
await expect(page).toClick(".btn-info-outline", { text: "Add new credentials" }); | ||
} catch (e) { | ||
await expect(page).toClick(".btn-info-outline", { | ||
text: "Add new credentials", | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you forgot to remove the try-catch?
script/e2e-test.sh
Outdated
# TODO(agamez): Remove these lines in the next version | ||
kubectl delete clusterrole -n kubapps kubeapps:controller:kubeops-ns-discovery-kubeapps kubeapps:controller:kubeops-operators-kubeapps kubeapps:kubeapps:apprepositories-read kubeapps:kubeapps:apprepositories-refresh kubeapps:kubeapps:apprepositories-write || true | ||
kubectl delete clusterrolebinding -n kubapps kubeapps:controller:kubeapps:apprepositories-read kubeapps:controller:kubeops-ns-discovery-kubeapps || true | ||
kubectl delete apprepositories.kubeapps.com -n kubeapps bitnami || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure you should need these?
The clusterrole/clusterrolebinding should be adopted by the new version and the apprepository should not exist because we are using --set apprepository.initialRepos=null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I think you saw a former commit, 7086d6f should fix it.
Thanks for the heads-up anyway :)
script/libtest.sh
Outdated
kubectl get pods --namespace "$namespace" | ||
sleep 60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't add sleeps this long. I don't think we need to retry here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I just don't understand very well why the tests need to be ordered.
class CustomSequencer extends Sequencer { | ||
sort(tests) { | ||
const copyTests = Array.from(tests); | ||
return copyTests.sort((testA, testB) => (testA.path > testB.path ? 1 : -1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that private repositories are not supported in the second cluster but what I don't understand is why the order of the tests affects the result. I thought that creating a private repo in the default cluster caused an error when trying to deploy in the second cluster but that may not be it.
Description of the change
(note: this PR supersedes #2610. It's better to split the multicluster environment and the general improvements)
This PR updates our integration tests to use the multicluster environment so that we can detect issues and errors (as proposed in #2376).
The environment is based on the one we already use for dev purposes, that is: dex/openldap + two kind clusters.
There are, also, a couple of tweaks since the integration tests run inside a container; therefore, the ingress URL is not available and there are some issues related to the cookies.
Benefits
Testing Kubeapps in multicluster environments with OIDC.
Possible drawbacks
As the complexity of the CI increases, the flakiness of the overall CI systems might also increase.
Applicable issues
Additional information
This PR depends upon #2618 to be merged.
As an improvement, we can also consider including the OIDC via Pinniped. It shouldn't be a big deal since all the multicluster effort is already done in this PR.