Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROX-27073: add cluster migration e2e test #2127

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,10 @@
emailsender-manifests.yaml
central-chart/
pids-port-forward
# temp files created by multicluster e2e tests
cluster-list.json
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do these come from / why can we not clean them up after the test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found them in deploy.sh - is there no way to have a cleanup.sh equivalent? Or better to leave the artifacts in case debugging is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those files would be created and not cleaned up. So you can use them for debugging already, this is just the .gitignore to prevent those files from being commited and pushed to the repo.

cluster-list2.json
fm-dataplane-config.yaml
# Ignore uploaded files in development
/storage/*

Expand Down
6 changes: 3 additions & 3 deletions .secrets.baseline
Original file line number Diff line number Diff line change
Expand Up @@ -259,7 +259,7 @@
"filename": "e2e/e2e_test.go",
"hashed_secret": "7f38822bc2b03e97325ff310099f457f6f788daf",
"is_verified": false,
"line_number": 300
"line_number": 294
}
],
"internal/dinosaur/pkg/api/public/api/openapi.yaml": [
Expand Down Expand Up @@ -312,7 +312,7 @@
"filename": "pkg/client/fleetmanager/mocks/client_moq.go",
"hashed_secret": "44e17306b837162269a410204daaa5ecee4ec22c",
"is_verified": false,
"line_number": 584
"line_number": 640
}
],
"pkg/client/redhatsso/api/api/openapi.yaml": [
Expand Down Expand Up @@ -416,5 +416,5 @@
}
]
},
"generated_at": "2024-11-26T16:50:48Z"
"generated_at": "2024-12-16T09:41:30Z"
}
16 changes: 16 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -371,6 +371,22 @@ test/e2e: $(GINKGO_BIN)
./e2e/...
.PHONY: test/e2e

test/e2e/multicluster: $(GINKGO_BIN)
CLUSTER_ID=1234567890abcdef1234567890abcdef \
ENABLE_CENTRAL_EXTERNAL_CERTIFICATE=$(ENABLE_CENTRAL_EXTERNAL_CERTIFICATE) \
GITOPS_CONFIG_PATH=$(GITOPS_CONFIG_FILE) \
RUN_MULTICLUSTER_E2E=true
$(GINKGO_BIN) -r $(GINKGO_FLAGS) \
--randomize-suites \
--fail-on-pending --keep-going \
--cover --coverprofile=cover.profile \
--race --trace \
--json-report=e2e-report.json \
--timeout=$(TEST_TIMEOUT) \
--poll-progress-after=5m \
./e2e/multicluster
.PHONY: test/e2e/multicluster

# Deploys the necessary applications to the selected cluster and runs e2e tests inside the container
# Useful for debugging Openshift CI runs locally
test/deploy/e2e-dockerized:
Expand Down
35 changes: 35 additions & 0 deletions e2e/dns/record_cleanup.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
package dns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would waiting for "deleting" state instead of "deprovisioning" rely on the FM's cleanup of the routes instead of manually cleaning them up here? (sorry if missing something, not used to looking at the e2e tests as much)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure about this. But I think FM would only start deleting the route records on deleting state.

We trigger a deletion and wait for deletion at the end of this test, which usually cleans up the records.

This logic is intended as an additional measure on test failures. Because on failures we might be stuck somewhere in the usual reconciliation loop which can lead to a situation where the deletion lifecycle logic doesn't get executed. In a case like this we don't want to leak those resources.


import (
"github.com/aws/aws-sdk-go/service/route53"
. "github.com/onsi/gomega"
"github.com/stackrox/acs-fleet-manager/internal/dinosaur/pkg/api/public"
"github.com/stackrox/acs-fleet-manager/internal/dinosaur/pkg/services"
)

// CleanupCentralRequestRecords deletes all route53 resoruces associated with the centralRequest
func CleanupCentralRequestRecords(route53Client *route53.Route53, centralRequest public.CentralRequest) {
dnsLoader := NewRecordsLoader(route53Client, centralRequest)
recordSets := dnsLoader.LoadDNSRecords()

action := string(services.CentralRoutesActionDelete)
changes := []*route53.Change{}
for _, rs := range recordSets {
c := &route53.Change{
Action: &action,
ResourceRecordSet: rs,
}
changes = append(changes, c)
}

if len(changes) == 0 {
return
}

_, err := route53Client.ChangeResourceRecordSets(&route53.ChangeResourceRecordSetsInput{
HostedZoneId: dnsLoader.rhacsZone.Name,
ChangeBatch: &route53.ChangeBatch{Changes: changes},
})

Expect(err).ToNot(HaveOccurred())
}
5 changes: 3 additions & 2 deletions e2e/e2e_canary_upgrade_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import (

. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/stackrox/acs-fleet-manager/e2e/testutil"
"github.com/stackrox/acs-fleet-manager/fleetshard/pkg/central/operator"
"github.com/stackrox/acs-fleet-manager/internal/dinosaur/constants"
"github.com/stackrox/acs-fleet-manager/internal/dinosaur/pkg/api/public"
Expand Down Expand Up @@ -61,7 +62,7 @@ var _ = Describe("Fleetshard-sync Targeted Upgrade", Ordered, func() {
})

BeforeEach(func() {
SkipIf(!features.TargetedOperatorUpgrades.Enabled() || !runCanaryUpgradeTests, "Skipping canary upgrade test")
testutil.SkipIf(!features.TargetedOperatorUpgrades.Enabled() || !runCanaryUpgradeTests, "Skipping canary upgrade test")
option := fmImpl.OptionFromEnv()
auth, err := fmImpl.NewStaticAuth(ctx, fmImpl.StaticOption{StaticToken: option.Static.StaticToken})
Expect(err).ToNot(HaveOccurred())
Expand Down Expand Up @@ -249,7 +250,7 @@ var _ = Describe("Fleetshard-sync Targeted Upgrade", Ordered, func() {
It("delete central", func() {
Expect(deleteCentralByID(ctx, client, createdCentral.Id)).
To(Succeed())
Eventually(assertCentralRequestDeprovisioning(ctx, client, createdCentral.Id)).
Eventually(testutil.AssertCentralRequestDeprovisioning(ctx, client, createdCentral.Id)).
WithTimeout(waitTimeout).
WithPolling(defaultPolling).
Should(Succeed())
Expand Down
93 changes: 4 additions & 89 deletions e2e/e2e_suite_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,10 +14,9 @@ import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
openshiftRouteV1 "github.com/openshift/api/route/v1"
"github.com/stackrox/acs-fleet-manager/e2e/testutil"
"github.com/stackrox/acs-fleet-manager/fleetshard/config"
"github.com/stackrox/acs-fleet-manager/fleetshard/pkg/k8s"
"github.com/stackrox/acs-fleet-manager/internal/dinosaur/constants"
"github.com/stackrox/acs-fleet-manager/internal/dinosaur/pkg/api/public"
"github.com/stackrox/acs-fleet-manager/pkg/client/fleetmanager"
"github.com/stackrox/rox/operator/apis/platform/v1alpha1"
appsv1 "k8s.io/api/apps/v1"
Expand All @@ -32,8 +31,8 @@ var (
dnsEnabled bool
routesEnabled bool
route53Client *route53.Route53
waitTimeout = getWaitTimeout()
extendedWaitTimeout = getWaitTimeout() * 3
waitTimeout = testutil.GetWaitTimeout()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a meta-comment - adding the testutil. in a separate PR if possible would've made reviewing just the cluster migration part easier. It's fast enough to skim by but next time would recommend splitting into 2 PRs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about splitting PRs as well along the way, decided to not do it since I only wanted very few changes. Then it grew over time to what you see now :) . I'll split it next time.

extendedWaitTimeout = testutil.GetWaitTimeout() * 3
dpCloudProvider = getEnvDefault("DP_CLOUD_PROVIDER", "standalone")
dpRegion = getEnvDefault("DP_REGION", "standalone")
fleetManagerEndpoint = "http://localhost:8000"
Expand All @@ -52,18 +51,6 @@ var (
}
)

func getWaitTimeout() time.Duration {
timeoutStr, ok := os.LookupEnv("WAIT_TIMEOUT")
if ok {
timeout, err := time.ParseDuration(timeoutStr)
if err == nil {
return timeout
}
fmt.Printf("Error parsing timeout, using default timeout %v: %s\n", defaultTimeout, err)
}
return defaultTimeout
}

func getEnvDefault(key, defaultValue string) string {
value, ok := os.LookupEnv(key)
if !ok {
Expand All @@ -89,7 +76,7 @@ var _ = BeforeSuite(func() {
Expect(err).ToNot(HaveOccurred())

var accessKey, secretKey string
dnsEnabled, accessKey, secretKey = isDNSEnabled(routesEnabled)
dnsEnabled, accessKey, secretKey = testutil.DNSConfiguration(routesEnabled)

if dnsEnabled {
creds := credentials.NewStaticCredentials(
Expand Down Expand Up @@ -125,58 +112,6 @@ func enableTestsGroup(testName string, envName string, defaultValue string) bool
return false
}

func isDNSEnabled(routesEnabled bool) (bool, string, string) {
accessKey := os.Getenv("ROUTE53_ACCESS_KEY")
secretKey := os.Getenv("ROUTE53_SECRET_ACCESS_KEY")
enableExternal := os.Getenv("ENABLE_CENTRAL_EXTERNAL_CERTIFICATE")
dnsEnabled := accessKey != "" &&
secretKey != "" &&
enableExternal != "" && routesEnabled
return dnsEnabled, accessKey, secretKey
}

func assertCentralRequestStatus(ctx context.Context, client *fleetmanager.Client, id string, status string) func() error {
return func() error {
centralRequest, _, err := client.PublicAPI().GetCentralById(ctx, id)
if err != nil {
return err
}
if centralRequest.Status != status {
return fmt.Errorf("expected centralRequest status %s, got %s", status, centralRequest.Status)
}
return nil
}
}

func assertCentralRequestReady(ctx context.Context, client *fleetmanager.Client, id string) func() error {
return assertCentralRequestStatus(ctx, client, id, constants.CentralRequestStatusReady.String())
}

func assertCentralRequestProvisioning(ctx context.Context, client *fleetmanager.Client, id string) func() error {
return assertCentralRequestStatus(ctx, client, id, constants.CentralRequestStatusProvisioning.String())
}

func assertCentralRequestDeprovisioning(ctx context.Context, client *fleetmanager.Client, id string) func() error {
return assertCentralRequestStatus(ctx, client, id, constants.CentralRequestStatusDeprovision.String())
}

func assertCentralRequestDeleting(ctx context.Context, client *fleetmanager.Client, id string) func() error {
return assertCentralRequestStatus(ctx, client, id, constants.CentralRequestStatusDeleting.String())
}

func assertCentralRequestAccepted(ctx context.Context, client *fleetmanager.Client, id string) func() error {
return assertCentralRequestStatus(ctx, client, id, constants.CentralRequestStatusAccepted.String())
}

func obtainCentralRequest(ctx context.Context, client *fleetmanager.Client, id string, request *public.CentralRequest) error {
centralRequest, _, err := client.PublicAPI().GetCentralById(ctx, id)
if err != nil {
return err
}
*request = centralRequest
return nil
}

func assertStoredSecrets(ctx context.Context, privateAPI fleetmanager.PrivateAPI, centralRequestID string, expected []string) func() error {
return func() error {
privateCentral, _, err := privateAPI.GetCentral(ctx, centralRequestID)
Expand Down Expand Up @@ -266,20 +201,6 @@ func assertDeploymentHealthyReplicas(ctx context.Context, namespace, name string
}
}

func assertReencryptIngressRouteExist(ctx context.Context, namespace string, route *openshiftRouteV1.RouteIngress) func() error {
return func() error {
reencryptIngress, err := routeService.FindReencryptIngress(ctx, namespace)
if err != nil {
return fmt.Errorf("failed finding reencrypt ingress in namespace %s: %v", namespace, err)
}
if reencryptIngress == nil {
return fmt.Errorf("reencrypt ingress in namespace %s not found", namespace)
}
*route = *reencryptIngress
return nil
}
}

func assertReencryptRouteExist(ctx context.Context, namespace string, route *openshiftRouteV1.Route) func() error {
return func() error {
reencryptRoute, err := routeService.FindReencryptRoute(ctx, namespace)
Expand Down Expand Up @@ -307,9 +228,3 @@ func assertPassthroughRouteExist(ctx context.Context, namespace string, route *o
return nil
}
}

func SkipIf(condition bool, message string) {
if condition {
Skip(message, 1)
}
}
Loading
Loading