☂️ Enhance and Stabilise Druid E2E tests #782

Open
1 of 18 tasks
unmarshall opened this issue Apr 4, 2024 · 2 comments
Assignees
Labels
area/dev-productivity Developer productivity related (how to improve development) area/quality Output qualification (tests, checks, scans, automation in general, etc.) related area/testing Testing related kind/enhancement Enhancement, improvement, extension

Comments

@unmarshall
Contributor

unmarshall commented Apr 4, 2024

What you would like to be added:

  • Use a separate namespace per test so that e2e tests can run concurrently (using go-native tests).
  • Load the images using kind load, as image loading currently takes a long time during setup.
  • Replace ginkgo with native golang tests. We are removing ginkgo usage from druid. Many unit tests and some IT tests have already been migrated and will be merged with Druid Refactor to Address Multiple Controller Conflicts #777.
  • Use ko to build images so that image builds are faster.
  • For tests that have failed, preserve their namespaces so that developers can debug them. For tests that have passed, clean up the respective namespaces.
  • Do not stop the kind cluster at the end of the test run, or at least provide an option to keep it running. Cleanup is mandatory for the concourse pipeline, but for local runs, where failures need to be analysed, it should be possible to switch it off.
  • Have capability to put breakpoint to any test and enable debugging from the IDE.
  • Should be able to do fast iterations with quasi hot deploy (golang unfortunately does not support real hot-deploy).
  • Record and publish (as logs) startup times for etcd clusters. Ideally these should be recorded as metrics for all clusters managed by druid on dev/staging/canary/live landscapes. This will help understand any deterioration in the startup times across releases.
  • Need to add proper compaction and copy-backups-task testing to the e2e test suite.
  • e2e tests currently test only backup-enabled etcds, but do not test backup-disabled etcds (such as etcd-events in g/g).
  • Remove flakiness in tests - there is still some flakiness even after the fix that we made yesterday, and such flakes need to be removed to have deterministic test runs.
  • Introduce etcd-druid upgrade tests #807
    • Add to CI pipeline with PR branch vs master branch (or previous release)
  • Test backward compatibility to previous druid version (support for downgrade)
  • Test reconciliation after error in previous reconciliation which had caused the etcd cluster to be unready. This test would catch cases such as the one described in Deleting the etcd-bootstrap configmap leads to etcd reconciliation to never succeed #818
  • Generate all PKI artifacts to be used for e2e tests. This utility should be re-used for any tests (other than e2e tests) that require PKI artifacts.
  • Fix and enhance Azurite and fakegcs emulator support etcd-backup-restore#762
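
The namespace-per-test and preserve-on-failure items above could be sketched roughly as follows with go-native tests. This is only an illustration, not the planned druid implementation: `NamespaceFor`, `ShouldCleanup` and `SetupNamespace` are hypothetical names, the `create` and `remove` callbacks stand in for whatever Kubernetes client calls the suite ends up using, and the suffix is assumed to be a short lowercase alphanumeric string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// invalidChars matches runs of characters that are not allowed in a namespace name.
var invalidChars = regexp.MustCompile(`[^a-z0-9-]+`)

// NamespaceFor derives a namespace name from a test name and a short suffix.
// The result is lower-case and at most 63 characters long (DNS-1123 label limit).
// Hypothetical helper; the suffix is assumed to be short, lower-case alphanumeric.
func NamespaceFor(testName, suffix string) string {
	s := strings.ToLower(testName)
	s = invalidChars.ReplaceAllString(s, "-")
	s = strings.Trim(s, "-")
	// Room left for the test name after "e2e-", one "-" and the suffix.
	maxBase := 63 - len("e2e-") - 1 - len(suffix)
	if maxBase > 0 && len(s) > maxBase {
		s = strings.Trim(s[:maxBase], "-")
	}
	return fmt.Sprintf("e2e-%s-%s", s, suffix)
}

// ShouldCleanup reports whether the namespace should be deleted after the test.
// A failed test keeps its namespace only when retainOnFailure is set
// (for example on local runs, but not in the CI pipeline).
func ShouldCleanup(failed, retainOnFailure bool) bool {
	return !(failed && retainOnFailure)
}

// SetupNamespace creates a dedicated namespace for the calling test and
// registers a cleanup that deletes it unless the test failed and
// retainOnFailure is set. create and remove are placeholders for the
// real cluster calls.
func SetupNamespace(t *testing.T, suffix string, retainOnFailure bool,
	create, remove func(ns string) error) string {
	t.Helper()
	ns := NamespaceFor(t.Name(), suffix)
	if err := create(ns); err != nil {
		t.Fatalf("create namespace %q: %v", ns, err)
	}
	t.Cleanup(func() {
		if !ShouldCleanup(t.Failed(), retainOnFailure) {
			t.Logf("test failed, keeping namespace %q for debugging", ns)
			return
		}
		if err := remove(ns); err != nil {
			t.Errorf("delete namespace %q: %v", ns, err)
		}
	})
	return ns
}
```

Each test would call `SetupNamespace` first and could then call `t.Parallel()` safely, since no two tests share a namespace. The retain flag could be driven by an environment variable or test flag so that the pipeline always cleans up.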

Motivation (Why is this needed?):
E2E tests should be simple, comprehensive, fast and stable.

@unmarshall unmarshall added the kind/enhancement Enhancement, improvement, extension label Apr 4, 2024
@unmarshall unmarshall self-assigned this Apr 4, 2024
@unmarshall unmarshall added area/dev-productivity Developer productivity related (how to improve development) area/quality Output qualification (tests, checks, scans, automation in general, etc.) related area/testing Testing related labels Apr 4, 2024
@shreyas-s-rao shreyas-s-rao changed the title Enhance and Stabilise Druid E2E tests ☂️ Enhance and Stabilise Druid E2E tests Jun 24, 2024
@shreyas-s-rao
Contributor

The manual tests that I generally run before merging large PRs cover different combinations of druid auto-reconcile enabled/disabled, single/multi-node etcds, backups disabled/enabled (with different providers), TLS disabled/enabled, etc., for various scenarios like:

  • etcd creation
  • reconciliation
  • spec changes
  • scale-up of replicas (with different combinations of TLS disabled/enabled)
  • hibernation/unhibernation (scale down to 0 and back up to original replicas)
  • upgrade of druid from old to new version (with checks for etcd status reconciliation, and later spec reconciliation)
  • compaction jobs
  • copy-backups tasks
Example: list of manual tests executed before merging #777:

| Test name | Druid auto-reconcile | Single/multi node | Backups (provider) | Etcd client TLS | Etcd peer TLS | EtcdBR TLS | Test result |
|---|---|---|---|---|---|---|---|
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Single | NA | FALSE | FALSE | FALSE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Single | NA | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Single | AWS | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Multi | NA | FALSE | FALSE | FALSE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Multi | NA | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Multi | AWS | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Multi | GCP | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Multi | Azure | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Multi | Openstack | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | FALSE | Multi | Local | TRUE | TRUE | TRUE | TRUE |
| Perform etcd spec changes, check if reconciliation triggered | FALSE | Multi | GCP | TRUE | TRUE | TRUE | TRUE |
| Scale-up etcd from single-node non-TLS to multi-node non-TLS, hibernate, unhibernate | FALSE | Single | GCP | FALSE | FALSE | FALSE | TRUE |
| Scale-up etcd from single-node non-TLS to multi-node TLS, hibernate, unhibernate | FALSE | Single | GCP | FALSE | FALSE | FALSE | TRUE |
| Scale-up etcd from single-node TLS to multi-node TLS, hibernate, unhibernate | FALSE | Single | NA | TRUE | TRUE | TRUE | TRUE |
| Upgrade druid from master to #777, check status updates, add reconcile annotation, check reconciliation | FALSE | Multi | GCP | TRUE | TRUE | TRUE | TRUE |
| Deploy etcdcopybackupstask, check success | FALSE | Multi | Local | TRUE | TRUE | TRUE | TRUE |
| Configure compaction with low threshold, populate etcd, check if compaction jobs are triggered and run | FALSE | Single | AWS | TRUE | TRUE | TRUE | TRUE |
| Deploy etcd, check reconciliation, hibernate, unhibernate, delete etcd | TRUE | Multi | GCP | TRUE | TRUE | TRUE | TRUE |
| Perform etcd spec changes, check if reconciliation triggered | TRUE | Multi | GCP | TRUE | TRUE | TRUE | TRUE |

@unmarshall
Contributor Author

#833 introduced namespace separation but this will be completely re-written.
