Release 2023-03-06 - (expected chart version 4.34.0) #3132

Merged
merged 81 commits into from
Mar 6, 2023

@zebot zebot commented Mar 6, 2023

[2023-03-06] (Chart Release 4.34.0)

Release notes

API changes

  • API v3 is now supported. The new MLS endpoints introduced in API v3 have been removed, and are now only available under v4. (Finalise API v3 #3122)

Features

Bug fixes and other updates

Documentation

Internal changes

Federation changes

akshaymankar and others added 30 commits January 26, 2023 16:36
Master->Develop after release
Broad Changes:

- Introduce `SQSWatcher` in `Util.Test.SQS`. This can be used to watch an SQS queue in a separate thread from the tests. It keeps whatever comes into the queue in an `IORef [a]`. The tests then make assertions on whatever they expect to be in the queue. The tests can no longer assume that the queue will be empty and only used by calls made by the test, so they always need to make assertions using the `id` of the team or user, depending on which queue they are making an assertion on.

- brig-integration: Remove hard coded port for running mock bot services during the test. This allows multiple mocks to be running at the same time. To make this work in K8s, the service needs to be headless.

- galley-integration: Same as above, but for mock legalhold services

- Increase timeouts for expiring various codes and invitations. Since the tests run in parallel, some tests may starve for CPU. These timeouts then start affecting the tests. Most of them were 5 seconds and have been updated to 10 seconds. This makes some tests (which test the fact that these timeouts work) run longer, but it is ok.

- Increase timeouts further in 1 test (`brig-integration:API.Team.testInvitationPaging`) because it was running for longer than 10 seconds and failing intermittently

- galley-integration: De-duplicate code in `API.Teams.LegalHold` and `API.Teams.LegalHold.DisabledByDefault`.

- Remove any obvious uses of `putStrLn`, `print` etc. from tests. These don't interleave very well when tests are running in parallel. Some legalhold tests were being "skipped" based on feature flags; these are now clearly marked as skipped using tasty machinery.

- Bonus: Make upload-images script upload the images in parallel.

- FUTUREWORK: Make spar tests run in parallel, they take more than 5 mins to run.
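
The `SQSWatcher` idea from the list above can be sketched roughly as follows. This is a minimal illustration in Python, not the Haskell helper from `Util.Test.SQS`: the names `QueueWatcher`, `wait_for` and `events_for_team` are hypothetical, an in-process `queue.Queue` stands in for the SQS queue, and a plain list guarded by a lock plays the role of the `IORef [a]`.

```python
import queue
import threading
import time


class QueueWatcher:
    """Collects everything arriving on a shared queue in a background thread."""

    def __init__(self, source):
        self._source = source          # stand-in for the SQS queue (assumption)
        self._seen = []                # plays the role of the IORef [a]
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Keep reading until asked to stop; remember every message seen.
        while not self._stop.is_set():
            try:
                msg = self._source.get(timeout=0.05)
            except queue.Empty:
                continue
            with self._lock:
                self._seen.append(msg)

    def wait_for(self, predicate, timeout=2.0):
        """Return the messages matching predicate, polling until some arrive or the timeout passes."""
        deadline = time.monotonic() + timeout
        while True:
            with self._lock:
                hits = [m for m in self._seen if predicate(m)]
            if hits or time.monotonic() >= deadline:
                return hits
            time.sleep(0.01)

    def stop(self):
        self._stop.set()
        self._thread.join()


def events_for_team(watcher, team_id):
    # Tests share the queue, so each test filters by its own team id
    # instead of assuming the queue only holds its own messages.
    return watcher.wait_for(lambda m: m.get("team") == team_id)
```

A test would then assert only on `events_for_team(watcher, my_team_id)`, which stays valid even when other tests write to the same queue in parallel.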

Co-authored-by: jschaul <jschaul@users.noreply.github.com>
#3037)

* charts: Mark all test resources to be only created while running tests

* Use patched helm to ensure it doesn't try to get logs of configmaps

* ciImage,devSetup: Add awk

* ciImage: Add cfssl
Co-authored-by: Sebastian Willenborg <sebastian.willenborg@wire.com>
Add security response about wire.com DoS and HTML injection
* change helm hook type of test resources which are not Pods

* changelog adjustment
- change liveness and readiness probes to start querying more quickly to
  see if cassandra is up. Instead of 90 - 120 seconds, if cassandra is
  up earlier, that should manifest itself in the setup time of 'make
  kube-integration-setup'
- change helmfile for wire-server to wait for databases-ephemeral to be
  up before launching pods: cassandra-migration needs to have a working
  cassandra anyway. The crashloop-backoff strategy leads to a lot of
  waiting in between restarts, so it should be faster to wait for
  cassandra to be up before attempting schema migrations
example case where this test failed: https://concourse.ops.zinfra.io/teams/main/pipelines/staging/jobs/test/builds/342

output of failing test:
```
  metrics
    prometheus:                                                                              OK (0.02s)
    work:                                                                                    FAIL (1.06s)
      Error message: /login was called twice
      expected: 2
       but got: 3

      CallStack (from HasCallStack):
        assertFailure, called at ./Test/Tasty/HUnit/Orig.hs:86:32 in tasty-hunit-0.10.0.3-KJER1RJhmod6e0raY4U8z6:Test.Tasty.HUnit.Orig
        assertEqual, called at test/integration/API/Metrics.hs:78:12 in main:API.Metrics
      Use -p '(!/turn/&&!/user.auth.cookies.limit/)&&/metrics.work/' to rerun this test only.
```
* Test helper SQSWatcher: use purgeQueue

The previous logic of emptying the queue by reading all messages and
deleting them assumes there is no other process writing anything into
the queue, which might not be the case (in case of parallel
brig/galley/spar tests). Instead, use purgeQueue to empty the queue,
which should be faster and more reliable.

* Hi CI
… to flaky tests) for parallel helm test executions. (#3040)

1. Allow running helm tests in parallel if desired, using `HELM_PARALLELISM=6` (disabled for now until we have fixed some flaky tests which fail more often when tests run in parallel)

2. Rework integration test output: logs from test runs will only show if there are any failed tests. Also, the bottom of the output will have a summary of what failed and what didn't, as well as only the failed test lines with a context of ±10 lines. This should hopefully make it easier to see what went wrong: just scroll to the bottom.

The summary looks like this:

```
=== tail cargohold: ===

All 21 tests passed (8.45s)
=== tail gundeck: ===

All 33 tests passed (56.60s)
=== tail federator: ===
Finished in 0.6576 seconds
9 examples, 0 failures
=== tail spar: ===
Finished in 397.2779 seconds
553 examples, 0 failures, 65 pending
=== tail brig: ===

2 out of 449 tests failed (123.07s)
=== tail galley: ===

1 out of 414 tests failed (136.33s)
cargohold-integration passed ✅.
gundeck-integration passed ✅.
federator-integration passed ✅.
spar-integration passed ✅.
brig-integration FAILED ❌. pfff...
galley-integration FAILED ❌. pfff...
Tests failed.
```
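
The "failed lines with a context of 10 lines" step and the pass/fail summary can be sketched as below. This is only an illustration in Python; the actual CI script is not shown in this PR description. The function names and the `FAIL` marker are assumptions (the marker is taken from the tasty output quoted earlier).

```python
def failed_lines_with_context(log_lines, context=10, marker="FAIL"):
    """Return (line_number, text) pairs within `context` lines of any line containing marker."""
    keep = set()
    for i, line in enumerate(log_lines):
        if marker in line:
            lo = max(0, i - context)
            hi = min(len(log_lines), i + context + 1)
            keep.update(range(lo, hi))
    # Line numbers are reported 1-based, like grep -n.
    return [(i + 1, log_lines[i]) for i in sorted(keep)]


def summarize(results):
    """results maps a suite name to True (passed) or False (failed)."""
    lines = [f"{name} {'passed' if ok else 'FAILED'}" for name, ok in results.items()]
    lines.append("All tests passed." if all(results.values()) else "Tests failed.")
    return lines
```

Overlapping context windows are merged through the set, so a cluster of neighbouring failures is printed once rather than once per failing line.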
* Lower the log level of federator inotify

---------

Co-authored-by: jschaul <jschaul@users.noreply.github.com>
Co-authored-by: Leif Battermann <leif.battermann@wire.com>
fisx and others added 25 commits February 23, 2023 12:07
 Make account registration whitelists local #3043 

https://wearezeta.atlassian.net/browse/SQPIT-405 (a related wire infrastructure PR is linked in the ticket)

This is changing a feature wire has been using on our staging environment, and (probably?) not anywhere else. See the changelog if you think you may be affected.

Since the service is both outdated and almost unused, this PR moves the data from that service into the local server config yaml.

Migration should be painless, since the new settings are in a different place than the old ones. Just make sure the new fields are added to the config before the upgrade, and then you can remove the old ones at any time after.
Render a Swagger docs page per internal endpoint. The benefit is that we don't have to play crazy tricks to get all (overlapping!) paths right. Currently, this is solved in develop by prefixing the paths by their service name (e.g. /<brig>/i/status.)

Executing the swagger operations by clicking on *Execute* doesn't work and never will: the services do not handle CORS-related headers, so the browser refuses to accept the response. However, the rendered curl command works if `kubectl port-forward` is called as described.
Make brig-schema a little faster by merging the first 34 schema migrations and thereby removing some redundancies on fresh installations.
Introduces an integration test / regression test to check that control-level pings with a payload result in a control-level pong with the same payload as specified in the [RFC](https://www.rfc-editor.org/rfc/rfc6455#section-5.5.2)

This was used in debugging https://wearezeta.atlassian.net/browse/FS-1489

(related ping-pong prior work: #561 and prior discussion: #560)
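
The property under test, that a pong carries the same payload as the ping it answers, can be illustrated at the frame level. The sketch below is a simplified Python encoder and decoder for WebSocket control frames following the RFC 6455 layout (FIN bit, opcode `0x9` for ping and `0xA` for pong, payload of at most 125 bytes, optional 4-byte mask). It is not the integration test from this PR, and the function names are hypothetical.

```python
OPCODE_PING = 0x9
OPCODE_PONG = 0xA


def encode_control_frame(opcode, payload=b"", mask_key=None):
    """Encode a single unfragmented control frame; mask it if mask_key is given."""
    if len(payload) > 125:
        raise ValueError("control frame payload must be at most 125 bytes")
    first = 0x80 | opcode  # FIN bit set: control frames must not be fragmented
    if mask_key is None:
        return bytes([first, len(payload)]) + payload
    if len(mask_key) != 4:
        raise ValueError("mask key must be 4 bytes")
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes([first, 0x80 | len(payload)]) + mask_key + masked


def decode_control_frame(frame):
    """Return (opcode, unmasked payload) for a control frame."""
    opcode = frame[0] & 0x0F
    length = frame[1] & 0x7F
    if frame[1] & 0x80:  # MASK bit: client-to-server frames are masked
        key, body = frame[2:6], frame[6:6 + length]
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(body))
    else:
        payload = frame[2:2 + length]
    return opcode, payload


def pong_for(ping_frame):
    """Build the unmasked (server-side) pong that echoes the ping's payload."""
    opcode, payload = decode_control_frame(ping_frame)
    if opcode != OPCODE_PING:
        raise ValueError("not a ping frame")
    return encode_control_frame(OPCODE_PONG, payload)
```

A regression check in this style sends a masked ping with a known payload and asserts that decoding the reply yields opcode `0xA` and the identical payload.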
* Use openssl instead of tls in federator http2 client

* changelog

* Strip trailing dot for hostname validation

* Move blessed ciphers close to where context is being built

Make it clear that this only works with TLS 1.2 as of now

* Check client certificate and private key to ensure they match

This will prevent reloading in case the files are being updated one by one.

* Add options to ssl context to workaround various bugs

https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_options.html#SSL_OP_ALL

* Remove leftover debugging code

* Ensure test for testing hostname with trailing dot is correct

It was broken in a previous commit so it was not testing with a hostname with
trailing dot at all.

* Remove commented out code for hs-tls

* Remove duplicated comment

* Slightly better types for CertifiateAndPrivateKeyDoNotMatch

* Share code to create ssl context between test and src

* Grammar

Co-authored-by: Paolo Capriotti <paolo@capriotti.io>

* federator: Pass response consumer continuation to discoverAndCall

This ensures that HTTP2 client doesn't close the connection before the response
body gets consumed.

In current implementation of the HTTP2 client there is a race between the part
which consumes the response and "background threads". These background threads
are sending and receiving data and they are not supposed to finish unless
connection gets abruptly terminated, however, due to the race they get a
`Async.cancel` when the response consumer function finishes executing.

Before this change, `Codensity` was supposed to ensure that the consumer doesn't
finish executing, but I am not sure why it didn't work. Changing the code to use
CPS fixes this.

* Remove `-Wno-unused-imports`, perhaps added by mistake

* Federator Client: Simplify reading data from SSL

* Revert "federator: Pass response consumer continuation to discoverAndCall"

This reverts commit febf71a.

Thanks to @pcapriotti for clarifying that the test was failing because it was
exiting Codensity before making the assertion.

* federator-integration: Avoid exiting Codensity too soon

* federator: Run all code wrapped in `withOpenSSL`

* federator-unit-tests: Ensure assertions happen without exiting Codensity

* Special handling for reading 0 bytes out of the TLS socket
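
For the "Strip trailing dot for hostname validation" commit above: a fully qualified DNS name may be written with a trailing dot (`example.com.`), while certificate names normally omit it, so a naive string comparison fails. The sketch below shows the general idea in Python. It is an assumption about the approach, not the federator's code; the function names are hypothetical and wildcard matching is deliberately left out.

```python
def normalize_hostname(name):
    """Drop a single trailing dot (absolute DNS name) and lowercase for comparison."""
    if name.endswith("."):
        name = name[:-1]
    return name.lower()


def hostname_matches(cert_name, target):
    # Exact match only; real validation also has to handle wildcard names.
    return normalize_hostname(cert_name) == normalize_hostname(target)
```

With this normalization, `hostname_matches("example.com", "example.com.")` holds, which is the case the trailing-dot test is meant to cover.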

---------

Co-authored-by: Paolo Capriotti <paolo@capriotti.io>
Better error message for invalid ID in credential
This fixes compatibility with Nix 2.14.
* Cleanup haskell-pins

* Bring back forked http-client

The ssl-util package relies on the fork.

* Fix compile error due to http2 bump
* retry with exp backoff when rate limited by Amazon

* add changelog

* factor out retry function + review comments
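
Retrying with exponential backoff, as in the commits above, generally looks like the following sketch. It is written in Python for illustration and is not the code from this change: `RateLimited` is a hypothetical exception standing in for whatever rate-limit error the Amazon client reports, and the delay parameters are arbitrary defaults.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the remote service throttles a request."""


def retry_with_backoff(action, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call action(), retrying on RateLimited with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return action()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the wait after every failure, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Production variants often add random jitter to the delay so that parallel callers do not retry in lockstep.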
* Add pregenerated v3 swagger

* Finalise API v3

* Use v2 for welcome messages in tests

* Add CHANGELOG entry

* Set v4 as the development version

* Update golden tests

* Add assertion for v4 to version test

* Use v2 welcome in end2end tests
* replace "tho" with "the"
* fix glossary reference
* fix terminology list
  upstream uses "End User/Browser" here, in our context Transport makes
  more sense
@zebot zebot added the ok-to-test label (Approved for running tests in CI, overrides not-ok-to-test if both labels exist) Mar 6, 2023
@smatting smatting self-requested a review March 6, 2023 19:31
@smatting smatting merged commit 3e4f302 into master Mar 6, 2023
@smatting smatting deleted the release_2023-03-06_18_57 branch March 6, 2023 19:35