Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Instrument FederationStateIdsServlet - /state_ids #13499

Merged
merged 4 commits into from
Aug 15, 2022

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Aug 10, 2022

Instrument FederationStateIdsServlet - /state_ids so it's easier to follow what's going on in Jaeger when viewing a trace.

/_matrix/federation/v1/state_ids/:roomId?event_id=:eventId

Part of #13356

Sometimes slow /state_ids is just ratelimiter 😴

Before After

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

Conflicts:
	docker/conf/homeserver.yaml
	synapse/federation/federation_server.py
	synapse/handlers/federation.py
@MadLittleMods MadLittleMods added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Aug 10, 2022
@@ -110,6 +111,9 @@ def ratelimit(self) -> "Iterator[defer.Deferred[None]]":
def _on_enter(self, request_id: object) -> "defer.Deferred[None]":
time_now = self.clock.time_msec()

wait_span_scope = start_active_span("ratelimit wait")
wait_span_scope.__enter__()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can make a big difference in understanding where all of the time went. In the before, there is just a giant gap with no indication of what's taking up the time. In the after, you can see the ratelimit wait span 🎉

Before After

@MadLittleMods MadLittleMods marked this pull request as ready for review August 10, 2022 23:35
@MadLittleMods MadLittleMods requested a review from a team as a code owner August 10, 2022 23:35
@@ -162,6 +166,7 @@ def on_wait_finished(_: Any) -> "defer.Deferred[None]":

def on_start(r: object) -> object:
logger.debug("Ratelimit [%s]: Processing req", id(request_id))
wait_span_scope.__exit__(None, None, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that this approach could lead to us forgetting to exit the scope in some situations (particularly: on cancellation, or other exceptions such as the place where we raise LimitExceededError).

one approach we've taken elsewhere is to make the span cover the whole of the limited operation (ie, the wait, and the operation itself), and just emit a tracing event when the limiter completes (or have a nested span). That means you can just do a with start_active_span(...) as normal.

Failing that, I'd suggest doing this in on_both instead of on_start, and moving the call to start_active_span to just before the definition of on_both, to minimise the amount of code that can throw exceptions between the two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one approach we've taken elsewhere is to make the span cover the whole of the limited operation (ie, the wait, and the operation itself), and just emit a tracing event when the limiter completes (or have a nested span). That means you can just do a with start_active_span(...) as normal.

We already have a span that covers the wait and the operation itself. I'm interested in exposing how long the wait is though.


Failing that, I'd suggest doing this in on_both instead of on_start, and moving the call to start_active_span to just before the definition of on_both, to minimise the amount of code that can throw exceptions between the two.

Moved to this 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm interested in exposing how long the wait is though.

doesn't emitting a tracing event do that though?

Moved to this

ok then :)

@MadLittleMods MadLittleMods requested a review from richvdh August 12, 2022 03:50
@@ -176,8 +177,11 @@ def on_both(r: object) -> object:
# Ensure that we've properly cleaned up.
self.sleeping_requests.discard(request_id)
self.ready_request_queue.pop(request_id, None)
wait_span_scope.__exit__(None, None, None)
Copy link
Member

@richvdh richvdh Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly surprised this works, since wait_span_scope isn't declared at the point on_both is defined. But I assume you've tested it and python closures do in fact work this way :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work but I would hope mypy would complain if it didn't 🙏

@richvdh richvdh self-requested a review August 15, 2022 13:19
MadLittleMods added a commit that referenced this pull request Aug 15, 2022
From feedback in #13499
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@richvdh richvdh merged commit 344a2f7 into develop Aug 15, 2022
@richvdh richvdh deleted the madlittlemods/instrument-FederationStateIdsServlet branch August 15, 2022 18:41
@MadLittleMods
Copy link
Contributor Author

Thanks for the review and merge @richvdh 🐊

MadLittleMods added a commit that referenced this pull request Aug 17, 2022
squahtx pushed a commit that referenced this pull request Aug 18, 2022
Introduced in #13499.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx pushed a commit that referenced this pull request Aug 18, 2022
Introduced in #13499.

Signed-off-by: Sean Quah <seanq@matrix.org>
DMRobertson pushed a commit that referenced this pull request Aug 23, 2022
Synapse 1.66.0rc1 (2022-08-23)
==============================

This release removes the ability for homeservers to delegate email ownership
verification and password reset confirmation to identity servers. This removal
was originally planned for Synapse 1.64, but was later deferred until now.

See the [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details.

Features
--------

- Improve validation of request bodies for the following client-server API endpoints: [`/account/password`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpassword), [`/account/password/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpasswordemailrequesttoken), [`/account/deactivate`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountdeactivate) and [`/account/3pid/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidemailrequesttoken). ([\#13188](#13188), [\#13563](#13563))
- Add forgotten status to [Room Details Admin API](https://matrix-org.github.io/synapse/latest/admin_api/rooms.html#room-details-api). ([\#13503](#13503))
- Add an experimental implementation for [MSC3852 (Expose user agents on `Device`)](matrix-org/matrix-spec-proposals#3852). ([\#13549](#13549))
- Add `org.matrix.msc2716v4` experimental room version with updated content fields. Part of [MSC2716 (Importing history)](matrix-org/matrix-spec-proposals#2716).  ([\#13551](#13551))
- Add support for compression to federation responses. ([\#13537](#13537))
- Improve performance of sending messages in rooms with thousands of local users. ([\#13522](#13522), [\#13547](#13547))

Bugfixes
--------

- Faster room joins: make `/joined_members` block whilst the room is partial stated. ([\#13514](#13514))
- Fix a bug introduced in Synapse 1.21.0 where the [`/event_reports` Admin API](https://matrix-org.github.io/synapse/develop/admin_api/event_reports.html) could return a total count which was larger than the number of results you can actually query for. ([\#13525](#13525))
- Fix a bug introduced in Synapse 1.52.0 where sending server notices fails if `max_avatar_size` or `allowed_avatar_mimetypes` is set and not `system_mxid_avatar_url`. ([\#13566](#13566))
- Fix a bug where the `opentracing.force_tracing_for_users` config option would not apply to [`/sendToDevice`](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3sendtodeviceeventtypetxnid) and [`/keys/upload`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3keysupload) requests. ([\#13574](#13574))

Improved Documentation
----------------------

- Add `openssl` example for generating registration HMAC digest. ([\#13472](#13472))
- Tidy up Synapse's README. ([\#13491](#13491))
- Document that event purging related to the `redaction_retention_period` config option is executed only every 5 minutes. ([\#13492](#13492))
- Add a warning to retention documentation regarding the possibility of database corruption. ([\#13497](#13497))
- Document that the `DOCKER_BUILDKIT=1` flag is needed to build the docker image. ([\#13515](#13515))
- Add missing links in `user_consent` section of configuration manual. ([\#13536](#13536))
- Fix the doc and some warnings that were referring to the nonexistent `custom_templates_directory` setting (instead of `custom_template_directory`). ([\#13538](#13538))

Deprecations and Removals
-------------------------

- Remove the ability for homeservers to delegate email ownership verification
  and password reset confirmation to identity servers. See [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details.

Internal Changes
----------------

- Update the rejected state of events during de-partial-stating. ([\#13459](#13459))
- Avoid blocking lazy-loading `/sync`s during partial joins due to remote memberships. Pull remote memberships from auth events instead of the room state. ([\#13477](#13477))
- Refuse to start when faster joins is enabled on a deployment with workers, since worker configurations are not currently supported. ([\#13531](#13531))

- Allow use of both `@trace` and `@tag_args` stacked on the same function. ([\#13453](#13453))
- Instrument the federation/backfill part of `/messages` for understandable traces in Jaeger. ([\#13489](#13489))
- Instrument `FederationStateIdsServlet` (`/state_ids`) for understandable traces in Jaeger. ([\#13499](#13499), [\#13554](#13554))
- Track HTTP response times over 10 seconds from `/messages` (`synapse_room_message_list_rest_servlet_response_time_seconds`). ([\#13533](#13533))
- Add metrics to track how the rate limiter is affecting requests (sleep/reject). ([\#13534](#13534), [\#13541](#13541))
- Add metrics to time how long it takes us to do backfill processing (`synapse_federation_backfill_processing_before_time_seconds`, `synapse_federation_backfill_processing_after_time_seconds`). ([\#13535](#13535), [\#13584](#13584))
- Add metrics to track rate limiter queue timing (`synapse_rate_limit_queue_wait_time_seconds`). ([\#13544](#13544))
- Update metrics to track `/messages` response time by room size. ([\#13545](#13545))

- Refactor methods in `synapse.api.auth.Auth` to use `Requester` objects everywhere instead of user IDs. ([\#13024](#13024))
- Clean-up tests for notifications. ([\#13471](#13471))
- Add some miscellaneous comments to document sync, especially around `compute_state_delta`. ([\#13474](#13474))
- Use literals in place of `HTTPStatus` constants in tests. ([\#13479](#13479), [\#13488](#13488))
- Add comments about how event push actions are rotated. ([\#13485](#13485))
- Modify HTML template content to better support mobile devices' screen sizes. ([\#13493](#13493))
- Add a linter script which will reject non-strict types in Pydantic models. ([\#13502](#13502))
- Reduce the number of tests using legacy TCP replication. ([\#13543](#13543))
- Allow specifying additional request fields when using the `HomeServerTestCase.login` helper method. ([\#13549](#13549))
- Make `HomeServerTestCase` load any configured homeserver modules automatically. ([\#13558](#13558))
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Sep 1, 2022
Synapse 1.66.0 (2022-08-31)
===========================

No significant changes since 1.66.0rc2.

This release removes the ability for homeservers to delegate email ownership
verification and password reset confirmation to identity servers. This removal
was originally planned for Synapse 1.64, but was later deferred until now. See
the [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details.

Deployments with multiple workers should note that the direct TCP replication
configuration was deprecated in Synapse v1.18.0 and will be removed in Synapse
v1.67.0. In particular, the TCP `replication` [listener](https://matrix-org.github.io/synapse/v1.66/usage/configuration/config_documentation.html#listeners)
type (not to be confused with the `replication` resource on the `http` listener
type) and the `worker_replication_port` config option will be removed .

To migrate to Redis, add the [`redis` config](https://matrix-org.github.io/synapse/v1.66/workers.html#shared-configuration),
then remove the TCP `replication` listener from config of the master and
`worker_replication_port` from worker config. Note that a HTTP listener with a
`replication` resource is still required. See the
[worker documentation](https://matrix-org.github.io/synapse/v1.66/workers.html)
for more details.

Synapse 1.66.0rc2 (2022-08-30)
==============================

Bugfixes
--------

- Fix a bug introduced in Synapse 1.66.0rc1 where the new rate limit metrics were misreported (`synapse_rate_limit_sleep_affected_hosts`, `synapse_rate_limit_reject_affected_hosts`). ([\matrix-org#13649](matrix-org#13649))

Synapse 1.66.0rc1 (2022-08-23)
==============================

Features
--------

- Improve validation of request bodies for the following client-server API endpoints: [`/account/password`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpassword), [`/account/password/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpasswordemailrequesttoken), [`/account/deactivate`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountdeactivate) and [`/account/3pid/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidemailrequesttoken). ([\matrix-org#13188](matrix-org#13188), [\matrix-org#13563](matrix-org#13563))
- Add forgotten status to [Room Details Admin API](https://matrix-org.github.io/synapse/latest/admin_api/rooms.html#room-details-api). ([\matrix-org#13503](matrix-org#13503))
- Add an experimental implementation for [MSC3852 (Expose user agents on `Device`)](matrix-org/matrix-spec-proposals#3852). ([\matrix-org#13549](matrix-org#13549))
- Add `org.matrix.msc2716v4` experimental room version with updated content fields. Part of [MSC2716 (Importing history)](matrix-org/matrix-spec-proposals#2716).  ([\matrix-org#13551](matrix-org#13551))
- Add support for compression to federation responses. ([\matrix-org#13537](matrix-org#13537))
- Improve performance of sending messages in rooms with thousands of local users. ([\matrix-org#13522](matrix-org#13522), [\matrix-org#13547](matrix-org#13547))

Bugfixes
--------

- Faster room joins: make `/joined_members` block whilst the room is partial stated. ([\matrix-org#13514](matrix-org#13514))
- Fix a bug introduced in Synapse 1.21.0 where the [`/event_reports` Admin API](https://matrix-org.github.io/synapse/develop/admin_api/event_reports.html) could return a total count which was larger than the number of results you can actually query for. ([\matrix-org#13525](matrix-org#13525))
- Fix a bug introduced in Synapse 1.52.0 where sending server notices fails if `max_avatar_size` or `allowed_avatar_mimetypes` is set and not `system_mxid_avatar_url`. ([\matrix-org#13566](matrix-org#13566))
- Fix a bug where the `opentracing.force_tracing_for_users` config option would not apply to [`/sendToDevice`](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3sendtodeviceeventtypetxnid) and [`/keys/upload`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3keysupload) requests. ([\matrix-org#13574](matrix-org#13574))

Improved Documentation
----------------------

- Add `openssl` example for generating registration HMAC digest. ([\matrix-org#13472](matrix-org#13472))
- Tidy up Synapse's README. ([\matrix-org#13491](matrix-org#13491))
- Document that event purging related to the `redaction_retention_period` config option is executed only every 5 minutes. ([\matrix-org#13492](matrix-org#13492))
- Add a warning to retention documentation regarding the possibility of database corruption. ([\matrix-org#13497](matrix-org#13497))
- Document that the `DOCKER_BUILDKIT=1` flag is needed to build the docker image. ([\matrix-org#13515](matrix-org#13515))
- Add missing links in `user_consent` section of configuration manual. ([\matrix-org#13536](matrix-org#13536))
- Fix the doc and some warnings that were referring to the nonexistent `custom_templates_directory` setting (instead of `custom_template_directory`). ([\matrix-org#13538](matrix-org#13538))

Deprecations and Removals
-------------------------

- Remove the ability for homeservers to delegate email ownership verification
  and password reset confirmation to identity servers. See [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details.

Internal Changes
----------------

- Update the rejected state of events during de-partial-stating. ([\matrix-org#13459](matrix-org#13459))
- Avoid blocking lazy-loading `/sync`s during partial joins due to remote memberships. Pull remote memberships from auth events instead of the room state. ([\matrix-org#13477](matrix-org#13477))
- Refuse to start when faster joins is enabled on a deployment with workers, since worker configurations are not currently supported. ([\matrix-org#13531](matrix-org#13531))

- Allow use of both `@trace` and `@tag_args` stacked on the same function. ([\matrix-org#13453](matrix-org#13453))
- Instrument the federation/backfill part of `/messages` for understandable traces in Jaeger. ([\matrix-org#13489](matrix-org#13489))
- Instrument `FederationStateIdsServlet` (`/state_ids`) for understandable traces in Jaeger. ([\matrix-org#13499](matrix-org#13499), [\matrix-org#13554](matrix-org#13554))
- Track HTTP response times over 10 seconds from `/messages` (`synapse_room_message_list_rest_servlet_response_time_seconds`). ([\matrix-org#13533](matrix-org#13533))
- Add metrics to track how the rate limiter is affecting requests (sleep/reject). ([\matrix-org#13534](matrix-org#13534), [\matrix-org#13541](matrix-org#13541))
- Add metrics to time how long it takes us to do backfill processing (`synapse_federation_backfill_processing_before_time_seconds`, `synapse_federation_backfill_processing_after_time_seconds`). ([\matrix-org#13535](matrix-org#13535), [\matrix-org#13584](matrix-org#13584))
- Add metrics to track rate limiter queue timing (`synapse_rate_limit_queue_wait_time_seconds`). ([\matrix-org#13544](matrix-org#13544))
- Update metrics to track `/messages` response time by room size. ([\matrix-org#13545](matrix-org#13545))

- Refactor methods in `synapse.api.auth.Auth` to use `Requester` objects everywhere instead of user IDs. ([\matrix-org#13024](matrix-org#13024))
- Clean-up tests for notifications. ([\matrix-org#13471](matrix-org#13471))
- Add some miscellaneous comments to document sync, especially around `compute_state_delta`. ([\matrix-org#13474](matrix-org#13474))
- Use literals in place of `HTTPStatus` constants in tests. ([\matrix-org#13479](matrix-org#13479), [\matrix-org#13488](matrix-org#13488))
- Add comments about how event push actions are rotated. ([\matrix-org#13485](matrix-org#13485))
- Modify HTML template content to better support mobile devices' screen sizes. ([\matrix-org#13493](matrix-org#13493))
- Add a linter script which will reject non-strict types in Pydantic models. ([\matrix-org#13502](matrix-org#13502))
- Reduce the number of tests using legacy TCP replication. ([\matrix-org#13543](matrix-org#13543))
- Allow specifying additional request fields when using the `HomeServerTestCase.login` helper method. ([\matrix-org#13549](matrix-org#13549))
- Make `HomeServerTestCase` load any configured homeserver modules automatically. ([\matrix-org#13558](matrix-org#13558))

# -----BEGIN PGP SIGNATURE-----
#
# iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmMPT8QACgkQ+84Kzgcy
# GG9CUAv+Pv/iDpE2jKlV7zQ/cagaKCGsFK5jy0+K9Wr215nP89tuhU37bJXsgvVu
# GP3A8k1c/ENPhXwYHLCnnxV3jick1FuVE0W6h0j2PMYeIGNCQhDswytnsQO4JExg
# fGLL4ygCzpe8bFX9+mhIM4z8xkZjZX3lIa8CN2LtRLIo0m7qoT1ZWqdt7kAjj5yL
# XMk+3Y1yq/Y4SHHqgKurBNdwNcwnv7ynchWxTYa12WVTINt26dLV0Syk3p8u2SLl
# 5YNzcDs2TAM7+VxAu7E0AQl426+Ufi122Oj1ZBUG2FxTPLH8Xr18cN2M/at6WxoX
# 8pOkGiuahKKvahw1iCoHAGIC66gFIPxBE9xW4R2SKrQtG4sDuKJI0kvunRV8+cy5
# TuJ9cmdDmJR2vj3P3OULqLXGkWsGNJqfZZF8OWkHEI8LUIXZLrAZocFtlonkr9rV
# Y8r8LxL8Id1rbHAnCXcJnYdaJ6ol0RIObDFpitY/D8BDUONVw/byeOyAEkq/XPrZ
# Ke/9K8sy
# =eg1L
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed Aug 31 13:10:44 2022 BST
# gpg:                using RSA key 58C4E75BC67C92169A7FDD11FBCE0ACE0732186F
# gpg: Can't check signature: No public key

# Conflicts:
#	synapse/api/auth.py
#	synapse/push/baserules.py
#	synapse/push/bulk_push_rule_evaluator.py
#	synapse/push/push_rule_evaluator.py
#	synapse/storage/databases/main/event_push_actions.py
#	tests/server_notices/test_resource_limits_server_notices.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federation T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants