Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Lost logcontexts during federation send timeouts #5604

Closed
richvdh opened this issue Jul 2, 2019 · 8 comments
Closed

Lost logcontexts during federation send timeouts #5604

richvdh opened this issue Jul 2, 2019 · 8 comments
Labels
A-Logging Synapse's logs (structured or otherwise). Not metrics. z-p2 (Deprecated Label)

Comments

@richvdh
Copy link
Member

richvdh commented Jul 2, 2019

My logs are full of warnings like this:

2019-07-02 21:58:57,055 - synapse.http.matrixfederationclient - 411 - INFO - - Failed to send request: [<twisted.python.failure.Failure twisted.internet.defer.CancelledError: >]
2019-07-02 21:58:57,055 - synapse.http.matrixfederationclient - 464 - WARNING - - {PUT-O-10914} [skisr.cs.put.poznan.pl] Request failed: PUT matrix://skisr.cs.put.poznan.pl/_matrix/federation/v1/send/1562090565238: RequestGenerationFailed:[CancelledError()]
2019-07-02 21:58:57,055 - synapse.util.retryutils - 214 - INFO - - Connection to skisr.cs.put.poznan.pl was unsuccessful (<class 'synapse.api.errors.RequestSendFailed'>(Failed to send request: RequestGenerationFailed: [<twisted.python.failure.Failure twisted.internet.defer.CancelledError: >])); backoff now 0
2019-07-02 21:58:57,056 - synapse.storage._base - 482 - WARNING - - Starting db txn 'set_destination_retry_timings' from sentinel context
2019-07-02 21:58:57,056 - synapse.storage._base - 521 - WARNING - - Starting db connection from sentinel context: metrics will be lost
2019-07-02 21:58:57,056 - synapse.util.metrics - 124 - WARNING - - Context has unexpectedly changed from 'federation_transaction_transmission_loop-26347' to 'sentinel'. ('_send_new_transaction')
2019-07-02 21:58:57,057 - synapse.federation.sender.per_destination_queue - 318 - WARNING - - TX [skisr.cs.put.poznan.pl] Failed to send transaction: Failed to send request: RequestGenerationFailed: [<twisted.python.failure.Failure twisted.internet.defer.CancelledError: >]
2019-07-02 21:58:57,058 - synapse.util.logcontext - 294 - WARNING - - Expected logging context federation_transaction_transmission_loop-26347 was lost

Evidently something is dropping a logcontext that shouldn't be.

@richvdh
Copy link
Member Author

richvdh commented Jul 3, 2019

The basis of the problem is that the request is cancelled, but only the .well-known fetch is cancelled - the SRV lookup and main request are attempted anyway, but there is now nothing to unpick the log context. To be honest, the logcontext feels like the least of the problems here.

Failing test on the rav/federation_request_timeout branch

@richvdh
Copy link
Member Author

richvdh commented Jan 14, 2021

likewise, if the SRV lookup takes ages, then when the federation request times out, there is nothing left to catch and report the (eventual) failure of the DNS lookup, which makes it hard to tell what the failure was.

@richvdh
Copy link
Member Author

richvdh commented Sep 20, 2021

@squahtx is this fixed by #10810?

@squahtx
Copy link
Contributor

squahtx commented Sep 20, 2021

@squahtx is this fixed by #10810?

The initial issue is fixed at least. I haven't done anything to address the 2nd and 3rd comments above.

@richvdh
Copy link
Member Author

richvdh commented Sep 20, 2021

ok, please remember to add references to any relevant issues when you create a PR, to make it easier to cross-reference things later ;)

I'm going to go ahead and close this, and open new issues for the remaining problems.

@richvdh richvdh closed this as completed Sep 20, 2021
@richvdh
Copy link
Member Author

richvdh commented Sep 20, 2021

filed as #10855

@richvdh
Copy link
Member Author

richvdh commented Oct 21, 2021

we're still seeing reports of this in Synapse 1.45, so I guess it's not fully fixed.

@richvdh richvdh reopened this Oct 21, 2021
@richvdh
Copy link
Member Author

richvdh commented Oct 23, 2021

we're still seeing reports of this in Synapse 1.45, so I guess it's not fully fixed.

it appears this was an incorrect report: the server in question was in fact running 1.43 at the time the messages were logged.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Logging Synapse's logs (structured or otherwise). Not metrics. z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

3 participants