This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

2/3 of non-email push notifications seem to not go through #7075

Closed
babolivier opened this issue Mar 13, 2020 · 3 comments
Labels
A-Push Issues related to push/notifications T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. z-bug (Deprecated Label)

Comments

@babolivier
Contributor

(This might affect email notifications as well, but we don't have enough stats for them.)

We've received reports that some users weren't receiving push notifications, either via email or on their devices (i.e. via APNS or FCM). The graph for HTTP push (the latter category) on matrix.org seems to show a 2/3 drop on February 27th:

[graph: HTTP push notifications on matrix.org]

The pusher worker's logs seem to indicate the same trend:

$ zgrep "Received response to POST http://[SYGNAL'S IP]/_matrix/push/v1/notify" pusher1.log-20200218.gz | wc -l
3153656

(on Feb 18th)

$ zgrep "Received response to POST http://[SYGNAL'S IP]/_matrix/push/v1/notify" pusher1.log-20200310.gz | wc -l
931077

(on Mar 10th)
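The two counts above can be turned into a percentage drop, and the same count can be run over every rotated log to find the exact day the rate changed. This is a minimal sketch, assuming daily gzipped logs named like `pusher1.log-YYYYMMDD.gz` (as in the two files above); `SYGNAL_URL` and `SYGNAL_HOST` are hypothetical placeholders for the redacted Sygnal address, and `count_notify`/`drop_pct` are illustrative helper names.

```shell
#!/bin/sh
# Hypothetical placeholder for the redacted Sygnal address; substitute the real one.
SYGNAL_URL="http://SYGNAL_HOST/_matrix/push/v1/notify"

# Count responses to Sygnal notify requests in one gzipped pusher log
# (same zgrep | wc -l approach as above; tr strips padding some wc versions add).
count_notify() {
    zgrep "Received response to POST $SYGNAL_URL" "$1" | wc -l | tr -d ' '
}

# Percentage drop from a "before" count ($1) to an "after" count ($2),
# printed with one decimal place.
drop_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

# Per-day counts for every rotated pusher log in the current directory,
# assuming the pusher1.log-YYYYMMDD.gz naming seen above.
for f in pusher1.log-*.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '%s %s\n' "$f" "$(count_notify "$f")"
done

# The Feb 18th and Mar 10th counts quoted above.
drop_pct 3153656 931077   # prints 70.5
```

A drop of about 70% is in line with the roughly 2/3 drop on the graph. Counting every day, rather than two samples, would show whether the rate fell in one step on the deployment date.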

Feb 27th coincides with a deployment of Synapse on matrix.org, so it seems fair to say this is probably a Synapse issue, caused by a change made between Feb 19th (the previous deployment, which didn't seem to have this bug) and Feb 27th. #6964 seems like a likely suspect, though I didn't get far enough in my investigation to confirm this theory.

According to @manuroe, a user can temporarily fix this issue by restarting their client, which causes it to re-register the pusher. However, it's only a temporary fix: a device fixed this way can stop receiving notifications again after a short while (I think @dbkr experienced that).

Another point of interest: while investigating @jryans's missing email notifs (which I thought might come from the same issue, though I have no evidence either way right now), I saw that Synapse's database (in the pushers table) claimed the last successful email notification to his address was sent two days ago, while he hasn't received any since Feb 27th. This might be unrelated, but since the date coincides with this bug, and email pushes don't seem to be entirely broken (I managed to get an email notif sent to an address associated with my matrix.org account earlier today), I thought it was worth mentioning here.

My first hunch was that the pusher wasn't being replicated correctly from the master to the pusher worker. That hunch came from the fact that, for Ryan's account, the replication notifier would log "Streaming: pushers -> [PUSHER ID]" for the HTTP pusher(s) but not for the email one (the problematic one). However, it looks like it only logs that for HTTP pushers, so the problem might not be coming from there.
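The log check described above can be done per pusher ID in one pass. This is a sketch only: the message text is the one quoted above, but which log file carries it (e.g. the master's) and the file name in the usage line are assumptions, and `streamed_pushers` is a hypothetical helper name.

```shell
#!/bin/sh
# Tally "Streaming: pushers -> <id>" log lines per pusher ID in one gzipped log,
# most frequently streamed IDs first.
streamed_pushers() {
    zgrep "Streaming: pushers -> " "$1" \
        | sed 's/.*Streaming: pushers -> //' \
        | sort | uniq -c | sort -rn
}

# Hypothetical usage (log file name is an assumption):
# streamed_pushers master.log-20200310.gz
```

As noted above, this message seems to be logged only for HTTP pushers, so an email pusher's ID being absent from the tally isn't, on its own, evidence of a replication problem.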

@clokep
Member

clokep commented Mar 16, 2020

Quoting the description: "Feb 27th seems to match with a deployment of Synapse on matrix.org, so it seems fair to say this is probably a Synapse issue and the cause should be a change that happened between then and Feb 19th (the previous deployment, which didn't seem to have that bug). #6964"

I went back through and it seems like the range of commits to look at is: e3d811e...69111a8
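One way to narrow down a range like this is to list only the commits that touched the relevant paths. This is a sketch, assuming a local Synapse checkout and that push code lives under `synapse/push/` (startup code such as `synapse/app/` may also matter, so extra paths can be passed); `push_commits` and `range_stat` are hypothetical helper names.

```shell
#!/bin/sh
# Commits in a range that touched the given paths (default: synapse/push/).
# $1 = repo path, $2 = older commit, $3 = newer commit, rest = paths.
push_commits() {
    repo="$1"; from="$2"; to="$3"; shift 3
    # Default path is an assumption about the repository layout.
    [ "$#" -gt 0 ] || set -- synapse/push/
    # Two-dot range: commits reachable from $to but not from $from.
    git -C "$repo" log --oneline "$from..$to" -- "$@"
}

# Overall size of the change, comparable to GitHub's compare view
# (three-dot diff: from the merge base of $2 and $3 up to $3).
range_stat() {
    git -C "$1" diff --shortstat "$2...$3"
}

# Usage with the range above:
# push_commits . e3d811e 69111a8 synapse/push/ synapse/app/
# range_stat . e3d811e 69111a8
```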

There's quite a bit in there (130 files changed, +1.7k, -2.9k). The PRs that seemed to touch push-related code are:

@clokep
Member

clokep commented Mar 17, 2020

A few notes from today (unsure if these are useful, but want to write them down):

  • The incoming data rate for the pusher worker does not seem to have changed:
    [graph: incoming data rate for the pusher worker]

  • The number of badge updates processed also seems to dip around the same time as the HTTP pushes (see the graph in the description):
    [graph: badge updates processed]

@richvdh
Member

richvdh commented Mar 18, 2020

It looks like, since https://github.com/matrix-org/synapse/pull/6964/files#diff-9b951d362503f8f7a6b4fd290f703601L216, we are no longer calling PusherPool.start.

It's unclear why push was working at all, but still: #7104 should fix it.

@richvdh richvdh closed this as completed Mar 19, 2020
@MadLittleMods MadLittleMods added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Push Issues related to push/notifications labels Aug 17, 2021