Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event with invalid auth_events #3695

Closed
timokoesters opened this issue Feb 2, 2022 · 8 comments
Closed

Event with invalid auth_events #3695

timokoesters opened this issue Feb 2, 2022 · 8 comments
Labels
clarification An area where the spec could do with being more explicit

Comments

@timokoesters
Copy link

What should happen if an event has at least one invalid auth event?

Synapse currently fails with

Dropping event , which relies on auth_event $qK, which could not be found

But Conduit drops the invalid auth events and tries the auth checks anyway.

Dropping bad events makes Matrix more fault tolerant / easier to repair. For instance, If a member event breaks, one can just leave and rejoin. Currently the rejoin will fail on Synapse because the old broken member event is still in the chain

@timokoesters timokoesters added the clarification An area where the spec could do with being more explicit label Feb 2, 2022
@erikjohnston
Copy link
Member

The problem is that e.g. state res v2 it assumes that all servers has the full copy of the auth chain. So if a server doesn't have the full copy then it may come to different conclusions about the state.

Dropping bad events makes Matrix more fault tolerant / easier to repair. For instance, If a member event breaks, one can just leave and rejoin. Currently the rejoin will fail on Synapse because the old broken member event is still in the chain

What do you mean by break here?

@richvdh
Copy link
Member

richvdh commented Feb 2, 2022

Is this a subset of https://github.com/matrix-org/matrix-doc/issues/1646? In particular:

We should also define the behaviour when a validation check is not met: should we ignore such events, or reject incoming pushes? For pulls, should we try to get the events from elsewhere?

@timokoesters
Copy link
Author

So if a server doesn't have the full copy then it may come to different conclusions about the state

If a server leaves out broken auth_events and the auth checks are all good, then taking more auth events should not make the auth checks suddenly fail.

With "broken" I mean events where the origin server thinks it's alright, but other servers don't accept it at all, usually caused by bugs in server implementations (e.g. event id calculation is wrong, signatures are invalid)

@richvdh
Copy link
Member

richvdh commented Feb 2, 2022

taking more auth events should not make the auth checks suddenly fail.

this seems to be a bad assumption. It's quite possible to create an event which will pass auth at a subset of its auth events, but not on all of its auth events.

But the debate is academic - as Erik says, the state res algorithm relies on the auth chain to define an partial ordering for events in the room. If some servers don't have all the auth events, they will reach different conclusions about the ordering of events, and hence different conclusions about the state of the room at each point on the DAG.

In short: you should not accept events where some of the auth events are missing/unobtainable/unacceptable. And yes, if an auth event becomes unacceptable due to somebody expiring or changing a signing key, we have a whole lot of problems.

@timokoesters
Copy link
Author

@richvdh I understand, leaving out the auth event could leave out an entire branch of the DAG.

Here's a different approach to solve the "If a member event breaks, one can just leave and rejoin" problem. We always leave out the previous member event from the auth_events when doing a leave->* transition.

Does that sound alright? The idea is that a "leave" membership and no membership at all are equivalent, which I think is correct.

@richvdh
Copy link
Member

richvdh commented Feb 2, 2022

That's fine for event auth, but not for state resolution.

The idea is that a "leave" membership and no membership at all are equivalent,

Unfortunately, they aren't equivalent from the point of view of state resolution, which needs the full chain to know which of two "leave"->"join" transitions came first.

@timokoesters
Copy link
Author

Shouldn't state res then fall back to timestamps? It will still be deterministic

@erikjohnston
Copy link
Member

Shouldn't state res then fall back to timestamps? It will still be deterministic

Deterministic, yes, but it won't necessarily come to the "right" conclusion. We specifically added leaves as auth events to make it less likely that state would "reset" and cause users to incorrectly be set to "leave" again.

Dropping bad events makes Matrix more fault tolerant / easier to repair. For instance, If a member event breaks, one can just leave and rejoin. Currently the rejoin will fail on Synapse because the old broken member event is still in the chain

It's worth noting that this is basically the only situation where we add a superfluous auth event (i.e. rejoining a public room), so it's quite a niche issue.

francescagiacco pushed a commit to prototypefund/conduit that referenced this issue Feb 14, 2022
The bug that caused broken joins was fixed here:
https://gitlab.com/famedly/conduit/-/commit/a5f004d7e9c783caf280884d7fd332c7bafa67ce

But that commit doesn't repair those joins afterwards because the broken
join event is still in the auth chain. This commit will leave out the
the member join event from the auth_events in leave->* transitions.

Also see matrix-org/matrix-spec-proposals#3695
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification An area where the spec could do with being more explicit
Projects
None yet
Development

No branches or pull requests

3 participants