User-Interactive Auth sessions need to be replicated across the processes of a worker-mode deployment #6705

anoadragon453 · 2020-01-14T16:44:49Z

Description

So I was trying to figure out why the /auth/xxx/fallback/web was working on a local homeserver instance but not on matrix.org.

In this flow, there's a sessions dict which governs the current state of your login/registration progress. When you complete a auth stage, your session in this dict is modified. This sessions dict is available to registration and login methods.

Turns out that in the recommended worker setup, you can end up with two workers, each with their own instance of sessions. One tracks your login progress, and another that gets modified when you successfully complete web fallback. However, because they are separate, you don't actually see any progress when you query /register after successfully completing web fallback registration stages.

Essentially /auth/xxx/fallback/web needs to get routed to the same worker as /register.

Synapse v1.8.0

The text was updated successfully, but these errors were encountered:

anoadragon453 · 2020-01-14T16:51:23Z

Also slightly worrying as I think mobile use this to do recaptcha...?

richvdh · 2020-02-23T10:24:44Z

We should investigate if this is actually breaking clients.

deepbluev7 · 2020-02-23T10:59:38Z

Yes, it is breaking clients. I can confirm that, as I just implemented the user interactive auth part of registration in nheko. Basically we try the first stage, but after the user completed the challenge and clicks okay, synapse still returns completed: []. I'm pretty sure I fixed our bugs and the fault is now with synapse/matrix.org in this case. If you want to verify that, it should be possible to test it with the latest nightly build of nheko (still building atm).

richvdh · 2020-03-06T16:41:26Z

thinking about this again: it doesn't need any more investigation. It just needs an update to the documentation, and an update to the matrix.org deployment (and worker-based modular.im deployments).

clokep · 2020-03-06T23:07:22Z

I'm going to do this.

richvdh · 2020-03-07T10:23:09Z

Thanks for picking this up @clokep. Just a reminder to us both that the update needs deploying on the matrix.org and modular.im infrastructure - I suggest you ping @benbz and @jaywink next week to see about getting that done.

deepbluev7 · 2020-03-07T14:12:19Z

Does this fix all cases, where auth fallbacks are used? I'd imagine all requests, that use interactive authentification, need to be routed to the same endpoint as the request, that requires auth. For example I believe that the auth fallback for device deletion needs to be routed to the same worker, that handles that request. Or am I missing something like this not being handled by a worker, so it should work fine?

richvdh · 2020-03-07T18:00:25Z

ohhhhh gosh that's a good point. this is going to be much harder to fix.

clokep · 2020-03-09T14:08:53Z

I took another look at what uses the sessions object from the auth handler (nothing directly uses it outside of the class, but there's a variety of indirect users via _get_session_info and _save_session).

I think the following patterns use it:

/register$
/auth/.*/fallback/web$
/account/password$
/account/deactivate$
/account/3pid/add$
/delete_devices
/devices/.+$ (deleting a device)
/keys/device_signing/upload$
/login$ (for non-JWT and non-token logins)

The first two on this list are now documented, not sure if we just need to document the rest of these or if there's more to do. Just documenting them seems fairly fragile though (we will likely run into this again in the future).

deepbluev7 · 2020-03-10T06:51:55Z

Yeah, those endpoints match my expectation, of where I know UI auth is used. I think just documenting them is a bit brittle. I can't even find some of them in the doc, so I guess those are handled by the master process? I think it would make the most sense, if all of those could be handled by the client_reader and successive requests from the same client to the client reader were routed to the same instance of the client_reader, but I don't really know where the cut-off for that would be (5 minutes?). I think there should be some way for workers to cross-talk maybe, so that they can continue fallback auth, if needed. Since this only affects requests that contain /auth/.*/fallback/web or auth inside the request body, maybe a better solution than documenting it could be found? Maybe if redis is used for communication between master processes, that could be extended to UI auth?

clokep · 2020-03-10T11:03:40Z

Maybe if redis is used for communication between master processes, that could be extended to UI auth?

This is where my mind went to as well, especially with the work that @erikjohnston has been doing.

richvdh · 2020-03-17T10:54:52Z

interim solutions to this could be:

just document that everything that does UI auth has to go to the master - though that is unsatisfactory because some of them can be quite high-cpu (particularly the bcrypting part of password-setting/registration)
create an internal http API on the master, which just handles UI auth, and which workers can call.

clokep · 2020-03-27T12:28:44Z

Somewhat related, #649 (comment) notes that UI auth sessions get dropped over restarts.

clokep · 2020-04-01T14:25:07Z

In addition to #6705 (comment) we could attempt to solve this by persisting the data more permanently:

Use Redis (this could be nice since I think it has utilities to auto-expire content), but I'm unsure of the final plan for Redis for workers in the future. It also means that the implementation for worker vs. non-worker will diverge more heavily, but I don't think it would be terrible to abstract.
Persist the data into SQL. This means we'll need to run some sort of periodic task to prune data, but besides that I don't see a ton of downsides.

richvdh · 2020-04-02T14:00:00Z

sticking it in the db sounds good

clokep · 2020-04-30T17:48:08Z

Fixed by #7302.

richvdh · 2020-05-07T12:50:51Z

Also slightly worrying as I think mobile use this to do recaptcha...?

For the record, no, none of riot-ios/riot-andriod/riotX use fallback auth for recaptcha.

neilisfragile added z-bug (Deprecated Label) z-p2 (Deprecated Label) labels Jan 22, 2020

richvdh added the p1 label Feb 23, 2020

deepbluev7 mentioned this issue Feb 23, 2020

Cannot get past ReCaptcha when creating new account Nheko-Reborn/nheko#97

Closed

richvdh removed the z-p2 (Deprecated Label) label Mar 6, 2020

richvdh mentioned this issue Mar 6, 2020

User-interactive Auth doesn't support SSO flows #5667

Closed

clokep self-assigned this Mar 6, 2020

clokep mentioned this issue Mar 6, 2020

Document that fallback auth must be routed to the same worker node as register #7048

Merged

This was referenced Apr 10, 2020

Convert auth handler to async/await #7261

Merged

Explicitly create UI Auth sessions #7268

Merged

richvdh changed the title ~~Worker configuration for web fallback registration is incorrect~~ User-Interactive Auth sessions need to be replicated across the processes of a worker-mode deployment Apr 17, 2020

clokep mentioned this issue Apr 17, 2020

Persist user interactive authentication sessions #7302

Merged

3 tasks

t3chguy mentioned this issue Apr 21, 2020

Account deactivation is not working if logged in using SSO element-hq/element-web#13310

Closed

clokep closed this as completed Apr 30, 2020

clokep mentioned this issue May 19, 2020

Potential bug when using SAML and workers might result in "Unsolicited response" errors #7530

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

User-Interactive Auth sessions need to be replicated across the processes of a worker-mode deployment #6705

User-Interactive Auth sessions need to be replicated across the processes of a worker-mode deployment #6705

anoadragon453 commented Jan 14, 2020

anoadragon453 commented Jan 14, 2020

richvdh commented Feb 23, 2020

deepbluev7 commented Feb 23, 2020 •

edited

Loading

richvdh commented Mar 6, 2020 •

edited

Loading

clokep commented Mar 6, 2020

richvdh commented Mar 7, 2020

deepbluev7 commented Mar 7, 2020

richvdh commented Mar 7, 2020

clokep commented Mar 9, 2020

deepbluev7 commented Mar 10, 2020

clokep commented Mar 10, 2020

richvdh commented Mar 17, 2020

clokep commented Mar 27, 2020

clokep commented Apr 1, 2020

richvdh commented Apr 2, 2020

clokep commented Apr 30, 2020

richvdh commented May 7, 2020

User-Interactive Auth sessions need to be replicated across the processes of a worker-mode deployment #6705

User-Interactive Auth sessions need to be replicated across the processes of a worker-mode deployment #6705

Comments

anoadragon453 commented Jan 14, 2020

Description

anoadragon453 commented Jan 14, 2020

richvdh commented Feb 23, 2020

deepbluev7 commented Feb 23, 2020 • edited Loading

richvdh commented Mar 6, 2020 • edited Loading

clokep commented Mar 6, 2020

richvdh commented Mar 7, 2020

deepbluev7 commented Mar 7, 2020

richvdh commented Mar 7, 2020

clokep commented Mar 9, 2020

deepbluev7 commented Mar 10, 2020

clokep commented Mar 10, 2020

richvdh commented Mar 17, 2020

clokep commented Mar 27, 2020

clokep commented Apr 1, 2020

richvdh commented Apr 2, 2020

clokep commented Apr 30, 2020

richvdh commented May 7, 2020

deepbluev7 commented Feb 23, 2020 •

edited

Loading

richvdh commented Mar 6, 2020 •

edited

Loading