Uniform login across cities #1381
Comments
This is now really important, since each city's deployment links to every other city, and when you switch cities you have to register a new account (which is very strange). |
Yeah this is one of those large projects we could assign to a talented undergrad. |
A simple solution that avoids the mess of making a proper distributed system is to just modify the code for sign in/admin panel so that they connect to all city databases and query all of them. This also has the added benefit of letting us keep our current infrastructure, keeping it easy to deploy different versions of the site to different cities, and keeping inter-city data physically independent. Another solution that I like less is to have a single database that all cities use. For the tables that need to, add a city column to keep track of the city the user is auditing (although as I'm writing this I'm realizing that we may not even need to keep track of that because things like pano IDs should contain that information). This would sting more as time goes on because this single database will grow too big and/or our throughput will be too large to handle it. |
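A minimal sketch of the first option above (sign-in looks the account up in every city database), assuming a hypothetical `users` table with `email` and `username` columns and using in-memory SQLite as a stand-in for the per-city Postgres databases. The names `make_city_db` and `find_account` are illustrative only and do not come from the Sidewalk codebase:

```python
import sqlite3


def make_city_db(rows):
    """Create a stand-in city database holding a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (email, username) VALUES (?, ?)", rows)
    return conn


def find_account(city_dbs, email):
    """Return (city, username) for the first city that knows this email, else None."""
    for city, conn in city_dbs.items():
        row = conn.execute(
            "SELECT username FROM users WHERE email = ?", (email,)
        ).fetchone()
        if row is not None:
            return city, row[0]
    return None


# Two independent city databases, queried in turn at sign-in.
city_dbs = {
    "dc": make_city_db([("alice@example.org", "alice")]),
    "seattle": make_city_db([("bob@example.org", "bob")]),
}
account = find_account(city_dbs, "bob@example.org")
```

Each lookup costs one query per city in the worst case, so a failed sign-in always touches every database. Password checking is left out of the sketch on purpose.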
Just gonna toss in my two cents here, I like the solution 1 idea of having each server utilize the other servers to authenticate. While I'm not super familiar with modern authentication mechanisms, I think this might be a good place to utilize JSON Web Tokens (JWT). If my understanding of JWT is correct, it lets us store user details (e.g. username, email address, logged-in status) in a cookie on the client side, but with an added signature from the server so that the user cannot tamper with it. If we use the same signing key across all sites, then I think every site will be able to read/verify a JWT cookie set by any of the others. This would let us avoid having to set up cross-site database access. The JWT idea was briefly brought up in #1347, but I think it might be very relevant to this issue as well. |
I think the first solution sounds decent... And I don't think that checking all other servers for authentication would take too much time..? Some things that we should keep in mind while implementing:
Building off that last point, what if instead of checking all the servers for credentials when someone logs in, we just create an account with the same credentials on every server when someone signs up? |
I imagine the number of people with duplicates isn't that high if you exclude the Sidewalk team. We could do this in 2 steps where in step 1 we prevent people from making duplicates across the servers and send people with current duplicates an email with instructions on changing their email/username, and in step 2 we change the email/username of some duplicate users and give them instructions to recover their account.
That's a good point. I like the idea of replicating their accounts on all the databases. That would solve this. Point 3 also brings up a lot of concurrency issues.
It sounds like if we let ourselves have 1 master server, we could solve all of these by making it generate unique IDs and act as a "lock manager" for accounts. I'm not great with distributed systems though so that might be an awful solution. Perhaps Postgres has some fancy replication features that enforce consistency? |
Probably right. I just did a quick check and found 3 non-researcher email addresses that are present in both the DC and Seattle databases.
This sounds the most promising to me. I'm also not an expert on distributed systems, so I think it would be best to research what others are doing in this regard (Stack Exchange, for example?). I feel like a dedicated authentication server (that then propagates the info out) may be our best bet though. Although a failure of that single server would prevent users from authenticating anywhere, hopefully the authentication server wouldn't be doing too much and would be unlikely to fail frequently :) It seems like a central server for authentication would solve a lot of potential concurrency issues. Again, I think researching what others do in our situation is really important. If you can't find answers online, don't hesitate to post on Stack Overflow, etc. Also would love to know if @athersharif has any thoughts about this thread? |
Gonna take a break from this one since the quarter is almost over. I think when we continue it, we should try to keep it simple since it only affects a small number of users. We could also do a simple version first and then transition to a properly distributed system once we need to handle more users. |
This came up again for our Taiwan deployments where it particularly makes sense to have a unified login:
|
After our most recent PR (#3429) and the accompanying server-side changes, this should soon be possible! We are now going to have a single database, where each city is in its own schema. Since all cities will be sharing a database, we should be able to add an additional schema for user authentication, and every city should be able to share it! There will be some pain as we attempt to merge accounts for the same user that were created in multiple cities (possibly with different usernames/emails/passwords), but the pain will be worth it!! |
A couple clarifying questions:
|
Totally agree that we should have a single authentication schema within the same database. The question is whether this authentication system references existing logins from each city's schema... I think that what we ultimately want to do is to transfer the logins to a central schema, and then fully remove the authentication data from the schemas for the individual cities.
Let's keep the PR as small as possible. We can create new issues for things like unified view of user data, unified admin data, Gallery, etc. I think that this PR should focus solely on the authentication:
|
@misaugstad Here are some notes on what I've learned about the authentication system with Play Silhouette that seem relevant to this ticket:
SilhouetteModule.scala
This file defines the Silhouette Environments, Services, and Providers in the authentication system.
Environment
The Silhouette environment is an object which defines the active user, authenticator service, credentials provider, and event bus.
AuthenticatorService
Uses the
CredentialsProvider
Checks the credentials given by the user and authenticates them by checking against the credentials stored in the database.
The diagram below illustrates how the system connects. The green blocks are the database tables that the authentication system interacts with, and the orange blocks are the implementations of the DAOs.
Some things I couldn't figure out that seem relevant:
Silhouette framework documentation |
Here's a plan for tackling this ticket and some questions:
However, this seems slow and adds unnecessary runtime on every failed login attempt, and also keeps old unused account data in the old schemas.
|
Thank you so much!! Will take a look later!
I like it!
I also like it!
I imagine that this is a one-time thing! I think that I would make a database dump as a backup for every city, then we do the one-time transfer and drop the tables from the city-specific schemas. Ideally the application should function like it's had unified login from the very beginning, so doing a one-time transfer and not having to continually check for data in the old schemas is ideal! |
Offboarding soon so here's an update on where this ticket is at: Commit with most recent changes: 9999808 Done so far:
To-do
Additional Notes
|
Thank you for this, this is super helpful!! Some other stuff I thought of while reading this:
@davphan one question: Why are the evolutions all split into different files? Is that just to make it easier to code and understand what's going on? Or was there a more practical reason? |
That makes much more sense now, there's an
will need to be modified, on top of adding that table to the Login Schema and any other plain SQL queries that reference that table.
I started off having everything in one SQL file, but starting the website produced errors suggesting that the queries weren't being run in order (like the table creations would fail saying the schema had not been created, etc.). When I split the queries up so that the necessary preceding queries were in earlier SQL files, those errors disappeared. I'm not super familiar with how evolution files are compiled in Scala, so I'm not sure if that fixed it or if there was something else involved that I'm not aware of. |
Thank you so much for all of your work on this @davphan! You really were 90% of the way there in terms of getting the schemas set up and such! I've got it working on my local dev environment when moving the data from a single city into this schema. Now it's all about merging the existing data! Just to continue to remind myself: I haven't tested password resets yet. Signing in/up/out is working correctly though! Unfortunately, I've come to the conclusion that we're not going to be able to do this automagically through an evolutions file. We're going to have a short downtime when I will migrate the data to the new centralized authentication schema. Assuming that nothing goes wrong, the downtime should be incredibly short! The plan right now is to write all auth data from each city to a CSV, write a Python script that will merge the data and output a new CSV, then import that data into the new schema. Thankfully, I can essentially do that whole process locally using the production data as it exists now, and I can figure out every edge case that we currently have. New edge cases don't pop up frequently, so I should be able to fully test everything before we actually try it out on prod. I've already started working through some edge cases and cleaning up some anomalies in the data. And then we can of course do the migration of the data on the test servers as a dry run of prod. It won't have all the data from prod (though I suppose that we could copy all the prod dbs over if we really wanted to 🤔), but it will give us the opportunity to debug any issues related to database permissions, etc. that I wouldn't be able to test on my local machine. I'm going to be incredibly careful with this, because we're talking about users being able to log in! I plan to ensure that absolutely nothing breaks during this migration :) |
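A hedged sketch of what the merge step in that plan could look like, with one CSV export per city combined into one merged CSV. The column names (`email`, `username`), matching accounts on lowercased email, and keeping the first username seen are all assumptions for illustration; the actual migration scripts are not shown in this thread:

```python
import csv
import io


def merge_city_exports(exports):
    """Merge per-city CSV text (header: email,username) into one account per email.

    Returns (merged_rows, conflicts). A conflict is recorded when the same
    email appears in a later city with a different username.
    """
    merged = {}
    conflicts = []
    for city, text in exports.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = row["email"].strip().lower()
            if key not in merged:
                merged[key] = {"email": key, "username": row["username"], "cities": [city]}
                continue
            kept = merged[key]
            kept["cities"].append(city)
            if kept["username"] != row["username"]:
                conflicts.append((key, kept["username"], row["username"], city))
    return list(merged.values()), conflicts


def to_csv(rows):
    """Serialize merged rows back to CSV text for import into the new schema."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["email", "username", "cities"])
    for row in rows:
        writer.writerow([row["email"], row["username"], ";".join(row["cities"])])
    return out.getvalue()


# Hypothetical exports from two cities with one overlapping account.
exports = {
    "dc": "email,username\nAlice@example.org,alice\nbob@example.org,bob\n",
    "seattle": "email,username\nalice@example.org,alice_s\ncarol@example.org,carol\n",
}
merged_rows, conflicts = merge_city_exports(exports)
merged_csv = to_csv(merged_rows)
```

Returning the conflicts separately, instead of resolving them silently, leaves room to review each edge case by hand before anything is imported.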
Lots of progress today! I believe that I've fixed all the anomalies in the data, and have the script set up to clean and prep the data. What's left:
|
I posted this on the PR, but I'll post it here as well! As promised, I have uploaded the scripts and whatnot that I used to do the data migration. They are in this Google Drive folder which only @jonfroehlich and I have access to! |
would be nice to have:
Essentially, to the user, everything should look unified even if the backend is modularized and split.