Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add login using ORCID oauth #2276

Merged
merged 15 commits into from
Jan 8, 2025
Merged

Conversation

matkaczmarek
Copy link
Contributor

@matkaczmarek matkaczmarek commented Aug 20, 2024

Change introduces three new endpoints:

  • authorcid_login -> which exchanges token with ORCID using Oauth and logs in the user if ORCID is already connected to the existing user or redirects to orcid_register for new users
  • orcid_register -> present a form to register new users and links ORCID to it once registered
  • orcid_init_login -> builds URL and redirects to ORCID for authorization

This PR also introduces changes in the login view and SSO login view:
image

@matkaczmarek matkaczmarek force-pushed the feature/orcid_login_1243 branch 2 times, most recently from d2dc9b3 to 03a3e01 Compare August 20, 2024 15:01
@matkaczmarek matkaczmarek marked this pull request as ready for review August 22, 2024 14:59
@tompollard tompollard changed the title Add logging using ORCID oauth Add login using ORCID oauth Sep 5, 2024
@bemoody
Copy link
Collaborator

bemoody commented Sep 19, 2024

There are a few issues I see here - some small, some larger - but first we need to understand the high level functionality.

We currently have the option to add an ORCID ID to a user's profile. Is it your intention that someone who has already done that should now be able to log in via ORCID?

What does "log in via ORCID" mean, exactly? What is the process that the user goes through (explain the steps for somebody who doesn't understand OAuth) and how does this process prove that the person sitting at the keyboard is the person with that ORCID ID?

You also need to explain how "registration" works: describe at a high level what steps are taken, from when a new user first visits the site, until they are fully registered and logged-in.

@matkaczmarek
Copy link
Contributor Author

We currently have the option to add an ORCID ID to a user's profile. Is it your intention that someone who has already done that should now be able to log in via ORCID?

Yes, it was my intention that someone who has already linked an ORCID ID profile should now be able to log in using ORCID. In my opinion, those shouldn't be separate, user still have the option to log in using a username and password. Moreover, linking ORCID ID uses the same Oauth mechanism to authorize a user. It would be a bad user experience that once accounts are linked, users cannot use all linked account benefits and have to link ORCID one more time just to log in.

What does "log in via ORCID" mean, exactly? What is the process that the user goes through (explain the steps for somebody who doesn't understand OAuth) and how does this process prove that the person sitting at the keyboard is the person with that ORCID ID?

From a user's perspective, once the user clicks Log in via ORCID (maybe we should rephrase it to Log in using ORCID), they are redirected to the ORCID authorization page where the user logs in to ORCID platform using ORCID credentials (here they are proving that the person sitting at the keyboard is the person with that ORCID ID). Once it's done we are redirected to our Physionet page and logged in if authorization was correct.

From a technical point of view, when the user is redirected to Physionet page after authorization, Physionet and Orcid exchange first the code, then the token, which consists of encrypted Orcid user data, such as ORCID ID. This token could also be used to access more ORCID user data, but we are not using this functionality. On our site, we are validating the token against malicious behavior. If everything is correct and this ORCID ID is linked to some user in Physionet then this user is logged in.

You also need to explain how "registration" works: describe at a high level what steps are taken, from when a new user first visits the site, until they are fully registered and logged-in.

If, after the authorization process, we see that this ORCID ID is not linked to any account, then we redirect users to a standard registration form, where they have to create a Physionet account providing a username, email, and password. Orcid profile is automatically linked to the new account using data from the Orcid authorization token. If an account is not created then nothing happens.

@bemoody
Copy link
Collaborator

bemoody commented Sep 23, 2024

On our site, we are validating the token against malicious behavior.

What is validated? What kind of malicious behavior? Please be specific.

This article seems like a good place to start: https://oauth.net/articles/authentication/

@matkaczmarek
Copy link
Contributor Author

What is validated? What kind of malicious behavior? Please be specific.

This article seems like a good place to start: https://oauth.net/articles/authentication/

Thanks for a great article. Referring to it, when I was writing On our site, we are validating the token against malicious behavior, I had in mind Injection of access tokens which was resolved by retrieving the access token directly from ORCID endpoint not from request parameters: https://github.com/MIT-LCP/physionet-build/pull/2276/files#diff-2f424d21ad5aad51520be5d93166608c26e0df7a678f42851d72ad8b62fd9f15R536-R547

Also when implementing ORCID authentication, I followed their documentation (Implement OAuth, 3 legged OAuth): https://info.orcid.org/documentation/api-tutorials/api-tutorial-get-and-authenticated-orcid-id/ which results in authenticated ORCID user.

@matkaczmarek matkaczmarek force-pushed the feature/orcid_login_1243 branch from 04185f1 to 8a5df40 Compare September 25, 2024 13:08
@matkaczmarek
Copy link
Contributor Author

Hi @bemoody, after our last call I two changes:

  • I added id_token verification by checking the signature of the ID token using ORCID JSON Web Key Set (ORCID public keys used to verify JWT)
  • Log in using ORCID is now only presented to the user if ORCID_SCOPE contains an openid scope.

@bemoody
Copy link
Collaborator

bemoody commented Sep 27, 2024

If somebody's ORCID account was compromised on Monday, and they reset their ORCID password on Tuesday, and they try to log in to PhysioNet using ORCID on Wednesday, how do we know whether it's the real person or the attacker?

If they logged in to PhysioNet using ORCID on Monday, and their ORCID account was compromised on Tuesday, and they reset their ORCID password on Wednesday, and they visit a PhysioNet page on Thursday using the session cookie they received on Monday, how do we know whether it's the real person or the attacker?

@matkaczmarek
Copy link
Contributor Author

matkaczmarek commented Sep 30, 2024

If somebody's ORCID account was compromised on Monday, and they reset their ORCID password on Tuesday, and they try to log in to PhysioNet using ORCID on Wednesday, how do we know whether it's the real person or the attacker?

If the password is changed on Tuesday then all the active ORCID sessions should be invalidated including this Monday's one, therefor if the attacker tries to log in to Physionet on Thursday they will have to go through the OAuth authentication flow again which requires knowledge of the new ORCID password

If they logged in to PhysioNet using ORCID on Monday, and their ORCID account was compromised on Tuesday, and they reset their ORCID password on Wednesday, and they visit a PhysioNet page on Thursday using the session cookie they received on Monday, how do we know whether it's the real person or the attacker?

It depends on how long we keep the session on our side. When the ORCID account is compromised, Physionet itself cannot automatically know this. Unfortunately, ORCID (like most of the OAuth providers) does not notify relying parties about password resets or compromises. It is on our side to invalidate session cookies as often as possible (24 hours seems reasonable). If the attacker gets the session cookie to Physionet, he/she will be able to access the attacked account until the cookie expires. Also, if someone's ORCID account is compromised, good practice is to change the passwords to all websites this ORCID account was linked.

Those are very extreme scenarios. As website creators, we often cannot determine whether the account was compromised. Currently, in our flow, if an attacker gets access to a user's email, then Physionet cannot do anything, as the attacker can change the password (using reset password form) and log in to Physionet using a new password. In both scenarios, we trust the users that they can keep their email/ORCID accounts safe.

@bemoody
Copy link
Collaborator

bemoody commented Sep 30, 2024

If the password is changed on Tuesday then all the active ORCID sessions should be invalidated including this Monday's one, therefor if the attacker tries to log in to Physionet on Thursday they will have to go through the OAuth authentication flow again which requires knowledge of the new ORCID password

Right. So how do you know that the person went through the ORCID login - how do you know that the "code" they gave you was generated recently?

It is on our side to invalidate session cookies as often as possible (24 hours seems reasonable). If the attacker gets the session cookie to Physionet, he/she will be able to access the attacked account until the cookie expires

PhysioNet session IDs are valid for 14 days. This is not a security problem, because when you change your password, all other sessions are invalidated.

Having sessions that expire without warning or recourse is a significant usability problem (which is why I've insisted that we don't change from the Django defaults.) For example, I don't think it's okay if, after somebody spends 8 hours uploading a file, the upload is rejected because they had logged in 31 hours earlier. In fact, we could and should do better than we're doing now.

@bemoody
Copy link
Collaborator

bemoody commented Sep 30, 2024

I've now done a bunch of reading on OIDC, looked at some existing libraries and was unimpressed. ;) But it seems like there are a couple of possibilities:

  • OIDC providers may allow the RP to refresh the id_token in the same way as refreshing the access_token. This seems ideal if it works.

  • Embed in the page a hidden iframe that, every N minutes, redirects itself to the ORCID authentication page (using the query parameter prompt=none) - thereby ensuring that as long as you keep the browser open, the tokens are regularly refreshed.

@matkaczmarek
Copy link
Contributor Author

Right. So how do you know that the person went through the ORCID login - how do you know that the "code" they gave you was generated recently?

It doesn't say in the documentation but I checked it experimentally and code is valid for no more than 10 minutes. After this time I got The authorization code is invalid or has expired error and it prevented me from exchanging it for access token and as a result I was not able to log in to Physionet

PhysioNet session IDs are valid for 14 days. This is not a security problem, because when you change your password, all other sessions are invalidated.

Having sessions that expire without warning or recourse is a significant usability problem (which is why I've insisted that we don't change from the Django defaults.) For example, I don't think it's okay if, after somebody spends 8 hours uploading a file, the upload is rejected because they had logged in 31 hours earlier. In fact, we could and should do better than we're doing now.

That's a valid case, but as said on a meeting the attackers will have access to the Physionet as long as they will have valid session. If we want to keep session alive as long as the ORCID session they it will last couple of hours (no more than one day).

@bemoody
Copy link
Collaborator

bemoody commented Oct 3, 2024

To elaborate on my previous comment:

Naturally, we only know what happens on our server. Verifying the id_token lets us know that the client was authenticated to ORCID servers at a particular time, but it doesn't tell us whether the client is still authenticated.

We want to allow authenticated clients to remain logged-in to PhysioNet for a reasonably long time (say, at least 24 hours), and we also want unauthenticated clients to be rejected within a reasonably short time period after they cease to be authenticated (say, at most one hour.)

That means that we need to regularly receive updated information from the ORCID service to tell us whether that client is still authenticated from their point of view.

There is no clever way to make this work without polling the ORCID server in some way.

making the client do the polling

One way to poll the ORCID server is to tell the client to go and get a new authorization code. It's possible that this could be done by embedding an iframe (e.g., in navbar.html, with appropriate CSS to make it invisible), where the content of the iframe contains:

<meta http-equiv="refresh" content="600; url=https://orcid.org/oauth/authorize?response_type=code&scope=openid&prompt=none&client_id=...&redirect_uri=...&state=...">

Then, after 600 seconds, the client will request a new auth code from orcid.org; orcid.org will give either a new auth code or an error; and we tell the client to do the same thing 600 seconds later.

Some logic is needed to ensure that if the user is authenticated via ORCID, and we have not received a positive response from the ORCID server recently, then the user is treated as anonymous. It might be possible to do this by expiring the Django session, but it might be better to use a separate middleware for this purpose.

Of course, we can also add logic to the user login view to say: if the client was previously authenticated via ORCID and has now expired, then redirect directly to ORCID without showing a PhysioNet login form.

polling on the server side

The other way to poll the ORCID server is to use the refresh_token, as described here: https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens

This might or might not work, depending on the OIDC provider. The advantage is that it's seamless from the client's POV - it works in places where client-side redirection is impossible.

This would have to be done by middleware, I would think. The Django auth backend API doesn't seem to provide a way to do it.

@matkaczmarek
Copy link
Contributor Author

Thanks @bemoody for explaining what you want to achieve. It makes sense but let me verify what options we have and whether they are programmaticaly doable. Thanks also for the tip with iframe and refresh_token. I will come back to you with potential solutions.

@matkaczmarek matkaczmarek force-pushed the feature/orcid_login_1243 branch 3 times, most recently from ed65f4f to 4e2602a Compare October 4, 2024 10:17
@rafalkosla101
Copy link
Contributor

rafalkosla101 commented Oct 14, 2024

Since i have partly taken over this one:
As far as I have tested the implementation orcid is preventing his pages from being loaded in any kind of iframes to prevent security issues such as clickjacking attacks. We could be probably instead of iframe using a popup window that is somewhere hidden and checking what happens inside of that window but it is definitely not a elegant way to handle it and it would still be limiting the time of usage to a couple of hours

Of course, we can also add logic to the user login view to say: if the client was previously authenticated via ORCID and has now expired, then redirect directly to ORCID without showing a PhysioNet login form.

That in a way is quite limiting since a user would be trapped into "orcid login" way of accessing our platform whenever he/she connects orcid to the physionet itself, we could be adding some sort of disconnect feature but that kind of defeats a purpose. We could be also adding a button to end all active sessions which would to some extent minimalize the risk of attacker having 14 days of free physionet access

polling on the server side

The other way to poll the ORCID server is to use the refresh_token, as described here: https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens

Refresh token is not deactivated when user changes password and the token by itself has quite long expiry date which means that the attacker would be able to access the site anyway.

@tompollard tompollard self-assigned this Dec 10, 2024
@matkaczmarek matkaczmarek force-pushed the feature/orcid_login_1243 branch from 604a45b to ec72682 Compare December 10, 2024 14:31
Copy link
Member

@tompollard tompollard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from minor comments, this all looks good to me, except for the following issue which I think is quite problematic:

  • Most (or at least many) users will have an ORCID but the ORCID will not be linked to their PhysioNet account. In these cases, the user will end up with duplicate accounts.

Do you have any thoughts on how we can manage this issue?

physionet-django/physionet/settings/base.py Outdated Show resolved Hide resolved
physionet-django/physionet/settings/base.py Outdated Show resolved Hide resolved
.env.example Show resolved Hide resolved
physionet-django/physionet/settings/base.py Outdated Show resolved Hide resolved
.env.example Outdated Show resolved Hide resolved
physionet-django/user/forms.py Show resolved Hide resolved
physionet-django/user/forms.py Outdated Show resolved Hide resolved
physionet-django/user/templates/user/login.html Outdated Show resolved Hide resolved
physionet-django/user/urls.py Outdated Show resolved Hide resolved
physionet-django/user/views.py Show resolved Hide resolved
@matkaczmarek
Copy link
Contributor Author

Aside from minor comments, this all looks good to me, except for the following issue which I think is quite problematic:

  • Most (or at least many) users will have an ORCID but the ORCID will not be linked to their PhysioNet account. In these cases, the user will end up with duplicate accounts.

Do you have any thoughts on how we can manage this issue?

I tried retrieving email from ORCID and using it to check whether we already have a user in our database with the same email. Unfortunately, by default email in ORCID is set as a private attribute by default which means you cannot read it using API and access token. Users can set it to public in ORCID and only then we can see this email. Moreover, you need to have read access set in ORCID_SCOPE and then make additional API calls to retrieve this email. In my opinion, it is too much overhead to do when it is almost guaranteed to fail.

The only thing the user can do is to link the ORCID account to the existing Physionet account using Settings > Orcid view:
image
Moreover, you can always unlink the ORCID account from your Physionet account and link it with another one. So if users by mistake duplicate accounts they can always unlink and link them to the correct one.

@matkaczmarek
Copy link
Contributor Author

Hi @tompollard, I don't have write access to merge this PR. As I said in yesterday's meeting, I checked it one more time on HDN staging, and everything seems to work as it should. Could you merge it for me or give me and @rafalkosla101 write access so that we can merge PRs in the future?

@tompollard tompollard merged commit fec5d6e into MIT-LCP:dev Jan 8, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants