Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to use UAT environment #421

Closed
danielfromearth opened this issue Dec 29, 2023 · 32 comments · Fixed by #426
Closed

Option to use UAT environment #421

danielfromearth opened this issue Dec 29, 2023 · 32 comments · Fixed by #426
Labels
enhancement New feature or request

Comments

@danielfromearth
Copy link
Collaborator

danielfromearth commented Dec 29, 2023

Challenge

earthaccess does not currently support access to collections that are in NASA Earthdata's User Acceptance Testing (UAT) environment. Instead, the production (PROD) environment is assumed when logging in via earthaccess and there is no option to change the environment.

Desired functionality

When executing the earthaccess.login() method, an option to specify which environment should be available. And the authentication should be correctly applied to the specified environment.

Example

I encountered this challenge when trying to test data transformation code with a data collection for the Tropospheric Emissions Monitoring of POllution (TEMPO) instrument that is in UAT. I would like to search and download granules from this collection using earthaccess, but that is currently not possible.

Benefit

If the UAT environment can be accessed with earthaccess, its usability would be expanded to more dataset testing.

@danielfromearth danielfromearth added the enhancement New feature or request label Dec 29, 2023
@mfisher87
Copy link
Collaborator

Should we support manually overriding the EDL base URL to anything? That way we can support other existing environments like System Integration Testing (SIT), and any others that may be created in the future.

@danielfromearth
Copy link
Collaborator Author

danielfromearth commented Jan 2, 2024

Supporting SIT would be great too! It might be more secure/simpler to just allow UAT/SIT/PROD for now?

@danielfromearth
Copy link
Collaborator Author

@mfisher87 , @betolink,

Currently, one of the options for .login()'s strategy= argument is "environment".
Because of that, it seems to me like it might be confusing to add a new keyword argument to .login() named environment=, which would result in function calls looking something like this:

earthdata.login(strategy="environment", environment=Env.UAT)

Using "environment" twice where it means two different things looks strange, right?

Would "endpoint" perhaps be a better name for the new keyword argument? Other suggestions?

@jhkennedy
Copy link
Collaborator

"maturity" is generally how I think about UAT/SIT/PROD, but "cmr_endpoint" is a good option as well (just endpoint makes me go "to what?")

@danielfromearth
Copy link
Collaborator Author

nice, I like "cmr_endpoint" too

@danielfromearth
Copy link
Collaborator Author

I've started working on this issue but could use some more expert eyes to ensure an appropriate token is available when using UAT. Specifically, my question is:

Will the client_id parameter on this line — for the self.EDL_GET_PROFILE variable — need to change when the base of the URL changes from "urs.earthdata.nasa.gov" to "uat.urs.earthdata.nasa.gov"

@mfisher87
Copy link
Collaborator

mfisher87 commented Jan 10, 2024

Yeah, we'll need different client IDs for each environment.

This comment from a few lines up is prescient:

# Maybe all these predefined URLs should be in a constants.py file

The URLs all repeat the hostname too, so maybe instead of

EDL_GET_TOKENS_URL = "https://urs.earthdata.nasa.gov/api/users/tokens"

We would rather store

EDL_GET_TOKENS_PATH = "/api/users/tokens"

and concatenate these values with the variable hostname.

EDIT: Thinking a bit about this, the variable defined just above is what I think of when I think "endpoint", so maybe that's not an ideal term for UAT/SIT/PROD... 🤷

@danielfromearth
Copy link
Collaborator Author

Thanks @mfisher87. Then it seems like two questions remain...

  1. How do we get/determine what the Client ID is for each environment? (
  2. What should the .login() function signature look like?
    Another option for the new argument name: "cmr_environment", as in:
earthdata.login(strategy="environment", cmr_environment=Env.UAT)

How does that look?

@mfisher87
Copy link
Collaborator

  1. Maybe we need a mapping from envs (keys) to hostnames and client IDs. These are both solely dependent on the EDL environment/maturity level, to the best of my knowledge.
  2. I'm really struggling with what I think would be a good parameter name for this. I'll keep thinking!

@mfisher87
Copy link
Collaborator

mfisher87 commented Jan 10, 2024

I'm realizing something after seeing the suggestion cmr_environment. Changing these URLs is actually changing the Earthdata Login environment, not CMR environment. Should we be linking these two things together so when the user selects UAT, both CMR UAT and EDL UAT are used? Or should the user be able to change both independently? If the latter, maybe cmr_{environment,maturity} & edl_{environment,maturity} make sense. If the former, then we need to think about how to communicate that this parameter will change multiple APIs. I'm not opposed to being wordy, like cmr_and_edl_maturity.

@danielfromearth
Copy link
Collaborator Author

danielfromearth commented Jan 10, 2024

Yeah, I've been thinking of this argument as something that will change both EDL and CMR. Not just one or the other independently. Could be cmr_and_edl_env?

I'd also be happy with cmr_and_edl_maturity

@danielfromearth
Copy link
Collaborator Author

danielfromearth commented Jan 10, 2024

  1. Maybe we need a mapping from envs (keys) to hostnames and client IDs. These are both solely dependent on the EDL environment/maturity level, to the best of my knowledge.

I'm actually just not sure where to find the Client ID value to use for UAT.

Is a Client ID something I can find on some Earthdata page now — or, do we need to create a new Earthdata Application (in UAT, and then again in SIT)?

@mfisher87
Copy link
Collaborator

I don't know if any "application" (that term always throws me off when it refers to an integration) was created for UAT/SIT. @betolink I think would know.

Either way, while we're in there we should ensure all earthaccess maintainers are admins of all the EDL integrations :)

@danielfromearth
Copy link
Collaborator Author

@betolink, do you know the answer to the above question about the Client ID?

@betolink
Copy link
Member

betolink commented Feb 3, 2024

Sorry for the delayed response @danielfromearth, I think we can ignore client_id, this is only telling URS which app is requesting the user's info and this one is actually not even earthaccess', I think I took one of NSIDC's apps (yeah, we should perhaps create an earthaccess app in launchpad etc etc)

I like what you suggested

earthdata.login(cmr_environment=Env.UAT)

Or perhaps using a "EARTHACCESS_ENV=UAT" (default being prod)

then earthaccess will do something similar to what's already doing:

  1. Look for environment variables, but this time would be
    • EARTHDATA_USERNAME_UAT
    • EARTHDATA_PASSWORD_UAT
  2. Look for a profile in the .netrc that points to the UAT EDL
     machine urs.uat.earthdata.nasa.gov login `my_uat_user` `my_uat_password`
  3. Ask the user their credentials and authenticate using UAT EDL

@danielfromearth
Copy link
Collaborator Author

Based on discussions with @betolink and @nikki-t, going to move forward with this signature for now:

earthdata.login(earthdata_env=Env.UAT)

@chuckwondo
Copy link
Collaborator

After a bit of digging, I see that we require a client_id when we request a user's profile.

However, why are we including a function to get a user's profile?

I just submitted a question to EDL support regarding the need for a client_id because the documentation does not appear to align with the behavior:

If a user token is provided in the Authorization: Bearer header, then the client_id paremeter is required to identify the calling client application.

However, we are using an access token, not a user token, so the documentation implies to me that we should not need to supply client_id. Unfortunately, this is not the case. When not including client_id (a valid one), I get this error:

{'error': 'invalid_token', 'error_description': 'The required client id is missing from the request'}

So I see the following options for us:

  1. If EDL deems the error above to represent a bug (i.e., the doc is correct, and when using an access token, a client_id should not be required), wait for them to fix the issue. This is probably the least attractive option (and possibly the least likely, or certainly not timely).
  2. Register an earthaccess application in both OPS (PROD) and UAT in order to obtain a client_id for each system. BTW, I confirmed that the client_id we're using does not work for UAT, which makes sense.
  3. Eliminate the Auth.get_user_profile method. Again, why is this included in the first place? Do we really need this? Is anybody making use of it? If so, what for?

@danielfromearth
Copy link
Collaborator Author

@chuckwondo, your question about why resonates with me — and the third option sounds reasonable, but I don't really know what the range of considerations would be.

(Should we move this client_id question to a new issue, and also not wait on it for #426?)

@chuckwondo
Copy link
Collaborator

However, we are using an access token, not a user token, so the documentation implies to me that we should not need to supply client_id.

I see the problem. I incorrectly assumed we use access tokens because when I look at the auth token the key is "access_token". However, I see elsewhere in our code that we are actually obtaining user tokens, so the EDL API appears to be working as expected.

@chuckwondo, your question about why resonates with me — and the third option sounds reasonable, but I don't really know what the range of considerations would be.

(Should we move this client_id question to a new issue, and also not wait on it for #426?)

We cannot move the client_id question to a new issue because your PR won't work in UAT without a registered app in UAT so that we can obtain a client_id appropriate for UAT. The existing client_id works only in PROD.

We either create a UAT app to get a client_id (we should do the same for PROD too, but that might have other ramifications, which we might want to create a separate issue for), which then means we also need to move client_id to our new System class, OR we get rid of get_user_profile altogether, along with eliminating client_id altogether.

@betolink
Copy link
Member

betolink commented May 9, 2024

I think @danielfromearth already tested this in UAT and seemed to work. I'll test with some NSIDC UAT-only datasets and report back. I think the EDL documentation states that it shouldn't work but it does. I'd say we leave it like this for now and we open a new issue to register earthaccess as a client on its own.

@chuckwondo
Copy link
Collaborator

I think @danielfromearth already tested this in UAT and seemed to work. I'll test with some NSIDC UAT-only datasets and report back. I think the EDL documentation states that it shouldn't work but it does. I'd say we leave it like this for now and we open a new issue to register earthaccess as a client on its own.

No, get_user_profile does not work for UAT because the client_id is for PROD. Attempting to do so produces the following error:

{'error': 'invalid_token', 'error_description': 'This application does not exist'}

@betolink
Copy link
Member

betolink commented May 9, 2024

Ah OK, I was talking about the search and access part not the get_user_profile itself. Should we wait to merge this until we have valid client_id for earthaccess or merge with "works except for get_user_profile"?

@chuckwondo
Copy link
Collaborator

We could, but that goes back to my other question. Why do we have get_user_profile? Is anybody using it, and if so what for? I'd prefer simply to eliminate it because it is the only thing that uses client_id.

@chuckwondo
Copy link
Collaborator

If we can eliminate get_user_profile, then that eliminates the need for any client_id.

@betolink
Copy link
Member

betolink commented May 9, 2024

I thought we used it for the tokens but that's not the case, I feel we'll need it in the future but for now we can eliminate it.

@betolink
Copy link
Member

betolink commented May 9, 2024

and what about a release after this PR?

@chuckwondo
Copy link
Collaborator

My only concern would be the possibility of this being a breaking change since Auth.get_user_profile is part of the public API, so if some user of earthaccess is using that method, then we would break their code.

However, I would be surprised if anybody uses that method, so I'm willing to risk pulling the method.

@chuckwondo
Copy link
Collaborator

and what about a release after this PR?

Sure, sounds reasonable to me.

@chuckwondo
Copy link
Collaborator

My only concern would be the possibility of this being a breaking change since Auth.get_user_profile is part of the public API, so if some user of earthaccess is using that method, then we would break their code.

However, I would be surprised if anybody uses that method, so I'm willing to risk pulling the method.

@betolink, @mfisher87, @jhkennedy, any concern here with removing Auth.get_user_profile method so we eliminate the need for client_id?

@betolink
Copy link
Member

betolink commented May 9, 2024

no concerns, we should get a client_id and in the future bring this back if needed for other API integrations.

@chuckwondo
Copy link
Collaborator

no concerns, we should get a client_id and in the future bring this back if needed for other API integrations.

Great! @danielfromearth, please remove use of client_id throughout and drop the Auth.get_user_profile method, which will also require moving a few lines near the end of Auth._get_credentials.

Once that's done, I'll take one final (hopefully!) look at your changes so we can finally land your work. Thanks so much for your perseverance and patience!

@mfisher87
Copy link
Collaborator

Thanks so much for your perseverance and patience!

@danielfromearth 💯 this :) That PR had a lot going on, amazing work making this feature happen. And for the idea in the first place!

@chuckwondo amazing work asking important questions along the way!

chuckwondo pushed a commit that referenced this issue May 10, 2024
…tem (#426)

Co-authored-by: Matt Fisher <mfisher87@gmail.com>
Co-authored-by: Chuck Daniels <chuck@developmentseed.org>
Co-authored-by: Daniel Kaufman <danielfromearth@users.noreply.github.com>

Fixes #421
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants