Remove token #528

doug-newman-nasa · 2024-04-19T12:30:46Z

The new cassette for the test_data_links has some information in it pertaining to the user who executed the test. This PR redacts it.
BTW - at the hackathon I stated that I was going to add a VCR cassette for test_data_links which I attempted to PR today only to find the same work had been merged three days ago. We need to coordinate better.

📚 Documentation preview 📚: https://earthaccess--528.org.readthedocs.build/en/528/

chuckwondo · 2024-04-19T13:26:32Z

Thanks for catching and redacting the token!

chuckwondo

Thanks Doug! My bad on the token issue, and for failing to coordinate. It slipped my mind that you had mentioned this, but now I recall discussing with @mfisher87 the need to handle this very situation.

However, instead of simply hand-modifying the cassette, I'd like to see proper request and response filtering. Without using the vcrpy capabilities for scrubbing sensitive information, if someone were to re-record this cassette, they would run the risk of committing sensitive information again, if they didn't realize (like me) that sensitive information was being recorded.

See:

doug-newman-nasa · 2024-04-19T13:39:31Z

Yup. I looked into using

    with my_vcr.use_cassette('test.yml', filter_post_data_parameters=['api_key']):
        requests.post('http://api.com/postdata', data={'api_key': 'secretstring'})

but I couldn't get it working properly.

chuckwondo · 2024-04-19T14:20:21Z

Yup. I looked into using with my_vcr.use_cassette('test.yml', filter_post_data_parameters=['api_key']): requests.post('http://api.com/postdata', data={'api_key': 'secretstring'}) but I couldn't get it working properly.

Ok. I'll take a look. If I can't sort it out in short order, I'll let you know and we can approve this redaction in the meantime. If that's the case, then I'll open a new issue to later address this via vcrpy functionality.

doug-newman-nasa · 2024-04-19T14:21:57Z

Sorry, I should have been more specific. I can keep looking at it. Just not immediately.

mfisher87 · 2024-04-19T17:39:25Z

Great catch, Doug!

jhkennedy · 2024-04-19T19:49:07Z

BTW - at the hackathon I stated that I was going to add a VCR cassette for test_data_links which I attempted to PR today only to find the same work had been merged three days ago. We need to coordinate better.

One way that might help here is to put PRs into, or back into, draft status if there's planned future work but the PR appears to be in ready-for-review or ready-to-merge state.

doug-newman-nasa · 2024-04-25T15:13:39Z

This is a 'preliminary pull request' because I wanted to get your eyes on the redact_key_values code. Not being a python coder I'm sure it requires some love. I'm going to work on redacting uri: https://urs.earthdata.nasa.gov/api/users/dschuck?client_id=

chuckwondo · 2024-04-25T16:11:24Z

This is a 'preliminary pull request' because I wanted to get your eyes on the redact_key_values code. Not being a python coder I'm sure it requires some love. I'm going to work on redacting uri: https://urs.earthdata.nasa.gov/api/users/dschuck?client_id=

Thanks Doug! I'll take an initial pass over your updates.

tests/unit/test_results.py

chuckwondo · 2024-04-25T16:16:01Z

+            ("client_id", "foo"),
+        ]
+
+        def redact_key_values(keys_to_redact):


Nice use of a higher-order function here! I'd suggest just making this a top-level function. There is no need to nest it within the _get_vcr method, nor within the TestResults class at all.

tests/unit/test_results.py

chuckwondo · 2024-04-25T21:56:02Z

+            def before_record_response(response):
+                # Only do this if the response has not been recorded
+                string_body = response["body"]["string"].decode("utf8")
+                if REDACTED_STRING not in string_body:
+                    # Only do this is if the body contains one or more
+                    # of the keys to redact.
+                    if any(key in string_body for key in keys_to_redact):
+                        try:
+                            is_list = False
+                            # Marshall into json object, if it is a JSON object.
+                            payload = json.loads(string_body)
+                            if isinstance(payload, list):
+                                payload = payload[0]
+                                is_list = True
+                            for key in keys_to_redact:
+                                if key in payload:
+                                    # Redact the key value
+                                    payload[key] = REDACTED_STRING
+                            # Write out the updated json object to the response
+                            # body string.
+                            if is_list:
+                                payload = [payload]
+                            response["body"]["string"] = json.dumps(payload).encode()
+                        except ValueError:
+                            # If it is not a json object, return
+                            return response
+
+                return response


To simplify this logic a bit, I suggest not bothering to avoid redacting an already redacted payload. This redaction is idempotent, so redacting an already redacted payload yields the same result, and in the context of running unit tests, the performance impact is likely unnoticeable.

Suggested change

-            def before_record_response(response):
-                # Only do this if the response has not been recorded
-                string_body = response["body"]["string"].decode("utf8")
-                if REDACTED_STRING not in string_body:
-                    # Only do this is if the body contains one or more
-                    # of the keys to redact.
-                    if any(key in string_body for key in keys_to_redact):
-                        try:
-                            is_list = False
-                            # Marshall into json object, if it is a JSON object.
-                            payload = json.loads(string_body)
-                            if isinstance(payload, list):
-                                payload = payload[0]
-                                is_list = True
-                            for key in keys_to_redact:
-                                if key in payload:
-                                    # Redact the key value
-                                    payload[key] = REDACTED_STRING
-                            # Write out the updated json object to the response
-                            # body string.
-                            if is_list:
-                                payload = [payload]
-                            response["body"]["string"] = json.dumps(payload).encode()
-                        except ValueError:
-                            # If it is not a json object, return
-                            return response
-
-                return response
+            def redact(payload):
+                for key in keys_to_redact:
+                    if key in payload:
+                        payload[key] = "REDACTED"
+                return payload
+
+            def before_record_response(response):
+                body = response["body"]["string"].decode("utf8")
+
+                with contextlib.suppress(json.JSONDecodeError):
+                    payload = json.loads(body)
+                    redacted_payload = (
+                        list(map(redact, payload))
+                        if isinstance(payload, list)
+                        else redact(payload)
+                    )
+                    response["body"]["string"] = json.dumps(redacted_payload).encode()
+
+                return response
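Since the suggestion leans on redaction being idempotent, here is a minimal, self-contained sketch of that property using only the standard library. The helper names (`redact`, `redact_body`), the key names, and the sample body are hypothetical and chosen for illustration; this is not the code in the PR, just one way to check that a second pass over an already redacted body changes nothing.

```python
import json

REDACTED_STRING = "REDACTED"  # placeholder value, assumed for this sketch


def redact(payload, keys_to_redact):
    """Return a copy of a JSON object with the listed keys overwritten."""
    return {
        key: REDACTED_STRING if key in keys_to_redact else value
        for key, value in payload.items()
    }


def redact_body(body, keys_to_redact):
    """Redact a JSON body (bytes) holding an object or a list of objects."""
    payload = json.loads(body)
    if isinstance(payload, list):
        redacted = [redact(item, keys_to_redact) for item in payload]
    else:
        redacted = redact(payload, keys_to_redact)
    return json.dumps(redacted).encode()


# Hypothetical keys and response body, for illustration only.
keys = ["access_token", "uid"]
body = b'[{"access_token": "abc123", "uid": "someone", "expires": "never"}]'

once = redact_body(body, keys)
twice = redact_body(once, keys)

# A second pass leaves the body unchanged, so no "already redacted" guard
# is needed before redacting.
assert once == twice
```

Because the second pass is a no-op, the guard on `REDACTED_STRING not in string_body` only saves a small amount of work, which seems unlikely to matter in a unit test run.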

Also, please move the entirety of redact_key_values to the top level. It does not need to be nested within the _get_vcr method. The nesting just adds unnecessary clutter.

Same goes for the nested redact_login_request below.

doug-newman-nasa

Oddly, when I move this method out of TestCase, it no longer gets called. Investigating.

doug-newman-nasa · 2024-04-26T12:46:11Z

I've made all the suggested changes, with the exception of moving redact_key_values out of TestCase, which prevents it from being called. Looking into that next.

chuckwondo · 2024-04-26T18:08:45Z

@doug-newman-nasa, I put these at the top level in tests/unit/test_results.py (preceding class TestResults), and everything works fine for me:

def redact_login_request(request):
    if "/api/users/" in request.path and "/api/users/tokens" not in request.path:
        _, user_name = os.path.split(request.path)
        request.uri = request.uri.replace(user_name, REDACTED_STRING)

    return request


def redact_key_values(keys_to_redact):
    def redact(payload):
        for key in keys_to_redact:
            if key in payload:
                payload[key] = REDACTED_STRING
        return payload

    def before_record_response(response):
        body = response["body"]["string"].decode("utf8")

        with contextlib.suppress(json.JSONDecodeError):
            payload = json.loads(body)
            redacted_payload = (
                list(map(redact, payload))
                if isinstance(payload, list)
                else redact(payload)
            )
            response["body"]["string"] = json.dumps(redacted_payload).encode()

        return response

    return before_record_response

Give it a shot and let me know what you see.
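For a quick sanity check of the request-side redaction without recording anything, a sketch like the following may help. It copies `redact_login_request` from above and feeds it hypothetical stand-in request objects built with `SimpleNamespace`; real vcrpy request objects are not used here, and the value of `REDACTED_STRING`, the user name, and the `client_id` value are all assumptions made for illustration.

```python
import os
from types import SimpleNamespace

REDACTED_STRING = "REDACTED"  # assumed value; the constant is not shown in this thread


def redact_login_request(request):
    # Redact the trailing user name from user-profile requests, but leave
    # the token endpoint alone.
    if "/api/users/" in request.path and "/api/users/tokens" not in request.path:
        _, user_name = os.path.split(request.path)
        request.uri = request.uri.replace(user_name, REDACTED_STRING)

    return request


# Hypothetical stand-ins that expose only the two attributes the function reads.
profile = SimpleNamespace(
    path="/api/users/some_user",
    uri="https://urs.earthdata.nasa.gov/api/users/some_user?client_id=foo",
)
tokens = SimpleNamespace(
    path="/api/users/tokens",
    uri="https://urs.earthdata.nasa.gov/api/users/tokens",
)

redact_login_request(profile)
redact_login_request(tokens)

print(profile.uri)  # user name replaced by the placeholder
print(tokens.uri)   # unchanged
```

Note that `str.replace` swaps every occurrence of the user name in the URI, so a user name that also appears elsewhere in the URL would be replaced there too.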

Also, I'd like to request a minor tweak to an assertion in the function test_get_all_more_than_2k. Because the number of granules is changing, if someone decides to regenerate all of the cassettes, this test will fail due to a difference in the length of the granules list.

So instead of this assertion:

        self.assertEqual(len(granules), 2520)

let's do this, which will be less strict, but still validate the results correctly:

        self.assertIn(len(granules), range(2001, 3001))

doug-newman-nasa · 2024-04-26T19:09:28Z

Regarding test_get_all_more_than_2k, what I am trying to test here is that the correct number of granules was marshaled from the result. Using a range comparison will limit the effectiveness of that test. However, I'm not actually testing against the value in the result, but rather against a value I fished out manually. So I'm going to counter by asserting equality of len(granules) against the value of 'hits' in the response string: '{"hits":2520,"took":138,"items":[]}'. Does that work?
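As a rough sketch of that comparison, assuming the response string is already in hand (how the test would pull it out of the cassette is not shown in this thread, and the `granules` list below is a stand-in rather than real search results):

```python
import json

# Response body string quoted in the comment above.
response_string = '{"hits":2520,"took":138,"items":[]}'

# Stand-in for the list the test gets back from the search; in the real
# test this would come from the code under test, not be built by hand.
granules = [object()] * 2520

expected_hits = json.loads(response_string)["hits"]

# Strict check: the number of granules marshaled matches the reported hits.
assert len(granules) == expected_hits

# The looser range check proposed earlier also passes, but it would accept
# any count between 2001 and 3000.
assert len(granules) in range(2001, 3001)
```

Reading the expected count from the recorded response keeps the assertion exact while letting it survive a cassette regeneration.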

chuckwondo · 2024-04-26T20:14:22Z

Regarding test_get_all_more_than_2k what I am trying to test here is that the correct number of granules were marshaled from the result. Using a range comparison will limit the effectiveness of that test. However, I'm not actually testing against the value in the result rather the result I fished out manually. So I'm going to counter by asserting equality of len(granules) against the value of 'hits' in the response string: '{"hits":2520,"took":138,"items":[]}' Does that work?

Perfect! Even better!

chuckwondo

Fantastic! Thanks Doug!

chuckwondo · 2024-04-26T20:52:35Z

I've approved this, but does anybody else want to review this?

BTW, I'd like to suggest that we squash merge PRs to have a tidier, more readable change history.

chuckwondo

Sorry Doug. One last tweak, please.

tests/unit/test_results.py

chuckwondo

Thank you sir! Nice work.

chuckwondo · 2024-04-29T12:19:25Z

Last call for someone else to review. If I don't get any takers today, I'll proceed with a squash merge near COB US Eastern.

doug-newman-nasa added 2 commits April 19, 2024 08:20

Removed token and id information from cassettte

092a32e

poetry lock updates

7622b1a

chuckwondo self-requested a review April 19, 2024 13:26

chuckwondo requested changes Apr 19, 2024

chuckwondo mentioned this pull request Apr 20, 2024

update the dependency and temporal filter for python_cmr 0.10.0 #520

Closed

doug-newman-nasa added 3 commits April 25, 2024 10:56

Redacted responses from EDL.

a2de410

Minor refactor on body string

0501bb3

Lint/Format changes

9fb8558

chuckwondo reviewed Apr 25, 2024

doug-newman-nasa added 2 commits April 25, 2024 16:54

Added request redacting and payload -> list code.

cd5e671

Format fixes

901d5f2

chuckwondo reviewed Apr 25, 2024

doug-newman-nasa added 3 commits April 26, 2024 08:00

Removed check for recorded already

5b7f5d1

Moved login redaction outside of TestCase

50fef6e

Added contextlib.suppress to remove try block

316f677

Refactored redaction out of TestCase class.

269076d

More robust check of granules marshaled.

18eb0b4

chuckwondo approved these changes Apr 26, 2024

chuckwondo requested changes Apr 27, 2024

tests/unit/test_results.py

Resolved PR comment.

ea20c7c

chuckwondo approved these changes Apr 29, 2024

chuckwondo merged commit 73e7959 into nsidc:main Apr 29, 2024
11 checks passed

chuckwondo mentioned this pull request Apr 29, 2024

Issue 421 - option to use Earthdata User Acceptance Testing (UAT) system #426

Merged

4 tasks
