Redact basic authentication passwords from log messages #5773

Merged on Oct 19, 2018 (10 commits)

Conversation

Contributor

orf commented Sep 10, 2018

PR #5590 had merge conflicts, so I made a new one with the suggested changes. Fixes #4746.

@@ -880,15 +880,29 @@ def split_auth_from_netloc(netloc):
    return netloc, user_pass


def redact_password_from_url(url):
    def _redact_netloc(netloc):
Member

I would pull this function out to module level so it can be tested separately. You can use the same parametrized test cases as test_split_auth_from_netloc(). This will combinatorially cut down the number of test cases you need to properly test redact_password_from_url(), since redact_password_from_url() will be a simple combination of redact_netloc() and transform_url().
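
The decomposition suggested here can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: split_auth_from_netloc() below is a simplified stand-in written for the example, and the standard library's urllib.parse is assumed in place of the urllib_parse alias seen in the diff.

```python
from urllib.parse import urlsplit, urlunsplit


def split_auth_from_netloc(netloc):
    """Simplified stand-in: return (netloc_without_auth, (user, password))."""
    if '@' not in netloc:
        return netloc, (None, None)
    # Split on the last '@' so the host part is always what follows it.
    auth, netloc = netloc.rsplit('@', 1)
    if ':' in auth:
        # Split on the first ':' so a password containing ':' stays whole.
        user, password = auth.split(':', 1)
    else:
        user, password = auth, None
    return netloc, (user, password)


def redact_netloc(netloc):
    """Replace the password in a netloc with ****."""
    netloc, (user, password) = split_auth_from_netloc(netloc)
    if user is None:
        return netloc
    password = '' if password is None else ':****'
    return '{user}{password}@{netloc}'.format(
        user=user, password=password, netloc=netloc)


def transform_url(url, transform_netloc):
    """Apply transform_netloc to the netloc of url and reassemble the url."""
    purl = urlsplit(url)
    netloc = transform_netloc(purl.netloc)
    return urlunsplit(
        (purl.scheme, netloc, purl.path, purl.query, purl.fragment))


def redact_password_from_url(url):
    """Replace the password in a given url with ****."""
    return transform_url(url, redact_netloc)
```

With this split, redact_netloc() carries all of the branching, so it can be tested on bare netloc strings, while redact_password_from_url() is a one-line combination of the two helpers.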


def transform_url(url, transform_netlock):
Member

I would move this function to be above redact_password_from_url() and remove_auth_from_url() (and after redact_netloc()), since it is used in both of them, and it is nice to have a progression so that functions within a module depend only on functions that precede them (so it can be read from top to bottom).


def transform_url(url, transform_netlock):
    purl = urllib_parse.urlsplit(url)
    netloc = transform_netlock(purl.netloc)
Member

Spelling: transform_netloc()

    ('https://user:pass@domain.tld/project/tags/v0.2',
     'https://domain.tld/project/tags/v0.2'),
    ('https://user:pass@domain.tld/svn/project/trunk@8181',
     'https://domain.tld/svn/project/trunk@8181'),
Member

I would leave these test cases alone in the same order, especially if you're not changing them. Otherwise, it makes it needlessly harder to tell whether the tests have been changed or not.

    ('git+ssh://git@pypi.org/something',
     'git+ssh://git@pypi.org/something'),
    ('git+https://user:pass@pypi.org/something',
     'git+https://user:****@pypi.org/something'),
Member

This seems like more test cases than you need. If you break out redact_netloc() like I suggested above and test it separately, you should be able to reduce the number of test cases you need for combinatorial reasons. You can focus on testing redact_netloc() well (which is a simpler function). And then test_redact_password_from_url() can be limited to making sure that redact_password_from_url() is "wired up" correctly from redact_netloc() and transform_url().
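
The test layout described here might look like the sketch below: a fuller table of cases for redact_netloc(), and only a handful of full URLs to confirm the wiring. The helper definitions are simplified stand-ins for illustration (not the merged pip code), and plain loops stand in for pytest.mark.parametrize so the example runs without third-party packages.

```python
from urllib.parse import urlsplit, urlunsplit


def split_auth_from_netloc(netloc):
    """Simplified stand-in for the helper referenced in this PR."""
    if '@' not in netloc:
        return netloc, (None, None)
    auth, netloc = netloc.rsplit('@', 1)
    user, sep, password = auth.partition(':')
    return netloc, (user, password if sep else None)


def redact_netloc(netloc):
    netloc, (user, password) = split_auth_from_netloc(netloc)
    if user is None:
        return netloc
    password = '' if password is None else ':****'
    return '{}{}@{}'.format(user, password, netloc)


def redact_password_from_url(url):
    purl = urlsplit(url)
    netloc = redact_netloc(purl.netloc)
    return urlunsplit(
        (purl.scheme, netloc, purl.path, purl.query, purl.fragment))


# The edge cases live at the netloc level, where the function is simplest.
NETLOC_CASES = [
    ('example.com', 'example.com'),
    ('user@example.com', 'user@example.com'),
    ('user:pass@example.com', 'user:****@example.com'),
    ('user:pass:word@example.com', 'user:****@example.com'),
]

# A few real URLs are enough to show the pieces are wired together.
URL_CASES = [
    ('https://user:pass@example.com', 'https://user:****@example.com'),
    ('https://user@example.com', 'https://user@example.com'),
    ('https://example.com', 'https://example.com'),
]


def test_redact_netloc():
    for netloc, expected in NETLOC_CASES:
        actual = redact_netloc(netloc)
        assert actual == expected, (netloc, actual)


def test_redact_password_from_url():
    for auth_url, expected_url in URL_CASES:
        url = redact_password_from_url(auth_url)
        assert url == expected_url, (auth_url, url)
```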

Contributor Author

orf commented Sep 11, 2018

Thanks for the prompt review @cjerdonek. I've made the changes you requested. Regarding testing, I just did a mock.patch to assert that redact_password_from_url is hooked up correctly. Is this OK?

# and are not recognized in the url.
def redact_netloc(netloc):
    netloc, (user, passw) = split_auth_from_netloc(netloc)
    if not user:
Member

This should check for None it seems.
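
One case where the two checks differ, shown as a hedged sketch: an empty username followed by a password. The function names redact_netloc_truthy and redact_netloc_is_none are hypothetical, and split_auth_from_netloc() is a simplified stand-in. Whether this exact input motivated the comment is an assumption.

```python
def split_auth_from_netloc(netloc):
    """Simplified stand-in: return (netloc_without_auth, (user, password))."""
    if '@' not in netloc:
        return netloc, (None, None)
    auth, netloc = netloc.rsplit('@', 1)
    user, sep, password = auth.partition(':')
    return netloc, (user, password if sep else None)


def _format(user, password, netloc):
    password = '' if password is None else ':****'
    return '{}{}@{}'.format(user, password, netloc)


def redact_netloc_truthy(netloc):
    """Variant using a truthiness check on the user."""
    netloc, (user, password) = split_auth_from_netloc(netloc)
    if not user:  # also true for an empty-string user
        return netloc
    return _format(user, password, netloc)


def redact_netloc_is_none(netloc):
    """Variant using an explicit None check on the user."""
    netloc, (user, password) = split_auth_from_netloc(netloc)
    if user is None:  # only true when there was no auth part at all
        return netloc
    return _format(user, password, netloc)


# Both variants agree on ordinary input.
print(redact_netloc_truthy('user:pass@example.com'))
print(redact_netloc_is_none('user:pass@example.com'))

# They disagree when the user is empty: the truthy check silently drops
# the auth part, while the None check keeps it and redacts the password.
print(redact_netloc_truthy(':secret@example.com'))
print(redact_netloc_is_none(':secret@example.com'))
```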

Member

cjerdonek commented Sep 11, 2018

Thanks. Looks a lot better. I’m not a huge fan of mocks, especially when not needed. I think it would be just as good or better to do one or two test cases to confirm that it’s being called (e.g. one with and one without password).

Also, one reason the mock test isn’t so great is that it’s not checking the return value, which is the most important thing to be checking.
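
The point about return values can be illustrated with a small sketch. broken_redact() is a hypothetical, deliberately buggy wrapper that takes its helper as an argument (an assumption made so the example does not depend on patching a module path); it calls the helper but forgets to return the result.

```python
from unittest import mock


def broken_redact(url, transform_url):
    """Hypothetical buggy wrapper: calls the helper but drops its result."""
    transform_url(url, 'netloc-transform placeholder')


def call_only_check():
    """Mock-style test: only verifies that the helper was called."""
    fake_transform = mock.Mock(return_value='https://user:****@example.com')
    broken_redact('https://user:pass@example.com', fake_transform)
    # This assertion passes even though broken_redact() returns None.
    fake_transform.assert_called_once()
    return True


def return_value_check():
    """Value-style test: compares the actual return value."""
    fake_transform = mock.Mock(return_value='https://user:****@example.com')
    actual = broken_redact('https://user:pass@example.com', fake_transform)
    return actual == 'https://user:****@example.com'


print(call_only_check())     # the call-only check does not notice the bug
print(return_value_check())  # the return-value check does
```

A test that passes real URLs and asserts on the output covers both concerns at once: the helper must have been used, and its result must have been returned.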

    # Test the password containing a : symbol.
    ('user:pass:word@example.com', 'user:****@example.com'),
])
def test_redact_netloc(netloc, expected):
Member

I would move this test to be after the split-auth test (so same order as in the module).

@@ -153,4 +153,4 @@ def test_get_formatted_locations_basic_auth():
     finder = PackageFinder([], index_urls, session=[])

     result = finder.get_formatted_locations()
-    assert 'user' not in result and 'pass' not in result
+    assert 'user' in result and 'pass' not in result
Member

How about also checking that “****” is in the result to check / make it more obvious that redaction should be happening?

@BrownTruck
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

BrownTruck added the "needs rebase or merge" label (PR has conflicts with current master) on Sep 27, 2018
pypa-bot removed the "needs rebase or merge" label (PR has conflicts with current master) on Oct 10, 2018
Contributor Author

orf commented Oct 10, 2018

Thanks for the review @cjerdonek, I've made the changes as requested.

Member

cjerdonek left a comment

Some more comments.



@patch('pip._internal.utils.misc.transform_url')
def test_redact_password_from_url(mocked_transform_url):
Member

It seems like trying to mock here just obscures the test.


@patch('pip._internal.utils.misc.transform_url')
def test_redact_password_from_url(mocked_transform_url):
    redact_password_from_url('user@example.com')
Member

"user@example.com" isn't a full URL. I would test with an actual URL.

@@ -373,8 +374,6 @@ class Test_normalize_path(object):
    # it's easiest just to skip this test on Windows altogether.
    @pytest.mark.skipif("sys.platform == 'win32'")
    def test_resolve_symlinks(self, tmpdir):
        print(type(tmpdir))
        print(dir(tmpdir))
Member

This isn't related to this PR, so I would leave it alone.

@@ -907,6 +911,18 @@ def remove_auth_from_url(url):
    return surl


def redact_password_from_url(url):
    """Replace the password in a given url with ****"""
Member

You need a period at the end of the sentence.

@@ -907,6 +911,18 @@ def remove_auth_from_url(url):
    return surl


def redact_password_from_url(url):
Member

I would move this function to be after remove_auth_from_url() (since it is a "fancier" version).

# username/pass params are passed to subversion through flags
# and are not recognized in the url.
def redact_netloc(netloc):
    netloc, (user, passw) = split_auth_from_netloc(netloc)
Member

I would just write out "password" since it's not that much longer.

Member

cjerdonek left a comment

Some more.

    if user is None:
        return netloc
    password = '' if password is None else ':****'
    return '{user}{passw}@{netloc}'.format(user=user,
Member

"passw" -> "password"



def test_redact_password_from_url():
    with patch('pip._internal.utils.misc.transform_url') as mocked_transform_url:
Member

I was saying before that I think mocking obscures the test. Why can't you just pass in a few real URLs like the other tests (e.g. one with a user-pass, one with no user-pass, and one with a user but no pass)?

Contributor Author

Yeah good idea, I dislike mocks as well. Done!

@cjerdonek
Member

Looking a lot better. Thanks!

cjerdonek added this to the Print Better Error Messages milestone on Oct 10, 2018
Member

cjerdonek left a comment

Some final(?) minor comments.

news/4746.bugfix
@@ -0,0 +1 @@
Redact password from the URL in various log messages.
Member

"Redact the password"

src/pip/_internal/utils/misc.py

def transform_url(url, transform_netloc):
Member

cjerdonek commented Oct 11, 2018

I would mark this private: _transform_url.

    # Return a copy of url with 'username:password@' removed.
    # username/pass params are passed to subversion through flags
    # and are not recognized in the url.
    return transform_url(url, lambda netloc: split_auth_from_netloc(netloc)[0])
Member

I would define the lambda on its own line:
transform_netloc = lambda netloc: split_auth_from_netloc(netloc)[0]

Contributor Author

The linter dislikes this, and I also try not to do this. I think it's fine as it is?

Member

What does the linter not like? I’m not a fan of cramming a lot into one line. You can also define the function using def syntax.

Contributor Author

Lambdas are supposed to be anonymous - if you bind them to a name it kind of defeats the point. The linter points this out and tells you to use def.

I'll do whatever you suggest (I don't care that much), but a lambda seems perfect here. A whole named function for this seems overkill.

Member

I just think the code will be easier to understand if you give the function a name (whether that be a lambda or def syntax). It will also have a more parallel / symmetric look with redact_password_from_url below.
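
For illustration, the named-helper variant discussed in this thread might look like the sketch below. It uses def syntax, which also avoids the linter warning about binding a lambda to a name. The name _get_netloc is hypothetical, the helpers are simplified stand-ins rather than the merged pip code, and the standard library's urllib.parse is assumed.

```python
from urllib.parse import urlsplit, urlunsplit


def split_auth_from_netloc(netloc):
    """Simplified stand-in: return (netloc_without_auth, (user, password))."""
    if '@' not in netloc:
        return netloc, (None, None)
    auth, netloc = netloc.rsplit('@', 1)
    user, sep, password = auth.partition(':')
    return netloc, (user, password if sep else None)


def redact_netloc(netloc):
    """Replace the password in a netloc with ****."""
    netloc, (user, password) = split_auth_from_netloc(netloc)
    if user is None:
        return netloc
    password = '' if password is None else ':****'
    return '{}{}@{}'.format(user, password, netloc)


def _transform_url(url, transform_netloc):
    """Apply transform_netloc to the netloc of url and reassemble the url."""
    purl = urlsplit(url)
    netloc = transform_netloc(purl.netloc)
    return urlunsplit(
        (purl.scheme, netloc, purl.path, purl.query, purl.fragment))


def _get_netloc(netloc):
    """Return netloc with any 'username:password@' prefix removed."""
    return split_auth_from_netloc(netloc)[0]


def remove_auth_from_url(url):
    # Return a copy of url with 'username:password@' removed.
    return _transform_url(url, _get_netloc)


def redact_password_from_url(url):
    """Replace the password in a given url with ****."""
    return _transform_url(url, redact_netloc)
```

Written this way, remove_auth_from_url() and redact_password_from_url() have the same shape: each passes one named netloc transform to _transform_url().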

    ('user:pass:word@example.com', 'user:****@example.com'),
])
def test_redact_netloc(netloc, expected):
    result = redact_netloc(netloc)
Member

"result" -> "actual" to match the earlier test.

    ('https://example.com', 'https://example.com')
])
def test_redact_password_from_url(auth_url, expected_url):
    result = redact_password_from_url(auth_url)
Member

"result" -> "actual" (or "url") to match previous tests.

@@ -681,3 +701,14 @@ def test_split_auth_from_netloc(netloc, expected):
def test_remove_auth_from_url(auth_url, expected_url):
    url = remove_auth_from_url(auth_url)
    assert url == expected_url


@pytest.mark.parametrize('auth_url,expected_url', [
Member

Add space to match previous test: auth_url, expected_url

@cjerdonek
Member

Closing / opening to trigger CI to re-run.

cjerdonek closed this on Oct 19, 2018
cjerdonek reopened this on Oct 19, 2018
cjerdonek merged commit 78371cc into pypa:master on Oct 19, 2018
@cjerdonek
Member

Thanks a lot for all your work and follow-through on this, @orf.

orf deleted the redact-auth branch on October 19, 2018 at 10:11
Contributor Author

orf commented Oct 19, 2018

No problem, thank you for the prompt and detailed reviews @cjerdonek 🎉


haizaar commented Nov 5, 2018

Does anyone know when this fix gets released?

@pradyunsg
Member

It'll be a part of 19.0, scheduled for early next year.

pip has a predictable release cadence, which @benoit-pierre linked to.


lock bot commented May 31, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot added the "auto-locked" label (Outdated issues that have been locked by automation) on May 31, 2019
lock bot locked this conversation as resolved and limited it to collaborators on May 31, 2019
Labels
auto-locked Outdated issues that have been locked by automation
Development

Successfully merging this pull request may close these issues.

pip prints out username and password from URLs with them
8 participants