Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only normalize package names when added to Pipfile #1826

Merged
merged 10 commits into from
Apr 11, 2018

Conversation

frostming
Copy link
Contributor

@frostming frostming commented Mar 22, 2018

Remove the dict() so that the mutation will affect pfile, parsed_pipfile will keep unchanged until the file is overwritten.

This fixes some issues when packages names don't comply with PEP423, see the tests.

And also a issue of contoml found when debugging, which can be triggered when trying to print parsed_pipfile: items returns a dictitems object and hence can't be connected with a list. The issue about items doesn't relate to the normalization, I put this in its own commit.

@techalchemy
Copy link
Member

Er you’re gonna have to describe the issue with contoml specifically

@techalchemy
Copy link
Member

Also note that we mock all of pypi so if you’re installing a package we don’t have mocked you need to add it everywhere it needs to be mocked. So you should use a package we have mocked

@frostming
Copy link
Contributor Author

frostming commented Mar 22, 2018

@techalchemy I don't quite understand

  1. items returns a dictitems object and hence can't be connected with a list. The issue about items doesn't relate to the normalization, I put this in its own commit.
  2. add pop so that it won't break https://github.com/pypa/pipenv/pull/1826/files#diff-7c0c15ee4eba8c237e3318a395816a13R360
  3. What packages are mocked? I picked one from PYPI_VENDOR_DIR, am I getting wrong?

@techalchemy
Copy link
Member

items returns a dictitems object and hence can't be connected with a list. The issue about items doesn't relate to the normalization, I put this in its own commit.

I asked you to describe what the bug is with contoml. This doesn't sound like a bug in contoml, it sounds like a problem with the approach you are taking to normalization.

@frostming
Copy link
Contributor Author

It is triggered when trying to print the contoml parsed object, which will call object.items() inside. The bug(illegal concatenating) lies there.

Please take a look at the diff

Copy link
Member

@techalchemy techalchemy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the diff and I read and re-read your answers and I grabbed your tests to see how they fail, and I think I see what you are talking about -- the test you wrote does fail, but it's not clear to me that that's a bug with contoml (I am not an expert). It seems more like a problem with our implementation of project.Project._pipfile.

If you think this needs to be fixed with a patch to contoml you should push a patch upstream. We have a patched version of the library but our intention isn't to maintain the library here, so unless the patch gets accepted upstream I can't see us including it.

I included a suggestion if you want to get these tests passing, although I think this is a behavioral change that might break some workflows?

@@ -353,7 +353,7 @@ def _pipfile(self):
# mutation time!
self.clear_pipfile_cache()
for section in ('packages', 'dev-packages'):
p_section = dict(pfile.get(section, {}))
p_section = pfile.get(section, {})
for key in list(p_section.keys()):
# Normalize key name to PEP 423.
norm_key = pep423_name(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of patching the entire contoml library, why don't we just change the way we handle this code here in our own library:

    p_section = pfile.get(section, {})
    for key in list(p_section.keys()):
        norm_key = pep423_name(key)
        if key != norm_key:
            p_section[norm_key] = p_section[key]
            del p_section[key]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After I looked into the contoml/file.py, I find that using pop here won't break anything. Thanks anyway. New modification is committed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh @frostming sorry it took me so long to see this here -- with this change, does pipenv properly normalize the pipfile again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my search result of the code base, normalization will happen when Pipfile is changed: add package/remove package/add index

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might cause all comments in the file to be wiped out and re-normalize all names in the Pipfile (but hopefully we can add a unit test to check that doesn't happen)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtratner @frostming this is not actually desired behavior -- we don't want to rewrite the case of the entire pipfile every time someone installs a single package, we only want to normalize the one they are installing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, so would you mind I change the behavior here, together with recasing issue #1855 ?

@frostming
Copy link
Contributor Author

@techalchemy I thought the patched directory was to host vendor libraries with some customizations, otherwise they will go to vendor dir. So the bug fixing about the library isn't included?

If that is true, then I agree with your workaround. The fix of items won't affect tests but only debugging so just leave it.

@techalchemy
Copy link
Member

We patch stuff only when strictly necessary to get pipenv to work. Since it’s not necessary in this case and we are responsible open source citizens, we strongly encourage bugs to be fixed upstream. It also makes it easier to re-sync with the main library.

As for the fix itself here, brief summary: pipenv install some_package currently writes some_package to the Pipfile but installs some-package. This patch doesn’t change that but does call our pep423 formatter before handing the name back in project.Project.parsed_pipfile. There has been a bunch of discussion around this behavior so I just need a chance to look through issues and make sure this isn’t going to cause problems

@frostming
Copy link
Contributor Author

The issue can be reproduced in this way:

  1. Write some_package into Pipfile and do pipenv install
  2. Do pipenv install some-package. pipenv thinks it is not included in Pipfile and add a duplicate line some-package into it.

But also it was an obvious coding fault: mutate a dict without ref to the original object. I don't see why this should be rejected.

@techalchemy
Copy link
Member

@frostming please have patience. I was not writing a summary for your benefit but for the benefit of others when I point them to review. I already agreed with you that this is a bug.

But also it was an obvious coding fault: mutate a dict without ref to the original object. I don't see why this should be rejected.

I don't have any plans to reject this -- I only mean that we have to consider how fixing this bug might impact people who are using pipenv in production environments. This is a large project in use by tens- to hundreds- of thousands of systems. This change alters the representation of package names in existing pipfiles, even if it alters it to be correct. That might have unforeseen consequences, which would impact whether this gets merged as part of a major release or not.

@techalchemy techalchemy self-assigned this Mar 23, 2018
@@ -288,7 +288,6 @@ def ensure_pipfile(validate=True, skip_requirements=False):
if validate and project.virtualenv_exists and not PIPENV_SKIP_VALIDATION:
# Ensure that Pipfile is using proper casing.
p = project.parsed_pipfile
p.clear_pipfile_cache()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was buggy but was never hit because PIPENV_SKIP_VALIDATION is always true

pipenv/core.py Outdated
@@ -290,6 +289,7 @@ def ensure_pipfile(validate=True, skip_requirements=False):
err=True,
)
project.write_toml(p)
project.clear_pipfile_cache()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use spaces!!!

@techalchemy
Copy link
Member

I had some time to look into this. Broadly, this behavior is the goal:

  • Normalize package names with pep423 etc when we write them to the pipfile (e.g. someone calls pipenv install from the command line)
  • Normalize all package names written to the lockfile
  • Return the normalized name for comparison against other packages in the index
  • Don't normalize anything we didn't write -- if someone manually writes a lockfile, leave the names alone including capitalization etc

This isn't a criticism of your PR or your code, I'm not sure it necessarily applies. I am laying out the needs of this part of the code so that we can work with some clarity -- thanks for your patience and continued effort around this, I know this section of our code is not that pretty and we really do appreciate the help

with open(p.pipfile_path, 'w') as f:
contents = """
[packages]
python_dateutil = "*"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test case that has an unnormalized name and confirm that it stays unnormalized and comments are preserved when you install a new package that doesn't overlap?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can easily check this by adding to a couple more lines to the pipfile

# Pre-comment - I should totally be preserved
requests = "*"  # Inline comment - should definitely be preserved too!

@frostming
Copy link
Contributor Author

@techalchemy Sorry for the messing with TABs, fixed
With the fix of this PR, package names will be normalized when pipfile is written(install, uninstall). Moreover, all names are PEP423ed and recased when imported from a requirements.txt

@jtratner This case doesn't apply to this PR. But your proposal is perhaps better and there is another discussion for it #1855 (comment)

Thanks all

@jtratner
Copy link
Collaborator

@frostming - I'm just concerned that popping out all of the keys will remove comments, so would be great if you could update your test case to confirm that comments are not lost.

@frostming
Copy link
Contributor Author

@jtratner Please review

@jtratner
Copy link
Collaborator

nice! This actually makes me realize another case (which wouldn't have to preserve the comments) - and I'm not sure whether it's covered in test cases or what current behavior is.

python_dateutil = "==1.5"

then you say pipenv install python_dateutil

should this change to?

python_dateutil = "*"

or stay the same?

Similarly, if you specify a version, should the pipfile be updated?

(@techalchemy - thoughts?)

@frostming
Copy link
Contributor Author

IMO "*" shouldn't overwrite. But another version specifier ">=", "<", "==" should overwrite that in Pipfile.

@techalchemy
Copy link
Member

The easy solution is to just rewrite the line without the comment, imo, but I dont know how hard it is to preserve the comment and just change the version

@frostming
Copy link
Contributor Author

frostming commented Mar 28, 2018

In contoml, you can change the value of a key while keeping the inline comment. Comments and key-values are different tokens in contoml parsed AST.

And if you delete the key, the inline comment will remain at the same line to become a line comment.

@frostming
Copy link
Contributor Author

@techalchemy @jtratner I refactored the recasing/normalization part a bit so that package names will be normalized only when user does pipenv install

Also fixes #1855 and covers it in test suites

@frostming frostming changed the title Normalize the package names in _pipfile Only normalize package names when added to Pipfile Mar 28, 2018
@frostming
Copy link
Contributor Author

Oops, multiprocess is broken, needs more coding

@techalchemy
Copy link
Member

I don’t think the multiprocessing issue is your fault — our tests are behaving strangely right now

@@ -231,7 +231,7 @@ def has_anonymous_entry():
if has_anonymous_entry():
return items
else:
return list(items) + [('', self[''])]
return items + [('', self[''])]
Copy link
Member

@uranusjr uranusjr Apr 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? I read the original issue description and it seems the old was correct; this would make the concat fail again because items is a dict_items, not list. Possibly a mistake during rebasing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a test is needed for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a mistake when rebasing, revert it now. Thanks

@techalchemy
Copy link
Member

At a glance this looks okay to me, thanks a bunch for taking care of this! I would like another set of eyes just to verify this too before I merge so @uranusjr @ncoghlan if one of you guys wouldn't mind

@uranusjr
Copy link
Member

The normalisation part looks good to me. I don’t quite understand the caching part and won’t comment.

This and my previous script entry change makes me think we’d probably remove direct access to parsed_pipfile and wrap operations with functions instead. But that can be covered in a later refactor. This PR is good as-is.

Copy link
Member

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I have a couple of minor comments inline, but also one major comment which echoes @jtratner's question above: I don't think the current implementation will do the right thing when a version constraint is already present in Pipfile, and no new version constraint is given on the command line.

key = 'dev-packages' if dev else 'packages'
section = self.parsed_pipfile.get(key, {})
for name in section.keys():
if pep423_name(name) == pep423_name(package_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to reconvert package_name on every iteration - it can be converted once before the loop.

Alternatively, we can require that package_name be prenormalised - that would fit with the check further down in add_package_to_pipfile, where normalisation is skipped for file URLs and version control references.

name = self.get_package_name_in_pipfile(package_name, dev)
if name and name != package_name:
# Replace the packge name
del p[key][name]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding this part of the change correctly, I don't think it does what we want in the case that @jtratner mentions in #1826 (comment), where Pipfile contains a stricter version specifier than project_name="*" and the install command is just pipenv install project_name.

The simplest fix I can see would be to just leave the denormalised name alone in the case where it's already present - the change to normalise keys on lookup in get_package_name_in_pipfile will handle such cases.

assert 'python_DateUtil' not in p.pipfile['packages']
contents = open(p.pipfile_path).read()
assert '# Pre comment' in contents
assert '# Inline comment' in contents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another couple of test cases are needed to check that:

  1. the version constraint is preserved for the python_DateUtil = "==1.5" case if no version constraint is given on the command line

  2. the version constraint is replaced if an explicit version constraint is given on the command line (including resetting it to a wildcard dependency with pipenv install 'python_DateUtil=*')

Copy link
Member

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version looks good to me, thanks! (back to @techalchemy to see if his earlier concerns have also been addressed)

@frostming
Copy link
Contributor Author

@ncoghlan Thanks for your inputs of great help!

No need to reconvert package_name on every iteration - it can be converted once before the loop.

Fixed

The simplest fix I can see would be to just leave the denormalised name alone in the case where it's already present - the change to normalise keys on lookup in get_package_name_in_pipfile will handle such cases.

Agree, see my latest commit.

Another couple of test cases are needed to check that:

  1. the version constraint is preserved for the python_DateUtil = "==1.5" case if no version constraint is given on the command line

Changed the tests a bit

  1. the version constraint is replaced if an explicit version constraint is given on the command line (including resetting it to a wildcard dependency with pipenv install 'python_DateUtil=*')

Tests added. install package=* is neither supported by pip nor pipenv, we can improve this later. I don't do it in this PR.

@techalchemy
Copy link
Member

@frostming I think as long as using an actual pin (e.g. package==x.y) replaces the correct entry it is probably good. I know this wound up being a complicated pr so thank you for your patience here, this actually has the potential to break a lot of stuff, but some of that is currently broken, so disentangling what behaviors we actually want was difficult. I’ll give it another look when I’m off mobile but otherwise I think we might be good!


c = p.pipenv('install requests')
assert c.return_code == 0
assert 'requests' not in p.pipfile['packages']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to say I am really happy with this, you did an amazing job organizing this PR and keeping the logic straight with all of the different requests and rewrites. The fact that this assert statement works kind of blows my mind.

@techalchemy
Copy link
Member

Final update to make sure we aren't colliding with other changes and then merging, this looks good.

@jtratner
Copy link
Collaborator

jtratner commented Apr 11, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug 🐛 This issue is a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants