Skip to content
This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

autover (or package using autover) reports incorrect version when installed inside a git repository #54

Closed
ceball opened this issue Jun 28, 2018 · 14 comments · Fixed by #65
Labels

Comments

@ceball
Copy link
Contributor

ceball commented Jun 28, 2018

I have a problem where asking an installed param package for its version is returning version info from an unrelated git repository instead of from the .version file in the installed param package.

I'll try to reproduce it (the problem is happening while pip is building a package that has param as a build time dependency), but my guess is it's something along the lines of...

If autover (or param) is installed as a package inside a git repository (i.e. installed at something like git_repo/some/path/to/lib/python/autover/), autover (or param) reports its own version as that of git_repo, rather than the version in git_repo/some/path/to/lib/python/autover/.version.

I.e. something related to https://github.com/pyviz/autover/blob/master/autover/version.py#L259 maybe? It looks suspicious to me (won't it match any repository?), although I don't fully understand what's happening/supposed to be happening there (e.g. not sure why the .version file isn't being used), so I might be totally wrong.

@jlstevens
Copy link
Contributor

You are right that https://github.com/pyviz/autover/blob/master/autover/version.py#L259 is supposed to prevent this but I have no idea what is happening when param is a build time dependency.

@ceball
Copy link
Contributor Author

ceball commented Jun 28, 2018

I also have no idea what pip is doing (will debug it), but it seems wrong that an installed param would ever report a git repository's info (let alone one that's not param's) instead of its .version info.

@jlstevens
Copy link
Contributor

That is by design: version control information is generally less stale than what you can find from a static file.

@ceball
Copy link
Contributor Author

ceball commented Jun 28, 2018

If autover (or param) is installed as a package inside a git repository (i.e. installed at something like git_repo/some/path/to/lib/python/autover/), autover (or param) reports its own version as that of git_repo, rather than the version in git_repo/some/path/to/lib/python/autover/.version.

I think that's what it is...

$ mkdir t123
$ cd t123
$ git init
$ echo test > test.txt
$ git add test.txt 
$ git commit -m "init"
$ git tag -a v0.0.1 -m "test"
$ conda create -n xyz python=3.6
$ conda activate xyz
$ pip install param -t .
Collecting param
  Cache entry deserialization failed, entry ignored
  Using cached https://files.pythonhosted.org/packages/6d/f1/a9feccdbc333813f02a54c808b330a75210b98699ef8199e0250763b8e14/param-1.7.0-py2.py3-none-any.whl
Installing collected packages: param
Successfully installed param-1.7.0
$ python
>>> import param
>>> param.__version__
'0.0.1'
$ rm -rf .git
$ python
>>> import param
>>> param.__version__
'1.7.0'

I used the path . above, but it could be any path inside the git repo:

$ rm -rf param* numbergen
$ python
>>> import param
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'param'
$ pip install param -t ./some/other/path
Collecting param
  Using cached https://files.pythonhosted.org/packages/6d/f1/a9feccdbc333813f02a54c808b330a75210b98699ef8199e0250763b8e14/param-1.7.0-py2.py3-none-any.whl
Installing collected packages: param
Successfully installed param-1.7.0
$ PYTHONPATH=./some/other/path python
>>> import param
>>> param.__version__
'0.0.1'
>>> param.__file__
'/tmp/t123/some/other/path/param/__init__.py'
$ cat some/other/path/param/.version 
{"git_describe": "v1.7.0-0-gb75e757", "version_string": "1.7.0"}

(And presumably doesn't apply only to param or autover.)

@ceball ceball changed the title autover (and hence param) reporting incorrect version of itself when installed inside another git repository? autover (or package using autover) reports incorrect version when installed inside a git repository Jun 29, 2018
@ceball
Copy link
Contributor Author

ceball commented Jun 29, 2018

Just to confirm, this applies to anything using autover, and isn't special to pip (i.e. it's only about being inside a git repository):

$ mkdir /tmp/t123 && pushd /tmp/t123
$ git init
$ echo test > test.txt
$ git add test.txt 
$ git commit -m "init"
$ git tag -a v0.0.1 -m "test"
$ conda create -p ./456
$ conda activate /tmp/t123/456
$ python -c "import holoviews; print(holoviews.__version__,holoviews.__file__)"
0.0.1 /tmp/t123/456/lib/python3.6/site-packages/holoviews/__init__.py
$ cat /tmp/t123/456/lib/python3.6/site-packages/holoviews/.version
{"git_describe": "v1.10.5-0-g98b4ea7d", "version_string": "1.10.5"}
$ conda list holoviews
# packages in environment at /tmp/t123/456:
#
# Name                    Version                   Build  Channel
holoviews                 1.10.5                   py36_0

Isn't the '' in repo_matches going to match any repository?

                output = run_cmd([cmd, 'remote', '-v'],
                                 cwd=os.path.dirname(self.fpath))
                repo_matches = ['', # No remote set
                                '/' + self.reponame + '.git' ,
                                # A remote 'server:reponame.git' can also be referred
                                # to (i.e. cloned) as `server:reponame`.
                                '/' + self.reponame + ' ']
                if not any(m in output for m in repo_matches):
                    return self

(I don't quite follow the whole chain to determine the version, though, as I mentioned above...)

@jlstevens
Copy link
Contributor

The line '/' + self.reponame + '.git' is supposed to check for the .git directory e.g param.git. This is to make sure it doesn't pick up on any other repository.

@ceball
Copy link
Contributor Author

ceball commented Jun 29, 2018

The line '/' + self.reponame + '.git' is supposed to check for the .git directory e.g param.git. This is to make sure it doesn't pick up on any other repository.

But it doesn't matter, because '' used as above is going to match anything - right?

The intention of '' seems to be to match "no remote". I.e. the whole logic is: "repo matches if there's no remote or if any remote matches the repo name". If that sounds right to you, I could change the above code so that's what happens.

Separately, I'd like to propose that the .version file should always be used if present (instead of looking to git first). This wouldn't work at present, because .version is always generated. I guess I'd like to see it generated and put into packages only. However, this is presumably a larger/riskier change instead of just being a bug fix (which is what I think the suggestion above would be), so I'll open an issue and we could consider it in the future.

@jbednar
Copy link
Contributor

jbednar commented Jun 29, 2018

I think we should consider having .version override info from git, because it provides a way to manually ensure the version is correct regardless of context. But there are many competing pressures there, and I don't know what the best answer is. E.g. we could also consider having a .version file committed to the repo after a major release but used only as as a fallback. The version in that file would have to be something valid but not actually currently used, indicating that it's at least version X but is not precisely X. E.g. something like 1.7.1post0dirty (not sure if that's legal, but you get the idea), such that even if someone gets a git archive, they at least know it's newer than 1.7.1, and that it's not precisely 1.7.1. I think that would take care of nearly every version issue we've been having.

@jlstevens
Copy link
Contributor

I.e. the whole logic is: "repo matches if there's no remote or if any remote matches the repo name". If that sounds right to you, I could change the above code so that's what happens.

From what I remember, that was the intention, yes. If you could make it behave that way and verify it addresses this issue, that would be really helpful.

I think we should consider having .version override info from git, because it provides a way to manually ensure the version is correct regardless of context.

I might be happy to have a switch to change the priority but I don't want to change the default behavior right now: as you say there are too many competing pressures and I can't reason about all the cases that changes to the default behavior might affect. However, offering the option to change the priority is ok as long as the user accepts potentially unforeseen consequences!

@ceball
Copy link
Contributor Author

ceball commented Jun 29, 2018

The intention of '' seems to be to match "no remote". I.e. the whole logic is: "repo matches if there's no remote or if any remote matches the repo name". If that sounds right to you, I could change the above code so that's what happens.

I've realized that I'd prefer not to mess with that code, because although it seems wrong to me for what it's trying to do, what it's trying to do also seems wrong...but at the same time, I know I don't fully get all the use cases. So I'd prefer to stay away from it! :)

Instead, I've decided to upgrade my 'packages should use the .version file' proposal into a plea :) I strongly think packages (wheel, sdist, conda package, etc) should be versioned in an uncomplicated way. Having a package run git to report its version information just seems completely wrong to me. I think this means that if there's a .version file, it should always be used - and so the .version file should only be put into packages (or other places where the version is supposed to be fixed).

I think we should consider having .version override info from git, because it provides a way to manually ensure the version is correct regardless of context. But there are many competing pressures there

I'm not sure what the competing pressures are for a released package. It's a mystery to me why a released package should ever have anything to do with git. Similarly, I don't see why a git repository would ever need to have a .version file in it (not talking about your suggestion of a committed file containing a version fallback for people who get an untagged github archive - that's a separate issue).

Anyway, the reason this all came up is from checking the version of build dependencies. In the setup.py of a project like parambokeh, I have something like "if param<1.6 raise Error('to build a parambokeh package, install pip>=10 and try again, or pip install param>=1.6 yourself first then try again')". param>=1.6 is listed as a build dependency in pyproject.toml, so the check is actually redundant for pip>=10...but the check completely fails when people do have pip>=10, because the param installed by pip as a build dependency reports its version as parambokeh's git repository version, rather than 1.6 as in the .version file.

Such setup.py version checks are only supposed to be temporary until pip 10 is common. The checks are only for people building parambokeh via setup.py who have pip<10. Those people are not regular users (who'll use released wheels/conda packages) - they are:

  • packagers who have out of date pip
  • users who want to use an unreleased version of parambokeh and have out of date pip
  • packagers/users who want to build parambokeh via setup.py, but who are not using pip (or other pyproject.toml-supporting tool) e.g. maybe they are doing python setup.py build or similar.

So, unless anyone strongly objects, I'll just drop those version checks, because agreeing what we should do for .version vs git is probably going to be difficult, and even if we agree, testing any such changes will take some time (probably with various problems most likely only coming to light after a period of real world usage).

@jbednar
Copy link
Contributor

jbednar commented Jun 30, 2018

I've realized that I'd prefer not to mess with that code, because although it seems wrong to me for what it's trying to do, what it's trying to do also seems wrong...but at the same time, I know I don't fully get all the use cases. So I'd prefer to stay away from it! :)

I don't think that's defensible; we can't just leave broken code in place, when we know it's broken!

Instead, I've decided to upgrade my 'packages should use the .version file' proposal into a plea :) (a) I strongly think packages (wheel, sdist, conda package, etc) should be versioned in an uncomplicated way. Having a package run git to report its version information just seems completely wrong to me. (b) I think this means that if there's a .version file, it should always be used - and so the .version file should only be put into packages (or other places where the version is supposed to be fixed).

I agree with (a), but (b) doesn't follow directly from it; seems like it's many steps removed from it. Are these the missing steps?

  1. Versioning in released packages should be straightforward; that's the easy (and most important) case [agreed],
  2. We don't want to modify code in the package itself to store the version number [probably agreed]
  3. We're ok with including an extra static file with the version info [agreed]
  4. We need the logic for processing that file not to depend on any external steps or tools that might fail [agreed]

If so I think I agree with all the steps, but I'm not certain there's no other reasonable option for every link in the chain, so it's worth making the argument explicitly.

I'm not sure what the competing pressures are for a released package. It's a mystery to me why a released package should ever have anything to do with git.

The pressures are all the various contexts in which versions are needed. I agree that a released package should never be contacting git.

Similarly, I don't see why a git repository would ever need to have a .version file in it (not talking about your suggestion of a committed file containing a version fallback for people who get an untagged github archive - that's a separate issue).

I don't see how that's a separate issue. There could be a separate solution, e.g. two such files, one .version for released packages (overriding anything if present) and a .version_ref or some such that provides a baseline version for github archive (not overriding anything). But unless we explicitly and actively separate the two issues in that way, I don't see how they are somehow fundamentally separate.

So, unless anyone strongly objects, I'll just drop those version checks, because agreeing what we should do for .version vs git is probably going to be difficult, and even if we agree, testing any such changes will take some time (probably with various problems most likely only coming to light after a period of real world usage).

I don't get it. What does dropping the version checks buy us, compared to simply fixing the immediate problem of an incorrect check for an empty repo name? I'd strongly prefer simply to decide about .version; it's pointless to delay things any further!

@ceball
Copy link
Contributor Author

ceball commented Jun 30, 2018

I fixed the check for no remote. However, it still doesn't report the version from the .version file - it now reports None. As far as I can tell, it only uses the .version file for the version if there's an exception while trying to run git.

ceball added a commit that referenced this issue Jun 30, 2018
@ceball
Copy link
Contributor Author

ceball commented Jun 30, 2018

I've made a PR that includes the above mentioned remote name check, and which also tries to address the problem after that (it checks the .version file for info if git hasn't returned a version).

@ceball
Copy link
Contributor Author

ceball commented Jun 30, 2018

I don't see how that's a separate issue

Ok, I said the .version file should only exist in packages, but yes, I just wanted a way that the full, static version info would take precedence if present. I was suggesting .version file present or absent as the mechanism for that, but I'd be fine with e.g. having a partial version fallback field in .version, and having the .version file always present - as long as the full static version info is used first if present in the .version file. The partial version fallback field would be used last (after checking the full version field, and after attempting to get info from git). So the partial version fallback would be used only for untagged github zip files and other weird situations like a git clone with the .git deleted.

I do think that adding such fallback info should be a new issue, but I agree it means my request to have the .version file only in packages is possibly shortsighted, and maybe there's a better mechanism for ensuring the static version info takes precedence.

I guess the competitor to having the fallback info is my suggestion of raising an exception in setup.py if the version number isn't present and can't be determined (#42 (comment)). I would be ok with either. The fallback suggestion is friendlier but more effort. I believe we previously decided not to do it because of the effort required to keep such info up to date, but it could now be done by pyctdev (I guess you'd have it open a PR after a release).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants