Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update requirements.txt with some version specs #34

Closed
russellb opened this issue Jun 17, 2024 · 9 comments
Closed

Update requirements.txt with some version specs #34

russellb opened this issue Jun 17, 2024 · 9 comments

Comments

@russellb
Copy link
Member

To assist in keeping dependencies more explicit and consistent across repositories, we should have version numbers specified for each dependency in requirements.txt.

Please see this one as an example: https://github.com/instructlab/instructlab/blob/main/requirements.txt

@RobotSail
Copy link
Member

We'd like this library to follow the best practices which have been laid out by the HuggingFace transformers library.

Looking at their setup.py, it looks like they only pin the versions on an as-needed basis.

My opinion is that if HuggingFace doesn't do this, then it's probably not something we need to adopt for our much-smaller library.

If there's a problem that's being caused by the lack of version pinning on our end though, I'm happy to revisit this.

@fabianlim
Copy link
Contributor

I had unpinned some versions in #25 . Probably good to unpin more.

@dhellmann
Copy link

Best practices for python dependencies call for using ranges in your package requirements and pinning versions only in CI jobs.

Using pinned versions in, for example, requirements.txt or constraints.txt, allows you to know and advertise exactly what versions have been tested in CI. That information is useful for users and repackagers to understand which versions of dependencies are compatible with more specificity than the ranges provide. Tools like dependabot will submit PRs to automatically update those pins to help you keep up with new releases of all of your dependencies.

Pinning to specific versions in the package dependencies in pyproject.toml or requirements.in is not a good practice, though. It makes it very easy for sets of packages that need to be installed together to have incompatible dependencies, which in turn makes it impossible to actually install them (I think that's what led to this ticket being filed). Pinned dependencies also make it difficult to deal with CVEs or other critical bugs in those dependencies, which makes delivering products from this project more challenging. Please do not pin to specific versions of libraries.

Instead of pinning, use version ranges. This ensures that repackagers and installers have some flexibility in case a dependency of your package has a critical CVE and needs to be updated. Those ranges should include a minimum version, and in some cases a maximum version (a "cap").

Specifying the minimum value for the range (foo>=x.y) allows you to declare that you need features that only show up in or after that version of the dependency, which means you won't get bugs from users trying to use instructlab with an old dependency that has an incompatible API or is completely lacking a feature you need.

Specifying a maximum value for the range (foo>=x.y,<x+1.0) allows you to prevent a library that is known to introduce breaking changes in new releases from being brought into your environments without being tested. That maximum version setting should be used with care, though, and only when definitely needed, because it can prevent good new versions from being used and can cause incompatibility issues similar to pinning versions.

The most effective way I have seen for setting maximum versions is to wait for something to break, then set a cap to exclude the version that causes the breakage until the dependency is fixed or your code is updated to work with the new version of the dependency. Bad releases of dependencies can be excluded using the requirements syntax !=a.b. If the new dependency requires incompatible changes in your code, then it becomes the new minimum version for your requirements range.

@RobotSail
Copy link
Member

Thank you very much for that in-depth explanation of version pinning best practices @dhellmann !

Then it sounds like what we need to do is:

  1. Add more minimum versions to dependencies
  2. Remove specific version pins (currently we pin transformers)
  3. Add maximum versions reactively as packages release new versions that break our existing implementation

@dhellmann
Copy link

Thank you very much for that in-depth explanation of version pinning best practices @dhellmann !

Then it sounds like what we need to do is:

  1. Add more minimum versions to dependencies
  2. Remove specific version pins (currently we pin transformers)
  3. Add maximum versions reactively as packages release new versions that break our existing implementation

I would add to that list a step separating the package dependencies and the CI dependencies.

To do that, you can use pip's --constraints argument to specify a file that includes pins in CI jobs and then keep that file separate from what's in pyproject.toml. If you use a tool like pip-compile you can get a complete list of all of the transitive dependencies you have and pin all of them in CI, and keep the list in pyproject.toml limited to only things you directly import (to keep the list manageable).

@ktam3
Copy link

ktam3 commented Jul 16, 2024

@RobotSail - is this done?

@RobotSail
Copy link
Member

@ktam3 I think we ended up not going in this direction due to @dhellmann's advice. But @russellb can specify otherwise here.

@ktam3
Copy link

ktam3 commented Jul 17, 2024

kk, let's close then if that's the case. i'll wait for final confirm though

@RobotSail
Copy link
Member

@russellb Confirmed that we can close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants