Caching conflicts when using extra dependencies #838

Closed
Ben-Epstein opened this issue Apr 2, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@Ben-Epstein

Description:
This may relate to #626, and it may also conflict with your stated anti-goals, but I believe it's worth surfacing as a potential bug you may want to investigate. I can't find an existing issue for it, and it may be affecting many users who rely on this action.

If you build different extras with your Python project, each with its own independent dependencies, and you want a job that checks that each extra has all of its necessary dependencies, alongside the overall lint, type-safety, and test checks, you may run into this issue as I have.

When you specify the cache (cache: poetry, cache: pip, etc.) and point it to your requirements.txt or the more up-to-date pyproject.toml, the cache key doesn't take into account what you are installing in that job.

So, if I have a pyproject like so

...
[tool.poetry.dependencies]
python = ">=3.10.9,<3.11"
numpy = "^1.22.3"
boto3 = "^1.24.59"
pydantic = {version = "<2.0", extras = ["dotenv"]}
jinja2 = "^3.1.2"

openai = {version = "0.28", optional = true}


[tool.poetry.extras]
openai = ["openai"]

And in my first job, I use setup-python and then run

poetry install --all-extras

but in another job, I run

poetry install

One may assume that openai will not be installed in the second job. But if I'm using caching, everything from the first cache creation will be installed, regardless of which install command I run.

I would think that the install command itself would generate the hash, rather than the dependency file itself.

Based on your non-goals, I understand if this isn't something you want to pursue, but it might be worth documenting in an overly clear way for users who may not understand this behavior upfront.

Thank you!

@Ben-Epstein added the "bug" and "needs triage" labels Apr 2, 2024
@gowridurgad self-assigned this Apr 18, 2024
@gowridurgad
Contributor

Hello @Ben-Epstein, I have attempted to reproduce the issue on my end, but was unable to do so. In my test environment, the extra (openai) is not installed in the second job, which uses poetry install. Here are screenshots for your reference. Could you assist by sharing a link to a simplified version that reproduces the problem? Thank you!
Screenshot 2024-04-18 at 5 36 01 PM
Screenshot 2024-04-18 at 5 35 14 PM

@gowridurgad
Contributor

Hello @Ben-Epstein
Just a gentle reminder!

@gowridurgad
Contributor

Hi @Ben-Epstein, could you please assist by sharing a link to a simplified version that reproduces the problem?
Thank you!

@Ben-Epstein
Author

Hi @gowridurgad, sorry about that. I will take a look today and try to reproduce it. Did you use poetry in that example? My project uses poetry, so I'll try that.

@Ben-Epstein
Author

Hi @gowridurgad, I'm so sorry for the delay in responding.

I've reproduced the issue and shared it in this PR: Ben-Epstein/poetry-setup-python-bug#2

Here are the critical steps to reproduce:

  1. Kick off a job that installs all dependencies through poetry (i.e., poetry install --all-extras).
  2. After the cache from that job is created, change the install command to poetry install, without the extras. You'll see that there is a cache hit, and packages you do not expect to be installed are in fact installed.

You can see steps 1 and 2 in the following commits:

  1. This commit creates the cache, and in the repo there is only 1 cache.
  2. This commit then allows the second job to run, and in the corresponding action you can see that it picked up the cache created in (1). In the second step, poetry run pip list shows all of the extra dependencies that were installed by the commit in (1) when running poetry install --all-extras. They shouldn't be there, since we are now running poetry install. I.e., there should have been a cache miss.
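
For readers following along, the setup described in these steps can be sketched as a workflow like the one below. This is a minimal sketch, not the contents of the linked PR: the workflow name, job names, and action versions are assumptions chosen for illustration. Both jobs enable cache: poetry in the same way, so, as reported in this thread, they end up sharing one cache key even though they run different install commands.

```yaml
# Hypothetical workflow for illustration; names and action versions are assumptions.
name: cache-repro
on: [push]

jobs:
  with-extras:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install poetry
        run: pipx install poetry
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          cache: 'poetry'  # key comes from the dependency files, not the install command
      - run: poetry install --all-extras

  without-extras:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install poetry
        run: pipx install poetry
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          cache: 'poetry'  # identical inputs, so the same cache can be restored here
      - run: poetry install
      # List what is actually in the environment after a cache hit
      - run: poetry run pip list
```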

@gowridurgad
Contributor

Hi @Ben-Epstein, the reason for this behavior is that the cache key didn't change between the two jobs, and the caching mechanism is designed to reuse a cache if it finds one with the same key. To avoid this situation, you might consider using actions/cache with different cache keys for jobs that have different requirements. Here are screenshots for your reference. We will update the documentation accordingly.
Screenshot 2024-05-21 at 5 21 53 PM
Screenshot 2024-05-21 at 5 22 19 PM
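
The screenshots are not preserved here, so the following is only a sketch of what the suggested workaround could look like, not the maintainer's exact configuration. It assumes the virtualenv is kept in the project directory (.venv) via poetry config virtualenvs.in-project true, and the key prefixes poetry-all-extras and poetry-no-extras are hypothetical names. The point is that each job gets its own key, so a cache saved by one install command is never restored for the other.

```yaml
# Hypothetical workaround sketch; job names, key prefixes, and action versions are assumptions.
name: separate-cache-keys
on: [push]

jobs:
  with-extras:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install poetry
        run: pipx install poetry
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'  # no `cache:` input; actions/cache handles caching below
      - name: Keep the virtualenv at a predictable path (.venv)
        run: poetry config virtualenvs.in-project true
      - uses: actions/cache@v4
        with:
          path: .venv
          key: poetry-all-extras-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
      - run: poetry install --all-extras

  without-extras:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install poetry
        run: pipx install poetry
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Keep the virtualenv at a predictable path (.venv)
        run: poetry config virtualenvs.in-project true
      - uses: actions/cache@v4
        with:
          path: .venv
          # Different prefix, so this job never restores the all-extras environment
          key: poetry-no-extras-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
      - run: poetry install
```

Encoding the install variant in the key prefix is one simple choice; any scheme works as long as jobs with different requirements produce different keys.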

@gowridurgad
Contributor

Hello @Ben-Epstein, the PR has been merged, and the Anti-Goals for caching poetry dependencies have been updated in the documentation. For reference, see https://github.com/actions/setup-python/blob/main/docs/adrs/0000-caching-dependencies.md.
Thank you!
