Python buildpack fails to deploy without first purging cache #1520

Closed
arel opened this issue Dec 21, 2023 · 3 comments · Fixed by #1526

@arel

arel commented Dec 21, 2023

The logic in the bin/compile script that renames paths from the $BUILD_DIR to /app is brittle and fails when files are cached with build directory names from prior builds.

As a consequence, the first deploy of my project succeeds. From the second deploy onward, the build still reports success, but the website crashes at runtime because dependencies are not found.

Further, it seems that something changed between November 29 and December 5 on Heroku's end that made this issue appear for me. I am not sure what. Maybe the build directory naming changed.

Issue reproduction

Here is a minimal project that reproduces the build issue. It is definitely an issue on Heroku's end.

https://github.com/arel/debug-heroku-pipenv

Temporary workaround

For anyone else struggling with this, one workaround is to add your local packages to your PYTHONPATH. For example, I have a local version of botocore in ./vendor/botocore. Setting my PYTHONPATH (on the Heroku settings dashboard) to /app:/app/vendor/botocore lets Python find the local package. This is not a great solution, but it may help in a pinch.
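
As a rough sketch of that workaround, the value can be assembled from the app root plus each vendored directory. The `vendor/botocore` path is the one from this comment; the variable names are only illustrative.

```shell
# Assumption: vendored packages live under the app root, e.g. vendor/botocore.
app_root="/app"
vendored_dirs="vendor/botocore"

# Start with the app root, then append each vendored package directory.
pythonpath="$app_root"
for dir in $vendored_dirs; do
  pythonpath="${pythonpath}:${app_root}/${dir}"
done

echo "PYTHONPATH=${pythonpath}"

# To apply it with the Heroku CLI instead of the dashboard (not run here):
#   heroku config:set PYTHONPATH="${pythonpath}"
```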

@edmorley
Member

edmorley commented Dec 21, 2023

@arel Hi! Thank you for filing an issue.

On 2023-11-30 the version of pipenv was updated from 2023.7.23 to 2023.11.15 (plus setuptools was upgraded):
https://github.com/heroku/heroku-buildpack-python/blob/main/CHANGELOG.md#v240---2023-11-30

If pipenv related behaviour has changed recently, then it's likely the new version of pipenv is the cause.

I will take a look at the repro you have provided soon (a minimal repro like that is very helpful, thank you!) - however, we are in the middle of a production change freeze until January due to the holidays, so I won't be able to make any changes to the buildpack until then (and I'm shortly going to be away myself).

To switch back to the old pipenv version in the meantime, you can use a buildpack URL of:
https://github.com/heroku/heroku-buildpack-python.git#v239

See:
https://devcenter.heroku.com/articles/buildpacks#buildpack-references
https://devcenter.heroku.com/articles/heroku-cli-commands#heroku-buildpacks-set-buildpack
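
A minimal sketch of pinning the buildpack to that tag, assuming the Heroku CLI is installed; `my-app` is a hypothetical app name.

```shell
# Build the pinned buildpack reference: repository URL, '#', then the Git tag.
buildpack_repo="https://github.com/heroku/heroku-buildpack-python.git"
buildpack_ref="v239"
buildpack_url="${buildpack_repo}#${buildpack_ref}"

echo "$buildpack_url"

# With the Heroku CLI (not run here; "my-app" is a hypothetical app name):
#   heroku buildpacks:set "$buildpack_url" --app my-app
```

Setting the buildpack back to the unpinned URL later should return the app to the latest release.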

@arel
Author

arel commented Dec 21, 2023

Hi, @edmorley! I appreciate the fast response! That seems like a likely culprit.

I think the issue would affect any Python package installed in site-packages (as an .egg-link, .pth, or *_finder.py file) that references the build directory, since that path only gets rewritten at runtime and I presume the build directory is ephemeral.

# At runtime, rewrite paths in editable package .egg-link, .pth and finder files from the build time paths
# (such as `/tmp/build_<hash>`) back to `/app`. This is not done during the build itself, since later
# buildpacks still need the build time paths.
if [[ "${BUILD_DIR}" != "/app" ]]; then
  cat <<EOT >> "$PROFILE_PATH"
find .heroku/python/lib/python*/site-packages/ -type f -and \( -name '*.egg-link' -or -name '*.pth' -or -name '__editable___*_finder.py' \) -exec sed -i -e 's#${BUILD_DIR}#/app#' {} \+
EOT
fi
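
To illustrate the failure mode, here is a small self-contained sketch. The build directory names and the `mypackage.pth` file are hypothetical, and it assumes GNU `sed`. A path file left in the cache by an earlier build is not touched when only the current build directory is replaced.

```shell
# Hypothetical paths for illustration; real build directories are chosen by the platform.
old_build_dir="/tmp/build_aaaa"   # the build that populated the cache
new_build_dir="/tmp/build_bbbb"   # the current build

site_packages="$(mktemp -d)"

# An editable install from the earlier build left this path file in the cache.
echo "${old_build_dir}/packages/mypackage" > "${site_packages}/mypackage.pth"

# Same kind of rewrite as the snippet above, but it only knows the *current* build dir.
find "$site_packages" -type f \
  \( -name '*.egg-link' -o -name '*.pth' -o -name '__editable___*_finder.py' \) \
  -exec sed -i -e "s#${new_build_dir}#/app#" {} +

# The stale path is unchanged, so it still points at a directory that no longer exists.
cat "${site_packages}/mypackage.pth"
```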

One potential solution would be to broaden the replace-pattern to match any Heroku build directory. Or, better, you could cache a list of all prior build directories and change the line above to replace any of them that are found.

Have a nice vacation, and happy holidays!

@edmorley
Member

edmorley commented Jan 5, 2024

I've managed to track this down - it's actually a combination of a few separate issues.

First, this started affecting your builds only recently because an upstream regression was introduced between Pipenv v2023.7.23 and v2023.8.19, which changed the installation mode for local file = dependencies from a standard install to an editable install.

That is, a dependency specifier like so:

[packages]
mypackage = {file = "packages/mypackage"}

...would previously have been installed as non-editable, whereas now it's installed as though editable = true had been specified.

Worse, it appears that even if one includes an explicit editable = false (note: false) to try and disable editable installation mode, it doesn't do anything.
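
One way to check which mode a package ended up in is to look for editable-install marker files in site-packages. The sketch below is only illustrative: the function name is hypothetical, and it assumes the marker naming that setuptools uses (`*.egg-link` for legacy editable installs, `__editable__*` files for newer ones).

```shell
# List files directly inside a site-packages directory that indicate an editable install.
# Usage: list_editable_markers <site-packages-dir>
list_editable_markers() {
  find "$1" -maxdepth 1 -type f \
    \( -name '*.egg-link' -o -name '__editable__*' \) | sort
}

# Demo with a throwaway directory and hypothetical file names.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/__editable__.mypackage-0.1.pth" "${demo_dir}/requests.py"
list_editable_markers "$demo_dir"
```

An empty result for a package that was specified with file = would suggest it was installed in standard (non-editable) mode.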

I've filed this regression as:
pypa/pipenv#6054

Whilst this was still a regression, the only reason it caused issues here is that there was a pre-existing bug in the buildpack around local file = dependencies when installed in editable mode (as you noted).

Specifically, the current path rewriting handling relies on the installer always being re-run to fix up any stale paths from the previous build.

(There's some backstory on the path rewriting in #1006 and #1252. The fact that paths change between build-time and run-time is a massive pain and thankfully going away with the next generation Cloud Native Buildpacks aka CNBs, xref CNB spec and the WIP Python CNB)

This re-running of the installer always occurs for standard Pip builds (since requirements files are non-deterministic given e.g. transitive deps, includes etc), and already occurred for Git VCS Pipenv builds (via this fragile check), however, there was no check for local path = file dependency builds.

I also found another bug unrelated to path rewriting (#1525), which makes me think that we should just never skip pipenv install as the lockfile is still not always deterministic, and instead defer to Pipenv to decide whether an environment needs updating.

> One potential solution would be to broaden the replace-pattern to match any Heroku build directory.

So the problem with trying to match any build directory is that we would have to hardcode the expected build path style in the buildpack (eg via a hardcoded /tmp/build_* glob), and that path (a) is not guaranteed to stay the same over time on Heroku (in fact it's already changed once in the last couple of years), (b) could be a completely different path on non-Heroku platforms (this buildpack is used by eg Dokku and others).
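
A small sketch of that limitation, using a hardcoded pattern in the `/tmp/build_*` style mentioned above. The second path is a made-up example of a different layout, not a real platform path.

```shell
# Rewrite using a hardcoded pattern instead of the actual build directory.
rewrite_hardcoded() {
  sed -e 's#/tmp/build_[^/]*#/app#'
}

# A path in the assumed style is rewritten:
echo "/tmp/build_1234abcd/packages/mypackage" | rewrite_hardcoded

# A path in a different (hypothetical) style is silently left unchanged:
echo "/workspace/build/packages/mypackage" | rewrite_hardcoded
```

Because the second case produces no error, a change in path style would only show up later as missing imports at runtime.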

> Or, better, you could cache a list of all prior build directories and change the line above to replace any of them that are found.

Yeah one solution would be to:

  1. During each build, store the current build directory path in the build cache at a known location
  2. At the start of each cached build, rewrite paths in the restored-from-cache site-packages from OLD_BUILD_DIR to NEW_BUILD_DIR
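
Those two steps could be sketched roughly as follows. This is not code from the buildpack: the function names and the `.previous-build-dir` marker file are hypothetical, `BUILD_DIR` and `CACHE_DIR` stand for the directories passed to bin/compile, and GNU `sed` is assumed.

```shell
# Step 1: remember this build's directory for the next build.
record_build_dir() {
  printf '%s\n' "$BUILD_DIR" > "${CACHE_DIR}/.previous-build-dir"
}

# Step 2: point restored site-packages files at the new build directory.
# Usage: rewrite_cached_paths <site-packages-dir>
rewrite_cached_paths() {
  site_packages="$1"
  marker="${CACHE_DIR}/.previous-build-dir"
  [ -f "$marker" ] || return 0                      # no recorded directory: nothing to rewrite
  old_build_dir="$(cat "$marker")"
  [ "$old_build_dir" != "$BUILD_DIR" ] || return 0  # same directory: nothing to do
  find "$site_packages" -type f \
    \( -name '*.egg-link' -o -name '*.pth' -o -name '__editable___*_finder.py' \) \
    -exec sed -i -e "s#${old_build_dir}#${BUILD_DIR}#g" {} +
}
```

A build would call `rewrite_cached_paths` right after restoring the cache and `record_build_dir` before saving it.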

However:

  • this still wouldn't help the first build of apps that occurs after the fix lands, since they won't have a cached previous build directory - so we'd either need another fix as well, or to force the cache to be cleaned
  • it adds more implementation (and test permutation) complexity, when our primary focus for new development is the upcoming CNBs (which don't need path rewriting)
  • we still need to re-run pipenv install for cached builds to resolve issues not related to path rewriting (such as #1525, "Pipenv install is skipped in cases where it's not safe to do so"), and re-running pipenv install resolves this issue in a simpler way

I've opened #1526, which resolves the issue when tested against https://github.com/arel/debug-heroku-pipenv and also adds integration tests for editable Pipenv installs (the buildpack previously only tested editable installs with Pip).

Note: I ran into setuptools related import errors on the first rebuild using the fix branch of the buildpack - clearing the build cache resolved these. I believe they are caused by the debug-heroku-pipenv repo having a very old setuptools version in its lockfile, which can cause issues depending on which order Pipenv attempts to install packages. There is possibly another upstream Pipenv bug causing this, but for now I'd recommend keeping your lockfile up to date, so the setuptools version in there is compatible with what gets pulled in via Pipenv itself.

edmorley added a commit that referenced this issue Jan 5, 2024
Previously the buildpack would skip running `pipenv install` for repeat
Pipenv builds if (a) the SHA256 of the `Pipfile.lock` file had not changed
since the last successful build, and (b) there were no Git VCS references
in the lockfile.
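
For illustration only, the skip condition described above might look roughly like this. It is a simplified sketch, not the buildpack's actual code: the function name, the cached-hash file, and the `"git"` key check are assumptions.

```shell
# Returns 0 (skip) only if the lockfile hash matches the cached one and
# the lockfile contains no Git VCS references.
# Usage: should_skip_pipenv_install <Pipfile.lock> <cached-hash-file>
should_skip_pipenv_install() {
  lockfile="$1"
  cached_hash_file="$2"
  [ -f "$cached_hash_file" ] || return 1
  current_hash="$(sha256sum "$lockfile" | cut -d' ' -f1)"
  # (a) lockfile unchanged since the last successful build
  [ "$current_hash" = "$(cat "$cached_hash_file")" ] || return 1
  # (b) no Git VCS references in the lockfile
  if grep -q '"git"' "$lockfile"; then
    return 1
  fi
  return 0
}
```

Note that a lockfile with only a local file = dependency passes both checks, so the install would be skipped under this logic.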

However, this has a few issues:
1. There are other cases where it's not safe to assume that there is no
    need to re-run `pipenv install`, such as when installing a non-editable
    local dependency (see #1525), or when using editable local dependencies
    with the current path re-writing strategy (see #1520).
2. The current Git VCS check has false positives (see #1130, #1398).
3. Even if we try and add more checks/workarounds to resolve (1) and (2),
    we're still having to make assumptions about internal Pipenv implementation
    details, and hardcode these in the buildpack, hoping we didn't miss anything
    and that Pipenv's behaviour doesn't change over time (which is not the case,
    as seen by the recent regression pypa/pipenv#6054)

As such, we now instead always re-run `pipenv install`, and defer to Pipenv to
decide whether the environment needs updating.

This should still be fast, since the cached `site-packages` is still being used (and
if there are any scenarios in which it's not fast, then that's an upstream Pipenv
bug).

Integration tests were also added for various types of editable Pipenv installs,
since we previously only had test coverage of editable installs for Pip.

Fixes #1520.
Fixes #1525.
Closes #1130.
Closes #1398.
edmorley self-assigned this Jan 5, 2024
edmorley added the bug label Jan 5, 2024