Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws_lambda_python_alpha: Using PIP_INDEX_URL and Codeartifact cause Docker to rebuild lambda every synth #27331

Closed
kcp-chewie opened this issue Sep 28, 2023 · 6 comments · Fixed by #27903
Labels
@aws-cdk/aws-lambda-python bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@kcp-chewie
Copy link
Contributor

Describe the bug

When following the instructions for using CodeArtifact, the lambda artifact is rebuilt anytime the CodeArtifact token changes, despite the actual code not changing at all.

Expected Behavior

The lambda shouldn't be rebuilt unless the actual code changes.

Current Behavior

The lambda is rebuilt with a new hash, which causes an unnecessary redeploy. This slows down the synth and deploy steps, and causes cold starts for all lambdas.

Reproduction Steps

Define any lambda function using the Custom Bundling with Code Artifact instructions. Observe that the lambda is rebuilt on each synth, even with no code changes.

Possible Solution

The Token included in the CodeArtifact URL should be ignored when computing the hash that determines if the lambda is rebuilt. Alternately, provide an option to ignore the PIP_INDEX_URL entirely.

Additional Information/Context

No response

CDK CLI Version

2.95.1

Framework Version

No response

Node.js Version

v20.6.1

OS

OSX 12.2

Language

Python

Language Version

Python 3.9.16

Other information

No response

@kcp-chewie kcp-chewie added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 28, 2023
@indrora
Copy link
Contributor

indrora commented Oct 2, 2023

I believe this is intentional due to how CodeArtifact handles these resources.

@indrora indrora removed the needs-triage This issue or PR still needs to be triaged. label Oct 2, 2023
@kcp-chewie
Copy link
Contributor Author

@indrora could you explain a bit more why this is intentional? It doesn't seem right that despite having the exact same code, the lambda gets rebuilt/redeployed.

The only thing that changes here is the CodeArtifact auth token, which doesn't seem like something that should be included in the lambda hash since it's temporary by definition!

@lpicchi
Copy link

lpicchi commented Oct 12, 2023

Having same problem.
I think as a package remains same version it is not important if it has been downloaded from any arbitrary repo.
So i kindly ask what is the reason to include PIP_INDEX_URL in the lambda hash?
It should be fixable even if there is a reason to hash PIP_INDEX_URL, authentication details can be removed from URL before hashing.

@kcp-chewie
Copy link
Contributor Author

I looked into this a bit more, and explored the fact that it's possible to pass in a custom assetHash instead of relying on the hash computed by the CDK code. Unfortunately, that won't work, as the bundling information (which includes the environment field, and thus the PIP_INDEX_URL) is included in 2 critical places, calculateHash and calculateCacheKey.

I'd be willing to take on the work here to fix this issue, but could use some guidance on the design first.

  • Should this be configurable? Assuming this implementation will just stripping the auth details and not ignore the index entirely, there is a good argument that including it in the first place was a bug we should just fix.
  • Should I just special case these variables directly in aws-cdk-lib/core/lib/asset-staging.ts? I honestly don't see any other great options but certainly open to suggestions!

@lpicchi
Copy link

lpicchi commented Oct 30, 2023

I think the best option to go is the ability to provide custom repos urls as a bundler option. So the logic applied to build_args remains the same. So CDK could inject your repos in PIP_EXTRA_INDEX_URL or PIP_INDEX_URL and hash them using the correct strategy like credentials removal or not hashing them at all. I still can't find any utility why the repo url should be hashed. As mentioned in other comment certain package version should be the same in ever repository.

@pahud pahud added p2 effort/medium Medium work item – several days of effort labels Nov 30, 2023
@mergify mergify bot closed this as completed in #27903 Dec 23, 2023
mergify bot pushed a commit that referenced this issue Dec 23, 2023
… PIP urls, causing an unnecessary rebuild (#27903)

Update the bundler hash logic to ignore the secret token that is included in the URL when fetching packages from Code Artifact. This token changes constantly, and prevents the results of a previous build from being reused, along with causing lambdas to be unnecessarily redeployed anytime the CDK is built, even if no code is changed.

This implementation strips the token from the hash, but does not change anything else about the hash. 

Open question
Currently this logic will error if an invalid URL string is passed for PIP_INDEX_URL or PIP_EXTRA_INDEX_URL. The build would fail later anyway when those URLs are being used, but I'm happy to try/catch this logic block to be more robust if that is preferred.

Closes #27331.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

paulhcsun pushed a commit to paulhcsun/aws-cdk that referenced this issue Jan 5, 2024
… PIP urls, causing an unnecessary rebuild (aws#27903)

Update the bundler hash logic to ignore the secret token that is included in the URL when fetching packages from Code Artifact. This token changes constantly, and prevents the results of a previous build from being reused, along with causing lambdas to be unnecessarily redeployed anytime the CDK is built, even if no code is changed.

This implementation strips the token from the hash, but does not change anything else about the hash. 

Open question
Currently this logic will error if an invalid URL string is passed for PIP_INDEX_URL or PIP_EXTRA_INDEX_URL. The build would fail later anyway when those URLs are being used, but I'm happy to try/catch this logic block to be more robust if that is preferred.

Closes aws#27331.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-lambda-python bug This issue is a bug. effort/medium Medium work item – several days of effort p2
Projects
None yet
4 participants