Hanging or failing command sam build --use-container creating LambdaLayer when BuildArchitecture is arm64 #7523

Open
rstrahan opened this issue Sep 29, 2024 · 5 comments
Labels
area/build sam build command area/docker stage/bug-repro The issue/bug needs to be reproduced

Comments

@rstrahan

Description:

Hanging or failing command sam build --use-container creating LambdaLayer when BuildArchitecture is arm64

Steps to reproduce:

I created a minimal repro (zipfile attached).

sam-build-test.zip

It fails for arm64. It works for x86_64.
To repro, simply unzip the attachment on an EC2 dev instance (e.g. it can be reproduced on a new vanilla Cloud9 environment; AL2 or AL2023 doesn't matter, both fail).

ARM64
In the sam-build-test directory, run

sam build --use-container --template-file template-arm64.yaml 

The first time you run it, it might fail rather than hang:

$ sam build --use-container --template-file template-arm64.yaml

        SAM CLI now collects telemetry to better understand customer needs.

        You can OPT OUT and disable telemetry collection by setting the
        environment variable SAM_CLI_TELEMETRY=0 in your shell.
        Thanks for your help!

        Learn More: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-telemetry.html

Starting Build inside a container                                                                                                                                                                            
Building layer 'TestLayer'                                                                                                                                                                                   

Fetching public.ecr.aws/sam/build-python3.12:latest-arm64 Docker container image........<snip>
Mounting /home/ec2-user/environment/sam-build-test/src as /tmp/samcli/source:ro,delegated, inside runtime container                                                                                          
exec /usr/local/opt/lambda-builders/bin/lambda-builders: exec format error
Builder crashed:                                                                                                                                                                                             

Error: Expecting value: line 1 column 1 (char 0)
Traceback:
  File "click/core.py", line 1078, in main
  File "click/core.py", line 1688, in invoke
<snip>
  File "json/decoder.py", line 337, in decode
  File "json/decoder.py", line 355, in raw_decode

An unexpected error was encountered while executing "sam build".
Search for an existing issue:
https://github.com/aws/aws-sam-cli/issues?q=is%3Aissue+is%3Aopen+Bug%3A%20sam%20build%20-%20JSONDecodeError
Or create a bug report:
https://github.com/aws/aws-sam-cli/issues/new?template=Bug_report.md&title=Bug%3A%20sam%20build%20-%20JSONDecodeError

Unfortunately, this error gives me no clues.
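For what it's worth, the message "Expecting value: line 1 column 1 (char 0)" is what Python's json module raises when asked to parse empty or non-JSON text. The sketch below assumes (it is not confirmed anywhere in this thread) that the CLI tried to parse the builder's output as JSON after the builder process died with "exec format error". The helper name parse_builder_response is hypothetical and is not SAM CLI code.

```python
import json


def parse_builder_response(raw: str) -> dict:
    """Parse text printed by a builder process.

    Hypothetical helper for illustration only; not taken from SAM CLI.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Surface the raw output so a crash that printed nothing is visible.
        raise RuntimeError(
            f"builder produced no valid JSON ({exc}); raw output: {raw!r}"
        ) from exc


# An empty response (for example, the process never started) reproduces
# the same decoder message seen in the traceback above.
try:
    parse_builder_response("")
except RuntimeError as err:
    print(err)
```

If that assumption holds, the JSONDecodeError is only a symptom, and the real failure is the "exec format error" line printed just before it.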
When I repeat the same command again, it hangs - forever:

$ sam build --use-container --template-file template-arm64.yaml
Starting Build inside a container                                                                                                                                                                            
Building layer 'TestLayer'                                                                                                                                                                                   

Fetching public.ecr.aws/sam/build-python3.12:latest-arm64 Docker container image......
Mounting /home/ec2-user/environment/sam-build-test/src as /tmp/samcli/source:ro,delegated, inside runtime container    

X86_64
In the sam-build-test directory, run

sam build --use-container --template-file template-x86_64.yaml 

Works fine. The only difference between the two templates is the architecture:

$ diff template-x86_64.yaml template-arm64.yaml 
9c9
<       BuildArchitecture: x86_64
---
>       BuildArchitecture: arm64
14c14
<         - x86_64
---
>         - arm64

So we could probably get past this quickly by changing the Architecture from arm64 to x86_64, but x86_64 has higher runtime cost, and I'd rather get to the root cause for why it's not working now on arm64.

Observed result:

It hangs forever after the "Mounting" step:

$ sam build --use-container --template-file template-arm64.yaml --debug
2024-09-29 18:45:57,128 | No config file found in this directory.                                                                                                                                            
2024-09-29 18:45:57,133 | OSError occurred while reading TOML file: [Errno 2] No such file or directory: '/home/ec2-user/environment/sam-build-test/samconfig.toml'                                          
2024-09-29 18:45:57,135 | Config file location: /home/ec2-user/environment/sam-build-test/samconfig.toml                                                                                                     
2024-09-29 18:45:57,137 | Config file '/home/ec2-user/environment/sam-build-test/samconfig.toml' does not exist                                                                                              
2024-09-29 18:45:57,167 | OSError occurred while reading TOML file: [Errno 2] No such file or directory: '/home/ec2-user/environment/sam-build-test/samconfig.toml'                                          
2024-09-29 18:45:57,170 | Using config file: samconfig.toml, config environment: default                                                                                                                     
2024-09-29 18:45:57,171 | Expand command line arguments to:                                                                                                                                                  
2024-09-29 18:45:57,173 | --template_file=/home/ec2-user/environment/sam-build-test/template-arm64.yaml --use_container --mount_with=READ --build_dir=.aws-sam/build --cache_dir=.aws-sam/cache              
2024-09-29 18:45:57,225 | 'build' command is called                                                                                                                                                          
2024-09-29 18:45:57,227 | Starting Build inside a container                                                                                                                                                  
2024-09-29 18:45:57,230 | No Parameters detected in the template                                                                                                                                             
2024-09-29 18:45:57,260 | There is no customer defined id or cdk path defined for resource TestLayer, so we will use the resource logical id as the resource id                                              
2024-09-29 18:45:57,262 | 0 stacks found in the template                                                                                                                                                     
2024-09-29 18:45:57,264 | No Parameters detected in the template                                                                                                                                             
2024-09-29 18:45:57,289 | There is no customer defined id or cdk path defined for resource TestLayer, so we will use the resource logical id as the resource id                                              
2024-09-29 18:45:57,291 | 1 resources found in the stack                                                                                                                                                     
2024-09-29 18:45:57,293 | --base-dir is not presented, adjusting uri ./src relative to /home/ec2-user/environment/sam-build-test/template-arm64.yaml                                                         
2024-09-29 18:45:57,297 | 1 resources found in the stack                                                                                                                                                     
2024-09-29 18:45:57,300 | Instantiating build definitions                                                                                                                                                    
2024-09-29 18:45:57,334 | Same Layer build definition found, adding layer (Previous: LayerBuildDefinition(TestLayer, /home/ec2-user/environment/sam-build-test/src, , b93b0847-9979-4e3c-9501-0dcbaf88e80d,  
python3.12, ['python3.12'], arm64, {}), Current: LayerBuildDefinition(TestLayer, /home/ec2-user/environment/sam-build-test/src, , 51dafd0a-7ebc-4e69-9e1d-93acb3954e9b, python3.12, ['python3.12'], arm64,   
{}), Layer: <samcli.lib.providers.provider.LayerVersion object at 0x7f82f64da110>)                                                                                                                           
2024-09-29 18:45:57,342 | Building layer 'TestLayer'                                                                                                                                                         
2024-09-29 18:45:57,351 | Checking free port on 127.0.0.1:7519                                                                                                                                               

Fetching public.ecr.aws/sam/build-python3.12:latest-arm64 Docker container image......
2024-09-29 18:45:57,475 | Mounting /home/ec2-user/environment/sam-build-test/src as /tmp/samcli/source:ro,delegated, inside runtime container  

Expected result:

It should not fail or hang, but rather it should succeed - as it does with x86_64

Additional environment details (Ex: Windows, Mac, Amazon Linux etc)

  1. OS: Amazon Linux 2 or Amazon Linux 2023 (new Cloud9 instance)
  2. sam --version: SAM CLI, version 1.112.0
  3. AWS region: us-east-1
Output of `sam --info`:
{
  "version": "1.112.0",
  "system": {
    "python": "3.11.3",
    "os": "Linux-5.10.225-213.878.amzn2.x86_64-x86_64-with-glibc2.26"
  },
  "additional_dependencies": {
    "docker_engine": "25.0.6",
    "aws_cdk": "2.159.1 (build c66f4e3)",
    "terraform": "Not available"
  },
  "available_beta_feature_env_vars": [
    "SAM_CLI_BETA_FEATURES",
    "SAM_CLI_BETA_BUILD_PERFORMANCE",
    "SAM_CLI_BETA_TERRAFORM_SUPPORT",
    "SAM_CLI_BETA_RUST_CARGO_LAMBDA"
  ]
}


@rstrahan rstrahan added the stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. label Sep 29, 2024
@hawflau
Contributor

hawflau commented Oct 3, 2024

@rstrahan thanks for raising the issue. We will try to reproduce it.

@hawflau hawflau added stage/bug-repro The issue/bug needs to be reproduced area/docker area/build sam build command and removed stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels Oct 3, 2024
@hawflau
Contributor

hawflau commented Oct 7, 2024

@rstrahan a quick check - if you use an arm64 EC2 instance instead, will building "template-arm64.yaml" in your provided example succeed?

@rstrahan
Author

rstrahan commented Oct 7, 2024

I don't know, @hawflau - I have not tried. Quite possibly it would work, but our customer is using Cloud9 to run their builds, and C9 doesn't support arm64 instances. It has always worked fine on x86 C9 instances until recently, and in fact it still works on some older pre-existing C9 instances (though I cannot trace the issue to specific versions of Docker, SAM CLI, or AWS CLI).

Hopefully you can repro the problem with the attached example, and see if you can determine the root cause.

In the meantime I was able to get unblocked thanks to help on the aws-sam-interest channel: it turns out you can build an arm64 Lambda function/layer using an x86 container. So the workaround was to specify the non-default image using --build-image public.ecr.aws/sam/build-python3.12:latest-x86_64.
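A minimal sketch of that workaround as a script, assuming the image naming pattern seen in this thread (build-python3.12:latest-x86_64 and latest-arm64). The function name sam_build_command and the architecture alias table are hypothetical; only the flags --use-container, --template-file and --build-image come from the commands above.

```python
import platform
from typing import List, Optional

# Maps platform.machine() values to the image tag suffixes seen in this thread.
_TAG_SUFFIX = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def sam_build_command(
    template_file: str,
    host_machine: Optional[str] = None,
    runtime: str = "python3.12",
) -> List[str]:
    """Return a `sam build` argv that pins the build image to the host architecture."""
    machine = (host_machine or platform.machine()).lower()
    suffix = _TAG_SUFFIX.get(machine)
    if suffix is None:
        raise ValueError(f"unrecognized host architecture: {machine!r}")
    image = f"public.ecr.aws/sam/build-{runtime}:latest-{suffix}"
    return [
        "sam", "build", "--use-container",
        "--template-file", template_file,
        "--build-image", image,
    ]


# On an x86_64 host this yields the workaround command described above.
print(" ".join(sam_build_command("template-arm64.yaml", host_machine="x86_64")))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting issues.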

But the essential issue remains that sam build --use-container now seems broken when running on an x86 EC2/Cloud9 instance when the target is an arm64 Lambda layer.

@hawflau
Contributor

hawflau commented Oct 8, 2024

Makes sense. And thanks for finding a workaround. I also managed to reproduce the issue.

Docker runs an arm64 image in an x86 environment with emulation, which usually incurs some level of performance hit.

When using --build-image public.ecr.aws/sam/build-python3.12:latest-x86_64, Docker does not need emulation to run the image, and within the container, Lambda Builder only needs to download the arm64 Linux wheels for the packages. That seems to be the reason why this workaround works. However, it might not work if a package does not provide an arm64 Linux wheel and requires a build from source.
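One way to check up front whether every dependency ships a prebuilt arm64 Linux wheel is pip's cross-platform download mode. This is a sketch under assumptions: it is not claimed to be what Lambda Builder runs internally, the requirements file and destination directory are placeholders, and the manylinux2014_aarch64 platform tag is an assumed choice for arm64 Linux.

```python
import sys
from typing import List


def wheel_check_command(
    requirements_file: str,
    dest_dir: str,
    python_version: str = "3.12",
    platform_tag: str = "manylinux2014_aarch64",  # assumed tag for arm64 Linux
) -> List[str]:
    """Return a `pip download` argv that only accepts prebuilt wheels for the target."""
    return [
        sys.executable, "-m", "pip", "download",
        "--requirement", requirements_file,
        "--dest", dest_dir,
        "--platform", platform_tag,
        "--python-version", python_version,
        "--implementation", "cp",
        # Refuse source distributions, so a missing wheel fails the command.
        "--only-binary=:all:",
    ]


print(" ".join(wheel_check_command("requirements.txt", "wheels")))
```

If the resulting command fails for a package, that package probably has no matching wheel and would need a build from source, which is the case where the x86 build image workaround may not help.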

That being said, I don't think there is anything we can do in SAM CLI. Having sam build --use-container use a build image whose arch matches the host seems like the right behavior. Potentially we could issue a warning if the target arch and the host arch don't match. I'll bring it up with the team to discuss.
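A rough sketch of what such a warning check could look like. None of this is existing SAM CLI code: the function names, the alias table, and the warning wording are all hypothetical.

```python
import platform
from typing import Optional

# Common spellings of the two architectures discussed in this issue.
_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def normalize_arch(name: str) -> str:
    """Fold architecture spellings to the names used in SAM templates."""
    lowered = name.lower()
    return _ALIASES.get(lowered, lowered)


def arch_mismatch_warning(
    target_arch: str, host_machine: Optional[str] = None
) -> Optional[str]:
    """Return a warning string if host and target architectures differ, else None."""
    host = normalize_arch(host_machine or platform.machine())
    target = normalize_arch(target_arch)
    if host == target:
        return None
    return (
        f"Warning: building for {target} on a {host} host. "
        "The default build image will need emulation; "
        "consider passing --build-image with an image matching the host."
    )


print(arch_mismatch_warning("arm64", host_machine="x86_64"))
```

The host value defaults to platform.machine(), so the check needs no Docker access and could run before any image is fetched.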

@rstrahan
Author

rstrahan commented Oct 8, 2024

> Docker runs an arm64 image in an x86 environment with emulation, which usually incurs some level of performance hit.

But this isn't a performance hit. It's not working at all.

Can we find out why? If the issue is with docker, can we repro the issue without sam cli in the loop, and see if docker will support/fix it, or propose a workaround that could be implemented within the sam cli?

It also occurs to me that maybe there's a layer of logging available from Docker that could be exposed and might reveal more clues, since right now it seems to be failing silently (i.e. revealing no clues).
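A possible way to take SAM CLI out of the loop, sketched under assumptions. "exec format error" usually means the kernel has no handler registered for the binary's architecture, so the first helper checks for a binfmt_misc entry. The handler name qemu-aarch64 is the name commonly registered by QEMU user-mode emulation packages and is an assumption here, as are both function names. The second helper only builds a docker run command that executes uname -m in the arm64 build image.

```python
from pathlib import Path
from typing import List


def emulation_handler_registered(
    binfmt_dir: str = "/proc/sys/fs/binfmt_misc",
    handler: str = "qemu-aarch64",  # assumed handler name, may differ per setup
) -> bool:
    """Return True if a binfmt_misc entry with the given name exists."""
    return (Path(binfmt_dir) / handler).is_file()


def docker_probe_command(
    image: str = "public.ecr.aws/sam/build-python3.12:latest-arm64",
) -> List[str]:
    """Return an argv that runs `uname -m` inside the arm64 image."""
    return [
        "docker", "run", "--rm",
        "--platform", "linux/arm64",
        # Override any image entrypoint so only uname runs.
        "--entrypoint", "uname",
        image, "-m",
    ]


print("arm64 emulation handler present:", emulation_handler_registered())
print(" ".join(docker_probe_command()))
```

If the probe command itself fails with "exec format error" on the affected instance, the problem would be reproducible with Docker alone, which would narrow the root cause to the host's emulation setup rather than SAM CLI.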

> I'll bring it up with the team to discuss.

Thanks @hawflau !


2 participants