
source_code_hash does not update #7385

Open
ghost opened this issue Jan 30, 2019 · 32 comments
Labels
bug Addresses a defect in current functionality. service/lambda Issues and PRs that pertain to the lambda service. service/s3 Issues and PRs that pertain to the s3 service.

Comments

@ghost commented Jan 30, 2019

This issue was originally opened by @joerggross as hashicorp/terraform#20152. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

v0.11.11

Terraform Configuration Files

data "aws_s3_bucket_object" "lambda_jar_hash" {
  bucket = "${var.lambda_s3_bucket}"
  key    = "${var.lambda_s3_key}.sha256"
}

resource "aws_lambda_function" "lambda_function_s3" {

  s3_bucket = "${var.lambda_s3_bucket}"
  s3_key = "${var.lambda_s3_key}"
  s3_object_version = "${var.lambda_s3_object_version}"

  function_name = "${var.lambda_function_name}"
  role = "${var.lambda_execution_role_arn}"
  handler = "${var.lambda_function_handler}"
  source_code_hash = "${base64encode(data.aws_s3_bucket_object.lambda_jar_hash.body)}"
  runtime = "java8"
  memory_size = "${var.lambda_function_memory}"
  timeout = "${var.lambda_function_timeout}"
  description = "${var.description}"
  reserved_concurrent_executions = "${var.reserved_concurrent_executions}"

}

Debug Output

...

~ module.comp-price-import-data-reader-scheduled-lambda.aws_lambda_function.lambda_function_s3
last_modified: "2019-01-30T11:58:32.826+0000" =>
source_code_hash: "6HVMIk6vxvBy4AApmHbQis5Av2uQeSJh3XRosmKtv0U=" => "ZTg3NTRjMjI0ZWFmYzZmMDcyZTAwMDI5OTg3NmQwOGFjZTQwYmY2YjkwNzkyMjYxZGQ3NDY4YjI2MmFkYmY0NQ=="

Plan: 0 to add, 1 to change, 0 to destroy.

Crash Output

~ module.comp-price-import-data-reader-scheduled-lambda.aws_lambda_function.lambda_function_s3
last_modified: "2019-01-30T11:58:32.826+0000" =>
source_code_hash: "6HVMIk6vxvBy4AApmHbQis5Av2uQeSJh3XRosmKtv0U=" => "ZTg3NTRjMjI0ZWFmYzZmMDcyZTAwMDI5OTg3NmQwOGFjZTQwYmY2YjkwNzkyMjYxZGQ3NDY4YjI2MmFkYmY0NQ=="

Plan: 0 to add, 1 to change, 0 to destroy.

Expected Behavior

We generate an additional file in the S3 bucket along with the lambda jar file to be deployed. The additional file contains a SHA256 hash of the deployed jar file. The hash value from that file is set as the source_code_hash property of the lambda function, using the base64encode function.

We would expect the hash to be stored in the tfstate and reused when applying the scripts, so that the lambda jar file is not redeployed unless the hash changes.

Actual Behavior

We applied the scripts several times without changing the jar or the hash file in S3. Nevertheless, Terraform always redeploys the jar. The output (see above) is always the same ("6HVMIk6vxvBy4AApmHbQis5Av2uQeSJh3XRosmKtv0U=" => "ZTg3NTRjMjI0ZWFmYzZmMDcyZTAwMDI5OTg3NmQwOGFjZTQwYmY2YjkwNzkyMjYxZGQ3NDY4YjI2MmFkYmY0NQ=="). It seems the given hash is never stored in the tfstate.

@nywilken nywilken added question A question about existing functionality; most questions are re-routed to discuss.hashicorp.com. service/lambda Issues and PRs that pertain to the lambda service. service/s3 Issues and PRs that pertain to the s3 service. labels Jan 31, 2019
@dbgeek (Contributor) commented Feb 14, 2019

I do the same for my Go lambda function, but I don't use base64encode since the sha256 sum is already in the file.

Try this
source_code_hash = "${data.aws_s3_bucket_object.lambda_jar_hash.body}"
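For context, a minimal sketch of that pattern (names are illustrative, and it assumes the S3 object at "${var.lambda_s3_key}.sha256" already contains the base64-encoded raw SHA-256 digest of the package):

data "aws_s3_bucket_object" "lambda_jar_hash" {
  bucket = "${var.lambda_s3_bucket}"
  key    = "${var.lambda_s3_key}.sha256"
}

resource "aws_lambda_function" "lambda_function_s3" {
  s3_bucket     = "${var.lambda_s3_bucket}"
  s3_key        = "${var.lambda_s3_key}"
  function_name = "${var.lambda_function_name}"
  role          = "${var.lambda_execution_role_arn}"
  handler       = "${var.lambda_function_handler}"
  runtime       = "java8"

  # No base64encode here: the object body is assumed to already hold the
  # base64-encoded raw digest; chomp() strips a trailing newline if present.
  source_code_hash = "${chomp(data.aws_s3_bucket_object.lambda_jar_hash.body)}"
}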

@cosots commented Apr 23, 2019

Dear all,

We have the issue as described above. The code looks similar. Each time we run terraform apply the Lambda function is redeployed, even if nothing has changed. I have looked at the output of terraform and can confirm that the hash in source_code_hash is not updated in the state file.

@uzun0v commented Apr 25, 2019

Hi All,

So the case here is pretty much the same:

My lambda function fetches its source code from an S3 object. I'm struggling to generate the proper source code hash string, which makes the lambda get updated on every single run (even when the source code is the same).

 ~ aws_lambda_function.lambda_fuctions
      last_modified:    "2019-04-25T09:59:53.110+0000" => <computed>
      qualified_arn:    "arn:aws:lambda:us-east-2:*:function:Test:1" => <computed>
      source_code_hash: "QkHfqU5xHUNfKaRgSj4t5cSqPBZeI70Ga+b8H8QwlWk=" => "NDI0MWRmYTk0ZTcxMWQ0MzVmMjlhNDYwNGEzZTJkZTVjNGFhM2MxNjVlMjNiZDA2NmJlNmZjMWZjNDMwOTU2OQ=="
      version:          "1" => <computed>

In order to get the *.jar file's checksum I create another txt file that contains the sha256sum of the file:

 sha256sum lambda.jar
4241dfa94e711d435f29a4604a3e2de5c4aa3c165e23bd066be6fc1fc4309569

If I use the built-in Terraform filebase64sha256 function I see the same checksum as the one that Terraform gets for the S3 object:
filebase64sha256("../lambda.jar") => QkHfqU5xHUNfKaRgSj4t5cSqPBZeI70Ga+b8H8QwlWk=

But when I generate the base64-encoded checksum locally, the string I get is different:
echo -n "4241dfa94e711d435f29a4604a3e2de5c4aa3c165e23bd066be6fc1fc4309569" | base64
NDI0MWRmYTk0ZTcxMWQ0MzVmMjlhNDYwNGEzZTJkZTVjNGFhM2MxNjVlMjNiZDA2NmJlNmZjMWZjNDMwOTU2OQ==

Result from the terraform console:
base64encode(filesha256("../lambda.jar")) => NDI0MWRmYTk0ZTcxMWQ0MzVmMjlhNDYwNGEzZTJkZTVjNGFhM2MxNjVlMjNiZDA2NmJlNmZjMWZjNDMwOTU2OQ==

The documentation for 0.11.11 explicitly says:

base64sha256(string) - Returns a base64-encoded representation of raw SHA-256 sum of the given string. This is not equivalent of base64encode(sha256(string)) since sha256() returns hexadecimal representation.

So the question here is: how do I generate the properly encoded SHA256 string in bash on the Linux host?

The tf looks like this:

`data "aws_s3_bucket_object" "jar_hash" {
  bucket = "essilor-lambda"
  key    = "lambda-functions/xxx/xxx/xxxx/lambda.txt"
}

output "test" {
  value = "${base64encode(data.aws_s3_bucket_object.jar_hash.body)}"
}

resource "aws_lambda_function" "lambda_fuction" {
  s3_bucket = "essilor-lambda"

  s3_key           = "lambda-functions/xxx/xxxx/xxxx/lambda.jar"      
  function_name    = "Test"
  description      = "test"
  handler          = "xxxx.xxxx.xxxx.xxx.xxxxx"
  role             = "arn:aws:iam::xxxxx:role/xxxxxxxx"
  runtime          = "java8"
  timeout          = "5"
  publish          = "true"
  source_code_hash = "${base64encode(data.aws_s3_bucket_object.jar_hash.body)}"`

@teiohanson

@uzun0v @cosots

Try something like this to generate the hash locally:

python3.7 -c "import base64;import hashlib;print(base64.b64encode(hashlib.sha256(open('$FILE','rb').read()).digest()).decode(), end='')"

@tracypholmes tracypholmes added needs-triage Waiting for first response or review from a maintainer. and removed question A question about existing functionality; most questions are re-routed to discuss.hashicorp.com. labels Jul 1, 2019
@aeschright aeschright added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Nov 21, 2019
@apottere commented Dec 5, 2019

We're seeing the exact same issue: source_code_hash is never updated in the tfstate when applying, so the lambda resource always requires updating no matter how many times we apply:

      ~ source_code_hash               = "83TsTFxfrLQJvQ8Re1YdXiGX2eQm1a1uX8Sc0bKeC3w=" -> "p6F5Wk4naphwng6ZQRNahuvJ7BUEFfHnMR9wQQpVkCM="

@duxan commented Jan 29, 2020

An easier (alternative) way to update the lambda function on code change, when the code is sourced from S3, is to enable S3 bucket versioning and set the lambda zip's object version:

data "aws_s3_bucket_object" "lambda_zip" {
  bucket  = "bucket_name"
  key     = "lambda.zip"
}

resource "aws_lambda_function" "run_hll_lambda" {
  s3_bucket         = data.aws_s3_bucket_object.lambda_zip.bucket
  s3_key            = data.aws_s3_bucket_object.lambda_zip.key
  s3_object_version = data.aws_s3_bucket_object.lambda_zip.version_id
  function_name     = "Lambda_name"
  role              = aws_iam_role.lambda_iam.arn
  handler           = "lambda_function.lambda_handler"
  runtime           = "python3.7"
}

@joerggross

Using a version id will not work for us, because we want to use snapshot-versions during development time, without always deploying and referencing a new version number.

@joerowe commented May 16, 2020

I'm also experiencing this issue on v0.12.25, aws provider v2.62.0, with a jar that is uploaded directly to the lambda.

I've tried different hashing algorithms but the ones generated on apply never match the ones in state.

EDIT:
I noticed that the different hashes are referenced in the debug logs, related to the warning:
Provider "registry.terraform.io/-/aws" produced an unexpected new value for but we are tolerating it because i t is using the legacy plugin SDK.

I'm using:
Terraform v0.12.25
AWS v2.62.0

@mascah commented Jun 11, 2020

I'm experiencing this with v0.12.20 and aws provider v2.65.0 with a zip file that's referenced from an s3 bucket.


data "aws_s3_bucket_object" "lambda" {
  bucket = aws_s3_bucket.lambda.id
  key    = "lambda.zip"
}

resource "aws_lambda_function" "lambda" {
  s3_bucket        = aws_s3_bucket.lambda.id
  s3_key           = data.aws_s3_bucket_object.lambda.key
  function_name    = "lambda"
  role             = aws_iam_role.lambda.arn
  handler          = "lambda.handler"
  timeout          = 300
  memory_size      = 256
  source_code_hash = base64sha256(data.aws_s3_bucket_object.lambda.etag)
  runtime          = "python3.8"
}

I'm using the etag from the s3 object as the input for the hash, which shouldn't change unless we upload a new version.

When I run apply twice in a row, the input hash is always the same, but the new hash is not being persisted to the state and the next run shows the same output.

 ~ source_code_hash               = "FxFe/pitsCj4XL/F+VORZASkGZdejRgNc7OABiKaWpg=" -> "oE4rN1nboxBBF64fQl8Q0GPtAE7bLqOofP/ACZPPz2A="

michallorens added a commit to michallorens/aws-lambda-python-layer that referenced this issue Jun 23, 2020
@kpashov commented Aug 28, 2020

I am experiencing the same issue, specifically inside a CI/CD pipeline. It does not occur on OSX and it does not occur in Docker on OSX when the project directory is mounted from OSX.

resource "aws_lambda_function" "index" {
  filename      = "../lambda/index.zip"
  function_name = "${var.project}_${var.environment}_redirect2index"
  role          = "${aws_iam_role.iam_for_basic_permission.arn}"
  handler       = "index.handler"
  source_code_hash = filebase64sha256("../lambda/index.zip")
  runtime = "nodejs12.x"
  publish = true
  provider = "aws.east"
}

However, with the same Docker image, TF version, and AWS provider version, the hashes in the CI pipeline never match. The one generated by filebase64sha256("../lambda/index.zip") matches between runs; however, the ones stored in state are completely different each time.

I thought this was a case of something else getting hashed, such as a timestamp or similar, but the generated hash is the same. Somehow, the hash that gets computed doesn't get stored under source_code_hash.

This is actually quite a nasty problem because when the Lambda is used with CloudFront, the latter redeploys each time, because AWS thinks that a new version of the Lambda has been created. This adds at least 3, and often 10+, minutes to the CD pipeline.

@eclosson commented Sep 2, 2020

I've done a little digging into this issue as I recently encountered it.

In my use case I generate the zip files frequently; even if the underlying contents don't change, the metadata changes in the zip file cause a different hash.

I tried to generate the hash of the contents outside of the zip and set it as the source code hash to get around this.

From my observations it appears that the source_code_hash field gets set in the state file from the filename field regardless of the value supplied to it, i.e. effectively filebase64sha256(aws_lambda_function.func.filename).

@kpashov commented Sep 4, 2020

I have found a workaround that works for my case: using the archive provider's archive_file data source to build the zip. Generating the zip outside Terraform seems to cause issues, even though it shouldn't.

Something like this works fine for me and doesn't cause the lambda to be updated between subsequent runs of the CI/CD pipeline, even days apart.

data "archive_file" "lambda" {
  type        = "zip"
  source_file = "../lambda/index.js"
  output_path = "../lambda/index.zip"
}

resource "aws_lambda_function" "index" {
  filename      = data.archive_file.lambda.output_path
  function_name = "${var.project}_${var.environment}_lambda"
  role          = "${aws_iam_role.iam_for_basic_permission.arn}"
  handler       = "index.handler"
  source_code_hash = data.archive_file.lambda.output_base64sha256
  runtime = "nodejs12.x"
  publish = true
  provider = "aws.east"
}

@Miggleness commented Nov 17, 2020

I'm reporting the same concern, too.

The main problem is that the purpose of source_code_hash isn't clear. The documentation of aws_lambda_function states that source_code_hash is an argument that seems to have an impact on deployment, but that doesn't seem to be the case.

Looking at the source code, it is a computed field. After a successful deploy, the value of source_code_hash is overwritten by the response from AWS's API (code) by calling resourceAwsLambdaFunctionRead().

In short, the value assigned to source_code_hash doesn't affect deployment and is always overwritten, unless it happens to match the hash returned by the AWS API.

What we need

We need a way to deterministically trigger lambda deployments (e.g. after code change is detected) without presumptions that everyone uses the same process in packaging their code.

Is source_code_hash the correct attribute to use for this? Yes and no. It'd be nice to keep the hash returned by AWS's API, but probably we'd need another attribute similar to source_code_hash that meets our need.

Suggestion

  1. Update documentation so that source_code_hash is clearly defined as an output.
  2. Remove Optional:true from the schema for source_code_hash
  3. Add a new attribute change_trigger_hash that is optional and not computed. Suggestions for a better name are welcome.
  4. If change_trigger_hash is null, then plan and apply would work as how they are working now
  5. If change_trigger_hash is not null, then compare the current value to the previous value. If they differ, include the change in the plan. Otherwise, ignore the resource change.

@aeschright does this sound like something that we can do? I'll submit a PR if yes

===========================
Update: Upon looking further, source_code_hash does indeed trigger a change, which makes my suggestion invalid. I'll try out an idea which I hope will work.

@AGiantSquid

I had a very similar problem where the statefile was not getting an updated source_code_hash after an apply. @Miggleness pointed me in the right direction by noting that the value in source_code_hash is overwritten by AWS. This means that the hash you use in your lambda resource definition must be computed the same way that AWS computes the hash. Otherwise, you will always have a different value in your source_code_hash, and your lambda will always be redeployed.

So when you see something like:

 ~ source_code_hash = "QuYMcyiptpzreIVxuq8AL+UWobBp3pDq045f2ISoKB0=" -> "42e60c7328a9b69ceb788571baaf002fe516a1b069de90ead38e5fd884a8281d" # forces replacement

The value on the left is the AWS calculated hash, and the value on the right is the value you are providing terraform in your lambda definition.

If you calculate the hash yourself with shell, use the following algorithm:

openssl dgst -sha256 -binary ${FILE_NAME}.zip | openssl enc -base64

If you calculate it with a python script, use something like the following:

import base64
import hashlib


def get_aws_hash(zip_file):
    '''Compute base64 sha256 hash of zip archive, with aws algorithm.'''
    with open(zip_file, "rb") as f:
        sha256_hash = hashlib.sha256()

        # Read and update hash string value in blocks of 4K
        while byte_block := f.read(4096):
            sha256_hash.update(byte_block)

    hash_value = base64.b64encode(sha256_hash.digest()).decode('utf-8')

    return hash_value

@riisi commented Jan 8, 2021

If you are deploying the lambda from a container image (https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/), then the AWS calculated hash will be the image digest.

Making a change to the image source, building and pushing, results in a different digest hash, which triggers the lambda update.

You can do something like:

terraform {
  required_providers {
    # Note that non hashicorp providers need to be declared in each child module - https://github.com/hashicorp/terraform/issues/27361
    docker = {
      source  = "kreuzwerker/docker"
      version = "2.9.0"
    }
  }
}

provider "docker" {
  registry_auth {
    address  = "<account id>.dkr.ecr.<region>.amazonaws.com"
    username = "AWS"
    // The password is read from DOCKER_REGISTRY_PASS
  }
}


data "docker_registry_image" "my-image" {
  name = "<account id>.dkr.ecr.<region>.amazonaws.com/my-repo/image-name:latest"
}

resource "aws_lambda_function" "my_lambda" {
  function_name    = "my-lambda"
  role             = aws_iam_role.lambda.arn

  package_type = "Image"
  image_uri = data.docker_registry_image.my-image.name
  source_code_hash = split("sha256:", data.docker_registry_image.my-image.sha256_digest)[1]
}

The only other limitation I'd note with container images at present is that they are restricted to using an ECR repository in the same AWS account.

@ektedar commented Apr 16, 2021

If someone is still running into this issue, a fix that worked for me was the etag in the aws_s3_object resource block.

This tag triggers an update on the zip file on deployment.

resource "aws_s3_bucket_object" "lambda" {
  bucket   = var.s3_bucket
  key      = "lambda.zip"
  etag     = filemd5("lambda.zip")
}

If you have the lambda resource block pointing to the right bucket and key, the lambda should get updated. I have tried several approaches involving manually zipping up the lambdas and using the archive data block, but on deployment the change was never detected even with the source hash. The etag on the aws_s3_bucket_object did the trick.

EDIT:
Thanks to @heldersepu for pointing this out. If you are using KMS encryption, source_hash is a better alternative.

@heldersepu (Contributor) commented Mar 5, 2022

If someone is still running into this issue, a fix that worked for me was the etag in the aws_s3_object resource block.

This tag triggers an update on the zip file on deployment.

The etag has a big warning there: it is not compatible with buckets that use KMS encryption, and with large files the etag is not an MD5 digest, which will cause problems too:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html

This has been working fine for me to keep track of many files:

resource "aws_s3_object" "file" {
  for_each = fileset("${path.module}/..", "*.zip")

  bucket = aws_s3_bucket.deploy_bucket.id
  key    = each.value
  source = "${path.module}/../${each.value}"

  source_hash = filemd5("${path.module}/../${each.value}")
}

@danu165 (Contributor) commented Mar 12, 2022

I ran into this issue for aws lambda layers where I was using source_code_hash = filemd5("lambda_layer.zip"). My layer was being updated each time, possibly due to what @Miggleness mentioned where terraform gets the value from AWS and saves that value in the state rather than the actual md5 of the file.

Not sure why this worked, but I switched to using the filebase64sha256 function and now terraform saves the hash properly.

So, my fix is: source_code_hash = filebase64sha256("lambda_layer.zip")
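For reference, a minimal sketch of that fix in context (layer name and file path are illustrative). filebase64sha256 yields the base64-encoded raw SHA-256 digest, which lines up with the hash AWS reports back, as other comments in this thread point out:

resource "aws_lambda_layer_version" "example" {
  layer_name          = "example-layer"
  filename            = "lambda_layer.zip"
  compatible_runtimes = ["python3.9"]

  # Base64-encoded raw SHA-256 of the package, matching what AWS computes.
  source_code_hash = filebase64sha256("lambda_layer.zip")
}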

@defyjoy commented Jul 7, 2022

I ran into this issue for aws lambda layers where I was using source_code_hash = filemd5("lambda_layer.zip"). My layer was being updated each time, possibly due to what @Miggleness mentioned where terraform gets the value from AWS and saves that value in the state rather than the actual md5 of the file.

Not sure why this worked, but I switched to using the filebase64sha256 function and now terraform saves the hash properly.

So, my fix is: source_code_hash = filebase64sha256("lambda_layer.zip")

The issue with this is that it assumes the zip already exists. Zips generated dynamically with the "archive_file" data block don't fall under this situation, since in that case source_code_hash will fail because it won't find the zip during plan; one alternative is sketched below.
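One way around that, sketched here with illustrative paths and names, is to let the archive_file data source both build the zip and expose its hash, so nothing has to exist on disk before the plan:

data "archive_file" "layer" {
  type        = "zip"
  source_dir  = "${path.module}/layer_src"        # hypothetical source directory
  output_path = "${path.module}/lambda_layer.zip"
}

resource "aws_lambda_layer_version" "example" {
  layer_name          = "example-layer"
  filename            = data.archive_file.layer.output_path
  compatible_runtimes = ["python3.9"]

  # Computed by the archive provider from the zip it just created, so no
  # pre-existing file is needed at plan time.
  source_code_hash = data.archive_file.layer.output_base64sha256
}

This is essentially the same pattern @kpashov showed earlier in the thread.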

@onitake (Contributor) commented Sep 9, 2022

I'm using a similar approach to archive_file, but with the grunt work happening in a shell script:

resource "null_resource" "lambda" {
  provisioner "local-exec" {
    command = "./build.sh lambda.zip"
  }
  triggers = {
    source_hash = filebase64sha256("lambda.py")
  }
}

Since null resource state gets persisted, but the built zip file does not, I cannot rely on recalculating the hash on every run.
My workaround was uploading the zip file to an S3 bucket and deploying from there, but aws_lambda_function does not redeploy the function when the S3 object changes - even if I add a depends_on:

resource "aws_s3_object" "lambda" {
  bucket = "mybucket"
  key = "lambda.zip"
  source = "lambda.zip"
  source_hash = base64sha256(jsonencode(null_resource.lambda.triggers))
}
resource "aws_lambda_function" "lambda" {
  function_name = "lambda"
  depends_on = [aws_s3_object.lambda]
  s3_bucket = aws_s3_object.lambda.bucket
  s3_key = aws_s3_object.lambda.key
  #source_code_hash = aws_s3_object.lambda.source_hash
}

The only way to reliably trigger a deployment is by adding source_code_hash to the aws_lambda_function resource, which will cause a redeployment on every run. On the upside, I won't have to reupload anything, as Lambda can fetch the zip file from S3 directly.

archetypalsxe added a commit to FormidableLabs/pulumi-terraform-comparison that referenced this issue Sep 16, 2022
* Created a working version of the Lambda function
* Created a zip file of the Lambda code and uploaded to the S3 code
  bucket
* Added a CloudWatch Group and IAM role to be used by the Lambda
  function
* Added an IAM policy for logging to CloudWatch and attached to the IAM
  Role
* Enabled versioning on the S3 code bucket
* Ended up having to switch from using a SourceCodeHash to the S3
  version...
  * Tried to use both the asset and source hash of both the zip file as
    well as the uploaded S3 object and both were causing a continuous
    update of the function
    (hashicorp/terraform-provider-aws#7385)
  * Tried to manually create the base64 encoded hash from the Lambda
    code zip, but CDKTF was trying to access the zip before it was
    actually created
  * Seems like this is a better solution anyway, as the S3 bucket for
    the code should probably be versioned anyway
@systematicguy commented Mar 24, 2023

My case was slightly different, but with the same effect: I am building an aws_lambda_layer_version resource.
I was trying to have source_code_hash = filebase64sha256("poetry.lock"), because that was the only file
present at both build and deployment time. I wanted to be smart and make Terraform skip deploying a new layer
if the poetry.lock did not change.

Then I faced the same issue as described here (and in several forum posts).

I also ended up storing the hash alongside the zip file in S3 at build time.
I calculate the hash as @AGiantSquid suggests, using cat layer.zip | openssl dgst -binary -sha256 | openssl base64,
and then upload it to S3 alongside the zip file.

In the deployment terraform code I tried to apply the following, somewhat ugly construct:

data "aws_s3_object" "layer_zip_hash" {
  bucket = "the-bucket"
  key    = "layer-common.zip.hash"
}

resource "aws_lambda_layer_version" "lambda_layer" {
  layer_name = "layer-common"

  s3_bucket = "the-bucket"
  s3_key    = "layer-common.zip"
  # Nothing else works other than the hash of the zip file.
  #  https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_layer_version#source_code_hash
  #   "Must be set to a base64-encoded SHA256 hash of the package file specified with either filename or s3_key"
  #  Earlier we tried to use the hash of the poetry.lock file, but that didn't work,
  #  although terraform plan did show a change in the source_code_hash, forcing recreation,
  #  in the end the source_code_hash was not recording the custom hash of the poetry.lock file.
  #   https://github.com/hashicorp/terraform-provider-aws/issues/7385
  #   see this comment especially: https://github.com/hashicorp/terraform-provider-aws/issues/7385#issuecomment-733995977
  #  See storing the hash as an object technique here:
  #  https://discuss.hashicorp.com/t/lambda-function-error-call-to-function-filebase64sha256-failed-no-file-exists/20233/4
  source_code_hash = data.aws_s3_object.layer_zip_hash.body
  skip_destroy     = false

  compatible_runtimes = ["python3.9"]
}

At first my terraform plan stopped marking my layer to be replaced, but for the wrong reasons: it did not update even when the actual zipped dependencies changed.
It really works now: whenever the layer zip's hash changes in s3, a new layer version is produced. Make sure to upload the hash file with content_type text/plain to avoid much frustration.
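For illustration, a hedged sketch of how that hash object could be uploaded from Terraform rather than from an external build script (bucket and key names taken from the snippet above; the local path is hypothetical). The explicit content_type matters because data.aws_s3_object only exposes body for human-readable content types:

resource "aws_s3_object" "layer_zip_hash" {
  bucket = "the-bucket"
  key    = "layer-common.zip.hash"
  source = "${path.module}/layer-common.zip.hash"  # produced at build time

  # Without a text content type, data.aws_s3_object.body stays empty and
  # source_code_hash never sees the real hash.
  content_type = "text/plain"
  source_hash  = filemd5("${path.module}/layer-common.zip.hash")
}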

I have also looked into the fact that zipping the same content twice produces different hashes (even though the zips' contents are the same), so I was starting to feel clueless.

I use the deterministic_zip pip package to make sure my zip's effective content doesn't change, while still allowing myself to upload a new zip file every time, like @joerggross. (https://github.com/bboe/deterministic_zip)


My 2 cents, and I admit I was facepalming as well when I found out, but look at the documentation:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function#source_code_hash

  • source_code_hash - (Optional) Used to trigger updates. Must be set to a base64-encoded SHA256 hash of the package file specified with either filename or s3_key. The usual way to set this is ${filebase64sha256("file.zip")} (Terraform 0.11.12 or later) or ${base64sha256(file("file.zip"))} (Terraform 0.11.11 and earlier), where "file.zip" is the local filename of the lambda layer source archive.

This is not a new entry (I looked it up in the git history; this piece of information has been there for more than 4 years, 992d697).
Even though it is old enough that I can only blame myself for not noticing it during implementation, I fully agree with @Miggleness about making this a little less prone to mess-ups.

I am not yet sure, though, how to ergonomically eliminate the hashing function there.
The source_code_hash parameter in its current form has too many moving parts, causing all of us to
naively drop in different calculations, whereas the documentation clearly states that it must be set to a specific hash function of a specific file.
An option could be for the resource to calculate the hash itself, lowering the level of abstraction so that we only provide the source package file, which can be a zip, jar, or whatever we need to deploy.

@FarhanSajid1

[Quoting @AGiantSquid's earlier comment in full: the hash must be computed the same way AWS computes it, e.g. openssl dgst -sha256 -binary ${FILE_NAME}.zip | openssl enc -base64.]

How would we use this to solve the issue?

@FarhanSajid1

Any workarounds for this?

@luciangutu (Contributor) commented Jun 22, 2023

Any workarounds for this?

Look at the answer from @systematicguy: #7385 (comment)
Also, take a look here: https://stackoverflow.com/questions/57317157/terraform-lambda-source-code-hash-from-s3-data-doesnt-stick-in-the-state-cache

data "aws_s3_object" "this" {
  bucket = var.lambda_bucket_name
  key    = var.lambda_bucket_hash
}
resource "aws_lambda_function" "this" {
...
  source_code_hash = chomp(data.aws_s3_object.this.body)

Another solution is to use 's3_object_version':

resource "aws_s3_object" "function" {
  bucket      = aws_s3_bucket.function-bucket.bucket
  key         = local.key_filename
  source      = format("%s/%s", var.local_path, local.key_filename)
  source_hash = filemd5(format("%s/%s", var.local_path, local.key_filename))
}
resource "aws_lambda_layer_version" "lambda_layer" {
  s3_bucket  = aws_s3_bucket.function-bucket.bucket
  s3_key     = aws_s3_object.function.key
  layer_name = "layer"
  # https://github.com/terraform-providers/terraform-provider-aws/issues/7385
  s3_object_version        = aws_s3_object.function.version_id
...
  depends_on               = [aws_s3_object.function]
}

@Berehulia commented Jul 23, 2023

Hello.

In my case, I solved the issue of JAR archives being redeployed.

The archive stores the modified date for each file as metadata, and because of this, the output hash differs depending on build time.

It is possible to change the outputTimestamp in maven-jar-plugin to a specific static date:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <version>3.3.0</version>
    <configuration>
        <outputTimestamp>2023-01-01T00:00:00Z</outputTimestamp>
    </configuration>
</plugin>

After that, the hash becomes consistent across recurring builds; it should work as long as there is no additional variable metadata in the JAR file.


It would be nice to be able to calculate hashes of source files or directories without packaging, or to have other ways to find differences in sources using Terraform. I have been working with it for only a week, so maybe such an option already exists, but I haven't found one -_-

@jvalore-medallia commented Jan 23, 2024

Old topic, but throwing in a summary and a possible workaround, as of hashicorp/aws v5.12.0, for those who stumble across this later...


If you don't include a source_code_hash then your lambdas will deploy once, the first time you do a plan/apply, then never update again even if the zip files change.


If you do include a source_code_hash then it has to be exactly what aws calculates on their end, or else your lambdas will redeploy every tf plan whether they changed or not.

filebase64sha256() does seem to do this, if you account for the caveat below with the zip files changing...


Some people say they have success with archive_file but it seems like maybe you shouldn't use that?


If you are building/compiling/bundling your code dynamically, then you can't rely on the sha hash of the zip, because the zip archive specification includes the created and last-modified timestamps for files. So every time you run something like typescript or webpack or esbuild or whatever build tool your language has, if the files are recreated, then the zip will change. i.e. if you re-run tsc on the same code, you get the same output js but different zips:

Code that won't change:

$ echo "console.log('test');" > main.ts

Build and zip the code once:

$ yarn run tsc main.ts

$ openssl dgst -sha256 -binary main.js | openssl enc -base64
oFeuE/P+XJMjkMS5pAPudQOMGJQ323nQt+DQ+9zbdAg=

$ zip main.zip main.js
  adding: main.js (stored 0%)

$ openssl dgst -sha256 -binary main.zip | openssl enc -base64
S67KwseOb4Fgxf3GApekcSVRl6OJHEqTVCxoMVeAv1M=

Build and zip again:

$ yarn run tsc main.ts

$ openssl dgst -sha256 -binary main.js | openssl enc -base64
oFeuE/P+XJMjkMS5pAPudQOMGJQ323nQt+DQ+9zbdAg=

$ zip main.zip main.js
  adding: main.js (stored 0%)

$ openssl dgst -sha256 -binary main.zip | openssl enc -base64
jSeLewPYPqKYHZ1DPfY77lJI8ppve0xj5b2ibeN4Nbw=

The hashes of the typescript compiled main.js are the same between builds, but the main.zip hash changes, because the timestamp of main.js changed.

I would say that it's a flawed approach to rely on the zip file hashes; you really need the hashes of the contents of the zip archives, not the archives themselves. But since aws seems to internally use those, I guess there isn't much TF can do about that... I'm not sure how CDK and CloudFormation get around this (or maybe they are just as broken?)


The most viable solution I have found so far is to make your own zip files using a utility where you can override the created/last-modified dates.

For example in a JS project I am using zip-stream to add files to the zip archive and set their dates to new Date(0)

  archive.entry(
    Buffer.from(file.contents),
    {
      name: path.basename(file.path),
      date: new Date(0),
    },
  );

which results in the zip file being identical whenever the code is the same.

Then in terraform the source_code_hash can be set consistently.

resource "aws_lambda_function" "my_lambda" {
  function_name    = "myLambda"
  filename         = "dist/myLambda.zip"
  source_code_hash = filebase64sha256("dist/myLambda.zip")

@heldersepu (Contributor) commented Jan 25, 2024

@jvalore-medallia
I agree with your "most viable solution"; one caveat is that if you have a lot of files to zip it can be slow...
My workaround was to download the zip files from S3 and then only re-zip those where we detect changes.

My zip files are all in one bucket, so downloading them is one simple command:

aws s3 cp --no-progress --recursive s3://your-bucket-name/   .

In my case these are all lambda functions in a common folder, and we can get what changed from git using:

FILES=$(git log -m -1 --name-only --pretty="format:" | grep "functions/")

Then you can skip those that have not changed. For me it saved a lot of time: on normal runs when only a couple of functions change it was more than an 80% time reduction, and besides the time it takes to zip, you also skip the "yarn install" and any other compilation steps.


An alternative to zip-stream that I use is git-restore-mtime

@justinretzolk (Member)
Hi all 👋 I believe this may have been resolved with #31887 in version 5.32.0 of the provider. It may be worth testing with that version to see if you're still experiencing this behavior.

@justinretzolk justinretzolk added the waiting-response Maintainers are waiting on response from community or contributor. label Feb 15, 2024
@pierky commented Feb 16, 2024

@justinretzolk 5.32.0 didn't help in my case.

I was hitting the same issue documented here, where an apply done on 2 different platforms was generating an unexpected re-deploy of the Lambda, even though the codebase wasn't changed.

In my case I was able to narrow down the issue to the way the .zip file was generated. Specifically, on the second platform the source file was copied, and its final timestamps and permissions looked different from those on the first platform. I managed to resolve the issue by "normalizing" the source file, resetting timestamps and permissions before the apply using an external script. Probably worth mentioning this Python tool, which as far as I understand should do the same: https://pypi.org/project/deterministic-zip/

Just reporting my experience in case it could help someone else who finds themselves in the same situation.

@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Feb 16, 2024
@pdmtt commented Feb 23, 2024

For Python developers, I'd like to report the trouble I'm having with source_code_hash and my findings so far.

Archive files created from pip installation directories (pip install . --target directory) will always have different hashes, even when using tools such as deterministic-zip.

This is not a provider issue. It's how pip works. I haven't had the time to track down exactly what's causing this and how to solve it. I suspect it's related to pip building wheels on the fly when one is not present, which are also zip files.

If you'd like to reproduce my issue, I've made the following minimal project to test:

./pyproject.toml

[project]
name = "ziphashtest"
version = "0.0.0"
requires-python = ">=3.10"
dependencies = [
    "mypy==1.8.0"
]

./test.sh

set -e

cleanup () {
  rm -r install_*
  rm zip_*
}
trap cleanup EXIT

pip install -U pip --quiet
pip install deterministic_zip --quiet

for i in {0..1}; do
  pip install . --target install_"${i}" --quiet
  deterministic_zip zip_"${i}" install_"${i}"

  if [ "$i" = 0 ]; then former_md5=$(md5sum < zip_"${i}" | cut -c -32)
  else
    new_md5=$(md5sum < zip_"${i}" | cut -c -32)

    if [ ! "$former_md5" = "$new_md5" ]; then
      echo "hashes are not equal: ${former_md5} != ${new_md5}"
    fi

    former_md5=$new_md5
  fi
done

Terminal

$ /bin/bash ./test.sh
Wrote zip_0
Wrote zip_1
hashes are not equal: f6fa3b5eee984ac735867df9b418bb5b != 6444fa5bc05903aea148cb4c5daeca12

@astrocox commented Nov 6, 2024

We're seeing this issue with 5.36.0, using the hash of a single python source file with no dependencies (so it can't possibly be a pip/packaging issue).

source_code_hash = filebase64sha256("${path.module}/lambdas/api.py")

It looks like the state is somehow just not being updated. If I run terraform apply twice, both times the NEW hash matches what we get by hashing the file in the CLI, but the hash being pulled from the state doesn't change after the first deploy.

@BillysCoolJob

I am seeing the exact same issue with the filemd5, filesha256, and filebase64sha256 functions. The state file is just stuck with the original string value that it had when I ran my initial deploy.
