[WIP] Introduce S3-native state locking #35661
Open
+561
−77
This draft PR prototypes state locking for Terraform's `s3` backend using a `.tflock` file. It uses Amazon S3's recently introduced conditional writes feature to implement a locking mechanism. When a lock is acquired, other Terraform clients attempting to lock the same Terraform state file will encounter an error until the lock is released.

## Context
The `internal/states/statemgr` package provides interfaces and functionality for state managers, including the `s3` backend. These state managers are responsible for writing and retrieving state from persistent storage. This PR focuses on two methods implemented by the `s3` backend: `Lock` and `Unlock`.

Currently, the `s3` backend implements state locking using Amazon DynamoDB, by writing a `LockID` and a `digest`. Terraform uses these values in the `Lock` and `Unlock` methods of the `s3` backend to manage state locking and unlocking. While DynamoDB has long been used for state locking, leveraging Amazon S3's newly released conditional writes feature offers an S3-native approach. By using S3 directly, we eliminate the need for an additional AWS component, simplifying the backend architecture.

## Implementation
A `.tflock` file is used to represent the lock on the state file. It contains lock information, including a unique lock ID and other metadata.

## Acquiring a Lock
To acquire a lock, a `.tflock` file is uploaded to an S3 bucket to establish a lock on the state file. If the lock file does not already exist, the upload succeeds, thereby acquiring the lock. If the file already exists, the upload fails due to a conditional write, indicating that the lock is already held by another Terraform client. In addition, we set the `LockID` to the `ETag` of the `.tflock` file; this value is then checked when releasing the lock.

## Releasing a Lock
To release a lock, the corresponding lock file is deleted from the S3 bucket, releasing the lock and making it available to other Terraform clients.
To maintain lock integrity, we use the ETag (Entity Tag) of the `.tflock` file as the `LockID` when acquiring the lock. Each S3 object has an ETag, which is a hash of its content. Before removing the lock, we compare the ETag of the lock file retrieved from S3 to the provided `LockID`. This ensures the lock is only removed if the ETag matches, confirming we are operating on the correct lock file. The lock file also includes metadata such as the `LockID` and the Terraform state path, further ensuring that we are working with the intended state and lock file.

## Conditional Writes
Conditional writes in Amazon S3 allow clients to perform write operations only if certain conditions about the existing object are met. Specifically:
For buckets with versioning enabled, S3 checks for the presence of a current object version. If no current version exists or if the version is a delete marker, the write operation succeeds; otherwise, it fails.
In scenarios where multiple Terraform processes attempt to acquire a lock (i.e., multiple concurrent conditional writes to the same S3 object), only the first process to successfully acquire the lock will succeed. Subsequent processes will receive a 412 Precondition Failed response, indicating that the lock is already held by another Terraform process.
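The first-writer-wins behavior described above can be sketched with a minimal in-memory stand-in for S3's conditional write (the `If-None-Match: *` precondition on `PutObject`). All names here (`FakeBucket`, `acquire_lock`) are illustrative, not Terraform's actual Go implementation:

```python
import hashlib

class PreconditionFailed(Exception):
    """Models S3's 412 Precondition Failed response."""

class FakeBucket:
    """In-memory stand-in for an S3 bucket supporting conditional writes."""
    def __init__(self):
        self.objects = {}  # key -> (body, etag)

    def put_if_none_match(self, key, body):
        # Mirrors PutObject with the If-None-Match: * header:
        # the write succeeds only if no current object exists at the key.
        if key in self.objects:
            raise PreconditionFailed(f"412 Precondition Failed: {key} already exists")
        etag = hashlib.md5(body).hexdigest()  # S3 ETags are content hashes
        self.objects[key] = (body, etag)
        return etag

def acquire_lock(bucket, state_path):
    lock_key = state_path + ".tflock"  # assumed lock-file naming
    lock_info = b'{"Operation": "apply", "Who": "user@host"}'
    etag = bucket.put_if_none_match(lock_key, lock_info)
    return etag  # the ETag doubles as the LockID

bucket = FakeBucket()
lock_id = acquire_lock(bucket, "env/prod/terraform.tfstate")
print("acquired lock, LockID =", lock_id)

# A second client attempting the same lock hits the conditional-write failure.
try:
    acquire_lock(bucket, "env/prod/terraform.tfstate")
except PreconditionFailed as e:
    print("second client:", e)
```

Only the first `put_if_none_match` for a given key succeeds; every concurrent contender observes a 412 and must wait until the lock file is deleted.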
According to Amazon S3 documentation, a 409 Conflict response may occur if a delete request for the object completes before a conditional write operation finishes. In such cases, uploads might need to be retried. However, given that delete operations should generally occur only after a lock has been successfully acquired, this situation seems unlikely to arise in the context of Terraform state locking.
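The ETag-checked release described earlier can be sketched the same way: before deleting the lock file, compare its current ETag against the caller's `LockID` so we never remove a lock we don't hold. Again, these names are illustrative only:

```python
import hashlib

class FakeBucket:
    """In-memory stand-in for an S3 bucket."""
    def __init__(self):
        self.objects = {}  # key -> (body, etag)

    def put(self, key, body):
        etag = hashlib.md5(body).hexdigest()
        self.objects[key] = (body, etag)
        return etag

    def head(self, key):
        return self.objects[key][1]  # returns the object's current ETag

    def delete(self, key):
        del self.objects[key]

def release_lock(bucket, state_path, lock_id):
    lock_key = state_path + ".tflock"  # assumed lock-file naming
    # The ETag comparison guards against deleting a lock file that was
    # re-created by another client after our own lock expired or was lost.
    if bucket.head(lock_key) != lock_id:
        raise RuntimeError("lock ID mismatch: refusing to delete lock file")
    bucket.delete(lock_key)

bucket = FakeBucket()
lock_id = bucket.put("terraform.tfstate.tflock", b'{"ID": "abc"}')
release_lock(bucket, "terraform.tfstate", lock_id)
print("lock file removed:", "terraform.tfstate.tflock" not in bucket.objects)
```

A stale or mismatched `LockID` leaves the lock file in place, which is exactly the behavior a `force-unlock`-style escape hatch would then need to work around.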
This locking mechanism ensures that only one Terraform process can hold the lock at any given time, preventing concurrent modifications and ensuring consistent state management using the `s3` backend and Amazon S3's conditional writes.

## An opt-in feature
DynamoDB has long been the standard for Terraform state locking in the `s3` backend, and it's fair to assume that many users depend on this mechanism. To provide a practitioner-friendly way to introduce S3-native state locking via a lock file, this draft PR deprecates the DynamoDB-related attributes in the `s3` backend and introduces a new `lockfile` boolean attribute to enable the S3-native locking mechanism.

## Switching to the new S3-native locking mechanism
To be worked out.
## Draft
I’m still working on this. The foundations of S3-native state locking are already in place. This is still a prototype, and I'm testing it to make sure it fits well with the current state locking experience in the S3 backend. Your feedback, comments, and suggestions are welcome and appreciated.