feat(s3)!: enable native s3 support via the rust sdk #89
Problem
Dali can only process images that are downloadable from an HTTP server without any authentication layer. This is a poor fit when the images to be processed are stored in a private S3 bucket and Dali runs in an EKS cluster. The usual access strategy in that setup is to use the EKS OIDC provider to assume an IAM role that is allowed to read from the bucket and is mapped to a K8S service account.
One alternative is to use a gateway endpoint, but this requires VPC whitelisting at the bucket level. This can have dangerous side effects: other services running in the same VPC as the service that needs S3 access could gain undesired access to the bucket.
Solution
This PR originally set out to introduce the S3 SDK to Dali in order to take advantage of the EKS cluster's OIDC authentication.
Unfortunately, this could not be achieved with Dali's current setup, as even the oldest AWS Rust SDK is not compatible with Dali's 4-year-old runtime, actix-rt. The first natural approach was to upgrade Actix (the web library behind Dali), but the new version turned out not to run well: it doesn't seem to handle memory properly due to some changes in the caching mechanism (issue and issue). In our case we reproduced similar behavior, with the pods eventually running out of memory.
Because of this, I've switched the web library from Actix to Axum and the runtime from actix-rt to tokio. Axum has grown a lot in the Rust web ecosystem thanks to its very good performance and ease of configuration. As part of this, I've also tried to restructure the project a bit to make it more readable.
The next step is configuring the S3 SDK in an abstract way, so that Dali continues to support an HTTP server while remaining open for extension (in the open-closed sense) in case other storage providers need to be supported. For this I've created the ImageProvider trait.
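To make the idea behind the trait concrete, here is a minimal sketch of how such an abstraction could look. It is not the code from this PR: the method name get_file, the ProviderError type, the image_size helper, and the in-memory HashMap stand-ins for the HTTP client and the S3 SDK are all assumptions made for illustration, and the sketch is synchronous to stay dependency-free, whereas a tokio-based implementation would most likely be async.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch (not taken from the PR).
#[derive(Debug, PartialEq)]
pub enum ProviderError {
    NotFound(String),
    Unsupported(String),
}

/// Assumed shape of the abstraction: resolve a resource identifier to raw bytes.
pub trait ImageProvider {
    fn get_file(&self, resource: &str) -> Result<Vec<u8>, ProviderError>;
}

/// Stand-in for an HTTP-backed provider. A real one would issue a GET request;
/// here the responses are kept in memory so the sketch needs no dependencies.
pub struct HttpProvider {
    pub responses: HashMap<String, Vec<u8>>,
}

/// Stand-in for an S3-backed provider. A real one would call the S3 SDK;
/// here the objects are kept in memory, keyed by object key.
pub struct S3Provider {
    pub bucket: String,
    pub objects: HashMap<String, Vec<u8>>,
}

impl ImageProvider for HttpProvider {
    fn get_file(&self, resource: &str) -> Result<Vec<u8>, ProviderError> {
        // Only HTTP(S) URLs make sense for this provider.
        if !resource.starts_with("http://") && !resource.starts_with("https://") {
            return Err(ProviderError::Unsupported(resource.to_string()));
        }
        self.responses
            .get(resource)
            .cloned()
            .ok_or_else(|| ProviderError::NotFound(resource.to_string()))
    }
}

impl ImageProvider for S3Provider {
    fn get_file(&self, resource: &str) -> Result<Vec<u8>, ProviderError> {
        // The resource is treated as an object key inside the configured bucket.
        self.objects
            .get(resource)
            .cloned()
            .ok_or_else(|| ProviderError::NotFound(format!("s3://{}/{}", self.bucket, resource)))
    }
}

/// Callers depend only on the trait, so adding a new storage backend
/// does not require changing this function.
pub fn image_size(provider: &dyn ImageProvider, resource: &str) -> Result<usize, ProviderError> {
    Ok(provider.get_file(resource)?.len())
}
```

With this shape, supporting another storage provider would mean adding one more struct that implements ImageProvider, while request handlers that accept a &dyn ImageProvider stay unchanged.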
Also, this PR conveniently introduces a local development environment that emulates both image providers for Dali:
It consists of a MinIO server running two buckets:
PS: The PR is still marked as Draft due to some GitHub Actions that need to be slightly changed, but from the code's perspective it's ready to be reviewed.