secret store on lambda produces a variety of surprising errors but no valid responses #905
Comments
Little bonus: I did add a VPC endpoint with a proper routable name, secretsmanager.us-east-1.amazonaws.com, and I still get this:
I will try hitting that with reqwest, which I imagine will work. |
Tried hitting it with a GET from reqwest. Also timed out. I'm not doing any rocket surgery here, they're just on the default VPC for ease of testing. Looking at the Go and Python libraries, there's almost no code around handling timeouts, but a lot of code for various other failure modes. I'm assuming I'm hitting timeouts on something obvious with respect to routing, etc., but that still doesn't explain why a call to localhost:2773 also times out, unless there's a typo in the path that my coworkers and I can't see.

Looking at this now: https://repost.aws/knowledge-center/lambda-function-retry-timeout-sdk

The RDS setup creates this secret, which I thought was a really cool feature in comparison to GCP's "screw you" approach to database secrets, but if I can't access it from the SDK I'm going to end up turning off rotation and just passing it in as an ENV var (which for sure works, since it was the first thing I tried, prior to taking an eternity getting the same error, writing up said error, and still continuing to get said errors) |
I'm sorry I can't be more helpful here since I'm not so familiar with this Lambda feature. However, I can say for certain that we don't have any built-in support for accessing these secrets on localhost in Lambda in the SDK. It sounds to me like there's a configuration issue happening somewhere. It's almost as if nothing is actually listening on localhost:2773, and hence, it never connects. |
Yeah, I tried using TcpStream to connect to 2773, and that worked, but whatever it is expecting isn't what the Rust library is sending. Now I'm just piecing together how much (header) auth is needed to interact with secretsmanager over HTTPS via the VPC endpoint. This one was a real shocker, since everything up to this point with the SDK had 'just worked', including the SAM watch feature and cargo lambda. Thanks for reading. This is the closest I've been to smashing a computer in 5+ years, so I haven't exactly scoped this bug down to its bare minimum. |
Since you're using reqwest, you can enable trace level logging for the hyper crate to get the exact thing it sends over the socket. That might be helpful in diagnosing the issue. |
Tracing helped (setting it in RUST_LOG didn't work, but it did when I set the max level for the subscriber). This gets interesting, because it shows that yes, it can do DNS, and yes, it can do the right POST stuff, but somehow that takes over 3 seconds. Perhaps timing out is a proper failure mode for the secrets manager when something else is wrong?
the |
More searching on that "checkout dropped for" phrase and I found this rather unhappy, years-long thread: hyperium/hyper#2312 |
After a lot of poking, I finally figured out where to change the endpoint for this library (to localhost) and now I think I have some sort of answer:
Gave it 6 retries over six seconds and the poor thing still replied it wasn't ready... Not sure what's happening there, the layer config is a single click add practically, and it is definitely hosting something on localhost:2773 |
I'm also having an issue accessing secrets in lambda functions using the SDK when I run |
Tried many things, came to this one which is the secrets manager hitting localhost:
bad token in the session token or header key?! weird. The code that generates this (I went back to hello world and stuffed the secret gathering into the function_handler):
The print on the config doesn't seem to show any working credential sources. |
Here is a debug print of the SdkConfig
|
One final note before the weekend, where I started was in essence this lambda example examples/examples/lambda/src/main.rs w/ secretsmanager instead of s3. It could be the entire issue is roles/permissions but no error message seems to reflect that. |
If you're running on Lambda, I'd expect credentials to come from the environment. If you want to check to see if you have credentials, you can call the credential provider directly:
let shared_config = aws_config::from_env()
.endpoint_url("http://127.0.0.1:2773/secretsmanager")
.load()
.await;
let creds = shared_config.credentials_provider().unwrap().provide_credentials().await; // probably need to import the trait
println!("{:?}", creds); // will be redacted |
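Before calling the provider, it can help to confirm that the environment actually carries credentials. The following is a minimal std-only sketch, not part of the SDK: it assumes the three standard variables Lambda sets for the execution role (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`), and the helper name `missing_credential_vars` is hypothetical. It reports only which names are absent and never prints values.

```rust
// Assumption: these are the variables the Lambda runtime sets for the execution role.
const CREDENTIAL_VARS: [&str; 3] = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
];

/// Returns the names of credential variables that are unset or empty.
/// `lookup` is injected so the check can be exercised without touching
/// the real process environment.
fn missing_credential_vars<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    CREDENTIAL_VARS
        .iter()
        .copied()
        .filter(|name| lookup(*name).map_or(true, |value| value.is_empty()))
        .collect()
}

fn main() {
    // In a real function this reads the live environment.
    let missing = missing_credential_vars(|name| std::env::var(name).ok());
    if missing.is_empty() {
        println!("all credential variables are set");
    } else {
        println!("missing credential variables: {missing:?}");
    }
}
```

If all three are present and calls still time out, the problem is more likely on the network path than in credential resolution.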
Your message #905 (comment) shows a log line where it is in fact setting the token properly (see |
Thanks so much for these responses. I think I may have run into a dreaded two path -> two different issues situation. I haven't resolved it, but I wouldn't be surprised if
For now I'm going to punt on this, but I think path 2 is fixable by me via config (ie starting from cloudformation/SAM scratch), and path 1 might be fixable if I knew more about this sdk code. |
My nightmare is over. I can point to exactly where I went wrong and what the fix was. The fix was this comment by Ribeiro: https://stackoverflow.com/questions/62274069/aws-lambda-access-secrets-manager-from-within-vpc#comment121571222_62274140

My mistake came from this post from Amazon: https://aws.amazon.com/blogs/security/how-to-connect-to-aws-secrets-manager-service-within-a-virtual-private-cloud/

Step 6 is not enough if you don't have a 'wide open' security group. By putting in any old security group for "the security group [...] created earlier", I think I blocked all of this from working regardless of what the code was doing. I'm still not sure if this was also preventing the Layer from communicating with the secrets manager (I assume it was). Either way, you need your groups set up neatly like that comment on Stack Overflow: put one SG on the secrets VPC endpoint that allows 443 in from the lambda subnets, and allow the lambda to speak outbound on 443 to the SG you just created (or to the subnet the secrets manager endpoint is on).

Sorry for my frustration. It was rather simple underneath the errors. Timeouts are disallowed routes or nonexistent routes, nothing more or less; sadly there's nothing screaming at you saying "don't take a 5 year old blog post at its word". Whoops. Thanks for all your help. Once I get this going I may experiment with the layer vs VPC endpoint setup. |
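For anyone landing here with the same symptom, a plain TCP probe can separate "silently dropped" from "nothing listening" before any SDK code gets involved. This is a rough sketch using only the standard library. The `Reachability` enum and `probe` function are names made up for illustration, and the reading of the results (a timeout suggesting a security group or route dropping packets, a refusal suggesting a reachable host with no listener) is a rule of thumb, not a guarantee.

```rust
use std::io::ErrorKind;
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Coarse outcome of a single TCP connect attempt.
#[derive(Debug, PartialEq)]
enum Reachability {
    Connected,
    Refused,
    TimedOut,
    Other(String),
}

/// Resolves `host_port` (for example "host:443") and tries one TCP connect
/// to the first resolved address, bounded by `timeout`.
fn probe(host_port: &str, timeout: Duration) -> Reachability {
    let addr = match host_port.to_socket_addrs() {
        Ok(mut addrs) => match addrs.next() {
            Some(addr) => addr,
            None => return Reachability::Other("no addresses resolved".into()),
        },
        Err(e) => return Reachability::Other(format!("resolve error: {e}")),
    };
    match TcpStream::connect_timeout(&addr, timeout) {
        Ok(_) => Reachability::Connected,
        Err(e) if e.kind() == ErrorKind::ConnectionRefused => Reachability::Refused,
        Err(e) if e.kind() == ErrorKind::TimedOut => Reachability::TimedOut,
        Err(e) => Reachability::Other(e.to_string()),
    }
}

fn main() {
    // Endpoint name taken from the thread; adjust the region as needed.
    let target = "secretsmanager.us-east-1.amazonaws.com:443";
    println!("{target}: {:?}", probe(target, Duration::from_secs(2)));
}
```

Running this from inside the function, against both the VPC endpoint on 443 and localhost:2773, shows which hop is the one timing out.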
|
Glad to hear you figured it out! |
Describe the bug
I have the params+secrets layer (v11) attached. I have been retrying repeatedly, as is my bad habit, and so far neither of the two methods has worked to retrieve the secret. I have followed the secretsmanager example, which "works on my machine", but on Lambda I get the following surprising errors (with debug and full tracing on for smithy, afaict):
1st method: SDK:
As you can see nothing happens, the debug! trace from inside of the lazy_caching doesn't seem to fire off either, but no credentials are returned.
2nd method:
http://localhost:2773/secretsmanager/get...
This one is more surprising to me because I haven't seen anyone getting this error.
At the end of my rope as I got in a loop of "just one more thing" over and over.
Expected Behavior
I wanted to get a working response that I could decode into JSON; instead I get timeouts or the BadRequest error.
^ this code (with my aws creds local) works fine.
Current Behavior
BadRequestException 400 with plain http to localhost 2773.
[edit] SDK secretsmanager errors like this (I had to raise the execution timeout to get this):
Err(DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 3.1s }), connection: Unknown } }))
Reproduction Steps
Possible Solution
Permissions? Bad URLs in the docs? Broken SDK function that works on my (ARM) machine but not on ARM lambdas? Some terrifying feature issue?
It is a truly next level problem where I have to write an issue. I'm praying this is just that I didn't allow my lambda to read a secret in some way and it's just a documentation failure, otherwise I honestly can't understand how this code could fail so opaquely. Thanks for reading at least.
Additional Information/Context
I went through hoping this would just work, but somewhere around hour 3 I realized: wait a second, this really should just work, or at least throw an error that someone can search for. This is on arm64, which I hope isn't the issue. I'm hoping this is a permissions issue that isn't mentioned in the documentation, but I know that (a) I have an AWS_SESSION_TOKEN, (b) 2773 on localhost is open to the running function, (c) I've tried 3% of the possible ways of getting the JSON to return, (d) I have not tried POST-ing, and (e) reaching out to the externally hosted secrets managers doesn't seem to work either, but the exact URLs for individual requests are not given in the documentation at this time.
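For comparison, here is a sketch of the request shape as I understand the extension's documentation: a GET to `/secretsmanager/get?secretId=...` on localhost:2773, with the value of `AWS_SESSION_TOKEN` sent in an `X-Aws-Parameters-Secrets-Token` header. Treat the header name and path as assumptions to verify against the current AWS docs. The function names and the secret id `my-secret-id` are hypothetical, and the code only builds the request text; it does not send it.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters,
/// so secret names or ARNs containing '/' or ':' survive the query string.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds the raw HTTP/1.1 request text for the local extension.
/// Assumption: default port 2773 and the token header described above.
fn build_extension_request(secret_id: &str, session_token: &str) -> String {
    format!(
        "GET /secretsmanager/get?secretId={} HTTP/1.1\r\n\
         Host: localhost:2773\r\n\
         X-Aws-Parameters-Secrets-Token: {}\r\n\
         Connection: close\r\n\
         \r\n",
        percent_encode(secret_id),
        session_token
    )
}

fn main() {
    let token = std::env::var("AWS_SESSION_TOKEN").unwrap_or_default();
    let request = build_extension_request("my-secret-id", &token);
    // Print only the request line so the token never reaches the logs.
    println!("{}", request.lines().next().unwrap_or(""));
}
```

Comparing this against the hyper trace output is a quick way to see whether the header or the path differs from what was actually sent.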
Version
Environment details (OS name and version, etc.)
arm64
Logs
The code that made this log is trying both the plain http request and the SDK.