The gist of the problem is an application that hangs because aws-api is not able to fetch metadata (based on examining thread stacks).
This is happening after an upgrade from com.cognitect.aws/api {:mvn/version "0.8.692"} to com.cognitect.aws/api {:mvn/version "0.8.730-beta01"}.
The problem only manifests when the code runs on an EC2 instance, where it uses the default credentials provider.
Running it on my laptop (using the profile credentials provider) works fine.
At least one other person mentioned seeing a similar problem.
Reproducer
(require '[cognitect.aws.client.api :as aws])
(require '[cognitect.aws.credentials :as credentials])
;; running this on an ec2 instance hangs
;; http-client is created elsewhere and is not shown in this report
(def my-client
  (aws/client {:api :license-manager
               :credentials-provider (credentials/default-credentials-provider http-client)}))
;; - using (credentials/profile-credentials-provider aws-profile) on my laptop works fine
(aws/invoke my-client {:op :ListReceivedLicenses})
;; ... never completes ...
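While investigating, it can help to bound the wait so the REPL session does not block forever. The following is a minimal sketch using only clojure.core; `call-with-timeout` and the `:call/timed-out` marker are hypothetical names chosen for illustration and are not part of aws-api.

```clojure
;; Hypothetical helper (not part of aws-api): run a zero-argument function on
;; another thread and give up waiting after timeout-ms milliseconds.
(defn call-with-timeout
  [timeout-ms f]
  (let [fut    (future (f))
        result (deref fut timeout-ms :call/timed-out)]
    ;; On timeout, ask the worker thread to stop by interrupting it.
    (when (= result :call/timed-out)
      (future-cancel fut))
    result))

;; Possible usage with the reproducer above (assumes my-client is defined):
;; (call-with-timeout 10000 #(aws/invoke my-client {:op :ListReceivedLicenses}))
```

This only stops the caller from waiting; it does not fix the underlying hang, and any other thread that is parked waiting on metadata stays parked.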
Stacktraces
When I run the above code in a socket REPL, it gets stuck:
"Clojure Connection repl 1" #52 daemon prio=5 os_prio=0 cpu=272.22ms elapsed=1398.25s tid=0x0000ffff4c00e520 nid=0x165 waiting on condition [0x0000ffff23ffc000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.14/Native Method)
- parking to wait for <0x00000000c8d8e290> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.14/LockSupport.java:211)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.14/AbstractQueuedSynchronizer.java:715)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.14/AbstractQueuedSynchronizer.java:1047)
at java.util.concurrent.CountDownLatch.await(java.base@17.0.14/CountDownLatch.java:230)
at clojure.core$promise$reify__8621.deref(core.clj:7257)
at clojure.core$deref.invokeStatic(core.clj:2337)
at clojure.core$deref.invoke(core.clj:2323)
at clojure.core.async$fn__43145.invokeStatic(async.clj:138)
at clojure.core.async$fn__43145.invoke(async.clj:127)
at cognitect.aws.client.impl.Client._invoke(impl.clj:123)
at cognitect.aws.client.api$invoke.invokeStatic(api.clj:131)
at cognitect.aws.client.api$invoke.invoke(api.clj:112)
...
After careful examination, I found another thread suggesting a problem with fetching metadata:
"async-thread-macro-1" #58 daemon prio=5 os_prio=0 cpu=1.90ms elapsed=1013.14s tid=0x0000ffff600582f0 nid=0x169 waiting on condition [0x0000ffff19dfd000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.14/Native Method)
- parking to wait for <0x00000000d0ec73a8> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.14/LockSupport.java:211)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.14/AbstractQueuedSynchronizer.java:715)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.14/AbstractQueuedSynchronizer.java:1047)
at java.util.concurrent.CountDownLatch.await(java.base@17.0.14/CountDownLatch.java:230)
at clojure.core$promise$reify__8621.deref(core.clj:7257)
at clojure.core$deref.invokeStatic(core.clj:2337)
at clojure.core$deref.invoke(core.clj:2323)
at clojure.core.async$fn__43145.invokeStatic(async.clj:138)
at clojure.core.async$fn__43145.invoke(async.clj:127)
at cognitect.aws.ec2_metadata_utils$get_response_data.invokeStatic(ec2_metadata_utils.clj:63)
at cognitect.aws.ec2_metadata_utils$get_response_data.invoke(ec2_metadata_utils.clj:62)
at cognitect.aws.ec2_metadata_utils$IMDSv2_token.invokeStatic(ec2_metadata_utils.clj:157)
at cognitect.aws.ec2_metadata_utils$IMDSv2_token.invoke(ec2_metadata_utils.clj:148)
at cognitect.aws.region$instance_region_IMDS_v2_provider$reify__49949.fetch(region.clj:112)
at cognitect.aws.region$fn__49916$G__49912__49918.invoke(region.clj:24)
at cognitect.aws.region$fn__49916$G__49911__49921.invoke(region.clj:24)
at clojure.core$some.invokeStatic(core.clj:2718)
at clojure.core$some.invoke(core.clj:2709)
at cognitect.aws.region$chain_region_provider$reify__49927.fetch(region.clj:37)
at cognitect.aws.region$fn__49916$G__49912__49918.invoke(region.clj:24)
at cognitect.aws.region$fn__49916$G__49911__49921.invoke(region.clj:24)
at cognitect.aws.util$fetch_async$fn__49627$fn__49628.invoke(util.clj:297)
- locked <0x00000000d20cf868> (a cognitect.aws.region$chain_region_provider$reify__49927)
at cognitect.aws.util$fetch_async$fn__49627.invoke(util.clj:296)
at clojure.core.async$thread_call$fn__43264.invoke(async.clj:487)
at clojure.lang.AFn.run(AFn.java:22)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.14/ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.14/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17.0.14/Thread.java:840)
Workaround
Downgrading to the prior version of aws-api solves the problem.
Is there anything special about your setup? I see in the snippet you shared that you create the credentials provider with an explicit http-client. How is your http client created?
@jumarko Also, any chance there is some unhandled exception being printed to stderr?
Looking at the thread dump stack trace it seems that nothing is ever delivered to the promise-chan used to fetch instance metadata, which may indicate some core.async block is throwing an exception that is not being handled (and this would print to stderr by default, assuming no UncaughtExceptionHandler is set).
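One way to check for such an exception is to install a default uncaught exception handler before running the reproducer. This is a minimal sketch using plain Java interop; `uncaught` and `install-uncaught-handler!` are hypothetical names for illustration, and it assumes the exception actually escapes the thread rather than being swallowed somewhere.

```clojure
;; Hypothetical diagnostic helper, not part of aws-api.
;; Collects every exception that escapes a thread so it can be inspected later.
(defonce uncaught (atom []))

(defn install-uncaught-handler!
  "Registers a JVM-wide handler that records and prints uncaught exceptions."
  []
  (Thread/setDefaultUncaughtExceptionHandler
    (reify java.lang.Thread$UncaughtExceptionHandler
      (uncaughtException [_ thread ex]
        (swap! uncaught conj {:thread (.getName thread) :exception ex})
        (binding [*out* *err*]
          (println "Uncaught exception on thread" (.getName thread)))
        ;; printStackTrace writes to System.err
        (.printStackTrace ex)))))

;; Possible usage: call (install-uncaught-handler!), run the reproducer,
;; then inspect @uncaught for anything thrown on the async threads.
```

If `@uncaught` stays empty while the call still hangs, the promise-chan is probably left undelivered for some other reason than an unhandled exception.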
I reported this problem on Slack: https://clojurians.slack.com/archives/C09N0H1RB/p1739347946253059