Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No server chosen by com.mongodb.reactivestreams: Failed looking up SRV record #37618

Closed
charalampos-micholias opened this issue Dec 8, 2023 · 18 comments
Labels
area/mongodb kind/bug Something isn't working
Milestone

Comments

@charalampos-micholias
Copy link

charalampos-micholias commented Dec 8, 2023

Describe the bug

Issue using Quarkus 2.15.0 version

When trying a connection attempt in a mongo shard through a Quarkus application version >= 2.15.0 below error is propagated:

_2023-11-24 10:07:53,815 INFO  [com.vod.dxl.gr.sys.uti.SRVQuery] (vert.x-eventloop-thread-0) 0 0 27017 dxl-gr-preprod-shard-00-02-nuler.mongodb.net.
2023-11-24 10:07:53,907 INFO  [com.vod.dxl.gr.sys.uti.SRVQuery] (vert.x-eventloop-thread-0) 0 0 27017 dxl-gr-preprod-shard-00-00-nuler.mongodb.net.
2023-11-24 10:07:53,908 INFO  [com.vod.dxl.gr.sys.uti.SRVQuery] (vert.x-eventloop-thread-0) 0 0 27017 dxl-gr-preprod-shard-00-01-nuler.mongodb.net.
2023-11-24 10:07:56,108 INFO  [org.mon.dri.cluster] (executor-thread-1) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$441/0x0000000050de1ef0@11222f0c from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out
2023-11-24 10:07:59,508 INFO  [org.mon.dri.cluster] (cluster-656075d114c7f278b4f504d7) No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out_

Issue using Quarkus 3.1.0 version

As described on #32991 this issue is resolved in "Quarkus 3.0.2.Final" version

Based on the aforementioned proposal, upon upgrading on Quarkus "3.1.0.Final" a new error (resulting to same mongo connection inability) takes place.

Below you may find exact logs:

_07:24:02 INFO  traceId=afd520fa41b8e64ab71320ef860625cb, parentId=, spanId=914390a5279baf32, sampled=true [or.mo.dr.cluster] (vert.x-eventloop-thread-0) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$771/0x0000000064382208@64e7bf16 from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]}. Waiting for 30000 ms before timing out

Dedicated Script for further logs:
In attempt to have further view on Mongo SRV attributes we have created a dedicated script - triggered each time mongo data retrieval- is attempted that provides below logs:

_07:03:36 INFO  traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) Providing SRV attributes logs....
07:03:36 INFO  traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) 0 0 27017 dxl-gr-preprod-shard-00-01-nuler.mongodb.net.
07:03:36 INFO  traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) 0 0 27017 dxl-gr-preprod-shard-00-02-nuler.mongodb.net.
07:03:36 INFO  traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [gr.vo.dx.su.se.SRVQuery] (executor-thread-1) 0 0 27017 dxl-gr-preprod-shard-00-00-nuler.mongodb.net.

07:24:08 INFO  traceId=3cd5ded71f5024a926d7b084400c8967, parentId=, spanId=616220dab51c5f28, sampled=false [or.mo.dr.cluster] (executor-thread-3) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$771/0x0000000064382208@53b297d6 from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=dxl-gr-preprod-shard-00-02-nuler.mongodb.net:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=dxl-gr-preprod-shard-00-01-nuler.mongodb.net:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=dxl-gr-preprod-shard-00-00.nuler.mongodb.net:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out_

Expected behavior

Quarkus should connect to Mongo Atlas without this SRV lookup or MongoSocketOpenException error

Actual behavior

Connection failure takes place when reaching the established time thresshold
"Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]}. Waiting for 30000 ms before timing out"

How to Reproduce?

Using the free tier mongo atlas account,

  • use the above mentioned quarkus versions (2.15.0 and 3.1.0 )
  • specify the mongo atlas cluster url in application.properties.
  • Then, write a simple rest service that when invoked connects to mongo to retrieve records.

Output of uname -a or ver

MINGW64_NT-10.0-22621 E0624331 3.3.3-341.x86_64 2022-01-17 11:45 UTC x86_64 Msys

Output of java -version

java version "17.0.7" 2023-04-18 LTS

Quarkus version or git rev

2.15.0 and 3.1.0

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

Dedicated script getMongoDebugLogs() :

public class SRVQuery {

    private static final Logger log = LoggerFactory.getLogger(SRVQuery.class);
    public static void getMongoDebugLogs() {
        try {
            // Set up the environment for creating the initial context
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
            env.put(Context.PROVIDER_URL, "dns:");
            DirContext ctx = new InitialDirContext(env);
            // Specify the name of the domain to query
            String domainName = "_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net";
            // Perform the SRV lookup
            Attributes attrs = ctx.getAttributes(domainName, new String[]{"SRV"});
            Attribute attr = attrs.get("SRV");
            log.info("Providing SRV attributes logs....");
            if (attr != null) {
                for (int i = 0; i < attr.size(); i++) {
                    log.info((String) attr.get(i));
                }
            } else {
                log.info("No SRV records found for " + domainName);
            }
            ctx.close();
        } catch (NamingException e) {
            log.info("Handing mongo debugging exception..");
            e.printStackTrace();
            log.info(e.getMessage());
        }
    }

}
@charalampos-micholias charalampos-micholias added the kind/bug Something isn't working label Dec 8, 2023
@quarkus-bot
Copy link

quarkus-bot bot commented Dec 8, 2023

/cc @evanchooly (mongodb), @loicmathieu (mongodb)

@loicmathieu
Copy link
Contributor

Hi,
Can you try with the latest Quarkus version, it should be 3.6.1.
If it still fails, can you provide a reproducer project?

@charalampos-micholias
Copy link
Author

charalampos-micholias commented Dec 12, 2023

Hello, good afternoon,
After several further tests we have identified the exact pattern that this issue occurs. More specific issue occurs on the below combination:

  • Quarkus version>= 2.15.0 (including Quarkus 3.1.0) and
  • Mongo connection through connection string formatted as : mongodb+srv://USERNAME:PASSWORD@XXXXXXX.nuler.mongodb.net

keep as important note -as a tested workaround- that issue does not appear upon combining Quarkus 3.1.0 and connection string containing exact mongo shards formatted as:
mongodb://USERNAME:PASSWORD@XXXXX-shard-00-00.nuler.mongodb.net:27017,XXXXX-shard-00-01-nuler.mongodb.net:27017/

So after many thorough test seems that even Quarkus 3.1.0 version faces inability to establish connection when a SRV mongo connection string is provided.

Exact logs for aforementioned combination:

15:09:53 INFO  traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (cluster-6578779b5cf3934a85277080) No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out

15:09:56 INFO  traceId=0caf882656bffbdbd9cae0ca6fbfeb3f, parentId=, spanId=c9077c52e865326b, sampled=false [or.mo.dr.cluster] (executor-thread-2) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$790/0x00000000903f70f8@4180c2b from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out

15:09:57 INFO  traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (cluster-6578779b5cf3934a85277080) No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out_

_15:10:23 ERROR traceId=ad5511d672b3e0370c637d1fb918f448, parentId=, spanId=12c2d61cc5444467, sampled=true [io.qu.mu.ru.MutinyInfrastructure] (vert.x-eventloop-thread-0) Mutiny had to drop the following exception: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., servers=[]
	at com.mongodb.internal.connection.BaseCluster.createTimeoutException(BaseCluster.java:428)
	at com.mongodb.internal.connection.BaseCluster.handleServerSelectionRequest(BaseCluster.java:324)
	at com.mongodb.internal.connection.BaseCluster.access$700(BaseCluster.java:69)
	at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:503)
	at java.base/java.lang.Thread.run(Unknown Source)

As upgrading further in 3.6.1 will require extended effort in a series of services, would you please update us if there are implemented enhancements in mongo connection establishment area in Quarkus 3.6.1 that could resolve the above issue and which are they in high level?

Thank you all for your assistance

@va1erian
Copy link

Hello, we are experiencing similar issues with Quarkus 3.5.3, we have these stacktraces showing every once in a while when connected to a MongoDB Atlas cluster:
Unable to look up SRV record for host _mongodb._tcp.option-pl-0.91p5n.mongodb.net
we are using the reactive client. Sorry I don't have th full stacktrace, can't find a way to export it cleanly for datadog.

@roiar
Copy link

roiar commented Jan 4, 2024

hello all!
@loicmathieu what kind of corrections are available on 3.6.1?

@charalampos-micholias
Copy link
Author

Hello @loicmathieu, we would like to ask if there is an ongoing investigation on the above issue.
Thank you,

@danielspeixoto
Copy link

Facing the same problem here

@loicmathieu
Copy link
Contributor

@va1erian @roiar @charalampos-micholias in JVM mode we use the default MongoDB DNS resolver, nothing specific to Quarkus.
However, in native mode there is an experimental DNS resolver as the MongoDB DNS resolver uses JNI which is not supported in native, this is explained here: https://quarkus.io/guides/mongodb#using-mongosrv-urls

Did you deploy to native and use the experimental DNS resolver?

@va1erian
Copy link

va1erian commented Jun 13, 2024

@loicmathieu we deploy a regular JVM without touching the dns resolver configuration. However note that this issue disappeared at one point when we upgraded Quarkus, I can't recall the exact version but it has been a couple of month already. We are currently running Quarkus 3.10

@Tomatenjoghurt
Copy link

Tomatenjoghurt commented Jun 26, 2024

We're facing exactly the same issue running Quarkus 3.10.
Don't know if this is relevant, but we're using JVM Images for Java 17, so, if I read the documentation correctly, it should be this base image:
registry.access.redhat.com/ubi8/openjdk-17-runtime:1.19

Relevant Log Messages:

Exception while resolving SRV records
No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.some.name.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out

If we remove the srv Record from the config and add all the single database entries, it is working.

@geoand
Copy link
Contributor

geoand commented Sep 11, 2024

See #31971 (comment) for a possible workaround

@geoand
Copy link
Contributor

geoand commented Sep 13, 2024

We believe that #43222 fixes this, so I will close it for now, but if the issue persists with 3.15 (which will contain the fix), we can reopen

@geoand geoand closed this as completed Sep 13, 2024
@geoand geoand added this to the 3.15.0 milestone Sep 13, 2024
@gsmet gsmet modified the milestones: 3.15.0, 3.14.4 Sep 13, 2024
@Gabriel-Longo-at-DB1
Copy link

Gabriel-Longo-at-DB1 commented Sep 16, 2024

Hello @geoand , we are still facing the issue with Quarkus 3.14.4 (native), the only workaround we were able to apply with major effects, was ditching mongo+srv:// in favor of simple mongo:// connection string. (ref. we used: https://stackoverflow.com/questions/78713411/com-mongodb-mongoconfigurationexception-failed-looking-up-txt-record-for-host)

We inspected the solution in #43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.

We contacted Mongo Atlas support to little help, except an upsell that we bought, and the problem persisted.

Additional info: We have at least 4 Quarkus microservices running 3.8.6 (LTS) (Native) and 3.14 that are not affected by the issue.

@geoand
Copy link
Contributor

geoand commented Sep 17, 2024

We inspected the solution in #43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.

Can you elaborate on this a little more please?

@Gabriel-Longo-at-DB1
Copy link

Gabriel-Longo-at-DB1 commented Sep 17, 2024

We inspected the solution in #43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.

Can you elaborate on this a little more please?

Thanks for the reply, I'll try to elaborate a bit more:

Application will emit the following exception for a couple o seconds, then health-check will fail and app restarted, then we will randomly face the same problem for a few minutes, until connection is successful, then the cycle repeats, so in this case it seems the retry (part of the solution of #43222 ) doesn't work because the srv resolution will persistently fail for minutes at times.

Mutiny had to drop the following exception [Error Occurred After Shutdown]: com.mongodb.MongoTimeoutException: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.<our-mongo-sub-domain>.mongodb.net'., servers=[]
	at com.mongodb.internal.connection.BaseCluster.createAndLogTimeoutException(BaseCluster.java:392)
	at com.mongodb.internal.connection.BaseCluster.selectServer(BaseCluster.java:148)
	at com.mongodb.internal.connection.AbstractMultiServerCluster.selectServer(AbstractMultiServerCluster.java:54)
	at com.mongodb.internal.binding.ClusterBinding.getReadConnectionSource(ClusterBinding.java:108)
	at com.mongodb.client.internal.ClientSessionBinding.getConnectionSource(ClientSessionBinding.java:128)
	at com.mongodb.client.internal.ClientSessionBinding.getReadConnectionSource(ClientSessionBinding.java:92)
	at com.mongodb.internal.operation.SyncOperationHelper.withSuppliedResource(SyncOperationHelper.java:141)
	at com.mongodb.internal.operation.SyncOperationHelper.withSourceAndConnection(SyncOperationHelper.java:122)
	at com.mongodb.internal.operation.SyncOperationHelper.lambda$executeRetryableRead$4(SyncOperationHelper.java:186)
	at com.mongodb.internal.operation.SyncOperationHelper.lambda$decorateReadWithRetries$12(SyncOperationHelper.java:289)
	at com.mongodb.internal.async.function.RetryingSyncSupplier.get(RetryingSyncSupplier.java:67)
	at com.mongodb.internal.operation.SyncOperationHelper.executeRetryableRead(SyncOperationHelper.java:191)
	at com.mongodb.internal.operation.SyncOperationHelper.executeRetryableRead(SyncOperationHelper.java:173)
	at com.mongodb.internal.operation.CommandReadOperation.execute(CommandReadOperation.java:48)
	at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:153)
	at com.mongodb.client.internal.MongoDatabaseImpl.executeCommand(MongoDatabaseImpl.java:196)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:165)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:160)
	at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:150)
	at io.quarkus.mongodb.health.MongoHealthCheck$MongoClientCheck$1.get(MongoHealthCheck.java:185)
	at io.quarkus.mongodb.health.MongoHealthCheck$MongoClientCheck$1.get(MongoHealthCheck.java:182)
	at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:28)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:635)
	at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1495)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base@17.0.12/java.lang.Thread.run(Thread.java:840)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:885)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:860)

this shorter stack trace is also logged:

Mutiny had to drop the following exception: com.mongodb.MongoTimeoutException: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.<our-mongo-sub-domain>.mongodb.net'., servers=[]
	at com.mongodb.internal.connection.BaseCluster.createAndLogTimeoutException(BaseCluster.java:392)
	at com.mongodb.internal.connection.BaseCluster.handleServerSelectionRequest(BaseCluster.java:293)
	at com.mongodb.internal.connection.BaseCluster.access$800(BaseCluster.java:85)
	at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:475)
	at java.base@17.0.12/java.lang.Thread.run(Thread.java:840)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:885)
	at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:860)

We are also trying find if there are any problems with DNS resolution in our infrastructure, but so far, nothing.

Edit: seems to be related to #32313

@geoand
Copy link
Contributor

geoand commented Sep 17, 2024

I wonder if quarkus.mongodb.dns.lookup-timeout will help in your case.

@Gabriel-Longo-at-DB1
Copy link

I wonder if quarkus.mongodb.dns.lookup-timeout will help in your case.

Great, we will try-it out and report the results, thank you very much!

@geoand
Copy link
Contributor

geoand commented Sep 17, 2024

🙏🏽

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/mongodb kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

9 participants