Reliable connection configuration for Azure functions consumption plan #357

Open
casper-79 opened this issue Sep 1, 2020 · 1 comment

casper-79 commented Sep 1, 2020

Describe the bug
We are building a serverless execution pipeline for analytics jobs on the Azure Functions consumption plan. The system stores and updates the state of its jobs in cosmosDB, which means we need strong consistency and synchronous writes. The workload is a scheduled spike of jobs every hour, which we want to handle as quickly as possible. As a result, many short-lived consumption plan instances (100+) spin up at the same time, and we found that the default cosmosDB client settings are not suitable for this scenario.

We have done load testing experiments with the Java CosmosDB SDK 4.2.0 with different client configurations (client is cached). Based on these experiments we have drawn the following conclusions:

  • Direct connection is the way to go, since it can be tuned to deliver much better performance than gateway. Significant tuning is required to make it reliable, however. If we simply use the default configuration for direct connection we are flooded with thousands of the “connection closed” DB exception seen below.
  • Limiting maxConnectionsPerEndpoint to 1 reduces the number of DB exceptions very significantly and does not appear to have any negative impact on performance.
  • Setting a low value for maxRequestsPerConnection lowers the latency of synchronous writes very significantly (at least an order of magnitude for the 95th percentile compared to the default). If it goes too low, however, we start to see an increase in the number of DB exceptions. The optimal value appears to be a trade-off between performance and the number of DB exceptions thrown. We have found 10 to be a good compromise.
  • Performance varies significantly with different versions of the SDK. In our experiments 4.3.1 provides less than half the query throughput of 4.2.0 using the connection configuration seen below.
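// Start from the SDK defaults for direct mode, then restrict the connection pool.
DirectConnectionConfig directConnectionConfig = DirectConnectionConfig.getDefaultConfig();
// A single connection per endpoint greatly reduced "is closed" exceptions in our tests.
directConnectionConfig.setMaxConnectionsPerEndpoint(1);
// 10 requests per connection was our compromise between write latency and exception count.
directConnectionConfig.setMaxRequestsPerConnection(10);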
 
return new CosmosClientBuilder()
        .endpoint(environment.endpoint)
        .key(environment.primaryKey)
        .consistencyLevel(ConsistencyLevel.STRONG)
        .directMode(directConnectionConfig)
        .endpointDiscoveryEnabled(false)
        .buildClient();

We have not been able to find any guidelines on configuring the cosmosDB client for this truly serverless use case, so we would very much appreciate your comments on our findings. If we could somehow completely avoid the DB exception found below, that would be great…
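
The findings above compare configurations by the 95th percentile of synchronous write latency. The issue does not say how that percentile was computed, so the following is only a minimal sketch of one common approach (the nearest-rank method) for latencies recorded by a load test. The class name `LatencyPercentile` and the sample values are hypothetical and are not measurements from these experiments.

```java
import java.util.Arrays;

final class LatencyPercentile {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up latencies in milliseconds, with one slow outlier.
        long[] latencies = {12, 15, 11, 240, 14, 13, 16, 18, 12, 17};
        System.out.println("p50 = " + percentile(latencies, 50) + " ms");
        System.out.println("p95 = " + percentile(latencies, 95) + " ms");
    }
}
```

With few samples the nearest-rank p95 is dominated by the slowest requests, so comparing configurations this way probably needs a reasonably large number of requests per run.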

To Reproduce
Create a Java function app on the consumption plan. Deploy an HTTP-triggered function which writes and modifies small dummy objects in cosmosDB (less than 1 KB). Use SDK 4.2.0, the default direct connection configuration, strong consistency and the synchronous client, and make sure to cache the client so it is only created once. Set the cosmosDB throughput to 10,000 RU/s in order to eliminate RUs as a bottleneck. Write a test that puts some load on the function (e.g. 100 requests per second). Observe how Application Insights is flooded with exceptions of the type seen below.

Exception while executing function: Functions.JobPersister
Result: Failure
Exception: IllegalStateException: RntbdServiceEndpoint({"id":3,"isClosed":true,"concurrentRequests":0,"remoteAddress":"cdb-ms-prod-westeurope1-fd20.documents.azure.com:14318","channelPool":{"remoteAddress":"cdb-ms-prod-westeurope1-fd20.documents.azure.com:14318","isClosed":false,"configuration":{"maxChannels":130,"maxRequestsPerChannel":30,"idleConnectionTimeout":0,"readDelayLimit":65000000000,"writeDelayLimit":10000000000},"state":{"channelsAcquired":0,"channelsAvailable":0,"requestQueueLength":0}}}) is closed
Stack: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.microsoft.azure.functions.worker.broker.JavaMethodInvokeInfo.invoke(JavaMethodInvokeInfo.java:22)
    at com.microsoft.azure.functions.worker.broker.JavaMethodExecutorImpl.execute(JavaMethodExecutorImpl.java:54)
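
The reproduction steps require that the client is cached so it is only created once per function instance. The issue does not show how the caching was done, so the following is only a sketch of one possible approach: a generic, thread-safe lazy holder. `CachedClient` is a hypothetical name, and the `Supplier` stands in for whatever builds the real client (for example the `CosmosClientBuilder` chain shown earlier), which keeps the sketch free of SDK dependencies.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: builds the wrapped object on first use and reuses it afterwards.
final class CachedClient<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile is required for safe double-checked locking

    CachedClient(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get(); // runs at most once, even under concurrent calls
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        // Stand-in factory; a real function would build the Cosmos client here.
        CachedClient<String> cache =
                new CachedClient<>(() -> "client-" + builds.incrementAndGet());
        String first = cache.get();
        String second = cache.get();
        System.out.println("same instance: " + (first == second));
        System.out.println("factory calls: " + builds.get());
    }
}
```

In a function class the holder would typically live in a `static final` field, so that every invocation handled by the same worker instance shares one client.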

Expected behavior
It should be documented how to use cosmosDB with Azure functions consumption plan in a way that does not produce client exceptions.

Actual behavior
We found that the default client configuration does not work reliably with the Azure Functions consumption plan and were unable to find documentation covering this use case. We have found a client configuration which works reasonably well, but there are still a large number of exceptions polluting our logs. We would appreciate guidance on how to proceed or documentation covering our use case.

casper-79 (Author) commented

It seems the active repo for the cosmosDB client has been moved to https://github.com/Azure/azure-sdk-for-java.
