Bulk operations delay #1277

Closed
vhe1 opened this issue Mar 13, 2020 · 3 comments

vhe1 commented Mar 13, 2020

Hi,
regarding the feedback requested in https://devblogs.microsoft.com/cosmosdb/introducing-bulk-support-in-the-net-sdk/, especially the part where the container waits for 1 s to post all the data (if the batches aren't full):

Why not hand the container the tasks? Or the documents, or some kind of batch? That way the container knows exactly how many operations to perform and doesn't need to wait.
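To make the waiting behavior concrete, here is a hypothetical, simplified model of "dispatch when full, or when a timer fires" batching. It is not the Cosmos DB SDK's implementation (the real internals are more involved), and the class name `SimpleBatcher`, the batch size, and the wait time are illustrative assumptions. A full batch goes out immediately; a partially filled batch goes out only when its timer fires, which is the delay being asked about.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical model of timer-or-full batching; NOT the Cosmos SDK's internal code.
public sealed class SimpleBatcher<T> : IDisposable
{
    private readonly int _maxBatchSize;
    private readonly TimeSpan _maxWait;
    private readonly Action<IReadOnlyList<T>> _dispatch;
    private readonly object _lock = new object();
    private List<T> _pending = new List<T>();
    private Timer _timer;       // flush timer for the batch currently being filled
    private long _generation;   // identifies that batch, so a stale timer cannot flush a newer one

    public SimpleBatcher(int maxBatchSize, TimeSpan maxWait, Action<IReadOnlyList<T>> dispatch)
    {
        _maxBatchSize = maxBatchSize;
        _maxWait = maxWait;
        _dispatch = dispatch;
    }

    public void Add(T item)
    {
        List<T> ready = null;
        lock (_lock)
        {
            _pending.Add(item);
            if (_pending.Count >= _maxBatchSize)
            {
                // Batch is full: dispatch right away, no waiting.
                ready = TakePendingLocked();
            }
            else if (_timer == null)
            {
                // First item of a new batch: start the countdown for a partial flush.
                long gen = _generation;
                _timer = new Timer(_ => OnTimer(gen), null, _maxWait, Timeout.InfiniteTimeSpan);
            }
        }
        if (ready != null) _dispatch(ready);
    }

    private void OnTimer(long gen)
    {
        List<T> ready;
        lock (_lock)
        {
            // Ignore a timer whose batch was already dispatched because it filled up.
            if (gen != _generation || _pending.Count == 0) return;
            ready = TakePendingLocked();
        }
        _dispatch(ready);
    }

    // Must be called while holding _lock.
    private List<T> TakePendingLocked()
    {
        List<T> batch = _pending;
        _pending = new List<T>();
        _generation++;
        _timer?.Dispose();
        _timer = null;
        return batch;
    }

    public void Dispose()
    {
        lock (_lock)
        {
            _timer?.Dispose();
            _timer = null;
        }
    }
}
```

With a batch size of 2 and a 100 ms wait, adding the items 1 to 5 dispatches [1, 2] and [3, 4] as soon as each batch fills, while [5] goes out only when its timer fires.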

My current scenario is migrating documents from one Cosmos DB container to another, using an Azure Function triggered by the change feed. There I get batches of at most 100 documents and want to forward them as fast as possible, possibly with a maximum rate.

So, the batches are small and the function is called often. Here, the delay is less than optimal.

Here's sample code illustrating my use case:

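using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using PartitionKey = Microsoft.Azure.Cosmos.PartitionKey; // Microsoft.Azure.Documents also defines PartitionKey

// CosmosClient, SourceDb, SourceContainer, TargetDb and TargetContainer are static members defined
// elsewhere in the class (SourceDb and SourceContainer must be constants to be used in the attribute).
[FunctionName("Function1")]
public static async Task Run([CosmosDBTrigger(
        SourceDb,
        SourceContainer,
        ConnectionStringSetting = "sourceTrigger_ConnectionString",
        LeaseCollectionName = "leases")]
        IReadOnlyList<Document> input,
        ILogger log)
{
    if (input != null && input.Count > 0)
    {
        log.LogInformation("Documents received " + input.Count);

        var container = CosmosClient.GetContainer(TargetDb, TargetContainer);

        // Assumes the target container is partitioned on /id.
        var tasks = input.Select(doc => container.UpsertItemAsync(doc, new PartitionKey(doc.Id))).ToList();

        // UpsertItemAsync throws CosmosException on failure instead of returning a status code >= 300,
        // and awaiting Task.WhenAll rethrows only the first failure, so every task is inspected afterwards.
        try { await Task.WhenAll(tasks); } catch { /* failures are collected below */ }

        var failed = tasks.Where(t => !t.IsCompletedSuccessfully).ToList();
        if (failed.Any())
        {
            var reasons = failed.Select(t => t.Exception?.InnerException is CosmosException ce
                ? ce.StatusCode.ToString()
                : t.Exception?.InnerException?.GetType().Name ?? "Canceled");
            var message = $"{failed.Count} errors: {string.Join(", ", reasons.Distinct())}";
            throw new Exception($"Aborting due to {message}");
        }

        log.LogInformation("Documents sent.");
    }
}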

Nevertheless, brilliant job. :-)

j82w (Contributor) commented Mar 13, 2020

Have you looked at the Bulk congestion control?

There is TransactionalBatch support if you want more control.

@ealsur any suggestions?

ealsur (Member) commented Mar 13, 2020

The reason for not having an explicit API to define a Bulk operation is that the current approach lets multiple concurrent threads and workers across the entire application benefit.

If an application has multiple workers or threads sending operations concurrently, all these concurrent operations are grouped and processed regardless of their origin and the context of the caller.

Bulk support was added, but the target scenario is the same as for the old Bulk Executor Library: it is meant for large numbers of documents, not small sets. For small sets (<=100), the performance benefit over simply doing them as point operations (Bulk disabled) is not big enough. The benefit grows as more operations are involved.
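A hedged sketch of the implicit model described in this comment: bulk mode is switched on once, through `CosmosClientOptions.AllowBulkExecution`, on a client shared by the whole application, and callers keep issuing ordinary point operations. The names `BulkSketch`, `CreateBulkClient`, and `UpsertAllAsync` and the tuple shape are hypothetical; `CosmosClient`, `CosmosClientOptions.AllowBulkExecution`, `GetContainer`, and `UpsertItemAsync` are from the Cosmos DB .NET SDK v3.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BulkSketch
{
    // One client per application. AllowBulkExecution is off by default; when it is on,
    // the SDK groups concurrent point operations internally, whoever issued them.
    public static CosmosClient CreateBulkClient(string connectionString) =>
        new CosmosClient(connectionString, new CosmosClientOptions { AllowBulkExecution = true });

    // Hypothetical helper: several workers may call this concurrently on the same client,
    // and their upserts can end up grouped together by the SDK.
    public static Task UpsertAllAsync<T>(
        CosmosClient client,
        string databaseId,
        string containerId,
        IEnumerable<(T Item, string PartitionKeyValue)> items)
    {
        Container target = client.GetContainer(databaseId, containerId);
        return Task.WhenAll(items.Select(x =>
            target.UpsertItemAsync(x.Item, new PartitionKey(x.PartitionKeyValue))));
    }
}
```

Because grouping happens inside the shared client, two workers calling `UpsertAllAsync` at the same time contribute to the same pipeline; an explicit per-call bulk API would only see one caller's operations. With `AllowBulkExecution` left at its default (`false`), the same calls run as individual point operations.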

The only improvement I can see in your case is to increase MaxItemsPerInvocation on the CosmosDBTrigger (reference) to a large number (10000 or more). If your container is receiving a large flow of changes, they will be delivered in a single invocation instead of multiple smaller ones. Obviously this only helps if your account is actually receiving a larger ingestion flow: if you are getting 50 events per second, setting MaxItemsPerInvocation to 10000 won't make any difference, since the trigger will only get 50 events.
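Applied to the function from the original post, the suggested change is a single attribute property. This is a configuration sketch assuming the Azure Functions Cosmos DB extension 3.x (the version that uses `ConnectionStringSetting` and `LeaseCollectionName`); the class name `MigrationFunction` and the constant values are placeholders.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class MigrationFunction
{
    // Placeholder names; attribute arguments must be compile-time constants.
    private const string SourceDb = "sourceDb";
    private const string SourceContainer = "sourceContainer";

    [FunctionName("Function1")]
    public static Task Run([CosmosDBTrigger(
            SourceDb,
            SourceContainer,
            ConnectionStringSetting = "sourceTrigger_ConnectionString",
            LeaseCollectionName = "leases",
            // An upper bound, not a target: with only 50 pending changes,
            // the invocation still receives just those 50 documents.
            MaxItemsPerInvocation = 10000)]
            IReadOnlyList<Document> input,
            ILogger log)
    {
        log.LogInformation("Documents received " + input.Count);
        // Forward the documents to the target container here, as in the original sample.
        return Task.CompletedTask;
    }
}
```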

Also consider caching the Container instance if it is always the same.
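A minimal sketch of that caching, assuming the target never changes: one `CosmosClient` and one `Container` per process, created lazily on first use and reused by every invocation. The class name `TargetContainerCache`, the `TARGET_CONNECTION_STRING` setting, and the database/container names are hypothetical. In the v3 SDK, `GetContainer` only returns a proxy object and makes no network call, so most of the benefit comes from reusing the single `CosmosClient`; caching the `Container` as well avoids re-creating the proxy on every invocation.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class TargetContainerCache
{
    // Hypothetical setting and resource names.
    private const string ConnectionSetting = "TARGET_CONNECTION_STRING";
    private const string DatabaseId = "targetDb";
    private const string ContainerId = "targetContainer";

    // Lazy<T> defaults to thread-safe, run-once initialization, so concurrent
    // invocations share a single client and a single container proxy.
    private static readonly Lazy<CosmosClient> LazyClient = new Lazy<CosmosClient>(() =>
        new CosmosClient(Environment.GetEnvironmentVariable(ConnectionSetting)));

    private static readonly Lazy<Container> LazyTarget = new Lazy<Container>(() =>
        LazyClient.Value.GetContainer(DatabaseId, ContainerId));

    public static Container Target => LazyTarget.Value;
}
```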

ealsur (Member) commented Mar 16, 2020

Please reopen if you have further results after changing the configuration on the trigger.

ealsur closed this as completed Mar 16, 2020