[BUG] Azure.Data.Table memory consumption is 120% higher than Microsoft.Azure.Cosmos.Table #19733

Closed
HansOlavS opened this issue Mar 23, 2021 · 6 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Tables

Comments

@HansOlavS

Describe the bug
When I stream through all entities in a table, with one task per partition running concurrently, it seems that garbage collection isn't working. After fetching about 400k entities (just streaming through, without holding on to any of them), memory consumption reaches 1 GB.

Expected behavior
Streaming through entities via await foreach (var page in pages) should only hold objects associated with the current page and previous pages should be garbage collected efficiently.

Actual behavior (include Exception or Stack Trace)
Memory consumption is around 120% higher with the Azure.Data.Tables SDK than with the legacy Microsoft.Azure.Cosmos.Table SDK.

To Reproduce

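using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Azure.Data.Tables;

// Creating temp table (url, accountName and accountKey are supplied by the caller)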
var tableClient = new TableClient(url, "tempdata", new TableSharedKeyCredential(accountName, accountKey));
await tableClient.CreateAsync();
var partitions = Enumerable.Range(0, 100).ToList();

// Creating 100k entities (100 partitions x 10 batches x 100 entities)
Console.WriteLine("Creating 100k entities...");

var creationTasks = partitions.Select(async partitionKey =>
{
	for (var i = 0; i < 10; i++)
	{
		var batch = tableClient.CreateTransactionalBatch("" + partitionKey);
		var entities = Enumerable.Repeat("", 100).Select(x => new TableEntity("" + partitionKey, "" + Guid.NewGuid())).ToList();

		foreach (var entity in entities)
		{
			entity["Line1"] = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
			entity["Line2"] = "Donec tellus massa, finibus ac consequat sit amet, eleifend eget tellus.";
			entity["Line3"] = "Curabitur aliquet molestie rhoncus. Vestibulum blandit diam et bibendum interdum.";
			entity["Line4"] = "Ut auctor, diam non mollis faucibus, velit elit fringilla metus, vel tincidunt risus felis eget sapien.";
			entity["Line5"] = "Ut lacus leo, bibendum nec ullamcorper sed, tristique at erat.";
			batch.UpsertEntity(entity);
		}

		await batch.SubmitBatchAsync();
	}
}).ToList();

await Task.WhenAll(creationTasks);

// Force GC and wait 2 secs
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
await Task.Delay(2000);

Console.WriteLine("Streaming through 100k entities...");
var fetched = 0;

var streamTasks = partitions.Select(async x =>
{
	var iterations = 0;
	var result = tableClient.QueryAsync<TableEntity>($"PartitionKey eq '{x}'", 200);
	var pages = result.AsPages(null, 200);

	await foreach (var page in pages)
	{
		iterations++;
		var val = Interlocked.Add(ref fetched, page.Values.Count);
		lock (tableClient)
		{
			Console.CursorLeft = 0;
			Console.Write($"Fetched {val:N0} entities...");
		}

		// GC.Collect(); // If I leave out this line the process uses over 300 MB RAM
	}
}).ToList();

await Task.WhenAll(streamTasks);
Console.WriteLine("");

// Delete temp table
await tableClient.DeleteAsync();
tableClient = null;

Console.WriteLine("Done.");
Console.ReadKey();

Environment:

  • Azure.Data.Tables 3.0.0-beta.5
  • .NET SDK (reflecting any global.json):
    Version: 5.0.201
    Commit: a09bd5c86c
    Runtime Environment:
    OS Name: Windows
    OS Version: 10.0.19042
    OS Platform: Windows
    RID: win10-x64
  • IDE and version : Visual Studio 16.9.1
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Mar 23, 2021
@HansOlavS
Author

If you want to test the code and swap out with the legacy SDK, here's the code I tested with:

using Microsoft.Azure.Cosmos.Table;

var cloudStorageAccount = CloudStorageAccount.Parse("your-connectionstring-here");
var cloudTableClient = cloudStorageAccount.CreateCloudTableClient();
var cloudTable = cloudTableClient.GetTableReference("tempdata");

streamTasks = partitions.Select(async x =>
{
	var tableQuery = new TableQuery<DynamicTableEntity> { FilterString = $"PartitionKey eq '{x}'", TakeCount = 200 };
	var continuationToken = default(TableContinuationToken);

	while (true)
	{
		var result = await cloudTable.ExecuteQuerySegmentedAsync(tableQuery, continuationToken);
		continuationToken = result.ContinuationToken;

		var val = Interlocked.Add(ref fetched, result.Count());

		lock (tableClient)
		{
			Console.CursorLeft = 0;
			Console.Write($"Fetched {val:N0} entities...");
		}

		if (continuationToken == null)
			break;
	}
}).ToList();

@christothes christothes removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Mar 23, 2021
@christothes christothes self-assigned this Mar 23, 2021
@jsquire jsquire added Client This issue points to a problem in the data-plane of the library. Tables labels Mar 23, 2021
@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Mar 23, 2021
@christothes
Member

@HansOlavS Thanks for filing this issue and for providing the detailed repro steps. We'll investigate.

@christothes
Member

christothes commented Mar 26, 2021

@HansOlavS I did some profiling with the Performance Profiler in Visual Studio. The results show that there is not a significant difference in total memory allocations, but there is a significant difference in the number of GC collections, which is why memory usage appears different at any given moment.

Here is the .NET Object Allocation Tracking graph for the track1 client (Microsoft.Azure.Cosmos.Table):
[image: .NET Object Allocation Tracking graph, track1 client]

Note that there is never a large peak in live objects, but there are 115 GC collections. The time taken to iterate through the 100k entities was roughly twice that of the track2 client, which I believe gave the GC more time to collect naturally.

Here is the .NET Object Allocation Tracking graph for the track2 client (Azure.Data.Tables):
[image: .NET Object Allocation Tracking graph, track2 client]

Note that the peak live objects are roughly equivalent, but there are only 56 GC collections.
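
The distinction drawn above, between how much is allocated in total and how often the GC runs, can also be observed without a profiler. The following is a minimal sketch, not the code used for the graphs in this thread: `FakePage` is a hypothetical stand-in for a page of query results, and the page count and payload size are arbitrary. It only uses the standard `System.GC` counters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one page of query results: 200 small string payloads.
static List<string> FakePage(int pageIndex)
{
    var page = new List<string>(200);
    for (var i = 0; i < 200; i++)
        page.Add(new string('x', 256) + pageIndex + ":" + i);
    return page;
}

// Counters before the workload.
var allocatedBefore = GC.GetTotalAllocatedBytes(precise: true); // cumulative bytes allocated
var gen0Before = GC.CollectionCount(0);
var gen2Before = GC.CollectionCount(2);

long seen = 0;
for (var p = 0; p < 500; p++)
{
    var page = FakePage(p); // the previous page becomes unreachable here
    seen += page.Count;
}

// Counters after the workload.
var allocatedDelta = GC.GetTotalAllocatedBytes(precise: true) - allocatedBefore;
var liveNow = GC.GetTotalMemory(forceFullCollection: false);    // current managed heap estimate

Console.WriteLine($"Entities seen:     {seen:N0}");
Console.WriteLine($"Total allocated:   {allocatedDelta:N0} bytes");
Console.WriteLine($"Managed heap now:  {liveNow:N0} bytes");
Console.WriteLine($"Gen0 collections:  {GC.CollectionCount(0) - gen0Before}");
Console.WriteLine($"Gen2 collections:  {GC.CollectionCount(2) - gen2Before}");
```

The total allocated figure depends only on the workload, while the heap size and collection counts depend on when the runtime decides to collect, so two clients that allocate the same amount can still show very different memory usage at a given instant.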

To make the differences even more apparent, I configured the test application to use the server GC and also to delay collections with TryStartNoGCRegion. This was done with an object allocation granularity of 10 (the previous test used a granularity of 100).

Track1 (Microsoft.Azure.Cosmos.Table):
[image: allocation graph with server GC and delayed collections, track1 client]

Track2 (Azure.Data.Tables):
[image: allocation graph with server GC and delayed collections, track2 client]
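
For readers who want to try a similar configuration, here is a minimal sketch of the two settings mentioned above. It is an assumption about how such a test could be set up, not the actual test application: the 8 MB budget and the allocation loop are illustrative values only. Server GC cannot be switched on from code; it is normally enabled with `<ServerGarbageCollector>true</ServerGarbageCollector>` in the project file or `"System.GC.Server": true` in runtimeconfig.json.

```csharp
using System;
using System.Runtime;

// Reports whether the process was started with server GC enabled.
Console.WriteLine($"Server GC enabled: {GCSettings.IsServerGC}");

// Ask the runtime to avoid collections while up to 'budgetBytes' are allocated.
// 8 MB is an arbitrary illustrative budget, not a value from this thread.
const long budgetBytes = 8L * 1024 * 1024;
var started = GC.TryStartNoGCRegion(budgetBytes);

// Take the baseline after starting the region, because starting it may itself collect.
var gen0Before = GC.CollectionCount(0);
long allocatedInRegion = 0;
var collectionsInRegion = 0;

try
{
    // Allocate roughly 2 MB, well inside the budget.
    for (var i = 0; i < 2_000; i++)
    {
        var buffer = new byte[1024];
        allocatedInRegion += buffer.Length;
    }
    collectionsInRegion = GC.CollectionCount(0) - gen0Before;
}
finally
{
    // EndNoGCRegion throws if the region has already ended (for example because
    // the budget was exceeded), so check the latency mode first.
    if (started && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
        GC.EndNoGCRegion();
}

Console.WriteLine($"No-GC region started:   {started}");
Console.WriteLine($"Allocated in region:    {allocatedInRegion:N0} bytes");
Console.WriteLine($"Gen0 collections seen:  {collectionsInRegion}");
```

A no-GC region only postpones collections until the budget is exhausted or the region ends, which makes it useful for exposing allocation patterns in a profiler rather than as a production setting.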

So in summary, the track2 client seems to allocate no more than track1 (perhaps less), and is significantly faster (roughly 2x).

@christothes christothes added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Mar 26, 2021
@ghost ghost removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Mar 26, 2021
@HansOlavS
Author

Hi @christothes. Yeah, I mentioned in my "Expected behavior" section "Streaming through entities via await foreach (var page in pages) should only hold objects associated with the current page and previous pages should be garbage collected efficiently". So I knew this was a garbage collection issue, more or less. But don't you think it's a bit strange that the memory usage balloons out to well over 1 GB (when I have millions of entities to stream through), and a bit counter-intuitive that I have to handle GC myself by injecting GC.Collect? Aren't there some GC hints you can put in the page AsyncEnumerator or something like that?

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Mar 26, 2021
@christothes
Member

But don't you think it's a bit strange that the memory usage balloons out to well over 1 GB (when I have millions of entities to stream through), and a bit counter-intuitive that I have to handle GC myself by injecting GC.Collect? Aren't there some GC hints you can put in the page AsyncEnumerator or something like that?

In my testing, putting a Task.Delay(3500) in the pages loop of the repro code example accomplishes the same thing. It provides enough time for the GC to take an opportunity to collect. In other words, if you artificially slow the track 2 client down to be roughly equivalent from a throughput perspective, the GC will collect more often on its own. This doesn't appear to be an issue with the client, but the expected behavior of the GC when the process is very busy.
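
The experiment described above can be approximated locally with a small harness. This is a sketch under stated assumptions: `FakePagesAsync` is a hypothetical stand-in for `result.AsPages(...)`, the page count and 10 ms delay are arbitrary, and a synthetic page source does not reproduce network buffers or the concurrency of the original repro. Whether the delay changes the collection count for a given workload is something to measure rather than assume, and the numbers vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for result.AsPages(): yields 'pageCount' pages of 200 items.
static async IAsyncEnumerable<IReadOnlyList<string>> FakePagesAsync(int pageCount)
{
    for (var p = 0; p < pageCount; p++)
    {
        await Task.Yield(); // simulate an asynchronous fetch
        var page = new List<string>(200);
        for (var i = 0; i < 200; i++)
            page.Add(new string('x', 256));
        yield return page;
    }
}

// Streams all pages, optionally pausing after each one, and reports how many
// items were seen and how many gen0 collections ran in the meantime.
static async Task<(int fetched, int gen0)> StreamAsync(TimeSpan perPageDelay)
{
    var gen0Before = GC.CollectionCount(0);
    var fetched = 0;

    await foreach (var page in FakePagesAsync(50))
    {
        fetched += page.Count;

        // The artificial throttle: lower throughput in exchange for less
        // allocation per unit of time.
        if (perPageDelay > TimeSpan.Zero)
            await Task.Delay(perPageDelay);
    }

    return (fetched, GC.CollectionCount(0) - gen0Before);
}

var fast = await StreamAsync(TimeSpan.Zero);
var slow = await StreamAsync(TimeSpan.FromMilliseconds(10));

Console.WriteLine($"No delay:    {fast.fetched:N0} items, {fast.gen0} gen0 collections");
Console.WriteLine($"10 ms delay: {slow.fetched:N0} items, {slow.gen0} gen0 collections");
```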

@christothes christothes added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Mar 26, 2021
@ghost ghost removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Mar 26, 2021
@HansOlavS
Author

Ok 👍

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Mar 26, 2021
@christothes christothes removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label May 3, 2021
azure-sdk pushed a commit to azure-sdk/azure-sdk-for-net that referenced this issue Oct 14, 2022
[2022-04-01-preview] Add New Api-version for Microsoft.ApiManagement (Azure#20399)

@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2023
3 participants