[BUG] Azure.Data.Tables memory consumption is 120% higher than Microsoft.Azure.Cosmos.Table #19733
Comments
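If you want to test the code with the legacy SDK swapped in, here's the code I tested with:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table;

var cloudStorageAccount = CloudStorageAccount.Parse("your-connectionstring-here");
var cloudTableClient = cloudStorageAccount.CreateCloudTableClient();
var cloudTable = cloudTableClient.GetTableReference("tempdata");

// Hypothetical placeholder: use the partition keys from your own table.
var partitions = new[] { "partition-0", "partition-1" };
long fetched = 0;
// Dedicated lock object; `tableClient` is not defined when using the legacy SDK.
var consoleLock = new object();

// One task per partition, each paging through its partition 200 entities at a time.
var streamTasks = partitions.Select(async x =>
{
    var tableQuery = new TableQuery<DynamicTableEntity> { FilterString = $"PartitionKey eq '{x}'", TakeCount = 200 };
    var continuationToken = default(TableContinuationToken);
    while (true)
    {
        // Fetch one segment, count it and let it go; no entities are retained.
        var result = await cloudTable.ExecuteQuerySegmentedAsync(tableQuery, continuationToken);
        continuationToken = result.ContinuationToken;
        var val = Interlocked.Add(ref fetched, result.Count());
        lock (consoleLock)
        {
            Console.CursorLeft = 0;
            Console.Write($"Fetched {val:N0} entities...");
        }
        if (continuationToken == null)
            break;
    }
}).ToList();

await Task.WhenAll(streamTasks);
```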
@HansOlavS Thanks for filing this issue and for providing the detailed repro steps. We'll investigate.
@HansOlavS I did some profiling with the Performance Profiler in Visual Studio, and the results show that there is actually no significant difference in total memory allocations. There is, however, a significant difference in the number of GC collections, which is why memory usage looks different at any given moment.

In the .NET Object Allocation Tracking graph for the track 1 client (Microsoft.Azure.Cosmos.Table), there is never a large peak in live objects, but there were 115 GC collections. Iterating through the 100k entities took roughly twice as long as with the track 2 client, which I believe gave the GC more time to collect naturally.

In the same graph for the track 2 client (Azure.Data.Tables), the peak live objects are roughly equivalent, but there were only 56 GC collections.

To make the differences even more apparent, I also configured the test application to use the server GC and to delay collections, and profiled track 1 (Microsoft.Azure.Cosmos.Table) in that configuration.

So, in summary, the track 2 client seems to allocate no more than track 1 (perhaps less), and it is significantly faster (roughly 2x).
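The distinction above between total allocations, the number of collections, and the heap size at a given moment can be observed without either Tables SDK. The following is a minimal sketch under stated assumptions: `SimulatePagesAsync` is a hypothetical stand-in for a paged query, and the 500 pages of 200 one-kilobyte "entities" are arbitrary. Running it once as-is and once with `<ServerGarbageCollection>true</ServerGarbageCollection>` in the project file would be expected to show similar allocation totals but different collection counts and heap sizes; the exact numbers depend on the machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime;
using System.Threading.Tasks;

// Hypothetical stand-in for a paged table query: yields `pageCount` pages, each holding
// `pageSize` freshly allocated 1 KB "entities". Nothing here is part of either Tables SDK.
static async IAsyncEnumerable<byte[][]> SimulatePagesAsync(int pageCount, int pageSize)
{
    for (int p = 0; p < pageCount; p++)
    {
        await Task.Yield(); // simulate an asynchronous round trip per page
        var page = new byte[pageSize][];
        for (int i = 0; i < pageSize; i++)
            page[i] = new byte[1024];
        yield return page;
    }
}

// Streams every page without retaining it and reports allocation and GC statistics.
static async Task<(long Entities, long AllocatedBytes, int Gen0Collections, long HeapBytesAfter)> MeasureAsync(
    int pageCount, int pageSize)
{
    long allocatedBefore = GC.GetTotalAllocatedBytes(precise: true);
    int gen0Before = GC.CollectionCount(0);
    long entities = 0;

    await foreach (var page in SimulatePagesAsync(pageCount, pageSize))
        entities += page.Length; // the page becomes unreachable once the next one replaces it

    return (
        entities,
        GC.GetTotalAllocatedBytes(precise: true) - allocatedBefore, // total allocations
        GC.CollectionCount(0) - gen0Before,                         // how often gen 0 was collected
        GC.GetTotalMemory(forceFullCollection: false));             // heap size right now
}

var stats = await MeasureAsync(pageCount: 500, pageSize: 200);
Console.WriteLine($"Server GC:         {GCSettings.IsServerGC}");
Console.WriteLine($"Entities streamed: {stats.Entities:N0}");
Console.WriteLine($"Bytes allocated:   {stats.AllocatedBytes:N0}");
Console.WriteLine($"Gen 0 GCs:         {stats.Gen0Collections}");
Console.WriteLine($"Heap size now:     {stats.HeapBytesAfter:N0}");
```

`GC.GetTotalMemory(false)` is only an approximation that still counts garbage not yet collected, which is why a point-in-time reading can overstate what is actually live.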
Hi @christothes. Yeah, I mentioned in my "Expected behavior" section: "Streaming through entities via `await foreach (var page in pages)` should only hold objects associated with the current page and previous pages should be garbage collected efficiently". So I knew this was, more or less, a garbage collection issue. But don't you think it's a bit strange that memory usage balloons to well over 1 GB (when I have millions of entities to stream through), and a bit counter-intuitive that I have to manage GC myself by injecting `GC.Collect`? Aren't there some GC hints you could put in the page AsyncEnumerator or something like that?
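The `GC.Collect` injection mentioned above can be sketched as forcing a collection every N pages. This is a minimal illustration under stated assumptions, not an SDK feature: `PagesAsync` is a hypothetical stand-in for a paged query, and `collectEvery` is an arbitrary, hypothetical tuning knob. Forced collections are blocking and generally discouraged, so this is better treated as a diagnostic aid than as a fix.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a paged query (for example the pages of an AsyncPageable<T>):
// yields `pageCount` pages of `pageSize` 256-character strings.
static async IAsyncEnumerable<IReadOnlyList<string>> PagesAsync(int pageCount, int pageSize)
{
    for (int p = 0; p < pageCount; p++)
    {
        await Task.Yield();
        var page = new List<string>(pageSize);
        for (int i = 0; i < pageSize; i++)
            page.Add(new string('x', 256));
        yield return page;
    }
}

// Streams the pages and forces a full, blocking collection after every `collectEvery` pages.
// This may reduce the peak heap size at the cost of extra GC pauses.
static async Task<(long Entities, int ForcedCollections)> StreamWithPeriodicCollectAsync(
    IAsyncEnumerable<IReadOnlyList<string>> pages, int collectEvery)
{
    long entities = 0;
    int pagesSeen = 0, forced = 0;
    await foreach (var page in pages)
    {
        entities += page.Count;
        if (++pagesSeen % collectEvery == 0)
        {
            GC.Collect(); // reclaims earlier pages; the current page is still referenced here
            forced++;
        }
    }
    return (entities, forced);
}

var result = await StreamWithPeriodicCollectAsync(PagesAsync(pageCount: 100, pageSize: 200), collectEvery: 25);
Console.WriteLine($"Streamed {result.Entities} entities with {result.ForcedCollections} forced collections");
// prints "Streamed 20000 entities with 4 forced collections"
```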
In my testing, putting a
Ok 👍
Describe the bug
When I stream through all entities in a table concurrently, one query per partition, garbage collection doesn't seem to be working. After fetching about 400k entities (just streaming through, without holding on to any of them), memory consumption reaches 1 GB.
Expected behavior
Streaming through entities via `await foreach (var page in pages)` should only hold objects associated with the current page, and previous pages should be garbage collected efficiently.
Actual behavior
Memory consumption is around 120% higher when using the Azure.Data.Tables SDK than with the legacy Microsoft.Azure.Cosmos.Table SDK.
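The expected behavior above implies a bound: if each partition task holds only its current page, at most (number of partitions × page size) entities are referenced at any moment, however many entities are streamed in total. The sketch below models that bound with bookkeeping counters rather than by inspecting the heap. `QueryPartitionPagesAsync` is a hypothetical stand-in for a per-partition paged query (with Azure.Data.Tables this would typically be the pages of the `AsyncPageable<T>` returned by `TableClient.QueryAsync`), and the 8 partitions and 10 pages per partition are arbitrary assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-partition page source standing in for a paged query filtered on one
// PartitionKey; yields `pageCount` pages of `pageSize` entity keys.
static async IAsyncEnumerable<IReadOnlyList<string>> QueryPartitionPagesAsync(
    string partitionKey, int pageCount, int pageSize)
{
    for (int p = 0; p < pageCount; p++)
    {
        await Task.Delay(1); // simulate a service round trip
        var page = new string[pageSize];
        for (int i = 0; i < pageSize; i++)
            page[i] = $"{partitionKey}/{p}/{i}";
        yield return page;
    }
}

// Lock-free "max" update: raises `peak` to `candidate` if the candidate is larger.
static void UpdatePeak(ref int peak, int candidate)
{
    int current;
    while (candidate > (current = Volatile.Read(ref peak)) &&
           Interlocked.CompareExchange(ref peak, candidate, current) != current)
    {
    }
}

const int PageSize = 200;         // matches TakeCount = 200 in the legacy repro
const int PagesPerPartition = 10; // arbitrary
var partitions = Enumerable.Range(0, 8).Select(i => $"partition-{i}").ToArray(); // hypothetical keys

long fetched = 0;
int held = 0, peakHeld = 0; // entities currently held by some task, and the maximum observed

var streamTasks = partitions.Select(async partitionKey =>
{
    await foreach (var page in QueryPartitionPagesAsync(partitionKey, PagesPerPartition, PageSize))
    {
        UpdatePeak(ref peakHeld, Interlocked.Add(ref held, page.Count));
        Interlocked.Add(ref fetched, page.Count);
        Interlocked.Add(ref held, -page.Count); // this task is done with the page
    }
}).ToList();

await Task.WhenAll(streamTasks);
Console.WriteLine($"Fetched {fetched} entities; at most {peakHeld} held at once (bound: {partitions.Length * PageSize})");
```

With 8 partitions and 200 entities per page the bound is 1,600 entities, so memory growing toward 1 GB after 400k entities points to uncollected garbage from earlier pages rather than to pages being retained, as the profiling in the comments suggests.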
To Reproduce
Environment:
Version: 5.0.201
Commit: a09bd5c86c
Runtime Environment:
OS Name: Windows
OS Version: 10.0.19042
OS Platform: Windows
RID: win10-x64