Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customer is asserting that V12 Storage SDK has slower performance that V2.1 sdk #9596

Closed
amishra-dev opened this issue Jan 25, 2020 · 15 comments
Assignees
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)
Milestone

Comments

@amishra-dev
Copy link

Ask from XClient : Provide suggestions towards this concern and whether we have any upcoming improvements in V12 along these lines.
The customer also wants to know about the support lifecycle timelines for v2, if they decide to continue using v2.

customer is comparing the performance of Python SDK v12 with Python Storage SDK v2.
They observe that SDK v2 performance is better while doing concurrent upload or download of 600 bytes blob 1800 times using ThreadPoolExecutor,.
The code and test results are present in the IcM attachments. Engineer was able to get similar results in-house too.

Ask from XClient : Provide suggestions towards this concern and whether we have any upcoming improvements in V12 along these lines.
The customer also wants to know about the support lifecycle timelines for v2, if they decide to continue using v2.

@amishra-dev amishra-dev added Storage Storage Service (Queues, Blobs, Files) Service Attention Workflow: This issue is responsible by Azure service team. customer-reported Issues that are reported by GitHub users external to the Azure organization. labels Jan 25, 2020
@amishra-dev
Copy link
Author

ICM 171649877 has the samples and repro steps

@mikeharder
Copy link
Member

mikeharder commented Jan 30, 2020

I was also able to reproduce this perf regression using the customer-provided applications. We are investigating the root cause and how we can make V12 meet or exceed the performance of V2.

@mikeharder mikeharder added the EngSys This issue is impacting the engineering system. label Jan 30, 2020
@mikeharder mikeharder removed their assignment Feb 4, 2020
@weshaggard weshaggard added Client This issue points to a problem in the data-plane of the library. and removed EngSys This issue is impacting the engineering system. labels Feb 11, 2020
@lmazuel lmazuel added the question The issue doesn't require a change to the product in order to be resolved. Most issues start as that label May 4, 2020
@amishra-dev amishra-dev added bug This issue requires a change to an existing behavior in the product in order to be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jul 15, 2020
@lmazuel lmazuel added this to the Backlog milestone Jul 20, 2020
@mgsnuno
Copy link

mgsnuno commented Jul 30, 2020

@mikeharder any update on this? fsspec/adlfs#57 is another result of the changes introduced with V12.

@mikeharder
Copy link
Member

@mgsnuno: Thanks for reporting the issue at fsspec/adlfs#57. If this is indeed a perf regression in azure-storage-blob v12, we would like to investigate and fix it.

However, it's not clear to me that fsspec/adlfs#57 is related to this issue. It looks like fsspec/adlfs#57 is about large files, while this issue is specific to small files (600 bytes).

Can you please open a new issue in this repo with details for how to repro your scenario, and the size of the perf regression you are seeing from v2 to v12?

@mikeharder
Copy link
Member

mikeharder commented Jul 30, 2020

For the original scenario uploading and download 600 byte blobs, using the customer-provided repro app in the ICM, I was able to verify v12 is 7-21% slower than v2:

Scenario Size Parallel 2.0.1 - time 2.0.1 - ops/s 12.1.0 - time 12.0.1 - ops/s 12.1.0/2.0.1 - ops/s
UploadSerial 600 1 13994 129 14992 120 93%
DownloadSerial 600 1 9282 194 10313 175 90%
UploadParallel 600 4 3864 466 4900 367 79%
DownloadParallel 600 4 4324 416 5296 340 82%

I was also able to get similar results using the Python perf framework (https://github.com/mikeharder/azure-sdk-perf/tree/master/python/azure-storage-blob-perfstress).

I think the next step is for someone to profile the v2 and v12 code, and find areas for improvement in v12.

@xiafu-msft
Copy link
Contributor

xiafu-msft commented Jul 30, 2020

@mgsnuno
Copy link

mgsnuno commented Aug 4, 2020

@mikeharder I understand your request, I haven't done it yet because it is pretty clear in dask/adlfs the issue as mentioned in fsspec/adlfs#57, but I don't have code replicating with azure V12 storage alone. The main issue is not the file being big or small and more than speed, it's the memory usage, I see 2x-5x higher memory usage.

If you can replicate the issue as I described in fsspec/adlfs#57 awesome, then you can maybe debug the V12 usage. If you want an example without dask, I'll work on it.

@mikeharder
Copy link
Member

@mgsnuno: We are eager to investigate any performance regression in v12, whether it's lower throughput or higher memory usage. But to investigate we will need a repro using only azure-storage-blob.

@hermanschaaf
Copy link

In case it helps anyone else out there: when moving from Azure blob storage v2 to v12, we experienced a significant regression in performance when uploading many small files (-50%). However, we discovered a seemingly undocumented session argument to BlobServiceClient which allowed us to share one session per process across many BlobServiceClient instances. We were doing this in v2, but originally thought it was not possible in v12. For example:

import requests
from azure.storage.blob import BlobServiceClient

storage_account_name = "mystorageaccount"
session = requests.Session()
clients = [BlobServiceClient(
    account_url=f'https://{storage_account_name}.blob.core.windows.net/',
    session=session,
) for client in range(10)]

Using a shared session led to fewer connection round-trips and resulted in performance matching previous levels seen on v2. The original problem was particularly pronounced in our case because we are uploading files from a region in Europe to one in America, but I imagine it can have an impact any time where there are many BlobServiceClient instances being created. If possible in your case, it should also be equivalent to use a single BlobServiceClient for all the uploads in a process.

@mikeharder
Copy link
Member

@hermanschaaf: Thank you for reporting this.

The missing documentation is a known issue covered by #12122.

I opened a new issue #13797 to discuss changing Python to share HTTP connections by default, so you would not need to manually share a session.

@xiafu-msft
Copy link
Contributor

we are still investigating the problem. Will update this issue once we find anything

@amishra-dev
Copy link
Author

@xiafu-msft is still looking into this.

@amishra-dev
Copy link
Author

@tasherif-msft this one is on you now

@tasherif-msft
Copy link
Contributor

tasherif-msft commented Dec 1, 2021

This Perf investigation has been broken up to different pieces below:
Sharing HTTP session: #13797
Listing: #19755 and the PR to address this is currently being POC'd #19814
Parallel Datalake Uploads: #16085
Downloads/uploads:
#16080
#16079
#16078

Currently listing is being tackled first. I will go ahead and close this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

No branches or pull requests