Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Consider sharing a single HTTP connection pool by default #13797

Closed
mikeharder opened this issue Sep 16, 2020 · 4 comments
Closed
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. pillar-performance The issue is related to performance, one of our core engineering pillars.

Comments

@mikeharder
Copy link
Member

To improve performance, Python should consider sharing a single HTTP connection pool by default, as is already done by other languages like .NET.

For example, this .NET code will use a single HTTP connection for both uploads, significantly improving the performance of the second upload:

var client1 = new BlobClient(connectionString, "test", "test");
client1.Upload(Stream.Null, overwrite: true);

var client2 = new BlobClient(connectionString, "test", "test");
client2.Upload(Stream.Null, overwrite: true);

In contrast, this Python code will use a new HTTP connection for each upload:

client1 = BlobClient.from_connection_string(connection_string, "test", "test")
client1.upload_blob([], overwrite=True)

client2 = BlobClient.from_connection_string(connection_string, "test", "test")
client2.upload_blob([], overwrite=True)

It's possible to reuse the connection in Python by passing a shared session, but I think this should be the default behavior:

session = requests.session()

client3 = BlobClient.from_connection_string(connection_string, "test", "test", session=session)
client3.upload_blob([], overwrite=True)

client4 = BlobClient.from_connection_string(connection_string, "test", "test", session=session)
client4.upload_blob([], overwrite=True)

To repro, run the apps at https://github.com/mikeharder/azure-sdk-http-sharing and use a tool like Wireshark to monitor the HTTP connections.

Another issue is that the session parameter is not included in the API docs, but I believe this is covered by #12122.

CC: @johanste, @lmazuel

@mikeharder mikeharder added the pillar-performance The issue is related to performance, one of our core engineering pillars. label Sep 16, 2020
@tasherif-msft tasherif-msft self-assigned this Mar 19, 2021
@tasherif-msft
Copy link
Contributor

Hey @mikeharder! currently a session object is shared when we instantiate a parent client - for example a BlobServiceClient as a singleton - and create instances of BlobClients, ContainerClients, etc. This happens by simply passing that session when initiating the children clients.
We could definitely improve our documentation by letting the user know that they can share a session object between independently created client objects (like the example you illustrated above)

However, I can't think of a way to have this done by default (under the hood) where one session object persists for a given python process so its used by any client created in that process (I assume this is what you meant by default behavior?)

And on a finer grain, I don't believe there is a way to share a connection pool between different sessions, referring to https://requests.readthedocs.io/en/master/user/advanced/#keep-alive it states "thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!".
So a connection pool keyed by (hostname, port) pair is maintained within a session.

In .NET this is done by default? or like Python, users have to explicitly pass that session object?
Let me know! I'm still getting the hang of core and underlying architecture.

Thank you :)

@mikeharder
Copy link
Member Author

@tasherif-msft: I believe .NET, Java, and JS all share a single "session" by default (where "session" here is a vague term meaning the clients re-use the same HTTP connection when possible). You can compare the .NET and Python sample apps here:

https://github.com/mikeharder/azure-sdk-http-sharing

In Python, I think this could be implemented with a shared static "default session" used if none is passed in.

@lmazuel lmazuel added the Client This issue points to a problem in the data-plane of the library. label Dec 17, 2021
Copy link

Hi @mikeharder, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 15, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 15, 2024
@lmazuel
Copy link
Member

lmazuel commented Jun 10, 2024

For closure on this thread, current recommendation is to pass a custom transport (not a custom session):

from azure.core.pipeline.transport import RequestsTransport
transport = RequestsTransport()

client3 = BlobClient.from_connection_string(connection_string, "test", "test", transport=transport)
client3.upload_blob([], overwrite=True)

client4 = BlobClient.from_connection_string(connection_string, "test", "test", transport=transport)
client4.upload_blob([], overwrite=True)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. pillar-performance The issue is related to performance, one of our core engineering pillars.
Projects
None yet
Development

No branches or pull requests

4 participants