Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customizable receive timeout for SMB Connection #285

Merged
merged 2 commits into from
Aug 19, 2024

Conversation

leonnis12
Copy link
Contributor

This PR extends the configurability of the SMB Connection class and register_session methods to allow modifying the transport.recv timeout used for both keep-alive checks that keep Windows from closing a connection at ~16 minutes, and for checking the health of the connection.

In the usecase of my application, the SMB connection is used over an unstable network, which sometimes loses the connections between the machines. This triggers Connection timed out errors from the health check implemented in _process_message_thread, but only after 10+ minutes (the hardcoded timeout) where the application hangs. I would like to be able to configure this timeout to be smaller for use cases like this, to perform the health check more often, and be able to handle the timeout error in a reasonable ammount of time.

Current behaviour:

  • The connection is checked every 10 minutes using a SMB2_ECHO command.
  • If the connection is broken due to a network problem, the connection will be stuck until the timeout passes, the ECHO command is sent and the server does not respond to it.

Expected Behavior

  • The timeout should be configurable for the cases where network errors are common, and handling them often requires waiting 10+ minutes for each such problem.

This might also be a fix for #117 (mentions a similar problem), by allowing configuring a smaller timeout value, which does not keep the application stuck for long and allows the error to be handled in my application in a timely manner.

The PR extends the health checks added in #135 with the configurable health check interval value.

@DragosFlorea
Copy link

This would be very helpful for my use case as well

@jborean93
Copy link
Owner

Thanks for the PR.

I am somewhat weary about exposing the retry mechanism timeout as a public API as I feel like the implementation is a bit of a bandaid over the real problem rather than a good solution. I know I've been dragging my heels on trying to rework the TCP transport side where my aim was to improve this situation but I'm on the fence about accepting this type of change.

The proper solution in my mind is to figure out how to detect when the socket has been "dropped" but potentially in your situation the client still thinks its connected and thus will wait for the full timeout.

@DragosFlorea
Copy link

DragosFlorea commented Aug 15, 2024

I understand your concern... And yes the best solution is to know exactly when the connection has been dropped but a logic on the client side to cut off the connection after timeout is not bad either. I mean you can have this as backup if you cannot detect the dropped connection
If you do not want to expose the timeout as a public api, which is a low level configuration that can mess up things, what do you say about a environment variable marked as experimental until you have a better solution?

@jborean93
Copy link
Owner

I think I can justify an environment variable here, especially if it's helping in some situations. I would have to stress it wouldn't be covered under the public API and could change/be removed in the future.

The socket stuff can be complicated, if you know the reason why the socket was closed but never reported back to the smbprotocol then I'll love to see a reproducer for it. That'll help me test out new scenarios in the future and hopefully avoid the need for the timeout altogether.

@leonnis12
Copy link
Contributor Author

I agree that the proper solution would be to detect when the socket has been "dropped", but also that the environment variable "fix" should be enough to help in the situations where this happens, until a proper solution can be found at least.

I've updated the pull request to not expose the timeout through the public API, and instead be available to be overridden through an environment variable marked as experimental from the Connection class.

Regarding reproducing the issue, simulating a network that drops packets from time to time, or has random response time spikes should be enough to trigger this issue, for further testing in the future. The following is an example from the network where I encountered the issue constantly.
pings

Copy link

codecov bot commented Aug 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.04%. Comparing base (2ce49ef) to head (8e16e2b).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #285   +/-   ##
=======================================
  Coverage   99.04%   99.04%           
=======================================
  Files          24       24           
  Lines        5115     5116    +1     
=======================================
+ Hits         5066     5067    +1     
  Misses         49       49           
Flag Coverage Δ
macOS 68.15% <100.00%> (+<0.01%) ⬆️
py3.10 99.00% <100.00%> (+<0.01%) ⬆️
py3.11 99.00% <100.00%> (+<0.01%) ⬆️
py3.12 99.00% <100.00%> (+<0.01%) ⬆️
py3.8 99.00% <100.00%> (+<0.01%) ⬆️
py3.9 99.04% <100.00%> (+<0.01%) ⬆️
ubuntu 96.81% <100.00%> (+<0.01%) ⬆️
windows 98.96% <100.00%> (+<0.01%) ⬆️
x64 99.04% <100.00%> (+<0.01%) ⬆️
x86 98.96% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jborean93
Copy link
Owner

I'll need to sort out the CI problems on macOS separately, I believe the image version now uses the arm64 builds so the x86_64 ones are failing without further work needed. That's not your problem to deal with so thanks for the changes and agreeing to the env var solution for now!

@jborean93 jborean93 merged commit 1572393 into jborean93:master Aug 19, 2024
22 of 25 checks passed
@leonnis12
Copy link
Contributor Author

Would it be possible to make a minor release which includes this fix, until the changes for 1.14 are ready? It would be really helpful.

@jborean93
Copy link
Owner

v1.14.0 has been published on PyPI with this change https://pypi.org/project/smbprotocol/1.14.0/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants