Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse expiration_interval as select() timeout #401

Merged
merged 1 commit into from
Jan 3, 2022
Merged

Reuse expiration_interval as select() timeout #401

merged 1 commit into from
Jan 3, 2022

Conversation

MichaIng
Copy link
Contributor

@MichaIng MichaIng commented Nov 23, 2021

❓ What kind of change does this PR introduce?

  • 🐞 bug fix
  • 🐣 feature
  • πŸ“‹ docs update
  • πŸ“‹ tests/coverage improvement
  • πŸ“‹ refactoring
  • πŸ’₯ other

πŸ“‹ What is the related issue number (starting with #)

#378
cherrypy/cherrypy#1908

❓ What is the current behavior? (You can also link to an open issue here)

The connection handler loop runs once every ~0.01 sections, leading to a significant idle system load on small hardware.

❓ What is the new behavior (if this is a feature change)?

The connection handler loop runs once every ~expiration_interval, which is required to assure that expired connections are closed in time, which is done in the same loop. On Windows, the maximum timeout is capped to 0.05 seconds to make this the maximum additional delay when select() does not return once a socket is ready.

πŸ“‹ Other information:

This is a continuation of: #352

πŸ“‹ Checklist:

  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences

This change is Reviewable

@MichaIng
Copy link
Contributor Author

Lint does not like the additional "cognitive complexity". However, the new code does only make sense exactly there. Not sure whether to ignore or how to resolve. One idea: The if new_conn is not None: could be moved into self.server.process_conn, to skip None connections wherever this function is called. That one is pretty small, so no problem with complexity there: https://github.com/cherrypy/cheroot/blob/3a92b45/cheroot/server.py#L2058-L2064

@codecov
Copy link

codecov bot commented Dec 29, 2021

Codecov Report

Merging #401 (607ad40) into master (999ef79) will decrease coverage by 0.35%.
The diff coverage is 100.00%.

❗ Current head 607ad40 differs from pull request most recent head b67e931. Consider uploading reports for the commit b67e931 to get more accurate results

@@            Coverage Diff             @@
##           master     #401      +/-   ##
==========================================
- Coverage   80.50%   80.15%   -0.36%     
==========================================
  Files          28       28              
  Lines        4390     4393       +3     
==========================================
- Hits         3534     3521      -13     
- Misses        856      872      +16     

cheroot/connections.py Outdated Show resolved Hide resolved
cheroot/connections.py Outdated Show resolved Hide resolved
cheroot/connections.py Outdated Show resolved Hide resolved
cheroot/connections.py Outdated Show resolved Hide resolved
@MichaIng
Copy link
Contributor Author

MichaIng commented Dec 29, 2021

@webknjaz
Many thanks for your review. I applied most suggestions and applied Sphinx parameter syntax in run() and _run() as well, clearly declaring expiration_interval as float (like it is in parent server as well). I'm not sure if I understand the reason for the module level timeout constant, especially now where all min() inputs are clearly defined as floats?

About the 0.05:

  • Note that the reason for the current 0.01 seconds timeout has been solved long ago, hence the original PR to remove it, significantly reducing idle CPU load, using only the idle connection expiry internal as timeout, which is the only left reason to use one at all.
  • It is Windows only which seems to have a buggy select() implementation which just does not work as it should. So only for Windows a select() timeout cap is required to have no regression. Check the benchmarks done in the original PR.
  • We could keep using 0.01 seconds for Windows, it is a balance between unnecessary idle load (100 vs 20 forced socket parse loops every second) and max connection delay (0.01 seconds vs 0.05 seconds). 0.05 seconds seems a sane balance value to me, but this is open for discussion, or we could even implement an additional input argument for this. I was just hoping that Windows' Python either gets select() fixed or we find a different way to avoid high frequent socket loops on Windows, soon.

The below solves the cognitive complexity CI failure, while of course it doesn't solve it in real as the condition has just moved into min():

select_timeout = min(expiration_interval, 0.05 if IS_WINDOWS else expiration_interval)

Furthermore it is too long (>79 characters) to satisfy CI. Hence I left the previous condition syntax, which looks more clear to me, also indicating more explicitly that only on Windows we still need a capped select() timeout. But this leaves the cognitive complexity issue. If we really want a Windows only global variable, this would solve that part, moving that logic to the top of the script.

I can btw squash the commits the next days. Currently I'm on the road, having GitHub desktop only here, which does not allow to amend to pushed commits πŸ™„.

cheroot/connections.py Outdated Show resolved Hide resolved
@webknjaz
Copy link
Member

I'm not sure if I understand the reason for the module level timeout constant, especially now where all min()
inputs are clearly defined as floats?

I was trying to look out for maintainability+readability. The reason is that having random numbers in the middle of a Python
module doesn't carry any meaning. They are just magic numbers that don't have any comments/metadata attached to them and
explaining whether how they were chosen, why they are needed etc. But if you assign them to variables, those variable names
can add extra context. Still, since the values are immutable per runtime, it doesn't make sense to put the constraint inside the function body and it makes sense to have it right below the imports.

About the 0.05:

  • Note that the reason for the current 0.01 seconds timeout has been solved long ago, hence the original PR to remove it, significantly reducing idle CPU load, using only the idle connection expiry internal as timeout, which is the only left reason to use one at all.
  • It is Windows only which seems to have a buggy select() implementation which just does not work as it should. So only for Windows a select() timeout cap is required to have no regression. Check the benchmarks done in the original PR.
  • We could keep using 0.01 seconds for Windows, it is a balance between unnecessary idle load (100 vs 20 forced socket parse loops every second) and max connection delay (0.01 seconds vs 0.05 seconds). 0.05 seconds seems a sane balance value to me, but this is open for discussion, or we could even implement an additional input argument for this. I was just hoping that Windows' Python either gets select() fixed or we find a different way to avoid high frequent socket loops on Windows, soon.

I was talking about the current default 0.5 that you started reusing with this change, not 0.05.

The below solves the cognitive complexity CI failure, while of course it doesn't solve it in real as the condition has just moved into min():

select_timeout = min(expiration_interval, 0.05 if IS_WINDOWS else expiration_interval)

Yeah, and it still hides that half a second timeout under *NIX.

Furthermore it is too long (>79 characters) to satisfy CI. Hence I left the previous condition syntax, which looks more clear to me, also indicating more explicitly that only on Windows we still need a capped select() timeout. But this leaves the cognitive complexity issue. If we really want a Windows only global variable, this would solve that part, moving that logic to the top of the script.

I don't really like how the current change hides the substitution of the default from 0.01 to 0.5 (which is what actually happens here, implicitly). If this is the goal, it should be clearly documented.

I can btw squash the commits the next days. Currently I'm on the road, having GitHub desktop only here, which does not allow to amend to pushed commits πŸ™„.

That's alright, it's not urgent, desktop and mobile apps are pretty limited.

@MichaIng
Copy link
Contributor Author

MichaIng commented Dec 30, 2021

Do you think it makes sense to leave a comment why the previously used 0.01 is not required anymore? We could link the discussion from the originating PR or the one that did solve this a long time ago. My idea was to not mention in the code why something is not there anymore, but only why something is (still) there, i.e. this comment:

        Use ``expiration_interval`` as ``select()`` timeout
        to assure expired connections are closed in time.

        On Windows cap the timeout to 0.05 seconds
        as ``select()`` does not return when a socket is ready.

It tells you why we need a timeout at all (maybe not clear enough?), which is (on *nix) only to ensure that the expiration_interval is effective, and hence re-using exactly that value is the only reasonable thing to do. Also expiration_interval can be changed when invoking CherryPy/cheroot, while 0.5 is the default. So it is only on Windows that we do still need this magic number, as well explained. But yeah, not sure how to properly decide and document whether 0.05 is good, better than 0.01 etc, since it highly depends on hardware and usage. Also before, there was no explanation why 0.01 was used in the first place, it was just a ~random number which seemed small enough in most cases to handle requests sufficiently quickly (now assured via select() on *nix). But we can do the search/choice for the right value, or even making it a user-selectable variable like expiration_interval, in a separate PR, i.e. reverting to 0.01 to not change anything for Windows.


EDIT: Different topic, but actually the expiry could be done in a dedicated loop with expiration_interval so that the (new) request handler loop does not require any timeout/input on *nix. This would untangle those two values and further clarify things.


EDIT2:

I was trying to look out for maintainability+readability. The reason is that having random numbers in the middle of a Python
module doesn't carry any meaning. They are just magic numbers that don't have any comments/metadata attached to them and
explaining whether how they were chosen, why they are needed etc. But if you assign them to variables, those variable names
can add extra context. Still, since the values are immutable per runtime, it doesn't make sense to put the constraint inside the function body and it makes sense to have it right below the imports.

Since expiration_interval is a function input, this part of the logic needs to stay in the function. We could make:

import ...
...
_WINDOWS_SELECT_TIMEOUT = 0.05
...
select_timeout = min(expiration_interval, _WINDOWS_SELECT_TIMEOUT)

but to not force people scrolling up and down to get the whole reason behind this, we'd need to explain why Windows requires this capped timeout once at the variable declaration and once in the function (like it is done already). The more I think about it, the more I think it makes sense to make it a user-choosable input, naming it like windows_max_request_latency or so. I'm just not sure whether we should start implementing options for something which is only a suboptimal workaround for a bug (or misuse) of select() in Python on Windows, which we can get rid of hopefully soon.

@webknjaz
Copy link
Member

@MichaIng that sounds reasonable. I just want to clarify that docstrings have a different purpose than code comments. Docstrings explain how things are but code comments give more insight into why/how the implementation decisions were made.
I think that for now, it's probably okay to keep that windows constant inside the method and deal with it later. OTOH, putting it into the module-global constant would allow monkey-patching that value. It's probably a good idea to keep what you have in the docstring but additionally add an actual code comment explaining the motivation behind handling the windows corner case separately, mentioning that the value has been chosen empirically.
As for documenting the substitution of 0.01, I think the best place to mention is the commit message.

P.S. @mar10 do you want to test this with your webdav project?

@webknjaz
Copy link
Member

@MichaIng oh, and do you think this change could be tested somehow automatically in the CI?

@MichaIng
Copy link
Contributor Author

I just want to clarify that docstrings have a different purpose than code comments.

Makes sense. I'm obviously no Python coder, no professionally trained coder at all, hence still learning about such conventions πŸ™‚. I can split off something from the docstring into a code comment (which is hashed lines #, right?), like the sentence about the Windows special case.

As for documenting the substitution of 0.01, I think the best place to mention is the commit message.

Ah yes, that makes sense. I'll tie this into the commit message. As of that, "Good commit messages" test fails:

error: pathspec 'patch-1' did not match any file(s) known to git

It does not work on PRs from forks as the repo/owner from where the test is triggered is used, not the one of the originating branch: https://github.com/platisd/bad-commit-message-blocker/blob/master/entrypoint.sh
Did a PR on my repo:

Conformance to the 7 rules of a great Git commit message:
[PASSED] Separate subject from body with a blank line
[PASSED] Limit the subject line to 50 characters
[FAILED] Capitalize the subject line
[PASSED] Do not end the subject line with a period
[FAILED] Use the imperative mood in the subject line
[FAILED] Wrap the body at 72 characters
[  NA  ] Use the body to explain what and why vs. how

Is there some convention about a prefix/tag in the subject line, like connections: currently to express that it is about the connections handler script (which I see in other merged PRs, also failing the test πŸ€”)? Otherwise, aside of capping lines at 72 chars, I'd change it to:

Use "expiration_interval" as "select()" timeout

do you think this change could be tested somehow automatically in the CI?

Not sure how. There is two things:

  • The idle CPU is/should be lower due to much less socket handler loop iterations, but this cannot really be tested, also depends on the originating system, of course.
  • The expiry interval is to be kept, which is probably possible to test. Though the exact time until a connection expires is random between ~instantly up to the expiry interval, respectively above the expiry internal due to processing time, which again depends on the system.
  • Also not sure how to apply unit tests since those functions do not return something.

For such things a benchmark as part of CI pipeline would be great, AFAIK the one used in the originating PR to detect the issue on Windows is part of the repo already, so could be made a workflow of? We would not be able to test whether this PR reduces idle CPU load, but whether it affects connection handler performance, especially on Windows.

@webknjaz
Copy link
Member

I just want to clarify that docstrings have a different purpose than code comments.

Makes sense. I'm obviously no Python coder, no professionally trained coder at all,
hence still learning about such conventions πŸ™‚.

Ah, I didn't realize. You're doing great! If you need pointers on some materials to read,
let me know β€” I've got plenty.

I can split off something from the docstring into a code comment (which is hashed
lines #, right?), like the sentence about the Windows special case.

Yes to the leading-#-lines. No to removing the info about Windows from the docstring.
Since that note highlights a behavioral difference that affects the end-users, it's useful
for them to know that it is different. But an additional code comment could expand on
why (it's the main use-case of the code-comments β€” mention unobvious motivations).

These two types of content have different target audiences β€” the docstring may be
useful for the end-users just poking in Python REPL to see what sorts of things exist
in the importable modules (for various reasons, mostly exploratory), they also may
end up in completely auto-generated private API docs (https://cheroot.cherrypy.dev/en/latest/pkg/cheroot.connections/#cheroot.connections.ConnectionManager.put).

OTOH the code comments are meant for code contributors and project archeologists
who can read what the code does but would not see why that solution was
implemented (unless it's obvious). This is why code comments should normally only
be used in places that may lack this sort of context but the preference is two write
code in a way that would not create such confusion where possible.

Having magic constants in code is one of the cases where the reader would see an
arbitrary number that wouldn't be explainable with a comprehensive variable name.

As for documenting the substitution of 0.01, I think the best place to mention is
the commit message.

Ah yes, that makes sense. I'll tie this into the commit message. As of that, "Good
commit messages" test fails:

error: pathspec 'patch-1' did not match any file(s) known to git

It does not work on PRs from forks as the repo/owner from where the test is
triggered is used, not the one of the originating branch:
platisd/bad-commit-message-blocker@master/entrypoint.sh

Ah, interesting, I didn't realize. This should probably be reported upstream.

Did a PR on my repo:

Conformance to the 7 rules of a great Git commit message:
[PASSED] Separate subject from body with a blank line
[PASSED] Limit the subject line to 50 characters
[FAILED] Capitalize the subject line
[PASSED] Do not end the subject line with a period
[FAILED] Use the imperative mood in the subject line
[FAILED] Wrap the body at 72 characters
[  NA  ] Use the body to explain what and why vs. how

Is there some convention about a prefix/tag in the subject line, like connections:
currently to express that it is about the connections handler script (which I see in
other merged PRs, also failing the test πŸ€”)? Otherwise, aside of capping
lines at 72 chars, I'd change it to:

Use "expiration_interval" as "select()" timeout

Yeah, in general I expect the commit messages to follow the same standards as
the Git project itself has. They are mostly explained in https://cbea.ms/git-commit/.
Composing Git commits can be compared to story-telling. Here's a few pointers to
draw inspiration from: https://twitter.com/nedbat/status/1283147492616556544 and
https://dhwthompson.com/2019/my-favourite-git-commit.

In case, you'd like to dive deep into Git, here's a collection of links I posted some
time back: pypa/packaging.python.org#968 (comment).

As for 50/72 rule, the commit title / subject line is limited to 50 chars but the prose
below can use lines of 72 chars. There is no prefix requirement but the best
practices insist on the imperative mood which is probably why the prefix breaks
this assumption.

Currently, I'm trying not to be strict about the Git history (although it's important),
especially because the CI check is an experiment to see if it works well.

How about

Reuse `expiration_interval` as `select()` timeout

do you think this change could be tested somehow automatically in the CI?

Not sure how. There is two things:

  • The idle CPU is/should be lower due to much less socket handler loop
    iterations, but this cannot really be tested, also depends on the originating
    system, of course.
  • The expiry interval is to be kept, which is probably possible to test. Though
    the exact time until a connection expires is random between ~instantly up to the
    expiry interval, respectively above the expiry internal due to processing time,
    which again depends on the system.
  • Also not sure how to apply unit tests since those functions do not return something.

For such things a benchmark as part of CI pipeline would be great, AFAIK the one
used in the originating PR to detect the issue on Windows is part of the repo already,
so could be made a workflow of? We would not be able to test whether this PR
reduces idle CPU load, but whether it affects connection handler performance,
especially on Windows.

That's what I thought. Plus depending on the load on the CI platform, some VMs may
be slower occasionally that also impacts benchmark-based tests. Let's skip automatic
testing, then. But in the future, we could experiment with these ideas separately.

This is a continuation of: #352

As discussed in that pull request, on Windows the select() method does
not return when a socket is ready. While the reason is still to be
found out, to get the benefit of using a usually much higher timeout
merged, Windows is not handled differently: The timeout is capped to
0.05 seconds to assure that connections are not delayed more than that.

0.05 seconds are used as an empirically obtained balance between max
connection delay and idle system load. Benchmarks show a mean
processing time per connection of ~0.03 seconds on Linux and with 0.01
seconds timeout on Windows:
#352
While this highly depends on system and hardware, 0.05 seconds max
delay should hence usually not significantly increase the mean
time/delay per connection, but significantly reduce idle system load by
reducing socket loops to 1/5 with 0.01 seconds.

Signed-off-by: MichaIng <micha@dietpi.com>
@MichaIng
Copy link
Contributor Author

MichaIng commented Jan 3, 2022

I added a code comment + commit message paragraph now to give further details about 0.05s. I hope it is not too long πŸ˜„: There are at least three benchmarks on the original PR with different systems which all show a mean connection processing time of ~0.03 seconds (on Windows when having a 0.01s timeout). Given that 0.05s on Windows means and average 0.025s delay before a connection is processed, this does not significantly add further delay in those tested cases, but reduces idle socket loops, which are the major idle CPU usage factor, to 1/5th (compared to 0.01s).

Commit message line length now capped: https://github.com/MichaIng/cheroot/pull/1

===========================
Conformance to the 7 rules of a great Git commit message:
[PASSED] Separate subject from body with a blank line
[PASSED] Limit the subject line to 50 characters
[PASSED] Capitalize the subject line
[PASSED] Do not end the subject line with a period
[PASSED] Use the imperative mood in the subject line
[PASSED] Wrap the body at 72 characters
[  NA  ] Use the body to explain what and why vs. how

The test still fails as it additionally checks the previous commit which does not satisfy the line length πŸ˜„. This is as the master branch on my fork is outdated, hence the PR further contains all new upstream master commits.

@MichaIng MichaIng changed the title connections: expiration_interval as select timeout Reuse expiration_interval as select() timeout Jan 3, 2022
@webknjaz webknjaz merged commit fe8dd50 into cherrypy:master Jan 3, 2022
@webknjaz webknjaz added the bug Something is broken label Jan 3, 2022
@webknjaz
Copy link
Member

webknjaz commented Jan 3, 2022

@MichaIng this is great! As for the comment/commit size β€” I don't think that it correlates with the usefulness. It's okay to have them big if that's what's necessary to explain all the context.

@MichaIng
Copy link
Contributor Author

MichaIng commented Jan 3, 2022

Great, many thanks for merging. I'm looking forward for the next release so that we can finally migrate with our project to latest CherryPy with cheroot πŸ‘πŸ™‚.

@webknjaz
Copy link
Member

webknjaz commented Jan 3, 2022

Yeah, I need to make up my mind regarding a feature vs a bugfix release number. After that, it should be ready soon.

@webknjaz
Copy link
Member

webknjaz commented Jan 3, 2022

I think I'll go for v8.6.0 because this patch has a behavior change / feature to influence the underlying timeout from the exposed settings.

@MichaIng
Copy link
Contributor Author

MichaIng commented Jan 3, 2022

Makes sense. This is something to not have merged automatically via dependabot-like CI/CD pipeline without having a look at the changelog, at least when a project generally aims to be used on Windows as well. If there are any cases of negative performance import, it's good when it can be associated to this change easily so we can think about exposing it as variable or so.

@webknjaz
Copy link
Member

webknjaz commented Jan 4, 2022

Here you go https://github.com/cherrypy/cheroot/runs/4697142529?check_suite_focus=true#step:4:24

MichaIng added a commit to HTPC-Manager/HTPC-Manager that referenced this pull request Jan 4, 2022
`cheroot` v8.6.0 has been released, used in CherryPy v9.0 and above, which has the high idle CPU load resolved: cherrypy/cheroot#401

Since this was the reason for using a CherryPy<9.0 version dependency, this is hereby removed.

Signed-off-by: MichaIng <micha@dietpi.com>
@MichaIng MichaIng deleted the patch-1 branch January 4, 2022 01:57
@mar10
Copy link
Contributor

mar10 commented Jan 4, 2022

P.S. @mar10 do you want to test this with your webdav project?

I re-run my older tests on new Hardware. IIRC, 8.3.1 was the last originally 'problematic' Cheroot release?

Windows 11, Ryzen 5 @ 3.9 GHz
WsgiDAV 4.0.0-a2

Runtime 30 secs, 10 parallel sessions
The script GETs a static file, PUTs a new file, GETs that file, and loops again.

This is what I see on Windows:

  • Cheroot 8.3.1 Executed 2,760 activities (~ 9 requests per user per second)
  • Cheroot 8.5.2 Executed 23,584 activities (~ 78 requests per user per second)
  • Cheroot 8.6.0 Executed 8,640 activities (~ 28 requests per user per second)

My script-runner and WebDAV server my not be perfect for absolute benchmarks, but the comparison is still interesting.

I can re-run on Mac later.

@MichaIng
Copy link
Contributor Author

MichaIng commented Jan 4, 2022

Oh boy, still a major requests per second difference on Windows. The benchmarks in #352 done by you and others showed an average performance of 0.02 - 0.035 seconds per request (now yours is ~0.013 with v8.5.2). Given a random request is coming in when in average 0.025 seconds timeout are left (half of the now hardcoded maximum of 0.05 on Windows), I'd expected this to not change significantly. But of course it depends on the performance of the underlying system, and how the benchmark is done. E.g. at higher concurrency (instead of doing the requests one after another), the timeout looses weight as the processing of the accumulated incoming connections since the last loop gains weight.

I think the major difference which is relevant is that the maximum possible delay for a random incoming connection on Windows increased from 0.01 to 0.05 seconds, and the question is whether this is noticeable by users or not.

I did some fresh idle CPU usage comparison of our project on a Linux VM. The host is a notebook, some years old, but full power given to the VM, and the idle CPU usage dropped from ~2-3 seconds per minute (on a single core) with v8.5.2 to ~0.3 seconds per minute with v8.6.0, with the default 0.5 seconds timeout there => up to 2 loops per second. On a Raspberry Pi 1/Zero this is much more significant, of course, hence a huge benefit for using cheroot/CherryPy on embedded low-performance systems. Would be interesting to see whether on Windows with now only up to 20 instead of 100 loops per second, there is a reduced idle CPU usage measurable as well, which balances the additional connection delay sufficiently.

Otherwise, of course, we can use again 0.01 on Windows, to not change anything there.

@mar10
Copy link
Contributor

mar10 commented Jan 4, 2022

My server and also the test tool are written in Python, so multithreading performance is not-so-good. I would not refer to the numbers for absolute request timings. I was also running server and tool in two shells on the same box.

Here's my results now with macOS tests:

WsgiDAV 4.0.0-a2
Runtime 30 secs, 10 parallel sessions
The script GETs a static file, PUTs a new file, GETs that file, and loops again.

Windows 10, Ryzen 5 @ 3.9 GHz

  • Cheroot 8.3.1 Executed 2,760 activities (~ 9 requests per user per second)
  • Cheroot 8.5.2 Executed 23,584 activities (~ 78 requests per user per second)
  • Cheroot 8.6.0 Executed 8,640 activities (~ 28 requests per user per second)

macOS 12, i5 @ 2.9 GHz

  • Cheroot 8.3.1 Executed 5,468 activities (~ 18 requests per user per second)
  • Cheroot 8.5.2 Executed 12,596 activities (~ 42 requests per user per second)
  • Cheroot 8.6.0 Executed 12,660 activities (~ 42 requests per user per second)

@MichaIng
Copy link
Contributor Author

MichaIng commented Jan 4, 2022

On Windows, could you raise concurrency, e.g. 20 parallel sessions, to see whether this leads to closer results as expected? I'll also spin up Python again on my Windows machine to repeat tests with https://github.com/cherrypy/cheroot/blob/master/cheroot/test/test_wsgi.py and your stressor with timeout values between 0.01 and 0.05.

@mar10
Copy link
Contributor

mar10 commented Jan 5, 2022

Windows 10, Ryzen 5 @ 3.9 GHz

Same scenario, different no. of parallel users. Best of 3

#1 #2 #3 Best
Cheroot 8.5.2 5 users 9,160 9,672 9,180 9,672
10 users 22,532 21,708 23,312 23,312
20 users 23,528 23,516 23,544 23,544
Cheroot 8.6.0 5 users 2,540 2,540 2,540 2,540
10 users 5,008 10,080 6,604 10,080
20 users 23,720 23,404 23,596 23,596

@mar10
Copy link
Contributor

mar10 commented Jan 5, 2022

I did not follow the thread, so sorry if this has already been discussed:
We need a small timeout to improve responsiveness on Windows and we need a larger timeout to reduce idle load on all systems?
Could an approach be, have a lazy timeout (e.g. 0.1), but temprarily reduce the timeout to 0.01 if the last request is younger than 1 second ago?

@MichaIng
Copy link
Contributor Author

MichaIng commented Jan 5, 2022

Oh wow, so 20 parallel users vs 10 parallel users completely eliminates the difference between 0.01 and 0.05. From my point of view, such high request numbers are usually coming from a large number of multiple clients while a single client which hammers a lot of requests in such a short time is usually a bad client, brute-force attack or similar, where some additional delay has more benefit than damage πŸ˜„. Given these results, I'd say 0.05 is totally fine.

We need a small timeout to improve responsiveness on Windows and we need a larger timeout to reduce idle load on all systems?

This hardcoded timeout is used on Windows only, so also 0.01 vs 0.05 only reduce idle CPU load on Windows while it has not effect on other OSes.

Could an approach be, have a lazy timeout (e.g. 0.1), but temprarily reduce the timeout to 0.01 if the last request is younger than 1 second ago?

In theory more magic could be added, but given that it is all a workaround for a bug or at least undocumented behaviour, I personally don't think that it is good invested development time, and the more complicated it gets the more the risk is the we have unexpected downsides on other ends. Also, given your benchmarks, I don't think that there is much potential for significant real world performance improvements, rather than idle CPU load improvements on a typical Windows system/server.

@mar10
Copy link
Contributor

mar10 commented Jan 6, 2022

Agreed. I just tested copying a few hundred files using Windows 11 File Explorer mapped to a WebDAV folder and got similar performance for cheroot 8.5.2 and 8.6.0.

webknjaz added a commit that referenced this pull request Jan 12, 2022
@MichaIng
Copy link
Contributor Author

MichaIng commented Jan 12, 2022

I also ran some tests again with #352 (comment) and ranged concurrency/max_workers and the timeout. Indeed somewhere between 15 and 20 concurrent workers, the difference between 0.01 and 0.05 seconds timeout fades. Interestingly then even using a much higher timeout (like 1 or 2 seconds) has no effect in most of the test runs, only in rare cases when running the test I see a max latency of >=1 or >=2 seconds (tied to the timeout), meaning that all connections are handled within one loop iteration? I fail to fully understand the results to be true πŸ˜„. Basically I think that it means the system/cheroot has no idle/wait time due to the timeout but is kept busy to handle incoming (and outgoing) connections until either the timeout has been reached or all connections have been handled. E.g. here with 2 seconds timeout and 15 concurrent connections, the max latency reflects the timeout (not always):

min: 0.01392409999971278
mean: 0.6504647386999922
max: 2.0572720000000118
stdev: 0.9031201408467256
variance: 0.8156259888030095

And with 18 concurrent connections (still 2 seconds timeout):

min: 0.012550200000077893
mean: 0.05631920509999964
max: 0.5884331999995993
stdev: 0.036258509721686503
variance: 0.0013146795272376349

matching very closely the result with 0.01 seconds:

min: 0.012735999999677006
mean: 0.05238322417999916
max: 0.5964617999998154
stdev: 0.017878120680357632
variance: 0.00031962719906143125

The differences are all within the error range between test runs, and I did a lot of repeating whenever changing a value.

I then also raised the overall number of connections with pool_maxsize=10000 and for _ in range(10000) and higher (in linked test), and the timeout appears more often as max latency then, but usually has not much effect on the average latency. Further raising concurrently/max_workers increases the latency, but independent of the timeout, so looks like indeed the performance of the system to deal with connections is the limiting factor then.

So while there are still open questions, and it would be better to do a benchmark with server and client(s) running on different systems, I think that 0.05 is sufficiently low to not cause any significant difference on real world high traffic scenarios (high traffic ~ high concurrent traffic) and as well isn't a notable max latency for a single request, so that a single user would recognise a decreased page load performance.

MichaIng added a commit to HTPC-Manager/HTPC-Manager that referenced this pull request Jan 28, 2022
`cheroot` v8.6.0 has been released, used in CherryPy v9.0 and above, which has the high idle CPU load resolved: cherrypy/cheroot#401

Since this was the reason for using a CherryPy<9.0 version dependency, this is hereby removed.

Signed-off-by: MichaIng <micha@dietpi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants