DIS: Cython 3 perf regressions #27086
Thanks. We should be able to observe the regression in the continuous benchmark: https://scikit-learn.org/scikit-learn-benchmarks/
Actually we don't have any results uploaded since July 12. Maybe @jeremiedbb can help :)
Also, we don't use Cython 3 in the main branch yet as far as I know. We only test Cython 3 in the scipy-dev build, which is not benchmarked.
We recently updated the lock file (during the EuroSciPy sprint): https://github.com/scikit-learn/scikit-learn/pull/27024/files, and thus we are now using Cython 3.0.
I checked and there is too much variation to rely on the test suite for this.
I missed that. Thanks for the heads up.
Actually, since there's no upper version pin for Cython, we might have been using Cython 3 since its release.
Hm, see ours for reference: https://asv-runner.github.io/asv-collection/pandas/#frame_ctor.FromDicts.time_nested_dict
I think we need to pin Cython. What do you think? I can't look into that now, but could someone else? 🙏
@jeremiedbb is actually looking at the ASV issue and will run the benchmarks before and after #27024, which changed the lock files and introduced Cython 3.0.
Actually the use of Cython 3 in the benchmarks is not linked to the update of the lock files. The lock files are only for the CI, while the ASV benchmarks run against the latest version available. From a quick look I can see some severe regressions:
We need more runs to check that it does not come from a one-time issue with the machine running the benchmarks. Additional checks are also needed to make sure Cython 3 is the reason for these regressions, since the benchmarks had not been uploaded for a long time.
I added additional runs on ASV and the slowdown looks persistent: https://scikit-learn.org/scikit-learn-benchmarks/#/. I could reproduce it for RandomForest locally on the last commit of main, just by changing the version of Cython, although the slowdown is smaller on my laptop (still 2x). This makes me quite confident that Cython 3 is the cause of these slowdowns. I'm also convinced that we should pin the versions of the dependencies in the ASV config, to be able to distinguish regressions coming from our code from regressions coming from the dependencies. I'm going to open a PR for that.
I'll take a look at profiling a couple of the slowdowns. Hopefully I can get a sense of what actual changes may be causing this.
@da-woods have you seen other reports of these slowdowns? I'm wondering if there are any hints as to what may be happening.
General comments (I haven't looked into your specific regressions): the most common issue I've seen is due to the switch from functions being implicitly `noexcept` to implicitly being able to propagate exceptions. When such a function is called from `nogil` code, the caller now has to check whether an exception was raised after each call, which can mean briefly re-acquiring the GIL.

This also applies to functions returning a fused type, for slightly more complicated reasons (but the same basic idea).

I'm working on a) a fix for the fused return type and b) better diagnostics of this issue (i.e. to point out places where it might be an issue). Hopefully those will make it into the next release.

Hopefully that gives you a clue where to start looking. If you want, I can try to find time to investigate myself (but you'll probably need to give me idiot-proof instructions on how to run the benchmark).
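To make that concrete, here is a minimal Cython sketch of the pattern being described; the function names (`one_step`, `run`) are hypothetical and this is not scikit-learn code:

```cython
from cython.parallel import prange


cdef double one_step(double x) nogil:
    # Under Cython 0.29 this was implicitly "noexcept": callers never checked
    # for a raised Python exception. Under Cython 3 the default lets
    # exceptions propagate, so each call made from nogil code is followed by
    # an error check that may briefly re-acquire the GIL.
    return x * 0.5


def run(double[::1] data):
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in prange(data.shape[0], nogil=True):
        total += one_step(data[i])
    return total
```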
I think I forgot to say how to fix it (if this is actually the issue). The choices are roughly the following (sketched below):

* mark the relevant `cdef`/`nogil` functions as `noexcept` explicitly (appropriate when they genuinely cannot raise), or
* set the `legacy_implicit_noexcept` compiler directive to restore the old implicit behaviour module-wide while you work out where the hot spots are.
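A minimal sketch of both options, reusing the same hypothetical helper as above; `legacy_implicit_noexcept` is the Cython 3.0 compiler directive that restores the 0.29 behaviour:

```cython
# cython: legacy_implicit_noexcept=True
# The directive above restores the 0.29 default (implicitly noexcept cdef
# functions) for the whole module, useful as a temporary stop-gap.

# The more targeted fix: annotate the hot nogil helpers one by one.
cdef double one_step(double x) noexcept nogil:
    # "noexcept" promises the function cannot raise, so nogil callers no
    # longer need to re-acquire the GIL after the call to check for errors.
    # Any exception raised inside would be printed and swallowed instead.
    return x * 0.5
```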
I profiled the RandomForestClassifier with py-spy and it shows that with Cython 3 there's a lot of overhead coming from pthread locks that was not there before. The profile points at this method: scikit-learn/sklearn/tree/_criterion.pyx, lines 487 to 554 at commit a05eb6b.
From the graph you show it definitely looks like a "waiting on the GIL" issue, although it isn't obvious quite what's causing it in that function. There's also an issue that's been reported about acquiring the GIL for memoryview reference counting at the end of the function, so it's possibly that. That issue would need a fix on our side. It might go away if you used…
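For context, a hypothetical Cython sketch (not the actual scikit-learn code; the class and method names are made up) of the kind of `nogil` method where memoryview clean-up happens at function exit:

```cython
cdef class Accumulator:
    cdef double[::1] buffer

    cdef double partial_sum(self, Py_ssize_t start, Py_ssize_t stop) nogil:
        # Slicing produces a new, reference-counted memoryview local.
        cdef double[::1] window = self.buffer[start:stop]
        cdef Py_ssize_t i
        cdef double total = 0.0
        for i in range(window.shape[0]):
            total += window[i]
        # When the function returns, "window" is released; that reference-count
        # update is where the extra GIL acquisition mentioned above can show up.
        return total
```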
#27266 has been opened to have scikit-learn depend on Cython < 3.
Would it be possible to remeasure RandomForestClassifier with cython/cython#5678? My semi-informed suspicion is that this might well fix it. (Or let me know how to run the benchmark and I can measure it.)
FYI, it seems like the CircleCI timeout issues are due (for the most part at least) to Cython 3. In particular…
@da-woods it looks like cython/cython#5678 improved things a bit, but not entirely. I ran the same profile with the master branch of Cython: the GIL issue at the end of…
I'm going to try that, thanks for looking into it.
That does look helpful, thanks. There are two related problems: any function that calls something with a checked exception now gets the refnanny setup, and for functions with object arguments that setup always acquires the GIL, even when refnanny is disabled.
I'll try to fix it in the next couple of days.
See scikit-learn/scikit-learn#27086. Essentially:

* I'd made functions raising an exception require refnanny, since prange/parallel block exception handling required it.
* I actually don't think this is sufficient and that there are other ways of raising an exception within a parallel block.
* The upshot is that *any* function with a call to a function with a checked exception now requires refnanny.
* For functions with object arguments (i.e. any cdef class method) that leads to the GIL always being acquired around the refnanny setup (even when refnanny is disabled).

I've fixed it by avoiding using refnanny in parallel blocks unless we know it's available. I think it's too hard to tell reliably ahead of time (at least for me).

I think the `have_object_args and has_with_gil_block` criteria for acquiring the GIL around function definitions might be excessive, and possibly unnecessary some of the time. It's possibly mainly reassigned object args. This might be worth refactoring, but I don't think it's necessary to fix this particular performance regression.
Originally reported in scikit-learn/scikit-learn#27086. Essentially:

* I'd made functions raising an exception require refnanny, since prange/parallel block exception handling required it.
* I actually don't think this is sufficient and that there are other ways of raising an exception within a parallel block.
* The upshot is that *any* function with a call to a function with a checked exception now requires refnanny.
* For functions with object arguments (i.e. any cdef class method) that leads to the GIL always being acquired around the refnanny setup (even when refnanny is disabled).

I've fixed it by avoiding using refnanny in parallel blocks unless we know it's available and has actually been used in the generated code. I think it's too hard to tell reliably ahead of time.
Fix committed, will be resolved in Cython 3.0.3.
Good news: I ran some benchmarks for the estimators for which we identified slowdowns coming from Cython 3, and the regressions are gone with the fix. It should be confirmed in the ASV benchmarks when we update them to run against Cython 3.0.3 (once it's released).
Can we close?
@jeremiedbb merged the change that added Cython 3.0.3 to the ASV benchmark setup. We should be able to check that we don't have remaining regressions.
I had a look at https://scikit-learn.org/scikit-learn-benchmarks/#regressions?sort=3&dir=desc and the last commits are still from early September, so the upload of the results does not seem to be working, which does not allow us to conclude whether the Cython regressions have been fixed in 3.0.3.
Actually the results are uploaded, see: https://github.com/scikit-learn/scikit-learn-benchmarks/commits/master. However, the plots do not show them...
Actually, they show up when you navigate to the results from the homepage grid, for instance to the page of one of the affected estimators, which does indeed show that the regression was fixed for that model. However, when navigating from the regressions list, they do not show up, because the environment has changed and we only see the points from the environment that experienced the regression, not the new points that were subsequently collected with the new environment. This is quite unfortunate; it's a UX problem in the ASV report. I will manually check the other estimators that experienced a regression to see if they are all fast again.
Actually, it's simpler: when you come from the regressions list view, you can click on Cython 3.0.3 in the left side panel to add the new point and see that the problem was fixed.
OK, I checked, and apparently all the big regressions that appeared in July and August (after 1.3.0) when switching to Cython 3.0.0 have been fixed after the update to Cython 3.0.3. So let's close! Note: there are still some smaller and much older regressions for other benchmarks, but they are unrelated (regressions that happened around 0.24.x).
I will open a PR to unpin Cython in our builds.
This is taking more time than expected because I hit a mamba bug with virtual packages and conda-lock that was quite counter-intuitive to debug, because the error was hidden in a nested subprocess, which makes it challenging to launch a debugger. Anyway, here is the PR to fix it in mamba: … I will proceed with the unpinning tomorrow.
It looks like scikit-learn hasn't updated to Cython 3 yet, but when you do, can you keep an eye out for performance regressions?
On pandas, we updated to Cython 3 but then reverted, since we had issues with some of our benchmarks regressing, and it would be nice to know if this is a bigger problem in Cython.