Skip to content

gh-120496: Use CAS approach for rangeiter_next #120534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Jun 15, 2024

nit
corona10 added 5 commits June 15, 2024 10:23
fix

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
fix

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
fix

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@corona10 corona10 marked this pull request as ready for review June 15, 2024 03:12
@corona10 corona10 added needs backport to 3.13 bugs and security fixes and removed awaiting core review labels Jun 15, 2024
@Fidget-Spinner
Copy link
Member

Thanks for doing this! Do you have any benchmark results on how much the perf impact is? versus say using Py_BEGIN_CRITICAL_SECTION ?

@corona10
Copy link
Member Author

Thanks for doing this! Do you have any benchmark results on how much the perf impact is? versus say using

I will try to prepare :)

@corona10
Copy link
Member Author

corona10 commented Jun 15, 2024

@Fidget-Spinner @colesbury

I am not sure this will be a fair benchmark, but it shows different strengths in terms of technique and size of N.

Single thread

import pyperf

runner = pyperf.Runner()
runner.timeit(name="bench iter single",
              stmt="""
data = list(it)
""",
              setup = """
N = 100
it = iter(range(N))
"""
              )
  • pyperf compare_to single_base.json single_cas.json
    Benchmark hidden because not significant (1): bench iter single

Mutl thread

import pyperf

runner = pyperf.Runner()
runner.timeit(name="bench iter multi",
              stmt="""
for _ in range(100):
    it = iter(range(N))
    with concurrent.futures.ThreadPoolExecutor() as executor:
        data = list(executor.map(lambda _: next(it), range(N)))
""",
              setup = """
import concurrent.futures
N = 100
"""
              )
  • Mean +- std dev: [multi_baseline] 284 ms +- 11 ms -> [multi_cas] 277 ms +- 20 ms: 1.03x faster

@Fidget-Spinner
Copy link
Member

Hmmm the perf difference is not that large. That might imply that we should just use critical section for simplicity, and hope that someone on PyPI publishes a "prange()" or something like that.

@corona10
Copy link
Member Author

That might imply that we should just use critical section for simplicity, and hope that someone on PyPI publishes a "prange()" or something like that.

Yeah, I prefer this one.

if (len <= 0) {
return NULL;
}
long result = _Py_atomic_load_long_relaxed(&r->start);
Copy link
Contributor

@eendebakpt eendebakpt Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have a range object with len=1 and many threads simultaneously at this point? The check on len has already been done, so the threads each increment the value of start and we end up with a value for result larger than end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe fail at _Py_atomic_compare_exchange_long?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. yes understood what you pointed out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a slight timing issue. Then yes, let's just use the critical section that @Fidget-Spinner also preferred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prepare for the overcommitted issue, we can validate the value of the result by comparing the end value.
But let's just use critical section approach.

@corona10
Copy link
Member Author

See: #120540

@corona10 corona10 closed this Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants