Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce function calling overhead by ~25% in ReferencePool::update_counts: #1608

Merged
merged 1 commit into from
May 15, 2021

Conversation

alex
Copy link
Contributor

@alex alex commented May 15, 2021

  1. Place both increfs and decrefs behind a single mutex, rather than two. Even uncontended mutexes aren't free to acquire.
  2. Keep a dirty tracking bool to avoid acquiring any mutexes at all in the common case of no modifications (because everything happened with the GIL held)

Benchmarks

Before

(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.7 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 64.1 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 63.5 nsec per loop

With single lock

(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 51.9 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 52 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 52.9 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 51.8 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 54.7 nsec per loop

With both single lock and dirty flag

(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 47.2 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 46.7 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 46.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 46.6 nsec per loop
(.venv) alex@media-01:~/p/pyo3$ python3 -mtimeit -s "from pyo3_benchmarks import no_args" "no_args()"
5000000 loops, best of 5: 47.1 nsec per loop

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! These are some really great optimizations which many people will be happy with. I think there's a potential deadlock though (which we can avoid), see comment below.

src/gil.rs Outdated
unsafe { ffi::Py_INCREF(ptr.as_ptr()) };
}

for ptr in swap_vec_with_lock!(self.pointers_to_decref) {
for ptr in swap_vec!(ops.1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful here. Decrementing references can run arbitrary Python drop code, which can lead to GIL release which may eventually deadlock because we hold the lock here. (I think this mutex is also non reentrant so may deadlock on a single thread.)

After swapping ops.1 we should release the lock before decreasing any references.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, should be good now.

src/gil.rs Outdated Show resolved Hide resolved
…nts:

1) Place both increfs and decrefs behind a single mutex, rather than two. Even uncontended mutexes aren't free to acquire.
2) Keep a dirty tracking bool to avoid acquiring any mutexes at all in the common case of no modifications (because everything happened with the GIL held)
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks very much 😌

@davidhewitt davidhewitt merged commit c48e5b0 into PyO3:main May 15, 2021
@alex alex deleted the opt-no-args branch May 15, 2021 23:53
@kngwyu
Copy link
Member

kngwyu commented May 16, 2021

Looks like I was late for the party... Very interesting to see this optimization works, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants