support free-threaded Python #165

Merged: davidhewitt merged 14 commits into main from dh/free-threaded on Nov 26, 2024
Conversation

davidhewitt (Collaborator)
Add free-threaded Python support.

Pushing a bit early so that I can see results of benchmarks & CI.

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.74%. Comparing base (f970f0b) to head (afc89e6).
Report is 3 commits behind head on main.

Files with missing lines             Patch %   Lines
crates/jiter/src/py_string_cache.rs  61.53%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #165      +/-   ##
==========================================
- Coverage   88.92%   88.74%   -0.19%     
==========================================
  Files          13       13              
  Lines        2195     2203       +8     
  Branches     2195     2203       +8     
==========================================
+ Hits         1952     1955       +3     
- Misses        148      153       +5     
  Partials       95       95              
Files with missing lines             Coverage Δ
crates/jiter-python/src/lib.rs       93.22% <100.00%> (ø)
crates/jiter/src/py_string_cache.rs  92.98% <61.53%> (-4.19%) ⬇️

... and 1 file with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

codspeed-hq bot commented Nov 19, 2024

CodSpeed Performance Report

Merging #165 will improve performance by 11.93%

Comparing dh/free-threaded (afc89e6) with main (dbf0c52)

Summary

⚡ 1 improvements
✅ 72 untouched benchmarks

Benchmarks breakdown

Benchmark           main     dh/free-threaded   Change
unicode_jiter_iter  8.6 µs   7.7 µs             +11.93%

@@ -86,28 +85,34 @@ impl StringMaybeCache for StringNoCache {
     }
 }

-static STRING_CACHE: GILOnceCell<GILProtected<RefCell<PyStringCache>>> = GILOnceCell::new();
+static STRING_CACHE: OnceLock<Mutex<PyStringCache>> = OnceLock::new();
davidhewitt (Collaborator, Author)

Despite the use of a mutex here, the single-threaded benchmark is not meaningfully impacted.

We can worry about multithreaded performance in the future if highly parallel uses of jiter arise; and if users hit a pathological case before we do anything fancy, they can always turn off the string cache.
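For context, the escape hatch mentioned above is the existing cache_mode parameter on the Python API. A minimal sketch of opting out per call (the sample payload is illustrative):

import jiter

data = b'{"name": "jiter", "free_threaded": true}'  # illustrative payload

# Default: string caching on; on the free-threaded build the shared
# cache now sits behind the Mutex shown in the diff above.
parsed = jiter.from_json(data)

# Opt out of the string cache entirely, bypassing the Mutex.
parsed_no_cache = jiter.from_json(data, cache_mode='none')

assert parsed == parsed_no_cache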

Comment on lines +355 to +369
def test_multithreaded_parsing():
"""Basic sanity check that running a parse in multiple threads is fine."""
expected_datas = [json.loads(data) for data in JITER_BENCH_DATAS]

def assert_jiter_ok(data: bytes, expected: Any) -> bool:
return jiter.from_json(data) == expected

with ThreadPoolExecutor(8) as pool:
results = []
for _ in range(1000):
for data, expected_result in zip(JITER_BENCH_DATAS, expected_datas):
results.append(pool.submit(assert_jiter_ok, data, expected_result))

for result in results:
assert result.result()
davidhewitt (Collaborator, Author)

I can confirm with this simple test that the Mutex basically stops parallelism when the cache is enabled; using jiter.from_json(data, cache_mode="none") leads to about an 8x speedup on my machine.

I don't mind that much for now; this PR just gets the free-threaded mode working, and we can worry about that sort of optimization later.
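For anyone reproducing that figure, here is a minimal sketch of the implied scaling check; the inline SAMPLES list is a stand-in for the JITER_BENCH_DATAS used in the test above, and the exact speedup will vary with core count and build:

from concurrent.futures import ThreadPoolExecutor
import time

import jiter

# Stand-in for JITER_BENCH_DATAS; any list of JSON byte strings works.
SAMPLES = [b'{"a": 1, "b": [1, 2, 3]}', b'["x", "y", "z"]'] * 50


def run(n_threads: int, repeats: int = 200) -> float:
    """Wall-clock seconds to parse every sample `repeats` times."""
    start = time.monotonic()
    with ThreadPoolExecutor(n_threads) as pool:
        futures = [
            pool.submit(jiter.from_json, sample, cache_mode='none')
            for _ in range(repeats)
            for sample in SAMPLES
        ]
        for future in futures:
            future.result()
    return time.monotonic() - start


# Near 1.0 on a GIL-enabled build; approaches the thread count on a
# free-threaded build where parses genuinely run in parallel.
print('speedup:', run(1) / run(8))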

davidhewitt marked this pull request as ready for review on November 19, 2024 at 20:33
davidhewitt (Collaborator, Author)

@samuelcolvin I think this is good to ship; the main thing for you to be aware of is the mutex on the cache, as per the above, so parallelism is not as good as it could be when the cache is enabled.

samuelcolvin (Member) left a comment

LGTM

Please can we add something to the README explaining that you'll want to set the string cache to none if you want free-threading support.

An example in the README of how free-threading can speed things up might also be nice.

davidhewitt (Collaborator, Author)

Ok, so in the process of writing the README I wrote a more complete measurement:

Measurement script
from concurrent.futures import ThreadPoolExecutor
import time
import jiter

from pathlib import Path

JITER_BENCH_DIR = Path(__file__).parent.parent / 'jiter' / 'benches'

JITER_BENCH_SAMPLES = [
    (JITER_BENCH_DIR / 'bigints_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'floats_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'massive_ints_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'medium_response.json').read_bytes(),
    (JITER_BENCH_DIR / 'pass1.json').read_bytes(),
    (JITER_BENCH_DIR / 'pass2.json').read_bytes(),
    (JITER_BENCH_DIR / 'sentence.json').read_bytes(),
    (JITER_BENCH_DIR / 'short_numbers.json').read_bytes(),
    (JITER_BENCH_DIR / 'string_array_unique.json').read_bytes(),
    (JITER_BENCH_DIR / 'string_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'true_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'true_object.json').read_bytes(),
    (JITER_BENCH_DIR / 'unicode.json').read_bytes(),
    (JITER_BENCH_DIR / 'x100.json').read_bytes(),
]

# warmup run, deliberately don't fill cache
with ThreadPoolExecutor(8) as pool:
    results = []
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='none'))

del results[:]

# run without cache
with ThreadPoolExecutor(8) as pool:
    results = []
    start = time.monotonic()
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='none'))
    no_cache_duration = time.monotonic() - start

del results[:]

# run with cache
with ThreadPoolExecutor(8) as pool:
    results = []
    start = time.monotonic()
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='all'))
    with_cache_duration = time.monotonic() - start

del results[:]

print('ratio:', with_cache_duration / no_cache_duration)

On my M1 Mac, this suggests that cache_mode='all' is still about 2x faster than cache_mode='none', even with the Mutex on the free-threaded build.

Also, testing against GIL-enabled Python I see a similar total execution time but a lot more total CPU consumed. The initial free-threaded Python release is slow :)

Overall, I think this means my previous observation that the cache was a problem on the free-threaded build was premature, and we can leave it out of the README for now.
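As a footnote on the wall-time vs CPU-time observation above, here is a minimal sketch of how one might quantify it with stdlib timers; run it once under a GIL-enabled interpreter and once under a free-threaded one (the payload and iteration count are illustrative):

from concurrent.futures import ThreadPoolExecutor
import sys
import time

import jiter

SAMPLE = b'{"k": "v", "n": [1, 2, 3]}'  # illustrative payload

wall_start = time.monotonic()
cpu_start = time.process_time()  # CPU time summed across all threads

with ThreadPoolExecutor(8) as pool:
    futures = [pool.submit(jiter.from_json, SAMPLE) for _ in range(50_000)]
    for future in futures:
        future.result()

wall = time.monotonic() - wall_start
cpu = time.process_time() - cpu_start

# sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL is on elsewhere.
gil = getattr(sys, '_is_gil_enabled', lambda: True)()
print(f'GIL enabled: {gil}, wall: {wall:.2f}s, CPU: {cpu:.2f}s, CPU/wall: {cpu / wall:.1f}')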

davidhewitt merged commit 72fc9ef into main on Nov 26, 2024 (57 of 59 checks passed)
davidhewitt deleted the dh/free-threaded branch on November 26, 2024 at 14:21