support free-threaded Python #165

davidhewitt · 2024-11-19T11:57:53Z

Add free-threaded Python support.

Pushing a bit early so that I can see results of benchmarks & CI.

codecov · 2024-11-19T11:59:41Z

Codecov Report

Attention: Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.74%. Comparing base (f970f0b) to head (afc89e6).
Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/jiter/src/py_string_cache.rs	61.53%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #165      +/-   ##
==========================================
- Coverage   88.92%   88.74%   -0.19%     
==========================================
  Files          13       13              
  Lines        2195     2203       +8     
  Branches     2195     2203       +8     
==========================================
+ Hits         1952     1955       +3     
- Misses        148      153       +5     
  Partials       95       95

Files with missing lines	Coverage Δ
crates/jiter-python/src/lib.rs	`93.22% <100.00%> (ø)`
crates/jiter/src/py_string_cache.rs	`92.98% <61.53%> (-4.19%)`	⬇️

... and 1 file with indirect coverage changes

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

codspeed-hq · 2024-11-19T12:02:34Z

CodSpeed Performance Report

Merging #165 will improve performances by 11.93%

Comparing dh/free-threaded (afc89e6) with main (dbf0c52)

Summary

⚡ 1 improvements
✅ 72 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`dh/free-threaded`	Change
⚡	`unicode_jiter_iter`	8.6 µs	7.7 µs	+11.93%

davidhewitt · 2024-11-19T13:20:04Z

crates/jiter/src/py_string_cache.rs

@@ -86,28 +85,34 @@ impl StringMaybeCache for StringNoCache {
    }
 }

-static STRING_CACHE: GILOnceCell<GILProtected<RefCell<PyStringCache>>> = GILOnceCell::new();
+static STRING_CACHE: OnceLock<Mutex<PyStringCache>> = OnceLock::new();


Despite the use of a mutex here, the single-threaded benchmark is not meaningfully impacted.

We can worry about multithreaded performance later, if highly parallel uses of jiter arise. If users hit a pathological case before we do anything fancier, they can always turn off the string cache.

davidhewitt · 2024-11-19T14:57:17Z

crates/jiter-python/tests/test_jiter.py

+def test_multithreaded_parsing():
+    """Basic sanity check that running a parse in multiple threads is fine."""
+    expected_datas = [json.loads(data) for data in JITER_BENCH_DATAS]
+
+    def assert_jiter_ok(data: bytes, expected: Any) -> bool:
+        return jiter.from_json(data) == expected
+
+    with ThreadPoolExecutor(8) as pool:
+        results = []
+        for _ in range(1000):
+            for data, expected_result in zip(JITER_BENCH_DATAS, expected_datas):
+                results.append(pool.submit(assert_jiter_ok, data, expected_result))
+
+        for result in results:
+            assert result.result()


I can confirm with this simple test that the Mutex basically stops parallelism when the cache is enabled; using jiter.from_json(data, cache_mode="none") leads to about an 8x speedup on my machine.

I still don't mind that really: this PR just gets the free-threaded mode working, and we can worry about that sort of optimization later.

davidhewitt · 2024-11-19T20:34:16Z

@samuelcolvin I think this is good to ship; main thing for you to be aware of is the mutex on the cache as per the above, so parallelism is not as good as it could be with the cache.

crates/jiter-python/src/lib.rs

samuelcolvin

LGTM

Please can we add something to the readme explaining that you'll want to set the string cache to none if you want free-threading support.

Also an example in the readme of how free threading can speed things up might be nice.

davidhewitt · 2024-11-26T14:20:41Z

Ok, so in the process of writing the README I wrote a more complete measurement:

Measurement script

from concurrent.futures import ThreadPoolExecutor
import time
import jiter

from pathlib import Path

JITER_BENCH_DIR = Path(__file__).parent.parent / 'jiter' / 'benches'

JITER_BENCH_SAMPLES = [
    (JITER_BENCH_DIR / 'bigints_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'floats_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'massive_ints_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'medium_response.json').read_bytes(),
    (JITER_BENCH_DIR / 'pass1.json').read_bytes(),
    (JITER_BENCH_DIR / 'pass2.json').read_bytes(),
    (JITER_BENCH_DIR / 'sentence.json').read_bytes(),
    (JITER_BENCH_DIR / 'short_numbers.json').read_bytes(),
    (JITER_BENCH_DIR / 'string_array_unique.json').read_bytes(),
    (JITER_BENCH_DIR / 'string_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'true_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'true_object.json').read_bytes(),
    (JITER_BENCH_DIR / 'unicode.json').read_bytes(),
    (JITER_BENCH_DIR / 'x100.json').read_bytes(),
]

# warmup run, deliberately don't fill cache
with ThreadPoolExecutor(8) as pool:
    results = []
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='none'))

del results[:]

# run without cache
with ThreadPoolExecutor(8) as pool:
    results = []
    start = time.monotonic()
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='none'))
    no_cache_duration = time.monotonic() - start

del results[:]

# run with cache
with ThreadPoolExecutor(8) as pool:
    results = []
    start = time.monotonic()
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='all'))
    with_cache_duration = time.monotonic() - start

del results[:]

print('ratio:', with_cache_duration / no_cache_duration)

On my M1 Mac, this suggests that cache_mode='all' is still about 2x faster than cache_mode='none', even with the Mutex on the free-threaded build.

Also, testing against the GIL-enabled Python I see a similar total execution time, but a lot more total CPU consumed. Initial free-threaded Python release is slow :)

Overall, I think this means my previous observation that the cache was a problem on the free-threaded build was premature, and we can leave it out of the README for now.
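
A hedged sketch of a variant of the measurement script: it reads the clock only after every submitted parse has returned, and it reports whether the GIL is active in the running interpreter. To stay self-contained it times `json.loads` as a stand-in parser. `gil_enabled` and `time_parallel` are hypothetical helper names, and `sys._is_gil_enabled` is only available on Python 3.13 and later, hence the fallback.

```python
import json
import sys
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def gil_enabled() -> bool:
    """Return False only when the interpreter reports that the GIL is disabled."""
    check = getattr(sys, "_is_gil_enabled", None)  # present on Python 3.13+
    return True if check is None else bool(check())


def time_parallel(
    parse: Callable[[bytes], object],
    samples: Sequence[bytes],
    repeats: int = 100,
    workers: int = 8,
) -> float:
    """Seconds from the first submit until every parse has finished."""
    with ThreadPoolExecutor(workers) as pool:
        start = time.monotonic()
        futures = [pool.submit(parse, s) for _ in range(repeats) for s in samples]
        for future in futures:
            future.result()  # blocks until done and re-raises any parse error
        return time.monotonic() - start


# Stand-in inputs and parser (assumption): small JSON documents and json.loads.
SAMPLES = [b'{"a": [1, 2, 3]}', b'["x", "y", "z"]']
elapsed = time_parallel(json.loads, SAMPLES, repeats=50)
print("GIL enabled:", gil_enabled())
print(f"elapsed: {elapsed:.4f}s")
```

With jiter installed, the same helper could be called with `lambda b: jiter.from_json(b, cache_mode='none')` and then with `cache_mode='all'` to compare the two modes over the benchmark samples.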

davidhewitt added 3 commits November 19, 2024 13:13

support free-threaded Python

37c94aa

fix benches

9bf932a

adjust job names

aa38b16

davidhewitt force-pushed the dh/free-threaded branch from e4ab0b7 to aa38b16 Compare November 19, 2024 13:13

davidhewitt added 2 commits November 19, 2024 13:55

clean up .expect() calls

f4d306f

add multithreaded test

27d50ab

davidhewitt force-pushed the dh/free-threaded branch from 98b54fd to 787d2b0 Compare November 19, 2024 14:54

davidhewitt added the Full Build label Nov 19, 2024

try add PGO build for 3.13t

58dace4

davidhewitt force-pushed the dh/free-threaded branch from 787d2b0 to 58dace4 Compare November 19, 2024 15:01

davidhewitt added 6 commits November 19, 2024 15:02

use quansight labs setup python

64f01b8

remove --interpreter flag from PGO builds

861a787

fix maturin --interpreter option

6dd046a

help maturin-action find freethreaded Python

7593606

fixup

910cafb

fixup interpreter for macos

0f9d876

davidhewitt mentioned this pull request Nov 19, 2024

Python 3.13t support PyO3/maturin-action#300

Closed

add maturin issue link

8646585

davidhewitt marked this pull request as ready for review November 19, 2024 20:33

Update crates/jiter-python/src/lib.rs

afc89e6

rostan-t mentioned this pull request Nov 22, 2024

Plan to support free-threaded Python pydantic/pydantic-core#1555

Open

samuelcolvin approved these changes Nov 26, 2024

davidhewitt merged commit 72fc9ef into main Nov 26, 2024
57 of 59 checks passed

davidhewitt deleted the dh/free-threaded branch November 26, 2024 14:21
