
Fix strategy validation thread-safety #4473


Merged

Conversation

@tybug (Member) commented Jul 15, 2025

Part of #4451. This one is pretty rare, only triggering about once per full test suite run under --parallel-threads 2. There might be a way to write do_validate that takes recursion into account and only sets validate_called = True at the end, but that's a bigger rewrite.

I'm somewhat worried about the unconditional lock overhead in the single-threaded case. It's ~pure lock/unlock overhead; I expect ~zero contention. Here's a Claude-written benchmark of .validate for lambda i: st.integers().map(lambda x: x + i), where the lambda i and .map are cache-busters so each strategy really gets .validate called.

master : 0.000080612s per call (12405 calls/sec)
pr     : 0.000086325s per call (11584 calls/sec)
Details
import random
import statistics
import timeit

from hypothesis import strategies as st

def benchmark_integers_validate():
    """Benchmark st.integers().validate() performance with cached strategies"""

    # Test different integer strategy configurations with unique mapped functions
    test_cases = [
        ("default", lambda i: st.integers().map(lambda x: x + i)),
        ("bounded", lambda i: st.integers(min_value=0, max_value=1000).map(lambda x: x * i)),
        ("min_only", lambda i: st.integers(min_value=0).map(lambda x: x - i)),
        ("max_only", lambda i: st.integers(max_value=1000).map(lambda x: x // (i + 1))),
        ("large_range", lambda i: st.integers(min_value=-1000000, max_value=1000000).map(lambda x: x % (i + 1))),
    ]

    results = {}

    for name, strategy_func in test_cases:
        print(f"\nBenchmarking {name} integers strategy with unique mapped functions:")

        # Benchmark creating and validating unique strategies
        def run_validate_with_unique_strategy():
            # Create a unique strategy for each call using a different function
            i = random.randint(1, 1000)  # Random value to make function unique
            strategy = strategy_func(i)
            return strategy.validate()

        # Run multiple times for accuracy
        times = timeit.repeat(run_validate_with_unique_strategy, repeat=100, number=100)

        # Calculate statistics
        mean_time = statistics.mean(times)
        median_time = statistics.median(times)
        min_time = min(times)
        max_time = max(times)
        std_dev = statistics.stdev(times) if len(times) > 1 else 0

        results[name] = {
            'mean': mean_time,
            'median': median_time,
            'min': min_time,
            'max': max_time,
            'std_dev': std_dev,
            'total_calls': 100 * 100
        }

        print(f"  Mean time: {mean_time:.6f} seconds per 100 calls")
        print(f"  Median time: {median_time:.6f} seconds per 100 calls")
        print(f"  Min time: {min_time:.6f} seconds per 100 calls")
        print(f"  Max time: {max_time:.6f} seconds per 100 calls")
        print(f"  Std dev: {std_dev:.6f} seconds")
        print(f"  Per call: {mean_time/100:.9f} seconds")
        print(f"  Calls per second: {100/mean_time:.0f}")

    # Summary comparison
    print("\n" + "="*50)
    print("SUMMARY COMPARISON")
    print("="*50)

    sorted_results = sorted(results.items(), key=lambda x: x[1]['mean'])

    for name, stats in sorted_results:
        print(f"{name:15s}: {stats['mean']/100:.9f}s per call ({100/stats['mean']:.0f} calls/sec)")

    return results

def benchmark_strategy_caching_comparison():
    """Compare performance of reused vs unique strategies"""

    print("\n" + "="*60)
    print("STRATEGY CACHING COMPARISON")
    print("="*60)

    # Test 1: Reusing the same strategy
    print("\nTest 1: Reusing same strategy")
    same_strategy = st.integers().map(lambda x: x + 1)

    def run_same_strategy():
        return same_strategy.validate()

    times_same = timeit.repeat(run_same_strategy, repeat=50, number=1000)
    mean_same = statistics.mean(times_same)

    print(f"  Mean time (same strategy): {mean_same:.6f} seconds per 1000 calls")
    print(f"  Per call: {mean_same/1000:.9f} seconds")

    # Test 2: Creating unique strategies each time
    print("\nTest 2: Creating unique strategies each time")

    def run_unique_strategies():
        i = random.randint(1, 1000000)
        strategy = st.integers().map(lambda x: x + i)
        return strategy.validate()

    times_unique = timeit.repeat(run_unique_strategies, repeat=50, number=100)
    mean_unique = statistics.mean(times_unique)

    print(f"  Mean time (unique strategies): {mean_unique:.6f} seconds per 100 calls")
    print(f"  Per call: {mean_unique/100:.9f} seconds")

    # Compare
    print(f"\nComparison:")
    print(f"  Same strategy per call: {mean_same/1000:.9f}s")
    print(f"  Unique strategy per call: {mean_unique/100:.9f}s")
    print(f"  Overhead ratio: {(mean_unique/100) / (mean_same/1000):.2f}x")
    print(f"  Caching saves: {((mean_unique/100) - (mean_same/1000)) / (mean_unique/100) * 100:.1f}% per call")

if __name__ == "__main__":
    print("Benchmarking st.integers().validate() performance with cached strategies...")
    benchmark_integers_validate()
    benchmark_strategy_caching_comparison()

Around 7% slower. Not great, since I've seen .validate be a performance hotspot before.

The following command reproduces the relevant failure on master: counter=1; while pytest hypothesis-python/tests/ --parallel-threads 2 -k test_invalid_args; do ((counter++)); done;

@tybug tybug force-pushed the free-threading-strategy-validation branch from 9378c6f to 42299bb on July 15, 2025 05:52

with validate_lock:
    try:
        self.validate_called = True
Contributor:

Should validate_called = True be moved below do_validate(), given the early return outside the lock on l.486?

That would leave open the possibility of do_validate being called multiple times in an initialization race (unless explicitly checked inside), though that's probably harmless?

Contributor:

Hm... per the comment above, this isn't sufficient. Maybe the early return needs to move inside the critical region, or, if that is too expensive, possibly a two-stage process, i.e. an early return like

if self.validate_called and not self.validate_in_progress:
    return

I'm not thinking clearly right now, so please consider this a hint, not a recipe.

@tybug (Member, Author) Jul 15, 2025:

if self.validate_called and not self.validate_in_progress: is also going to run into infinite recursion, but possibly what we could do is keep validate_calleds: dict[int, bool], which tracks thread_id: validate_called, and only return early if validate has already been called on this thread. This would be lock-free and allow for concurrent validates. If threading.get_ident() is not expensive then this could work. I'll test it.

@tybug (Member, Author):

The downside of this is that every thread reruns validation for every strategy. It's a tradeoff between single-threaded and multithreaded performance. I'm defaulting to prioritizing single-threaded performance, but I could see us changing this in the future (or now, if people have preferences).

Member:

I'd prefer to keep prioritizing single-threaded perf for now, at least when it's not super-lopsided.

Potential caveat in this case: is it safe to concurrently run validation for a strategy in multiple threads? If not, we should probably go with the smallish cost of a lock.

@tybug (Member, Author):

Concurrent validation should be safe, yeah (in theory, and I've tested a full run in practice).

@tybug (Member, Author) commented Jul 15, 2025

I went with the approach described in #4473 (comment). Benchmark:

validate:  master
validate2: pr-threading-ident
validate3: pr-locks

[benchmark image: per-call timing comparison of the three variants]

@tybug tybug merged commit ad9d60c into HypothesisWorks:master Jul 15, 2025
120 of 121 checks passed
@tybug tybug deleted the free-threading-strategy-validation branch July 15, 2025 23:20
spencerkclark added a commit to pydata/xarray that referenced this pull request Jul 17, 2025
In the next version of `hypothesis`, subclasses of `hypothesis.strategies.SearchStrategy` will be required to call `super().__init__()` in their `__init__` method (HypothesisWorks/hypothesis#4473). This PR addresses this in the two subclasses in our codebase: `CFTimeStrategy` and `CFTimeStrategyISO8601`.

Apparently this kind of subclassing is not actually part of the public API ([link](https://github.com/HypothesisWorks/hypothesis/pull/4473/files#diff-9abc0311b216f25f0b71cfff6b7043b22071d09a58cb949f6bc5022ddeaa8e7f)), so maybe we should adjust the approach here long term, but this at least gets the tests passing for now.

- [x] Closes #10541