Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Meta] Research: what can we test with Hypothesis? #107862

Open
sobolevn opened this issue Aug 11, 2023 · 10 comments
Open

[Meta] Research: what can we test with Hypothesis? #107862

sobolevn opened this issue Aug 11, 2023 · 10 comments
Labels
tests Tests in the Lib/test dir type-feature A feature request or enhancement

Comments

@sobolevn
Copy link
Member

sobolevn commented Aug 11, 2023

We now have both the code and CI job to run hypothesis tests.

In this issue I invite everyone to propose ideas: what can we test with it?
Criteria:

  1. Tests should be suitable for property-based nature of hypothesis
  2. The action under test should be fast and reliable (since hypothesis will produce a lot of cases to tests, we cannot do any slow / network related things)
  3. Strategies for data generation should be rather simple (I think that later we can get to more complex strategies, but for now for the sake of simplicity and maintability I propose not to get too crazy with them)

What else?

Known applications

Property-based tests are great when there are certain patterns:

  • decode and encode
  • dumps and loads

There are also some hidden patterns as (example):

  • If 'a' in s then
  • s.index('a') must return an integer

Existing work

Good examples of exising stuff:

Linked PRs

@sobolevn sobolevn added type-feature A feature request or enhancement tests Tests in the Lib/test dir labels Aug 11, 2023
@sobolevn
Copy link
Member Author

Right now I've ported and modified several binascii tests from https://github.com/Zac-HD/stdlib-property-tests/blob/master/tests/test_encode_decode.py :)

@Zac-HD
Copy link
Contributor

Zac-HD commented Aug 12, 2023

I'd suggest looking at PyPy's property-based tests, since they're known to be useful for developers of a Python implementation.

I'd also reconsider (3). While complicated strategies are obviously more work to develop and use, in my experience they're also disproportionately likely to find bugs - precisely because code working with complicated data structures are even more difficult to test without Hypothesis, and so there are usually weird edge cases.

Examples of this effect include #84838, #86384, #89901, and #83134. The hypothesmith source-code strategies are at a pre-alpha / proof-of-concept stage and have more third-party dependencies so I'm not sure they'd be a good fit for CPython at this time, but you get the idea. I don't have capacity to implement it myself, but could supervise a volunteer or contractor to build a production-ready strategy if there's interest.

@rhettinger
Copy link
Contributor

Personally, I don't think we should go down this path. Hypothesis has a certain amount of randomness to it. In general, our tests should be specifically designed to cover every path of interest.

When someone is designing new code, it is perfectly reasonable to use Hypothesis. However, for existing code, we "let's use Hypothesis" isn't a goal directed as a specific problem. One issue that we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules that they fully understand, that the invariants they mentally invent aren't the actual invariants and reflect bugs in their own understanding rather than actual bugs in the code. For example, we got reports on colorsys conversions not being exactly invertible; however, due to color gamut limitations they can't always be inverted (information is lost) and the conversion tables were dictated by published standards that were invertible. We also got false report on the random module methods by people who just let Hypothesis plug in extreme values without any thought of what the method was actually supposed to do in real use cases. There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.

It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage-up our test suite, make it run less deterministically, and no explicitly write-out all the cases being covered.

@brandonardenwalli

This comment was marked as off-topic.

@sobolevn

This comment was marked as off-topic.

@brandonardenwalli

This comment was marked as off-topic.

@sobolevn
Copy link
Member Author

@rhettinger

It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage-up our test suite, make it run less deterministically, and no explicitly write-out all the cases being covered.

But, we already have Hypothesis as a part of your test suite and CI. It already has some tests to it. Link:

test_hypothesis:
name: "Hypothesis tests on Ubuntu"
runs-on: ubuntu-20.04
timeout-minutes: 60
needs: check_source
if: needs.check_source.outputs.run_tests == 'true' && needs.check_source.outputs.run_hypothesis == 'true'
env:
OPENSSL_VER: 1.1.1v
PYTHONSTRICTEXTENSIONBUILD: 1
steps:
- uses: actions/checkout@v3
- name: Register gcc problem matcher
run: echo "::add-matcher::.github/problem-matchers/gcc.json"
- name: Install Dependencies
run: sudo ./.github/workflows/posix-deps-apt.sh
- name: Configure OpenSSL env vars
run: |
echo "MULTISSL_DIR=${GITHUB_WORKSPACE}/multissl" >> $GITHUB_ENV
echo "OPENSSL_DIR=${GITHUB_WORKSPACE}/multissl/openssl/${OPENSSL_VER}" >> $GITHUB_ENV
echo "LD_LIBRARY_PATH=${GITHUB_WORKSPACE}/multissl/openssl/${OPENSSL_VER}/lib" >> $GITHUB_ENV
- name: 'Restore OpenSSL build'
id: cache-openssl
uses: actions/cache@v3
with:
path: ./multissl/openssl/${{ env.OPENSSL_VER }}
key: ${{ runner.os }}-multissl-openssl-${{ env.OPENSSL_VER }}
- name: Install OpenSSL
if: steps.cache-openssl.outputs.cache-hit != 'true'
run: python3 Tools/ssl/multissltests.py --steps=library --base-directory $MULTISSL_DIR --openssl $OPENSSL_VER --system Linux
- name: Add ccache to PATH
run: |
echo "PATH=/usr/lib/ccache:$PATH" >> $GITHUB_ENV
- name: Configure ccache action
uses: hendrikmuhs/ccache-action@v1.2
- name: Setup directory envs for out-of-tree builds
run: |
echo "CPYTHON_RO_SRCDIR=$(realpath -m ${GITHUB_WORKSPACE}/../cpython-ro-srcdir)" >> $GITHUB_ENV
echo "CPYTHON_BUILDDIR=$(realpath -m ${GITHUB_WORKSPACE}/../cpython-builddir)" >> $GITHUB_ENV
- name: Create directories for read-only out-of-tree builds
run: mkdir -p $CPYTHON_RO_SRCDIR $CPYTHON_BUILDDIR
- name: Bind mount sources read-only
run: sudo mount --bind -o ro $GITHUB_WORKSPACE $CPYTHON_RO_SRCDIR
- name: Restore config.cache
uses: actions/cache@v3
with:
path: ${{ env.CPYTHON_BUILDDIR }}/config.cache
key: ${{ github.job }}-${{ runner.os }}-${{ needs.check_source.outputs.config_hash }}
- name: Configure CPython out-of-tree
working-directory: ${{ env.CPYTHON_BUILDDIR }}
run: |
../cpython-ro-srcdir/configure \
--config-cache \
--with-pydebug \
--with-openssl=$OPENSSL_DIR
- name: Build CPython out-of-tree
working-directory: ${{ env.CPYTHON_BUILDDIR }}
run: make -j4
- name: Display build info
working-directory: ${{ env.CPYTHON_BUILDDIR }}
run: make pythoninfo
- name: Remount sources writable for tests
# some tests write to srcdir, lack of pyc files slows down testing
run: sudo mount $CPYTHON_RO_SRCDIR -oremount,rw
- name: Setup directory envs for out-of-tree builds
run: |
echo "CPYTHON_BUILDDIR=$(realpath -m ${GITHUB_WORKSPACE}/../cpython-builddir)" >> $GITHUB_ENV
- name: "Create hypothesis venv"
working-directory: ${{ env.CPYTHON_BUILDDIR }}
run: |
VENV_LOC=$(realpath -m .)/hypovenv
VENV_PYTHON=$VENV_LOC/bin/python
echo "HYPOVENV=${VENV_LOC}" >> $GITHUB_ENV
echo "VENV_PYTHON=${VENV_PYTHON}" >> $GITHUB_ENV
./python -m venv $VENV_LOC && $VENV_PYTHON -m pip install -U hypothesis
- name: 'Restore Hypothesis database'
id: cache-hypothesis-database
uses: actions/cache@v3
with:
path: ./hypothesis
key: hypothesis-database-${{ github.head_ref || github.run_id }}
restore-keys: |
- hypothesis-database-
- name: "Run tests"
working-directory: ${{ env.CPYTHON_BUILDDIR }}
run: |
# Most of the excluded tests are slow test suites with no property tests
#
# (GH-104097) test_sysconfig is skipped because it has tests that are
# failing when executed from inside a virtual environment.
${{ env.VENV_PYTHON }} -m test \
-W \
-o \
-j4 \
-x test_asyncio \
-x test_multiprocessing_fork \
-x test_multiprocessing_forkserver \
-x test_multiprocessing_spawn \
-x test_concurrent_futures \
-x test_socket \
-x test_subprocess \
-x test_signal \
-x test_sysconfig
- uses: actions/upload-artifact@v3
if: always()
with:
name: hypothesis-example-db
path: .hypothesis/examples/

I propose adding more cases, where it makes sense.

In general, our tests should be specifically designed to cover every path of interest.

I agree that our regular tests should cover all paths, but there are more to it. Path coverage is only as good as 100%. But, we are obviously limited by the number of data we can provide. We cannot come up with lots of data, our current test suites proove my point.

But, Hypothesis can. This is exactly what it is good at: proving lots of correctly structured data.

Hypothesis has a certain amount of randomness to it

There are different way on how we control the randomness. First, we use a database with examples:

from hypothesis.database import (
GitHubArtifactDatabase,
MultiplexedDatabase,
ReadOnlyDatabase,
)
hypothesis.settings.register_profile(
"cpython-local-dev",
database=MultiplexedDatabase(
hypothesis.settings.default.database,
ReadOnlyDatabase(GitHubArtifactDatabase("python", "cpython")),
),
)
hypothesis.settings.load_profile("cpython-local-dev")
Plus, Hypothesis itself controls how data is generated to be repeatable.

One issue that we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules that they fully understand, that the invariants they mentally invent aren't the actual invariants and reflect bugs in their own understanding rather than actual bugs in the code

This is a valid concern, I hope that collaboration among developers can solve this.
And this is exactly why I created this issue: to figure out what is worth testing and what not.

There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.

I don't think that this is actually correct. First, we cannot know what bugs were found by Hypothesis, because people might just post the simplest repro without naming the tool.

Second, we have these bugs that do mention Hypothesis:

Refs #86275

@blaisep
Copy link
Contributor

blaisep commented May 21, 2024

btw, I'm new to Hypothesis and the test_zoneinfo from @pganssle has lots of different use cases, the location is
https://github.com/pganssle/zoneinfo/blob/master/tests/test_zoneinfo_property.py


Edit: Thanks for all your kind warnings, I didn't actually try to use it. I just stared into the abyss and it stared back at me. It's fascinating in an academic sense. The way that some people read about https://en.wikipedia.org/wiki/Australian_funnel-web_spider

@pganssle
Copy link
Member

pganssle commented May 21, 2024

I do think we should be expanding hypothesis tests. There was no stipulation in the original acceptance that they would only be used for zoneinfo, the idea was to use property tests more widely.

We should at least start by migrating whichever tests in Zac's repo still make sense (probably most of them).

Since they don't run on all CI, they should always come with @examples, in which case they are essentially parametrized tests (the stubs run the examples).

@Zac-HD
Copy link
Contributor

Zac-HD commented May 22, 2024

@encukou and I have been working on adding some more tests + adding a devguide section about Hypothesis. #119345 will also be nice to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

6 participants