
Compare sampled normal distribution to PDF #1121

Merged · 14 commits · May 12, 2021
Conversation

@vks (Collaborator) commented May 7, 2021

This tests that we are sampling the normal distribution correctly. For now, this is limited to the normal distribution, but it can be extended to other distributions as well. I'm using sparklines to make debugging easier:

Sampled normal distribution:
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▃▃▄▄▅▅▆▆▇▇▇▇█▇▇▇▇▇▆▆▅▅▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Expected normal distribution:
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▃▃▄▄▅▅▆▆▇▇▇▇█▇▇▇▇▇▆▆▅▅▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Difference:
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▃▅▂▂▄▁▁▆▄▁▃▂▁▁█▃▅▂▂▂▁▂▄▂▃▅▂▁▁▂▂▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
max diff: 0.006058964254376686
max expected_error: 0.0035582208162928813

(This looks a bit better in the terminal.)

Refs #357.
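The sparklines above map each histogram bin to one of eight Unicode block characters, scaled so the tallest bin gets the full block. A minimal sketch of such rendering (the `sparkline` helper here is hypothetical, not the PR's actual code):

```rust
/// Render a slice of non-negative values as a sparkline string,
/// scaling so the maximum value maps to the tallest block character.
fn sparkline(values: &[f64]) -> String {
    const BLOCKS: [char; 8] = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
    let max = values.iter().cloned().fold(f64::MIN, f64::max);
    if max <= 0.0 {
        // Degenerate input: render the flat baseline.
        return BLOCKS[0].to_string().repeat(values.len());
    }
    values
        .iter()
        .map(|&v| {
            // Scale v into [0, 7] and pick the corresponding block.
            let idx = ((v / max) * 7.0).round() as usize;
            BLOCKS[idx.min(7)]
        })
        .collect()
}

fn main() {
    // A triangular profile renders as a rising-then-falling sparkline.
    println!("{}", sparkline(&[0.0, 0.5, 1.0, 0.5, 0.0]));
}
```

Printing the sampled histogram, the expected PDF, and their absolute difference with such a helper gives exactly the three lines shown above.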

vks added 4 commits May 7, 2021 17:44
Those will only be used for testing distributions.
`Normal` is sampled many times into a histogram, which is then compared
to the expected probability density function. To make debugging easier,
sparklines of the expected distribution, the histogram, and their
difference are printed.

Currently, the test fails if the difference is significantly larger than
the expected error of the histogram bin. However, the error estimate
does not take the error in the normalization due to the finite width of
the histogram into account. This should not be a problem, as long as the
distribution is almost zero outside the range covered by the histogram.

In principle, this approach can be generalized to other distributions.
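The histogram-versus-PDF comparison described in this commit message can be sketched without the `rand` crates, using a toy xorshift generator and Box-Muller sampling as stand-ins for `rand`'s machinery (the `XorShift64` and `max_histogram_deviation` names are hypothetical; the PR itself samples `rand_distr::Normal`):

```rust
use std::f64::consts::PI;

/// Toy xorshift RNG (a stand-in for the `rand` crate, not from this PR).
struct XorShift64(u64);

impl XorShift64 {
    /// Return a uniform f64 in [0, 1) using the top 53 bits of state.
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Standard normal probability density function.
fn normal_pdf(x: f64) -> f64 {
    (-0.5 * x * x).exp() / (2.0 * PI).sqrt()
}

/// Sample `n` standard-normal values via Box-Muller, histogram them over
/// [-5, 5), and return the maximum absolute deviation between the
/// normalized bin heights and the PDF at each bin center. The tails
/// outside [-5, 5) carry negligible mass, matching the commit message's
/// caveat about the finite histogram range.
fn max_histogram_deviation(n: usize, bins: usize) -> f64 {
    let (lo, hi) = (-5.0, 5.0);
    let width = (hi - lo) / bins as f64;
    let mut hist = vec![0u64; bins];
    let mut rng = XorShift64(0x9E3779B97F4A7C15);
    for _ in 0..n / 2 {
        // Box-Muller: two uniforms yield two independent normals.
        let u1 = rng.next_f64().max(1e-12); // avoid ln(0)
        let u2 = rng.next_f64();
        let r = (-2.0 * u1.ln()).sqrt();
        for z in [r * (2.0 * PI * u2).cos(), r * (2.0 * PI * u2).sin()] {
            if z >= lo && z < hi {
                hist[((z - lo) / width) as usize] += 1;
            }
        }
    }
    (0..bins)
        .map(|i| {
            let center = lo + (i as f64 + 0.5) * width;
            // Normalize counts to a density: count / (n * bin_width).
            let density = hist[i] as f64 / (n as f64 * width);
            (density - normal_pdf(center)).abs()
        })
        .fold(0.0, f64::max)
}

fn main() {
    let diff = max_histogram_deviation(1_000_000, 100);
    println!("max diff: {diff}");
}
```

With a million samples and 100 bins, the per-bin standard error is roughly `sqrt(p / (n * width)) / sqrt(n) ≈ 0.002` in density units, which is consistent with the ~0.006 maximum deviation reported above.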
@dhardy (Member) commented May 8, 2021

Excellent!

One million samples: what is the run-time, and is this accurate enough to be really useful? Should we run it in all test-suite runs or put it behind a feature flag and limit when it runs?

@TheIronBorn (Collaborator) commented:

We could verify other qualities as well. Mean, standard deviation, skewness, kurtosis, etc. See https://github.com/miloyip/normaldist-benchmark#procedure

@vks (Collaborator, Author) commented May 8, 2021

@dhardy

One million samples: what is the run-time, and is this accurate enough to be really useful?

One million samples were already quite accurate for the chosen bins (as shown above, the maximum absolute error per bin was ca. 0.006 in probability units). I should probably calculate the relative errors as well, to make sure. 100 000 samples also give you a perfect sparkline, but with 10 000 samples you get visible differences.

Should we run it in all test-suite runs or put it behind a feature flag and limit when it runs?

On my CPU the runtime was unnoticeable. I don't think it will be a problem, but let me check the runtime on CI after I fix the tests.

@TheIronBorn

We could verify other qualities as well. Mean, standard deviation, skewness, kurtosis, etc.

We could, and this is already supported by `average`, but it is somewhat redundant with testing the histogram and may require even more samples to get reasonable precision.
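For illustration, the sample moments mentioned above can be computed in a few lines of plain Rust (a naive sketch; the `moments` helper is hypothetical, and the `average` crate mentioned in the thread provides production implementations with better numerics):

```rust
/// Compute mean, variance, skewness and excess kurtosis of a sample
/// from its central moments. Naive formulas for illustration only.
fn moments(data: &[f64]) -> (f64, f64, f64, f64) {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    // p-th central moment: average of (x - mean)^p.
    let central = |p: i32| data.iter().map(|x| (x - mean).powi(p)).sum::<f64>() / n;
    let m2 = central(2);
    let m3 = central(3);
    let m4 = central(4);
    // Skewness is m3 / m2^(3/2); excess kurtosis is m4 / m2^2 - 3,
    // so a true normal distribution has skewness 0 and excess kurtosis 0.
    (mean, m2, m3 / m2.powf(1.5), m4 / (m2 * m2) - 3.0)
}

fn main() {
    // A symmetric sample: skewness should come out as exactly zero.
    let (mean, var, skew, kurt) = moments(&[-2.0, -1.0, 0.0, 1.0, 2.0]);
    println!("mean={mean} var={var} skew={skew} kurt={kurt}");
}
```

Checking these statistics against their known values for the normal distribution would complement the histogram test, at the cost of needing more samples for the higher moments to converge.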

@vks (Collaborator, Author) commented May 9, 2021

The new tests run in about half a second, which I think is acceptable even if extended to all distributions.

@dhardy (Member) left a comment

Code looks good (some minor comments).

The test runtime sounds very acceptable 👍

Review threads:
rand_distr/tests/pdf.rs (outdated, resolved)
rand_distr/tests/pdf.rs (outdated, resolved)
src/distributions/bernoulli.rs (resolved)
@vks (Collaborator, Author) commented May 9, 2021

@dhardy I think I addressed your comments?

@dhardy (Member) left a comment

👍

@vks vks merged commit 6d236fd into rust-random:master May 12, 2021
@vks vks deleted the distr-test branch May 12, 2021 16:28
@vks vks mentioned this pull request May 15, 2021
@vks vks mentioned this pull request Apr 26, 2024
3 participants