Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distribution plots to rand_distr documentation #1434

Merged
merged 30 commits into from
Jul 8, 2024
Merged

Add distribution plots to rand_distr documentation #1434

merged 30 commits into from
Jul 8, 2024

Conversation

MichaelOwenDyer
Copy link
Member

@MichaelOwenDyer MichaelOwenDyer commented Apr 9, 2024

  • Added a CHANGELOG.md entry

Summary

This PR adds some graphs to the documentation of different distributions in rand_distr to illustrate their behavior.

Motivation

Make the distributions in this crate easier to understand via visual aids.

Details

The diagrams will be hosted in https://github.com/rust-random/charts (see this PR) and links will be inserted into the documentation.

Closes #131.

Documentation Progress:

  • binomial::Binomial
  • cauchy::Cauchy
  • dirichlet::Dirichlet
  • exponential::Exp
  • exponential::Exp1
  • frechet::Frechet
  • gamma::Gamma
  • gamma::ChiSquared
  • gamma::FisherF
  • gamma::StudentT
  • gamma::Beta
  • geometric::Geometric
  • geometric::StandardGeometric
  • gumbel::Gumbel
  • hypergeometric::Hypergeometric
  • inverse_gaussian::InverseGaussian
  • normal::Normal
  • normal::StandardNormal
  • normal::LogNormal
  • normal_inverse_gaussian::NormalInverseGaussian
  • pareto::Pareto
  • pert::Pert
  • poisson::Poisson
  • skew_normal::SkewNormal
  • triangular::Triangular
  • unit_ball::UnitBall
  • unit_circle::UnitCircle
  • unit_disc::UnitDisc
  • unit_sphere::UnitSphere
  • weibull::Weibull
  • zipf::Zeta
  • zipf::Zipf

@MichaelOwenDyer
Copy link
Member Author

Need to look into it more, but maybe whatever solution we decide on here could also be used for #278

@dhardy
Copy link
Member

dhardy commented Apr 10, 2024

However, this also adds a transitive dependency to syn, which is not great.

Not an issue: you already feature-gated it.

The diagrams are generated with Python code, which is included in the PR for reference.

Good to have in the same repo, but it doesn't need to be in the generated crate package. Could you try investigating why this happens? We already have an explicit include rule in the Cargo.toml, which seems like it should be enough.

We should also talk about the best place to put the diagrams (currently there is one assets folder for rand and another for rand_distr, but this might not work).

We don't package any of these files except by embedding, so we don't need an assets dir and we don't need these separated by crate, but I prefer that the generated resources are next to the generating script. Having the assets in crate-specific dirs but not the script is especially weird. Maybe just move diagrams/src to assets/src, also splitting by crate?

Or, simpler: forget Bernoulli (doesn't really need it) and put all the rest under rand_distr/plots. (Also solves my nit below.)

Plots

Bernoulli

X-axis values should be false, true; not 0, 1.

Binomial

Can we add a second parameterisation, e.g. n=6, p=0.5?

Cauchy

👍 (straight copy from WP, I see)

@newpavlov
Copy link
Member

I think SVG would be a better fit (separate files, not embedded into HTML). You should be able to generate the same plots with Python without issues.

@dhardy
Copy link
Member

dhardy commented Apr 10, 2024

@newpavlov you missed the discussions in #131 (specifically, how to make this work on docs.rs).

SVG appears larger in some cases, smaller in others. It does scale better.

@newpavlov
Copy link
Member

newpavlov commented Apr 10, 2024

Ah, I misread the discussion and thought it was about embedding SVG into HTML. I don't think the terrible diff objection is important, plus we probably should use compression either way (i.e. use SVGZ instead of plain SVG). Though there may be some issues with browser support of SVGZ files.

In RustCrypto we use links to images which reside in the RustCrypto/media repository, e.g. see docs for the ctr crate. It means that we rely on raw.githubusercontent.com instead of packaging the image together with crate, but since the images are purely for illustrative purposes, I think it's a reasonable risk to take.

@MichaelOwenDyer
Copy link
Member Author

Good to have in the same repo, but it doesn't need to be in the generated crate package. Could you try investigating why this happens? We already have an explicit include rule in the Cargo.toml, which seems like it should be enough.

I think it may be because the folder was named src. I've moved it and renamed it to py, could you please check again?

forget Bernoulli (doesn't really need it) and put all the rest under rand_distr/plots

Got it. That also removes the need for the added dependency on rand, now it only remains in rand_distr.

Binomial
Can we add a second parameterisation, e.g. n=6, p=0.5?

I went ahead and added n=10, p=0.6 because I think it looks better next to n=10, p=0.2. If you think having a demonstration of a different n is important, then I could experiment some more. Maybe something like n=20, p=0.5 would look better.

I think SVG would be a better fit (separate files, not embedded into HTML). You should be able to generate the same plots with Python without issues.

I prefer SVG as well. However, I just did the switch and the diff count exploded to +11,000. Wowza.

@newpavlov
Copy link
Member

newpavlov commented Apr 10, 2024

The Python script folder should probably be explicitly excluded from packaged crate (check if it gets included by running cargo package).

It also may be worth to consider the separate repo approach described above. It will be a simpler solution than relying on embed-doc-image and will result in a smaller package size. I will leave the final decision to @dhardy.

@MichaelOwenDyer
Copy link
Member Author

The Python script folder should probably be explicitly excluded from packaged crate (check if it gets included by running cargo package).

Ah, thanks for the tip! According to this include and exclude should not be used at the same time. But I found the problem, had to change src/ to /src :)

It also may be worth to consider the separate repo approach described above. It will be a simpler solution than relying on embed-doc-image and will result in a smaller package size. I will leave the final decision to @dhardy.

Certainly. I'm not attached to embedding them if we find a better solution.

@dhardy
Copy link
Member

dhardy commented Apr 10, 2024

I prefer SVG as well. However, I just did the switch and the diff count exploded to +11,000. Wowza.

Diffs normally ignore binary files; this is just because SVG is text based? If so, switching to SVGZ may avoid this?

had to change src/ to /src

Aha!

It also may be worth to consider the separate repo approach described above.

The only real issue is versioning. I see usage of commit hashes in the URLs that ctr uses; this or tags works.

There is also the minor issue that repo clones don't include all docs, but we already have that with the book.

The other issue is that anyone trying to make themselves a copy of the docs for offline usage likely won't have the plots, but that's not critical.

Considering these plots aren't directly tied to the Rust code in any way, a separate repo probably is the way to go.

@MichaelOwenDyer
Copy link
Member Author

Diffs normally ignore binary files; this is just because SVG is text based? If so, switching to SVGZ may avoid this?

Yes, SVGZ is better in this regard. However embed-doc-image doesn't support it.

a separate repo probably is the way to go.

Okay then! Want to create it?

@dhardy
Copy link
Member

dhardy commented Apr 10, 2024

Okay then! Want to create it?

https://github.com/rust-random/charts

@MichaelOwenDyer
Copy link
Member Author

Can you authorize me to push in that repo, or should I make a fork?

@dhardy
Copy link
Member

dhardy commented Apr 10, 2024

@MichaelOwenDyer you should already have admin privileges on that repo.

@MichaelOwenDyer
Copy link
Member Author

MichaelOwenDyer commented Apr 10, 2024

Hm, strange, I am getting a 403 whenever I try to push.

Edit: Nevermind, I logged in using a GitHub token and then it worked.

@vks
Copy link
Collaborator

vks commented Apr 26, 2024

Diffs normally ignore binary files; this is just because SVG is text based?

We can tell Git via attributes to treat the files as binary if necessary.

@dhardy
Copy link
Member

dhardy commented Jun 19, 2024

@dhardy, could you take a look at the docs for the Geometric distribution and tell me if you think the plot is consistent?

The way Wikipedia puts it is that there are two definitions of the distribution. The code and the PMF count the number of failures, while your plot and description count until the first success.

@dhardy
Copy link
Member

dhardy commented Jun 19, 2024

Also, is there a reason we use σ for the scale parameter in Gumbel instead of β which I see in the literature?

That's presumably an error in the documentation since the code just uses scale internally.

@MichaelOwenDyer
Copy link
Member Author

The way Wikipedia puts it is that there are two definitions of the distribution.

Okay, yes I saw that too. I will adjust the x-axis of the plot to start at 0.

That's presumably an error in the documentation since the code just uses scale internally.

I'll fix it then 🤠

@MichaelOwenDyer
Copy link
Member Author

Another thing I noticed: We are using a in the documentation and implementation for the Zeta distribution, but s is what I am seeing online. Any reason for this?

Also, should I make an effort to embed some more links to Wikipedia across the documentation? Some already exist.

@dhardy
Copy link
Member

dhardy commented Jun 20, 2024

Another thing I noticed: We are using a in the documentation and implementation for the Zeta distribution, but s is what I am seeing online. Any reason for this?

@vks implemented this (#1136). @vks?

Also, should I make an effort to embed some more links to Wikipedia across the documentation? Some already exist.

WP is a useful resource; I guess this would be convenient for some people but not important.

@vks
Copy link
Collaborator

vks commented Jun 20, 2024

@MichaelOwenDyer Can you be more specific? The notation and naming are not very consistent across the literature. For example, what we call Zeta is sometimes called Zipf, or vice versa.

@MichaelOwenDyer
Copy link
Member Author

@vks I'm referring to the struct at zipf::Zeta. zipf::Zipf uses s already, but zipf::Zeta uses a. I didn't see anything online which lists the Zeta parameter as a, see wikipedia: https://en.wikipedia.org/wiki/Zeta_distribution

@vks
Copy link
Collaborator

vks commented Jun 20, 2024

I guess the a comes from the reference, but I no longer have access to it to check. I think I did not want to change it in the implementation to avoid getting confused when implementing the algorithm from the paper.

I'm fine with changing it to s now, as you say it's more consistent with our Zipf docs, and the usual parameter name chosen for the Riemann zeta function.

@MichaelOwenDyer
Copy link
Member Author

Okay. We may want to do that in a different PR since ZetaError::ATooSmall is pub, and I believe renaming that would be a breaking change.

@dhardy
Copy link
Member

dhardy commented Jul 8, 2024

@MichaelOwenDyer I think this just needs a cargo fmt and then can be merged.

@MichaelOwenDyer
Copy link
Member Author

Thanks for the nudge, I've had a lot on my plate recently. Will finalize this today

- Change parameter of Gumbel from σ to β
- Skew normal now uses the correct symbols ξ, ω, α
- Other tweaks
@MichaelOwenDyer
Copy link
Member Author

Okay @dhardy, I've made my final changes. I added Wikipedia links to the documentation and tried to make it look as consistent as possible. I would appreciate you taking one last look at it and if it looks good to you then I would say its ready to merge :)

Comment on lines 9 to 26
//! The Pareto distribution.
//! The Pareto distribution `Pareto(α, xₘ)`.

use crate::{Distribution, OpenClosed01};
use core::fmt;
use num_traits::Float;
use rand::Rng;

/// The Pareto distribution `Pareto(scale, shape)`.
/// The Pareto distribution `Pareto(α, xₘ)`.
///
/// The Pareto distribution is a continuous probability distribution with
/// parameters `scale` (`α`) and `shape` (`x`<sub>`m`</sub> or `k`).
/// parameters `scale` (`α`) and `shape` (`xₘ` or `k`).
///
/// # Plot
///
/// The following plot shows the Pareto distribution with various values of
/// `scale` and `shape`.
/// Note how the scale parameter `α` corresponds to the height of the jump
/// in density at `x = x`<sub>`m`</sub>, and to the rate of decay in the tail.
/// in density at `x = xₘ`, and to the rate of decay in the tail.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the parameter names got confused somewhere — looks like α is shape. Also, match argument order of the constructor: scale comes first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I fixed it; please take another look. The plot still lists them in the other order (α, xₘ) but I guess that's okay.

rand_distr/src/skew_normal.rs Outdated Show resolved Hide resolved
rand_distr/src/frechet.rs Outdated Show resolved Hide resolved
@MichaelOwenDyer
Copy link
Member Author

Okay, I'm going to go ahead and merge this :D

@MichaelOwenDyer MichaelOwenDyer merged commit 17746a1 into rust-random:master Jul 8, 2024
14 checks passed
@MichaelOwenDyer MichaelOwenDyer deleted the dist-diagrams-in-docs branch July 9, 2024 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate or include graphs for distributions in docs
4 participants