Currently PySpark provides only two methods for generating random numbers:
rand, which generates numbers from $\sim U(0, 1)$
randn, which generates numbers from $\sim N(0, 1)$
What I want to have
Random integer generation
Samples from $\sim U(\alpha, \beta)$
Samples from $\sim N(\alpha, \beta)$
Samples from $\sim L(\alpha, \beta)$ (Laplace distribution)
Samples from $\sim \Gamma (\alpha, \beta)$ (Gamma distribution)
Motivation
Random integers are important for many reasons, and re-partitioning is one of them.
The Gaussian, Laplace, and Gamma distributions are key to implementing Differential-Privacy-driven aggregations (for example, additive noise mechanisms, and especially the Laplace mechanism).
How to do it?
All of these distributions can be generated from the uniform distribution. The idea is to provide top-level functions.
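As a minimal sketch of the idea, here is inverse-transform sampling for the Laplace distribution in pure Python: a $U(0, 1)$ draw is pushed through the inverse Laplace CDF. A PySpark version would express the same formula with `Column` expressions over `F.rand()`; the function name and signature below are illustrative, not an existing quinn API.

```python
import math
import random

def laplace_from_uniform(mu: float, b: float, u: float) -> float:
    """Map a draw u ~ U(0, 1) to a Laplace(mu, b) sample via the
    inverse CDF (inverse-transform sampling)."""
    # Center u on zero, then invert the piecewise Laplace CDF.
    u = u - 0.5
    return mu - b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# Quick sanity check: the sample mean should be close to mu = 0.
random.seed(42)
samples = [laplace_from_uniform(0.0, 1.0, random.random()) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
```

The same trick covers the other requests: $U(\alpha, \beta)$ is just $\alpha + (\beta - \alpha) \cdot U(0, 1)$, $N(\alpha, \beta)$ can reuse `randn` shifted and scaled, and Gamma can be built from uniforms with standard rejection samplers.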
@SemyonSinchenko - yep, I like this idea. Can you please propose the function names => expected interfaces at a high level? e.g. "I think we should have a quinn.laplace(col: Column) function that returns a Laplace number". These should be useful in a variety of contexts.