Simplify and improve performance of rand(Char) #11178

Merged
merged 1 commit into from
May 7, 2015
10 changes: 3 additions & 7 deletions base/random.jl
@@ -253,14 +253,10 @@ rand(r::MersenneTwister, ::Type{Int128}) = reinterpret(Int128, rand(r, UInt128)
 rand{T<:Real}(r::AbstractRNG, ::Type{Complex{T}}) = complex(rand(r, T), rand(r, T))

 # random Char values
-# use simple rejection sampling over valid Char codepoint range
+# returns a random valid Unicode scalar value (i.e. 0 - 0xd7ff, 0xe000 - # 0x10ffff)
 function rand(r::AbstractRNG, ::Type{Char})
-    while true
-        c = rand(0x00000000:0x0010fffd)
-        if is_valid_char(c)
-            return reinterpret(Char,c)
-        end
-    end
+    c = rand(0x00000000:0x0010f7ff)
Contributor

I think you need

c = rand(r, 0x00000000:0x0010f7ff)

Contributor Author

I just followed what the code did before (it just had the values incorrect). Why would the r be needed? It is not actually used (I don't know why the function even has the argument; it seems rather misleading).

Contributor

I'm no expert on Julia's RNG code, but I think this might have been an oversight in the original code.

Contributor Author

I think the way rand is meant to be used here is the 1-argument form, i.e. rand(Char), and that it is simply using the 1-argument form there also:

julia> typeof(0:0x10f7ff)
UnitRange{Int64}

julia> @which rand(0:0x10f7ff)
rand(r::AbstractArray{T,N}) at random.jl:203

I'm pretty sure it is correct both in the original code, and in my update.
Thanks for reviewing the code though, the dropping of r bothers me too.

Contributor

Good point, and hi david ;-) rand(r, 0:0x10f7ff) gives reproducible random numbers and has side effects on r; rand(0:0x10f7ff) is not reproducible in a reasonable way and has side effects on GLOBAL_RNG. rand(r::AbstractRNG, ::Type{Char}) should not affect GLOBAL_RNG.
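
A minimal sketch of the difference described in this comment, written against Julia 1.x (the `Random` standard library and `Random.seed!` postdate this PR, so those names are assumptions relative to the 2015 code). `rand_char` is a hypothetical stand-in for `rand(r::AbstractRNG, ::Type{Char})` with the explicit generator passed through; it is not the code in `base/random.jl`.

```julia
using Random

# Hypothetical stand-in for the fixed method: every draw goes through `rng`.
function rand_char(rng::AbstractRNG)
    c = rand(rng, 0x00000000:0x0010f7ff)
    return c < 0xd800 ? Char(c) : Char(c + 0x800)
end

# Two generators seeded identically produce the same characters.
rng_a = MersenneTwister(1234)
rng_b = MersenneTwister(1234)
same_stream = [rand_char(rng_a) for _ in 1:5] == [rand_char(rng_b) for _ in 1:5]

# Drawing from an explicit generator leaves the default (global) RNG alone:
# the value drawn after re-seeding is the same with or without the call.
Random.seed!(1)
before = rand(UInt32)
Random.seed!(1)
rand_char(MersenneTwister(7))
after = rand(UInt32)
global_untouched = before == after
```

Had `rand_char` called `rand(0x00000000:0x0010f7ff)` without `rng`, the draw would have advanced the default generator instead, and the caller's `rng` would have been ignored.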

Member

@daviddelaat is right, the rand Range needs to be parameterized by the AbstractRNG. I'll add the fix.

Contributor Author

Please explain... I really know nothing about the way abstract ranges work, or GLOBAL_RNG...
I was just fixing the Unicode issues!
Maybe @jakebolewski should fix the range issues?

Member

I'm not sure I totally understand your question, but the change Jake made in 7efbc01 isn't really specific to ranges or the RNG. It's an idiom for functions that interact with global state (like the GLOBAL_RNG or STDOUT). For example, the print function has one fallback method:

print(x) = print(STDOUT, x)

And then all custom types define methods print(io::IO, x::MyType), and within those methods they should only call the two-arg print methods, explicitly using io instead of defaulting back to STDOUT. This allows you to easily collect the output in, e.g., an IOBuffer without worrying about stuff leaking out to STDOUT.

The same applies to RNGs, since GLOBAL_RNG is also mutable global state.
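
A small sketch of the `print` idiom described above, assuming Julia 1.x (where `STDOUT` is spelled `stdout`). The `Point` type is hypothetical and exists only for illustration.

```julia
# Hypothetical type used only for this illustration.
struct Point
    x::Int
    y::Int
end

# Write to the stream that was passed in, never to the global stdout.
Base.print(io::IO, p::Point) = print(io, "(", p.x, ", ", p.y, ")")

# Because the method only touches `io`, output can be collected in a buffer.
buf = IOBuffer()
print(buf, Point(1, 2))
captured = String(take!(buf))
```

If the method body called `print("(", p.x, ...)` without `io`, the text would go to the terminal and `captured` would be empty, which is the same kind of leak the review comment caught for the RNG argument.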

Contributor Author

Ah, thanks, now that makes sense!

+    (c < 0xd800) ? Char(c) : Char(c+0x800)
 end

## Arrays of random numbers
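
The arithmetic behind the new range can be checked with a short sketch, assuming Julia 1.x. `to_scalar` is a hypothetical helper that mirrors the ternary in the diff but returns the integer code point rather than a `Char`. The range `0:0x10f7ff` has exactly as many elements as there are Unicode scalar values (0x110000 code points minus the 0x800 surrogates), so shifting indices at or above 0xd800 up by 0x800 skips the surrogate block and no rejection loop is needed.

```julia
# Map an index in 0:0x10f7ff onto a Unicode scalar value by skipping the
# surrogate block 0xd800:0xdfff, as the ternary in the diff does.
to_scalar(c::UInt32) = c < 0xd800 ? c : c + 0x00000800

low_edge  = to_scalar(0x0000d7ff)   # last index below the surrogate block
high_edge = to_scalar(0x0000d800)   # first index that gets shifted
last_one  = to_scalar(0x0010f7ff)   # top of the index range

# Every index lands on a valid scalar value.
all_valid = all(isvalid(Char, to_scalar(c)) for c in 0x00000000:0x0010f7ff)
n_indices = length(0x00000000:0x0010f7ff)
```

Since each index maps to a distinct scalar value, a uniform draw over the indices is a uniform draw over valid characters.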