-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify and improve performance of rand(Char) #11178
Conversation
return reinterpret(Char,c) | ||
end | ||
end | ||
c = rand(0x00000000:0x0010f7ff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need
c = rand(r, 0x00000000:0x0010f7ff)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just followed what the code did before (it just had the values incorrect). Why would the r be needed, it actually is not even used (I don't know why the function even has the argument, it seems rather misleading)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm no expert on Julia's RNG code, but I think this might have been an oversight in the original code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way rand is meant to be used, here, is the 1-argument form, i.e.:
rand(Char), and that it is simply using the 1-argument form there also:
julia> typeof(0:0x10f7ff)
UnitRange{Int64}
julia> @which rand(0:0x10f7ff)
rand(r::AbstractArray{T,N}) at random.jl:203
I'm pretty sure it is correct both in the original code, and in my update.
Thanks for reviewing the code though, the dropping of r bothers me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, and hi david ;-) rand(r, 0:0x10f7ff)
gives reproducible random numbers and has side effects on r
, rand(0:0x10f7ff)
is not reproducible in a reasonable way and has side effects on GLOBAL_RNG
. rand(r::AbstractRNG, ::Type{Char})
should not affect GLOBAL_RNG
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daviddelaat is right, the rand
Range
needs to be parameterized by the AbstractRNG
. I'll add the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain... I really know nothing about the way abstract ranges work, or GLOBAL_RNG...
I was just fixing the Unicode issues!
Maybe @jakebolewski should fix the range issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I totally understand your question, but the change Jake made in 7efbc01 isn't really specific to ranges or the RNG. It's an idiom for functions that interact with global state (like the GLOBAL_RNG or STDOUT). For example, the print function has one fallback method:
print(x) = print(STDOUT, x)
And then all custom types define methods print(io::IO, x::MyType)
, and within those methods they should only call the two-arg print methods, explicitly using io
instead of defaulting back to STDOUT. This allows you to easily collect the output in, e.g., an IOBuffer without worrying about stuff leaking out to STDOUT.
The same applies to RNGs, since they are also a global that mutates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks, now that makes sense!
Simplify and improve performance of rand(Char)
This also fixes a minor issue that two valid code points (0x10fffe and 0x10ffff) would not be produced.
[They are valid code points or scalar values, even though they are part of the set of 66 "NonCharacters",
and per the Unicode standard, need to be allowed].
On testing on my laptop, this took about 83% of the time of the original code...
[kudos to @jakebolewski for #11033 in the first place!]