Use UUID's to generate random tempname on Windows #33785
base/file.jl
Outdated
@assert ccall((:RpcStringFreeW, :Rpcrt4), Cint, (Ref{Ptr{Cwchar_t}},), nameptr) == 0

filename = joinpath(parent, name)
if !ispath(filename)
Are we OK with this? I feel like this should just error (maybe?), since it will be an incredibly rare scenario. In the previous code, we kept looping until a unique name was generated. Here, in the unlikely scenario that it is an existing path, we recursively call the function.
I think maybe if cleanup is false we could not call ispath at all? It would be nice if this function would actually not touch the file system at all.
I tend to think that just assuming we'll never run into a conflict is OK here?
I think maybe if cleanup is false we could not call ispath at all?
Agreed, in fact it's not guaranteed to return a unique filename (even if very unlikely)
Since @StefanKarpinski worked on this, could you comment on the following?
It would be nice if this function would actually not touch the file system at all.
I don't see how it would be possible to do this in the scenario, since if we don't check !ispath(filename), temp_cleanup_later could accidentally remove a file which we did not generate.
base/file.jl
Outdated
id = Ref{GUID}()
r = ccall((:UuidCreate,:Rpcrt4), Cint, (Ref{GUID},), id)
r == 0 || (r == RPC_S_UUID_LOCAL_ONLY && (@warn "UIUD guaranteed to be unique to this computer only"; true)) || (r == RPC_S_UUID_NO_ADDRESS && error("Cannot get Ethernet or token-ring hardware address for this computer."))
Do we need the warning? It is not clear when that case would actually ever happen (no network card installed?), but even if it does, it seems that a machine unique filename is good enough and doesn't need to emit a warning?
Seems fine not to emit this warning to me, but I'm not an expert in this platform.
UuidCreate
For security reasons, it is often desirable to keep ethernet addresses on networks from becoming available outside a company or organization. The UuidCreate function generates a UUID that cannot be traced to the ethernet address of the computer on which it was generated. It also cannot be associated with other UUIDs created on the same computer. If you do not need this level of security, your application can use the UuidCreateSequential function, which behaves exactly as the UuidCreate function does on all other versions of the operating system.
UuidCreateSequential
The UuidCreateSequential function returns RPC_S_UUID_LOCAL_ONLY when the originating computer does not have an ethernet/token ring (IEEE 802.x) address. In this case, the generated UUID is a valid identifier, and is guaranteed to be unique among all UUIDs generated on the computer. However, the possibility exists that another computer without an ethernet/token ring address generated the identical UUID. Therefore you should never use this UUID to identify an object that is not strictly local to your computer. Computers with ethernet/token ring addresses generate UUIDs that are guaranteed to be globally unique.
In my PR I use UuidCreate.
Reading things over again, the intention of UuidCreate/UuidCreateSequential is more general than the intended purpose here, which is to generate a random temporary filename on the local computer. Therefore it is probably more appropriate to simply use UuidCreateSequential and forgo these warnings.
I think UuidCreate is what is used by default to create GUIDs on Windows, so I think I would just stick with that, and treat a return value of RPC_S_UUID_LOCAL_ONLY as fine and not worth a warning.
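The status handling suggested in this thread could look roughly like the sketch below. It is only an illustration: the helper name uuid_status_ok is hypothetical, and the numeric values of the status codes are assumed from the Windows SDK headers rather than taken from this PR.

```julia
# Assumed values of the RPC status codes (from winerror.h); verify before relying on them.
const RPC_S_OK = 0
const RPC_S_UUID_LOCAL_ONLY = 1824
const RPC_S_UUID_NO_ADDRESS = 1739

# Hypothetical helper: accept a machine-local UUID silently, fail on anything else.
function uuid_status_ok(r::Integer)
    # A UUID that is unique only on this machine is good enough for a temp filename.
    (r == RPC_S_OK || r == RPC_S_UUID_LOCAL_ONLY) && return true
    r == RPC_S_UUID_NO_ADDRESS &&
        error("Cannot get Ethernet or token-ring hardware address for this computer.")
    error("UuidCreate failed with status $r")
end
```

With a helper like this, the long chained expression in the diff would reduce to calling uuid_status_ok(r) on the ccall result, with no warning emitted in the local-only case.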
base/file.jl
Outdated
@assert ccall((:RpcStringFreeW, :Rpcrt4), Cint, (Ref{Ptr{Cwchar_t}},), nameptr) == 0

filename = joinpath(parent, name)
Should we prefix the filename with something like jl_? I think there is already a constant defined in the code somewhere. That way one could easily identify the temp files created by Julia.
Also pinging @JeffBezanson, who wrote the current version of
Seems unlikely; probably just an artifact of him being the person to move the code to the current location.
It looks to me as if he really wrote it: 5e2ff12
We have unrelated failures here.
package_win32 is a segfault in tempname.
Oops, you're right. Might be a fluke. The previous builds both ran into the same unrelated REPL kill test errors.
test/file.jl
Outdated
@@ -50,6 +50,20 @@ end

using Random

@testset "that temp names are actually unique" begin
    temps = [tempname(cleanup=false) for _ = 1:1000]
This has an expected failure rate on our CI of 0.001% (1 - e^(-1000^2 / 62^6 / 2)), which means 1 in 100k CI runs (or about 1 in 10k commits) would be expected to fail this test (over the current lifetime of this project, that would mean there'd have been about 2-3 failures).
I don't follow. The filenames that are generated in this PR are 128 bit UUIDs, which, as far as I can tell, will lead to a way, way, way lower collision rate than what you compute here.
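Both estimates in this exchange can be checked with the birthday-bound approximation 1 - e^(-n^2 / (2N)) for n draws from N equally likely names. The sketch below is only a back-of-the-envelope check; the function name is made up, and the figure of 122 random bits for a UUID is an assumption (a version 4 UUID reserves 6 of its 128 bits).

```julia
# Birthday-bound approximation: probability that n uniform draws from N
# possible values contain at least one duplicate.
# -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
collision_probability(n, N) = -expm1(-n^2 / (2N))

p_slug = collision_probability(1000, 62.0^6)   # 6 random base-62 characters
p_uuid = collision_probability(1000, 2.0^122)  # assumed 122 random bits per UUID

println("6-char base-62 names: ", p_slug)
println("UUID-based names:     ", p_uuid)
```

The first value comes out a little under 1e-5, matching the quoted 0.001%, while the UUID-based value is smaller by many orders of magnitude.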
I think the point that Jameson is making is that this test is unnecessarily strict, with which I agree (I only used 1000 to test the PR). Something like 100 is more than sufficient.
At least this does not use Julia's RNG, which gets reset on every @testset.
base/file.jl
Outdated
    return filename
else
    # in the unlikley scenario filename is a path, call this function again regenerate a unique filename
    return !ispath(filename) ? (temp_cleanup_later(filename); filename) : tempname(parent; cleanup=cleanup)
I would like to get rid of ispath if possible. Is it OK to replace these last statements with
cleanup && temp_cleanup_later(filename)
return filename
like in the non-Windows version?
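To make the proposed shape concrete, here is a minimal, platform-independent sketch of a name generator that never touches the file system. Everything in it is hypothetical: sketch_tempname, register_cleanup and PENDING_CLEANUP are stand-ins for illustration (Base's temp_cleanup_later is internal and not reproduced here), and the jl_ prefix and 10-character length are assumptions borrowed from suggestions elsewhere in this conversation.

```julia
using Random

# Hypothetical stand-in for Base's internal cleanup registry (illustration only).
const PENDING_CLEANUP = Set{String}()
register_cleanup(path::AbstractString) = push!(PENDING_CLEANUP, String(path))

function sketch_tempname(parent::AbstractString = tempdir(); cleanup::Bool = true)
    # Assumed naming scheme: "jl_" prefix plus 10 random alphanumeric characters.
    name = "jl_" * randstring(RandomDevice(), 10)
    filename = joinpath(parent, name)
    # No ispath check: the name is assumed not to collide with an existing path.
    cleanup && register_cleanup(filename)
    return filename
end
```

The trade-off is the one raised earlier in the thread: without the ispath check, a colliding name registered for cleanup could refer to a file this process did not create, so the design relies on collisions being negligibly rare.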
OK, 32-bit Windows is now fixed (calling convention should've been
This is now ready for review.
a0fec89
to
96e6dac
Compare
Note, this PR does lead to a bigger difference than what we currently have between the Linux and Windows behavior for
I think the
I sometimes go into my temp folder to delete stuff, and I find it quite helpful in those cases to be able to tell which file originated from what application, so that I can do select deletes, so I kind of like the
OK, just wondered if someone had thoughts about that. This PR from my standpoint is ready. Tests are all passing.
All green on the Western front... Edit: apparently not, looks like tests are still running...
The overall approach seems good. Some open-ended questions:
- Is it OK to have filenames with - in them? Seems like it should be a pretty well supported character, but if not, we could add name = filter(!=('-'), name) (although, for that matter we could also use the existing code in uuid.jl for turning these into strings, as that definition of struct UUID is valid as a drop-in replacement for this struct GUID usage)
- Is it OK for the filenames to be so long? Since the tempdir may already be pretty deep in the path (C:\Documents and Settings\Your Full Username Here\AppData\Local\Temp\), this'll reduce the maximum supported username length from 201 to 174 utf16 code points.
- Is it OK to use filenames without suffixes?
All of these seem OK to me, but I just wanted to voice them.
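The hyphen-stripping idea from the first question can be tried on any platform with the UUIDs standard library. This is only an illustration of the string manipulation, using uuid4 as a stand-in for the Windows UuidCreate call in the PR.

```julia
using UUIDs

name = string(uuid4())            # canonical form: 36 characters, 4 of them hyphens
compact = filter(!=('-'), name)   # drop the hyphens, leaving 32 hex digits

println(name)
println(compact)
```

Even without hyphens the name is 32 characters long, which is the length concern raised in the second question.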
Triage thinks this is a bug fix and so does not block feature freeze.
Seems like this didn't work on win64?
It was passing before I made the change to use underscores.
I'll repeat: putting a UUID in the temp name is too long. This should not be longer than 10 or so chars.
import Base.StringVector
function _rand_string()
    nchars = 10
    A = Vector{UInt8}(undef, nchars)
    ccall((:SystemFunction036, :Advapi32), stdcall, UInt8, (Ptr{Cvoid}, UInt32), A, sizeof(A))
    slug = StringVector(10)
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    for i = 1:nchars
        slug[i] = chars[(A[i] % length(chars)) + 1]
    end
    return name = String(slug)
end
Which approach is better to use: the above, or the one in the last commit?
The implementation in the comment seems better to me. It's slightly biased because 62 doesn't divide 256, but that seems fine. @vtjnash, any opinion on the matter?
Does that actually create a bias?
Anyways, I tend to agree that it is probably a better approach, and it is similar in approach to that used in the .NET Core code that @davidanthoff linked.
Yes: since
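The size of the bias can be seen by counting how many of the 256 possible byte values map to each character under the modulo step used in _rand_string. The snippet below is a small self-contained check, not part of the PR.

```julia
chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
counts = zeros(Int, length(chars))

# Enumerate every possible random byte and record which character it selects.
for b in 0x00:0xff
    counts[(b % length(chars)) + 1] += 1
end

println(counts)
```

Since 256 = 4 * 62 + 8, the first 8 characters are each selected by 5 byte values and the remaining 54 by 4 byte values, so the characters 0 through 7 are 25% more likely than the rest.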
function _rand_string()
    nchars = 10
    A = Vector{UInt8}(undef, nchars)
    ccall((:SystemFunction036, :Advapi32), stdcall, UInt8, (Ptr{Cvoid}, UInt32), A, sizeof(A))
This function can fail. The return value should be checked, but there's not much we can do to recover other than throw an error.
Actually, we do the same thing in the stdlib, so I'm gonna merge this and then fix them both at once.
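A checked version of the call might look like the sketch below. It is an assumption-laden illustration rather than the follow-up fix mentioned above: the helper name random_bytes is hypothetical, the check assumes SystemFunction036 (RtlGenRandom) returns a nonzero BOOLEAN on success, and the non-Windows branch using RandomDevice exists only so the sketch runs anywhere.

```julia
using Random

# Hypothetical helper: fill a buffer with OS-provided random bytes, erroring on failure.
function random_bytes(n::Integer)
    buf = Vector{UInt8}(undef, n)
    @static if Sys.iswindows()
        # Assumed contract: the call returns 0 (FALSE) on failure.
        ok = ccall((:SystemFunction036, :Advapi32), stdcall, UInt8,
                   (Ptr{Cvoid}, UInt32), buf, sizeof(buf))
        ok == 0 && error("SystemFunction036 (RtlGenRandom) failed")
    else
        # Illustration-only fallback so the sketch is runnable on other platforms.
        rand!(RandomDevice(), buf)
    end
    return buf
end
```

Throwing is the only reasonable response here, as the review comment notes, because a buffer left uninitialized would silently produce predictable names.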
* Use UIUD to create random tempname on Windows * Use underscores and remove extension * Truncate to 10 chars the UUID * Generate the random name from a random byte array * Update file.jl (cherry picked from commit d759b5b)
Given that this implementation is biased, in a non-problematic way, wouldn't it be fine to even use
It wouldn't hurt to open a PR, @rfourquet, if you have something in mind. Thanks.
cc @StefanKarpinski @vtjnash @davidanthoff
I wrote up this quick implementation that uses UUIDs to generate random file names, as recommended in the MSFT docs. Note that we already load the library "Rpcrt4.dll", so it shouldn't be an issue to use it here. (According to Libdl.dll(), this library is already loaded on Windows when using Julia.)