Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows mmap test failures #89

Closed
tkelman opened this issue Apr 30, 2014 · 22 comments
Closed

Windows mmap test failures #89

tkelman opened this issue Apr 30, 2014 · 22 comments

Comments

@tkelman
Copy link
Contributor

tkelman commented Apr 30, 2014

Was trying to set up AppVeyor Windows CI for this package and uncovered this issue in the process:

  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2742 (2014-04-25 02:49 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 4229d30* (5 days old master)
|__/                   |  x86_64-w64-mingw32

julia> Pkg.test("HDF5")
INFO: Testing HDF5
WARNING: read(s::IOStream,a::Array) is deprecated, use read!(s,a) instead.
 in depwarn at deprecated.jl:36
WARNING: Error reading A
ERROR: could not create file mapping
 in mmap_array at mmap.jl:134
while loading C:\Users\Tony\.julia\HDF5\test\jld.jl, in expression starting on line 136
while loading C:\Users\Tony\.julia\HDF5\test\runtests.jl, in expression starting on line 5

This is probably an issue in base Julia's mmap on Windows, but git blame tells me you wrote that section of code.

An AppVeyor build here https://ci.appveyor.com/project/tkelman/hdf5-jl/build/1.0.2/job/04fw9ibhdhlv8tup shows the same issue (except with a messed up backtrace pointing to process_options at client.jl:282 for some reason?) with the most recent 32 bit Windows binary too.

@timholy
Copy link
Member

timholy commented Apr 30, 2014

I agree with you that it seems most likely that this is a problem in base julia.

What's funny about that code is that I wrote it without having a working build system on a Windows machine. I managed to install one of the binaries, and then by "inserting" my functions into Base (i.e., function Base.mmap(...)) I was able to get something that seemed to work. Given the development history and that I haven't used Julia on Windows in months (I doubt even 0.2 is installed on that machine), I am not surprised that something is amiss.

I will try to get to this at some point, but it may take a while.

@tkelman
Copy link
Contributor Author

tkelman commented Apr 30, 2014

Okay, no rush at all. I don't have a pressing need for this, I was just on a kick looking for packages that already had Travis set up and where Windows-specific testing would be valuable.

I don't know enough of the details to be of much help on fixing this one (we could ping a few people for help who would, but probably best to wait until after 0.3 is out), but I'm happy to test things out if you come up with any ideas and don't have access to a Windows machine. You can also turn on AppVeyor and use their machines if that's easier, see https://github.com/tkelman/HDF5.jl/blob/appveyor/appveyor.yml on second thought that isn't quite as helpful for making changes to base, nevermind.

tkelman referenced this issue in JuliaLang/julia May 10, 2014
this can happen in the Windows mmap code and it would be much nicer if it
just worked.
@tkelman
Copy link
Contributor Author

tkelman commented Jun 13, 2014

I wrote JuliaLang/julia#7242 to partially address this problem. Now mmap_array doesn't fail any more if I do include(Pkg.dir("HDF5","test","runtests.jl")), but the data that gets read back into arrays when mmaparrays=true looks like random junk.

And oddly I still get ERROR: could not create file mapping when I run Pkg.test("HDF5"), or call julia C:\Users\Tony\.julia\HDF5\test\runtests.jl from cmd. I've got no idea why mmap_array would work differently depending whether I'm in the REPL or a non-interactive julia process.

The data corruption when I'm in the REPL could maybe have something to do with byte endianness of the HDF5 library on Windows? Not sure, but reading works fine and consistently when mmaparrays=false.

@timholy
Copy link
Member

timholy commented Jun 13, 2014

Nice work! Two thoughts:

  • Your endianness point is a good one. How about writing an array of Uint32 with value 0x01020304 and seeing what comes back?
  • I wonder if it would make sense to have a @show offset in your mmap_array code. One of the things about the test you wrote in Fix mmap_array with offset on Windows JuliaLang/julia#7242 is that offset_page is presumably zero; I'm wondering if the odd results you're getting here might be triggered for cases where offset_page > 0?

As far as the difference between the REPL and cmd, that's odd. I don't have any ideas.

@tkelman
Copy link
Contributor Author

tkelman commented Jun 13, 2014

I traced through the implementations to get the offset for A in test/jld.jl, wound up looking like this:

fidr = jldopen(fn, "r", mmaparrays=true)
obj = fidr["A"].plain
dims = size(obj)
prop = HDF5.h5d_get_access_plist(HDF5.checkvalid(obj).id)
ret = Ptr{Cint}[0]
HDF5.h5f_get_vfd_handle(obj.file.id, prop, ret)
fd = unsafe_load(ret[1])
offset = HDF5.h5d_get_offset(obj.id)

The offset for A is only 0x0000000000000a68, or 2664 - less than the 65536 allocation granularity.

Oh, but hm, if I add another array to the end of the file that is past 65536, then I get a failure to create mapping view. So yeah there are likely some issues.

@tkelman
Copy link
Contributor Author

tkelman commented Jun 14, 2014

Thanks to Jameson's super-useful extra error message info, the difference between REPL and cmd is something to do with permissions:

WARNING: Error reading A
ERROR: could not create file mapping: Access is denied.

And on writing [0x01020304] and reading back with mmap, I get a result that makes no sense.

julia> fid = jldopen("test.jld", "w")
Julia data file version 0.0.2: test.jld

julia> a = [0x01020304]
1-element Array{Uint32,1}:
 0x01020304

julia> @write fid a

julia> close(fid)

julia> fidr = jldopen("test.jld", "r", mmaparrays=true)
Julia data file version 0.0.2: test.jld

julia> read(fidr, "a")
1-element Array{Uint32,1}:
 0x2c746e49

Here's what the resulting test.jld looks like in binary: https://gist.github.com/tkelman/d1f99cde5c143aefc7a9

@timholy
Copy link
Member

timholy commented Jun 15, 2014

Since the 0x01020304 is stored in the last 4 bytes of the file, what happens if you ignore the fact that these are HDF5 files and just try

s = open("test-windows.jld", "r")
a = mmap_array(Uint32, (1,), s, offset)

where (if I've calculated correctly) offset = 2656.

If it works when you treat it as a regular file but not within HDF5/JLD, then can I ask you to wrap this function in

using Debug
@debug begin
function readmmap...
end # of function
end # of @debug

and step through it, examining all the different variables. (See instructions if you've never used Debug.jl before.) If you do that on both Linux and Windows, hopefully the problem would quickly become apparent.

@tkelman
Copy link
Contributor Author

tkelman commented Jun 15, 2014

Comes back right as a regular file. Will try stepping through with debug.

@tkelman
Copy link
Contributor Author

tkelman commented Jun 15, 2014

What is prop in that function? I get slightly different numbers each time, seems to start at 167772179 the first call, then goes up by 3 each time?

fd ends up as 14 on Linux, 3 on Windows. ret is [Ptr{Int32} @0x0000000002ea5318] on Windows after the call to h5f_get_vfd_handle, or [Ptr{Int32} @0x0000000022a6e8b0] on Linux. Perhaps the vfd_handle (whatever that is?) is pointing to the wrong file or something?

@timholy
Copy link
Member

timholy commented Jun 15, 2014

prop is basically an opaque identifier; it should only be manipulated through the [H5P API].(http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html), which as you can see is not tiny 😄. I don't know enough about the inner workings of HDF5 to understand the behavior you observe, but I'm not (currently) worried about it.

I take it that offset is something sensible on both platforms? If so, that suggests that your core mmap functionality in base Julia is now working, and that PR could be merged. That would also isolate the problem to fd.

One little thing I noticed: ret is defined as ret = Ptr{Cint}[0], but the last argument of H5Fget_vfd_handle is void **file_handle. I wonder if we changed it to ret = Ptr{Ptr{Void}}[0], would it help? CC @simonster.

@timholy
Copy link
Member

timholy commented Jun 15, 2014

Wait, my suggestion was borked, and had one too many Ptrs in it.

@tkelman
Copy link
Contributor Author

tkelman commented Jun 15, 2014

offset is 0x0000000000000a60, or 2656, on both platforms. I agree the wrong-data problem is due to fd. Not so sure on the permissions part when not in the REPL, that's something else and most likely unrelated to HDF5 (I see something similar in Requests.jl).

@timholy
Copy link
Member

timholy commented Jun 15, 2014

Thanks for the confirmation that offset is OK.

http://hdf-forum.184993.n3.nabble.com/getting-back-the-full-with-path-filename-from-a-hid-t-td914238.html is interesting, but it suggests that this should work.

@timholy
Copy link
Member

timholy commented Jun 15, 2014

Actually, can you comment out the try, finally, end? Errors within that block may be being masked. We should probably also initialize fd = convert(Cint, -1) (EDIT: rather than using local fd) and check for this value before calling mmap_array.

@tkelman
Copy link
Contributor Author

tkelman commented Jun 15, 2014

commenting out try, finally, end doesn't make any difference

edit: no difference from replacing local fd with fd = convert(Cint, -1) either

@timholy
Copy link
Member

timholy commented Jun 15, 2014

The initialization idea was just simply to check for errors, but (duh) try/finally allows errors to happen (I was temporarily thinking of catch), so my suggestion was dumb anyway.

I confess I'm a bit stuck. I'm not sure how one goes about finding out whether that file descriptor is really the right one. I guess within Julia you could could run a test something like this:

s = open("somefile")
println(read(s, Uint8, (5,)))
seek(s, 0)
s2 = fdio(fd(s))
println(read(s2, Uint8, (5,)))

just to check whether going from IOStream->file descriptor->IOStream gives you the right answer.

@timholy
Copy link
Member

timholy commented Jun 15, 2014

I guess you could also check that the value of fd in readmmap increases if you first open (and leave open) another file. But this is starting to grasp at straws.

@tkelman
Copy link
Contributor Author

tkelman commented Jun 15, 2014

fd in readmmap is always 3. Maybe the unsafe_load is the problem? At some point we could give up and disable mmap'ing on Windows for HDF5 purposes.

Your little test with fdio(fd(s)) looks to give consistent results.

@timholy
Copy link
Member

timholy commented Jun 15, 2014

At some point we could give up and disable mmap'ing on Windows for HDF5 purposes.

Works for me.

@timholy
Copy link
Member

timholy commented Dec 6, 2014

Not sure if this is still an issue?

@tkelman
Copy link
Contributor Author

tkelman commented Dec 7, 2014

I think it is, the mmap tests are still disabled for Windows https://github.com/timholy/HDF5.jl/blob/d8300406ed619b5cca754919a32edb87a94982cb/test/jld.jl#L350

@simonbyrne
Copy link
Collaborator

Fixed by #545

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants