Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eio.Path.load loads only first 4gb on big files #335

Closed
EduardoRFS opened this issue Oct 1, 2022 · 10 comments · Fixed by #343
Closed

Eio.Path.load loads only first 4gb on big files #335

EduardoRFS opened this issue Oct 1, 2022 · 10 comments · Fixed by #343
Labels
bug Something isn't working

Comments

@EduardoRFS
Copy link

I'm using the Luv backend and when trying to load a file bigger than 4gb's the result string always ends up being the first 4gb's of the file.

@haesbaert
Copy link
Contributor

Can confirm:

$ dune build && EIO_BACKEND=luv ./_build/default/bench/foo.exe
len=4294967296
[0]=A
[2^32 - 1]=A
Fatal error: exception Invalid_argument("index out of bounds")

$ ./_build/default/bench/foo.exe
len=4294967297
[0]=A
[2^32 - 1]=A
[2^32]=B

My first suspicion was not using 64bits off_t, but it seems it is, at some point someone is storing it in 32bit.

@haesbaert haesbaert added the bug Something isn't working label Oct 1, 2022
@haesbaert
Copy link
Contributor

haesbaert commented Oct 1, 2022

The bug seems to be internal to libuv:

uv_fs_read:2006
uv__fs_read:473
uv__fs_read:481 result=2147479552 len=2147483648
+got = 2147479552
uv_fs_read:2006
uv__fs_read:473
uv__fs_read:481 result=4096 len=4096
+got = 4096
uv_fs_read:2006
uv__fs_read:473
uv__fs_read:481 result=0 len=0  <--- here libuv is calling read(2) with a buffer of zero length, so it returns 0.

the caller is src/unix/fs.c:uv__fs_read():

      result = read(req->file, req->bufs[0].base, req->bufs[0].len);
      printf("%s:%d result=%zd len=%zd\n", __func__, __LINE__, result, req->bufs[0].len);

@haesbaert
Copy link
Contributor

haesbaert commented Oct 1, 2022

We're actually passing a 2^32 buffer, somehow it turns into 0, somewhere must be wrapping around, anyway it's late.

uv_fs_read:2006
uv__fs_read:473
uv__fs_read:481 result=4096 len=4096
+got = 4096
+our cstruct len=4294967296 balen=4294967296
uv_fs_read:2006
uv__fs_read:473
uv__fs_read:481 result=0 len=0
+>>EOF<<

10 bucks it's a literal int overflow

@haesbaert
Copy link
Contributor

haesbaert commented Oct 1, 2022

So the bug seems to be in luv, we pass the correct number, but libuv gets a very likely overflowed int.
I need to learn ctypes to continue.

uv_fs_read:2010 bufs[0] base=0x7f0987ff8010 len = 0

@haesbaert
Copy link
Contributor

Found it:
https://github.com/aantron/luv/blob/0ed668e87ca55d65f4f84422b0dd3381701c71b0/src/helpers.ml#L53

utop # Unsigned.UInt.of_int 4294967296;;
- : Unsigned.uint = <uint 0>

@anmonteiro
Copy link
Contributor

It might be time to start discussing a community fork of luv if we ever want to see that fixed.

@haesbaert
Copy link
Contributor

haesbaert commented Oct 2, 2022

It might be time to start discussing a community fork of luv if we ever want to see that fixed.

Yes, we also have an unstable ABI with uring (I've hit two kernel bugs the last week). I also don't think luv offers much other than portability with windows, there had been talks about just writing a pure backend that is at least POSIX and windows can eat cake until IOCP is done.

haesbaert added a commit to haesbaert/luv that referenced this issue Oct 5, 2022
Did some quick tests in Eio and it fixes that case:
See ocaml-multicore/eio#335

There are probably more spots (outside iovec) where a (s)size_t is wrongly
expressed as an Unsigned.UInt, ideally it should always be a Long.
@talex5
Copy link
Collaborator

talex5 commented Oct 7, 2022

We can work around this in eio_luv quite easily by limiting the amount we ask libuv to read in one go to 2GB. It's unlikely it would do more than that anyway, and the caller will handle it like any other short read.

@haesbaert
Copy link
Contributor

We can work around this in eio_luv quite easily by limiting the amount we ask libuv to read in one go to 2GB. It's unlikely it would do more than that anyway, and the caller will handle it like any other short read.

It's done here: e2f38d3, needs some ironing out and maybe I should pull that out of that PR.

@talex5
Copy link
Collaborator

talex5 commented Oct 7, 2022

A separate PR would be great!

talex5 pushed a commit to talex5/eio that referenced this issue Oct 10, 2022
Luv uses a 32 bit int for buffer sizes and wraps if the value passed is
too big. In particular, a request for to read 4GB of data is interpreted
as a request for 0 bytes.

Fixes ocaml-multicore#335.
talex5 pushed a commit to talex5/eio that referenced this issue Oct 10, 2022
Luv uses a 32 bit int for buffer sizes and wraps if the value passed is
too big. In particular, a request for to read 4GB of data is interpreted
as a request for 0 bytes.

Fixes ocaml-multicore#335.
talex5 pushed a commit to talex5/eio that referenced this issue Oct 11, 2022
Luv uses a 32 bit int for buffer sizes and wraps if the value passed is
too big. In particular, a request for to read 4GB of data is interpreted
as a request for 0 bytes.

Fixes ocaml-multicore#335.
@talex5 talex5 mentioned this issue Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants