Fix off-by-one in NextFreeFileDescriptor #202

okayzed · 2018-10-21T02:44:40Z

https://github.com/oilshell/oil/blob/master/core/process.py#L54

If I'm understanding this code correctly, the intended behavior is: loop over file descriptors until we find one that throws an error when we try to access it. When we find it, we return that file descriptor.

The actual behavior: return that file descriptor + 1.

This could come up as a problem if we open 2 file descriptors and then close the first one. The impl would find the first one as available and then try to dup over the 2nd one.

Feel free to close if not an issue

andychu · 2018-10-21T03:27:51Z

Hm yes good point. That explains the comment...

The bug was that other libraries in the process were using file descriptors. So this fix was wrong. But I fixed it another way by not using those libraries (import random). But yes this function should probably be fixed just in case.

andychu · 2018-10-21T03:28:47Z

And that was the "hardest bug" I mentioned in the appendix here:

http://www.oilshell.org/blog/2018/10/11.html

I already hit that bug twice!

okayzed · 2018-10-26T15:13:20Z

If you can add a test case (or put one here) that repros the filedescriptor bug, it will make it easier to fix

andychu · 2018-10-27T07:35:06Z

I don't think this issue is observable right now. If nobody else is opening file descriptors in the process, there's no conflict.

But I'll just leave this open since it's probably a good cleanup.

okayzed · 2018-10-27T13:48:19Z

It might not get fixed properly without a repro or way of checking that it's fixed, even if this line of code is fixed. It's easy enough to change this seemingly wrong line, but unknown if that was causing issues in the first place

andychu · 2019-01-26T21:16:17Z

@okayzed

Oh man I'm in the middle of debugging a problem here and the root cause was in this code. The observable behavior was that somehow stdout was closed when running OSH unit tests under OSH, but not when running OSH unit tests under bash!

I think there are probably other conditions where it could trigger, but OSH inheriting its own file descriptors seems to be one of them.

I have made the bug go away by fixing this line, but it also pointed out to me that I'm using the syscalls somewhat wrong.

If you look at man fcntl, it points out that fcntl(..., F_DUPFD, ...) is different than dup2(). I was wondering what the difference was. Basically the former does the linear search for you -- it returns the next free descriptor. I was doing it in user space, which is suboptimal (and was wrong).

Good eye on this one!

In retrospect, it may or may not have made sense to fix earlier... it would have saved some time, but I probably wouldn't have understood the syscall issue as deeply. Sometimes you just have to bang your head into things to understand ...

The symptom was that running 'test/unit.sh all' under OSH failed. In core/completion_test.py, we run the line: echo "$@" >&2 under test_lib.EvalCode(), and the two OSH interpreters conflicted somehow, resulting in stdout being permanently closed. Then any remaining print() calls failed with EBADF.

I got rid of the self.next_fd counter since it's not correct. Instead we do a linear search from fd #10. Now the tests pass.

However now I realize I should be using fcntl(..., F_DUPFD, ...) and looking at the return value. It does the linear search for you in the kernel! Looking at dash source code clued me into this. Addresses issue #202.
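As a rough illustration of the F_DUPFD approach mentioned above, the sketch below uses Python's standard fcntl module. The helper name save_fd and the default of 10 are assumptions for this example, not the actual OSH code.

```python
import fcntl
import os


def save_fd(fd, min_fd=10):
    """Duplicate fd onto the lowest free descriptor >= min_fd.

    The kernel performs the search, so no user-space probing is needed.
    """
    return fcntl.fcntl(fd, fcntl.F_DUPFD, min_fd)


if __name__ == "__main__":
    r, w = os.pipe()
    saved = save_fd(w)       # some descriptor numbered 10 or higher
    os.write(saved, b"x")    # writes to the same pipe as w
    print(saved, os.read(r, 1))
```

Unlike dup2(), the caller does not choose the target number here; it only sets a lower bound and reads the result from the return value.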

okayzed · 2019-01-27T16:57:50Z

I'm glad something came of this!

Re: fcntl: to clarify for my own understanding, it seems like using dup2 can lead to a race condition, where you check if an FD is available and it gets taken before you use it (if there is multithreading), while fcntl would prevent that situation.

Actually, I'm unsure of the above. If dup2 is returning the error, that would mean that the call to dup2 is what reserves the FD slot.
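On the dup2 question: per POSIX, dup2() does not report an error when the target descriptor is already open. It silently closes the target first and then reuses the number. The sketch below shows that behavior with two pipes; the function name is made up for illustration.

```python
import os


def dup2_clobber_demo():
    """Show that dup2() silently replaces a descriptor that is in use."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()

    # w2 is open, but dup2 does not fail: it closes the old w2 and makes
    # the number refer to pipe 1's write end instead.
    os.dup2(w1, w2)

    os.write(w2, b"hi")
    from_pipe1 = os.read(r1, 2)   # the data arrives on pipe 1

    # Pipe 2 has lost its only writer, so reading it yields EOF.
    from_pipe2 = os.read(r2, 1)

    for fd in (r1, w1, r2, w2):
        os.close(fd)
    return from_pipe1, from_pipe2
```

This is why picking a target number that turns out to be in use can close a descriptor out from under other code, which would match the "stdout permanently closed" symptom described earlier.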

andychu · 2019-01-27T20:10:29Z

If there were two threads doing that at the same time, it would definitely lead to a race condition. But almost all shells are single-threaded, since they date back to the time before threads!

There is one area where I might want to use threads (to make completion responsive against slow user plugins), so that is something to watch out for.

andychu · 2019-02-04T18:47:18Z

closing in favor of #223

andychu changed the title ~~Question on NextFreeFileDescriptor impl.~~ Fix off-by-one in NextFreeFileDescriptor Oct 27, 2018

andychu mentioned this issue Jan 26, 2019

Test that file descriptor state is clean (Overhaul file descriptor handling) #223

Open

2 tasks

andychu closed this as completed Feb 4, 2019
