Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in clearing write buffer bubbles up as an array IndexError #452

Closed
niceking opened this issue Aug 30, 2021 · 13 comments
Closed

Error in clearing write buffer bubbles up as an array IndexError #452

niceking opened this issue Aug 30, 2021 · 13 comments

Comments

@niceking
Copy link

Hello, I've got a websocket implementation in a gem, and occasionally on a write I'm seeing a IndexError: negative length -1?

Screen Shot 2021-08-30 at 6 00 42 PM

It seems that the nwrote variable in this method is set to -1 somehow and it's bubbling up as an array error.

I'm on Ruby 2.7.2 and OpenSSL library version 2.1.2 (not quite sure how that version number works as that nwrote var was introduced in 2.2.0 🤷‍♀️)

@rhenium
Copy link
Member

rhenium commented Aug 30, 2021

Is SSLSocket#write (or #flush, #print, etc.) called from multiple threads? Please note that the buffered IO is currently not thread-safe.

However, the way it breaks is unexpected -- SSLSocket#syswrite should never return a negative number. This seems to be because #syswrite isn't properly protecting the string buffer from being modified (emptied) by another thread while waiting for the underlying socket.

@niceking
Copy link
Author

niceking commented Aug 31, 2021

Oh ok I didn't realise about the thread safety. I think there is a possible code path in my gem's implementation which would mean that SSLSocket#write is called from multiple threads, I'll do some debugging and see if I can address it on my end.

In the mean time do you think there'll be a fix for SSLSocket#syswrite returning a negative number?

@niceking
Copy link
Author

Hmm I did some poking around I don't think there's anywhere in the gem that could be calling write from multiple threads. We are using the same socket in multiple threads, but one only reads from it and the other only writes from it. Is there any other scenario that could trigger that -1 being returned?

@ioquatix
Copy link
Member

OpenSSL can read and write from the underlying socket even for only "read" or "write" operations separately. So socket.read might actually write to the socket (at least in my understanding).

Therefore, I don't believe it's safe to read and write from separate threads, at least in this case.

You could confirm this by adding a Thread::Mutex around the read and write operations to guard the socket.

rhenium added a commit to rhenium/ruby-openssl that referenced this issue Sep 1, 2021
Since a blocking SSLSocket#syswrite call allows context switches while
waiting for the underlying socket to be ready, we must freeze the string
buffer to prevent other threads from modifying it.

Reference: ruby#452
@rhenium
Copy link
Member

rhenium commented Sep 1, 2021

#453 should fix the -1 return from #syswrite (and #sysread which had a similar problem).

rhenium added a commit to rhenium/ruby-openssl that referenced this issue Sep 1, 2021
Since a blocking SSLSocket#syswrite call allows context switches while
waiting for the underlying socket to be ready, we must freeze the string
buffer to prevent other threads from modifying it.

Reference: ruby#452
rhenium added a commit to rhenium/ruby-openssl that referenced this issue Sep 1, 2021
Since a blocking SSLSocket#syswrite call allows context switches while
waiting for the underlying socket to be ready, we must freeze the string
buffer to prevent other threads from modifying it.

Reference: ruby#452
@niceking
Copy link
Author

niceking commented Sep 6, 2021

Hey @rhenium thanks for the fix! I read through your PR but my 2 years as a C developer 10 years ago wasn't much help for understanding it 😛

I'm wondering what the change means in the context of the error I experienced? Given that the string buffer will be frozen, if another thread attempts to write to the same socket, will it block and wait? Or will the information it is attempting to write be discarded? Or will it throw a different type of error? And does this bring the SSL Socket read and write ops closer to being thread safe?

I'm a little surprised by your comment @ioquatix that reading and writing from the same socket isn't thread safe. I'm using readpartial not read but I assume your comment still applies? Like given the underlying socket technology is full duplex, and given the asynchronous nature of data being streamed, feels like having one thread to handle reading and another to handle writing to the same socket would be a pretty common use case to support??

@ioquatix
Copy link
Member

ioquatix commented Sep 6, 2021

There is no guarantee that reading and writing to the same OpenSSL socket is thread safe. Sure, the underling OS socket might be... but that's not what you are reading and writing from here.

@ioquatix
Copy link
Member

ioquatix commented Sep 6, 2021

reading and writing from the same socket isn't thread safe

I'm not sure anyone could make this guarantee at the Ruby level. There are a lot of edge cases in Ruby's IO layer that the GVL usually protects against but I've personally noticed some nasty edge cases.

@niceking
Copy link
Author

niceking commented Sep 6, 2021

Yeah fair enough! We have built some retry/confirm stuff as part of this so probably better to ensure that works better than relying on writes to always be successful

@ioquatix
Copy link
Member

ioquatix commented Sep 7, 2021

If you want something which is definitely safe and predictable, you could use async. You can embed async into other programs, but you can't share the same socket between threads (same issues apply). But you can read and write to the same socket using two fibers which is 100% safe.

rhenium added a commit to rhenium/ruby-openssl that referenced this issue Sep 26, 2021
Since a blocking SSLSocket#syswrite call allows context switches while
waiting for the underlying socket to be ready, we must freeze the string
buffer to prevent other threads from modifying it.

Reference: ruby#452
matzbot pushed a commit to ruby/ruby that referenced this issue Oct 16, 2021
Since a blocking SSLSocket#syswrite call allows context switches while
waiting for the underlying socket to be ready, we must freeze the string
buffer to prevent other threads from modifying it.

Reference: ruby/openssl#452

ruby/openssl@aea874bc6e
@JuanitoFatas
Copy link
Member

JuanitoFatas commented Oct 19, 2021

@rhenium Thanks so much for fixing!! Which version of OpenSSL gem could we use your fix? 😊 Is it possible to release a new version?

@rhenium
Copy link
Member

rhenium commented Oct 19, 2021

I pushed openssl gem v2.1.3 and v2.2.1 a few days ago, which contain #453. (changelog: https://github.com/ruby/openssl/blob/2b3b29b973af9ae2433aca6f9a0a7653a48434c2/History.md)

@rhenium
Copy link
Member

rhenium commented Oct 19, 2021

IndexError: negative length -1 error will not appear in v2.1.3/v2.2.1 anymore.

Note that, however, the error showing up means your code is writing to an SSLSocket from multiple threads. This is not a supported operation and you should fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants