-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: coalesce buffer instead of calling SSL_write() for each slice. #2680
Changes from 3 commits
eec4ef4
5671737
dc36746
fdea6a2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,7 +16,6 @@ namespace Ssl { | |
|
||
SslSocket::SslSocket(Context& ctx, InitialState state) | ||
: ctx_(dynamic_cast<Ssl::ContextImpl&>(ctx)), ssl_(ctx_.newSsl()) { | ||
SSL_set_mode(ssl_.get(), SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER); | ||
if (state == InitialState::Client) { | ||
SSL_set_connect_state(ssl_.get()); | ||
} else { | ||
|
@@ -153,57 +152,48 @@ Network::IoResult SslSocket::doWrite(Buffer::Instance& write_buffer, bool end_st | |
} | ||
} | ||
|
||
uint64_t original_buffer_length = write_buffer.length(); | ||
uint64_t bytes_to_write; | ||
if (bytes_to_retry_) { | ||
bytes_to_write = bytes_to_retry_; | ||
bytes_to_retry_ = 0; | ||
} else { | ||
bytes_to_write = std::min(write_buffer.length(), static_cast<uint64_t>(16384)); | ||
} | ||
|
||
uint64_t total_bytes_written = 0; | ||
bool keep_writing = true; | ||
while ((original_buffer_length != total_bytes_written) && keep_writing) { | ||
// Protect against stack overflow if the buffer has a very large buffer chain. | ||
// TODO(mattklein123): See the comment on getRawSlices() for why we have to also check | ||
// original_buffer_length != total_bytes_written during loop iteration. | ||
while (bytes_to_write) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. style: while (bytes_to_write > 0) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
// TODO(mattklein123): As it relates to our fairness efforts, we might want to limit the number | ||
// of iterations of this loop, either by pure iterations, bytes written, etc. | ||
const uint64_t MAX_SLICES = 32; | ||
Buffer::RawSlice slices[MAX_SLICES]; | ||
uint64_t num_slices = write_buffer.getRawSlices(slices, MAX_SLICES); | ||
|
||
uint64_t inner_bytes_written = 0; | ||
for (uint64_t i = 0; (i < num_slices) && (original_buffer_length != total_bytes_written); i++) { | ||
// SSL_write() requires that if a previous call returns SSL_ERROR_WANT_WRITE, we need to call | ||
// it again with the same parameters. Most implementations keep track of the last write size. | ||
// In our case we don't need to do that because: a) SSL_write() will not write partial | ||
// buffers. b) We only move() into the write buffer, which means that it's impossible for a | ||
// particular chain to increase in size. So as long as we start writing where we left off we | ||
// are guaranteed to call SSL_write() with the same parameters. | ||
int rc = SSL_write(ssl_.get(), slices[i].mem_, slices[i].len_); | ||
ENVOY_CONN_LOG(trace, "ssl write returns: {}", callbacks_->connection(), rc); | ||
if (rc > 0) { | ||
inner_bytes_written += rc; | ||
total_bytes_written += rc; | ||
} else { | ||
int err = SSL_get_error(ssl_.get(), rc); | ||
switch (err) { | ||
case SSL_ERROR_WANT_WRITE: | ||
keep_writing = false; | ||
break; | ||
case SSL_ERROR_WANT_READ: | ||
// Renegotiation has started. We don't handle renegotiation so just fall through. | ||
default: | ||
drainErrorQueue(); | ||
return {PostIoAction::Close, total_bytes_written, false}; | ||
} | ||
|
||
// SSL_write() requires that if a previous call returns SSL_ERROR_WANT_WRITE, we need to call | ||
// it again with the same parameters. This is done by tracking last write size, but not write | ||
// data, since linearize() will return the same undrained data anyway. | ||
ASSERT(bytes_to_write <= write_buffer.length()); | ||
int rc = SSL_write(ssl_.get(), write_buffer.linearize(bytes_to_write), bytes_to_write); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. stupid openssl/boringssl question: I don't suppose there is some writev equivalent? Seems a shame we need to linearize here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think these is one? Can we tag off-project people? Maybe @davidben knows There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sanity check: if we linearize multiple times it only does the work one time, correct? I want to make sure there's no corner case where trickling bites gets super expensive. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately boringssl doesn't have a built in writev equivalent. You could potentially create a fake FD to use writev with and then dump that into SSL_write, but that probably has more overhead/complexity than linearize. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do have plans to add an in-place API so you don't need to copy things in and out of OpenSSL's goofy fd abstraction, though probably the first iteration will still require it be linearized. writev is a little tricky with block ciphers. (Though that work will make it easier to add something like writev later.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mattklein123 Done. @alyssawilk Yes, linearization will happen only once (for the same |
||
ENVOY_CONN_LOG(trace, "ssl write returns: {}", callbacks_->connection(), rc); | ||
if (rc > 0) { | ||
ASSERT(rc == static_cast<int>(bytes_to_write)); | ||
total_bytes_written += rc; | ||
write_buffer.drain(rc); | ||
bytes_to_write = std::min(write_buffer.length(), static_cast<uint64_t>(16384)); | ||
} else { | ||
int err = SSL_get_error(ssl_.get(), rc); | ||
switch (err) { | ||
case SSL_ERROR_WANT_WRITE: | ||
bytes_to_retry_ = bytes_to_write; | ||
break; | ||
case SSL_ERROR_WANT_READ: | ||
// Renegotiation has started. We don't handle renegotiation so just fall through. | ||
default: | ||
drainErrorQueue(); | ||
return {PostIoAction::Close, total_bytes_written, false}; | ||
} | ||
} | ||
|
||
// Draining must be done within the inner loop, otherwise we will keep getting the same slices | ||
// at the beginning of the buffer. | ||
if (inner_bytes_written > 0) { | ||
write_buffer.drain(inner_bytes_written); | ||
break; | ||
} | ||
} | ||
|
||
if (total_bytes_written == original_buffer_length && end_stream) { | ||
if (write_buffer.length() == 0 && end_stream) { | ||
shutdownSsl(); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattklein123 does this part still apply? With this PR
SslSocket::doWrite
doesn't depend on each element but instead linearize firstbytes_to_write
bytes and pass it toSSL_write
. So the code doesn't assume "we never change existing write_buffer_ chain elements between calls to SSL_write()", just assuming we never drain data from the buffer between calls to SSL_write(), no?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can append, but you cannot drain, which is the same as before changes in this PR, AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC before this PR, a buffer
{"abc"}
, couldn't be{"abcde"}
in this method, you can only add one or more slice then make it into{"abc","de"}
. With this PR, either{"abcde"}
or{"abc","de"}
will work since you capturesbytes_to_write
in SslSocket::doWrite.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @davidben, you can pass more data to the retry call (but it must start with the same data as the failed call), so
{"abc"}
->{"abcde"}
was also fine before this PR.Capturing
bytes_to_retry_
is really an optimization to avoid linearizing over data that's going to be sent over multiple TLS records that doesn't need to be combined together.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah after reading @davidben's comment I noticed that
{"abc"} -> {"abcde"}
was also find before this PR, but I think the original intention is not to allow that case, no? @mattklein123There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pointer needs to be the same by default but it's just a silly sanity check. If your buffer may be reallocated or whatever, set the moving buffer option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way to think about it is it's like POSIX write where you either get told the write failed with a retryable error or it finished completely with N bytes written.
The difference is that write allows this:
with the first input never making it on the wire. This shape of API doesn't compose well with TLS if we managed to write half a record. (Writing half a record does not mean you wrote half the plaintext. It means you half-wrote the full plaintext and can't change your mind.)
OpenSSL's API handles this case by returning EWOULDBLOCK (so it looks to you like nothing was written) and then says you are not allowed to change your mind on what you are writing. Once you pass input in, you gave committed to that input. It enforces this with some sanity checks on the length and optionally the pointer, but the actual rule is the "no rewinding" rule. You want to think about that one and not the length business.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattklein123 it doesn't have to be the same pointer and length, but the data needs to be the same, so that comment isn't completely wrong, IMHO. And the only reason why the pointer doesn't need to be the same is because
SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER
is set, but it doesn't look that there is a reason for it, so I'm going to remove it, which will make the comment a bit more accurate.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, as long as you checked the comment and you think it's good it's fine with me. Thank you all for the very lively and enlightening discussion on this. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.