Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: coalesce buffer instead of calling SSL_write() for each slice. #2680

Merged
merged 4 commits into from
Mar 2, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions source/common/network/connection_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -324,9 +324,9 @@ void ConnectionImpl::write(Buffer::Instance& data, bool end_stream) {
// TODO(mattklein123): All data currently gets moved from the source buffer to the write buffer.
// This can lead to inefficient behavior if writing a bunch of small chunks. In this case, it
// would likely be more efficient to copy data below a certain size. VERY IMPORTANT: If this is
// ever changed, read the comment in Ssl::ConnectionImpl::doWriteToSocket() VERY carefully.
// That code assumes that we never change existing write_buffer_ chain elements between calls
// to SSL_write(). That code will have to change if we ever copy here.
// ever changed, read the comment in SslSocket::doWrite() VERY carefully. That code assumes that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 does this part still apply? With this PR SslSocket::doWrite doesn't depend on each element but instead linearize first bytes_to_write bytes and pass it to SSL_write. So the code doesn't assume "we never change existing write_buffer_ chain elements between calls to SSL_write()", just assuming we never drain data from the buffer between calls to SSL_write(), no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can append, but you cannot drain, which is the same as before changes in this PR, AFAIK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC before this PR, a buffer {"abc"}, couldn't be {"abcde"} in this method, you can only add one or more slice then make it into {"abc","de"}. With this PR, either {"abcde"} or {"abc","de"} will work since you captures bytes_to_write in SslSocket::doWrite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per @davidben, you can pass more data to the retry call (but it must start with the same data as the failed call), so {"abc"} -> {"abcde"} was also fine before this PR.

Capturing bytes_to_retry_ is really an optimization to avoid linearizing over data that's going to be sent over multiple TLS records that doesn't need to be combined together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah after reading @davidben's comment I noticed that {"abc"} -> {"abcde"} was also find before this PR, but I think the original intention is not to allow that case, no? @mattklein123

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer needs to be the same by default but it's just a silly sanity check. If your buffer may be reallocated or whatever, set the moving buffer option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to think about it is it's like POSIX write where you either get told the write failed with a retryable error or it finished completely with N bytes written.

The difference is that write allows this:

write("hello") => EWOULDBLOCK
write("actually I changed my mind")

with the first input never making it on the wire. This shape of API doesn't compose well with TLS if we managed to write half a record. (Writing half a record does not mean you wrote half the plaintext. It means you half-wrote the full plaintext and can't change your mind.)

OpenSSL's API handles this case by returning EWOULDBLOCK (so it looks to you like nothing was written) and then says you are not allowed to change your mind on what you are writing. Once you pass input in, you gave committed to that input. It enforces this with some sanity checks on the length and optionally the pointer, but the actual rule is the "no rewinding" rule. You want to think about that one and not the length business.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 it doesn't have to be the same pointer and length, but the data needs to be the same, so that comment isn't completely wrong, IMHO. And the only reason why the pointer doesn't need to be the same is because SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER is set, but it doesn't look that there is a reason for it, so I'm going to remove it, which will make the comment a bit more accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, as long as you checked the comment and you think it's good it's fine with me. Thank you all for the very lively and enlightening discussion on this. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

// we never change existing write_buffer_ chain elements between calls to SSL_write(). That code
// might need to change if we ever copy here.
write_buffer_->move(data);

// Activating a write event before the socket is connected has the side-effect of tricking
Expand Down
74 changes: 32 additions & 42 deletions source/common/ssl/ssl_socket.cc
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,6 @@ namespace Ssl {

SslSocket::SslSocket(Context& ctx, InitialState state)
: ctx_(dynamic_cast<Ssl::ContextImpl&>(ctx)), ssl_(ctx_.newSsl()) {
SSL_set_mode(ssl_.get(), SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
if (state == InitialState::Client) {
SSL_set_connect_state(ssl_.get());
} else {
Expand Down Expand Up @@ -153,57 +152,48 @@ Network::IoResult SslSocket::doWrite(Buffer::Instance& write_buffer, bool end_st
}
}

uint64_t original_buffer_length = write_buffer.length();
uint64_t bytes_to_write;
if (bytes_to_retry_) {
bytes_to_write = bytes_to_retry_;
bytes_to_retry_ = 0;
} else {
bytes_to_write = std::min(write_buffer.length(), static_cast<uint64_t>(16384));
}

uint64_t total_bytes_written = 0;
bool keep_writing = true;
while ((original_buffer_length != total_bytes_written) && keep_writing) {
// Protect against stack overflow if the buffer has a very large buffer chain.
// TODO(mattklein123): See the comment on getRawSlices() for why we have to also check
// original_buffer_length != total_bytes_written during loop iteration.
while (bytes_to_write > 0) {
// TODO(mattklein123): As it relates to our fairness efforts, we might want to limit the number
// of iterations of this loop, either by pure iterations, bytes written, etc.
const uint64_t MAX_SLICES = 32;
Buffer::RawSlice slices[MAX_SLICES];
uint64_t num_slices = write_buffer.getRawSlices(slices, MAX_SLICES);

uint64_t inner_bytes_written = 0;
for (uint64_t i = 0; (i < num_slices) && (original_buffer_length != total_bytes_written); i++) {
// SSL_write() requires that if a previous call returns SSL_ERROR_WANT_WRITE, we need to call
// it again with the same parameters. Most implementations keep track of the last write size.
// In our case we don't need to do that because: a) SSL_write() will not write partial
// buffers. b) We only move() into the write buffer, which means that it's impossible for a
// particular chain to increase in size. So as long as we start writing where we left off we
// are guaranteed to call SSL_write() with the same parameters.
int rc = SSL_write(ssl_.get(), slices[i].mem_, slices[i].len_);
ENVOY_CONN_LOG(trace, "ssl write returns: {}", callbacks_->connection(), rc);
if (rc > 0) {
inner_bytes_written += rc;
total_bytes_written += rc;
} else {
int err = SSL_get_error(ssl_.get(), rc);
switch (err) {
case SSL_ERROR_WANT_WRITE:
keep_writing = false;
break;
case SSL_ERROR_WANT_READ:
// Renegotiation has started. We don't handle renegotiation so just fall through.
default:
drainErrorQueue();
return {PostIoAction::Close, total_bytes_written, false};
}

// SSL_write() requires that if a previous call returns SSL_ERROR_WANT_WRITE, we need to call
// it again with the same parameters. This is done by tracking last write size, but not write
// data, since linearize() will return the same undrained data anyway.
ASSERT(bytes_to_write <= write_buffer.length());
int rc = SSL_write(ssl_.get(), write_buffer.linearize(bytes_to_write), bytes_to_write);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stupid openssl/boringssl question: I don't suppose there is some writev equivalent? Seems a shame we need to linearize here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these is one? Can we tag off-project people? Maybe @davidben knows

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check: if we linearize multiple times it only does the work one time, correct? I want to make sure there's no corner case where trickling bites gets super expensive.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately boringssl doesn't have a built in writev equivalent. You could potentially create a fake FD to use writev with and then dump that into SSL_write, but that probably has more overhead/complexity than linearize.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have plans to add an in-place API so you don't need to copy things in and out of OpenSSL's goofy fd abstraction, though probably the first iteration will still require it be linearized. writev is a little tricky with block ciphers. (Though that work will make it easier to add something like writev later.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 Done.

@alyssawilk Yes, linearization will happen only once (for the same size argument).

ENVOY_CONN_LOG(trace, "ssl write returns: {}", callbacks_->connection(), rc);
if (rc > 0) {
ASSERT(rc == static_cast<int>(bytes_to_write));
total_bytes_written += rc;
write_buffer.drain(rc);
bytes_to_write = std::min(write_buffer.length(), static_cast<uint64_t>(16384));
} else {
int err = SSL_get_error(ssl_.get(), rc);
switch (err) {
case SSL_ERROR_WANT_WRITE:
bytes_to_retry_ = bytes_to_write;
break;
case SSL_ERROR_WANT_READ:
// Renegotiation has started. We don't handle renegotiation so just fall through.
default:
drainErrorQueue();
return {PostIoAction::Close, total_bytes_written, false};
}
}

// Draining must be done within the inner loop, otherwise we will keep getting the same slices
// at the beginning of the buffer.
if (inner_bytes_written > 0) {
write_buffer.drain(inner_bytes_written);
break;
}
}

if (total_bytes_written == original_buffer_length && end_stream) {
if (write_buffer.length() == 0 && end_stream) {
shutdownSsl();
}

Expand Down
1 change: 1 addition & 0 deletions source/common/ssl/ssl_socket.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ class SslSocket : public Network::TransportSocket,
bssl::UniquePtr<SSL> ssl_;
bool handshake_complete_{};
bool shutdown_sent_{};
uint64_t bytes_to_retry_{};
mutable std::string cached_sha_256_peer_certificate_digest_;
mutable std::string cached_url_encoded_pem_encoded_peer_certificate_;
};
Expand Down