Skip to content

Conversation

@masaori335
Copy link
Contributor

@masaori335 masaori335 commented Jan 20, 2020

Prior to this change, HTTP/2 was almost 30% slower than HTTP/1.1 (over TLS) on downloading a huge file (over 1GB).

Improvements:

  • Avoid unnecessary IOBufferBlock allocation for all type of frame
  • Avoid unnecessary copy on sending DATA frame
  • Adjust IOBufferBlock size of Http2ClientSession::write_buffer

Cleanups:

  • Decouple receiving & sending HTTP/2 Frame
  • Remove unnecessary SCOPED_MUTEX_LOCK

Another approach of #5916.
Some features like adding padding are unimplemented as is.

@randall
Copy link
Contributor

randall commented Jan 20, 2020

[approve ci autest]

@maskit
Copy link
Member

maskit commented Jan 21, 2020

Another approach of #5916.

It seems like #5961 also have an improvement for send_a_data_frame. Does this PR completely replace #5916, or do we need it also?

@masaori335
Copy link
Contributor Author

Another approach of #5916.

It seems like #5961 also have an improvement for send_a_data_frame. Does this PR completely replace #5916, or do we need it also?

IIUC, this covers all that #5916 tries to fix. I'd like @shinrich to make sure.

@masaori335 masaori335 force-pushed the h2-perf-frame branch 2 times, most recently from 3a7ffb0 to 689e8ac Compare January 21, 2020 23:23
Prior to this change, HTTP/2 was almost 30% slower than HTTP/1.1 (over TLS) on downloading a huge file (over 1GB).

Improvements:
- Avoid unnecessary IOBufferBlock allocation for all type of frame
- Avoid unnecessary copy on sending DATA frame
- Adjust IOBufferBlock size of Http2ClientSession::write_buffer

Cleanups:
- Decouple receiving & sending HTTP/2 Frame
- Remove unnecessary SCOPED_MUTEX_LOCK
written += iobuffer->write(this->_reader->start(), read_len);
this->_reader->consume(read_len);
}
len += written;
Copy link
Member

@shinrich shinrich Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that we were playing with different versions of the Data frame writing. One that wrote to contiguous buffers and passed that along to SSL_write(). Another that just passed along pointers from the original buffer (iobuffer in the case) and used the block pointers directly to write to SSL_write. Did you find that performance was better with the extra intermediate copy to hopefully get bigger blocks?

PR #5897 has the logic that was testing writing to the SSL_write directly from the iobuffer blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared these two approaches and the first approach is so much better.
The second one looks cool because of "no copy". However, it didn't improve performance as expected.
SSL_write() is expensive than memcpy(), so reducing SSL_write() calls is key point, IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of the memcpy() when SSL/TLS libraries provide lower APIs which slice SSL_write() functionalities.

Actually, our QUIC implementation keeps the IOBufferBlock chain to avoid memcpy(). Unfortunately, we can't take the same approach here, because it's using the EVP cipher functions directly.

bool
QUICPacketPayloadProtector::_protect(uint8_t *cipher, size_t &cipher_len, size_t max_cipher_len, const Ptr<IOBufferBlock> plain,
uint64_t pkt_num, const uint8_t *ad, size_t ad_len, const uint8_t *key, const uint8_t *iv,
size_t iv_len, const EVP_CIPHER *aead, size_t tag_len) const
{
EVP_CIPHER_CTX *aead_ctx;
int len;
uint8_t nonce[EVP_MAX_IV_LENGTH] = {0};
size_t nonce_len = 0;
this->_gen_nonce(nonce, nonce_len, pkt_num, iv, iv_len);
if (!(aead_ctx = EVP_CIPHER_CTX_new())) {
return false;
}
if (!EVP_EncryptInit_ex(aead_ctx, aead, nullptr, nullptr, nullptr)) {
return false;
}
if (!EVP_CIPHER_CTX_ctrl(aead_ctx, EVP_CTRL_AEAD_SET_IVLEN, nonce_len, nullptr)) {
return false;
}
if (!EVP_EncryptInit_ex(aead_ctx, nullptr, nullptr, key, nonce)) {
return false;
}
if (!EVP_EncryptUpdate(aead_ctx, nullptr, &len, ad, ad_len)) {
return false;
}
cipher_len = 0;
Ptr<IOBufferBlock> b = plain;
while (b) {
if (!EVP_EncryptUpdate(aead_ctx, cipher + cipher_len, &len, reinterpret_cast<unsigned char *>(b->buf()), b->size())) {
return false;
}
cipher_len += len;
b = b->next;
}
if (!EVP_EncryptFinal_ex(aead_ctx, cipher + cipher_len, &len)) {
return false;
}
cipher_len += len;
if (max_cipher_len < cipher_len + tag_len) {
return false;
}
if (!EVP_CIPHER_CTX_ctrl(aead_ctx, EVP_CTRL_AEAD_GET_TAG, tag_len, cipher + cipher_len)) {
return false;
}
cipher_len += tag_len;
EVP_CIPHER_CTX_free(aead_ctx);
return true;
}

Copy link
Member

@shinrich shinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'm glad that we aren't losing the work we did on HTTP/2 performance improvements. What was the % performance comparison for the 1GB download after making these changes?

@masaori335
Copy link
Contributor Author

HTTP/2 performance becomes almost the same as HTTP/1.1 with this PR.

  master-h1 master-h2 6337-h2
90 %tile 1.018559 1.309419 1.027845
80 %tile 1.016833 1.301963 1.017787

Measured total time of downloading 1GB file from local box like below.

$ cat curlformat
%{time_total}\n
$ for x in {0..10}; do curl --http1.1 -s -k "https://127.0.0.1:4443/1gb" -o /dev/null -w @curlformat >> master-h1-1gb.log; done
$ for x in {0..10}; do curl -s -k "https://127.0.0.1:4443/1gb" -o /dev/null -w @curlformat >> master-h1-1gb.log; done

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@masaori335 masaori335 merged commit 48bcbe6 into apache:master Jan 24, 2020
@zwoop zwoop added the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Feb 21, 2020
@zwoop
Copy link
Contributor

zwoop commented Feb 27, 2020

Cherry-picked to v9.0.x branch.

@zwoop zwoop modified the milestones: 10.0.0, 9.0.0 Feb 27, 2020
@zwoop zwoop removed the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants