Perf: Optimize sending HTTP/2 frame #6337
Conversation
Prior to this change, HTTP/2 was almost 30% slower than HTTP/1.1 (over TLS) when downloading a huge file (over 1GB).

Improvements:
- Avoid unnecessary IOBufferBlock allocation for all types of frame
- Avoid an unnecessary copy when sending DATA frames
- Adjust the IOBufferBlock size of Http2ClientSession::write_buffer

Cleanups:
- Decouple receiving & sending HTTP/2 frames
- Remove unnecessary SCOPED_MUTEX_LOCK
  written += iobuffer->write(this->_reader->start(), read_len);
  this->_reader->consume(read_len);
}
len += written;
I know that we were playing with different versions of DATA frame writing: one that wrote to contiguous buffers and passed that along to SSL_write(), and another that just passed pointers from the original buffer (iobuffer in this case) and used the block pointers directly in SSL_write(). Did you find that performance was better with the extra intermediate copy, which hopefully produces bigger blocks?
PR #5897 has the logic that was testing writing to SSL_write() directly from the iobuffer blocks.
I compared these two approaches and the first approach is much better.
The second one looks cool because of "no copy". However, it didn't improve performance as expected.
SSL_write() is more expensive than memcpy(), so reducing the number of SSL_write() calls is the key point, IMO.
We can get rid of the memcpy() once SSL/TLS libraries provide lower-level APIs that split up SSL_write()'s functionality.
Actually, our QUIC implementation keeps the IOBufferBlock chain to avoid memcpy(). Unfortunately, we can't take the same approach here, because it uses the EVP cipher functions directly.
trafficserver/iocore/net/quic/QUICPacketPayloadProtector_openssl.cc
Lines 30 to 84 in 504cc9f
bool
QUICPacketPayloadProtector::_protect(uint8_t *cipher, size_t &cipher_len, size_t max_cipher_len, const Ptr<IOBufferBlock> plain,
                                     uint64_t pkt_num, const uint8_t *ad, size_t ad_len, const uint8_t *key, const uint8_t *iv,
                                     size_t iv_len, const EVP_CIPHER *aead, size_t tag_len) const
{
  EVP_CIPHER_CTX *aead_ctx;
  int len;
  uint8_t nonce[EVP_MAX_IV_LENGTH] = {0};
  size_t nonce_len                 = 0;

  this->_gen_nonce(nonce, nonce_len, pkt_num, iv, iv_len);

  if (!(aead_ctx = EVP_CIPHER_CTX_new())) {
    return false;
  }
  if (!EVP_EncryptInit_ex(aead_ctx, aead, nullptr, nullptr, nullptr)) {
    return false;
  }
  if (!EVP_CIPHER_CTX_ctrl(aead_ctx, EVP_CTRL_AEAD_SET_IVLEN, nonce_len, nullptr)) {
    return false;
  }
  if (!EVP_EncryptInit_ex(aead_ctx, nullptr, nullptr, key, nonce)) {
    return false;
  }
  if (!EVP_EncryptUpdate(aead_ctx, nullptr, &len, ad, ad_len)) {
    return false;
  }

  cipher_len           = 0;
  Ptr<IOBufferBlock> b = plain;
  while (b) {
    if (!EVP_EncryptUpdate(aead_ctx, cipher + cipher_len, &len, reinterpret_cast<unsigned char *>(b->buf()), b->size())) {
      return false;
    }
    cipher_len += len;
    b = b->next;
  }

  if (!EVP_EncryptFinal_ex(aead_ctx, cipher + cipher_len, &len)) {
    return false;
  }
  cipher_len += len;

  if (max_cipher_len < cipher_len + tag_len) {
    return false;
  }
  if (!EVP_CIPHER_CTX_ctrl(aead_ctx, EVP_CTRL_AEAD_GET_TAG, tag_len, cipher + cipher_len)) {
    return false;
  }
  cipher_len += tag_len;

  EVP_CIPHER_CTX_free(aead_ctx);
  return true;
}
shinrich left a comment:
Looks good. I'm glad that we aren't losing the work we did on HTTP/2 performance improvements. What was the % performance comparison for the 1GB download after making these changes?
HTTP/2 performance becomes almost the same as HTTP/1.1 with this PR.
I measured the total time of downloading a 1GB file from a local box.
maskit left a comment:
Looks good.
Cherry-picked to the v9.0.x branch.
Another approach to #5916.
Some features, such as adding padding, remain unimplemented for now.