Skip to content

Conversation

@bryancall
Copy link
Contributor

No description provided.

Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

@bryancall
Copy link
Contributor Author

I updated it to remove HashMD5.cc and HashMD5.h too. I didn't see those being used.

@bneradt
Copy link
Contributor

bneradt commented Jan 18, 2022

[approve ci autest]

@maskit maskit self-requested a review January 19, 2022 02:17
@maskit
Copy link
Member

maskit commented Jan 19, 2022

@bryancall Did you benchmark deprecated MD5 API and non-deprecated EVP API? I made a similar change for SHA functions, and kept the deprecated ones for better performance.
#7447 (comment)

Also, it seems like I missed the necessity of this change because FIPS was enabled when I tested build with OpenSSL 3. Can you update ☂️ issue #7341?

@bneradt
Copy link
Contributor

bneradt commented Jan 19, 2022

[approve ci autest]

@bryancall
Copy link
Contributor Author

@maskit There is no performance difference between the old and new APIs. I am shocked that there would be a difference with the SHA functions.

11:07:37 zeus:(master)~/files/src/c/test_code$ g++ benchmark_openssl_md5.cc -lbenchmark -lssl -lcrypto && ./a.out
2022-01-19T11:08:04-08:00
Running ./a.out
Run on (24 X 4672.07 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 512 KiB (x12)
  L3 Unified 16384 KiB (x4)
Load Average: 0.17, 0.28, 0.22
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
openssl_1_md5       1238 ns         1236 ns       566867
openssl_3_md5       1240 ns         1238 ns       565212

Here is my benchmark code:

#include <iostream>
#include <string>
#include <string_view>
#include <openssl/evp.h>
#include <openssl/md5.h>
#include <benchmark/benchmark.h>

//-----------------------------------------------------------------------------
class Data {
public:
  Data() {
    _data.reserve(_size);
    for (int i = 0; i < _size; i++) {
      _data.push_back((char)rand() % 256);
    }
  }

  void reset() {
    _position = 0;
  }

  std::string_view read() {
    if (_position + 1024 > _size) {
      _position = 0;
    }
    std::string_view tmp = _data.substr(_position, 1024);
    _position += 1024;
    return tmp;
  }

private:
  std::string _data;
  static constexpr size_t _size = 1 << 20;
  size_t _position = 0;
};

Data data;


//-----------------------------------------------------------------------------
class MD5_openssl1
{
private:
  MD5_CTX _ctx;
  unsigned char _hash[MD5_DIGEST_LENGTH];

public:
  MD5_openssl1()
  {
    MD5_Init(&_ctx);
  }

  bool update(void const *data, int length)
  {
    return 0 != MD5_Update(&_ctx, data, length);
  }

  unsigned char *finalize()
  {
    MD5_Final(_hash, &_ctx);
    return _hash;
  }
};

//-----------------------------------------------------------------------------
class MD5_openssl3
{
protected:
  EVP_MD_CTX *_ctx = nullptr;
  unsigned char _hash[MD5_DIGEST_LENGTH];

public:
  MD5_openssl3()
  {
    _ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(_ctx, EVP_md5(), nullptr);
  }

  ~MD5_openssl3() { EVP_MD_CTX_free(_ctx); }

  bool update(void const *data, int length)
  {
    return EVP_DigestUpdate(_ctx, data, length);
  }

  unsigned char *finalize()
  {
    EVP_DigestFinal_ex(_ctx, _hash, nullptr);
    return _hash;
  }
};

//-----------------------------------------------------------------------------
static void openssl_1_md5(benchmark::State& state) {
  MD5_openssl1 md5;
  data.reset();
  for (auto _ : state) {
    std::string_view tmp = data.read();
    md5.update(tmp.data(), tmp.size());
  }
}

//-----------------------------------------------------------------------------
static void openssl_3_md5(benchmark::State& state) {
  MD5_openssl3 md5;
  data.reset();
  for (auto _ : state) {
    std::string_view tmp = data.read();
    md5.update(tmp.data(), tmp.size());
  }
}

//-----------------------------------------------------------------------------
BENCHMARK(openssl_1_md5);
BENCHMARK(openssl_3_md5);

BENCHMARK_MAIN();

@bryancall
Copy link
Contributor Author

I used the same type of benchmark to benchmark sha256 and I didn't see a difference between the old and new APIs:

11:14:46 zeus:(master)~/files/src/c/test_code$ g++ benchmark_openssl_sha.cc -lbenchmark -lssl -lcrypto && ./a.out
2022-01-19T11:15:09-08:00
Running ./a.out
Run on (24 X 4672.07 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 512 KiB (x12)
  L3 Unified 16384 KiB (x4)
Load Average: 0.02, 0.08, 0.14
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------
Benchmark                 Time             CPU   Iterations
-----------------------------------------------------------
openssl_1_sha256        513 ns          512 ns      1364991
openssl_3_sha256        513 ns          513 ns      1364948

Updated benchmark code:

#include <iostream>
#include <string>
#include <string_view>
#include <openssl/evp.h>
#include <openssl/sha.h>
#include <benchmark/benchmark.h>

//-----------------------------------------------------------------------------
class Data {
public:
  Data() {
    _data.reserve(_size);
    for (int i = 0; i < _size; i++) {
      _data.push_back((char)rand() % 256);
    }
  }

  void reset() {
    _position = 0;
  }

  std::string_view read() {
    if (_position + 1024 > _size) {
      _position = 0;
    }
    std::string_view tmp = _data.substr(_position, 1024);
    _position += 1024;
    return tmp;
  }

private:
  std::string _data;
  static constexpr size_t _size = 1 << 20;
  size_t _position = 0;
};

Data data;


//-----------------------------------------------------------------------------
class sha256_openssl1
{
private:
  SHA256_CTX _ctx;
  unsigned char _hash[SHA256_DIGEST_LENGTH];

public:
  sha256_openssl1()
  {
    SHA256_Init(&_ctx);
  }

  bool update(void const *data, int length)
  {
    return 0 != SHA256_Update(&_ctx, data, length);
  }

  unsigned char *finalize()
  {
    SHA256_Final(_hash, &_ctx);
    return _hash;
  }
};

//-----------------------------------------------------------------------------
class sha256_openssl3
{
protected:
  EVP_MD_CTX *_ctx = nullptr;
  unsigned char _hash[SHA256_DIGEST_LENGTH];

public:
  sha256_openssl3()
  {
    _ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(_ctx, EVP_sha256(), nullptr);
  }

  ~sha256_openssl3() { EVP_MD_CTX_free(_ctx); }

  bool update(void const *data, int length)
  {
    return EVP_DigestUpdate(_ctx, data, length);
  }

  unsigned char *finalize()
  {
    EVP_DigestFinal_ex(_ctx, _hash, nullptr);
    return _hash;
  }
};

//-----------------------------------------------------------------------------
static void openssl_1_sha256(benchmark::State& state) {
  sha256_openssl1 sha256;
  data.reset();
  for (auto _ : state) {
    std::string_view tmp = data.read();
    sha256.update(tmp.data(), tmp.size());
  }
}

//-----------------------------------------------------------------------------
static void openssl_3_sha256(benchmark::State& state) {
  sha256_openssl3 sha256;
  data.reset();
  for (auto _ : state) {
    std::string_view tmp = data.read();
    sha256.update(tmp.data(), tmp.size());
  }
}

//-----------------------------------------------------------------------------
BENCHMARK(openssl_1_sha256);
BENCHMARK(openssl_3_sha256);

BENCHMARK_MAIN();

@bryancall
Copy link
Contributor Author

On my AMD 3900x server I am seeing sha256 being about 2x faster than md5. Which is similar to the benchmarks I wrote:

11:19:15 zeus:(master)~/files/src/c/test_code$ openssl speed md5 sha256
Doing md5 for 3s on 16 size blocks: 31318265 md5's in 2.99s
Doing md5 for 3s on 64 size blocks: 17335741 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 7575644 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 2331244 md5's in 2.99s
Doing md5 for 3s on 8192 size blocks: 312094 md5's in 3.00s
Doing md5 for 3s on 16384 size blocks: 156758 md5's in 3.00s
Doing sha256 for 3s on 16 size blocks: 50398283 sha256's in 2.99s
Doing sha256 for 3s on 64 size blocks: 31740526 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 16289859 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 5539014 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 772385 sha256's in 2.99s
Doing sha256 for 3s on 16384 size blocks: 389558 sha256's in 3.00s
OpenSSL 1.1.1l  FIPS 24 Aug 2021
built on: Wed Sep 15 00:00:00 2021 UTC

@maskit
Copy link
Member

maskit commented Jan 19, 2022

Sounds good. This change should be fine then.

I found a difference between your benchmark and mine. Yours use the Init, Update and Finish functions, but mine use one instant function. That might affect the result. And my test was done with one of alpha versions, so they might have improved internal implementations at some point. I'll run the benchmarks on my box, and will make a PR if we no longer need to keep the deprecated one.

@bryancall bryancall merged commit 219b886 into apache:master Jan 20, 2022
@bryancall bryancall added this to the 10.0.0 milestone Feb 23, 2022
@zwoop
Copy link
Contributor

zwoop commented Mar 9, 2022

I think I'm seeing a memory leak with this patch, either we're leaking the MD5Context (and not calling the destructor), or we're leaking something else. Also, it seems incredibly unfortunate that it has to call malloc() on every cache key generation now, doesn't it ?

==2847515== 1,420,696 (460,896 direct, 959,800 indirect) bytes in 9,602 blocks are definitely lost in loss record 1,518 of 1,528
==2847515==    at 0x4842839: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2847515==    by 0x4C3C99D: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2847515==    by 0x48A7FC4: MD5Context (apache/trafficserver/include/tscore/MD5.h:38)
==2847515==    by 0x48A7FC4: ats::CryptoContext::CryptoContext() (apache/trafficserver/src/tscore/CryptoHash.cc:48)
==2847515==    by 0x666908: url_CryptoHash_get(URLImpl const*, ats::CryptoHash*, long) (apache/trafficserver/proxy/hdrs/URL.cc:1850)
==2847515==    by 0x56EA6D: hash_get (apache/trafficserver/proxy/hdrs/URL.h:495)
==2847515==    by 0x56EA6D: generate_key (apache/trafficserver/iocore/cache/P_CacheInternal.h:1037)
==2847515==    by 0x56EA6D: HttpSM::do_cache_lookup_and_read() (apache/trafficserver/proxy/http/HttpSM.cc:4768)
==2847515==    by 0x554FE1: HttpSM::state_read_client_request_header(int, void*) (apache/trafficserver/proxy/http/HttpSM.cc:0)
==2847515==    by 0x55372D: HttpSM::main_handler(int, void*) (apache/trafficserver/proxy/http/HttpSM.cc:0)
==2847515==    by 0x553A8F: do_api_callout (apache/trafficserver/proxy/http/HttpSM.cc:421)
==2847515==    by 0x553A8F: HttpSM::state_add_to_list(int, void*) (apache/trafficserver/proxy/http/HttpSM.cc:448)
==2847515==    by 0x5544E8: HttpSM::attach_client_session(ProxyTransaction*) (apache/trafficserver/proxy/http/HttpSM.cc:617)
==2847515==    by 0x540664: Http1ClientSession::new_transaction() (apache/trafficserver/proxy/http/Http1ClientSession.cc:472)
==2847515==    by 0x540253: Http1ClientSession::state_keep_alive(int, void*) (apache/trafficserver/proxy/http/Http1ClientSession.cc:381)
==2847515==    by 0x75D4BD: handleEvent (apache/trafficserver/iocore/eventsystem/I_Continuation.h:219)
==2847515==    by 0x75D4BD: read_signal_and_update(int, UnixNetVConnection*) (apache/trafficserver/iocore/net/UnixNetVConnection.cc:82)

I ran about 10k requests through here, and a similar leak happens a few hundred times as well, so fairly certain there's exactly one MD5context related leak for each request.

@maskit
Copy link
Member

maskit commented Apr 27, 2022

Linking the memory leak fix #8719

Because the fix has incompatible label, we should not backport this to 9.x, there's no need for backporting this in the first place though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants