Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS 1.3 connection is unstable in muti-thread env #7979

Closed
Taowyoo opened this issue Jul 24, 2023 · 7 comments
Closed

TLS 1.3 connection is unstable in muti-thread env #7979

Taowyoo opened this issue Jul 24, 2023 · 7 comments
Assignees
Labels
bug component-tls13 size-l Estimated task size: large (2w+)

Comments

@Taowyoo
Copy link
Contributor

Taowyoo commented Jul 24, 2023

Summary

This error found from our rust-wrapper of mbedtls, Ref: fortanix/rust-mbedtls#301

TLS 1.3 connection is unstable(sometimes break) in multi-thread env:

  1. run server in one thread/process
  2. spawn multiple thread/process of client to connect server continuously
  3. Then. sometimes the TLS connection will break, the error is random

System information

Mbed TLS version (number or commit id): 17526a0d168276aa3ba5833cbb1f8fcd69d688ba
Operating system and version: Linux yuxiangcao-ThinkPad-T14-Gen-1 5.15.0-79-generic #86~20.04.2-Ubuntu SMP Mon Jul 17 23:27:17 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Configuration (if not default, please attach mbedtls_config.h):
Compiler and options (if you used a pre-built binary, please indicate how you obtained it):
Additional environment information:

Expected behavior

No error for each requests.

Actual behavior

Random error is returned, including:

  • -0x0001 - ERROR - Generic error
  • -0x006e - ERROR - This is a bug in the library
  • -0x7280 - SSL - The connection indicated an EOF
  • -0x6e00 - SSL - The handshake negotiation failed
  • core dumped

Steps to reproduce

  1. Set config

    scripts/config.py set MBEDTLS_SSL_PROTO_TLS1_3
    scripts/config.py set MBEDTLS_SSL_TLS1_3_COMPATIBILITY_MODE
    scripts/config.py set MBEDTLS_THREADING_PTHREAD
    scripts/config.py set MBEDTLS_THREADING_C
  2. Update ./mbedtls-sys/vendor/programs/ssl/ssl_pthread_server.c with https://gist.github.com/Taowyoo/e2a90ed25bf299500074fdf03e67a050#file-ssl_pthread_server-c

  3. Build and run example, example starts a server with 5 threads and spawn 5 threads of clients to connect it.

    mkdir -p build
    cmake -S . -B ./build -DENABLE_PROGRAMS=ON -DENABLE_TESTING=OFF && cmake --build ./build
    ./build/programs/ssl/ssl_pthread_server

In ~70%, the client/server will failed with error.

Additional information

The script above can run successfully with TLS 1.2 or with TLS 1.3 + one thread for several time, but for TLS 1.3 + multi-thread the script failed in most cases.

@Taowyoo Taowyoo changed the title TLS 1.3 is unstable in muti-thread env TLS 1.3 connection is unstable in muti-thread env Jul 24, 2023
@paul-elliott-arm paul-elliott-arm added bug component-tls13 size-l Estimated task size: large (2w+) labels Jul 25, 2023
@paul-elliott-arm
Copy link
Member

Hi, (better) multi-thread support is something which we are currently looking at, although from a brief look its difficult to tell which part is actually failing here. We are hoping to dramatically improve the situation before the release of 4.0

@paul-elliott-arm paul-elliott-arm self-assigned this Jul 25, 2023
@Taowyoo
Copy link
Contributor Author

Taowyoo commented Jul 25, 2023

Thank you for follow up.
Has just updated the code in issue description for reproducing the error. It setup a server with supporting 5 threads and then spawn 5 threads of clients to connect the server.

@Taowyoo
Copy link
Contributor Author

Taowyoo commented Jul 25, 2023

After some debug, I finally found the type of most error is -0x0001 - ERROR - Generic error , and finally trace back to this code block in psa_destroy_key in library/psa_crypto.c :

mbedtls/library/psa_crypto.c

Lines 1094 to 1104 in 6f37495

/*
* If the key slot containing the key description is under access by the
* library (apart from the present access), the key cannot be destroyed
* yet. For the time being, just return in error. Eventually (to be
* implemented), the key should be destroyed when all accesses have
* stopped.
*/
if (slot->lock_count > 1) {
psa_unlock_key_slot(slot);
return PSA_ERROR_GENERIC_ERROR;
}

This function is used in TLS 1.3 in ssl_tls13_calc_finished_core which is heavily used by mbedtls_ssl_tls13_calculate_verify_data.

Then the problem becomes very clear: current PSA key slot does not works well in multi-thread.

Side note: after searching lock_count in PSA code, the error code is confusing and cannot express that such error is related to thread/lock

@Taowyoo
Copy link
Contributor Author

Taowyoo commented Jul 25, 2023

Link: #3263

@daverodgman
Copy link
Contributor

Hi, (better) multi-thread support is something which we are currently looking at, although from a brief look its difficult to tell which part is actually failing here. We are hoping to dramatically improve the situation before the release of 4.0

@Taowyoo as Paul says, we have scheduled this as something we will work on through Q3 and Q4. This is one of the main features we want to have in 3.6 LTS, which we are planning to release around the end of this year (maybe early next year). @paul-elliott-arm and @yanesca are actively working on this.

I realise this has been open for some time, which is a shame, but hopefully the above timeline is helpful. Thanks for your debug analysis above - it's helpful to get this kind of info.

@paul-elliott-arm
Copy link
Member

We have now implemented thread safety around the key store which should fix the error detailed in this issue, however although we are doing threaded tests of keys in various different configurations, we are not as yet doing multithreaded tests of the full TLS stack, so I am slightly uncomfortable closing this as there may be other issues present which we have not yet addressed. I therefore leave it up to @Taowyoo to decide what to do with this - obviously if we do close this, new issues can still be raised.

@gilles-peskine-arm
Copy link
Contributor

I am closing this issue because we have fixed the known cause of the problem: since Mbed TLS 3.6.0, the PSA key store is thread-safe.

If you find a race condition in TLS code in Mbed TLS ≥3.6, please open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug component-tls13 size-l Estimated task size: large (2w+)
Projects
None yet
Development

No branches or pull requests

4 participants