Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host (and ESP-IDF) build; less memory usage; edge_nal::TcpConnect impl #59

Merged
merged 32 commits into from
Dec 24, 2024

Conversation

ivmarkov
Copy link
Contributor

@ivmarkov ivmarkov commented Nov 19, 2024

Summary of Changes

===================================================

esp-mbedtls-sys

Host build (changes to esp-mbedtls-sys)

In order to be able to build and run the library on the host machine, the C build has to happen on the machine, with a build.rs script:

  • I've therefore implemented a utility - builder::MbedtlsBuilder - which does the C library build and codegen and which is used by both the existing xtasks as well as from the build.rs of esp-mbedtls-sys
  • The mbedtls GIT submodule is moved to esp-mbedtls-sys. Otherwise, we cannot publish esp-mbedtls-sys to crates.io
  • Ditto for the libs folder (the .a stuff): it needs to be in esp-mbedtls-sys or else we can't publish it
  • NOTE: The C build is now incremental (necessary or else when working on the host machine you'll wait forever). NO mbedtls C code is copied around

ESP-IDF build (changes to esp-mbedtls-sys)

This is also added. It was very easy because for the -espidf targets esp-mbedtls-sys does nothing and delegates the binding generation, the library build, linking etc. etc. to esp-idf-sys.

Support for future baremetal targets with pre-generated bindings and .a libs

Currently not on the table, but adding a new SOC to the xtasks should be even easier now

esp-mbedtls

Removal of async memory buffers (a change to esp-mbedtls)

This is probably the biggest change.

Please look at the new Session async instance, and specifically - at PollCtx. Especially the latter is very extensively documented in terms of what is the idea and how it does it.

While at it, I also implemented a public connect/read/write API on the Session types themselves, so that users don't have to import the Read/Write traits for simple use cases.

Tls and TlsReference (a change to esp-mbedtls).

Tls is a singleton that basically represents the Mbedtls lib itself.

  • For baremetal, it takes the SHA periph on new and optionally - with with_hardware_RSA the RSA perih
  • For the host, ESP IDF and potential future software-only targets, Tls::new does not have parameters.

TlsReference is just like &Tls but allows us to erase the invariant peripheral 'd lifetime of the SHA and RSA periphs that would otherwise contaminate the Session types too.

Note that now, Tls is the only type which does have platform-specific aspects!

Everything else is cross-platform just like the mbedtls C lib itself, as it does not assume esp-hal or anything.

TlsConnector and simpler TlsAcceptor for edge-nal compat (a change to esp-mbedtls)

Now that we don't have or need TLS buffers, the signature of TlsAcceptor is much simpler and has no generics. Note also that I've removed the "bind" logic from TlsAcceptor. Not sure it belongs there.

I also implemented TlsConnector to have a symmetry with TlsAcceptor and because I actually do need it for e.g. the edge-http HTTPS client.

Finally, the above ^^^ types are no longer hard-coded to edge-nal-embassy and can be used with any edge-nal impl (I use them with edge-nal-std for example).

Session::close (a change to esp-mbedtls)

(Only for async sessions, maybe we need it for blocking too?)

The idea of this method is to send asynchronously a notif to the other peer that the TLS session is being closed. I just use the mbedtls facilities for that.
Not sure it works 100%, but if not, we need to fix it, as I think such a method is good to have. A bit like the raw TCP "close" we introduced recently in edge-nal and which is wrapped by Session::close. Session::close also implements the TcpClose trait now.

Next steps

(ASAP) Fix the alignment issues with xtensa

Just locate aligned_calloc and uncomment the commented out code at the top of the function.
The fact that these alignment issues are not hitting us is a pure coincidence and only a matter of time until we start seeing these (I do see these with the mbedtls-3.6-idf branch, but they are also there in the currently used mbedtls-3.4-idf branch).

How to fix:

  • Enable in MbedTLS the malloc/free "custom hooks" as described here. This way MbedTLS would call us for every malloc/free call it does

  • Implement these hooks ourselves, by using GlobalAlloc::alloc for malloc and GlobalAlloc::dealloc for free. This way we decouple ourselves from esp-wifi and just use the Rust global allocator (which is anyway defined on all platforms, including ESP-baremetal)

  • In the hooks, for riscv align the requested memory to 4 bytes and for xtensa - to 8 bytes. The latter would solve the xtensa alignment issues even if we are unsure what object MbedTLS allocates, as I doubt MbedTLS uses u128 anywhere - only up to u64 (the u64s, i.e. their C equivalents are causing us the headaches, as we currently return for structs containing them memory aligned to 4 bytes instead of 8, which is required on xtensa, but not for riscv)

  • (Optional) in Certificates, cluster most or all raw MbedTLS struct pointers into a custom mega-struct which is allocated with a single, safe Box::new() call. This way this code would become shorter and much safer.

(Later) Implement session splitting (full-duplex read and write)

(For web-sockets etc.)

I have an idea, but need to experiment with it a bit. If it works out, the change would be trivial - just one (or two) async mutexes, thanks to PollCtx::poll_mut being actually non-blocking and non-awaiting (even if it is marked as async).

(Later) Get rid of Session::connect

Forgot to mention that already with these changes, Session::connect (a) no longer takes self but &mut self and (b) calling it is completely optional, as read and write call it too.

However: it seems MbedTLS anyway calls mbedtls_ssl_connect from mbedtls_ssl_read and mbedtls_ssl_write so we might not even need all the Session::connect machinery in the first place...

@ivmarkov ivmarkov marked this pull request as ready for review November 19, 2024 20:03
@bjoernQ
Copy link
Collaborator

bjoernQ commented Nov 20, 2024

Awesome and thanks for working on this! Looks all good and sane to me but I leave the real review to @AnthonyGrondin (I haven't done much here besides the first few commits .... time to remove me from the authors in Cargo.toml)

@bjoernQ
Copy link
Collaborator

bjoernQ commented Nov 20, 2024

Maybe the README.md would need some love (not necessarily in this PR)

@ivmarkov
Copy link
Contributor Author

Maybe the README.md would need some love (not necessarily in this PR)

Agreed.
Perhaps indeed let's touch the README after this PR (which is very large already), and also after the "ASAP fix the alignment problems" because fixing the alignment problems also means we won't need esp-wifi anymore, or even esp-alloc. Just "some" global allocator, whatever it is. esp-alloc or other.

@ivmarkov
Copy link
Contributor Author

Removing esp-wifi from the equation means we also need a replacement for the extern "C" random() fn, but that's easy - the Tls singleton would simply expect a RngCore instance at construction (new) time.

Copy link
Contributor

@AnthonyGrondin AnthonyGrondin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your contribution!
I have made a preliminary review, and have the following comments below. Once resolved, I'll test compatibility with an existing project and with reqwless.

  • esp-mbedtls-sys/mbedtls is not initialized by default
    • It should be documented that users should do git submodule update --init --recursive
    • Is the submodule initialized properly when building the first time after pulling from crates.io (in the future)?
  • Binary sizes seems to be slightly larger than the current HEAD. Is this increase reflected in larger binary sizes due to updates?

For me examples seems to fail, when I test on bare-metal esp32s3:

  • Async client fails with WARN - MbedTLS error: -31488 / ffff8500 (MBEDTLS_ERR_SSL_RECEIVED_NEW_SESSION_TICKET)
  • edge-server fails with:
 WARN - MbedTLS error: -30848 / ffff8780 (MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY)
 WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))
  • Sync client fails with:
Call wifi_connect
Wait to get connected
Wait to get an ip address
Got ip Ok(IpInfo { ip: 192.168.69.166, subnet: Subnet { gateway: 192.168.69.1, mask: Mask(24) }, dns: Some(192.168.69.16), secondary_dns: None })
We are connected!
Making HTTP request
Start tls connect


====================== PANIC ======================
panicked at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-wifi-0.10.1/src/wifi_interface.rs:485:41:
already borrowed: BorrowMutError

Backtrace:

0x42015775
0x42015775 - <esp_wifi::wifi_interface::Socket<MODE> as embedded_io::Write>::write
   at ??:??
0x4201c9a3
0x4201c9a3 - mbedtls_ssl_flush_output
   at ??:??
0x4201f8a8
0x4201f8a8 - mbedtls_ssl_handshake_step
   at ??:??
0x4201f98e
0x4201f98e - mbedtls_ssl_handshake
   at ??:??
0x42015479
0x42015479 - esp_mbedtls::Session<T>::connect
   at /esp-mbedtls/esp-mbedtls/src/lib.rs:749
0x42005513
0x42005513 - sync_client::__xtensa_lx_rt_main
   at /esp-mbedtls/examples/sync_client.rs:128
0x4201bbf6
0x4201bbf6 - Reset
   at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/xtensa-lx-rt-0.17.1/src/lib.rs:82
0x4037891a
0x4037891a - ESP32Reset
   at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-hal-0.21.1/src/soc/esp32s3/mod.rs:155

esp-mbedtls/src/lib.rs Outdated Show resolved Hide resolved
@ivmarkov
Copy link
Contributor Author

ivmarkov commented Nov 27, 2024

Thank you for taking the time for reviewing.

Below, I've tried to address the stuff which I know up-front. For re-checking the examples not working, I'll need an extra day or so.

  • esp-mbedtls-sys/mbedtls is not initialized by default

This is necessary only if somebody wants to hack on the esp-mbedtls itself by GIT-cloning it.
It is not necessary if that somebody depends on esp-mbedtls from another crate - either via crates.io (see below) or via GIT.

I have an updated README that - among other things - mentions this as well, but I thought we can delay it until after we have this initial large PR getting through, as I feel the update of the README might raise even more feedback and back-and-forth.

If you feel strongly about this I can include the updated README as part of this PR though. Just let me know.

  • It should be documented that users should do git submodule update --init --recursive

As per above

  • Is the submodule initialized properly when building the first time after pulling from crates.io (in the future)?

Yes even though "yes" is not precise really as crates.io does not have a notion of a "GIT submodule".
How it works is that we are simply supposed to call git submodule update --recursive --init in our "cargo publish" GH workflow, just like we do already for the CI workflow, as part of this PR.

When depending on esp-mbedtls as a GIT dependency, cargo does the magic of initializing the submodule by itself.

  • Binary sizes seems to be slightly larger than the current HEAD. Is this increase reflected in larger binary sizes due to updates?

I need to check this. Will follow up.

For me examples seems to fail, when I test on bare-metal esp32s3:

  • Async client fails with WARN - MbedTLS error: -31488 / ffff8500 (MBEDTLS_ERR_SSL_RECEIVED_NEW_SESSION_TICKET)
  • edge-server fails with:
 WARN - MbedTLS error: -30848 / ffff8780 (MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY)
 WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))

Also I need to check here again and will follow up. The status quo with this PR was that:

  • All the examples without the _mTLS suffix do work for me (client and server).
  • The exampls with the _mTLS suffix do not work for me with similar errors, but as I mentioned to you in the chat, they don't work for me also when using your pristine esp-mbedtls repo either, failing with mbedtls errors. That's why I asked you if you've recently checked the _mTLS examples actually?
  • Sync client fails with:
Call wifi_connect
Wait to get connected
Wait to get an ip address
Got ip Ok(IpInfo { ip: 192.168.69.166, subnet: Subnet { gateway: 192.168.69.1, mask: Mask(24) }, dns: Some(192.168.69.16), secondary_dns: None })
We are connected!
Making HTTP request
Start tls connect


====================== PANIC ======================
panicked at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-wifi-0.10.1/src/wifi_interface.rs:485:41:
already borrowed: BorrowMutError

Backtrace:

0x42015775
0x42015775 - <esp_wifi::wifi_interface::Socket<MODE> as embedded_io::Write>::write
   at ??:??
0x4201c9a3
0x4201c9a3 - mbedtls_ssl_flush_output
   at ??:??
0x4201f8a8
0x4201f8a8 - mbedtls_ssl_handshake_step
   at ??:??
0x4201f98e
0x4201f98e - mbedtls_ssl_handshake
   at ??:??
0x42015479
0x42015479 - esp_mbedtls::Session<T>::connect
   at /esp-mbedtls/esp-mbedtls/src/lib.rs:749
0x42005513
0x42005513 - sync_client::__xtensa_lx_rt_main
   at /esp-mbedtls/examples/sync_client.rs:128
0x4201bbf6
0x4201bbf6 - Reset
   at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/xtensa-lx-rt-0.17.1/src/lib.rs:82
0x4037891a
0x4037891a - ESP32Reset
   at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-hal-0.21.1/src/soc/esp32s3/mod.rs:155

Ditto. Will check again but same as with async - _mTLS sync not working, non-_mTLS sync working.
Two more things, perhaps interesting:

  • I actually did not change the sync code in any significant way, so that's a bit weird?
  • A borrow-mut error in esp-wifi itself sounds more like a bug in that crate. I don't think this should just happen when we are just innocently calling read or write on the socket.

@ivmarkov
Copy link
Contributor Author

@AnthonyGrondin Good news! I was re-examining the PR today and I was able to resolve the issues in the examples you pointed out.
Therefore, I updated the PR.

But let's go one by one thru the issues not addressed in my previous comment:

  • Binary sizes seems to be slightly larger than the current HEAD. Is this increase reflected in larger binary sizes due to updates?

Can you provide the exact commands you are using to track the binary size?

What I'm observing with esp32c3 (I used that target as it used to already have the DEBUG_C func enabled) is that the binary size decreased, rather than increased.

Here are my tests:

# ============ Binary size after my changes ============
ivan@m800:~/dev/esp-mbedtls$ cargo build --features esp32c3,examples-async --example async_client --target riscv32imc-unknown-none-elf --release
ivan@m800:~/dev/esp-mbedtls$ /home/ivan/.rustup/toolchains/esp/xtensa-esp32-elf-clang/esp-18.1.2_20240912/esp-clang/bin/llvm-size -A ./target/riscv32imc-unknown-
none-elf/release/examples/async_client
./target/riscv32imc-unknown-none-elf/release/examples/async_client  :
section                     size         addr
.text                     729338   1107296288
.text_dummy               786464   1006632960
.rodata                   162284   1007419424
.rodata.wifi               26816   1007581708
.trap                       1118   1077411840
.rwtext                     1144   1077412960
.rwtext.wifi               28088   1077414104
.rwdata_dummy              30350   1070071808
.data                       4548   1070102160
.bss                      146968   1070106712
.noinit                        0   1070253680
.data.wifi                   492   1070253680
.rtc_fast.text                 0   1342177280
.rtc_fast.data                 0   1342177280
.rtc_fast.bss                  0   1342177280
.rtc_fast.persistent           0   1342177280
.stack                         0   1070254172
.debug_loc                885487            0
.debug_abbrev              95739            0
.debug_info              2229850            0
.debug_aranges             23216            0
.debug_ranges             209872            0
.debug_str               1124847            0
.comment                     278            0
.riscv.attributes             68            0
.debug_frame               82404            0
.debug_line               875843            0
Total                    7445214

# ============ Binary size before my changes ============
ivan@m800:~/dev/e/esp-mbedtls$ cargo build --features esp32c3,async --example async_client --target riscv32imc-unknown-none-elf --release
ivan@m800:~/dev/e/esp-mbedtls$ /home/ivan/.rustup/toolchains/esp/xtensa-esp32-elf-clang/esp-18.1.2_20240912/esp-clang/bin/llvm-size -A ./target/riscv32imc-unknown-none-elf/release/examples/async_client
./target/riscv32imc-unknown-none-elf/release/examples/async_client  :
section                     size         addr
.text                     771244   1107296288
.text_dummy               786464   1006632960
.rodata                   183644   1007419424
.rodata.wifi               26816   1007603068
.trap                       1118   1077411840
.rwtext                     1144   1077412960
.rwtext.wifi               28088   1077414104
.rwdata_dummy              30350   1070071808
.data                       4564   1070102160
.bss                      157968   1070106728
.noinit                        0   1070264696
.data.wifi                   492   1070264696
.rtc_fast.text                 0   1342177280
.rtc_fast.data                 0   1342177280
.rtc_fast.bss                  0   1342177280
.rtc_fast.persistent           0   1342177280
.stack                         0   1070265188
.debug_loc                806887            0
.debug_abbrev              95726            0
.debug_info              2032801            0
.debug_aranges             22640            0
.debug_ranges             184776            0
.debug_str               1053709            0
.comment                     278            0
.riscv.attributes             68            0
.debug_frame               80308            0
.debug_line               841717            0
Total                    7110802

As you can see, the following ELF sections actually decreased in size:

  • .text
  • New: 729338
  • Old: 771244
  • .rodata
  • New: 162284
  • Old: 183644
  • .bss
  • New: 146968
  • Old: 157968

Sure, the "debug" sections did increase somehow, but I don't think anybody cares about those as they don't end up in the flash?

I.e.

NEW:

ivan@m800:~/dev/esp-mbedtls$ /home/ivan/.rustup/toolchains/esp/xtensa-esp32-elf-clang/esp-18.1.2_20240912/esp-clang/bin/llvm-strip ./target/riscv32imc-unknown-none-elf/release/examples/async_client
ivan@m800:~/dev/esp-mbedtls$ ll ./target/riscv32imc-unknown-none-elf/release/examples/async_client
-rwxr-xr-x 1 ivan ivan ***1760388*** Nov 28 10:21 ./target/riscv32imc-unknown-none-elf/release/examples/async_client*

OLD:

ivan@m800:~/dev/e/esp-mbedtls$ /home/ivan/.rustup/toolchains/esp/xtensa-esp32-elf-clang/esp-18.1.2_20240912/esp-clang/bin/llvm-strip ./target/riscv32imc-unknown-
none-elf/release/examples/async_client
ivan@m800:~/dev/e/esp-mbedtls$ ll ./target/riscv32imc-unknown-none-elf/release/examples/async_client
-rwxr-xr-x 1 ivan ivan ***1820556*** Nov 28 10:19 ./target/riscv32imc-unknown-none-elf/release/examples/async_client*

For me examples seems to fail, when I test on bare-metal esp32s3:

  • Async client fails with WARN - MbedTLS error: -31488 / ffff8500 (MBEDTLS_ERR_SSL_RECEIVED_NEW_SESSION_TICKET)

This is now fixed.
I've been testing with mbedtls 3.6, but did not carefully re-test with the one used in the repo which is 3.4 (right?).
mbedtls < 3.6 does send this annoying MBEDTLS_ERR_SSL_RECEIVED_NEW_SESSION_TICKET which needs to be discarded.

  • Sync client fails with:
Call wifi_connect
Wait to get connected
Wait to get an ip address
Got ip Ok(IpInfo { ip: 192.168.69.166, subnet: Subnet { gateway: 192.168.69.1, mask: Mask(24) }, dns: Some(192.168.69.16), secondary_dns: None })
We are connected!
Making HTTP request
Start tls connect


====================== PANIC ======================
panicked at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-wifi-0.10.1/src/wifi_interface.rs:485:41:
already borrowed: BorrowMutError
...

Also fixed, sorry about that!
There was a remaining addr_of! which needed to be removed, as Session::connect no longer takes self but &mut self.

  • edge-server fails with:
 WARN - MbedTLS error: -30848 / ffff8780 (MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY)
 WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))

Can't reproduce though won't be surprised if our MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY handling code is nort perfect, as it is brand-new BTW (you did not have any such logic before).

What I see:

NEW:

Starting wifi
Wifi started!
About to connect...
Wifi connected!
Waiting to get IP address...
Got IP: 192.168.10.192/24
Point your browser to https://192.168.10.192/
INFO - Creating 2 handler tasks, memory: 4120B
WARN - MbedTLS error: -30592 / ffff8880
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30592))))
WARN - IO error: EOF
WARN - Handler task 0: Error when closing the socket: Error(Eof)
WARN - MbedTLS error: -30592 / ffff8880
WARN - Handler task 1: Error when handling request: Connection(Io(Error(MbedTlsError(-30592))))
WARN - IO error: EOF
WARN - Handler task 1: Error when closing the socket: Error(Eof)
Got new connection
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B
Got new connection
Got new connection
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B
WARN - MbedTLS error: -30592 / ffff8880
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30592))))
WARN - IO error: EOF
WARN - Handler task 0: Error when closing the socket: Error(Eof)
Got new connection
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B
WARN - MbedTLS error: -30592 / ffff8880
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30592))))
WARN - IO error: EOF
WARN - Handler task 0: Error when closing the socket: Error(Eof)

OLD:

start connection task
Device capabilities: Ok(EnumSet(Client))
Starting wifi
Wifi started!
About to connect...
Wifi connected!
Waiting to get IP address...
Got IP: 192.168.10.192/24
Point your browser to https://192.168.10.192/
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
Got new connection
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
Got new connection
Got new connection
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
Got new connection
WARN - Handler task 0: Error when handling request: Connection(Io(Timeout))
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Handler task 0: Error when handling request: Connection(Io(Timeout))
WARN - Handler task 0: Error when closing the socket: Timeout
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))
Fatal message: Please enable the exception for a self-signed certificate in your browser
INFO - Creating 2 handler tasks, memory: 28520B
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-30592))))

Basically, same stuff:

  • -30592 is because of the self-signed server certificate
  • -32512 is a bit more frightening (server going out of memory when trying to handle two simultaneous connections); need to see if we can fix that in future

Also note that with the pristine esp-mbedtls I can't even run the edge_server example in Chrome - it never succeeds. Only Firefox works for me.

With my PR (which also reduces memory and updates to latest edge-http) Chrome also works.

@ivmarkov
Copy link
Contributor Author

Also note that the *_mTLS examples continue to fail for me both before and after the PR.
Could it be because of this? If yes, perhaps we need to fix, but possibly after this PR?

@AnthonyGrondin
Copy link
Contributor

Binary sizes seems to be slightly larger than the current HEAD. Is this increase reflected in larger binary sizes due to updates?

Can you provide the exact commands you are using to track the binary size?

I simply looked at the file diff on Github, where it showed that the binary size of the compiled libs was increased. I know that *.a size doesn't necessarily result in a bigger binary size for the application, hence why I was wondering if it resulted in an increase or not.

So the binaries are actually smaller on the device now, which is great. We can consider this point resolved. Thank you for taking the time to measure this.


esp-mbedtls-sys/mbedtls is not initialized by default

This is necessary only if somebody wants to hack on the esp-mbedtls itself by GIT-cloning it.
It is not necessary if that somebody depends on esp-mbedtls from another crate - either via crates.io (see below) or via GIT.

I have an updated README that - among other things - mentions this as well, but I thought we can delay it until after we have this initial large PR getting through, as I feel the update of the README might raise even more feedback and back-and-forth.

If you feel strongly about this I can include the updated README as part of this PR though. Just let me know.

Thank you for explaining how cargo resolves git submodules. I agree we should delay the update of the README in a subsequent PR, dedicated to it. We can consider this point resolved too.


As for the examples, I'll test them once again later on my side to ensure they are properly working. I would consider the _mTLS already broken previous to this PR, so I'll try to find the cause and see if a simple fix can be added in this PR, or if it should come in a subsequent dedicated PR.

As a sidenote, I've been starting to test edge-server with hyperfine to benchmark connection time, and also resource allocation / cleaning. Sometimes, too many connections in loop would show memory leaks.

The difference between testing with curl and with a browser, is that by default curl closes the connection (with Connection: Close) while a browser tends to generally send Connection: Keep-Alive so testing with both is always a good practice.

hyperfine -i 'curl -k https://<IP>'

It should work with my pristine esp-mbedtls, if not, I'll try to find out why, because that means it's an already existing issue outside of this PR. And it should also work with this PR, once all issues resolved.

@ivmarkov
Copy link
Contributor Author

As a sidenote, I've been starting to test edge-server with hyperfine to benchmark connection time, and also resource allocation / cleaning. Sometimes, too many connections in loop would show memory leaks.

I would be interested to actually see this, because - you know - edge-http itself is no-alloc so not sure how it could leak memory.

Do note that -32512 likely means that mbedtls calls calloc and then calloc returns NULL, as we run out of heap. But even this should not leak. If it does - we need to figure out why and fix it.

The difference between testing with curl and with a browser, is that by default curl closes the connection (with Connection: Close) while a browser tends to generally send Connection: Keep-Alive so testing with both is always a good practice.

Connection:Close likely means you never end up with two TLS connections opened at the same time, while with the browser, it surely tries to open two such connections at least.

hyperfine -i 'curl -k https://<IP>'

It should work with my pristine esp-mbedtls, if not, I'll try to find out why, because that means it's an already existing issue outside of this PR. And it should also work with this PR, once all issues resolved.

If you mean curl should work with esp-mbedtls before my changes - sure. What is interesting is to check if Chrome works...

@AnthonyGrondin
Copy link
Contributor

I would be interested to actually see this, because - you know - edge-http itself is no-alloc so not sure how it could leak memory.

I meant memory leaks specific to esp-mbedtls, independant of the application protocol. I think for now, most are resolved but we carried a few leaks for a long time. (See: dd06cb4)

One thing that could be interesting is to see how much memory a session takes and compare it to HEAD. I see we're now hitting alloc failed errors when running edge-server with 2 handlers. In the current HEAD, it works for bare-metal targets (except esp32) with 2 handlers, albeit, we're right on the edge of the memory limit. Seeing -32512 makes me wonder what changed in term of memory allocation / management.

It might be time to pull out the brand new esp_alloc::HEAP.stats()

@ivmarkov
Copy link
Contributor Author

In the current HEAD, it works for bare-metal targets (except esp32) with 2 handlers, albeit, we're right on the edge of the memory limit. Seeing -32512 makes me wonder what changed in term of memory allocation / management.

I don't think it works in the current HEAD either. :)
If you look at my earlier comment, I'm hitting -32512 with the unmodified esp-mbedtls as well.

@ivmarkov
Copy link
Contributor Author

ivmarkov commented Nov 28, 2024

BTW, I'm prepping another PR, which will reduce the memory (quite?) a bit, specifically with regards to certificates (CA and the private one).

Currently, you are not really sharing the certificates' storages across multiple sessions. My next PR enables that (amongst other things, like getting rid of the esp-wifi dependency via calloc/free).

@AnthonyGrondin
Copy link
Contributor

Sorry about the delays.

  • We should bump the MSRV to at least 1.77 since we use rust c-string literals, which were introduced in this version. That being said, we might have dependencies that require higher MSRV, so we should probably bump it up a bit.
    • About that, I have an issue with compiling for nightly-2024-07-22 for the esp32c3:
    $ SSID=Dummy PASSWORD=Dummy cargo +nightly-2024-07-22 besp32c3 --example sync_client --features="examples"
    Compiling examples v0.1.0 (~/esp-mbedtls)
    error: custom attribute panicked
      --> examples/sync_client.rs:26:1
       |
    26 | #[entry]
       | ^^^^^^^^
       |
       = help: message: Unrecognized literal: c"www.google.com"
    

Earlier I reported an overflow error (-32512) for edge_server.rs when using:

hyperfine -i 'curl -k https://<IP>'

It turns out that my cURL doesn't send Connection: Close by default now. It works if I do:

hyperfine -i 'curl -k https://<IP> -H "Connection: Close"'

If I omit it, I get the following loop:

Got IP: 192.168.32.44/24
Point your browser to https://192.168.32.44/
INFO - Creating 2 handler tasks, memory: 4120B
Got new connection
WARN - MbedTLS error: -30848 / ffff8780
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B
Got new connection
WARN - MbedTLS error: -30848 / ffff8780
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B

Considering that -30848 is for MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY this doesn't seem to be an error in itself, but rather we should close the connection and free the resources. We previously used to handle this by catching this error code, and setting the session as EOF, and then flush Ok(0)

if res == MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY {
self.eof = true;
return Ok(0);
}

@ivmarkov
Copy link
Contributor Author

Sorry about the delays.

  • We should bump the MSRV to at least 1.77 since we use rust c-string literals, which were introduced in this version. That being said, we might have dependencies that require higher MSRV, so we should probably bump it up a bit.

    • About that, I have an issue with compiling for nightly-2024-07-22 for the esp32c3:
    $ SSID=Dummy PASSWORD=Dummy cargo +nightly-2024-07-22 besp32c3 --example sync_client --features="examples"
    Compiling examples v0.1.0 (~/esp-mbedtls)
    error: custom attribute panicked
      --> examples/sync_client.rs:26:1
       |
    26 | #[entry]
       | ^^^^^^^^
       |
       = help: message: Unrecognized literal: c"www.google.com"
    

OK, so if I understand you correctly, you are OK with just raising the version to - what? - latest espup one? (1.82)?

  • If yes, I'll just update the PR to raise the version.
  • If not, well, we can always use (in the examples, as this is what breaks) b"XXX\0" which is equivalent to c"XXX"?

Earlier I reported an overflow error (-32512) for edge_server.rs when using:

hyperfine -i 'curl -k https://<IP>'

It turns out that my cURL doesn't send Connection: Close by default now. It works if I do:

hyperfine -i 'curl -k https://<IP> -H "Connection: Close"'

If I omit it, I get the following loop:

Got IP: 192.168.32.44/24
Point your browser to https://192.168.32.44/
INFO - Creating 2 handler tasks, memory: 4120B
Got new connection
WARN - MbedTLS error: -30848 / ffff8780
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B
Got new connection
WARN - MbedTLS error: -30848 / ffff8780
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B

Considering that -30848 is for MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY this doesn't seem to be an error in itself, but rather we should close the connection and free the resources. We previously used to handle this by catching this error code, and setting the session as EOF, and then flush Ok(0)

if res == MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY {
self.eof = true;
return Ok(0);
}

Thanks for the hint.
Indeed, I've missed that in the new PollCtx-based IO. Let me quickly re-add it. It should be exactly as you say:

  • setting the session to EOF
  • start returning 0-sized reads

@AnthonyGrondin
Copy link
Contributor

OK, so if I understand you correctly, you are OK with just raising the version to - what? - latest espup one? (1.82)?

Yes that should be good. I've looked through dependencies, and we need at minimum 1.80 for smoltcp

@ivmarkov
Copy link
Contributor Author

Sorry about the delays.

  • We should bump the MSRV to at least 1.77 since we use rust c-string literals, which were introduced in this version. That being said, we might have dependencies that require higher MSRV, so we should probably bump it up a bit.

    • About that, I have an issue with compiling for nightly-2024-07-22 for the esp32c3:
    $ SSID=Dummy PASSWORD=Dummy cargo +nightly-2024-07-22 besp32c3 --example sync_client --features="examples"
    Compiling examples v0.1.0 (~/esp-mbedtls)
    error: custom attribute panicked
      --> examples/sync_client.rs:26:1
       |
    26 | #[entry]
       | ^^^^^^^^
       |
       = help: message: Unrecognized literal: c"www.google.com"
    

OK, so if I understand you correctly, you are OK with just raising the version to - what? - latest espup one? (1.82)?

  • If yes, I'll just update the PR to raise the version.
  • If not, well, we can always use (in the examples, as this is what breaks) b"XXX\0" which is equivalent to c"XXX"?

Earlier I reported an overflow error (-32512) for edge_server.rs when using:

hyperfine -i 'curl -k https://<IP>'

It turns out that my cURL doesn't send Connection: Close by default now. It works if I do:

hyperfine -i 'curl -k https://<IP> -H "Connection: Close"'

If I omit it, I get the following loop:

Got IP: 192.168.32.44/24
Point your browser to https://192.168.32.44/
INFO - Creating 2 handler tasks, memory: 4120B
Got new connection
WARN - MbedTLS error: -30848 / ffff8780
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B
Got new connection
WARN - MbedTLS error: -30848 / ffff8780
WARN - Handler task 0: Error when handling request: Connection(Io(Error(MbedTlsError(-30848))))
WARN - Server processing loop quit abruptly: Err(Io(Error(MbedTlsError(-32512))))
ERROR - Io(Error(MbedTlsError(-32512)))
INFO - Creating 2 handler tasks, memory: 4120B

Considering that -30848 is for MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY this doesn't seem to be an error in itself, but rather we should close the connection and free the resources. We previously used to handle this by catching this error code, and setting the session as EOF, and then flush Ok(0)

if res == MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY {
self.eof = true;
return Ok(0);
}

Thanks for the hint. Indeed, I've missed that in the new PollCtx-based IO. Let me quickly re-add it. It should be exactly as you say:

  • setting the session to EOF
  • start returning 0-sized reads

FYI MSRV is now raised to 1.82 everywhere. I also implemented a fix for the MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY not being handled properly.

Copy link
Contributor

@AnthonyGrondin AnthonyGrondin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall great work! I've adapted reqwless locally and an app I'm working on, and this is a great improvement. I've noticed a big increase in bandwidth and decrease in latency. With the current head, a connection takes around 200 ms in Firefox after a session has been established, with edge_server.rs, with this PR, I've seen the time decrease to around 100 ms in total load time, after a session has been established.

use critical_section::Mutex;

use esp_hal::peripheral::Peripheral;
use esp_hal::peripherals::{RSA, SHA};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep these publicly exported? We used to expose them so that libraries using esp-mbedtls wouldn't need to depend on esp_hal directly.

This shouldn't be an issue after #61 where we move away from strictly using esp_hal types and dependency.

Copy link
Contributor Author

@ivmarkov ivmarkov Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. To "not depend" on the esp-hal crate in practice don't you actually need to re-export more from esp-hal, not just Peripheral, RSA and SHA? For example, I'm not sure if esp-hal uses the RSA and SHA types directly, or has some traits on these that need to be re-exported too?

... and then in practice if you plan to use esp-hal with esp-mbedtls you anyway will use a wider surface of esp-hal in your real-life app as you probably need quite a lot from it for the other aspects of your embedded project. And moreover, the esp-hal version used by esp-mbedtls probably needs to be the exact same version as the one that the user would directly depend on in their Cargo.toml (due to peripheral singletons etc.)

In such circumstances I usually do one of two things:

  • Option A: Re-export ALL of the dependent crate under some sort of module. As in pub mod esp_mbedtls::esp_hal { pub use esp_hal::*; }. This is what we do with esp-idf-svc -> esp-idf-hal -> esp-idf-sys where each "more top-level" crate re-exports the lower-level crate under a module, e.g. esp_idf_svc::hal. Yes, I did this in esp-mbedtls in that it re-exports the embedded-io crates, but these have a much smaller surface than esp-hal and for those it might happen (a) that the user depends on multiple versions of those which is OK (b) the user does not have explicit dependency on embedded-io elsewhere in their project (though unlikely, but then again, multiple versions of embedded-io is OK, unlike esp-hal). (BTW, whatever we do, we probsably need to re-export esp-mbedtls-sys as ::sys)
  • Option B: Do not re-export anything

I think for esp-mbedtls given that it is having a more niche use-case than esp-hal it should rather not re-export anything, i.e. Option B?

Copy link
Contributor Author

@ivmarkov ivmarkov Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if you plan to move away from esp-hal (and this PR is in fact doing this already, as it supports other environments) shouldn't you rename it actually?

edge-mbedtls might be a good name, and if you wish, we can merge it inside edge-net and you'll get commiter privs there :p

Or you just pick up another name and keep it separate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if you plan to move away from esp-hal (and this PR is in fact doing this already, as it supports other environments) shouldn't you rename it actually?

edge-mbedtls might be a good name, and if you wish, we can merge it inside edge-net and you'll get commiter privs there :p

Or you just pick up another name and keep it separate.

Haha, I'm just a contributor here. It's up to the maintainers at @esp-rs to decide what is the future of this crate here, and where it should go. Currently, I see it as a bandaid patch, until we get a pure-rust solution working, like rustls.

@ivmarkov
Copy link
Contributor Author

Overall great work! I've adapted reqwless locally and an app I'm working on, and this is a great improvement. I've noticed a big increase in bandwidth and decrease in latency. With the current head, a connection takes around 200 ms in Firefox after a session has been established, with edge_server.rs, with this PR, I've seen the time decrease to around 100 ms in total load time, after a session has been established.

Well, fixing latency issues was not planned at all - only memory issues. So this is some sort of weird side effect that latency did improve too.

@AnthonyGrondin
Copy link
Contributor

We should decrease the verbosity for EOF, this is what happens when a client with Connection: Close makes multiple requests

WARN - IO error: EOF
WARN - Handler task 1: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 1: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 0: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 0: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 1: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 1: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 0: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 0: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 1: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 1: Error when closing the socket: Error(Eof)

@ivmarkov
Copy link
Contributor Author

We should decrease the verbosity for EOF, this is what happens when a client with Connection: Close makes multiple requests

WARN - IO error: EOF
WARN - Handler task 1: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 1: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 0: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 0: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 1: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 1: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 0: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 0: Error when closing the socket: Error(Eof)
WARN - IO error: EOF
WARN - Handler task 1: Error when handling request: Connection(Io(Error(Eof)))
WARN - Handler task 1: Error when closing the socket: Error(Eof)

Hmmm, on a second thought, I'm not sure where it needs to be decreased? Perhaps edge-http is a better place?

@ivmarkov
Copy link
Contributor Author

@AnthonyGrondin Just a status check - what - from your POV - are the main show-stoppers for this getting merged?
I am in the meantime relying on this code for a pre-production firmware, so this sitting in a PR for too long is not giving me a warm fuzzy feeling that I can release the firmware to production this way, Also, there is further memory reduction coming down the line, and I want to enable (conditionally) the ESP IDF certificate bundle, but these are a bit stuck on this being merged.

If you are too busy or don't like the direction where this PR is going, let me know - I have no prob with that and will probably fork the crate under a new name and own it then.

@AnthonyGrondin
Copy link
Contributor

I have no objection with this PR, and I'm looking forward to see it merged. I've personally tested the compatibility of the new API with reqwless, and the new TlsReference pattern simplifies things for inter-dependency use. I've also tested with a personal project to ensure everything still work as it should.

All matter addressed in this PR has been resolved for me. Subsequent PRs will come to fix mTLS examples and etc.

Since I'm only a community contributor, I don't have write access to this repo.

@bjoernQ I've completed my review

@ivmarkov
Copy link
Contributor Author

Since I'm only a community contributor, I don't have write access to this repo.

@AnthonyGrondin sorry, I did not realize this!

@bjoernQ What would be your (and Espressif's) take on this? :-) I see multiple options:

  • Espressif is willing to put effort to maintain this => you guys remain the contributors but probably @AnthonyGrondin also becomes one in the short/near future
  • Espressif is not willing to put effort to maintain this project => we can perhaps still keep the project under the esp-rs umbrella, but then Anthony does need contributor permissions, and possibly me in the longer run too. This is the esp-idf-* "community maintained" path
  • Espressif is not willing to put effort to maintain this project and does not want it in the esp-rs umbrella. Say, because it is / will out-grow its "esp" specificity => then we need to rename the project and maybe move somewhere else?

In any case it might help if we get extra contributors here in the short run. I don't see this as a big issue given that it is outside the esp-hal mono-repo anyway. What do you think?

@bjoernQ
Copy link
Collaborator

bjoernQ commented Dec 24, 2024

Just speaking for myself here but we should end up with something between 1 and 2 - closer to 2. A few weeks ago, all my bets were on Rustls but there are good reasons to have two (well, three including embedded-tls) solutions - each with their own advantages.

I already suggested giving necessary permissions and we agreed we want to do that - I think technically I could do that right now but let's wait for the others to return from their well-deserved vacations.

In any case it might help if we get extra contributors here in the short run.

I totally agree here!

@bjoernQ bjoernQ merged commit 1431835 into esp-rs:main Dec 24, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants