LWM2M: DTLS session resumption in practice #25935

Closed
JusbeR opened this issue Jun 3, 2020 · 12 comments · Fixed by #65223
Labels
area: LWM2M area: Networking Enhancement Changes/Updates/Additions to existing features

Comments

@JusbeR

JusbeR commented Jun 3, 2020

Is your enhancement proposal related to a problem? Please describe.

It is impossible to build a low-power app with an NB-IoT radio and the Zephyr LwM2M protocol stack.

The OMA spec says:

the time the LwM2M Client has been sleeping the IP address assigned to it may have been released and / or existing
NAT bindings may have been released. If this is the case, then the client needs to re-run the DTLS handshake with
the LwM2M Server since an IP address and/or port number change will destroy the existing security context. For
performance and efficiency reasons it is RECOMMENDED to utilize the DTLS session resumption.

In practice, with NB-IoT radio technology it looks like these NAT bindings are released after about 60 s (tested with 2 Finnish operators).

Now, I could be wrong, but here is what I think happens when using a Nordic nRF91 with offloaded sockets and the UDP + DTLS + CoAP + LwM2M stack. A pcap is attached:

ip_port_changes_90s.zip

  • Device registers to the LwM2M server, stuff happens, and data sending ceases for e.g. 90 s (packets 22-36).
  • NAT bindings are released in the operator network.
  • Device wakes up to do a registration update.
  • LwM2M uses the existing connection/socket to send the data (packet 37).
  • Server does not recognize the connection because it comes from a different source port, so the packet is discarded.
  • LwM2M retries a few times, finally gives up, and closes the connection (packets 38-41).
  • LwM2M registers to the server again because the update failed; this time DTLS resumption works (packet 42 onwards).
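
The sequence above can be reproduced with a toy model of a single NAT binding. This is only a sketch to show why the server suddenly sees a different source port; the struct names, the port numbers and the 60 s timeout are illustrative assumptions, not the behaviour of any particular operator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one NAT binding: the external port the server sees, plus idle expiry. */
struct nat_binding {
    uint16_t external_port;   /* source port as observed by the server */
    int64_t last_activity_s;  /* time of the last packet, in seconds */
    bool valid;
};

struct nat {
    struct nat_binding binding;
    uint16_t next_port;       /* next external port to hand out (hypothetical policy) */
    int64_t idle_timeout_s;   /* binding lifetime without traffic */
};

/* Returns the source port the server would observe for a packet sent at now_s. */
static uint16_t nat_translate(struct nat *nat, int64_t now_s)
{
    struct nat_binding *b = &nat->binding;

    if (!b->valid || (now_s - b->last_activity_s) >= nat->idle_timeout_s) {
        /* Binding expired (or never existed): allocate a fresh external port. */
        b->external_port = nat->next_port++;
        b->valid = true;
    }
    b->last_activity_s = now_s;
    return b->external_port;
}
```

With a 60 s timeout, a packet sent 30 s after the previous one keeps the same external port, while a packet sent after 90 s of silence gets a new one. The client socket itself never notices the change.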

Describe the solution you'd like

There should be a way to configure the "may have been released" time in the LwM2M stack, or some kind of API to inform the stack about it. LwM2M should then close and reopen the socket so that DTLS resumption kicks in, while still keeping the LwM2M context alive.

Describe alternatives you've considered

  • Ping constantly to keep the NAT bindings valid -> kills the device battery and uses the network in vain.
  • Re-register every time there is a packet to send -> kills the device battery, uses the network in vain, and effectively DoS-attacks the server.
  • I first thought this was a DTLS problem, but now I believe it might be impossible to resume something that is still open, so this is an upper-layer problem. I am still not too sure about this, as I am not very familiar with all the protocols involved here.

Additional context
Issue when I was still accusing DTLS

@JusbeR JusbeR added the Enhancement Changes/Updates/Additions to existing features label Jun 3, 2020
@JusbeR
Author

JusbeR commented Jun 4, 2020

To try out my theory, I made a hack to close and reopen sockets like this: loopshore/sdk-zephyr@a35f03c
At least now I can do a registration update e.g. every 10 minutes and it still works.

In server-side pcaps I can see an ALERT from the device and then client hello + resumption + payload. I guess that is as lean as it can get with these protocols.

@rlubos
Contributor

rlubos commented Jun 4, 2020

Hi @JusbeR,
I've followed the discussion on devzone and analyzed the pcap you provided. What surprises me though is that the underlying socket port in the client is changed silently (between packets 36 and 37). This causes the server to ignore any further communication from the client side. This should not happen silently, not only because it disturbs the DTLS session, but also because it renders any pending observations on the server side invalid. I've observed this behavior with Leshan after I closed and reopened the socket without re-registering. It ignored any notifications sent by the client from the "new" port (probably because a CoAP endpoint is identified by an IP/port pair, so a new port value made it a different endpoint). That's why I've decided to re-register in case of socket errors instead of simply re-opening the socket.

IMO a socket that can no longer use the original port number should return an error and not silently modify it. That would trigger a proper re-registration procedure, instead of sending a Register Update and waiting for CoAP to time out. Why we need a full registration I've explained above. If we don't care about observations, we could probably introduce some simplified procedure to re-open a socket without re-registration, just as you did.

@JusbeR
Author

JusbeR commented Jun 4, 2020

I remember reading somewhere (maybe in the Californium repo) that CoAP observations/notifications/whatever-they-are-called are specified to be valid only inside an "epoch", and the "epoch" always starts from scratch when the DTLS connection is resumed. This effectively makes observations unusable in real life, just as you said. I believe this is a fundamental specification bug and I have no idea how to fix it without changing the specs. Personally, I decided not to even try to build anything on top of notifications.

a socket that can no longer use the original port number should return an error and not silently modify it

The problem here is that the socket has absolutely no idea whether it uses the same port or not. It is up to the operator's NAT rules, and the application can only guess or indirectly detect/test how long it keeps the same port.

Edit: It was this: https://github.com/eclipse/leshan/wiki/LWM2M-Observe#for-dtls
and apparently there is a server-side workaround for the observation problem in Leshan.

@rlubos
Contributor

rlubos commented Jun 4, 2020

Thanks for the link. While they seem to relax this for DTLS in Leshan, the default for UDP is still "strict" matching (so a change in the port number invalidates the observation). Good to know you can relax this requirement if you put your own server instance up.
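
To make the difference between the two matching modes concrete, here is a minimal sketch. It assumes that relaxed matching keys on the peer's DTLS identity instead of its address; the exact rules Leshan applies are not spelled out in this thread, and `struct peer`, `match_strict()` and `match_relaxed()` are hypothetical names, not Leshan or Zephyr APIs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct peer {
    uint32_t ipv4;         /* source address as seen by the server */
    uint16_t port;         /* source port as seen by the server */
    const char *identity;  /* DTLS identity (e.g. PSK identity), NULL for plain UDP */
};

/* Strict matching: the notification must come from the exact same IP/port pair. */
static bool match_strict(const struct peer *registered, const struct peer *sender)
{
    return registered->ipv4 == sender->ipv4 && registered->port == sender->port;
}

/* Relaxed matching (assumed rule): the same DTLS identity is enough. */
static bool match_relaxed(const struct peer *registered, const struct peer *sender)
{
    if (registered->identity == NULL || sender->identity == NULL) {
        /* Without a DTLS identity there is nothing else to match on. */
        return match_strict(registered, sender);
    }
    return strcmp(registered->identity, sender->identity) == 0;
}
```

Under strict matching a notification sent after a NAT rebinding is rejected because the port differs, whereas the relaxed variant still accepts it as long as the identity is unchanged.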

The problem here is that the socket has absolutely no idea whether it uses the same port or not. It is up to the operator's NAT rules, and the application can only guess or indirectly detect/test how long it keeps the same port.

Indeed, after educating myself a bit more about the NAT rules in cellular networks, they don't seem to be very user-friendly. To be honest, for now I don't see a clean way to handle such NAT timeouts. As a workaround, your proposal looks fine. But for a proper fix we would probably need to look into the DTLS Connection ID extension (already proposed in #23424). This should help to maintain the DTLS session even after the IP/port values change. I understand though that since you use bsdlib, it would have to be added there first.

@JPSELC
Contributor

JPSELC commented Jun 4, 2020

Yes, this is the issue. The modem has no indication that the IP address and port have changed.
A new session, with a new IP address and port, is created by the network operator when the old session has timed out (typically after somewhere between 20 and 120 seconds, depending on the network). You need to be able to set a timeout parameter in Zephyr to suit your network, to optimize the use of the "session time".

@JPSELC
Contributor

JPSELC commented Jun 4, 2020

To try out my theory, I made a hack to close and reopen sockets like this: loopshore/sdk-zephyr@a35f03c
At least now I can do a registration update e.g. every 10 minutes and it still works.

@JusbeR This looks good, and is similar to a hack I used for a demo a few months ago, before any session resumption was available. I need to try this.
Obviously, the 55 seconds value needs to be a parameter, maybe disabling the feature when set to 0 or -1 or something.
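
A sketch of what such a parameter check could look like, assuming the timeout is given in seconds and any value of 0 or below disables the behaviour. `socket_needs_reopen()` is a hypothetical helper for illustration; it is not part of the Zephyr LwM2M API or of the hack linked above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether the socket should be closed and reopened
 * (so that DTLS session resumption kicks in) before the next transmission.
 * A nat_timeout_s of 0 or below disables the check.
 */
static bool socket_needs_reopen(int64_t last_tx_s, int64_t now_s, int32_t nat_timeout_s)
{
    if (nat_timeout_s <= 0) {
        return false;
    }
    /* Reopen once the idle time has reached the configured NAT timeout. */
    return (now_s - last_tx_s) >= nat_timeout_s;
}
```

Choosing a value slightly below the timeout observed on the network (e.g. 55 for a 60 s NAT timeout) leaves a small safety margin.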

@endian-benjamin
Contributor

Anything but decoupling the DTLS session from the IP/port will be an insufficient solution IMO.

Check out the RFC for "DTLS connection id". Last I checked it was still in draft stage, but I believe both mbedTLS and Leshan have implemented it in later versions. With connection IDs, the session epoch won't be reset, the IP/port will be irrelevant and everything will be beautiful.

https://datatracker.ietf.org/doc/draft-ietf-tls-dtls-connection-id/
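
The idea behind the connection ID can be sketched as a session lookup keyed on the ID carried in each record instead of on the sender's address. This is a simplified illustration only: the struct layout, the `CID_MAX_LEN` limit and both function names are assumptions, not mbedTLS or Leshan code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CID_MAX_LEN 8  /* illustrative limit, not taken from the specification */

struct dtls_session {
    uint8_t cid[CID_MAX_LEN];
    size_t cid_len;
    uint32_t peer_ipv4;
    uint16_t peer_port;
};

/* Find the session whose connection ID matches the one carried in the record. */
static struct dtls_session *find_by_cid(struct dtls_session *table, size_t count,
                                        const uint8_t *cid, size_t cid_len)
{
    if (cid_len == 0 || cid_len > CID_MAX_LEN) {
        return NULL;  /* no usable connection ID in the record */
    }
    for (size_t i = 0; i < count; i++) {
        if (table[i].cid_len == cid_len &&
            memcmp(table[i].cid, cid, cid_len) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/*
 * Handle a record arriving from (ipv4, port). The session is selected by CID only,
 * then the stored peer address is refreshed. A real stack would update the address
 * only after the record has passed its integrity check.
 */
static struct dtls_session *on_record(struct dtls_session *table, size_t count,
                                      const uint8_t *cid, size_t cid_len,
                                      uint32_t ipv4, uint16_t port)
{
    struct dtls_session *s = find_by_cid(table, count, cid, cid_len);

    if (s != NULL) {
        s->peer_ipv4 = ipv4;
        s->peer_port = port;
    }
    return s;
}
```

A record that arrives from a new source port after a NAT rebinding still maps to the existing session, so no new handshake is needed.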

@rlubos
Contributor

rlubos commented Jul 5, 2021

I've submitted a draft PR with a DTLS Connection ID socket option, see #36738. I think however that we should postpone the integration of this feature until some problems on the mbed TLS side are resolved (see the PR, where I've described the issues I've faced in more detail).

There are no LwM2M changes yet that would use this new option, but it should be pretty straightforward to add (just a single setsockopt() call).

@SeppoTakalo
Collaborator

DTLS session resumption in queue mode has already been fixed: #45065

DTLS Connection ID is (I believe) not yet supported.

@kartben kartben changed the title LWM2M: DTSL session resumption in practice LWM2M: DTLS session resumption in practice Feb 28, 2023
@carlescufi
Member

@SeppoTakalo Do you have plans to implement Connection ID in upstream Zephyr for LWM2M?

@SeppoTakalo
Collaborator

@carlescufi I did not have a plan, but then I just enabled it in the interoperability test, and it just works.

CID support is available in Zephyr when the Mbed TLS stack is used.
It just needs to be enabled with one socket option, like this:

if (IS_ENABLED(CONFIG_MBEDTLS_SSL_DTLS_CONNECTION_ID) && ctx->use_dtls) {
    int ret;
    /* Enable CID */
    int cid = TLS_DTLS_CID_ENABLED;
    ret = zsock_setsockopt(ctx->sock_fd, SOL_TLS, TLS_DTLS_CID, &cid,
                           sizeof(cid));
    if (ret) {
        ret = -errno;
        LOG_ERR("Failed to enable TLS_DTLS_CID: %d", ret);
    }
}
return lwm2m_set_default_sockopt(ctx);

Maybe I can create a PR to set that option in the LwM2M client.

@carlescufi
Member

Maybe I can create a PR to set that option in the LwM2M client.

Thank you @SeppoTakalo!
