DTLS connection/sessions lost after 10 minutes #542
Comments
Did you consider using LWM2M Queue Mode? It is suited to NAT environments. Anyway, I don't see any recent changes that could explain this new behavior... 🤔 Could you tell us the commit linked to this snapshot release, or at least the Californium version used (as this is probably a change in Californium rather than in Leshan)?
I think discarding is the right behavior; otherwise, breaking an open DTLS session would be too easy with IP address spoofing. I tried this on my side, sending empty packets or packets with only 1 byte (\0), and it works for me... I'm not able to reproduce this. Do you have any logs or a capture?
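To make "discard, don't disconnect" concrete, here is a minimal sketch of the kind of sanity check a receiver can apply before handing a datagram to the DTLS layer. Anything shorter than the 13-byte DTLS record header (RFC 6347, section 4.1), or with an unknown content type or version, is dropped without touching any connection state. The class and method names are hypothetical. This is not Scandium's code, and it ignores the heartbeat content type (24) because that extension is not implemented.

```java
/**
 * Hypothetical pre-filter (not Scandium's implementation): decides whether a
 * UDP datagram could start with a DTLS record as laid out in RFC 6347, 4.1.
 */
public final class DtlsRecordFilter {

    /** content type (1) + version (2) + epoch (2) + sequence number (6) + length (2) */
    public static final int RECORD_HEADER_LENGTH = 13;

    private DtlsRecordFilter() {
    }

    public static boolean isPlausibleRecord(byte[] datagram) {
        if (datagram == null || datagram.length < RECORD_HEADER_LENGTH) {
            // Empty packets and 1-byte NAT keep-alives are rejected here.
            return false;
        }
        int contentType = datagram[0] & 0xFF;
        // change_cipher_spec(20), alert(21), handshake(22), application_data(23)
        if (contentType < 20 || contentType > 23) {
            return false;
        }
        // DTLS versions are encoded as {254, x}: {254, 255} = 1.0, {254, 253} = 1.2
        if ((datagram[1] & 0xFF) != 254) {
            return false;
        }
        int length = ((datagram[11] & 0xFF) << 8) | (datagram[12] & 0xFF);
        // The declared record length must fit inside the datagram.
        return RECORD_HEADER_LENGTH + length <= datagram.length;
    }
}
```

A datagram that fails this check is simply dropped. Since no connection lookup happens, a spoofed garbage packet cannot tear down an existing session. Passing the check says nothing about authenticity; the record still has to be decrypted and verified.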
RFC 5246 explains that HelloRequest is about renegotiation, which is not supported by Scandium, and renegotiation would not help in this case anyway. So I don't think HelloRequest can be a solution. As for the LWM2M server sending a CLIENT_HELLO and thus acting as a DTLS client, this is the expected behavior.
I know there is a DTLS heartbeat extension. It is not implemented in Scandium, but we could consider adding it if it makes sense. I also know there is a ping in CoAP, but it would be more expensive... (and I'm not sure it is really appropriate...)
Queue mode is not really an option, since the device needs to be reachable all the time; however, because of the 3G modem, it sits behind a NAT. Tests have shown that (depending on the operator) a keep-alive packet once a minute is required to keep the connection. In the next few days I will check which version introduced this behavior; this could show when and what was changed, either in Leshan or in Californium. Attached is a Wireshark log file. After successful registration (packets 7 and 8), data is requested from the device (packets 9 and 10). Then keep-alive messages are sent by the device to the server every minute. The log was recorded with "leshan-server-demo-1.0.0-M8-jar-with-dependencies.jar". I also agree that the server should silently discard invalid DTLS packets to reduce the risk of DoS attacks. Yes, DTLS Heartbeat could be a solution, but if it is sent every minute, the amount of data increases strongly!
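The cost argument can be put into rough numbers. The sketch below compares a raw 1-byte keep-alive with two exchanges carried in DTLS records. The first is a CoAP ping, an empty 4-byte CON answered by an empty RST. The second is a DTLS heartbeat request/response (RFC 6520). It assumes the following:

- IPv4, with a 20-byte IP header and an 8-byte UDP header.
- A PSK cipher suite with AES-128-CCM-8, which adds an 8-byte explicit nonce and an 8-byte tag per record.
- A heartbeat with an empty payload and the minimum 16 bytes of padding.

It ignores link-layer overhead and operator billing granularity. All names are hypothetical.

```java
/**
 * Hypothetical back-of-the-envelope calculator for keep-alive traffic.
 * Assumes IPv4/UDP and AES-128-CCM-8 records; ignores link-layer overhead.
 */
public final class KeepAliveCost {

    static final int IPV4_UDP_HEADERS = 20 + 8;
    static final int DTLS_RECORD_HEADER = 13;
    static final int CCM8_OVERHEAD = 8 + 8; // explicit nonce + 8-byte tag

    private KeepAliveCost() {
    }

    static int datagram(int udpPayload) {
        return IPV4_UDP_HEADERS + udpPayload;
    }

    static int dtlsRecord(int plaintext) {
        return DTLS_RECORD_HEADER + CCM8_OVERHEAD + plaintext;
    }

    /** One unencrypted byte, uplink only. */
    public static int rawKeepAlive() {
        return datagram(1);
    }

    /** Empty CoAP CON (4-byte header) plus the empty RST reply, both in DTLS records. */
    public static int coapPing() {
        return 2 * datagram(dtlsRecord(4));
    }

    /** Heartbeat request and response: type (1) + payload_length (2) + 16 bytes padding. */
    public static int heartbeat() {
        return 2 * datagram(dtlsRecord(1 + 2 + 16));
    }

    /** Bytes per day for one exchange every intervalSeconds. */
    public static long perDay(int bytesPerExchange, int intervalSeconds) {
        return (long) bytesPerExchange * (24L * 60 * 60 / intervalSeconds);
    }
}
```

Under these assumptions, a 1-byte keep-alive every minute costs about 41.8 kB per day (29 B x 1440). A CoAP ping costs about 175.7 kB (122 B x 1440), and a minimal heartbeat about 218.9 kB (152 B x 1440), that is, roughly four to five times more.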
I tested again on my side and I'm still not able to reproduce this (sending 0-byte packets or 1-byte (\0) packets). Reading/debugging the DTLSConnector class, I can't see anything that could explain this behavior... At this point I would bet that the issue is not the keep-alive but something else. 🤔 Just to be sure, the capture was recorded at the server, wasn't it? And you are testing with only one device? Could you activate logs on the server side? Maybe we will see something...
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d %p %C{0} - %m%n</pattern>
</encoder>
</appender>
<root level="WARN">
<appender-ref ref="STDOUT" />
</root>
<logger name="org.eclipse.leshan" level="INFO"/>
<logger name="org.eclipse.leshan.server.security.SecurityCheck" level="DEBUG"/>
<logger name="org.eclipse.leshan.core.model.LwM2mModel" level="TRACE"/>
<!-- All above is the default config, the line below is to search something in DTLS stack -->
<logger name="org.eclipse.californium.scandium" level="TRACE"/>
</configuration>
I ran version 1.0.0-M8 with the XML file you attached on a Raspberry Pi. The server receives a single \0 byte 10 times and discards it correctly (see debug messages below). Server logs:
In version 1.0.0-M2 the behavior is normal.
I tried again to reproduce it on my side and am still not able to... Leshan 1.0.0-M5 uses Californium 2.0.0-M6. I looked at the code to see if I could find something suspicious... I saw nothing. If the server tries to send a Client_Hello, that means the connection/session was lost or cannot be retrieved. I do not understand what could cause that in your case. Maybe @boaks has an idea?
Maybe "DtlsConnectorConfig.Builder.setAutoResumptionTimeoutMillis" is used?
I don't think we use it in Leshan. (I suppose the default value is no auto-resumption?) If auto-resumption were involved, we should see a SESSIONID in the Client_Hello, right?
OK, the CLIENT_HELLO is not a resumption, so the "AutoResumptionTimeout" is not the cause.
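Whether a CLIENT_HELLO in a capture is a resumption attempt can also be checked mechanically. In an unfragmented initial ClientHello, the session_id length byte follows these fields (RFC 6347, sections 4.1 and 4.2.2):

- the 13-byte record header,
- the 12-byte DTLS handshake header,
- the 2-byte client_version,
- the 32-byte random.

It therefore sits at offset 59. This is a minimal sketch with hypothetical names. It assumes one record per datagram and a message that is not split across fragments. Wireshark shows the same field directly.

```java
/**
 * Hypothetical helper for inspecting captured datagrams: reports the
 * session_id length of an unfragmented DTLS ClientHello.
 */
public final class ClientHelloInspector {

    static final int RECORD_HEADER = 13;    // type, version, epoch, sequence number, length
    static final int HANDSHAKE_HEADER = 12; // msg_type, length(3), message_seq(2), fragment_offset(3), fragment_length(3)
    static final int SESSION_ID_LENGTH_OFFSET = RECORD_HEADER + HANDSHAKE_HEADER + 2 + 32; // 59

    private ClientHelloInspector() {
    }

    /** Returns the session_id length, or -1 if this does not look like a ClientHello. */
    public static int sessionIdLength(byte[] datagram) {
        if (datagram.length <= SESSION_ID_LENGTH_OFFSET) {
            return -1;
        }
        boolean handshakeRecord = (datagram[0] & 0xFF) == 22;
        boolean clientHello = (datagram[RECORD_HEADER] & 0xFF) == 1;
        // fragment_offset must be 0; we assume the rest of the message is here too
        int fragmentOffsetIndex = RECORD_HEADER + 6;
        boolean startsAtOffsetZero = datagram[fragmentOffsetIndex] == 0
                && datagram[fragmentOffsetIndex + 1] == 0
                && datagram[fragmentOffsetIndex + 2] == 0;
        if (!handshakeRecord || !clientHello || !startsAtOffsetZero) {
            return -1;
        }
        return datagram[SESSION_ID_LENGTH_OFFSET] & 0xFF;
    }

    /** A non-empty session_id means the client is trying to resume a session. */
    public static boolean isResumptionAttempt(byte[] datagram) {
        return sessionIdLength(datagram) > 0;
    }
}
```

An empty session_id, as in the captures discussed here, means a full handshake rather than an abbreviated one.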
The default would be
It seems that the cause of these ClientHello messages is the 10-minute timeout and not the 10 malformed keep-alive packets!
There must be a change in behavior between M5 and M6! It does not make sense to set the inactivity period lower than the lifetime (which is specified by the client during the registration process).
If the Leshan demo server still uses a "Californium.properties" file, you can easily adjust the value of MAX_PEER_INACTIVITY_PERIOD to a proper value. This should help as a short-term workaround.
The registration lifetime is dynamic and different for each device. Registration lifetime and DTLS connection lifetime are not exactly linked. I must confess that I totally misunderstood the way this was used in Californium. I thought that a DTLS connection was only removed if the store was full and the peer_inactivity_period was reached...
8 months ago, I "fixed" the behaviour of "If the cache contains the key but the value is stale the entry is removed from the cache."
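The difference between the behavior Leshan relied on and the "fixed" one can be illustrated with a toy store. This is a hypothetical sketch, not Californium's cache. In both modes, a stale entry may be evicted to make room when the store is full. Only in the "remove stale on lookup" mode does an idle entry disappear as soon as it is older than the inactivity period, even when the store has room. Because the 1-byte keep-alives are discarded before any lookup, they never refresh the entry.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical connection store illustrating two ways of handling idle
 * ("stale") entries. This is NOT Californium's implementation.
 */
public final class InactivityStore<K, V> {

    private static final class Entry<T> {
        final T value;
        long lastAccessMillis;

        Entry(T value, long nowMillis) {
            this.value = value;
            this.lastAccessMillis = nowMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final int capacity;
    private final long inactivityMillis;
    private final boolean removeStaleOnLookup;

    public InactivityStore(int capacity, long inactivityMillis, boolean removeStaleOnLookup) {
        this.capacity = capacity;
        this.inactivityMillis = inactivityMillis;
        this.removeStaleOnLookup = removeStaleOnLookup;
    }

    /** Stores a value; when full, evicts one stale entry, or fails if none is stale. */
    public boolean put(K key, V value, long nowMillis) {
        if (!entries.containsKey(key) && entries.size() >= capacity) {
            Iterator<Entry<V>> it = entries.values().iterator();
            boolean evicted = false;
            while (it.hasNext() && !evicted) {
                if (isStale(it.next(), nowMillis)) {
                    it.remove();
                    evicted = true;
                }
            }
            if (!evicted) {
                return false;
            }
        }
        entries.put(key, new Entry<>(value, nowMillis));
        return true;
    }

    /** Looks up a value; valid traffic for the peer refreshes its entry. */
    public V get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (removeStaleOnLookup && isStale(entry, nowMillis)) {
            entries.remove(key); // "stale means gone", even if the store has room
            return null;
        }
        entry.lastAccessMillis = nowMillis;
        return entry.value;
    }

    private boolean isStale(Entry<V> entry, long nowMillis) {
        return nowMillis - entry.lastAccessMillis >= inactivityMillis;
    }
}
```

With a 600 000 ms period, an entry stored at t = 0 is still returned at t = 660 000 ms in the old mode but is gone in the new one. The server then has no session for the device and starts a new handshake with a CLIENT_HELLO. With one keep-alive per minute, 600 s is ten keep-alive intervals, which would explain why the symptom looked tied to the number of malformed packets.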
Consider using the
to set the inactivity period to a more appropriate default value.
@bernhard-seifert, thanks a lot for reporting that! We are going back to the previous behavior in Californium (see eclipse-californium/californium#709). In the meantime, for leshan-server-demo, you can use a higher value for
If you are using Leshan as a library, you can change the coapConfig to use a higher value for MAX_PEER_INACTIVITY_PERIOD:
NetworkConfig coapConfig = LeshanServerBuilder.createDefaultNetworkConfig();
coapConfig.setLong(Keys.MAX_PEER_INACTIVITY_PERIOD, yourHigherValue);
LeshanServerBuilder builder = new LeshanServerBuilder();
builder.setCoapConfig(coapConfig);
I'll leave this open until we integrate the new Californium version.
Thanks for the note.
#567 should fix this issue. @bernhard-seifert, could you retest with it?
So I can integrate #567 and close this bug, right?
Yes, for me the bug is fixed with your update! Thanks for the fast fix!
I integrated the fix in master (#567). Thanks a lot for reporting this and double-checking that it works now!
I've been using snapshot version 1.0.0 for a while and also compared the behavior to 1.0.0-M8. My embedded system does the following:
After that, the device sends one byte (\0) every minute. This is important because otherwise the GPRS NAT would delete the port mapping and the Leshan server would no longer be able to send data to the device.
As these packets are not properly formed DTLS records, they are silently discarded. Note that only one byte is sent on purpose, because data is expensive (in particular when roaming) and adds up when transferred every minute.
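For reference, the device-side keep-alive described above boils down to something like the following Java sketch. The real device is an embedded system, so this only illustrates the idea, and the class and method names are hypothetical. The important detail is that the byte is sent from the same local UDP socket that carries the DTLS traffic, and therefore through the same NAT mapping.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketAddress;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical NAT keep-alive: one 0x00 byte on the socket used for DTLS. */
public final class NatKeepAlive {

    private static final byte[] KEEP_ALIVE = {0};

    private NatKeepAlive() {
    }

    /** Sends a single 0x00 byte to the server; the server is expected to discard it. */
    public static void sendOnce(DatagramSocket dtlsSocket, SocketAddress server) throws IOException {
        dtlsSocket.send(new DatagramPacket(KEEP_ALIVE, KEEP_ALIVE.length, server));
    }

    /** Repeats sendOnce every periodSeconds until the returned future is cancelled. */
    public static ScheduledFuture<?> schedule(ScheduledExecutorService executor,
            DatagramSocket dtlsSocket, SocketAddress server, long periodSeconds) {
        return executor.scheduleAtFixedRate(() -> {
            try {
                sendOnce(dtlsSocket, server);
            } catch (IOException e) {
                // Hypothetical policy: ignore and try again on the next tick.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

For example, `NatKeepAlive.schedule(executor, dtlsSocket, serverAddress, 60)` would send the byte once a minute, the interval the operator tests in this thread called for (`dtlsSocket` and `serverAddress` are placeholders).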
Snapshot version 1.0.0 also behaved like that, and whenever data had to be read from the device (pressing the "read" button in the Leshan HTML interface), the corresponding request was sent to the device.
In version 1.0.0-M8 this behavior is different. If the server receives fewer than 10 keep-alive packets, the behavior is the same. However, once more than 10 keep-alive packets have been received by the server, the next time a "read" command is triggered, the server does not send the request to the device but sends a "ClientHello" message instead.
It seems that the server believes that the communication is broken (due to 10 "malformed" packets) and requests a new handshake. In this case, according to RFC 6347, page 23, it should send a "HelloRequest" message.
In either case, this would require a new handshake, which in turn also consumes a large amount of data.
Can this behavior be changed so that malformed packets are just silently discarded, like before?
If not, how would you propose to implement proper keep-alive packets? As I already pointed out, these are essential for GPRS NAT-based systems.