Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

L2CAP/SMP: Race condition possible in native posix central when bonding. #22086

Closed
ChristofferSchroeder opened this issue Jan 22, 2020 · 13 comments · Fixed by #22334
Closed

L2CAP/SMP: Race condition possible in native posix central when bonding. #22086

ChristofferSchroeder opened this issue Jan 22, 2020 · 13 comments · Fixed by #22334
Assignees
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug

Comments

@ChristofferSchroeder
Copy link

Describe the bug
In a native posix central app, when bonding, it is possible (99% reproducible on my system) that SMP code 0x08 is handled before it is added to allowed_cmds by bt_smp_encrypt_change.
This happens quite often, as stated, but can happen in the correct order (after bt_smp_encrypt_change is called).
It is related to commit sha 0a925bd (Bluetooth: L2CAP: Process fixed channels in the RX thread), as the problem appears using this exact commit.

To Reproduce
Steps to reproduce the behavior:

  1. Build a native posix central and modify to call bt_conn_set_security(conn, BT_SECURITY_L2).
  2. Bond to a Zephyr client.
  3. Hopefully you will see security changed callbacks with BT_SMP_ERR_UNSPECIFIED errors.
  4. If you activate CONFIG_BT_DEBUG_SMP you will be able to see "Unexpected SMP code" warnings, which results in the BT_SMP_ERR_UNSPECIFIED errors in the security changed callback.

Expected behavior
SMP codes should be dealt with in the correct order and at the correct time to avoid "Unexpected SMP code" warnings/ BT_SMP_ERR_UNSPECIFIED errors.

Impact
Showstopper.

Screenshots or console output
issue.txt

Environment (please complete the following information):

  • OS: Linux Debian 9
  • Toolchain (e.g Zephyr SDK, ...)
  • Commit SHA 0a925bd or newer (master branch).

Additional context
N/A

@ChristofferSchroeder ChristofferSchroeder added the bug The issue is a bug, or the PR is fixing a bug label Jan 22, 2020
@aescolar
Copy link
Member

CC @jhedberg @joerchan

@joerchan joerchan self-assigned this Jan 24, 2020
@ChristofferSchroeder
Copy link
Author

The BTMON log below shows the issue quite well (here with SC enabled). The order of the SMP Indentity Information ACL packet and the HCI Security Changed Event is in the "wrong order".
Please note this was sampled using a PTS 4.2 dongle. We tried two different versions of the PTS dongles, the 4.0 and 4.2 versions, and the 4.0 version is even slower to send the security changed event.
The SMP codes 0x08 or 0x09 are not allowed by the stack until Security Changed has been handled.
Now this is actually not according to the Core Specification as we read it. You cannot expect the order of the SMP packets and the HCI events to be deterministic. So whenever SMP receives an "unexpected SMP code", it should just accept it and assume the link is encrypted, or it should put it back on the RX queue to be dealt with later, when the security change has been received (with some retry count perhaps).

BTMON log:

ACL Data RX: Handle 98 flags 0x02 dlen 21 [hci0] 4777.145983
SMP: Pairing DHKey Check (0x0d) len 16
E: cba1afc281093748fe085857f6accf2e
< HCI Command: LE Start Encryption (0x08|0x0019) plen 28 [hci0] 4777.148927
Handle: 98
Random number: 0x0000000000000000
Encrypted diversifier: 0x0000
Long term key: af0850f45d9de80bfae67421ecca486f
HCI Event: Command Status (0x0f) plen 4 [hci0] 4777.153787
LE Start Encryption (0x08|0x0019) ncmd 1
Status: Success (0x00)
< HCI Command: Host Number of Completed Packets (0x03|0x0035) plen 5 [hci0] 4777.178758
Num handles: 1
Handle: 98
Count: 1
ACL Data RX: Handle 98 flags 0x02 dlen 21 [hci0] 4777.326478
SMP: Identity Information (0x08) len 16
Identity resolving key: c7da8bb3ad9f2541eb156fbe84b53422
HCI Event: Encryption Change (0x08) plen 4 [hci0] 4777.326782
Status: Success (0x00)
Handle: 98
Encryption: Enabled with AES-CCM (0x01)
ACL Data RX: Handle 98 flags 0x02 dlen 12 [hci0] 4777.327057
SMP: Identity Address Information (0x09) len 7
Address type: Public (0x00)
Address: A7:47:0A:09:AA:8C
< HCI Command: Host Number of Completed Packets (0x03|0x0035) plen 5 [hci0] 4777.328846
Num handles: 1
Handle: 98
Count: 1
< ACL Data TX: Handle 98 flags 0x00 dlen 6 [hci0] 4777.328922
SMP: Pairing Failed (0x05) len 1
Reason: Unspecified reason (0x08)
< HCI Command: Host Number of Completed Packets (0x03|0x0035) plen 5 [hci0] 4777.329098
Num handles: 1
Handle: 98
Count: 1
< ACL Data TX: Handle 98 flags 0x00 dlen 6 [hci0] 4777.329170
SMP: Pairing Failed (0x05) len 1
Reason: Unspecified reason (0x08)

@joerchan
Copy link
Contributor

@jhedberg Is this related to the USB issue where Events and Data are sent on different endpoints?

@ChristofferSchroeder
Copy link
Author

@jhedberg Is this related to the USB issue where Events and Data are sent on different endpoints?

Hi @joerchan

I was not aware of this issue with USB. Where can I find more on this?

@jhedberg
Copy link
Member

@joerchan I think it is. @ChristofferSchroeder this is a fairly well known (at least among Zephyr & BlueZ developers) design "bug" in the USB HCI transport specification. Since ACL data and HCI events come through different USB endpoints there is no sure way for the host to know what the real ordering over the air was. We've faced this issue multiple times in the past, and in several different protocols, i.e. this is not unique to SMP. One case where this happens is with BR/EDR where any non-SDP L2CAP Connect Request requires prior encryption and will otherwise result in a "security block" response. The most reliable fix that isn't a hack is to switch to using some other HCI transport (e.g. UART), or possibly using a custom single USB endpoint for all HCI data.

@ChristofferSchroeder
Copy link
Author

Thanks for the explanation. I am not too comfortable with the work-around to this problem. I would assume that since we are testing using a PTS certified dongle, we should be able to use this or the host would fail compliance tests.

@joerchan
Copy link
Contributor

@jhedberg How does BlueZ do this? Should we allow the SMP pairing packets before we have received the encryption change event? Maybe include it as a Kconfig option as a USB transport workaround?

@jhedberg
Copy link
Member

@joerchan BlueZ doesn't have any workaround of fix for it. Making such concessions would expose a vulnerability, e.g. let another device send pairing keys over an unencrypted link.

@jhedberg
Copy link
Member

jhedberg commented Jan 28, 2020

@joerchan also, we don't have host-side support for the USB HCI transport. We always use the UART HCI driver when using native_posix (actually the hci_userchan driver) and qemu. On the host OS (Linux) side USB might indeed be used and that's where the ordering issue happens.

@jhedberg
Copy link
Member

Unfortunately I don't see what we can do about this. Reading the original description is sounds like the issue was always there but didn't show up since L2CAP was deferred to the system work queue , i.e. probably the encrypt change event got handled before it even though the HCI driver gave these to us in the wrong order. So I'll go ahead and close this for now, in the absence of any alternative explanation for what's going on.

@joerchan
Copy link
Contributor

@jhedberg Yeah I was hoping we didn't have to do that. Because that sounded sketchy.

The host-side support for USB HCI i didn't understand. In this setup, running zephyr host on native_posix and attached PTS 4.2 dongle then this would be a USB device. Which HCI driver is being used now?

@jhedberg
Copy link
Member

@joerchan I meant that we don't have a USB HCI driver in Zephyr. We have USB support for the controller-side, i.e. samples/bluetooth/hci_usb but not the host. When you run on a Linux host you use the Linux kernel's HCI driver for whatever controller you have (e.g. the btusb driver in the case of a Bluetooth USB dongle like the PTS one).

In order to run Zephyr on top of this there are two options: native_posix and qemu. qemu uses the UART HCI driver (drivers/bluetooth/hci/h4.c) since we use btproxy to connect the host's controller to a qemu serial port. With native_posix we use the drivers/bluetooth/hci/userchan.c in Zephyr which opens a HCI User Channel socket to the Linux kernel, which behaves essentially the same way as the UART HCI transport (serialised over a single endpoint and same encoding with a single-byte packet type identifier in the beginning).

I'm fine if you guys want to try to brainstorm possible solutions for this issue, but we've had to deal with it ever since BR/EDR SC was introduced in Bluetooth 4.1 (2013) and haven't come up with any other good solution than using some other HCI transport. Btw, I think there's some work ongoing to specify a new transport over USB using a single endpoint, but I don't know much about it (and even if I did this stuff won't be public until adoption in some future core spec release)

@joerchan
Copy link
Contributor

@jhedberg thank you for the explanation. I don't think there is any other option than solving this in the spec itself. Hopefully they will do that.

fnde-ot added a commit to fnde-ot/zephyr that referenced this issue Jan 31, 2020
This commit adds a new option CONFIG_BT_SMP_USB_HCI_CTLR_WORKAROUND
to support USB HCI controllers that sometimes send out-of-order HCI
events and ACL Data due to using different USB endpoints.

Enabling this option will make the master role not require the
encryption-change event to be received before accepting
key-distribution data.

It opens up for a potential vulnerability as the master cannot detect
if the keys are distributed over an encrypted link.

Fixes: zephyrproject-rtos#22086

Signed-off-by: François Delawarde <fnde@oticon.com>
jhedberg pushed a commit that referenced this issue Feb 3, 2020
This commit adds a new option CONFIG_BT_SMP_USB_HCI_CTLR_WORKAROUND
to support USB HCI controllers that sometimes send out-of-order HCI
events and ACL Data due to using different USB endpoints.

Enabling this option will make the master role not require the
encryption-change event to be received before accepting
key-distribution data.

It opens up for a potential vulnerability as the master cannot detect
if the keys are distributed over an encrypted link.

Fixes: #22086

Signed-off-by: François Delawarde <fnde@oticon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants