Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: hci_usb fails to connect to two devices with slow advertising interval #23280

Closed
ChristianHirsch opened this issue Mar 5, 2020 · 29 comments
Assignees
Labels
area: Bluetooth area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug Waiting for response Waiting for author's response

Comments

@ChristianHirsch
Copy link
Contributor

Describe the bug
Connecting to two devices one after another and reading data with the samples/bluetooth/hci_usb sample fails on a PC. First the PC connects to device 1 (AA:BB:CC:DD:EE:FF) and reads the data. Then, it tries to connect to device 2 (FF:EE:DD:CC:BB:AA) and read the data, but establishing a connection (or channel?) fails, although, the PC (or bluetooth service) seems to be connected to the device (the panel icon shows an established connection to the device and the device can't be found if the PC scans for devices). The PC does not terminate the connection until manually restarting the blueetooth service.

To Reproduce
Steps to reproduce the behavior:

  1. Use samples/bluetooth/hci_usb on one board with CONFIG_BT_MAX_CONN increased to 8 (e.g., on a nRF52840 board).
  2. Use samples/bluetooth/peripheral_hr on two other boards (e.g., particle_xenon) and change BT_GAP_ADV_FAST_INT_MIN_2 and BT_GAP_ADV_FAST_INT_MAX_2 to BT_GAP_ADV_SLOW_INT_MIN and BT_GAP_ADV_SLOW_INT_MAX, by, e.g.
#define BT_LE_ADV_CONN_NAME BT_LE_ADV_PARAM(BT_LE_ADV_OPT_CONNECTABLE | \
                                            BT_LE_ADV_OPT_USE_NAME, \
                                            BT_GAP_ADV_SLOW_INT_MIN, \
                                            BT_GAP_ADV_SLOW_INT_MAX)
  1. Connect with gatttool to the peripheral_hr devices one after another and read some data:
gatttool -t random -b AA:BB:CC:DD:EE:FF --char-read --handle=0x0012 ; gatttool -t random -b FF:EE:DD:CC:BB:AA --char-read --handle=0x0012
  1. See error:
Characteristic value/descriptor: 62
connect error: Software caused connection abort (103)

Expected behavior
Connecting to both devices, one after another, should not fail, and both connections terminate after reading some data. Expected output:

Characteristic value/descriptor: 62
Characteristic value/descriptor: 62

Impact
Showstopper

Screenshots or console output
See above.

Environment (please complete the following information):

  • Ubuntu 18.04
  • Zephyr SDK 0.10.2
  • Zephyr 2.1.0

Additional context
I don't know if this is a problem of Zephyr or the PC's bluetooth driver or bluez, however, it only happens with the hci_usb sample (running on a nRF52840 board). It does not happen with the PC's internal bluetooth adapter.
Sometimes connecting to both devices, one after another, with the hci_usb sample works. But most of the time it does not.

@ChristianHirsch ChristianHirsch added the bug The issue is a bug, or the PR is fixing a bug label Mar 5, 2020
@jfischer-no jfischer-no added area: Bluetooth area: USB Universal Serial Bus labels Mar 5, 2020
@ChristianHirsch
Copy link
Contributor Author

I get the same behavior on a Raspberry Pi 3 running Raspbian GNU/Linux 9 (stretch): everything works with the internal bluetooth adapter, but with the hci_usb sample the error occurs.

@carlescufi
Copy link
Member

@ChristianHirsch can you please reproduce the issue while running btmon and then attach the log?

@ChristianHirsch
Copy link
Contributor Author

@carlescufi see logs attached.
btmon_internal.log
btmon_hci_usb.log

@ChristianHirsch
Copy link
Contributor Author

It seems that when the problem occurs, it repeatedly requests the data from the remote device..

@ChristianHirsch
Copy link
Contributor Author

At btmon_hci_usb.log line 428 I restarted the bluetooth service.

@cvinayak
Copy link
Contributor

cvinayak commented Mar 5, 2020

With the internal controller, the connections do not overlap, seems the controller you are using does not support simultaneous connections and scanning to connect until the previous connection is being disconnected by the gatttool.

The LE Create Connection is only issued after the previous connection is disconnected. Also, this way it reuses the same connection handle 3585.

< HCI Command: Disconnect (0x01|0x0006) plen 3                                     #29 [hci0] 13.298610
        Handle: 3585
        Reason: Remote User Terminated Connection (0x13)
> HCI Event: Command Status (0x0f) plen 4                                          #30 [hci0] 13.299451
      Disconnect (0x01|0x0006) ncmd 1
        Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4                                     #31 [hci0] 13.333455
        Status: Success (0x00)
        Handle: 3585
        Reason: Connection Terminated By Local Host (0x16)
@ MGMT Event: Device Disconnected (0x000c) plen 8                             {0x0002} [hci0] 13.333477
        LE Address: AA:BB:CC:DD:EE:FF (Static)
        Reason: Connection terminated by local host (0x02)
@ MGMT Event: Device Disconnected (0x000c) plen 8                             {0x0001} [hci0] 13.333477
        LE Address: AA:BB:CC:DD:EE:FF (Static)
        Reason: Connection terminated by local host (0x02)
> HCI Event: LE Meta Event (0x3e) plen 23                                          #32 [hci0] 13.424468
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Random (0x01)
        Address: FF:EE:DD:CC:BB:AA (Static)
        Data length: 11
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        16-bit Service UUIDs (complete): 3 entries
          Heart Rate (0x180d)
          Battery Service (0x180f)
          Current Time Service (0x1805)
        RSSI: -54 dBm (0xca)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2                             #33 [hci0] 13.424493
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                                        #34 [hci0] 13.454465
      LE Set Scan Enable (0x08|0x000c) ncmd 2
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25                          #35 [hci0] 13.454535
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: FF:EE:DD:CC:BB:AA (Static)
        Own address type: Public (0x00)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                                          #36 [hci0] 13.455458
      LE Create Connection (0x08|0x000d) ncmd 2
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19                                          #37 [hci0] 15.439467
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 3585
        Role: Master (0x00)
        Peer address type: Random (0x01)
        Peer address: FF:EE:DD:CC:BB:AA (Static)
        Connection interval: 48.75 msec (0x0027)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x00
@ MGMT Event: Device Connected (0x000b) plen 24                               {0x0002} [hci0] 15.439524
        LE Address: FF:EE:DD:CC:BB:AA (Static)
        Flags: 0x00000000
        Data length: 11
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        16-bit Service UUIDs (complete): 3 entries
          Heart Rate (0x180d)
          Battery Service (0x180f)
          Current Time Service (0x1805)
@ MGMT Event: Device Connected (0x000b) plen 24                               {0x0001} [hci0] 15.439524
        LE Address: FF:EE:DD:CC:BB:AA (Static)
        Flags: 0x00000000
        Data length: 11
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        16-bit Service UUIDs (complete): 3 entries
          Heart Rate (0x180d)
          Battery Service (0x180f)
          Current Time Service (0x1805)

In the Zephyr controller, I am "assuming" the host is made aware that simultaneous connection and scanning to initiate is supported, and hence second connection is initiated while the previous is still connected.

< HCI Command: LE Create Connection (0x08|0x000d) plen 25   #35 [hci1] 5.725045
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: FF:EE:DD:CC:BB:AA (Static)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                   #36 [hci1] 5.727045
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
< HCI Command: Disconnect (0x01|0x0006) plen 3              #37 [hci1] 6.442589
        Handle: 0
        Reason: Remote User Terminated Connection (0x13)
> HCI Event: Command Status (0x0f) plen 4                   #38 [hci1] 6.444045
      Disconnect (0x01|0x0006) ncmd 1
        Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4              #39 [hci1] 6.486055
        Status: Success (0x00)
        Handle: 0
        Reason: Connection Terminated By Local Host (0x16)
> HCI Event: LE Meta Event (0x3e) plen 19                   #40 [hci1] 6.732042
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 1
        Role: Master (0x00)
        Peer address type: Random (0x01)
        Peer address: FF:EE:DD:CC:BB:AA (Static)
        Connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 13        {0x0002} [hci1] 6.732104
        LE Address: FF:EE:DD:CC:BB:AA (Static)
        Flags: 0x00000000
        Data length: 0
@ MGMT Event: Device Connected (0x000b) plen 13        {0x0001} [hci1] 6.732104
        LE Address: FF:EE:DD:CC:BB:AA (Static)
        Flags: 0x00000000
        Data length: 0

I do not see anything wrong in the HCI logs, before you restarted at line 428, the second connection is up and alive receiving the notifications.

> ACL Data RX: Handle 1 flags 0x02 dlen 8                  #84 [hci1] 16.832679
      ATT: Handle Value Notification (0x1b) len 3
        Handle: 0x0012
          Data: 2e
> ACL Data RX: Handle 1 flags 0x02 dlen 8                  #85 [hci1] 17.832644
      ATT: Handle Value Notification (0x1b) len 3
        Handle: 0x0012
          Data: 2d
= bluetoothd: Terminating

@cvinayak
Copy link
Contributor

cvinayak commented Mar 5, 2020

@jhedberg do you know if Bluez/gatttool would behave differently? Does the host use the local supported features?

@jfischer-no
Copy link
Collaborator

jfischer-no commented Mar 5, 2020

seems the controller you are using does not support simultaneous connections and scanning to connect until the previous connection is being disconnected by the gatttool.

@cvinayak The person is using Zephyr's hci_usb sample on a nRF52840 board :-)

@ChristianHirsch
Copy link
Contributor Author

I do not see anything wrong in the HCI logs, before you restarted at line 428, the second connection is up and alive receiving the notifications.

That's also strange, @cvinayak, I did not enable notifications, so why am I receiving them?

@cvinayak
Copy link
Contributor

cvinayak commented Mar 5, 2020

seems the controller you are using does not support simultaneous connections and scanning to connect until the previous connection is being disconnected by the gatttool.

@cvinayak The person is using Zephyr's hci_usb sample on a nRF52840 board :-)

@jfischer-phytec-iot Reporter is using PC's controller

It does not happen with the PC's internal bluetooth adapter.

@cvinayak
Copy link
Contributor

cvinayak commented Mar 5, 2020

I do not see anything wrong in the HCI logs, before you restarted at line 428, the second connection is up and alive receiving the notifications.

That's also strange, @cvinayak, I did not enable notifications, so why am I receiving them?

@ ChristianHirsch something in the host does the following... not sure if this is the CCCD handle (0x0013) being written to enable notifications.

< ACL Data TX: Handle 1 flags 0x00 dlen 9                   #63 [hci1] 7.533291
      ATT: Write Request (0x12) len 4
        Handle: 0x0013
          Data: 0100
> HCI Event: Number of Completed Packets (0x13) plen 5      #64 [hci1] 7.583002
        Num handles: 1
        Handle: 1
        Count: 1
> ACL Data RX: Handle 1 flags 0x02 dlen 5                   #65 [hci1] 7.632953
      ATT: Write Response (0x13) len 0

@cvinayak
Copy link
Contributor

cvinayak commented Mar 5, 2020

@ChristianHirsch also, the handle you wanted to read has been read in the second connection.

< ACL Data TX: Handle 1 flags 0x00 dlen 7                   #48 [hci1] 6.983287
      ATT: Read Request (0x0a) len 2
        Handle: 0x0012
> HCI Event: Number of Completed Packets (0x13) plen 5      #49 [hci1] 7.034038
        Num handles: 1
        Handle: 1
        Count: 1
> ACL Data RX: Handle 1 flags 0x02 dlen 6                   #50 [hci1] 7.083071
      ATT: Read Response (0x0b) len 1
        Value: 38

@jfischer-no
Copy link
Collaborator

a)

With the internal controller, the connections do not overlap, seems the controller you are using does not support simultaneous connections and scanning to connect until the previous connection is being disconnected by the gatttool.

b)

seems the controller you are using does not support simultaneous connections and scanning to connect until the previous connection is being disconnected by the gatttool.

@cvinayak The person is using Zephyr's hci_usb sample on a nRF52840 board :-)

@jfischer-phytec-iot Reporter is using PC's controller

It does not happen with the PC's internal bluetooth adapter.

To be clear, this seems the controller you are using from (a) is hci_usb sample on a nRF52840 board, @cvinayak do you agree with that?

@cvinayak
Copy link
Contributor

cvinayak commented Mar 5, 2020

Both (a) and (b) are the same comment text you have quote, and is for the internal PC's controller.

In the Zephyr controller, I am "assuming" the host is made aware that simultaneous connection and scanning to initiate is supported, and hence second connection is initiated while the previous is still connected.

This is for the Zephyr controller.

@ChristianHirsch
Copy link
Contributor Author

@cvinayak, there are some things that bother me:

  1. You are right. the CCCD handle is 0x0013. But I am only executing the commands above to read the characteristic 0x0012. I am not setting/writing to the CCCD handle and do not activate the notifications. I just tried the following to stay connected to the device with the internal adapter: gatttool -t random -b AA:BB:CC:DD:EE:FF --char-read --handle=0x0012 --listen. In btmon (see below), there is no value written to the CCCD handle and I do not receive notifications.

With the internal controller, the connections do not overlap, seems the controller you are using does not support simultaneous connections and scanning to connect until the previous connection is being disconnected by the gatttool.

The LE Create Connection is only issued after the previous connection is disconnected. Also, this way it reuses the same connection handle 3585.

This is not true. I can connect to two devices simultaneously with the internal adapter. I did the following and it works perfectly with the internal adapter:

$ gatttool -t random -b AA:BB:CC:DD:EE:FF --char-read --handle=0x0012 --listen & (sleep 5; echo "Executing gatttool"; gatttool -t random -b FF:EE:DD:CC:BB:AA --char-read --handle=0x0012)
Characteristic value/descriptor: 41
Executing gatttool
Characteristic value/descriptor: 4e

Here is the log btmon_internal_2sim.log. You also see that no CCCD is written to device AA:BB:CC:DD:EE:FF, although I'm staying connected.

  1. I can't reproduce the problem with the hci_usb sample when using fast advertising intervals on the devices running the peripheral_hr sample:
#define BT_LE_ADV_CONN_NAME BT_LE_ADV_PARAM(BT_LE_ADV_OPT_CONNECTABLE | \
                                            BT_LE_ADV_OPT_USE_NAME, \
                                            BT_GAP_ADV_FAST_INT_MIN_2, \
                                            BT_GAP_ADV_FAST_INT_MAX_2)

@cvinayak
Copy link
Contributor

cvinayak commented Mar 9, 2020

@ChristianHirsch any reason why you are using a 5 seconds sleep with your internal controller when connecting to devices?

I am not able to reproduce the issue reported with vanilla peripheral sample or slow advertising peripheral sample. My commandline and btmon logs atttached.

# gatttool -t random -b 47:17:03:F5:9B:C1 --char-read --handle=0x0012; ./attrib/gatttool -t random -b 7B:86:8C:1A:D2:32 --char-read --handle=0x0012
Characteristic value/descriptor: 4c 
Characteristic value/descriptor: 5c 
# gatttool -t random -b 7F:4A:0B:69:CC:3E --char-read --handle=0x0012; ./attrib/gatttool -t random -b 5E:B6:1E:6D:E3:AF --char-read --handle=0x0012
Characteristic value/descriptor: 62 
Characteristic value/descriptor: 13

hci_usb.log
hci_usb_slow_adv.log

@ChristianHirsch
Copy link
Contributor Author

@cvinayak thanks for trying.

@ChristianHirsch any reason why you are using a 5 seconds sleep with your internal controller when connecting to devices?

This is just to guarantee, that gatttool connects to device 1 first and keep the connection alive, before connecting to device 2. If I don't use the sleep, it might happen, that device 2 is connected first. This was to show that the internal adapter can handle two connections simultaneously.

Did you execute the commands on the hci_usb sample more often? If I boot my PC I can successfully connect to both devices for the first couple of times, but then the problems starts..

Could you please try to execute this script? The sleep in the script makes sure that the PC disconnects from the devices.

#!/bin/bash
while true ; do
  gatttool -t random -b AA:BB:CC:DD:EE:FF --char-read --handle=0x0012
  if [ $? -eq 1 ]; then
    break
  fi
  gatttool -t random -b FF:EE:DD:CC:BB:AA --char-read --handle=0x0012
  if [ $? -eq 1 ]; then
    break
  fi
  sleep 5
done

This script runs successfully the first couple of times. But after some attempts, it fails. Then, I need to restart the bluetooth service and from that point on it almost fails every time in the first attempt to connect to the second device..

@cvinayak
Copy link
Contributor

cvinayak commented Mar 9, 2020

@ChristianHirsch Yes, I have tried connections many times (and with upto 3 peripherals).

What I find different from your logs is, your host is doing more than just reading the characteristics (like ATT MTU request, which does not get num complete and no more ATT transaction there after). That is not what is done when using internal controller. If this happens it looks like gatttool is out-of-sync with the connection, and with notifications enabled. I am suspecting gatttool and something inherent to bluetoothd interfere.

gatttool is deprecated in the latest BlueZ releases (and had to use --enable-deprecated to build it), @Vudentz which tool should we be using instead?

@cvinayak
Copy link
Contributor

@ChristianHirsch Any updates? Please use latest BlueZ and compatible tools, also latest Zephyr upstream master and revert back (I have been using hci_usb on my virtual machine for the last one week, scanning and making simultaneous connections using bluetoothctl, and not encountered any issues)

Alternatively, you could build hci_usb with CONFIG_BT_LL_SW_LEGACY=y in the latest upstream master branch, which is the controller in Zephyr v1.14 but with fixes.

@cvinayak cvinayak added the Waiting for response Waiting for author's response label Mar 12, 2020
@ChristianHirsch
Copy link
Contributor Author

@cvinayak Unfortunately not. I'm already running the hci_usb sample of Zephyr 2.2.0, but with no result. I'll try the sample with the config you mentioned though.

Do you have a script for bluetoothctl that you use for connecting and reading characteristics? Unfortunately, I'm a little bit lost with bluetoothctl.. It does not show me all the available services and characteristics of the peripheral_hr sample (with the internal adapter), especially the battery service and battery level characteristic.

@ChristianHirsch
Copy link
Contributor Author

@cvinayak I wrote a python script, that reproduces the error by using the bluez d-bus api. I have a similar behavior: it runs without any problems with the internal adapter, but I get errors with the hci_usb sample.
Could you please check if this works with your setup? The error I get is the following when disconnecting:

org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

The script is available here. You can execute it as follows:

./ble_dbus.py -i hci1 00002a19-0000-1000-8000-00805f9b34fb AA:BB:CC:DD:EE:FF FF:EE:DD:CC:BB:AA

@carlescufi carlescufi removed the Waiting for response Waiting for author's response label Mar 18, 2020
@carlescufi
Copy link
Member

@cvinayak could you take a look please?

@cvinayak
Copy link
Contributor

@carlescufi yes, placed on the top in my backlog, hope to start on this 19/03.

@cvinayak
Copy link
Contributor

@ChristianHirsch I am able to reproduce the DBus error when running your script. In the btmon logs, it is the same behavior as before, the host stack performs service discovery and enables notifications. I dont see any issue in the hci_usb sample, it is merely doing what the host is telling the controller to do using HCI commands/data.

I tried with hci_usb build with no PHY update, data length and privacy support (to emulate your internal device based on the feature exchange I read), and got different results. But in this case too, host does service discovery but disconnections happen before enabling notifications.

Logs:
btmon_20200318.txt
console.txt

@carlescufi carlescufi added the Waiting for response Waiting for author's response label Mar 23, 2020
@ChristianHirsch
Copy link
Contributor Author

Thanks @cvinayak for trying, but your console out does not look like what I was expecting. It seems that you did not receive any succefful value, right? I will create a more robust script.

@cvinayak
Copy link
Contributor

cvinayak commented Apr 7, 2020

@ChristianHirsch Sorry for the late reply. No, its weird that something in the host automatically performs service discovery and notification enable, and the request to read may have been missed or delayed.

@carlescufi
Copy link
Member

@ChristianHirsch can you check with the BlueZ folks to see if they have an idea of what the issue might be?

@carlescufi carlescufi added the priority: low Low impact/importance bug label Apr 7, 2020
@ChristianHirsch
Copy link
Contributor Author

Yes, I will check with them @carlescufi.

@carlescufi
Copy link
Member

Yes, I will check with them @carlescufi.

Thank you. I will close this in the meantime, feel free to reopen if you find out that this is indeed a Zephyr issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug Waiting for response Waiting for author's response
Projects
None yet
Development

No branches or pull requests

4 participants