Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Library should timeout read / write operations in case of BLE stack misbehaviour #118

Closed
Zkffkah opened this issue Jan 18, 2017 · 16 comments
Assignees
Labels
bug Bug that is caused by the library enhancement

Comments

@Zkffkah
Copy link

Zkffkah commented Jan 18, 2017

In some phone, when chunkifying data and sending them to ble device in sequence using the following code.

BleOperationUtil.startWriteWithChunk(rxBleConnection, UUID_SEND_DATA, bytes, 50, TimeUnit.MILLISECONDS) // bytes is about 300k 

 public static Observable<byte[]> startWriteWithChunk(RxBleConnection rxBleConnection,
                                                         String characteristicUuid,
                                                         byte[] writeData,
                                                         long delay,
                                                         TimeUnit timeUnit) {
        return chunkify(writeData) // splitting the byte array into smaller pieces
                .flatMap(
                        bytesChunk -> {
                            return rxBleConnection
                                    .writeCharacteristic(UUID.fromString(characteristicUuid), bytesChunk) // sending the bytes
                                    .observeOn(AndroidSchedulers.mainThread())
                                    .delay(delay, timeUnit);
                        }, // and waiting for a specific time after each operation

                        1 // to make the flatMap executed sequentially
                );
    }

It's OK in the begining, but after sometime. Strange things happen. System's writeCharacteristic was called but onCharacteristicWrite was not called after that. And then the whole process stop. But RxAndroidBle was not throwing out exception.

I was wondering how can I handle such situation. Should the library provide a way to set operation timeout and notify this kind of error or anything I can do at the application level? Thanks.

@dariuszseweryn
Copy link
Owner

Hello @lowwor
Could you attach a log of what is going on?
It may be a good idea to think either about timeouting such operations or maybe it will be possible to cancel operation when it is unsubscribed.

@Zkffkah
Copy link
Author

Zkffkah commented Jan 19, 2017

01-18 17:19:31.949 D/BluetoothGatt(12331): writeCharacteristic() - uuid: 001120a2-2233-4455-6677-889912345678

01-18 17:19:31.949 D/BtGatt.GattService( 1911): writeCharacteristic() - address=00:02:F7:00:5F:D2

01-18 17:19:31.949 D/BtGatt.btif( 1911): btif_gattc_write_char

01-18 17:19:31.949 D/BtGatt.btif( 1911): btgattc_handle_event: Event 1015

01-18 17:19:31.949 D/BtGatt.btif( 1911): btif_gattc_upstreams_evt: Event 4

01-18 17:19:31.949 D/BtGatt.GattService( 1911): onWriteCharacteristic() - address=00:02:F7:00:5F:D2, status=0

01-18 17:19:31.949 D/BluetoothGatt(12331): onCharacteristicWrite() - Device=00:02:F7:00:5F:D2 UUID=001120a2-2233-4455-6677-889912345678 Status=0

01-18 17:19:32.009 D/X-LOG   (12331): 	├ com.bletest.BleTestApplication$XlogTree.log(BleTestApplication.java:153)

01-18 17:19:32.009 D/X-LOG   (12331): 	├ timber.log.Timber$Tree.prepareLog(Timber.java:541)

01-18 17:19:32.009 D/X-LOG   (12331): 	└ timber.log.Timber$Tree.d(Timber.java:418)

01-18 17:19:32.009 D/X-LOG   (12331): bytesChunk

01-18 17:19:32.009 E/X-LOG   (12331): 	├ com.bletest.BleTestApplication$XlogTree.log(BleTestApplication.java:162)

01-18 17:19:32.009 E/X-LOG   (12331): 	├ timber.log.Timber$Tree.prepareLog(Timber.java:541)

01-18 17:19:32.009 E/X-LOG   (12331): 	└ timber.log.Timber$Tree.e(Timber.java:463)

01-18 17:19:32.009 E/X-LOG   (12331): onNext sent oldSize 17420 newSize 17440

01-18 17:19:32.009 D/BluetoothGatt(12331): writeCharacteristic() - uuid: 001120a2-2233-4455-6677-889912345678

01-18 17:19:32.009 D/BtGatt.GattService( 1911): writeCharacteristic() - address=00:02:F7:00:5F:D2

01-18 17:19:32.009 D/BtGatt.btif( 1911): btif_gattc_write_char

01-18 17:19:32.009 D/BtGatt.btif( 1911): btgattc_handle_event: Event 1015

01-18 17:19:32.009 D/BtGatt.btif( 1911): btif_gattc_upstreams_evt: Event 4

01-18 17:19:32.009 D/BtGatt.GattService( 1911): onWriteCharacteristic() - address=00:02:F7:00:5F:D2, status=0

01-18 17:19:32.009 D/BluetoothGatt(12331): onCharacteristicWrite() - Device=00:02:F7:00:5F:D2 UUID=001120a2-2233-4455-6677-889912345678 Status=0

01-18 17:19:32.089 D/X-LOG   (12331): 	├ com.bletest.BleTestApplication$XlogTree.log(BleTestApplication.java:153)

01-18 17:19:32.089 D/X-LOG   (12331): 	├ timber.log.Timber$Tree.prepareLog(Timber.java:541)

01-18 17:19:32.089 D/X-LOG   (12331): 	└ timber.log.Timber$Tree.d(Timber.java:418)

01-18 17:19:32.089 D/X-LOG   (12331): bytesChunk

01-18 17:19:32.089 E/X-LOG   (12331): 	├ com.bletest.BleTestApplication$XlogTree.log(BleTestApplication.java:162)

01-18 17:19:32.089 E/X-LOG   (12331): 	├ timber.log.Timber$Tree.prepareLog(Timber.java:541)

01-18 17:19:32.089 E/X-LOG   (12331): 	└ timber.log.Timber$Tree.e(Timber.java:463)

01-18 17:19:32.089 E/X-LOG   (12331): onNext sent oldSize 17440 newSize 17460

01-18 17:19:32.089 D/BluetoothGatt(12331): writeCharacteristic() - uuid: 001120a2-2233-4455-6677-889912345678

01-18 17:19:32.099 D/BtGatt.GattService( 1911): writeCharacteristic() - address=00:02:F7:00:5F:D2

01-18 17:19:32.099 D/BtGatt.btif( 1911): btif_gattc_write_char

01-18 17:19:32.099 D/BtGatt.btif( 1911): btgattc_handle_event: Event 1015

01-18 17:19:32.099 D/BtGatt.btif( 1911): btif_gattc_upstreams_evt: Event 4

01-18 17:19:32.099 D/BtGatt.GattService( 1911): onWriteCharacteristic() - address=00:02:F7:00:5F:D2, status=0

01-18 17:19:32.099 D/BluetoothGatt(12331): onCharacteristicWrite() - Device=00:02:F7:00:5F:D2 UUID=001120a2-2233-4455-6677-889912345678 Status=0

01-18 17:19:32.159 D/X-LOG   (12331): 	├ com.bletest.BleTestApplication$XlogTree.log(BleTestApplication.java:153)

01-18 17:19:32.159 D/X-LOG   (12331): 	├ timber.log.Timber$Tree.prepareLog(Timber.java:541)

01-18 17:19:32.159 D/X-LOG   (12331): 	└ timber.log.Timber$Tree.d(Timber.java:418)

01-18 17:19:32.159 D/X-LOG   (12331): bytesChunk

01-18 17:19:32.159 E/X-LOG   (12331): 	├ com.bletest.BleTestApplication$XlogTree.log(BleTestApplication.java:162)

01-18 17:19:32.159 E/X-LOG   (12331): 	├ timber.log.Timber$Tree.prepareLog(Timber.java:541)

01-18 17:19:32.159 E/X-LOG   (12331): 	└ timber.log.Timber$Tree.e(Timber.java:463)

01-18 17:19:32.159 E/X-LOG   (12331): onNext sent oldSize 17460 newSize 17480

01-18 17:19:32.159 D/BluetoothGatt(12331): writeCharacteristic() - uuid: 001120a2-2233-4455-6677-889912345678

01-18 17:19:32.159 D/BtGatt.GattService( 1911): writeCharacteristic() - address=00:02:F7:00:5F:D2

01-18 17:19:32.159 D/BtGatt.btif( 1911): btif_gattc_write_char

01-18 17:19:32.159 D/BtGatt.btif( 1911): btgattc_handle_event: Event 1015

This is the log before the process stop. We can see that after the last writeCharacteristic operation, no corresponding onCharacteristicWrite is invoked, so the lib do nothing because of that, no further operation, and no onError onNext or onComplete is invoked at all. It looks like the phones problem not invoking the onCharacteristicWrite, but I think I should do something to take care of this kind of weird situation as well.

@Zkffkah
Copy link
Author

Zkffkah commented Jan 19, 2017

Also, after that, further ble operation like writing or reading characteristic、disconnecting from the ble device take no effect at all. Only turning off the phone's bluetooth and open it again can make the ble work again.

@dariuszseweryn
Copy link
Owner

That's true. The library should timeout read / write operations after a reasonable time. We should dig through the BLE specification to check what is the maximum delay after which a device needs to acknowledge the operation.

Btw. as a temporary workaround - what would happen if you would make the delay bigger? Like 100 ms?

@dariuszseweryn dariuszseweryn changed the title In some device, onCharacteristicWrite may not called when sending data in sequence Library should timeout read / write operations in case of BLE stack misbehaviour Jan 19, 2017
@dariuszseweryn dariuszseweryn self-assigned this Jan 19, 2017
@dariuszseweryn dariuszseweryn added bug Bug that is caused by the library enhancement labels Jan 19, 2017
@Zkffkah
Copy link
Author

Zkffkah commented Jan 19, 2017

Yeah, tried that already. Everything is OK after increasing the delay ( like 200ms). But the minimum value varies from different devices, so I can't really tell what a safe delay is. Considering the risk and speed, I am now using a higher delay for general case and a lower delay for the devices tested as a white list.

@dariuszseweryn
Copy link
Owner

I think the issue may not lie in the BLE stack but in the peripheral you're talking to - perhaps it doesn't work fast enough to consume the next chunk of data. It lacks some flow control features. Maybe if it would inform the app by a notification that it is ready for the next chunk the problem could be solved.

(But still the library should timeout operations after some time.)

@Zkffkah
Copy link
Author

Zkffkah commented Jan 19, 2017

Yeah, the peripheral's processing capacity may be a factor for the problem of transfering data. But in this case, in fact, most phone don't have this kind of problem with the delay of 50ms and the peripheral can deal with the data correctly (tell from the ble device's serial port log) . So I blame the phone‘s (mostly Chinese OEM phone) poor ble stack for this problem. The flow control feature is a good idea and we have implemented it in another situation (say a different working mode of the ble device) like tranfering command data which may take time to process. But in this situation, the peripheral is more like a transparent proxy ( pass through data to elsewhere without doing anything), so to maximize the speed, we had decided to omit the flow control part. Anyway, I'm looking forward to the timeout feature and thanks for your kind support.

@dariuszseweryn
Copy link
Owner

I am looking into the topic. I have found this: http://stackoverflow.com/questions/30420137/writecharacteristic-returns-true-but-does-not-call-oncharacteristicwrite
Is it possible that on some occasions chunkify() fails to work correctly and the peripheral does not acknowledge a finished write? Have you tried to manually check (debug) what is being transferred? And could you paste what version of chunkify() you use?

@Zkffkah
Copy link
Author

Zkffkah commented Jan 20, 2017

I have found that too when I searched for the topic.
So I have done the following check on the problem phone:

  • sending a single byte -> ok, onCharacteristicWrite is called, the peripheral can log the byte
  • sending a 21 bytes data -> ok, onCharacteristicWrite is called, the peripheral can log the byte
  • interval sending fixed 20 bytes data (51 51 51 ... 51 51 51) with a 50 ms delay-> problems still occurs and the last chunkified bytes is 20 bytes (51 51... 51).

And after using higher delay, I do a few check too.

  • interval sending fixed 20 bytes data (51 51 51 ... 51 51 51) with larger delay -> ok, onCharacteristicWrite is called, the peripheral can log the bytes, same as what we sent
  • interval sending a 21 bytes data (51 51 51 ... 51 51 51) with larger delay -> ok, onCharacteristicWrite is called, the peripheral can log the byte, same as what we sent

So, I think it's not the same kind of problem. And I think it's really rare and abnormal that the phone can't even just send a single byte which I don't see on all the phone I tested.

Following is the interval code and chunkify() code I use

  BleOperationUtil.startWriteWithChunk(rxBleConnection, UUID_SEND_DATA, bytes, delay, TimeUnit.MILLISECONDS)
                        .observeOn(AndroidSchedulers.mainThread())
                        .repeatWhen(o -> {
                                    return o.concatMap(new Func1<Void, Observable<? extends Long>>() {
                                                           @Override
                                                           public Observable<? extends Long> call(Void aVoid) {
                                                               if (!isStopped) {
                                                                   return Observable.timer(intervalTime, TimeUnit.MILLISECONDS)
                                                                           .observeOn(AndroidSchedulers.mainThread());
                                                               } else {
                                                                   return Observable.error(new StopSignal());
                                                               }
                                                           }
                                                       }
                                    );

                                }
                        )

chunkify()

Observable<byte[]> chunkify(byte[] bytes) {
    return Observable.create(new SyncOnSubscribe<ByteBuffer, byte[]>() {

        @Override
        protected ByteBuffer generateState() {
            return ByteBuffer.wrap(bytes);
        }

        @Override
        protected ByteBuffer next(ByteBuffer state, Observer<? super byte[]> observer) {
            final int chunkLength = Math.min(state.remaining(), 20);
            if (chunkLength > 0) {
                final byte[] bytesChunk = new byte[chunkLength];
                state.get(bytesChunk);
                observer.onNext(bytesChunk);
            }
            if (chunkLength == 0 || state.remaining() == 0) {
                observer.onCompleted();
            }
            return state;
        }
    });
}

@dariuszseweryn
Copy link
Owner

@lowwor Sorry for a late response. So with a delay longer than 50 ms the code works without a problem?

@Zkffkah
Copy link
Author

Zkffkah commented Jan 29, 2017

Yes, it is.

@dariuszseweryn
Copy link
Owner

I have a branch with added timeouts for all operations. It should be enough to keep the library working.

dariuszseweryn added a commit that referenced this issue Jan 30, 2017
Reviewers: pawel.urban

Reviewed By: pawel.urban

Differential Revision: https://phabricator.polidea.com/D2144
@dariuszseweryn
Copy link
Owner

A fix was pushed to the master branch and should be available in the SNAPSHOT shortly. The library should no longer stop in case of the issue you have suffered. Best Regards.

@Zkffkah
Copy link
Author

Zkffkah commented Jan 31, 2017

Thanks a lot for the great job!
But I was curious that how is the 30 second timeout coming from? Is it just a long enough time or is there any reference for that?

@dariuszseweryn
Copy link
Owner

I have digged through the Bluetooth Specification to check if it has a maximum time for a device response specified. Actually the time varies by two factors - the time between the activities of the device and the number of the activities the device may skip but still consider the connection open.

It roughly translates to a time between several milliseconds to 32 seconds (or so). The Android OS is considering a connection not being able to start after 30 seconds (BluetoothGatt.establishConnection(Context, false)) - I have used the same value.

@Zkffkah
Copy link
Author

Zkffkah commented Feb 2, 2017

Understood, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug that is caused by the library enhancement
Projects
None yet
Development

No branches or pull requests

2 participants