Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long write operations does not seem to work correctly #135

Closed
maoueh opened this issue Feb 13, 2017 · 35 comments
Closed

Long write operations does not seem to work correctly #135

maoueh opened this issue Feb 13, 2017 · 35 comments
Assignees
Labels
bug Bug that is caused by the library

Comments

@maoueh
Copy link
Contributor

maoueh commented Feb 13, 2017

Summary

I tested the new long write operation commit that has been pushed recently. Tried it in my application and it crashes after the second "long" write operation.

Let me explain a little more my use case. I'm currently sending a rather big file to a BLE device. The file is a firmware in Intel HEX forma. From the defined application exchange protocol, I'm iterating through each line, wrap the line in a application packet format (starts and ends with a predefined byte value), chunk that into multiple packets (20 bytes) and write them one by one to the device by using writeCharacteristic. The callback is then used to trigger the next packet once a line has been fully written, I start sending the next one.

I wanted to test the new long operation as it greatly reduce amount of operation on the queue since a full line using long write operation would used a single operation on the radio queue.

I tried it with the following code:

    getBleConnection().getCharacteristic(characteristicUuid).flatMap(characteristic -> {
      characteristic.setWriteType(BluetoothGattCharacteristic.WRITE_TYPE_NO_RESPONSE);

      return getBleConnection().createNewLongWriteBuilder()
                               .setCharacteristic(characteristic)
                               .setMaxBatchSize(20)
                               .setBytes(bytes)
                               .build();
    }).subscribe(ignore -> {
    }, this::onError, this::onLineWritten);

Where the onLineWritten simply triggers the next line writing operation.

Preconditions


Steps to reproduce actual result

Not clear, but here what I think.

  1. Send multiple lines of data to a device. Each line taking more than 20 bytes.
  2. Use the above code to trigger the line sending one by one and let RxAndroidBLE chunk them as needed.
  3. Run the "upload" operation.

Actual result

There is crash happening, here the log part, the characteristic being written to is the one starting with b4520104 and the b4520107 is some kind of ACK channel that we don't really use.

The second line operation is hanging until a timeout occurs.

02-13 12:28:21.426 D/FlashFirmwareStateMachine: Trying to transition from [FLASH_ACK_RECEIVED (2)] to [FLASH_PROGRESS (3)].
02-13 12:28:21.427 D/FlashFirmwareStateMachine: Writing [AA06063A000000003C4D23343232C223304B802923304B10105CCBAB51EE8C05A6CDD6A0531246B6977D4B49808AE5A6F85A6016D4BFD1DF18A791D803E82330D765D789540F3DCA0B942F3A203D07D16540C306BE7C9C4783F1C3111D7DC9E6516873A4C71C6C876836B1A7E523341F9BEC86DFF644E407AE1963E37AAEF666C6E1DC87AF1C5381F17D0E8E2D9F4382CB1792C3077482233197E21EC36416E53281696CB520F25675F749AD7451326E919660F183C98A250B7E1F1A6D43F81D769481559EDC8FF87480918BFA316A42B969DE118FF1928AF88F7230571E8DE2626D233587A67660BE6EC45049B07BC9CEBC975E30518C8456137AA9B1372779BC2D5BCCBB89B5E3C6D07BBE743FD39C12F33EBB] to device.
02-13 12:28:21.429 D/RxBle#Radio:   QUEUED RxBleRadioOperationCharacteristicLongWrite(91420617)
02-13 12:28:21.430 D/FlashFirmwareStateMachine: Emitting event [1 / 697 (0.14%)].
02-13 12:28:21.430 D/RxBle#Radio:  STARTED RxBleRadioOperationCharacteristicLongWrite(91420617)
02-13 12:28:21.451 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.456 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.463 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.468 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.473 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.479 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.483 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.484 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.497 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.501 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.515 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.528 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.530 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.543 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.553 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.558 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.560 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.563 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.572 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.584 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.585 V/FlashFirmwareStateMachine: Wrote packet to device, checking if we can write next line.
02-13 12:28:21.586 D/FlashFirmwareStateMachine: Writing [AA06063A000000003C414E559C91B161961B8DCCAA815466E15B9D0EF5726F256F00C65944F8E323302335664F946BBC30C24E91A5DEAC348D62AD8933C0EB1064A470FAC2E00FAB8BE012C5A670A52CFA433F1D6D01ED4F1AF62331D9B2938EBA4A701A524660D361496BA714204B01A9674A5FC6927AA7B3C98C8BFF3658C04F218EF3928329CF006B746A4841BD02BF682ED5C037571C5BDB92DD7F515049FF3DF608FB280784975CDF1D076F5CF1EADD3AE712B4F004E657DA6F21C7449541785223321A6AD1692542CE466C6BC6CCAADF427D99B0F21F98ACB88741592B15F242F5F9154B976E26F0D314C02262359612B240AF867B04CEA587EE6814CDBCCCBBD7F3609F45887562B87C5AB6EFE53EBB] to device.
02-13 12:28:21.590 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.594 D/RxBle#Radio:   QUEUED RxBleRadioOperationCharacteristicLongWrite(10581863)
02-13 12:28:21.596 D/FlashFirmwareStateMachine: Emitting event [2 / 697 (0.29%)].
02-13 12:28:21.598 D/RxBle#Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(91420617)
02-13 12:28:21.599 D/RxBle#Radio:  STARTED RxBleRadioOperationCharacteristicLongWrite(10581863)
02-13 12:28:21.604 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:21.631 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520104-a308-4e56-8a52-536c2ad07147 status=0
02-13 12:28:21.662 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520107-a308-4e56-8a52-536c2ad07147
02-13 12:28:51.607 D/RxBle#Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(10581863)
02-13 12:28:51.608 D/FlashFirmwareStateMachine: Finishing with error (Received onError without having requested it).
02-13 12:28:51.613 D/FlashFirmwareStateMachine: Unsubscribing from subscriber(s).
02-13 12:28:51.616 E/BleDebugActivity: co.samsao.directardware.exception.NgmmDeviceStateMachineException: Received onError without having requested it
02-13 12:28:51.616 E/BleDebugActivity:     at co.samsao.directardware.ngmm.StateMachine.finishWithError(StateMachine.java:392)
02-13 12:28:51.616 E/BleDebugActivity:     at co.samsao.directardware.ngmm.StateMachine.onError(StateMachine.java:108)
02-13 12:28:51.616 E/BleDebugActivity:     at co.samsao.directardware.ngmm.StateMachine.access$lambda$10(StateMachine.java)
02-13 12:28:51.616 E/BleDebugActivity:     at co.samsao.directardware.ngmm.StateMachine$$Lambda$11.call(Unknown Source)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.internal.util.ActionSubscriber.onError(ActionSubscriber.java:44)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.observers.SafeSubscriber._onError(SafeSubscriber.java:153)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.observers.SafeSubscriber.onError(SafeSubscriber.java:115)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.internal.operators.OperatorMerge$MergeSubscriber.reportError(OperatorMerge.java:266)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.internal.operators.OperatorMerge$MergeSubscriber.checkTerminate(OperatorMerge.java:818)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.internal.operators.OperatorMerge$MergeSubscriber.emitLoop(OperatorMerge.java:579)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.internal.operators.OperatorMerge$MergeSubscriber.emit(OperatorMerge.java:568)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.internal.operators.OperatorMerge$InnerSubscriber.onError(OperatorMerge.java:852)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.observers.Subscribers$5.onError(Subscribers.java:230)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.observers.Subscribers$5.onError(Subscribers.java:230)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.observers.Subscribers$5.onError(Subscribers.java:230)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.subjects.ReplaySubject$ReplayUnboundedBuffer.drain(ReplaySubject.java:625)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.subjects.ReplaySubject$ReplayState.onError(ReplaySubject.java:470)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.subjects.ReplaySubject.onError(ReplaySubject.java:247)
02-13 12:28:51.616 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.RxBleRadioOperation.onError(RxBleRadioOperation.java:110)
02-13 12:28:51.616 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.operations.RxBleRadioOperationCharacteristicLongWrite.access$400(RxBleRadioOperationCharacteristicLongWrite.java:31)
02-13 12:28:51.616 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.operations.RxBleRadioOperationCharacteristicLongWrite$2.call(RxBleRadioOperationCharacteristicLongWrite.java:111)
02-13 12:28:51.616 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.operations.RxBleRadioOperationCharacteristicLongWrite$2.call(RxBleRadioOperationCharacteristicLongWrite.java:108)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.Completable$29.callOnError(Completable.java:1997)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.Completable$29.onError(Completable.java:1988)
02-13 12:28:51.616 E/BleDebugActivity:     at rx.Completable$10$1.onError(Completable.java:582)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeRedo$4$1.onError(OnSubscribeRedo.java:326)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OperatorTakeWhile$2.onError(OperatorTakeWhile.java:85)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeMap$MapSubscriber.onError(OnSubscribeMap.java:88)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeMap$MapSubscriber.onError(OnSubscribeMap.java:88)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeRedo$3$1.onNext(OnSubscribeRedo.java:300)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeRedo$3$1.onNext(OnSubscribeRedo.java:284)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.NotificationLite.accept(NotificationLite.java:141)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.subjects.SubjectSubscriptionManager$SubjectObserver.emitNext(SubjectSubscriptionManager.java:253)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.subjects.BehaviorSubject.onNext(BehaviorSubject.java:160)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.observers.SerializedObserver.onNext(SerializedObserver.java:91)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.subjects.SerializedSubject.onNext(SerializedSubject.java:67)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeRedo$2$1.onError(OnSubscribeRedo.java:237)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.observers.SerializedObserver.onError(SerializedObserver.java:152)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.observers.SerializedSubscriber.onError(SerializedSubscriber.java:78)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OperatorTimeoutBase$TimeoutSubscriber$1.onError(OperatorTimeoutBase.java:187)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeThrow.call(OnSubscribeThrow.java:44)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OnSubscribeThrow.call(OnSubscribeThrow.java:28)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.Observable.unsafeSubscribe(Observable.java:10200)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OperatorTimeoutBase$TimeoutSubscriber.onTimeout(OperatorTimeoutBase.java:200)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.operators.OperatorTimeout$1$1.call(OperatorTimeout.java:41)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$2.call(EventLoopsScheduler.java:189)
02-13 12:28:51.617 E/BleDebugActivity:     at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
02-13 12:28:51.617 E/BleDebugActivity:     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:428)
02-13 12:28:51.617 E/BleDebugActivity:     at java.util.concurrent.FutureTask.run(FutureTask.java:237)
02-13 12:28:51.617 E/BleDebugActivity:     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:272)
02-13 12:28:51.617 E/BleDebugActivity:     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
02-13 12:28:51.617 E/BleDebugActivity:     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
02-13 12:28:51.617 E/BleDebugActivity:     at java.lang.Thread.run(Thread.java:761)
02-13 12:28:51.617 E/BleDebugActivity: Caused by: BleGattCallbackTimeoutException{macAddress=C9:27:49:98:4B:E2, status=-1 (0xffffffff), bleGattOperationType=BleGattOperation{description='CHARACTERISTIC_LONG_WRITE'}}
02-13 12:28:51.617 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.operations.RxBleRadioOperationCharacteristicLongWrite.protectedRun(RxBleRadioOperationCharacteristicLongWrite.java:82)
02-13 12:28:51.617 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.RxBleRadioOperation.run(RxBleRadioOperation.java:44)
02-13 12:28:51.617 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.radio.RxBleRadioImpl$1$1.call(RxBleRadioImpl.java:59)
02-13 12:28:51.617 E/BleDebugActivity:     at com.polidea.rxandroidble.internal.radio.RxBleRadioImpl$1$1.call(RxBleRadioImpl.java:56)
...

Expected result

I think this code should work. Since I write the next line only on the onComplete callback, there is no reason the next line can not be written correctly and that a timeout occurs.

Note that a similar code without using long write operation does work correctly:

        mPacketWriter.write(characteristicUuid, bytes, delay, delayUnit).subscribe(ignore -> {
        }, this::onError, this::onLineWritten);

Where mPacketWriter is a custom chunker that creates all the chunks to write and then concatMap them using the mBleConnection.writeCharacteristic observable as the "synchronization" point.

@dariuszseweryn
Copy link
Owner

Hello @maoueh
Thank you for your report. Could you explain a little bit more: what device / OS do you test on?
I see one difference between using the long write operation and your PacketWriter - you use a delay and delayUnit which is not present in the long write - could you give more info about it?

@maoueh
Copy link
Contributor Author

maoueh commented Feb 13, 2017

@DariuszAniszewski the delay is not used in "production", it's more when I have to debug stuff with the actual device.

Here the core of the packet writer I usually use:

private Observable<Void> writeToCharacteristic(UUID characteristic,
                                                 byte[] bytes,
                                                 long delay,
                                                 TimeUnit delayUnit,
                                                 int startIndexInclusive,
                                                 int endIndexExclusive) {
    return Observable.fromCallable(() -> Arrays.copyOfRange(bytes, startIndexInclusive, endIndexExclusive))
                     .flatMap(data -> {
                       if (!Delay.isImmediate(delay) && startIndexInclusive == 0) {
                         return Observable.timer(delay, delayUnit).map(ignore -> data);
                       }

                       return Observable.just(data);
                     })
                     .flatMap(data -> {
                       Timber.d("Writing packetize data [%s] to NGMM device.", Bytes.bytesToHex(data));

                       return mBleConnection.writeCharacteristic(characteristic, data).doOnNext((result) -> {
                         onPacketWritten(data);
                       });
                     })
                     .map(ignore -> null);
  }

In production code, the Delay.isImmediate(delay) is always true. So there is no delay usually, the "delay" is usually induced by the mBleConnection.writeCharacteristic(characteristic, data) observable onComplete event.

This is being tested on a LG G5 Android 7.0 phone. The receiving peripheral is backed by a Nordic nRF51822.

I just tried adding a big delay (250ms and also 1s) just before each long operation:

    Observable.timer(250, TimeUnit.MILLISECONDS).flatMap(ignore -> {
      return getBleConnection().getCharacteristic(characteristicUuid).flatMap(characteristic -> {
        return getBleConnection().createNewLongWriteBuilder()
                                 .setCharacteristic(characteristic)
                                 .setMaxBatchSize(20)
                                 .setBytes(bytes)
                                 .build();
      });
    }).subscribe(ignore -> {
    }, this::onError, this::onLineWritten);

The operation goes farther, I'm able to write about between 6 and 14 lines but at some point, operation hangs and transfer stops hitting the timeout exception.

I know the transfer is happening since there is a LED blinking on the peripheral when transfer is in progress.

Just not sure why the operation hangs like this.

@DariuszAniszewski
Copy link

@dariuszseweryn ^^

@dariuszseweryn dariuszseweryn added the bug Bug that is caused by the library label Feb 13, 2017
@dariuszseweryn dariuszseweryn self-assigned this Feb 13, 2017
@dariuszseweryn
Copy link
Owner

It may be possible that that you are experiencing a race condition that happens inside the library. I have seen it though I could not find the cause.

If you could enable RxBleLog on VERBOSE and paste the logs I would get more info.

As a temporary workaround you could delay starting the next long write for about 10 milliseconds after the previous completed.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 13, 2017

I activated VERBOSE but the log was unchanged. There is only four occurrences of RxBleLog.v and none impact RxBleRadio. They are all in BleConnectionCompat class.

I tried adding a delay (250ms and 1s), but this changed nothing. In fact, like mentioned, I make the whole process go a bit farther (between 4 an 14 lines) but it still hangs.

Do you want me to add verbose statement somewhere in RxAndroidBLE code?

@maoueh
Copy link
Contributor Author

maoueh commented Feb 13, 2017

Nevermind, just understood that this is probably exactly what you wanted to check! Here the connect part of the log:

02-13 14:08:59.577 21119-21119 D/RxBle#Radio:   QUEUED RxBleRadioOperationConnect(145097946)
02-13 14:08:59.577 21119-23612 D/RxBle#Radio:  STARTED RxBleRadioOperationConnect(145097946)
02-13 14:08:59.579 21119-21119 V/RxBle#BleConnectionCompat: Connecting without reflection
02-13 14:08:59.580 21119-21119 D/BluetoothGatt: connect() - device: C9:27:49:98:4B:E2, auto: false
02-13 14:08:59.580 21119-21119 D/BluetoothGatt: registerApp()
02-13 14:08:59.580 21119-21119 D/BluetoothGatt: registerApp() - UUID=36f37d3f-bd96-4b06-af12-6ce1a7f6ca5d
02-13 14:08:59.582 21119-23617 D/BluetoothGatt: onClientRegistered() - status=0 clientIf=5
02-13 14:08:59.732 21119-23617 D/BluetoothGatt: onClientConnectionState() - status=0 clientIf=5 device=C9:27:49:98:4B:E2
02-13 14:08:59.735 21119-23617 D/RxBle#BluetoothGatt: onConnectionStateChange newState=2 status=0
02-13 14:08:59.751 21119-23651 I/DeviceWrapper$OpenConnectionOnSubscribe: Connection to device [NAME (C9:27:49:98:4B:E2)] established, notifying subscriber about it.
02-13 14:08:59.754 21119-23651 D/RxBle#Radio:   QUEUED RxBleRadioOperationServicesDiscover(188160369)
02-13 14:08:59.760 21119-23612 D/RxBle#Radio: FINISHED RxBleRadioOperationConnect(145097946)
02-13 14:08:59.761 21119-23612 D/RxBle#Radio:  STARTED RxBleRadioOperationServicesDiscover(188160369)
02-13 14:08:59.763 21119-21119 D/BluetoothGatt: discoverServices() - device: C9:27:49:98:4B:E2
02-13 14:08:59.782 21119-21131 D/BluetoothGatt: onSearchComplete() = Device=C9:27:49:98:4B:E2 Status=0
02-13 14:08:59.782 21119-21131 D/RxBle#BluetoothGatt: onServicesDiscovered status=0
02-13 14:08:59.783 21119-23651 D/BluetoothGatt: setCharacteristicNotification() - uuid: b4520107-a308-4e56-8a52-536c2ad07147 enable: true
02-13 14:08:59.787 21119-23651 D/RxBle#Radio:   QUEUED RxBleRadioOperationDescriptorWrite(254820577)
02-13 14:08:59.788 21119-23651 D/BluetoothGatt: setCharacteristicNotification() - uuid: b4520102-a308-4e56-8a52-536c2ad07147 enable: true
02-13 14:08:59.795 21119-23651 D/RxBle#Radio:   QUEUED RxBleRadioOperationDescriptorWrite(154966624)
02-13 14:08:59.796 21119-23651 D/BluetoothGatt: setCharacteristicNotification() - uuid: b4520105-a308-4e56-8a52-536c2ad07147 enable: true
02-13 14:08:59.800 21119-23651 D/RxBle#Radio:   QUEUED RxBleRadioOperationDescriptorWrite(147424987)
02-13 14:08:59.801 21119-23612 D/RxBle#Radio: FINISHED RxBleRadioOperationServicesDiscover(188160369)
02-13 14:08:59.801 21119-23612 D/RxBle#Radio:  STARTED RxBleRadioOperationDescriptorWrite(254820577)
02-13 14:08:59.963 21119-21131 D/RxBle#BluetoothGatt: onDescriptorWrite descriptor=00002902-0000-1000-8000-00805f9b34fb status=0
02-13 14:08:59.967 21119-23612 D/RxBle#Radio: FINISHED RxBleRadioOperationDescriptorWrite(254820577)
02-13 14:08:59.971 21119-23612 D/RxBle#Radio:  STARTED RxBleRadioOperationDescriptorWrite(154966624)
02-13 14:09:00.156 21119-21135 D/RxBle#BluetoothGatt: onDescriptorWrite descriptor=00002902-0000-1000-8000-00805f9b34fb status=0
02-13 14:09:00.159 21119-23612 D/RxBle#Radio: FINISHED RxBleRadioOperationDescriptorWrite(154966624)
02-13 14:09:00.159 21119-23612 D/RxBle#Radio:  STARTED RxBleRadioOperationDescriptorWrite(147424987)
02-13 14:09:00.265 21119-21131 D/RxBle#BluetoothGatt: onDescriptorWrite descriptor=00002902-0000-1000-8000-00805f9b34fb status=0
02-13 14:09:00.270 21119-23612 D/RxBle#Radio: FINISHED RxBleRadioOperationDescriptorWrite(147424987)

@dariuszseweryn
Copy link
Owner

Uh, excuse me. You have already had the log. My mistake. If you could add a log in the writeData method and another one on each emission of completed event emission (around line 170) change

                return writeOperationAckStrategy.call(emittingOnBatchWriteFinished.map(bufferIsNotEmpty(byteBuffer)))
                        .takeWhile(bufferIsNotEmpty(byteBuffer));

to something like

                return writeOperationAckStrategy.call(emittingOnBatchWriteFinished
                                                                                 .doOnNext(new Action1<Object>() {
                                                                                     // I am writing this on a web interface, add a log here - it will be triggered after each batch write has finished
                                                                                 })
                                                                                 .map(bufferIsNotEmpty(byteBuffer)))
                        .takeWhile(bufferIsNotEmpty(byteBuffer));

@maoueh
Copy link
Contributor Author

maoueh commented Feb 13, 2017

I added the log and tried other kind of operation. The problem seems to happen way more frequently when a bunch of characteristic changed event arrives after the long write operation has started. Is it possible they are interfering with the way the library check if the characteristic was written correctly?

To add more context, here an overview of how of interact with the peripheral. We write some data to the peripheral and it responds through some notification. Here a different operation than flash that request some data from the device:

02-13 14:48:40.955 17567-17567 D/StateMachine: Writing [AA01040BBB] to device.
02-13 14:48:41.209 17567-18900 D/RxBle#Radio:   QUEUED RxBleRadioOperationCharacteristicLongWrite(253784101)
02-13 14:48:41.211 17567-18663 D/RxBle#Radio:  STARTED RxBleRadioOperationCharacteristicLongWrite(253784101)
02-13 14:48:41.220 17567-17567 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [5 bytes] chunk to characteristic with success [true].
02-13 14:48:41.221 17567-17582 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 14:48:41.256 17567-17582 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:41.262 17567-18786 V/StateMachine: Received notification data [AA060200BB].
02-13 14:48:41.262 17567-18786 D/StateMachine: Processing complete packet [AA060200BB].
02-13 14:48:41.264 17567-18786 D/StateMachine: Trying to transition from [START (0)] to [ACK_RECEIVED (1)].
02-13 14:48:41.321 17567-17625 D/OpenGLRenderer: endAllActiveAnimators on 0x7d6baebc00 (MenuPopupWindow$MenuDropDownListView) with handle 0x7d6a11abc0
02-13 14:48:41.911 17567-17584 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:41.912 17567-18786 V/StateMachine: Received notification data [AA3102FE04FFBB].
02-13 14:48:41.913 17567-18786 D/StateMachine: Processing complete packet [AA3102FE04FFBB].
02-13 14:48:41.931 17567-17581 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:41.932 17567-17581 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:41.933 17567-18786 V/StateMachine: Received notification data [AA3102FE046EF5040E024445492D424C45200000].
02-13 14:48:41.934 17567-18786 V/StateMachine: Received notification data [0000BB].
02-13 14:48:41.937 17567-18786 D/StateMachine: Processing complete packet [AA3102FE046EF5040E024445492D424C452000000000BB].
02-13 14:48:42.064 17567-17582 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.070 17567-18786 V/StateMachine: Received notification data [AA3102FEDA1202000FBB].
02-13 14:48:42.071 17567-17582 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.072 17567-18786 D/StateMachine: Processing complete packet [AA3102FEDA1202000FBB].
02-13 14:48:42.074 17567-17582 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.074 17567-18786 V/StateMachine: Received notification data [AA3102FEFBFEB80F35F900B485101A5030CCCC2A].
02-13 14:48:42.076 17567-18786 V/StateMachine: Received notification data [43AE5100000000BB].
02-13 14:48:42.077 17567-18786 D/StateMachine: Processing complete packet [AA3102FEFBFEB80F35F900B485101A5030CCCC2A43AE5100000000BB].
02-13 14:48:42.083 17567-17584 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.087 17567-18786 V/StateMachine: Received notification data [AA01023B00BB].
02-13 14:48:42.088 17567-18786 D/StateMachine: Processing complete packet [AA01023B00BB].
02-13 14:48:42.089 17567-18786 D/StateMachine: Trying to transition from [ACK_RECEIVED (1)] to [RESPONSE_RECEIVED (2)].
02-13 14:48:42.090 17567-18786 D/StateMachine: Finishing with result [[B@77afbd].
02-13 14:48:42.090 17567-18786 D/StateMachine: Emitting event [[B@77afbd].
02-13 14:48:42.090 17567-18786 D/StateMachine: Unsubscribing from subscriber(s).
02-13 14:48:42.910 17567-17582 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.928 17567-17584 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.932 17567-17584 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.936 17567-17584 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.938 17567-17584 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:42.962 17567-17581 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:48:45.815 17567-17584 D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 14:49:11.233 17567-18944 D/StateMachine: Finishing with error (Received onError without having requested it).
02-13 14:49:11.233 17567-18663 D/RxBle#Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(253784101)

Note here that the data is written with success (WRITE_DATA) but the ACK never arrived. I validated that the log statements does print correctly when everything is working:

02-13 14:53:42.016 17567-17567StateMachineD/OtherStateMachine: Writing [AA060103E80000BB] to device.
02-13 14:53:42.271 17567-18900StateMachineD/RxBle#Radio:   QUEUED RxBleRadioOperationCharacteristicLongWrite(21173544)
02-13 14:53:42.274 17567-18663StateMachineD/RxBle#Radio:  STARTED RxBleRadioOperationCharacteristicLongWrite(21173544)
02-13 14:53:42.287 17567-17567StateMachineD/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [8 bytes] chunk to characteristic with success [true].
02-13 14:53:42.294 17567-17582StateMachineD/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 14:53:42.297 17567-18988StateMachineD/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
02-13 14:53:42.299 17567-18663StateMachineD/RxBle#Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(21173544)

@maoueh
Copy link
Contributor Author

maoueh commented Feb 13, 2017

I added a log statement in writeResponseForMatchingCharacteristic.

    private Func1<ByteAssociation<UUID>, Boolean> writeResponseForMatchingCharacteristic() {
        return new Func1<ByteAssociation<UUID>, Boolean>() {
            @Override
            public Boolean call(ByteAssociation<UUID> uuidByteAssociation) {
                RxBleLog.d("MATCHING_CHAR: Checking if matching UUID [%s].", uuidByteAssociation.first);
                return uuidByteAssociation.first.equals(bluetoothGattCharacteristic.getUuid());
            }
        };
    }

When it's not working, even though the onCharacteristicWrite log appears, the writeResponseForMatchingCharacteristic matching function is not invoked somehow

Here a log when not working:

02-13 15:07:44.940 9074-11069D/RxBle#Radio:   QUEUED RxBleRadioOperationCharacteristicLongWrite(257835523)
02-13 15:07:44.943 9074-10958D/RxBle#Radio:  STARTED RxBleRadioOperationCharacteristicLongWrite(257835523)
02-13 15:07:44.956 9074-9074D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [5 bytes] chunk to characteristic with success [true].
02-13 15:07:44.956 9074-10982D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:07:44.987 9074-10982D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 15:07:44.988 9074-11032V/StateMachine: Received notification data [AA060200BB].
02-13 15:07:44.989 9074-11032D/StateMachine: Processing complete packet [AA060200BB].
02-13 15:07:44.989 9074-10982D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 15:07:44.990 9074-11032D/StateMachine: Trying to transition from [START (0)] to [ACK_RECEIVED (1)].
02-13 15:07:44.991 9074-11032V/StateMachine: Received notification data [AA01023B00BB].
02-13 15:07:44.992 9074-11032D/StateMachine: Processing complete packet [AA01023B00BB].
02-13 15:07:44.993 9074-11032D/StateMachine: Trying to transition from [ACK_RECEIVED (1)] to [RESPONSE_RECEIVED (2)].
02-13 15:07:44.994 9074-11032D/StateMachine: Finishing with result [[B@841ec62].
02-13 15:07:44.996 9074-11032D/StateMachine: Emitting event [[B@841ec62].
02-13 15:07:44.996 9074-11032D/StateMachine: Unsubscribing from subscriber(s).
02-13 15:07:45.058 9074-9104D/OpenGLRenderer: endAllActiveAnimators on 0x7d6a1cfc00 (MenuPopupWindow$MenuDropDownListView) with handle 0x7d6ddff500
02-13 15:08:14.949 9074-11084D/StateMachine: Finishing with error (Received onError without having requested it).
02-13 15:08:14.952 9074-10958D/RxBle#Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(257835523)

And one when it works:

02-13 15:07:37.149 9074-11069D/RxBle#Radio:   QUEUED RxBleRadioOperationCharacteristicLongWrite(202844953)
02-13 15:07:37.150 9074-10958D/RxBle#Radio:  STARTED RxBleRadioOperationCharacteristicLongWrite(202844953)
02-13 15:07:37.153 9074-9074D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [5 bytes] chunk to characteristic with success [true].
02-13 15:07:37.154 9074-9087D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:07:37.155 9074-11128D/RxBle#RadioOperationCharacteristicLongWrite: MATCHING_CHAR: Checking if matching UUID [b4520101-a308-4e56-8a52-536c2ad07147].
02-13 15:07:37.156 9074-11128D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
02-13 15:07:37.157 9074-10958D/RxBle#Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(202844953)
02-13 15:07:37.187 9074-9087D/RxBle#BluetoothGatt: onCharacteristicChanged characteristic=b4520102-a308-4e56-8a52-536c2ad07147
02-13 15:07:37.188 9074-11032V/StateMachine: Received notification data [AA060200BB].
02-13 15:07:37.188 9074-11032D/StateMachine: Processing complete packet [AA060200BB].

@dariuszseweryn
Copy link
Owner

Yup. It seems that this is the exact problem with a race condition. The RxBleGattCallback recieves the notification about finished write but it is not transmitted to the observer (in this situation Long Write Operation). I will need to look into this tomorrow.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 13, 2017

In fact, now that you talked about the observer not being notified, I think I might have hit a similar (same?) problem when I first did the "flashing" implementation. At first, what I tried to do was to "subscribe" multiple time to writeCharacteristic without never really waiting between "write" operation.

My idea was that this would fill the queue and the "consumer" thread (the one in RxBleRadioImpl) would slowly but steadily remove elements from the queue. However, the queue quickly (maybe after two or three writes) become stalled only queuing more and more operation without removing new entries.

At first I thought it was a problem with threading on my part and went with writing next chunk only previous one did answer.

But now that we talked about operation not being notified, I think that what happened at that time was the exact same problem as now. The notification was not making it's way to the observer (at that time the RxBleRadioOperationCharacteristicWrite). This was of course making the operation taking control of the "consumer" thread forever as there was no timeout on operation before. This was in return preventing the "consumer" thread from popping item from the queue.

Feel free to ping me as I can easily reproduce the problem. I can try solution or give you some detailed log when working and when not working.

Thanks for looking into this.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 13, 2017

I think one of the problem (I say one, see below) is that RxBleGattCallback#bluetoothGattBehaviorSubject was not serialized causing it to fail when multiple thread were writing to it. I changed it to:

    private final SerializedSubject<BluetoothGatt, BluetoothGatt> bluetoothGattBehaviorSubject =
            BehaviorSubject.<BluetoothGatt>create().toSerialized();

And it did help since I now see all the ACK when onCharacteristicWrite is being notified. However, it still hangs at some point:

02-13 15:46:54.609 16800-18734 D/FlashFirmwareStateMachine: Writing [AA06063A000000003C4D23343232C223304B802923304B10105CCBAB51EE8C05A6CDD6A0531246B6977D4B49808AE5A6F85A6016D4BFD1DF18A791D803E82330D765D789540F3DCA0B942F3A203D07D16540C306BE7C9C4783F1C3111D7DC9E6516873A4C71C6C876836B1A7E523341F9BEC86DFF644E407AE1963E37AAEF666C6E1DC87AF1C5381F17D0E8E2D9F4382CB1792C3077482233197E21EC36416E53281696CB520F25675F749AD7451326E919660F183C98A250B7E1F1A6D43F81D769481559EDC8FF87480918BFA316A42B969DE118FF1928AF88F7230571E8DE2626D233587A67660BE6EC45049B07BC9CEBC975E30518C8456137AA9B1372779BC2D5BCCBB89B5E3C6D07BBE743FD39C12F33EBB] to device.
02-13 15:46:54.612 16800-18734 D/FlashFirmwareStateMachine: Emitting event [1 / 697 (0.14%)].
02-13 15:46:54.863 16800-18768 D/RxBle#Radio:   QUEUED RxBleRadioOperationCharacteristicLongWrite(86038308)
02-13 15:46:54.864 16800-18631 D/RxBle#Radio:  STARTED RxBleRadioOperationCharacteristicLongWrite(86038308)
02-13 15:46:54.868 16800-16800 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true].
02-13 15:46:54.870 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.871 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.872 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.874 16800-18736 D/RxBle#RadioOperationCharacteristicLongWrite: MATCHING_CHAR: Checking if matching UUID [b4520101-a308-4e56-8a52-536c2ad07147].
02-13 15:46:54.876 16800-18736 D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
02-13 15:46:54.879 16800-16800 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true].
02-13 15:46:54.880 16800-16815 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.881 16800-16815 D/RxBle#BluetoothGatt: onCharacteristicWrite Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.883 16800-16815 D/RxBle#BluetoothGatt: onCharacteristicWrite Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.885 16800-18768 D/RxBle#RadioOperationCharacteristicLongWrite: MATCHING_CHAR: Checking if matching UUID [b4520101-a308-4e56-8a52-536c2ad07147].
02-13 15:46:54.886 16800-18768 D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
02-13 15:46:54.890 16800-16800 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true].
02-13 15:46:54.892 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.893 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.894 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.897 16800-18736 D/RxBle#RadioOperationCharacteristicLongWrite: MATCHING_CHAR: Checking if matching UUID [b4520101-a308-4e56-8a52-536c2ad07147].
02-13 15:46:54.898 16800-18736 D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
02-13 15:46:54.903 16800-16800 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true].
02-13 15:46:54.906 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.907 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.907 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.910 16800-18768 D/RxBle#RadioOperationCharacteristicLongWrite: MATCHING_CHAR: Checking if matching UUID [b4520101-a308-4e56-8a52-536c2ad07147].
02-13 15:46:54.911 16800-18768 D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
02-13 15:46:54.923 16800-16800 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true].
02-13 15:46:54.923 16800-16815 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.924 16800-16815 D/RxBle#BluetoothGatt: onCharacteristicWrite Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.925 16800-16815 D/RxBle#BluetoothGatt: onCharacteristicWrite Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.926 16800-18736 D/RxBle#RadioOperationCharacteristicLongWrite: MATCHING_CHAR: Checking if matching UUID [b4520101-a308-4e56-8a52-536c2ad07147].
02-13 15:46:54.926 16800-18736 D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
02-13 15:46:54.930 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.931 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.932 16800-16812 D/RxBle#BluetoothGatt: onCharacteristicWrite Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
02-13 15:46:54.932 16800-16800 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true].
02-13 15:47:24.932 16800-18631 D/RxBle#Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(86038308)
02-13 15:47:24.943 16800-18744 D/FlashFirmwareStateMachine: Finishing with error (Received onError without having requested it).

I see the WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true]. but then never received a onCharacteristicWrite callback, weird.

@dariuszseweryn
Copy link
Owner

This issue with no callback from the Android BLE Stack I have heard of - that is why all operations which are expecting a callback have some sort of a timeout. For this you could add a delay between writes and check if that will help.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

I don't think it comes from the Android BLE stack. I think it's an event ordering problem. Let's check the last snippet of log more closely (lightly modified to make fit easier in the screen):

15:46:54.926 18736 ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
15:46:54.930 16812 BluetootGatt: onCharacteristicWrite characteristic=<uuid> status=0
15:46:54.932 16800 LongWriteOperation: WRITE_DATA: Wrote a [20 bytes] chunk to characteristic with success [true].
15:47:24.932 18631 Radio: FINISHED RxBleRadioOperationCharacteristicLongWrite(86038308)

Here, we see the last acknowledgement. Then, before WRITE_DATA has a chance to log, we see the onCharacteristicWrite callback being notified. Then 2ms later, we then see the actual WRITE_DATA log statement.

So, somehow, before the ACK checker has been set up correctly, the onCharacteristicWrite is already being notified. When the ACK checker is then ready to process event, the onCharacteristicWrite event has already been processed and thus, is missed.

Clearly some threading issue, not quite sure what is causing this however.

@dariuszseweryn
Copy link
Owner

Could you confirm by printing out the number of the batch which the log is referring to?
Back in 2014 when I was developing a FW update functionality for a device I thought I had a similar issue but it was a side-channel ACK as far as I remember - Android BT Stack responded correctly in terms of the sequence.
Here the problem may lay in using the doOnSubscribe which is actually happening before the subscription taking place itself. I will look into creating doAfterSubscribe which could help.

Neverthless could you add a simple counter to be sure about which log is for which batch of the transmission?

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

Here one log with batch sequence number. The INC_BATCH comes before the bluetoothGatt.writeCharacteristic(bluetoothGattCharacteristic) call while the WRITE_DATA comes after it.

The counter was added to RxBleGattCallback as a static AtomicInteger sCounter. It is incremented only between the two INC_BATCH log statements. Rest of logs are using the value currently set.

 08:24:59.604 8263-9045 D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
 08:24:59.606 8263-8263 D/RxBle#RadioOperationCharacteristicLongWrite: INC_BATCH: Value before incrementing batch number is [22].
 08:24:59.607 8263-8263 D/RxBle#RadioOperationCharacteristicLongWrite: INC_BATCH: Value after incrementing batch number is [23].
 08:24:59.609 8263-8263 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA (23): Wrote a [20 bytes] chunk to characteristic with success [true].
 08:24:59.611 8263-8276 D/RxBle#BluetoothGatt: onCharacteristicWrite (23) characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
 08:24:59.612 8263-8276 D/RxBle#BluetoothGatt: onCharacteristicWrite (23) Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
 08:24:59.612 8263-8276 D/RxBle#BluetoothGatt: onCharacteristicWrite (23) Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
 08:24:59.613 8263-9055 D/RxBle#RadioOperationCharacteristicLongWrite: MATCHING_CHAR (23): Checking if matching UUID [b4520101-a308-4e56-8a52-536c2ad07147].
 08:24:59.615 8263-9055 D/RxBle#ActionObserver: ACKNOWLEDGE_LONG_CHUNK: Acknowledged!
 08:24:59.616 8263-8263 D/RxBle#RadioOperationCharacteristicLongWrite: INC_BATCH: Value before incrementing batch number is [23].
 08:24:59.617 8263-8263 D/RxBle#RadioOperationCharacteristicLongWrite: INC_BATCH: Value after incrementing batch number is [24].
 08:24:59.621 8263-8275 D/RxBle#BluetoothGatt: onCharacteristicWrite (24) characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
 08:24:59.621 8263-8275 D/RxBle#BluetoothGatt: onCharacteristicWrite (24) Entering subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
 08:24:59.622 8263-8275 D/RxBle#BluetoothGatt: onCharacteristicWrite (24) Exiting subject characteristic=b4520101-a308-4e56-8a52-536c2ad07147 status=0
 08:24:59.623 8263-8263 D/RxBle#RadioOperationCharacteristicLongWrite: WRITE_DATA (24): Wrote a [20 bytes] chunk to characteristic with success [true].

In last batch (i.e 24), we see that counter is incremented, the write characteristic is then called, but the onCharacteristicWrite is notified before the writeData function has finished yet.

I will try the doAfterSubscribe, but I'm not sure it will solve the problem.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

Well in fact, there is no doAfterSubscribe operator, so cannot use it.

@dariuszseweryn
Copy link
Owner

I have created one - straight forward implementation. The potential fixes are now in the process of code review. I hope to merge them into the master before the end of the day (shortly) so you could test it with a snapshot.

dariuszseweryn added a commit that referenced this issue Feb 14, 2017
…hGatt.writeCharacteristic()` in `RxBleRadioOperationCharacteristicLongWrite`. #135

Reviewers: michal.zielinski, pawel.urban

Reviewed By: pawel.urban

Differential Revision: https://phabricator.polidea.com/D2181
dariuszseweryn added a commit that referenced this issue Feb 14, 2017
… a `SerializedSubject`. #135

Reviewers: michal.zielinski, pawel.urban

Reviewed By: pawel.urban

Differential Revision: https://phabricator.polidea.com/D2180
@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

@DariuszAniszewski I checked the subject change and I think you pick the wrong one. The one I think being called by multiple various thread is more likely the bluetoothGattBehaviorSubject which is used in all Gatt callbacks.

Overall, I think all subject in this class should be reviewed as to where they are called by various different thread or not. If it's the case, they would need to be toSerialized also.

@dariuszseweryn
Copy link
Owner

Most of the subjects used in the RxBleGattCallback are not subjects to rapid changes. bluetoothGattBehaviourSubject has no critical observers in terms of sequential delivery of events. Adding .toSerialized() on all subjects could add a non-negligible overhead - that is why it is not added everywhere.

A new snapshot should be available. Could you try it out?

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

I agree, but as you know, they are not thread-safe. Not having rapid changes does not remove the chance that a race condition or dead clock happens in the subject, less likely if there are not much changes but quite possible.

To my eye, the bluetoothGattBehaviourSubject seems problematic as being called by all callbacks. That means that an onCharacteristicWrite and multiple onCharacteristicChanged happening rapidly in the same time window would call the bluetoothGattBehaviourSubject.onNext from various different thread which is an undefined behavior as of RxJava since subjects are not thread safe.

That's why I think this one (bluetoothGattBehaviourSubject) should be toSerialized to prevent problems when multiple stuff happens are the same time.

I will test your branch and report.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

Nop, it did not solve the issue (I changed a bit log statement since now we pinned point the problem):

 10:40:06.520 30341-30341 D/LongWriteOp: INC_COUNTER: Value just after increment is (10).
 10:40:06.521 30341-30341 D/LongWriteOp: WRITE_DATA: Wrote value with counter (10).
 10:40:06.522 30341-30459 D/BluetoothGatt: onCharacteristicWrite (10) characteristic=<uuid> status=0
 10:40:06.524 30341-30341 D/LongWriteOp: INC_COUNTER: Value just after increment is (11).
 10:40:06.526 30341-30459 D/BluetoothGatt: onCharacteristicWrite (11) characteristic=<uuid> status=0
 10:40:06.526 30341-30341 D/LongWriteOp: WRITE_DATA: Wrote value with counter (11).

@dariuszseweryn
Copy link
Owner

So it seems that we should somehow synchronise the subscription to the onCharacteristicWrite with writeCharacteristic. Only hackish solutions come to my mind at the moment.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

Yeah I know, I thought about using some replay or caching logic, so that the "notification" is held until it can be used, but they would probably prove problematic in some cases.

To work, we would need to cache event happening between just after the subscribe until they are processed ensuring no caching occurs outside the boundaries to avoid false positive.

I tried a bit yesterday night but was not able to do so. I would need to dive deeper to understand at which moment stuff are being done.

@dariuszseweryn
Copy link
Owner

I am not in the office anymore. What if you could synchronise this method on rxBleGattCallback:

    private void writeData(byte[] bytesBatch) {
        bluetoothGattCharacteristic.setValue(bytesBatch);
        final boolean success = bluetoothGatt.writeCharacteristic(bluetoothGattCharacteristic);
        if (!success) {
            throw new BleGattCannotStartException(bluetoothGatt, BleGattOperationType.CHARACTERISTIC_LONG_WRITE);
        }
    }

and in the RxBleGattCallback add a synchronized block in the onCharacteristicWrite() callback?

@dariuszseweryn
Copy link
Owner

The synchronized block would be needed to used on RxBleGattCallback.this ofcourse.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

That would probably work indeed, will test that later today and report.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 14, 2017

It seemed it help as through various invocations, I never hit the event being out of order and execution goes farther (batch 109 was my best). However, it still hang at some point, seems we are chasing multiple problems here :)

Log:

13:44:13.191 14397-14397D/LongWriteOp: INC_COUNTER: Value just after increment is (82).
13:44:13.192 14397-14397D/LongWriteOp: WRITE_DATA: Wrote value with counter (82).
13:44:13.193 14397-14409D/BluetoothGatt: onCharacteristicWrite (82) characteristic=<uuid> status=0
13:44:13.194 14397-15081D/LongWriteOp: ACK_CHECK: Checking for characteristic (82).
13:44:13.195 14397-15081D/LongWriteOp: ACK: Acknowledged (82)!
13:44:13.196 14397-14397D/LongWriteOp: INC_COUNTER: Value just after increment is (83).
13:44:13.199 14397-14397D/LongWriteOp: WRITE_DATA: Wrote value with counter (83).
13:44:13.199 14397-14414D/BluetoothGatt: onCharacteristicWrite (83) characteristic=<uuid> status=0

For batch 83, even if the onCharacteristicWrite happens after the write data correctly, it seems the ACK_CHECK is not invoked as expected.

I will need to dive deeper into Rx scheduler to understand what happen exactly. If you have ideas or want to try stuff, will be happy to check them out.

@dariuszseweryn
Copy link
Owner

I think it may still be the same issue. Adding the sychronisation only made the window smaller than before. Ehhh... As a friend of mine said: You have to be this tall [points high in the sky] to write threadsafe code
I will probably have to create some kind of a wrapper and old-fashioned callbacks for writing characteristic with no response setting.
Btw. if you are able to write with response you probably will not face the problem.

@dariuszseweryn
Copy link
Owner

An idea just popped inside my head when already in bed:
You could change:

        rxBleGattCallback.getOnCharacteristicWrite()
                .lift(new OperatorDoAfterSubscribe<ByteAssociation<UUID>>(writeNextBatch(batchSize, byteBuffer)))

to something like (exchange lambdas for anonymous classes ofc and pardon pseudocode):

Observable.fromEmitter(emitter -> {
    Subscription s = rxBleGattCallback.getOnCharacteristicWrite().subscribe(emitter);
    emitter.setCancellation(() -> { s.unsubscribe() });
    writeNextBatch();
})

This could help in always subscribing to the notification before writing the batch of data.

dariuszseweryn added a commit that referenced this issue Feb 15, 2017
…ng called even before `BluetoothGatt.writeCharacteristic()` returns.

Summary: #135

Reviewers: pawel.urban

Reviewed By: pawel.urban

Differential Revision: https://phabricator.polidea.com/D2184
@dariuszseweryn
Copy link
Owner

@maoueh I have pushed a potential fix for the race condition. You could test it once you will have a moment.

@maoueh
Copy link
Contributor Author

maoueh commented Feb 15, 2017

@dariuszseweryn Yup, that fixed the issue, good job!

I also tested by removing our "old" fixes like the synchronized block and the serialized subject and they seems to be not required (I still think bluetoothGattBehaviourSubject should be made SerializedSubject, but that's another topic :)).

So, the code as in the repository right now seem to work correctly for my use case.

Out of scope, but two questions while I have you.

Would you guys accept a PR that would enable the library to execute radio operation out of the main thread? We have a fork where we execute off main thread for most phone (we kept main thread only when manufacturer is Samsung and Android version is 4.3)?

Would you guys accept a PR that would enable to execute my own radio operation? The idea here would be to avoid going on & off of the RxRadio queue. The PR could look like:

RxBleConnection {
  <T> Observable<T> queue(Func3<Scheduler, BluetoothGatt, RxBleGattCallback, RxBleRadioOperation<T>> builder);
}

Of course, this a JavaDoc would explain that usage of the method should be made upon great care.

Anyway, all in all, thanks for the great support!

@maoueh maoueh closed this as completed Feb 15, 2017
@dariuszseweryn
Copy link
Owner

Glad that one solution finally helped. I will revert all unneeded changes tomorrow.

As for the main thread versus background - I am a bit afraid that there may be more devices that are more stable while using the main thread than we know of. In this situation we would need to keep a list of devices that need a specific solution - which could be a burden.

About the custom operation - I see no particular reason to do it like this. I was thinking about just exposing the RxBleGattCallback and BluetoothGatt for direct usage for optimisation purposes.
Could you elaborate more on your use-case? We would then also have material for internal discussions.

@dariuszseweryn
Copy link
Owner

I have added some optimisations to the long write operations. You can check it out. Should use a lot less allocations.

Best Regards

@maoueh
Copy link
Contributor Author

maoueh commented Feb 16, 2017

@dariuszseweryn Ok good, I check the changes, don't see a reason for them not to work. But I will test the latest branch nonetheless.

As for the two questions, my use case is simply to optimize the flash firmware operation. With all optimizations in (scheduler off main thread, single operation in the queue), I was able to go from 140s in the worst case (heavy-lifted screen, balanced priority connection) to only 39s in my last try (high priority connection, scheduler off main thread, one operation in the queue).

I will open two issues to keep discussion at the right place for them. I will provide my answer there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug that is caused by the library
Projects
None yet
Development

No branches or pull requests

3 participants