Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting stuck in notify after a while #151

Closed
vermasrijan19 opened this issue Nov 11, 2024 · 10 comments · Fixed by #152
Closed

Getting stuck in notify after a while #151

vermasrijan19 opened this issue Nov 11, 2024 · 10 comments · Fixed by #152

Comments

@vermasrijan19
Copy link
Contributor

vermasrijan19 commented Nov 11, 2024

I am using the ble to transmit constant sensor data, after running for a while, there are some Error and then code seemingly gets stuck in in notify loop.

All callbacks are working as expected but the loop in which i am transmitting data is stuck.

In the below logs I manually disconnect after a while of it being stuck in the loop

I (183351) walk_rs::initialize_components: SuccessNotify
I (183360) NimBLE: GATT procedure initiated: notify;
I (183363) NimBLE: att_handle=18

I (183367) walk_rs::initialize_components: ErrorGatt
I (183375) NimBLE: GATT procedure initiated: notify;
I (183378) NimBLE: att_handle=18

I (183382) walk_rs::initialize_components: ErrorGatt
I (183391) NimBLE: GATT procedure initiated: notify;
I (183393) NimBLE: att_handle=18

I (183398) walk_rs::initialize_components: ErrorGatt
I (183406) NimBLE: GATT procedure initiated: notify;
I (183409) NimBLE: att_handle=18

I (183413) walk_rs::initialize_components: ErrorGatt
I (183426) NimBLE: GATT procedure initiated: notify;
I (183426) NimBLE: att_handle=18

I (183429) walk_rs::initialize_components: SuccessNotify
I (183438) NimBLE: GATT procedure initiated: notify;
I (183440) NimBLE: att_handle=18

I (183445) walk_rs::initialize_components: SuccessNotify
I (183453) NimBLE: GATT procedure initiated: notify;
I (183456) NimBLE: att_handle=18

I (183460) walk_rs::initialize_components: SuccessNotify
I (183469) NimBLE: GATT procedure initiated: notify;
I (183472) NimBLE: att_handle=18

I (183476) walk_rs::initialize_components: SuccessNotify
I (183485) NimBLE: GATT procedure initiated: notify;
I (183488) NimBLE: att_handle=18

I (183492) walk_rs::initialize_components: SuccessNotify
I (183501) NimBLE: GATT procedure initiated: notify;
I (183503) NimBLE: att_handle=18

I (183508) walk_rs::initialize_components: SuccessNotify
I (183516) NimBLE: GATT procedure initiated: notify;
I (183519) NimBLE: att_handle=18

I (183524) walk_rs::initialize_components: SuccessNotify
I (183533) NimBLE: GATT procedure initiated: notify;
I (183535) NimBLE: att_handle=18

I (222177) walk_rs::initialize_components: Client disconnected (Err(Remote User Terminated Connection))
@vermasrijan19
Copy link
Contributor Author

After debugging for a while, it seems that it gets stuck in handle_gap_event in ble_characteristic.rs when the following code is executed, there seems to be a n event with ctxt.op as 0.

let mutex = unsafe { voidp_to_ref::<Mutex<Self>>(arg) }; let mut characteristic = mutex.lock();

I have temporarily fixed it for me by adding a check in the start of the function when the value is 0.

@taks
Copy link
Owner

taks commented Nov 18, 2024

Thanks @vermasrijan19

The event with ctxt.op as 0 is "Read characteristic event" ( BLE_GATT_ACCESS_OP_READ_CHR).
Is the connected device reading the characteristic?

@vermasrijan19
Copy link
Contributor Author

This had taken me a while to get back because I wanted to be sure but i wanted to be sure but the code of the app is definitely not reading, it is only subscribed to the notification.

Likely explanation:

I have faced this with 3/4 with Android devices i have tested with.

The event never seems to occur when using the highest-spec android 15 pixel7.

But it can be reproduced on an older Zenfone 6 and poco f1.

I have also noticed that it may be related to the bt buffer on the client device as it seems to occur faster on the lower end devices than the Zenfone.

(I honestly don't know enough about bt or Android to have any clue)

I have put a dirty workaround for now by using rhack and returning 0 if the event occurs but it is far from ideal.

I think using something like try_lock() instead of lock() in the function might be the solution

@taks
Copy link
Owner

taks commented Nov 22, 2024

I changed to use try_lock() in the fix-daed branch.
Could you check this?

@vermasrijan19
Copy link
Contributor Author

Sorry but i haven't had the bandwidth to be able to test this, I'll try and get back to you soon

@vermasrijan19
Copy link
Contributor Author

vermasrijan19 commented Dec 27, 2024

I have not been able to re-produce the issue but i think try_lock was a wrong suggestion by me as it would lead to read/other event failures if the characteristic is busy

@vermasrijan19
Copy link
Contributor Author

okay, so i have found a way to reliably reproduced the issue

I will update soon

@vermasrijan19
Copy link
Contributor Author

You are correct the issue seems to occur when a read/write operation happens during notify, this is not resolved on the fix-deadlock branch, I will try to provide more information

@vermasrijan19
Copy link
Contributor Author

Okay so , it looks like it is my bad, the deadlock was being caused due to a resource being locked before notify, it is not available 🤦.

But i have a couple of things , i would found a couple of things that I can submit PRs for

@vermasrijan19
Copy link
Contributor Author

Okay so follow up, there were more than 1 issues , it seems that if you keep notifying in a loop, without delay it does eventually cause a deadlock. Using try_lock in my loop on the deadlock branch seems to be the solution

@taks taks closed this as completed in #152 Jan 2, 2025
taks added a commit that referenced this issue Feb 12, 2025
Added connection Handle check before gap event is handled in characteristic to fix the deadlock issue in #151
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants