New connections can't be made after some time. #255

Closed
LeoBortoni opened this issue Jul 14, 2020 · 9 comments

@LeoBortoni

  • bleak version: 0.7.0
  • Python version: 3.7.7
  • Operating System: Windows 10

Description

I have been trying to do this for the last 2 months: my goal is to create an application which reads some number N of sensors at the same time via Bluetooth, collects their data and publishes it to my data bank. It is a simple thing. I created an app which loads the list of sensor addresses and splits it into sublists, simultaneos_connections_list. Then, for each sensor in a simultaneos_connections_list group, I create a thread or async task to read that sensor (yes, I created a wrapper around the bleak lib that allows me to use threads). This thread, or async task, does the following: creates a Bleak client, searches for and tries to connect to the sensor, gets its data and publishes it to the data bank. After this, it is time for the next group of simultaneous connections to create its threads and perform the exact same task, but for other sensor addresses. When all the groups are finished, the while (True) loop restarts and the application creates the threads again.

Everything works pretty well for some time (both the async and the thread version). The problem occurs after 3 or 4 hours of running the application. After this time, bleak just can't connect to any sensor. It finds the Bluetooth address and tries to connect to the sensor, but it always stops at the "Get Services..." log and then raises an empty exception, which I assume to be a timeout exception. The numbers of simultaneous connections N that I have tested are 1, 2 and 3. I have tried many different approaches and I will list them below, together with my assumptions about why this is happening.

In each thread, I try to connect to the sensor up to 20 times, and each time I call the connect method of the bleak client like this: client.connect(timeout = 3). After 20 failed attempts, the thread throws an exception: "Could not find the address".

Note > For each measurement on my sensors, I create a thread and a bleak client on this thread, try to connect and read the sensor, publish the result to the data bank, and then delete the client.
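The grouping and retry loop described above can be sketched roughly as follows. This is not the code from the report: `split_into_groups`, `read_with_retries`, `run_cycle` and the `read_once` callback are hypothetical names, and `read_once` stands in for whatever creates the bleak client, connects with `timeout=3` and reads the sensor.

```python
import asyncio


def split_into_groups(addresses, group_size):
    """Split the address list into consecutive groups of at most group_size."""
    return [addresses[i:i + group_size] for i in range(0, len(addresses), group_size)]


async def read_with_retries(address, read_once, attempts=20):
    """Await read_once(address) until it succeeds or the attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return await read_once(address)
        except Exception as exc:  # broad on purpose: the report sees an "empty" exception
            last_error = exc
    raise RuntimeError(f"Could not find the address {address}") from last_error


async def run_cycle(addresses, group_size, read_once, attempts=20):
    """One pass over all sensors: groups run one after another, sensors in a group concurrently."""
    results = {}
    for group in split_into_groups(addresses, group_size):
        outcomes = await asyncio.gather(
            *(read_with_retries(address, read_once, attempts) for address in group),
            return_exceptions=True,  # a failing sensor must not cancel the rest of its group
        )
        results.update(zip(group, outcomes))
    return results
```

The outer `while True` loop would then call `asyncio.run(run_cycle(...))` or await `run_cycle` repeatedly; failed sensors show up in the result dictionary as exception objects rather than values.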

What is weirder is that once the application can't connect to the sensor anymore, even if I close the app and restart it, it still will not work! It can't connect to the sensor, just as if it had been running for 4 hours and then stopped. Even if I restart the Bluetooth module in Windows, it does not connect to any sensor. I have a test.py program which just connects to the sensor, reads its data and prints it on the screen, just to test the bleak library, and even this program doesn't work anymore; it can't connect. The only thing that allows me to connect to my sensors again is restarting the computer. Then I am able to run test.py or the main application with threads, and it works.

One thing to note > Even after it stops connecting and I can't connect to any sensor unless I restart the computer, I CAN connect to the sensor devices using the Windows Bluetooth interface. Only the bleak module doesn't work anymore.

One interesting thing, which has pointed me in some direction as to what is happening, is as follows: once, I left the app with threads running and after 3 hours it stopped connecting to the sensors. OK, nothing new, but I still left it running, just to see if after some hours it would connect again. Then, after 2 more hours, I got a memory error from Python: Process is terminated due stackoverflow exception. The application was forced to close, and the error box that appeared said: 'A new guard page for the stack cannot be created'. That is pretty weird, since I am creating the client in a thread and deleting the client after the connection attempt, whether it fails or succeeds. Pictures of the errors are attached.

error_memory
still_error_memory

So, let me start describing what I have tried.

What I Did

The first thing I tried was, on each cycle of the while (True) loop, to run a subprocess that executes a PowerShell script to turn the Windows Bluetooth module off and on. It doesn't work.

Then I tried something more severe. In administrator mode, I ran a program to force a restart of all the Windows Bluetooth services on each pass of the while (True) loop. It did not work.

Then I tried to track the memory usage of the application in a situation where it was left running for 3 hours and then stopped connecting. The graph of the memory usage is attached. According to my calculations, the highest peak of memory use is exactly the moment when the application can't connect anymore. That peak is the last sample collected; after that, bleak can't connect.
log_untill_break

Then I tried to force the garbage collector after each while (True) pass with import gc and gc.collect(). It did not work.

I have also tracked the highest memory usage of my application at the point where it stops connecting to the sensors after 3 hours. The trace is attached.

memory_track
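The two diagnostics described above, forcing a collection after each pass and sampling memory usage, can be combined in a small standard-library sketch. `run_with_memory_log` and its `cycle` callback are hypothetical names, not part of the attached code. One caveat that matters here: `tracemalloc` only sees memory allocated through Python's own allocators, so .NET objects kept alive through pythonnet would not appear in these numbers, and a flat curve would not rule out a leak on that side.

```python
import gc
import tracemalloc


def run_with_memory_log(cycle, cycles):
    """Run cycle() repeatedly; after each pass force a GC and record traced memory."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(cycles):
            cycle()
            gc.collect()  # same forced collection as in the report
            current, peak = tracemalloc.get_traced_memory()
            samples.append((current, peak))  # bytes currently traced, bytes at peak
    finally:
        tracemalloc.stop()
    return samples
```

If the `current` value keeps growing from pass to pass even after `gc.collect()`, something on the Python side is still holding references; if it stays flat while the process as a whole grows, the growth is happening outside Python's allocator.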

And this result really makes sense, because when I collect data from just one sensor, I mean, leave the whole application running to collect data from a single sensor device (one thread) and publish this data to the data bank, the application runs for 10 hours and then stops connecting. This is interesting because perhaps some stack in discovery.py is overloading the OS Bluetooth driver, and it only gets back to normal after restarting the machine.

Now prepare yourself for the grand finale.

OK, I thought, I could try to force the garbage collector to work in another way: I could create a process instead of a thread, using the subprocess library. And that is what I did, some simple stuff, and then I left it running. For each measurement, I create a process; inside this process I create a bleak client, do my reads and publish them to my data bank. IT WORKS. Right now the application is completing 24 hours of running with 3 simultaneous connections (this means that in my main process I create 3 threads at a time, and in each thread I create a process).

But this is not a good solution, since I am creating a lot of processes and this is too costly. Besides, I need to use the async version of this application, without threads. The use of threads was just an attempt to find a solution.
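For reference, the process-per-measurement workaround can be sketched with the standard `subprocess` module as below. This is an assumption about the shape of the approach, not the attached code: `run_measurement` is a hypothetical helper, and the worker script it launches would be the part that creates the bleak client, reads the sensor and prints or publishes the result.

```python
import subprocess
import sys


def run_measurement(worker_args, timeout=60.0):
    """Run one measurement in a fresh interpreter; return its stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, *worker_args],  # same Python interpreter, new process
            capture_output=True,
            text=True,
            timeout=timeout,  # kills the child if it hangs
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

A call would look like `run_measurement(["worker.py", address])`, where `worker.py` is a hypothetical script name. Everything the child allocated is returned to the operating system when it exits, which is why this route sidesteps the leak at the price of one interpreter start-up per measurement.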

So, I think this is everything I can remember right now. Please let me know if you guys have already seen something like this.

@hbldh
Owner

hbldh commented Jul 20, 2020

Very good analysis of the problem. It is now proven that #133 is real and might need solving.
When creating a process and closing it, all .NET objects are released by Python when the process ends, so it should work without problems there. That is your only option currently, at least in version 0.7.1 and lower.

I would be much helped by you installing the develop branch of bleak and running your long-running code now, since I made some changes in cbc6069. I have some more ideas for improving this, but this is a first attempt.

@hbldh hbldh self-assigned this Jul 20, 2020
@hbldh hbldh added Backend: pythonnet Issues or PRs relating to the .NET/pythonnet backend bug Something isn't working labels Jul 20, 2020
@hbldh hbldh added this to the Version 0.8.0 milestone Jul 20, 2020
hbldh added a commit that referenced this issue Jul 20, 2020
@LeoBortoni
Author

I did as you asked and repeated the long-running experiment with the develop branch, version 0.7.2a3. The behaviour is still the same. I tested both calling and not calling the garbage collector after the while (True) of the application; it makes no difference. Here is a picture of the test with 0.7.2a3 calling the garbage collector. At approximately 4200 s, my data bank reports the last sample collected via Bluetooth. After this it stops completely, and later my machine stopped working and crashed. I got a black screen with a blinking horizontal bar, and I had to force a shutdown.
bleak7 2_long

@hbldh
Owner

hbldh commented Jul 29, 2020

@LeoBortoni Could you please send me the code you run, along with the memory measurement solution that you have and I will try to make sense of this problem in some way? If you don't want to publish it here then send it via mail to me.

@LeoBortoni
Author

bleak_approachs.zip
Here are both versions of the code: the one which works (creating multiple processes with threads), and the other which doesn't work.
In bleak_which_works.py you will need to change the arguments on line 48 to the arguments that you use to execute a Python program. Try leaving bleak_which_dosent_work.py running for one day; I am curious whether your machine will crash and get a black screen as happened with mine. To generate the memory graphs I am using the application described here: https://medium.com/zendesk-engineering/hunting-for-memory-leaks-in-python-applications-6824d0518774

@hbldh
Owner

hbldh commented Aug 9, 2020

@LeoBortoni Thank you for this. I will run this during this week and get back to you.

I have been experimenting extensively with the pythonnet package and the Windows UWP Bluetooth LE API these past two weeks. I have gotten approximately nowhere, but I think the problem is that no event handler deregistration can be done currently. All events can have handlers added (with +=, in which case the handler method has its counter incremented), but when removing a handler (with -=) they all respond with an error that the method cannot be found. I think it has something to do with pythonnet and that UWP is not directly supported by it. Handlers are left with their counters incremented, and these are never decremented.

These links are related:

I want to do something like this, but it is apparently impossible with the UWP/WinRT solutions.

This might be solved by implementing #180.

@hbldh
Owner

hbldh commented Aug 19, 2020

@LeoBortoni I have run your code and tried to modify the pythonnet code in different ways to alleviate the problem, but it is still there. I have been unsuccessful in solving the issue directly.

However, I solved a different problem: I implemented a different .NET backend, using the winrt package instead. If you are willing to check out the feature/winrt branch and run your code using that backend instead, I would appreciate that! Import everything as usual, but you have to send handles to read_gatt_char/write_gatt_char/start_notify, because winrt has a bug handling uuid values right now, and I do not know if it will ever be solved... Handles can be found e.g. by running the service_explorer.py example.

In case it solves your memory problem I will merge the winrt backend into the codebase and leave it as an optional Windows backend. It seems to be a bit slower, but if it manages to do gc correctly then it has a clear edge...

Run pip install winrt and then run your code with handles.
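To avoid hard-coding handles, a lookup along the following lines might help. It is only a sketch: `find_handle` is a hypothetical helper, and it assumes that each service object exposes a `characteristics` sequence whose items carry `uuid` and `handle` attributes, which should be checked against the objects the branch actually returns.

```python
def find_handle(services, char_uuid):
    """Return the handle of the first characteristic whose UUID matches, else None."""
    wanted = char_uuid.lower()
    for service in services:
        for char in service.characteristics:
            # Compare case-insensitively; UUID strings are often printed in either case.
            if str(char.uuid).lower() == wanted:
                return char.handle
    return None
```

The returned integer would then be passed to read_gatt_char, write_gatt_char or start_notify in place of the UUID string.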

@hbldh
Owner

hbldh commented Aug 20, 2020

This is me running connect, notify and disconnect with the current develop branch for 10 minutes
image

This is me running connect, notify and disconnect with the feature/winrt branch for 10 minutes
image

There are comments about C# not collecting memory unless needed, so the graphs might not actually show unavailable memory, e.g. here. That is, the feature/winrt branch might still work, even though the memory is shown as spoken for.

In your example, try to remove everything that is not bleak (influxdb and suchlike). I cannot understand why your plot has such a spiky behaviour at the end. It does drop down at the end, so the program does release memory when reaching the end. I want to know if the crash is really due to bleak or due to other components you use in your code.

I will run longer sessions during the weekend to see if I can crash my system as well. I've got 16 GB of RAM so it might take a while...

@hbldh
Owner

hbldh commented Oct 20, 2020

There are multiple improvements in version 0.9.0 regarding the Windows backend. It might be that this is solved there, if you want to try it out.

@dlech dlech mentioned this issue Aug 5, 2021
@dlech
Collaborator

dlech commented Jul 25, 2022

We dropped the pythonnet backend some time ago. If this issue also exists in the winrt backend, let's start a new issue.

@dlech dlech closed this as completed Jul 25, 2022