Improvements for async device management and exclusive access to device #376

frobijn · 2024-09-17T15:30:14Z

Description

One feature of the proposed test platform v3 should be in this library:

System-wide exclusive access to a device via system-wide mutex, required to avoid competition between e.g.,Device Explorer auto-discovery, test platform etc. If this is used, it is possible for the nanoFramework tools to wait for a device to become available instead of throwing an exception. If an exception is thrown, that is an indication of a real problem, not a competition with other nanoFramework tools. Not sure if that also means the WaitAndRetry in the PortSerialManager should be removed.
Use of System-wide exclusive access in the PortSerialManager+DeviceWatcher; async part of that operation moved from PortSerialManager to DeviceWatcher. Also updated PortTcpIpManager+DeviceWatcher.

The other changes have been triggered by the use of the debugger library in test platform v3:

Local locks for global NanoFrameworkDevices collection because of discovery of devices is asynchronous (and because of the use of multiple async PortManagers in the test platform v3):
- Make sure all list operations use a lock of the collection
- PortTcpIpManager._networkDevices made static as it acts as a complement to the NanoFrameworkDevices collection - if there are multiple, the device list management does not work properly for a non-static _networkDevices.
Serial ports: DeviceWatcher now has a static method to find all serial ports, incl exclusion list, so that all nanoFramework tools can use the same method to find the ports. Also to be used in the test platform v3.
Port*Manager returns the added or already registered device for PostBase.AddDevice. Makes use in test platform code easier.
Fixed OnDeviceEnumerationCompleted logic if both AddDevice and a device watcher are used. The OnDeviceEnumerationCompleted is now (indirectly) fired by the DeviceWatcher.
Also fixed at least one statement that would result in an exception if NanoNetworkDevices have been discovered before NanoSerialDevices.
.editorconfig added

Motivation and Context

These changes are a result of the work for test platform v3: improvements and one extra feature that should be in this library rather than in test platform v3. But the changes are probably also relevant for other applications, so they are presented in a separate PR.

In the proposed test platform v3 I've introduced a mechanism that nanoFramework tools can use to get exclusive access to a device. The tool waits for the device to become available (optionally with timeout) rather than throwing an exception. If an exception is thrown, either a non-nanoFramework application is using the device or it is not a nanoFramework tool. This mechanism is needed in the test platform to prevent issues if the developer starts multiple test hosts. But is also prevents interference from the Device Explorer in Visual Studio, if the VS extension would use the same mechanism.

That mechanism should not be part of the test platform, but should be in this library. I'm prepared to ensure that the mechanism is working correctly in nanoff (and of course in the test platform v2 and v3). It should work out of the box for the VS Extension once the submodule reference to the nf-debugger repository has been updated, and would prevent device detection issues if two Visual Studio instances are open and try to discover devices at the same time . That requires this library and NuGet package to be updated first.

The other changes are related to the (ease of) use of the library in the test platform. The library seems to be created for device watcher-type applications, and some code for modifications of the global list of devices are not properly protected against simultaneous updates. Tried to fix that. Also: an application like test platform may create a debugger for a specific device that has not previously been discovered via a device watcher, but the application has to get the result from the global list anyway instead of having the device as a return value of the call.

I've noticed that there is some specialized code to prevent a connection to certain serial devices. Made that code available via a static method, so an application like test platform v3 can use exactly the same code to know which serial ports are present.

How Has This Been Tested?

Using the Serial Test App, at least for the serial devices. Don't know how to connect a device via TCP/IP (is that supposed to work?). The global exclusive access was tested by running one Serial Test App in the debugger and one outside Visual Studio.

Types of changes

Improvement (non-breaking change that improves a feature, code or algorithm)
Bug fix (non-breaking change which fixes an issue with code or algorithm)
New feature (non-breaking change which adds functionality to code)
Breaking change (fix or feature that would cause existing functionality to change)
Config and build (change in the configuration and build system, has no impact on code or features)
Dependencies (update dependencies and changes associated, has no impact on code or features)
Unit Tests (add new Unit Test(s) or improved existing one(s), has no impact on code or features)
Documentation (changes or updates in the documentation, has no impact on code or features)

Checklist:

My code follows the code style of this project (only if there are changes in source code).
My changes require an update to the documentation (there are changes that require the docs website to be updated).
I have updated the documentation accordingly (the changes require an update on the docs in this repo).
I have read the CONTRIBUTING document.
I have tested everything locally and all new and existing tests passed (only if there are changes in source code).
I have added new tests to cover my changes.

…covery of devices is asynchronous (and because of the use of multiple async PortManagers in the test platform v3): - Make sure all list operations use a lock of the collection - PortTcpIpManager._networkDevices made static as it acts as a complement to the NanoFrameworkDevices collection - if there are multiple, the device list management does not work properly for a non-static _networkDevices. System-wide exclusive access to a device via system-wide mutex, required to avoid competition between e.g.,Device Explorer auto-discovery, test platform etc. If this is used, it is possible for the nanoFramework tools to wait for a device to become available instead of throwing an exception. If an exception is thrown, that is an indication of a real problem, not a competition with other nanoFramework tools. Not sure if that also means the WaitAndRetry in the PortSerialManager should be removed. Use of System-wide exclusive access in the serial DeviceWatcher; async part of that operation moved from PortSerialManager to DeviceWatcher. Also updated PortTcpIpManager+DeviceWatcher. Serial ports: DeviceWatcher now has a static method to find all serial ports, incl exclusion list, so that all nanoFramework devices use the same method. Also to be used in the test platform v3. Port**Manager returns the added or already registered device for PostBase.AddDevice. Makes use in test platform code easier. Fixed OnDeviceEnumerationCompleted logic if both AddDevice and a device watcher are used. The OnDeviceEnumerationCompleted is now (indirectly) fired by the DeviceWatcher. Also fixed at least one statement that would result in an exception if NanoNetworkDevices have been discovered before NanoSerialDevices. Signed-off-by: Frank Robijn <robijn@good-heavens.nl>

coderabbitai · 2024-09-17T15:30:21Z

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (15)

.editorconfig is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Net/PortDefinitions/PortBase.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/DeviceConfiguration/NanoFrameworkDevices.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/NFDevice/GlobalExclusiveDeviceAccess.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortComposite/PortCompositeDeviceManager.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortDefinitions/IPort.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortDefinitions/PortBase.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortSerial/DeviceWatcher.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortSerial/PortSerialManager.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/DeviceWatcher.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/PortTcpIp.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/PortTcpIpManager.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/WireProtocol/Engine.cs is excluded by none and included by none
nanoFramework.Tools.DebugLibrary.Shared/nanoFramework.Tools.DebugLibrary.Net.projitems is excluded by none and included by none
spelling_exclusion.dic is excluded by none and included by none

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

…ecause it was not read-only. SonarCloud may well be right that the way NanoFrameworkDevices is used in this library is questionable, but changing is more work. As it is now (readonly object instance), NanoFrameworkDevices is a valid lock-target as the instance never changes. Also added an exception handling in GlobalExclusiveDeviceAccess in case of killed processes - change was not committed previously.

- Before the changes in this PR, the PortSerialManager.OnDeviceEnumerationCompleted event was postponed if a new device was detected before the addition of previously detected devices was completed. In the modified version that was no longer the case. Behaviour is restored in this version. - The DeviceWatcher.GetPortNames was changed to a static method, as it apparently is not used by any code in nanoFramework. As it is a public library, this may break the code of someone else. Original method reinstated and new static method made internal. The public version of the method is made part of the PortSerialManager, as external code is most likely to interact with the PortSerialManager than with DeviceWatcher. Addition: - It is allowed to change the PortExclusionList after the DeviceWatcher has been started. Made sure to add locks in the right places.

josesimoes

Frank I can see where you're going and I see value in this global lock approach.
Few comments to address.

Also, there seems to be something wrong the rescan devices call execution. Not only it takes a serious ammount of time, but it never flags the completion to the caller...
I have these COM ports on my machine (2 of the COM ports are nanoDevices)

Last comment: the algorithm is trying to reach the serial Bluetooth devices (when it shouldn't).

nanoFramework.Tools.DebugLibrary.Shared/PortComposite/PortCompositeDeviceManager.cs

nanoFramework.Tools.DebugLibrary.Shared/PortSerial/DeviceWatcher.cs

nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/PortTcpIpManager.cs

frobijn · 2024-09-24T18:20:55Z

Also, there seems to be something wrong the rescan devices call execution. Not only it takes a serious amount of time,

O yes, a lot of time. If you reserve a COM port for a virtual device and don't have that device running, it also takes ages before the enumeration completes. Ive checked in the debugger that this has nothing to do with the locking mechanism. I suspect it is the WaitAndRetry in PortSerialManager.OnDeviceAdded. The total wait time alone is about 10 seconds for each port that is not connected to a nanoDevice, plus ten times the amount of time the underlying code needs to determine there is no nanoDevice connected to the port.

I've several of those reserved-for-virtual-device-ports, and I've noticed that although the PortSerialManager.OnDeviceAdded method is called at the same time for all ports (as the method is called async and in parallel for all ports), the method never completes at the same time. It completes for one port at a time. The time between successive completions of the method in case of ports that are not connected to a device is a few seconds. That is very similar to the last wait period of the WaitAndRetry (= 2500ms). I suspect that somewhere in the code that connects to the device there is a lock and the connections to the various devices are not tried in parallel, but are forced to be sequential.

If this is correct, then in your case, with 4 non-nanoDevice ports, the total time before completion is about 10 + (6-2-1)*2.5 + (a bit for the 2 connected devices)= 20 seconds.

I don't know why the WaitAndRetry is needed, for nanodevices only one try (no retry) seems to be sufficient. I also didn't investigate the long return times. Instead I aim to have a configuration where you can exclude all special ports (in addition to the exclusion filter in the library).

but it never flags the completion to the caller...

I cannot reproduce that. The only way that can happen is if somewhere deep in the library there is code that tries to connect to a device via a port that never returns. But then that should also happen in the current code (if you would remove the "rogue port" filter).

I have these COM ports on my machine (2 of the COM ports are nanoDevices)
Last comment: the algorithm is trying to reach the serial Bluetooth devices (when it shouldn't).

I think that's solved in the next commit. Please check again (I don't have BT COM ports, can't test that.)

frobijn · 2024-09-25T05:28:51Z

I've investigated the timing issue a bit further. No devices connected, just 6 COM ports that are unresponsive. The code that makes all code execute sequentially is the first operation that uses the wire protocol to query the nanoDevice. That fails of course, but only after some timeout.

What causes all calls to be sequential is that communication via the wire protocol is protected via a static lock: Engine._syncReqLock. So the device watcher launches parallel threads for the six ports, and then they wait in turn to be able to send a message, each to a different port. There is no need for such a system-wide lock, nothing in the DebugEngine is static.

This is the only lock: if the lock is static, scanning the six ports takes 26 seconds. If the lock is not static but per Engine instance, scanning the ports takes about 10 seconds (= total wait time for the wait and retry loop).

I presume that the fact that the lock is static will also hurt the test framework if you run tests on multiple real hardware devices, as the communication between the test host and devices for deployment and to get the results is also done via the wire protocol.

It seems easy to get rid of the static lock. To make it non-static may be a bit risky: if there is code out there that creates two Engines for the same port, then that code may fail after the change. That is not the case if _syncReqLock is a dictionary with the NanoDevice.DeviceId as key and the SemaphoreSlim as value.

I'll see whether I can implement the dictionary later today; I'll add the commit to this PR.

…m-wide lock.

Ellerbach

Couple of comments, all up, logic seems good to me.

nanoFramework.Tools.DebugLibrary.Shared/NFDevice/GlobalExclusiveDeviceAccess.cs

nanoFramework.Tools.DebugLibrary.Shared/PortComposite/PortCompositeDeviceManager.cs

nanoFramework.Tools.DebugLibrary.Shared/PortSerial/PortSerialManager.cs

nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/PortTcpIpManager.cs

nanoFramework.Tools.DebugLibrary.Shared/PortComposite/PortCompositeDeviceManager.cs

nanoFramework.Tools.DebugLibrary.Shared/PortSerial/DeviceWatcher.cs

josesimoes · 2024-09-25T11:02:08Z

The rescan issue is coming from this call:

For some reason the call to Open() blocks, never returns and doesn't throw any exception whatsoever. 😯
This is happening with a faulty device I have here.

Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

josesimoes · 2024-09-25T11:07:22Z

@frobijn discovery and rescan working much smoother now! This is making very good progress. 👍🏻

frobijn · 2024-09-25T12:06:23Z

The rescan issue is coming from this call:
For some reason the call to Open() blocks, never returns and doesn't throw any exception whatsoever. 😯 This is happening with a faulty device I have here.
Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

I didn't analyse all the serial communication code, there's a lot and I'm not very familiar with that.

frobijn · 2024-09-25T12:30:22Z

@josesimoes Your minor changes caused a merge conflict that VS2022 could not handle "because of ... codepage". Automerge resulted in an empty file. I hope that I've got all your changes in the new commit.

Kept the static GetPortNames method, see above.

frobijn · 2024-09-25T12:33:54Z

Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

The result of a "hanging" device is that the "enumeration complete" event is never fired. The discovery process continues. As external software (like the Device Explorer) also listens to the "device added" event, the effect may be minimal.

josesimoes · 2024-09-25T13:00:13Z

@josesimoes Your minor changes caused a merge conflict that VS2022 could not handle "because of ... codepage". Automerge resulted in an empty file. I hope that I've got all your changes in the new commit.

Kept the static GetPortNames method, see above.

Apologies for getting in the way...

josesimoes · 2024-09-25T13:02:13Z

Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

The result of a "hanging" device is that the "enumeration complete" event is never fired. The discovery process continues. As external software (like the Device Explorer) also listens to the "device added" event, the effect may be minimal.

Not exactly. There are other callers (like VS device managed) that disable the rescan button when a scan operation is running. If that never flags is as being completed. It will never enable back the button. That happens every now and then. Only remedy is to close and restart VS to release the resources. Let's not worry about this for now.

frobijn · 2024-09-26T12:19:27Z

Ok. Then I think that I have addressed all issues/suggestions raised so far.

josesimoes · 2024-09-27T02:46:36Z

@frobijn on last improvement that ensures the devices are really closed (therefore made available) when removed from the device list. This resolves a situation on device rescan when a device was connected it wasn't possible to resume communication with it, until restarting the application.

Please take a look. If your OK, lets have this merged!

josesimoes

All good!
Great job improving this. 💯 👏🏻

frobijn · 2024-09-27T08:35:12Z

Please take a look. If your OK, lets have this merged!

No comments, looks good. Let's merge!

nfbot · 2024-09-27T13:46:32Z

@frobijn thank you again for your contribution! 🙏😄

.NET nanoFramework is all about community involvement, and no contribution is too small.
We would like to invite you to join the project's Contributors list.

Please edit it and add an entry with your GitHub username in the appropriate location (names are sorted alphabetically):

  <tr>
    <td><img src="https://github.com/frobijn.png?size=50" height="50" width="50" ></td>
    <td><a href="https://github.com/frobijn">Frank Robijn</a></td>
  </tr>

(Feel free to adjust your name if it's not correct)

…ce (#376)

Frank Robijn added 2 commits September 17, 2024 10:17

.editorconfig

2432653

nfbot changed the title ~~Improvements for async device list managements + exclusive access to devices.~~ Improvements for async device list managements + exclusive access to devices Sep 17, 2024

nfbot added Type: Bug Type: enhancement labels Sep 17, 2024

Frank Robijn added 2 commits September 17, 2024 22:24

Might have misread the locking issue. Fixed this one.

7e7e683

josesimoes self-requested a review September 19, 2024 01:03

josesimoes added the Status: Under review label Sep 19, 2024

josesimoes changed the title ~~Improvements for async device list managements + exclusive access to devices~~ Improvements for async device management and exclusive access to device Sep 19, 2024

josesimoes reviewed Sep 24, 2024

View reviewed changes

Changes as a result of review comments

500f6bb

Use per-port lock for wire protocol communication rather than a syste…

3faf575

…m-wide lock.

Ellerbach reviewed Sep 25, 2024

View reviewed changes

josesimoes reviewed Sep 25, 2024

View reviewed changes

nanoFramework.Tools.DebugLibrary.Shared/PortComposite/PortCompositeDeviceManager.cs Outdated Show resolved Hide resolved

josesimoes reviewed Sep 25, 2024

View reviewed changes

nanoFramework.Tools.DebugLibrary.Shared/PortSerial/DeviceWatcher.cs Outdated Show resolved Hide resolved

Minor fixes and improvements in comments

5bf0262

Frank Robijn added 3 commits September 25, 2024 14:07

Next round of comments

839145b

Merged changes (auto-merge failed because of a wrong codepage???)

bc708a7

Hope everything is merged

75560bb

GetPortNames made static

1df5b19

josesimoes removed the Status: Under review label Sep 26, 2024

Port in now closed on device removal

dc2fe16

josesimoes approved these changes Sep 27, 2024

View reviewed changes

josesimoes added the CI: Update Dependents label Sep 27, 2024

josesimoes merged commit c5b0e01 into nanoframework:main Sep 27, 2024

frobijn deleted the GlobalLock_Parallel branch September 27, 2024 15:05

frobijn mentioned this pull request Sep 30, 2024

Add new overloads for GlobalExclusiveDeviceAccess.CommunicateWithDevice #377

Merged

14 tasks

josesimoes pushed a commit that referenced this pull request Nov 8, 2024

Improvements for async device management and exclusive access to devi…

6d8904a

…ce (#376)

Uh oh!

Improvements for async device management and exclusive access to device #376

Improvements for async device management and exclusive access to device #376

Uh oh!

Conversation

frobijn commented Sep 17, 2024

Description

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

Uh oh!

coderabbitai bot commented Sep 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

josesimoes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

frobijn commented Sep 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

frobijn commented Sep 25, 2024

Uh oh!

Ellerbach left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

josesimoes commented Sep 25, 2024

Uh oh!

josesimoes commented Sep 25, 2024

Uh oh!

frobijn commented Sep 25, 2024

Uh oh!

frobijn commented Sep 25, 2024

Uh oh!

frobijn commented Sep 25, 2024

Uh oh!

josesimoes commented Sep 25, 2024

Uh oh!

josesimoes commented Sep 25, 2024

Uh oh!

frobijn commented Sep 26, 2024

Uh oh!

josesimoes commented Sep 27, 2024

Uh oh!

josesimoes left a comment

Choose a reason for hiding this comment

Uh oh!

frobijn commented Sep 27, 2024

Uh oh!

nfbot commented Sep 27, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

coderabbitai bot commented Sep 17, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

frobijn commented Sep 24, 2024 •

edited

Loading