Skip to content

Conversation

@frobijn
Copy link
Contributor

@frobijn frobijn commented Sep 17, 2024

Description

One feature of the proposed test platform v3 should be in this library:

  • System-wide exclusive access to a device via system-wide mutex, required to avoid competition between e.g.,Device Explorer auto-discovery, test platform etc. If this is used, it is possible for the nanoFramework tools to wait for a device to become available instead of throwing an exception. If an exception is thrown, that is an indication of a real problem, not a competition with other nanoFramework tools. Not sure if that also means the WaitAndRetry in the PortSerialManager should be removed.

  • Use of System-wide exclusive access in the PortSerialManager+DeviceWatcher; async part of that operation moved from PortSerialManager to DeviceWatcher. Also updated PortTcpIpManager+DeviceWatcher.

The other changes have been triggered by the use of the debugger library in test platform v3:

  • Local locks for global NanoFrameworkDevices collection because of discovery of devices is asynchronous (and because of the use of multiple async PortManagers in the test platform v3):

    • Make sure all list operations use a lock of the collection
    • PortTcpIpManager._networkDevices made static as it acts as a complement to the NanoFrameworkDevices collection - if there are multiple, the device list management does not work properly for a non-static _networkDevices.
  • Serial ports: DeviceWatcher now has a static method to find all serial ports, incl exclusion list, so that all nanoFramework tools can use the same method to find the ports. Also to be used in the test platform v3.

  • Port*Manager returns the added or already registered device for PostBase.AddDevice. Makes use in test platform code easier.

  • Fixed OnDeviceEnumerationCompleted logic if both AddDevice and a device watcher are used. The OnDeviceEnumerationCompleted is now (indirectly) fired by the DeviceWatcher.

  • Also fixed at least one statement that would result in an exception if NanoNetworkDevices have been discovered before NanoSerialDevices.

  • .editorconfig added

Motivation and Context

These changes are a result of the work for test platform v3: improvements and one extra feature that should be in this library rather than in test platform v3. But the changes are probably also relevant for other applications, so they are presented in a separate PR.

In the proposed test platform v3 I've introduced a mechanism that nanoFramework tools can use to get exclusive access to a device. The tool waits for the device to become available (optionally with timeout) rather than throwing an exception. If an exception is thrown, either a non-nanoFramework application is using the device or it is not a nanoFramework tool. This mechanism is needed in the test platform to prevent issues if the developer starts multiple test hosts. But is also prevents interference from the Device Explorer in Visual Studio, if the VS extension would use the same mechanism.

That mechanism should not be part of the test platform, but should be in this library. I'm prepared to ensure that the mechanism is working correctly in nanoff (and of course in the test platform v2 and v3). It should work out of the box for the VS Extension once the submodule reference to the nf-debugger repository has been updated, and would prevent device detection issues if two Visual Studio instances are open and try to discover devices at the same time . That requires this library and NuGet package to be updated first.

The other changes are related to the (ease of) use of the library in the test platform. The library seems to be created for device watcher-type applications, and some code for modifications of the global list of devices are not properly protected against simultaneous updates. Tried to fix that. Also: an application like test platform may create a debugger for a specific device that has not previously been discovered via a device watcher, but the application has to get the result from the global list anyway instead of having the device as a return value of the call.

I've noticed that there is some specialized code to prevent a connection to certain serial devices. Made that code available via a static method, so an application like test platform v3 can use exactly the same code to know which serial ports are present.

How Has This Been Tested?

Using the Serial Test App, at least for the serial devices. Don't know how to connect a device via TCP/IP (is that supposed to work?). The global exclusive access was tested by running one Serial Test App in the debugger and one outside Visual Studio.

Types of changes

  • Improvement (non-breaking change that improves a feature, code or algorithm)
  • Bug fix (non-breaking change which fixes an issue with code or algorithm)
  • New feature (non-breaking change which adds functionality to code)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Config and build (change in the configuration and build system, has no impact on code or features)
  • Dependencies (update dependencies and changes associated, has no impact on code or features)
  • Unit Tests (add new Unit Test(s) or improved existing one(s), has no impact on code or features)
  • Documentation (changes or updates in the documentation, has no impact on code or features)

Checklist:

  • My code follows the code style of this project (only if there are changes in source code).
  • My changes require an update to the documentation (there are changes that require the docs website to be updated).
  • I have updated the documentation accordingly (the changes require an update on the docs in this repo).
  • I have read the CONTRIBUTING document.
  • I have tested everything locally and all new and existing tests passed (only if there are changes in source code).
  • I have added new tests to cover my changes.

Frank Robijn added 2 commits September 17, 2024 10:17
…covery of devices is asynchronous (and because of the use of multiple async PortManagers in the test platform v3):

- Make sure all list operations use a lock of the collection
- PortTcpIpManager._networkDevices made static as it acts as a complement to the NanoFrameworkDevices collection - if there are multiple, the device list management does not work properly for a non-static _networkDevices.

System-wide exclusive access to a device via system-wide mutex, required to avoid competition between e.g.,Device Explorer auto-discovery, test platform etc. If this is used, it is possible for the nanoFramework tools to wait for a device to become available instead of throwing an exception. If an exception is thrown, that is an indication of a real problem, not a competition with other nanoFramework tools. Not sure if that also means the WaitAndRetry in the PortSerialManager should be removed.

Use of System-wide exclusive access in the serial DeviceWatcher; async part of that operation moved from PortSerialManager to DeviceWatcher. Also updated PortTcpIpManager+DeviceWatcher.

Serial ports: DeviceWatcher now has a static method to find all serial ports, incl exclusion list, so that all nanoFramework devices use the same method. Also to be used in the test platform v3.

Port**Manager returns the added or already registered device for PostBase.AddDevice. Makes use in test platform code easier.

Fixed OnDeviceEnumerationCompleted logic if both AddDevice and a device watcher are used. The OnDeviceEnumerationCompleted is now (indirectly) fired by the DeviceWatcher.

Also fixed at least one statement that would result in an exception if NanoNetworkDevices have been discovered before NanoSerialDevices.

Signed-off-by: Frank Robijn <robijn@good-heavens.nl>
@nfbot nfbot changed the title Improvements for async device list managements + exclusive access to devices. Improvements for async device list managements + exclusive access to devices Sep 17, 2024
@coderabbitai
Copy link

coderabbitai bot commented Sep 17, 2024

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (15)
  • .editorconfig is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Net/PortDefinitions/PortBase.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/DeviceConfiguration/NanoFrameworkDevices.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/NFDevice/GlobalExclusiveDeviceAccess.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortComposite/PortCompositeDeviceManager.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortDefinitions/IPort.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortDefinitions/PortBase.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortSerial/DeviceWatcher.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortSerial/PortSerialManager.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/DeviceWatcher.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/PortTcpIp.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/PortTcpIp/PortTcpIpManager.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/WireProtocol/Engine.cs is excluded by none and included by none
  • nanoFramework.Tools.DebugLibrary.Shared/nanoFramework.Tools.DebugLibrary.Net.projitems is excluded by none and included by none
  • spelling_exclusion.dic is excluded by none and included by none

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Frank Robijn added 2 commits September 17, 2024 22:24
…ecause it was not read-only. SonarCloud may well be right that the way NanoFrameworkDevices is used in this library is questionable, but changing is more work. As it is now (readonly object instance), NanoFrameworkDevices is a valid lock-target as the instance never changes.

Also added an exception handling in GlobalExclusiveDeviceAccess in case of killed processes - change was not committed previously.
@josesimoes josesimoes self-requested a review September 19, 2024 01:03
@josesimoes josesimoes changed the title Improvements for async device list managements + exclusive access to devices Improvements for async device management and exclusive access to device Sep 19, 2024
- Before the changes in this PR, the PortSerialManager.OnDeviceEnumerationCompleted event was postponed if a new device was detected before the addition of previously detected devices was completed. In the modified version that was no longer the case. Behaviour is restored in this version.

- The DeviceWatcher.GetPortNames was changed to a static method, as it apparently is not used by any code in nanoFramework. As it is a public library, this may break the code of someone else. Original method reinstated and new static method made internal. The public version of the method is made part of the PortSerialManager, as external code is most likely to interact with the PortSerialManager than with DeviceWatcher.

Addition:

- It is allowed to change the PortExclusionList after the DeviceWatcher has been started. Made sure to add locks in the right places.
Copy link
Member

@josesimoes josesimoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frank I can see where you're going and I see value in this global lock approach.
Few comments to address.

Also, there seems to be something wrong the rescan devices call execution. Not only it takes a serious ammount of time, but it never flags the completion to the caller...
I have these COM ports on my machine (2 of the COM ports are nanoDevices)

image

Last comment: the algorithm is trying to reach the serial Bluetooth devices (when it shouldn't).

@frobijn
Copy link
Contributor Author

frobijn commented Sep 24, 2024

Also, there seems to be something wrong the rescan devices call execution. Not only it takes a serious amount of time,

O yes, a lot of time. If you reserve a COM port for a virtual device and don't have that device running, it also takes ages before the enumeration completes. Ive checked in the debugger that this has nothing to do with the locking mechanism. I suspect it is the WaitAndRetry in PortSerialManager.OnDeviceAdded. The total wait time alone is about 10 seconds for each port that is not connected to a nanoDevice, plus ten times the amount of time the underlying code needs to determine there is no nanoDevice connected to the port.

I've several of those reserved-for-virtual-device-ports, and I've noticed that although the PortSerialManager.OnDeviceAdded method is called at the same time for all ports (as the method is called async and in parallel for all ports), the method never completes at the same time. It completes for one port at a time. The time between successive completions of the method in case of ports that are not connected to a device is a few seconds. That is very similar to the last wait period of the WaitAndRetry (= 2500ms). I suspect that somewhere in the code that connects to the device there is a lock and the connections to the various devices are not tried in parallel, but are forced to be sequential.

If this is correct, then in your case, with 4 non-nanoDevice ports, the total time before completion is about 10 + (6-2-1)*2.5 + (a bit for the 2 connected devices)= 20 seconds.

I don't know why the WaitAndRetry is needed, for nanodevices only one try (no retry) seems to be sufficient. I also didn't investigate the long return times. Instead I aim to have a configuration where you can exclude all special ports (in addition to the exclusion filter in the library).

but it never flags the completion to the caller...

I cannot reproduce that. The only way that can happen is if somewhere deep in the library there is code that tries to connect to a device via a port that never returns. But then that should also happen in the current code (if you would remove the "rogue port" filter).

I have these COM ports on my machine (2 of the COM ports are nanoDevices)
Last comment: the algorithm is trying to reach the serial Bluetooth devices (when it shouldn't).

I think that's solved in the next commit. Please check again (I don't have BT COM ports, can't test that.)

@frobijn
Copy link
Contributor Author

frobijn commented Sep 25, 2024

I've investigated the timing issue a bit further. No devices connected, just 6 COM ports that are unresponsive. The code that makes all code execute sequentially is the first operation that uses the wire protocol to query the nanoDevice. That fails of course, but only after some timeout.

What causes all calls to be sequential is that communication via the wire protocol is protected via a static lock: Engine._syncReqLock. So the device watcher launches parallel threads for the six ports, and then they wait in turn to be able to send a message, each to a different port. There is no need for such a system-wide lock, nothing in the DebugEngine is static.

This is the only lock: if the lock is static, scanning the six ports takes 26 seconds. If the lock is not static but per Engine instance, scanning the ports takes about 10 seconds (= total wait time for the wait and retry loop).

I presume that the fact that the lock is static will also hurt the test framework if you run tests on multiple real hardware devices, as the communication between the test host and devices for deployment and to get the results is also done via the wire protocol.

It seems easy to get rid of the static lock. To make it non-static may be a bit risky: if there is code out there that creates two Engines for the same port, then that code may fail after the change. That is not the case if _syncReqLock is a dictionary with the NanoDevice.DeviceId as key and the SemaphoreSlim as value.

I'll see whether I can implement the dictionary later today; I'll add the commit to this PR.

Copy link
Member

@Ellerbach Ellerbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments, all up, logic seems good to me.

@josesimoes
Copy link
Member

The rescan issue is coming from this call:

image

For some reason the call to Open() blocks, never returns and doesn't throw any exception whatsoever. 😯
This is happening with a faulty device I have here.

Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

@josesimoes
Copy link
Member

@frobijn discovery and rescan working much smoother now! This is making very good progress. 👍🏻

@frobijn
Copy link
Contributor Author

frobijn commented Sep 25, 2024

The rescan issue is coming from this call:
For some reason the call to Open() blocks, never returns and doesn't throw any exception whatsoever. 😯 This is happening with a faulty device I have here.
Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

I didn't analyse all the serial communication code, there's a lot and I'm not very familiar with that.

@frobijn
Copy link
Contributor Author

frobijn commented Sep 25, 2024

@josesimoes Your minor changes caused a merge conflict that VS2022 could not handle "because of ... codepage". Automerge resulted in an empty file. I hope that I've got all your changes in the new commit.

Kept the static GetPortNames method, see above.

@frobijn
Copy link
Contributor Author

frobijn commented Sep 25, 2024

Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

The result of a "hanging" device is that the "enumeration complete" event is never fired. The discovery process continues. As external software (like the Device Explorer) also listens to the "device added" event, the effect may be minimal.

@josesimoes
Copy link
Member

@josesimoes Your minor changes caused a merge conflict that VS2022 could not handle "because of ... codepage". Automerge resulted in an empty file. I hope that I've got all your changes in the new commit.

Kept the static GetPortNames method, see above.

Apologies for getting in the way...

@josesimoes
Copy link
Member

Maybe we should add a timeout on the serial port workflow so a faulty device can't block/hinder the discovery process.

The result of a "hanging" device is that the "enumeration complete" event is never fired. The discovery process continues. As external software (like the Device Explorer) also listens to the "device added" event, the effect may be minimal.

Not exactly. There are other callers (like VS device managed) that disable the rescan button when a scan operation is running. If that never flags is as being completed. It will never enable back the button. That happens every now and then. Only remedy is to close and restart VS to release the resources. Let's not worry about this for now.

@frobijn
Copy link
Contributor Author

frobijn commented Sep 26, 2024

Ok. Then I think that I have addressed all issues/suggestions raised so far.

@josesimoes
Copy link
Member

@frobijn on last improvement that ensures the devices are really closed (therefore made available) when removed from the device list. This resolves a situation on device rescan when a device was connected it wasn't possible to resume communication with it, until restarting the application.

Please take a look. If your OK, lets have this merged!

Copy link
Member

@josesimoes josesimoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good!
Great job improving this. 💯 👏🏻

@frobijn
Copy link
Contributor Author

frobijn commented Sep 27, 2024

Please take a look. If your OK, lets have this merged!

No comments, looks good. Let's merge!

@josesimoes josesimoes merged commit c5b0e01 into nanoframework:main Sep 27, 2024
@nfbot
Copy link
Member

nfbot commented Sep 27, 2024

@frobijn thank you again for your contribution! 🙏😄

.NET nanoFramework is all about community involvement, and no contribution is too small.
We would like to invite you to join the project's Contributors list.

Please edit it and add an entry with your GitHub username in the appropriate location (names are sorted alphabetically):

  <tr>
    <td><img src="https://github.com/frobijn.png?size=50" height="50" width="50" ></td>
    <td><a href="https://github.com/frobijn">Frank Robijn</a></td>
  </tr>

(Feel free to adjust your name if it's not correct)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants