-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HWP Emulation and PID agent lockless restructure #606
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lockless restructure of hwp-pid agent looks good to me.
I really want to make HWP control more robust, so I really appreciate this work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for PRing this, and sorry it took me a bit to get to. Is the intention here to eventually run integration tests for the simulator with an instance of this stood up? Do you have any examples of these tests yet?
The biggest thing to me is the logging issues pointed out in the comment on device_emulator.py
. Other comments are relatively minor.
if args.mode is not None: | ||
agent.log.warn("--mode agrument is deprecated.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is true, we should remove it from the example config block in the docs.
socs/agents/hwp_pid/agent.py
Outdated
return True, 'Reversing Direction' | ||
action = Actions.TuneStop(**params) | ||
self.action_queue.put(action) | ||
session.data = yield action.deferred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to keep session.data as a dict. Maybe standardize on session.data['output'] = ...
or something like that?
Also -- what happens here if there's an errback? Do you need to catch that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's an errback, the yield
will just throw the error. I find this to be the easiest way to find and debug errors in ocs operations anyway, so I don't see the need to catch and handle manually.
0d47641
to
6f05671
Compare
Hi Brian, thanks for the comments, I think I addressed or responded to all of them. My goal with this is not necessarily to use this in tests, though it may be possible. Mainly the purpose of this is to allow myself and other HWP developers to run HWP agents for the PID, PMX, supervisor, etc. on their local systems, which is very helpful for developing new features and testing behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recent changes look good to me. There's just the failing tests and docs builds now.
Merging in the most recent main
should fix the docs builds. The tests will need fixing before we can merge though.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
20a8582
to
4bd1a64
Compare
for more information, see https://pre-commit.ci
Hi Brian, so I wasn't originally intending to use the emulator for tests, but when trying to fix the tests for the PID agent restructure, it ended up being much easier to use the emulator, since the PID responses are difficult to write out manually. With the emulator, the tests work well though so I moved it to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent quite a bit of time today trying to track this down and still don't really know what's going on.
I can't reproduce the failing test locally -- I do, however, produce other failed tests that seem to stem from some race condition with the task completing and when the response is checked.
An example of two back to back local runs:
=================================================================== short test summary info ===================================================================
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_direction - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_pid - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_scale - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_declare_freq - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_tune_freq - assert 6 == 5
===================================================== 5 failed, 3 passed, 1 warning in 180.37s (0:03:00) ======================================================
=================================================================== short test summary info ===================================================================
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_scale - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_declare_freq - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_tune_freq - assert 2 == 5
===================================================== 3 failed, 5 passed, 1 warning in 192.34s (0:03:12) ======================================================
While we're not seeing the error on the CI runner, it concerns me these crop up locally. I suspect it's something to do with the new structure. Are you able to reproduce? (As a related aside, task.wait()
works with this new structure, yes?)
I'll note I'm not testing with the newest ocs main
branch with the updated OpCodes. And these examples were tested on Python 3.8 (like the CI runner uses.)
You probably saw that I re-ran the tests in this PR, which failed. I isolated just the PMX and PID tests in a separate run, and that passed no problem.
I've only really seen the failure here when running all tests. I'm also seeing the occasional failure of a test in common/test_pmx.py
.
That said, recent runs of the CI workflow on other branches (like this merge to main
from the other day) have run fine, so I'm not convinced this is just a flakey test that's suddenly cropping up, but instead something related to changes here.
ed21a81
to
bfcc549
Compare
60cf3a0
to
45aa0e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad you got the tests working! I have two small docstring requests that I missed in the initial review, and a question. Otherwise I think this is good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved! Will squash and merge.
* HWP Emulation and PID restructure * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More complete PMX emulation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changes based on Brian's review * fix session.data + threading in PID agent * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes from further testing of PID agent * remode PID start mode from docs * Fix tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix mutable defaults * Adds traceback to device-emulator exception * adds time to device emulator logs * add debug print to when readline failes * change order of tests * test * change to update_responses * test update_responses * update_responses docstring * more testing race-conditions * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adds session data info to docstring --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* HWP Emulation and PID restructure * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More complete PMX emulation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changes based on Brian's review * fix session.data + threading in PID agent * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes from further testing of PID agent * remode PID start mode from docs * Fix tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix mutable defaults * Adds traceback to device-emulator exception * adds time to device emulator logs * add debug print to when readline failes * change order of tests * test * change to update_responses * test update_responses * update_responses docstring * more testing race-conditions * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adds session data info to docstring --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This PR begins to implement an emulation framework for the HWP, to be used for testing HWP agent operation. Also restructures the HWP_PID to use the lockless format.
Description
During the dev call, and while testing out HWP supervisor status at the site, there are a number of things that we still need to update in many of the HWP agents before we can reliably use them to observe. Due to the limited access to HWP hardware, a relatively realistic HWP emulator will be extremely helpful for developing changes, and testing them before testing on real hardware. This PR begins to implement this emulation, and I have tested it running against PID and PMX agents on my local development setup.
The data being returned from the emulator isn't perfect (especially since the PID interface requires some more reverse engineering), but it is good enough such that agent tasks are able to run successfully.
In testing, I was finding many instances in which the PID agent could not successfully run operations due to the locking structure, so I refactored that agent to have the same structure as the new timeout-lockless agents, and tested with the emulator.
This PR contains the following changes:
How Has This Been Tested?
Tested by running with the HWP PMX and PID agents locally.
Types of changes
Checklist: