Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HWP Emulation and PID agent lockless restructure #606

Merged
merged 23 commits into from
Feb 6, 2024
Merged

Conversation

jlashner
Copy link
Collaborator

@jlashner jlashner commented Jan 4, 2024

This PR begins to implement an emulation framework for the HWP, to be used for testing HWP agent operation. Also restructures the HWP_PID to use the lockless format.

Description

During the dev call, and while testing out HWP supervisor status at the site, there are a number of things that we still need to update in many of the HWP agents before we can reliably use them to observe. Due to the limited access to HWP hardware, a relatively realistic HWP emulator will be extremely helpful for developing changes, and testing them before testing on real hardware. This PR begins to implement this emulation, and I have tested it running against PID and PMX agents on my local development setup.

The data being returned from the emulator isn't perfect (especially since the PID interface requires some more reverse engineering), but it is good enough such that agent tasks are able to run successfully.

In testing, I was finding many instances in which the PID agent could not successfully run operations due to the locking structure, so I refactored that agent to have the same structure as the new timeout-lockless agents, and tested with the emulator.

This PR contains the following changes:

  • Start of the HWP emulation framework
  • PID lockless restructure
  • some logging modifications to the DeviceEmulator class to make it more usable by general emulators like this

How Has This Been Tested?

Tested by running with the HWP PMX and PID agents locally.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Copy link
Contributor

@ykyohei ykyohei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lockless restructure of hwp-pid agent looks good to me.
I really want to make HWP control more robust, so I really appreciate this work.

Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for PRing this, and sorry it took me a bit to get to. Is the intention here to eventually run integration tests for the simulator with an instance of this stood up? Do you have any examples of these tests yet?

The biggest thing to me is the logging issues pointed out in the comment on device_emulator.py. Other comments are relatively minor.

simulators/hwp/hwp_emulator.py Outdated Show resolved Hide resolved
simulators/hwp/hwp_emulator.py Outdated Show resolved Hide resolved
simulators/hwp/hwp_emulator.py Outdated Show resolved Hide resolved
simulators/hwp/hwp_emulator.py Outdated Show resolved Hide resolved
simulators/hwp/hwp_emulator.py Outdated Show resolved Hide resolved
simulators/hwp/hwp_emulator.py Outdated Show resolved Hide resolved
socs/agents/hwp_pid/agent.py Show resolved Hide resolved
socs/agents/hwp_pid/agent.py Outdated Show resolved Hide resolved
socs/agents/hwp_pid/agent.py Show resolved Hide resolved
Comment on lines +302 to +326
if args.mode is not None:
agent.log.warn("--mode agrument is deprecated.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is true, we should remove it from the example config block in the docs.

return True, 'Reversing Direction'
action = Actions.TuneStop(**params)
self.action_queue.put(action)
session.data = yield action.deferred
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to keep session.data as a dict. Maybe standardize on session.data['output'] = ... or something like that?

Also -- what happens here if there's an errback? Do you need to catch that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an errback, the yield will just throw the error. I find this to be the easiest way to find and debug errors in ocs operations anyway, so I don't see the need to catch and handle manually.

@jlashner
Copy link
Collaborator Author

Hi Brian, thanks for the comments, I think I addressed or responded to all of them.

My goal with this is not necessarily to use this in tests, though it may be possible. Mainly the purpose of this is to allow myself and other HWP developers to run HWP agents for the PID, PMX, supervisor, etc. on their local systems, which is very helpful for developing new features and testing behavior.

Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recent changes look good to me. There's just the failing tests and docs builds now.

Merging in the most recent main should fix the docs builds. The tests will need fixing before we can merge though.

@jlashner
Copy link
Collaborator Author

Hi Brian, so I wasn't originally intending to use the emulator for tests, but when trying to fix the tests for the PID agent restructure, it ended up being much easier to use the emulator, since the PID responses are difficult to write out manually.

With the emulator, the tests work well though so I moved it to the socs.testing sub-package. Let me know if this is still ok with you.

Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent quite a bit of time today trying to track this down and still don't really know what's going on.

I can't reproduce the failing test locally -- I do, however, produce other failed tests that seem to stem from some race condition with the task completing and when the response is checked.

An example of two back to back local runs:

=================================================================== short test summary info ===================================================================
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_direction - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_pid - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_scale - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_declare_freq - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_tune_freq - assert 6 == 5
===================================================== 5 failed, 3 passed, 1 warning in 180.37s (0:03:00) ======================================================
=================================================================== short test summary info ===================================================================
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_set_scale - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_declare_freq - assert 2 == 5
FAILED integration/test_hwp_pid_agent_integration.py::test_hwp_rotation_tune_freq - assert 2 == 5
===================================================== 3 failed, 5 passed, 1 warning in 192.34s (0:03:12) ======================================================

While we're not seeing the error on the CI runner, it concerns me these crop up locally. I suspect it's something to do with the new structure. Are you able to reproduce? (As a related aside, task.wait() works with this new structure, yes?)

I'll note I'm not testing with the newest ocs main branch with the updated OpCodes. And these examples were tested on Python 3.8 (like the CI runner uses.)

You probably saw that I re-ran the tests in this PR, which failed. I isolated just the PMX and PID tests in a separate run, and that passed no problem.

I've only really seen the failure here when running all tests. I'm also seeing the occasional failure of a test in common/test_pmx.py.

That said, recent runs of the CI workflow on other branches (like this merge to main from the other day) have run fine, so I'm not convinced this is just a flakey test that's suddenly cropping up, but instead something related to changes here.

socs/testing/hwp_emulator.py Outdated Show resolved Hide resolved
socs/testing/hwp_emulator.py Show resolved Hide resolved
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you got the tests working! I have two small docstring requests that I missed in the initial review, and a question. Otherwise I think this is good to go.

socs/agents/hwp_pid/agent.py Show resolved Hide resolved
socs/agents/hwp_pid/agent.py Show resolved Hide resolved
socs/agents/hwp_pid/agent.py Show resolved Hide resolved
@BrianJKoopman BrianJKoopman mentioned this pull request Feb 5, 2024
6 tasks
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! Will squash and merge.

@BrianJKoopman BrianJKoopman merged commit 54289e9 into main Feb 6, 2024
7 checks passed
@BrianJKoopman BrianJKoopman deleted the hwp-emulation branch February 6, 2024 20:42
hnakata-JP pushed a commit that referenced this pull request Apr 12, 2024
* HWP Emulation and PID restructure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* More complete PMX emulation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changes based on Brian's review

* fix session.data + threading in PID agent

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes from further testing of PID agent

* remode PID start mode from docs

* Fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix mutable defaults

* Adds traceback to device-emulator exception

* adds time to device emulator logs

* add debug print to when readline failes

* change order of tests

* test

* change to update_responses

* test update_responses

* update_responses docstring

* more testing race-conditions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adds session data info to docstring

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
BrianJKoopman pushed a commit that referenced this pull request May 8, 2024
* HWP Emulation and PID restructure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* More complete PMX emulation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changes based on Brian's review

* fix session.data + threading in PID agent

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes from further testing of PID agent

* remode PID start mode from docs

* Fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix mutable defaults

* Adds traceback to device-emulator exception

* adds time to device emulator logs

* add debug print to when readline failes

* change order of tests

* test

* change to update_responses

* test update_responses

* update_responses docstring

* more testing race-conditions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adds session data info to docstring

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants