
TimeoutLock less HWP PMX agent #604

Merged
merged 18 commits into main from hwp_pmx_restructure
Jan 8, 2024

Conversation

ykyohei
Contributor

@ykyohei ykyohei commented Dec 27, 2023

Description

TimeoutLockless HWP PMX agent

Motivation and Context

In the operation of the PMX at the site, we see many TimeoutLock errors. To avoid these errors and make PMX control more robust, we modify the PMX agent in the same way as the HWP PCU agent (#600).
I plan to make the same modification to the HWP PID agent as well.
The way I define actions may not be ideal; suggestions for better coding are welcome.

How Has This Been Tested?

Tested with the PMX in the UTokyo setup.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Collaborator

@jlashner jlashner left a comment


Hi Kyohei,

Looks good, but there are a few things off about the task structure here that need to be fixed. Can you fix them, and then I will help to resolve any test errors that still exist? Thanks!

Comment on lines 168 to 172
if self.shutdown_mode:
    return False, "Shutdown mode is in effect"

self.dev.turn_on()
action = Actions.SetOn(**params)
self.action_queue.put(action)
Collaborator

Hi Kyohei, in the lockless agents, all tasks should be structured something like this:

@defer.inlineCallbacks
def set_on(self, session, params):
    action = Actions.SetOn(**params)
    self.action_queue.put(action)
    session.data = yield action.deferred
    return True, f"Action {action} finished successfully"

I'd move the shutdown-mode check to either the _process_actions function, or to the action's process function, in which case it should be passed in along with the module.

They also need to be registered with blocking=False, like

agent.register_task('set_on', PMX.set_on, blocking=False)
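For readers unfamiliar with the lockless pattern being described here, the idea can be sketched without Twisted at all (all names below are illustrative, not from this PR; a concurrent.futures Future stands in for the Twisted Deferred): tasks only enqueue an action carrying a future, and a single worker loop is the one place that touches the hardware and resolves each future, so no task ever needs to acquire a hardware lock.

```python
import queue
import threading
from concurrent.futures import Future


class Action:
    """An enqueued request plus a future for its eventual result."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params
        self.deferred = Future()  # stand-in for a Twisted Deferred


action_queue = queue.Queue()


def set_on():
    """Task body: enqueue the action and wait for the worker to finish it."""
    action = Action('set_on')
    action_queue.put(action)
    result = action.deferred.result(timeout=5)  # like `yield action.deferred`
    return True, f"Action {action.name} finished: {result}"


def process_actions(shutdown_mode=False):
    """Worker loop: the only code path that talks to the device."""
    while True:
        action = action_queue.get()
        if action is None:  # sentinel to stop the worker
            break
        if shutdown_mode:
            action.deferred.set_exception(RuntimeError("Shutdown mode"))
            continue
        # Device I/O would happen here, e.g. dev.turn_on()
        action.deferred.set_result({'state': 'on'})


worker = threading.Thread(target=process_actions, daemon=True)
worker.start()
ok, msg = set_on()
action_queue.put(None)  # stop the worker
```

This also shows why the shutdown-mode check belongs in the processing loop rather than in the task: the worker can fail the action's future, and the task reports that failure uniformly.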

Comment on lines 294 to 299
try:
    PMX = pmx.PMX(ip=self.ip, port=self.port)
    self.log.info('Connected to PMX Kikusui')
except BrokenPipeError:
    self.log.error('Could not establish connection to PMX Kikusui')
    reactor.callFromThread(reactor.stop)
Collaborator

If we can test the hardware and failure modes, I'd personally opt for something a bit more robust than this. The LS240 restructure branch has an example of how we can structure this such that on connection failures, it will continue to try making the connection instead of just killing the reactor thread:

module: Optional[Module] = None
session.set_status('running')

# Clear pre-existing actions
while not self.action_queue.empty():
    action = self.action_queue.get()
    action.deferred.errback(Exception("Action cancelled"))

pm = Pacemaker(self.f_sample, quantize=False)
while session.status in ['starting', 'running']:
    if module is None:
        try:
            module = self._init_lakeshore()
        except ConnectionRefusedError:
            self.log.error(
                "Could not connect to Lakeshore. "
                "Retrying after 30 sec..."
            )
            time.sleep(30)
            pm.sleep()
            continue
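The gist of that retry pattern, stripped of the Twisted and Lakeshore specifics (hypothetical names, a fake device, and a shortened retry interval for illustration), is a loop that keeps attempting the connection instead of tearing down the whole process on the first failure:

```python
import time


class FlakyDevice:
    """Fake device whose connect() fails a few times before succeeding."""

    def __init__(self, failures=2):
        self.failures = failures

    def connect(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionRefusedError("device not ready")
        return "connected"


def acquire(device, retry_s=0.01, max_tries=10):
    """Retry until connected rather than aborting the whole process."""
    module = None
    tries = 0
    while module is None and tries < max_tries:
        try:
            module = device.connect()
        except ConnectionRefusedError:
            tries += 1
            # In the agent this would be self.log.error(...) and a 30 s sleep
            time.sleep(retry_s)
    return module


result = acquire(FlakyDevice(failures=2))
```

The bounded `max_tries` here is only so the sketch terminates; the agent version retries indefinitely while the process is running.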

@jlashner
Collaborator

jlashner commented Jan 4, 2024

I'm looking at the test failures here, and it seems the PMX integration tests rely heavily on the old agent structure; I don't see a clear way to modify them to work with the new structure. I think we should remove them for the time being, and maybe later add tests that use the HWP Emulator. @BrianJKoopman, what do you think?

@ykyohei
Contributor Author

ykyohei commented Jan 5, 2024

@jlashner I made the requested changes and confirmed that they work in the UTokyo setup. Please review.

@ykyohei ykyohei requested a review from jlashner January 5, 2024 12:51
@jlashner
Collaborator

jlashner commented Jan 5, 2024

Looking good to me! I made one small change (moved the publish / setting of session data into the try block of the _get_data function) and fixed the integration tests so that they run for me. Hopefully tests pass, and then we can merge!

@jlashner
Collaborator

jlashner commented Jan 8, 2024

Tests pass! And the integration tests were very informative in catching a race condition. I think we can merge now.

    'block_name': 'hwppmx',
    'data': {}
}
self._clear_queue()
Member

@jlashner This needs to be put in the reactor too ... I suggest blockingCallFromThread.
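For reference, blockingCallFromThread is Twisted's helper for running a function on the reactor thread from a worker thread and blocking until it returns. A simplified, Twisted-free stand-in (all names below are illustrative) shows the semantics: the caller posts a job to the loop thread and waits on a future for the result, so the work itself happens on the loop thread.

```python
import queue
import threading
from concurrent.futures import Future

job_queue = queue.Queue()


def loop_thread():
    """Stand-in for the reactor thread: runs posted jobs one at a time."""
    while True:
        item = job_queue.get()
        if item is None:  # sentinel to stop the loop
            break
        func, args, fut = item
        try:
            fut.set_result(func(*args))
        except Exception as e:
            fut.set_exception(e)


def blocking_call_from_thread(func, *args):
    """Post func to the loop thread and block until it returns, mirroring
    the shape of twisted.internet.threads.blockingCallFromThread."""
    fut = Future()
    job_queue.put((func, args, fut))
    return fut.result(timeout=5)


loop = threading.Thread(target=loop_thread, daemon=True)
loop.start()
value = blocking_call_from_thread(lambda x: x * 2, 21)
job_queue.put(None)  # stop the loop thread
```

In the agent, the same shape means queue clearing runs on the reactor thread while the caller simply blocks until it has finished.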

@jlashner jlashner merged commit bab96fe into main Jan 8, 2024
7 checks passed
@jlashner jlashner deleted the hwp_pmx_restructure branch January 8, 2024 19:09
hnakata-JP pushed a commit that referenced this pull request Apr 12, 2024
* draft of new hwp_pmx

* debug in UTokyo  setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* flake8

* int-test

* fix tasks

* fix shutdown mode

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more robust connection to PMX

* fix typo

* fix doc

* Fix tests and other small changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* flake8

* Use callFromThread on deferred

* clear queue in blockingCallFromThread

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jack Lashner <jlashner@gmail.com>
BrianJKoopman pushed a commit that referenced this pull request May 8, 2024