resolve EpicsMotor stalls sometimes at end of move #783
Conversation
It looks fine, but I'd like to clarify a few questions (see below).
ophyd/epics_motor.py (outdated):

    status._finished(success=True)
    status.done = True
    self.motor_done_move.subscribe(dmov_callback)
What if we subscribe with `run=False`?
You also have to remember to unsubscribe this, or we will have as many callbacks as we have had motor moves!
Isn't this the matching unsubscribe? `self.motor_done_move.clear_sub(dmov_callback)`
Good move: `run=False`
What about replacing this:

    self.motor_done_move.clear_sub(dmov_callback)

with this:

    self.motor_done_move.unsubscribe(cbid)

paired with:

    cbid = self.motor_done_move.subscribe(dmov_callback, run=False)
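For reference, the subscribe/unsubscribe bookkeeping being discussed here can be mimicked with a minimal pure-Python stand-in. This is a sketch, not ophyd's actual `Signal` implementation; the `MiniSignal` class and its `put()` method are hypothetical:

```python
import itertools

class MiniSignal:
    """Hypothetical stand-in for a signal's callback registry."""
    def __init__(self):
        self._cid_counter = itertools.count()
        self._callbacks = {}

    def subscribe(self, cb, run=True):
        # run=False skips the immediate replay of the current value,
        # mirroring the run=False suggestion in this thread
        cid = next(self._cid_counter)
        self._callbacks[cid] = cb
        if run:
            cb(old_value=None, value=None)
        return cid

    def unsubscribe(self, cid):
        # remove exactly one callback by its id, so repeated moves
        # do not accumulate stale callbacks (the leak noted above)
        self._callbacks.pop(cid, None)

    def put(self, value, old_value=None):
        for cb in list(self._callbacks.values()):
            cb(old_value=old_value, value=value)

# pair the id returned by subscribe() with unsubscribe()
sig = MiniSignal()
seen = []
cbid = sig.subscribe(lambda old_value, value, **kw: seen.append(value),
                     run=False)
sig.put(1)
sig.unsubscribe(cbid)
sig.put(0)      # not recorded: the callback was already removed
print(seen)     # [1]
```

The point of holding on to `cbid` is that it identifies exactly one registration, so the teardown cannot accidentally remove (or miss) callbacks registered elsewhere.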
    def dmov_callback(old_value, value, **kwargs):
        self.log.debug(f'DEBUG: dmov_callback(): new={value} old={old_value}')
        if self._started_moving and value == 1:
What is setting `self._started_moving` to `True`?
Good question. This is tangly code.
It's in the `_move_changed()` method:

    if not self._started_moving:
        started = self._started_moving = (not was_moving and self._moving)
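The edge detection in that snippet can be isolated as a tiny state machine. This is a sketch with hypothetical names (`MoveTracker`, `move_changed`), not the actual `_move_changed()` body, which tracks more state:

```python
class MoveTracker:
    """Sketch of the started-moving edge detection discussed above."""
    def __init__(self):
        self._moving = False
        self._started_moving = False

    def move_changed(self, moving_now):
        was_moving = self._moving
        self._moving = moving_now
        if not self._started_moving:
            # only the False -> True transition of the moving flag
            # latches _started_moving; it stays latched afterward
            self._started_moving = (not was_moving and self._moving)
        return self._started_moving

t = MoveTracker()
print(t.move_changed(False))  # False: not moving yet
print(t.move_changed(True))   # True: transition detected
print(t.move_changed(True))   # True: stays latched
```

The latch matters for `dmov_callback` above: the `value == 1` (done) edge is only honored after a start-of-move transition has actually been observed.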
Much of this logic is already in lines 224 to 271 of 3cbf73b. I suspect that this is going to interact badly with that logic. It is subscribed a bit differently on the 1.3.x branch (line 81 in 9f21ca2; lines 222 to 267 in 9f21ca2).
I believe this is leaking subscriptions and will conflict with the existing logic.
Some Python implementation notes. How were we causing the `status` to finish before now? Is this a "whichever happens first" scenario?

Haha, I see three of us pounced in parallel. Thanks for the contribution, @prjemian, and good luck sifting through our feedback. 🙃
This might be the better place for the logic in the
should change to:
This logic is problematic:
since it requires the transition to declare the move complete. Suggest relocating the `done_moving()` call thus:

So, removing the
In the end, I restored the
Is there a better way to be certain that `st._finished()` gets called?
    self.log.debug('dmov_callback(): new=%d old=%d', value, old_value)
    if self._started_moving and value == 1:
        ...
        self.motor_done_move.clear_sub(dmov_callback)
Only one of these should be required.
My understanding is that the motor record will always flash DMOV low and then high to indicate that it is moving (even if it is already at its target destination). I am still 👎 on this as-is because it is short-circuiting a whole bunch of existing logic.

Do we have a clear understanding of the source of the problem? @prjemian, do you have any timing diagrams of what the various PVs are doing when it does and does not work correctly? My current guess is that we are hitting race conditions on some of the internal state.

In lines 195 to 198 of 3cbf73b we subscribe the status object (see also lines 200 to 208 and lines 270 to 271 of 3cbf73b).
@tacaswell: The .DMOV field transitions from 1 to 0 at the start of a move and from 0 to 1 at the end, after completion of the primary move plus any additional move complexities (such as backlash corrections or retries) controlled by the motor record. See this documentation for the details: https://epics.anl.gov/bcda/synApps/motor/motorRecord.html#Fields_status About the .DMOV field, the documentation states:
It is clear from the structure of the plan used to test this problem (when executed with the logging level set to DEBUG) that the next move has not yet begun. If the next move command had been issued, the RunEngine logging would indicate a
Is there a better way to be certain that `st._finished()` gets called?
I agree that this change has implications for additional methods such as
What is the behavior of DMOV if the motor is already moving and then we instruct it to move again? The monitor subscriptions are run on a background thread (one sourced via libca and then trampolined to Python). The conflict is between the current move and the previous move's clean-up code.
> If the .VAL field is changed while the motor is moving (.DMOV = 0), the .DMOV field will remain at zero until the new move is finished.
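That documented behavior can be modeled in a few lines to show why DMOV is only a reliable done signal if the full 1 → 0 → 1 pulse is tracked across retargets. This is a toy simulator under the quoted documentation's rules, not a real motor record (the class and method names are invented):

```python
class MotorRecordSim:
    """Toy model of .DMOV per the motor record docs quoted above."""
    def __init__(self):
        self.dmov = 1          # 1 = done / idle
        self.pending_moves = 0

    def put_val(self):
        # writing .VAL while moving keeps .DMOV at 0 until all
        # queued motion (including the new target) completes
        self.pending_moves += 1
        self.dmov = 0

    def finish_one_move(self):
        self.pending_moves -= 1
        if self.pending_moves == 0:
            self.dmov = 1      # only rises once all motion is done

m = MotorRecordSim()
m.put_val()            # first move: DMOV drops to 0
m.put_val()            # retarget mid-move: DMOV stays 0
m.finish_one_move()
print(m.dmov)          # 0: the second move is still in flight
m.finish_one_move()
print(m.dmov)          # 1: all motion complete
```

So a callback that merely waits for `DMOV == 1` sees one rising edge for the whole sequence of retargeted moves, which is exactly why the "which move does this completion belong to?" bookkeeping in the status logic is delicate.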
What is trampolined?
How can I diagnose the code in the current master at this level?
Here's an example of the timing of changes in the motor VAL, MOVN, and DMOV fields using this command:

- before the move
- step by 1.0
- step by 1.0, then step by 1.0 during the first move
- step by -1.0 (with backlash correction)
- step by -1.0 (with backlash correction), then step by -1.0 during the first move
The code for this now lives in https://github.com/bluesky/ophyd/blob/master/ophyd/_dispatch.py. Due to how contexts work in libca/pyepics (this is the C-level structure that holds the various sockets/threads etc. required to handle the asynchronous part of the EPICS protocol and manages the socket <-> virtual circuit <-> channel mapping), running Python code in response to a monitor that does further caput/caget will fail. I do not remember exactly why this is off the top of my head (maybe @klauer can remind me ;)). Instead, the function we actually register with pyepics just "bounces" the values to a second queue that we process in a Python thread, which gets around this problem.

I think we just have to start putting in logging statements and sort out the timing between the things happening on different threads and at the IOC level. I don't know any shortcut to tracking down race conditions, but at least we have a way to reliably reproduce this one!
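The "bounce to a second queue" pattern described here can be sketched with the standard library alone. This is a simplified illustration of the trampoline idea, not ophyd's actual `_dispatch.py`; the function names and the doubling "work" are invented:

```python
import queue
import threading

# monitor callbacks from the CA library must not do further channel
# access work directly, so they only enqueue the event here
event_queue = queue.Queue()

def ca_monitor_callback(value):
    """Pretend this runs on the libca thread: hand off, never block."""
    event_queue.put(value)

def dispatcher(results, n_events):
    """Runs on a plain Python thread: safe to do real work here."""
    for _ in range(n_events):
        value = event_queue.get()
        results.append(value * 2)   # stand-in for user callback work
        event_queue.task_done()

results = []
worker = threading.Thread(target=dispatcher, args=(results, 3))
worker.start()
for v in (1, 2, 3):                 # pretend these arrive from libca
    ca_monitor_callback(v)
worker.join()
print(results)                      # [2, 4, 6]
```

The queue preserves event order, and because the consumer is an ordinary Python thread, the user callback is free to do further channel access without re-entering the CA context.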
Thanks for the explanation, @tacaswell ! The next thing to do is to return to the master branch and begin diagnosis with that code, to determine what needs to be changed so the motor stall condition does not recur. I'll pause further work on this branch until that diagnosis is fruitful. |
Will make a VM image available to the DAMA team. |
Possibly related to #788? Will test new ophyd release 1.4.0 to see if this remains a problem. |
Need to stress test this with current release to see if still a problem. |
dropping this branch & PR |
fixes #782