-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add way to escape 'Another set in progress'. #901
base: master
Are you sure you want to change the base?
Add way to escape 'Another set in progress'. #901
Conversation
ophyd/signal.py
Outdated
Escape 'Another set in progress'. | ||
""" | ||
warnings.warn( | ||
"A previous set() operaiton is being ignored. Only this do this " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: "operation"
So far, this PR makes a clean way to dump the thread in the same way we do now. We need to terminate those dangling threads. As is, console session does not exit cleanly, requiring one or more ^C to fully exit the process. |
Good point, @prjemian. I took it further in this next commit. I tested against this pathological caproto IOC which takes forever to set: #!/usr/bin/env python3
from caproto.server import pvproperty, PVGroup, ioc_arg_parser, run
class Stubborn(PVGroup):
"""
When a PV is written to, write the new value into a file as a string.
"""
async def my_write(self, instance, value):
print(f"Ignored request to write {value}")
return 0
A = pvproperty(put=my_write, value=0)
B = pvproperty(put=my_write, value=0)
if __name__ == '__main__':
ioc_options, run_options = ioc_arg_parser(
default_prefix='stubborn:',
desc='Run an IOC with PVs that ignores write requests')
ioc = Stubborn(**ioc_options)
run(ioc.pvdb, **run_options) Before, console does not exit cleanly: In [1]: from ophyd import EpicsSignal
In [2]: s = EpicsSignal("stubborn:A", name='s')
In [3]: s.set(1)
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', timestamp=1598986036.873889, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)
In [4]: _.done
Out[4]: False
In [5]: s.set(2)
<snipped>
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.
In [6]: s.clear_set()
/home/dallan/Repos/bnl/ophyd/ophyd/signal.py:318: UserWarning: A previous set() operaiton is being ignored. Only this do this when debugging or recovering from a hardware failure.
"A previous set() operaiton is being ignored. Only this do this "
In [7]: exit
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 269, in set_thread
rtol=self.rtolerance)
File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 263, in set_and_wait
current_value = signal.get()
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1108, in get
self._read_pv, timeout, connection_timeout, as_string, form)
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1066, in _get_with_timeout
info = pv.get_with_metadata(as_string=as_string, form=form, timeout=timeout)
File "/home/dallan/.conda/envs/py37/lib/python3.7/site-packages/epics/pv.py", line 46, in wrapped
raise RuntimeError('Expected CA context is unset')
RuntimeError: Expected CA context is unset After, In [1]: from ophyd import EpicsSignal
In [2]: s = EpicsSignal("stubborn:A", name='s')
In [3]: s.set(1)
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', value=1, timestamp=1598986283.0511467, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)
In [4]: _.done
Out[4]: False
In [5]: s.set(2)
<snipped>
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.
In [6]: s.clear_set()
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 272, in set_thread
rtol=self.rtolerance, poison_pill=poison_pill)
File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
raise AbandonedSet
ophyd.utils.epics_pvs.AbandonedSet
/home/dallan/Repos/bnl/ophyd/ophyd/signal.py:328: UserWarning: A previous set() operaiton is being ignored. Only this do this when debugging or recovering from a hardware failure.
"A previous set() operaiton is being ignored. Only this do this "
In [7]: exit # no errors at exit |
I think this kind of event wait poison pill approach is great for killing these background set threads. Do you think signals should automatically call this at python exit? It might smooth out the shutdown of some of the ipython sessions. |
Yes, I think that will be the case in general. My only reservation about adding this is that it might be used routinely, rather than as intended role: allowing you to salvage the ophyd object instance but prompting you to look into the device or IOC issue. But, the current situation, where if you have bad failed set you have to remake your instance or (as is common in practice) restart your whole process, seem worse.
Great suggestion. Thanks. Will add now.... |
... realizing that the current implementation will be changed from the current technique of putting black tape over the blinking red light. (That is, it sets the object reference to |
Before, unclean exit because In [1]: from ophyd import EpicsSignal
In [2]: s = EpicsSignal("stubborn:A", name='s')
In [3]: s.set(1)
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', timestamp=1599074325.10357, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)
In [4]: _.done
Out[4]: False
In [5]: exit
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 272, in set_thread
rtol=self.rtolerance, poison_pill=poison_pill)
File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 274, in set_and_wait
current_value = signal.get()
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1117, in get
self._read_pv, timeout, connection_timeout, as_string, form)
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1075, in _get_with_timeout
info = pv.get_with_metadata(as_string=as_string, form=form, timeout=timeout)
File "/home/dallan/.conda/envs/py37/lib/python3.7/site-packages/epics/pv.py", line 46, in wrapped
raise RuntimeError('Expected CA context is unset')
RuntimeError: Expected CA context is unset After, In [5]: exit
Traceback (most recent call last):
File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 274, in set_thread
rtol=self.rtolerance, poison_pill=poison_pill)
File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
raise AbandonedSet
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1599075154.647881, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it. |
This is a very fair concern. In the past we've talked a lot about not leaving traps for our users. I think this PR does a good job of making the failure at least very verbose with the warnings and exceptions raised, hopefully verbose enough that someone investigates the root cause of the issue rather than patching over it indefinitely. In the context of "make it work for the user experiment" though, I think it's important to provide tools like this, else we end up doing some sort of kluge that makes the experiment work but is otherwise an accrual of technical debt. I'll take a standard bandaid over roll-your-own bandaid any day of the week. |
Well said. |
@danielballan : Your comment above shows exactly the same sort of error I was seeing recently. In that comment from USAXS, there is an attached text file with the thousands of lines output from the error dump on exiting the console session. In a loop (with bps.repeat), I ran a plan that included six different bp.scan runs. Four of the scans involve a motor with the
To clear the problems previously reported, I include at the end of a set of scans a call to the I'm concerned that the fix of |
@prjemian unless I'm misreading something here, this PR ensures that the set thread will die using a |
Right. In the earliest draft of this PR I just dropped the reference to the thread (“put black tape over the red warning light”) to prompt discussion on the question “Do we even want to expose this API?” But in the current draft I first ensure the thread will stop, as @ZLLentz describes, and only then drop the reference. |
Ah, I see the additional code now. Should I patch locally and give it a try? Seems I have a stress test demonstrated above. |
That would be great! |
Ok, NeXus conferences are done, back in the Bluesky saddle again. Tested with caproto IOC per above using this test code: from ophyd import EpicsSignal
s = EpicsSignal("stubborn:A", name='s')
st = s.set(1)
st.done
st = s.set(2)
st.done
exit
FYI
|
But, adding the call to
|
So, this code is noisy (can't suppress the comments) and it goes against the advice provided, BUT it continues to operate past the reported problems: from ophyd import EpicsSignal
from ophyd.utils.epics_pvs import AbandonedSet
s = EpicsSignal("stubborn:A", name='s')
for cycle in range(3):
st = s.set(1)
print(f"cycle {cycle}, status: {st.done}")
s.clear_set()
print(f"cycle {cycle}, status: {st.done}")
s.clear_set()
exit() ipython session:
|
Now for the stress test. Code#!/usr/bin/env python3
from ophyd import EpicsSignal
from ophyd.utils.epics_pvs import AbandonedSet
import time
s = EpicsSignal("sky:m1.STUP", name='s', string=True)
for cycle in range(3):
print(">"*40)
st = s.set("ON")
print(f"cycle {cycle}, status: {st.done}")
time.sleep(0.05)
s.clear_set()
print(f"cycle {cycle}, status: {st.done}")
print("<"*40)
s.clear_set()
exit() without
|
Calling
In the why didn't I think of that sooner category, isn't setting a timeout parameter the right way? Can the def plan9(signal):
for _itr in range(3):
try:
yield from bps.mv(signal, "ON", timeout=0.01)
except FailedStatus:
pass Yup, it works (if the exception is trapped):
|
The result can be quieter by suppressing the warning-level messages from the signal:
|
@danielballan @tacaswell |
But now I ask the question why the Line 1719 in 3dddf11
This
Now, try
Expecting
Repeat but use a
|
This seems 👍 to me. It adds a bunch of complexity, but I think that this is the minimum complexity we can have if we are going to have background status threads like this. |
Thanks for pushing on this @prjemian . |
I'm stuck trying to trigger AbandonedSet raised in
ophyd.utils.epics_pvs.set_and_wait
Ideas?
…On Thu, Jun 10, 2021, 12:12 PM Dan Allan ***@***.***> wrote:
Thanks for pushing on this @prjemian <https://github.com/prjemian> .
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#901 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AARMUMHKNIOYPOTWTP37XEDTSDW6DANCNFSM4QR7ZBJQ>
.
|
I'm confused, |
Why is the CI taking 6 hours to run? |
No, this is semi-internal. It operates one layer below Status objects so it should not have anything to do with "poison pill". |
@tacaswell Can you help @prjemian get un-blocked here? This is close. I don't want to lose momentum on it again. |
The implementation of this reads reasonably, looking into if I can recreate the hang locally. |
Based on the age of this PR, is this still viable? |
We (@cooleyv and I) think we may be seeing this again. |
d2cc7bc
to
82dadb2
Compare
did the mechanical rebase (90% just whitespace and quotes due to auto-formatting), lets see if the hangs have gone away. |
9f50cfc
to
711be9f
Compare
See #757. This is one possible approach.
TO DO
set_and_wait
to accept a poison pillthreading.Event
or some other means of signaling that it should stop waiting.