Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimeoutError from EpicsSignal #776

Closed
prjemian opened this issue Nov 25, 2019 · 23 comments
Closed

TimeoutError from EpicsSignal #776

prjemian opened this issue Nov 25, 2019 · 23 comments
Labels

Comments

@prjemian
Copy link
Contributor

One of the chronic problems that plague the APS beam lines from relying on Bluesky for user operations is TimeoutError: Failed to read {some PV} within None sec that happens, seemingly at random. When this happens, the exception stops the running plans completely. Keep in mind, such timeouts are not on initial connection but happen after innumerable previous successful EPICS CA monitor updates on the same. For example, sometimes it fails on the engineering units of an EPICS motor (motor.EGU) that, on point number 47 of a step scan, raises a TimeoutError. The previous steps did not raise this error.

In [15]: si3.hgap                                                                                                     
Out[15]: EpicsMotor(prefix='8idi:Slit3Hsize', name='si3_hgap', parent='si3', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])

In [16]: si3.hgapCA.Client.Exception...............................................                                   
    Warning: "Virtual circuit unresponsive"
    Context: "lambda.xray.aps.anl.gov:5064"
    Source File: ../tcpiiu.cpp line 920
    Current Time: Tue Nov 19 2019 15:02:30.072242898
..................................................................
Unexpected problem with CA circuit to server "lambda.xray.aps.anl.gov:5064" was "Connection reset by peer" - disconnecting
CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "lambda.xray.aps.anl.gov:5064"
    Source File: ../cac.cpp line 1225
    Current Time: Tue Nov 19 2019 15:03:52.569633839
..................................................................
In [16]: si3.hgap                                                                                                     
Out[16]: EpicsMotor(prefix='8idi:Slit3Hsize', name='si3_hgap', parent='si3', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])

/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/epics/ca.py:1459: UserWarning: ca.get('8idi:m54.TDIR') timed out after 1.00 seconds.
  warnings.warn(msg % (name(chid), timeout))
Subscription value callback exception (EpicsSignalRO(read_pv='8idi:m54.DMOV', name='samplestage_x_motor_done_move', parent='samplestage_x', value=1, timestamp=1574696427.776781, auto_monitor=True, string=False))
Traceback (most recent call last):
  File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/ophydobj.py", line 471, in inner
    cb(*args, **kwargs)
  File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/device.py", line 1603, in wrapped
    ret = func(self, *args, **kwargs)
  File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/epics_motor.py", line 246, in _move_changed
    if self.direction_of_travel.get() == 0:
  File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/signal.py", line 830, in get
    raise TimeoutError(f'Failed to read {self._read_pvname} within '
TimeoutError: Failed to read 8idi:m54.TDIR within None sec
In [17]: si3.hgap                                                                                                     
Out[17]: EpicsMotor(prefix='8idi:Slit3Hsize', name='si3_hgap', parent='si3', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])

Here is an example from interactive use where the .TDIR (motor target direction changed sign, went the other way) raised a TimeoutError. ironically, this field is of no use to the bluesky user so this failure is unnecessary, at best (that's an ophyd device configuration issue). Of concern is the timeout with None sec and why a timeout at all.

By the way,

In [21]: print(APS_utils.print_RE_md(printing=False))                                                                 
=========== ================================
key         value                           
=========== ================================
beamline_id APS TRR 8-ID-I                  
login_id    8idiuser@quartz.xray.aps.anl.gov
pid         7116                            
proposal_id None                            
scan_id     210                             
versions    =========== ========            
            key         value               
            =========== ========            
            apstools    1.1.14              
            bluesky     1.5.4               
            databroker  0.13.0              
            epics       3.4.0               
            matplotlib  3.1.0               
            numpy       1.16.4              
            ophyd       1.4.0rc1            
            pyRestTable 2020.0.2            
            spec2nexus  2021.1.3            
            =========== ========            
=========== ================================
In [22]: print(APS_utils.device_read2table(si3.hgap))                                                                 
====================== ================== ==========================
name                   value              timestamp                 
====================== ================== ==========================
si3_hgap               15.000060000000001 2019-11-19 14:00:00.379946
si3_hgap_user_setpoint 15.000000000000002 2019-11-19 14:00:00.379946
====================== ================== ==========================
@prjemian prjemian added the bug label Nov 25, 2019
@prjemian
Copy link
Contributor Author

And, another example during a scan:

In [1]:                                                                                                                                                                                           

In [1]: samplestage.x                                                                                                                                                                             
Out[1]: EpicsMotor(prefix='8idi:m54', name='samplestage_x', parent='samplestage', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])

In [2]: mov scaler1.preset_time 0.1 
   ...:                                                                                                                                                                                           

In [3]: RE(bp.rel_scan([scaler1,lakeshore],samplestage.x,-2.0, 2.0, 41))                                                                                                                          
Transient Scan ID: 211     Time: 2019-11-25 10:14:47
Persistent Unique Scan ID: 'c278bc5b-76ac-4ae5-952c-7a949d036975'
New stream: 'baseline'
Start-of-run baseline readings:
+--------------------------------+--------------------------------+
+--------------------------------+--------------------------------+
New stream: 'aps_current_monitor'
New stream: 'primary'                                                                                                                                                                             
+-----------+------------+---------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
|   seq_num |       time | samplestage_x |   Timebase |      pind1 |      pind2 | pind3_APD_ |      pind4 |       pdbs |      I_APS |      I0Mon |        APD |    cstar_l |    cstar_h |     oxygen |
+-----------+------------+---------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
|         1 | 10:14:54.4 |      7.584250 |    1000000 |        104 |         77 |          1 |        266 |         59 |      50979 |          2 |          1 |          2 |          0 |          1 |
|         2 | 10:14:55.6 |      7.684250 |    1000000 |        104 |         78 |          1 |        265 |         60 |      50964 |          2 |          1 |          2 |          0 |          1 |
|         3 | 10:14:56.7 |      7.784250 |    1000000 |        104 |         78 |          1 |        266 |         59 |      50960 |          1 |          1 |          1 |          0 |          1 |
|         4 | 10:14:57.9 |      7.884250 |    1000000 |        104 |         78 |          1 |        266 |         60 |      50954 |          3 |          1 |          3 |          0 |          1 |
|         5 | 10:14:59.0 |      7.984250 |    1000000 |        104 |         78 |          1 |        265 |         60 |      50954 |          2 |          1 |          2 |          0 |          1 |
|         6 | 10:15:00.2 |      8.084250 |    1000000 |        104 |         78 |          1 |        265 |         59 |      50953 |          4 |          1 |          5 |          1 |          1 |
|         7 | 10:15:01.3 |      8.184250 |    1000000 |        104 |         78 |          1 |        265 |         60 |      50952 |          0 |          1 |          0 |          0 |          1 |
|         8 | 10:15:02.5 |      8.284250 |    1000000 |        104 |         78 |          1 |        265 |         60 |      50952 |          3 |          1 |          3 |          0 |          1 |
|         9 | 10:15:03.6 |      8.384250 |    1000000 |        104 |         78 |          1 |        265 |         59 |      50952 |          0 |          1 |          0 |          0 |          1 |
|        10 | 10:15:04.7 |      8.484250 |    1000000 |        104 |         78 |          1 |        265 |         59 |      50952 |          0 |          1 |          0 |          0 |          1 |
/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/epics/ca.py:1459: UserWarning: ca.get('8idi:m54.EGU') timed out after 1.00 seconds.
  warnings.warn(msg % (name(chid), timeout))
[E 10:15:08.818 run_engine:1261] Run aborted                                                                                                                                                      
    Traceback (most recent call last):
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py", line 1150, in _run
        msg = self._plan_stack[-1].send(resp)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 1294, in __call__
        return (yield from plan)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 1149, in baseline_wrapper
        return (yield from plan_mutator(plan, insert_baseline))
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 169, in plan_mutator
        raise ex
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 122, in plan_mutator
        msg = plan_stack[-1].send(ret)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 790, in monitor_during_wrapper
        return (yield from plan2)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 169, in plan_mutator
        raise ex
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 122, in plan_mutator
        msg = plan_stack[-1].send(ret)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 169, in plan_mutator
        raise ex
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 122, in plan_mutator
        msg = plan_stack[-1].send(ret)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 848, in fly_during_wrapper
        return (yield from plan2)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 169, in plan_mutator
        raise ex
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 122, in plan_mutator
        msg = plan_stack[-1].send(ret)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 169, in plan_mutator
        raise ex
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 122, in plan_mutator
        msg = plan_stack[-1].send(ret)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/plans.py", line 1023, in rel_scan
        return (yield from inner_rel_scan())
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/utils.py", line 1011, in dec_inner
        return (yield from plan)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 1107, in reset_positions_wrapper
        reset()))
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 500, in finalize_wrapper
        ret = yield from plan
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 77, in plan_mutator
        msg = plan_stack[-1].throw(exception)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/utils.py", line 1011, in dec_inner
        return (yield from plan)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 1062, in relative_set_wrapper
        return (yield from plan)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 244, in msg_mutator
        ret = yield msg
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 194, in plan_mutator
        inner_ret = yield msg
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 194, in plan_mutator
        inner_ret = yield msg
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py", line 194, in plan_mutator
        inner_ret = yield msg
      [Previous line repeated 3 more times]
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py", line 1208, in _run
        new_response = yield from coro(msg)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/asyncio/coroutines.py", line 120, in coro
        res = func(*args, **kw)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py", line 1955, in _set
        ret = msg.obj.set(*msg.args, **kwargs)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/positioner.py", line 88, in set
        timeout=timeout)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/utils/epics_pvs.py", line 198, in wrapper
        return fcn(self, *args, **kwargs)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/epics_motor.py", line 149, in move
        status = super().move(position, **kwargs)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/positioner.py", line 189, in move
        settle_time=self._settle_time)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/status.py", line 437, in __init__
        self._unit = getattr(self.pos, 'egu', None)
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/epics_motor.py", line 94, in egu
        return self.motor_egu.get()
      File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/signal.py", line 830, in get
        raise TimeoutError(f'Failed to read {self._read_pvname} within '
    TimeoutError: Failed to read 8idi:m54.EGU within None sec
+-----------+------------+---------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
generator rel_scan ['c278bc5b'] (scan num: 211)



---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
<ipython-input-3-64bd7e33779e> in <module>
----> 1 RE(bp.rel_scan([scaler1,lakeshore],samplestage.x,-2.0, 2.0, 41))

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py in __call__(self, *args, **metadata_kw)
    751                     # it (unless it is a canceled error)
    752                     if exc is not None:
--> 753                         raise exc
    754 
    755             if self._interrupted:

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py in _run(self)
   1260             self._reason = str(err)
   1261             self.log.exception("Run aborted")
-> 1262             raise err
   1263         finally:
   1264             # Some done_callbacks may still be alive in other threads.

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py in _run(self)
   1148                     else:
   1149                         try:
-> 1150                             msg = self._plan_stack[-1].send(resp)
   1151                         # We have exhausted the top generator
   1152                         except StopIteration:

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in __call__(self, plan)
   1292         plan = monitor_during_wrapper(plan, self.monitors)
   1293         plan = baseline_wrapper(plan, self.baseline)
-> 1294         return (yield from plan)

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in baseline_wrapper(plan, devices, name)
   1147         return (yield from plan)
   1148     else:
-> 1149         return (yield from plan_mutator(plan, insert_baseline))
   1150 
   1151 

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    167                     continue
    168                 else:
--> 169                     raise ex
    170         # if inserting / mutating, put new generator on the stack
    171         # and replace the current msg with the first element from the

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    120             ret = result_stack.pop()
    121             try:
--> 122                 msg = plan_stack[-1].send(ret)
    123             except StopIteration as e:
    124                 # discard the exhausted generator

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in monitor_during_wrapper(plan, signals)
    788     plan1 = plan_mutator(plan, insert_after_open)
    789     plan2 = plan_mutator(plan1, insert_before_close)
--> 790     return (yield from plan2)
    791 
    792 

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    167                     continue
    168                 else:
--> 169                     raise ex
    170         # if inserting / mutating, put new generator on the stack
    171         # and replace the current msg with the first element from the

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    120             ret = result_stack.pop()
    121             try:
--> 122                 msg = plan_stack[-1].send(ret)
    123             except StopIteration as e:
    124                 # discard the exhausted generator

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    167                     continue
    168                 else:
--> 169                     raise ex
    170         # if inserting / mutating, put new generator on the stack
    171         # and replace the current msg with the first element from the

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    120             ret = result_stack.pop()
    121             try:
--> 122                 msg = plan_stack[-1].send(ret)
    123             except StopIteration as e:
    124                 # discard the exhausted generator

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in fly_during_wrapper(plan, flyers)
    846     plan1 = plan_mutator(plan, insert_after_open)
    847     plan2 = plan_mutator(plan1, insert_before_close)
--> 848     return (yield from plan2)
    849 
    850 

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    167                     continue
    168                 else:
--> 169                     raise ex
    170         # if inserting / mutating, put new generator on the stack
    171         # and replace the current msg with the first element from the

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    120             ret = result_stack.pop()
    121             try:
--> 122                 msg = plan_stack[-1].send(ret)
    123             except StopIteration as e:
    124                 # discard the exhausted generator

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    167                     continue
    168                 else:
--> 169                     raise ex
    170         # if inserting / mutating, put new generator on the stack
    171         # and replace the current msg with the first element from the

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    120             ret = result_stack.pop()
    121             try:
--> 122                 msg = plan_stack[-1].send(ret)
    123             except StopIteration as e:
    124                 # discard the exhausted generator

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/plans.py in rel_scan(detectors, num, per_step, md, *args)
   1021                                 per_step=per_step, md=_md))
   1022 
-> 1023     return (yield from inner_rel_scan())
   1024 
   1025 

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/utils.py in dec_inner(*inner_args, **inner_kwargs)
   1009                 plan = gen_func(*inner_args, **inner_kwargs)
   1010                 plan = wrapper(plan, *args, **kwargs)
-> 1011                 return (yield from plan)
   1012             return dec_inner
   1013         return dec

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in reset_positions_wrapper(plan, devices)
   1105 
   1106     return (yield from finalize_wrapper(plan_mutator(plan, insert_reads),
-> 1107                                         reset()))
   1108 
   1109 

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in finalize_wrapper(plan, final_plan, pause_for_debug)
    498     cleanup = True
    499     try:
--> 500         ret = yield from plan
    501     except GeneratorExit:
    502         cleanup = False

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
     75             # if we have a stashed exception, pass it along
     76             try:
---> 77                 msg = plan_stack[-1].throw(exception)
     78             except StopIteration as e:
     79                 # discard the exhausted generator

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/utils.py in dec_inner(*inner_args, **inner_kwargs)
   1009                 plan = gen_func(*inner_args, **inner_kwargs)
   1010                 plan = wrapper(plan, *args, **kwargs)
-> 1011                 return (yield from plan)
   1012             return dec_inner
   1013         return dec

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in relative_set_wrapper(plan, devices)
   1060     plan = plan_mutator(plan, insert_reads)
   1061     plan = msg_mutator(plan, rewrite_pos)
-> 1062     return (yield from plan)
   1063 
   1064 

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in msg_mutator(plan, msg_proc)
    242                 ret = None
    243                 continue
--> 244             ret = yield msg
    245         except StopIteration as e:
    246             return e.value

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    192         try:
    193             # yield out the 'current message' and collect the return
--> 194             inner_ret = yield msg
    195         except GeneratorExit:
    196             # special case GeneratorExit.  We must clean up all of our plans

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    192         try:
    193             # yield out the 'current message' and collect the return
--> 194             inner_ret = yield msg
    195         except GeneratorExit:
    196             # special case GeneratorExit.  We must clean up all of our plans

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    192         try:
    193             # yield out the 'current message' and collect the return
--> 194             inner_ret = yield msg
    195         except GeneratorExit:
    196             # special case GeneratorExit.  We must clean up all of our plans

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    192         try:
    193             # yield out the 'current message' and collect the return
--> 194             inner_ret = yield msg
    195         except GeneratorExit:
    196             # special case GeneratorExit.  We must clean up all of our plans

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    192         try:
    193             # yield out the 'current message' and collect the return
--> 194             inner_ret = yield msg
    195         except GeneratorExit:
    196             # special case GeneratorExit.  We must clean up all of our plans

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/preprocessors.py in plan_mutator(plan, msg_proc)
    192         try:
    193             # yield out the 'current message' and collect the return
--> 194             inner_ret = yield msg
    195         except GeneratorExit:
    196             # special case GeneratorExit.  We must clean up all of our plans

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py in _run(self)
   1206                         # exceptions (coming in via throw) can be
   1207                         # raised
-> 1208                         new_response = yield from coro(msg)
   1209 
   1210                     # special case `CancelledError` and let the outer

~/.conda/envs/bluesky/lib/python3.7/asyncio/coroutines.py in coro(*args, **kw)
    118         @functools.wraps(func)
    119         def coro(*args, **kw):
--> 120             res = func(*args, **kw)
    121             if (base_futures.isfuture(res) or inspect.isgenerator(res) or
    122                     isinstance(res, CoroWrapper)):

~/.conda/envs/bluesky/lib/python3.7/site-packages/bluesky/run_engine.py in _set(self, msg)
   1953         group = kwargs.pop('group', None)
   1954         self._movable_objs_touched.add(msg.obj)
-> 1955         ret = msg.obj.set(*msg.args, **kwargs)
   1956         p_event = asyncio.Event(loop=self.loop)
   1957         pardon_failures = self._pardon_failures

~/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/positioner.py in set(self, new_position, timeout, moved_cb, wait)
     86         """
     87         return self.move(new_position, wait=wait, moved_cb=moved_cb,
---> 88                          timeout=timeout)
     89 
     90     def stop(self, *, success: bool =False):

~/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/utils/epics_pvs.py in wrapper(self, *args, **kwargs)
    196     def wrapper(self, *args, **kwargs):
    197         if self.connected:
--> 198             return fcn(self, *args, **kwargs)
    199         else:
    200             raise DisconnectedError('{} is not connected'.format(self.name))

~/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/epics_motor.py in move(self, position, wait, **kwargs)
    147         self._started_moving = False
    148 
--> 149         status = super().move(position, **kwargs)
    150         self.user_setpoint.put(position, wait=False)
    151         try:

~/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/positioner.py in move(self, position, moved_cb, timeout)
    187 
    188         status = MoveStatus(self, position, timeout=timeout,
--> 189                             settle_time=self._settle_time)
    190 
    191         if moved_cb is not None:

~/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/status.py in __init__(self, positioner, target, start_ts, **kwargs)
    435         self.finish_pos = None
    436 
--> 437         self._unit = getattr(self.pos, 'egu', None)
    438         self._precision = getattr(self.pos, 'precision', None)
    439         self._name = self.pos.name

~/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/epics_motor.py in egu(self)
     92     def egu(self):
     93         '''The engineering units (EGU) for a position'''
---> 94         return self.motor_egu.get()
     95 
     96     @property

~/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/signal.py in get(self, as_string, connection_timeout, form, **kwargs)
    828         if info is None:
    829             timeout = kwargs.get('timeout', None)
--> 830             raise TimeoutError(f'Failed to read {self._read_pvname} within '
    831                                f'{timeout} sec')
    832 

TimeoutError: Failed to read 8idi:m54.EGU within None sec

In [4]:                                                                                                                                                                                           

@danielballan
Copy link
Member

I suspect this is a bug in ophyd, pyepics, or more specifically the integration between the two (which lives in ophyd). I notice you are running on an ophyd prerelease, 1.4.0rc1. Have you ever seen this on a version in the 1.3.x series?

@prjemian
Copy link
Contributor Author

prjemian commented Nov 25, 2019 via email

@prjemian
Copy link
Contributor Author

prjemian commented Nov 25, 2019 via email

@prjemian
Copy link
Contributor Author

BUT, this line in the error dump:

  File "/home/beams/8IDIUSER/.conda/envs/bluesky/lib/python3.7/site-packages/ophyd/epics_motor.py", line 246, in _move_changed

calls .get() (from if self.direction_of_travel.get() == 0:) from the underlying pyepics support. That support should supply a valid default for the timeout kwarg according to this documentation: timeout: maximum time to wait for value to be received. (default = 0.5 + log10(count) seconds)

Within PyEpics, there are PV objects and Motor objects. This is a PV object. The call chain appears to be:

  • PV.get() # (no timeout kwarg, becomes None by default)
  • -> PV.get_with_metadata() # (timeout None input)
  • -> ca.get_with_metadata() # (timeout=None input)
  • -> ca.get_complete_with_metadata() # (timeout=None input)

Here, timeout is None is handled:

if timeout is None:
    timeout = 1.0 + log10(max(1, count))

and then watches for a timeout.

Is there code in ophyd that raises the TimeoutError exception? Not seeing that exception from PyEpics. see this table:

repo TimeoutError in code
PyEpics 0 instances
ophyd 17 instances

Time to look for the relevant TimeoutError in ophyd.

@prjemian
Copy link
Contributor Author

TimeoutError not raised in ophyd/epics_motor.py, only passed along. It comes from this set of lines:

ophyd/ophyd/signal.py

Lines 824 to 831 in 9da7884

self.wait_for_connection(timeout=connection_timeout)
info = self._read_pv.get_with_metadata(as_string=as_string, form=form,
**kwargs)
if info is None:
timeout = kwargs.get('timeout', None)
raise TimeoutError(f'Failed to read {self._read_pvname} within '
f'{timeout} sec')

In the wrong set of circumstances that trigger our problem, the info object is set to None. Then, on this line: timeout = kwargs.get('timeout', None) the timeout (not specified) will be reported incorrectly as None. Shouldn't it be reported based on the actual time waited (since PyEpics has default for timeout=None)? Could be done with an actual timer here.

The value of the timeout is not the real problem, though, only a symptom. Why do we get a timeout at all? First off, though, we don't know how long we have waited.

@prjemian
Copy link
Contributor Author

prjemian commented Nov 26, 2019

Actually, why is None being returned from self._read_pv.get_with_metadata()? PyEpics has called warnings with this code:

    if time.time()-t0 > timeout:
        msg = "ca.get('%s') timed out after %.2f seconds."
        warnings.warn(msg % (name(chid), timeout))
        return None

The ophyd.signal.EpicsSignalBase.get_complete_with_metadata() code should handle a possible result from warnings. One way is outlined in the SO post. I suspect this change will help to reveal the underlying problem where self._read_pv.get_with_metadata() returns None.

@prjemian
Copy link
Contributor Author

Here's the simple example from SO:

import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings(record=True) as w:
    # Cause all warnings to always be triggered.
    warnings.simplefilter("always")
    # Trigger a warning.
    fxn()
    # Verify some things
    assert len(w) == 1
    assert issubclass(w[-1].category, DeprecationWarning)
    assert "deprecated" in str(w[-1].message)

@prjemian
Copy link
Contributor Author

Using warnings.catch_warnings(record=True) comes at a price we cannot pay while waiting for EPICS PV values to come in from the network environment:

The catch_warnings() manager works by replacing and then later restoring the module’s showwarning() function and internal list of filter specifications. This means the context manager is modifying global state and therefore is not thread-safe.

@prjemian
Copy link
Contributor Author

Perhaps converting these warnings to UserWarning exceptions is thread-safe:

code:

import warnings

def function1():
    for action in ("error", "default"):
        print("-"*20)
        warnings.filterwarnings(action)
        try:
            function2()
            print(f"no exception from action {action}")
        except UserWarning as exc:
            print(f"caught: {exc}")

def function2():
    warnings.warn("example warning message")

function2()
function1()

output:

In [2]: function2()                                                                                                                                                     
/home/mintadmin/Apps/anaconda/envs/db-test/bin/ipython:14: UserWarning: example warning message

In [3]: function1()                                                                                                                                                     
--------------------
caught: example warning message
--------------------
/home/mintadmin/Apps/anaconda/envs/db-test/bin/ipython:14: UserWarning: example warning message
no exception from action default

In [4]:                                                                                                                                                                 

@prjemian
Copy link
Contributor Author

One possible revision of

ophyd/ophyd/signal.py

Lines 824 to 831 in 9da7884

self.wait_for_connection(timeout=connection_timeout)
info = self._read_pv.get_with_metadata(as_string=as_string, form=form,
**kwargs)
if info is None:
timeout = kwargs.get('timeout', None)
raise TimeoutError(f'Failed to read {self._read_pvname} within '
f'{timeout} sec')
is as follows:

    self.wait_for_connection(timeout=connection_timeout) 
    try:
        warnings.filterwarnings("error")
        t0 = time.time()
        info = self._read_pv.get_with_metadata(
                    as_string=as_string, form=form, **kwargs) 
    except UserWarning as exc:
        print(f"while getting from {self._read_pvname}, caught '{exc}'")
    finally:
        dt = time.time() - t0
        warnings.filterwarnings("default")
    
    if info is None: 
        timeout = kwargs.get('timeout', None) 
        raise TimeoutError(f'Failed to read {self._read_pvname} in ' 
                           f'{dt:.3f} sec (with timeout={timeout})') 

@prjemian
Copy link
Contributor Author

As commented for APS 8-ID-I XPCS, a ...

combination of [connect() or get()] retries and direct ophyd handling of the various timeouts (connection, get, put) is the first hypothesis to resolve this issue.

@prjemian
Copy link
Contributor Author

@kmpeters reported similar experience with PyEpics (not bluesky):

I've seen TimeoutError exceptions in fly-scan python code using pyepics running on Windows at (APS) 34ID-E. The problem occurred infrequently (less frequently than once every few weeks). The PV that was being accessed at the time of the timeout would change each time. Sometimes the timeout was on a get and other times it was on put. The PV that was being accessed when the exception occurred was always available via the command line immediately after the exception occurred. I wrote safer put and get functions that handled the timeout exception and printed info about the problem and I haven't heard from users about the problem since then.

I don't remember the python, pyepics3, and EPICS lib versions. I can look them up on the next shutdown day.

@prjemian
Copy link
Contributor Author

@kmpeters - Can you post a snippet of the safer put() and get() codes? That would make a great starting point for this issue.

@kmpeters
Copy link

kmpeters commented Dec 10, 2019

It looks like the latest version of my fly-scan code only has a safer put():

	#
	# Debugging
	#

	def debugPut(self, pv, val, wait=False, use_complete=False, timeout=30.0):
		#
		exitFlag = False

		while ( not exitFlag ):
			try:
				pv.put(val, wait=wait, use_complete=use_complete, timeout=timeout)
			except WindowsError as e:
				print
				print "EPICS put FAILED:"
				print "\tPV =", pv.pvname
				print "\tval =", val
				print "\twait =", wait, "; use_complete =", use_complete, "; timeout =", timeout
				print "\tconnected =", pv.connected, "; host =", pv.host
				print
				print "A WindowsError occurred:"
				# The following line results in an AttributeError. Apparently e.type doesn't exist
				#!print "\t", e.type
				print "\t", e
				print "\t", e.args
				print
				print "Backtrace:"
				traceback.print_stack()
				print
				time.sleep(15.00)
			else:
				exitFlag = True

It also looks like it wasn't a TimeoutError

@prjemian
Copy link
Contributor Author

prjemian commented Dec 10, 2019 via email

@kmpeters
Copy link

The safer get() function I was remembering was much older. It was used to handle exceptions on read timeouts using CaChannel.

@prjemian
Copy link
Contributor Author

@kmpeters : Looking over the code you supplied to make a safer put(), it seems somewhat similar to the method I described above, which converts the warnings posted by PyEpics into an actual UserError which can be trapped. In your code, a WindowsError is trapped.

I'll begin testing the idea of trapping a warnings call as an exception (UserError).

@prjemian
Copy link
Contributor Author

@kmpeters: Trapping the warnings call from PyEpics as a UserException was successful. Then, implementing the safer_get() including retries with shorter, but escalating, timeout was successfule as well. Thanks, again, for the assist!

@tacaswell
Copy link
Contributor

I think bandaid_get is a better name than safer_get. I left a long comment over at aps-8id-dys/ipython-8idiuser#114 (comment) about why I am not sure re-trying is the best solution.

@prjemian
Copy link
Contributor Author

The first item to resolve in this issue is to capture the warnings content from PyEpics when it is posted. I'll file a PR on this specifically as resolution to this issue. That won't stop the FailedStatus exception noted in aps-8id-dys/ipython-8idiuser#114 but at least we'll know for sure that PyEpics has given us a timeout.

@prjemian
Copy link
Contributor Author

prjemian commented Dec 20, 2019

note: not seeing any TimeoutError exceptions on my VM, need to try at APS beam line 8-ID-I

test code: https://github.com/prjemian/bstesting/blob/master/test4.py

test database when not at APS 8-ID-I: https://github.com/prjemian/bstesting/blob/master/8idi.db

softIoc -d 8idi.db

@prjemian
Copy link
Contributor Author

With #788, I'm declaring victory here and closing the issue as SOLVED.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants