Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distribution of Processing #481

Draft
wants to merge 164 commits into
base: master
Choose a base branch
from
Draft

Distribution of Processing #481

wants to merge 164 commits into from

Conversation

tfarago
Copy link
Contributor

@tfarago tfarago commented Apr 7, 2022

This will enable us to acquire data on a remote server and push it there via a ZMQ stream over network to processing nodes which can also be on different machines. Live preview will be able to subscribe to a stream of data and there will also be a stream splitter able to create a whole topology of processing nodes. The control will be done via concert, but only control, meaning mere synchronization of devices and data processing. All the rest will be done remotely, most probably via remote addons implemented as Tango servers for easy RPC access to them. More detailed info and diagrams to follow...

Todo

  • Bring back PCO time stamp addon from master (it was it addons.py and that got deleted) into decentralize-rebased-onto-master
  • swap decentralize by decentralize-rebased-onto-master
  • Release last centralized concert version -> @tfarago
  • make a diagram depicting current situation (Acquisition sends data to a Processor which sends the result to Addon via a Proxy, so the addon doesn't have to care if the processing happens locally or remotely) -> @sarkarchandan
  • update the docs -> @tfarago
  • test local/remote acquisitions and consumers
  • More tests
  • update concert-examples -> @sarkarchandan
  • [ ] add some benchmarks -> @tfarago -> this is far from important right now, do it later
  • fix all TODOs
  • squash all SQUASH commits
  • Replace non-database Tango server run methods with the normal ones or make it a choice
  • Decide if we need ConcertAsyncioRunner
  • Allow the usage of a tango-DB (I (@MarcusZuber) could add this is you like)
  • Clean up await await from "Make walker a ..." 4149771
  • remove_all_endpoints: @tfarago
  • zmq stream timeouts if dead: @tfarago
  • remote addons unregister from camera when detached: @tfarago + @MarcusZuber
  • allow sessions to define if they can be run twice: @tfarago
  • clean up stuff-after-decentralize + flake8: @MarcusZuber
  • merge stuff-after-decentralize here

For discussion:

  • [] @MarcusZuber: The camera.convert is mostly useless now, since the remote camera can not apply it. I don't see a simple way to do the convert on the camera side. How about moving the convert to the viewer? It is mostly used by the "beamline user" to not get confused when the camera is mounted upside down.

@tfarago
Copy link
Contributor Author

tfarago commented Apr 7, 2022

This first commit enables us to have local and remote acquisitions and consumers. It's mostly about interfaces and defines a way how to create acquisitions and addons. It also introduces an explicit _Consumer class which encapsulates the consuming coroutine function which is way more explicit than the weird arbitrary passing of functions before.

@tfarago tfarago force-pushed the decentralize branch 2 times, most recently from b909e95 to ff891b3 Compare April 8, 2022 14:11
@MarcusZuber
Copy link
Member

How about introducing a RemoteCamera, that sends to the Consumers instead of the RemoteAcquisition? Then the Experiments would not be different for local and remote acquisition.
But splitting it like you did looks nicer from the code structure (but most probable I have to create a lot of Experiment classes with all the combinations of remote/local etc. then).

@tfarago
Copy link
Contributor Author

tfarago commented Apr 11, 2022

Why are you doing this to me?! :-) I will try.

@codecov
Copy link

codecov bot commented Jul 1, 2022

Codecov Report

Attention: Patch coverage is 62.59422% with 1191 lines in your changes missing coverage. Please review.

Project coverage is 81.21%. Comparing base (80883d7) to head (7e7c608).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
concert/ext/tangoservers/camera.py 0.00% 182 Missing ⚠️
concert/experiments/addons/tango.py 0.00% 131 Missing ⚠️
concert/ext/tangoservers/walker.py 0.00% 115 Missing ⚠️
concert/ext/tangoservers/reco.py 0.00% 110 Missing ⚠️
concert/experiments/addons/base.py 61.51% 107 Missing ⚠️
concert/experiments/dummy.py 0.00% 75 Missing ⚠️
concert/ext/tangoservers/base.py 0.00% 75 Missing ⚠️
concert/devices/cameras/dummy.py 52.42% 49 Missing ⚠️
concert/experiments/base.py 66.42% 46 Missing ⚠️
concert/experiments/addons/local.py 69.60% 38 Missing ⚠️
... and 18 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #481      +/-   ##
==========================================
- Coverage   88.41%   81.21%   -7.21%     
==========================================
  Files         121      134      +13     
  Lines        8340    10772    +2432     
==========================================
+ Hits         7374     8748    +1374     
- Misses        966     2024    +1058     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

)
await self._params[arg].set(kwargs[arg])

@property
Copy link
Member

@MarcusZuber MarcusZuber Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make all 'properties' concert Quantities/Parameters? Otherwise this will be confusing (since there is no async access, which is mandatory for the others to use).

concert/ext/ufo.py Outdated Show resolved Hide resolved
@@ -232,26 +233,39 @@ async def __call__(self, producer):


class GeneralBackprojectArgs(object):
Copy link
Member

@MarcusZuber MarcusZuber Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you already break (slightly) the interface to the reco-args: I always found it a bit artificial that the reco-args and the Reco were splitted (and not the args where properties of the reco-class). What do you think about merging both classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep it in mind, let me finish up the reco and the remote version of it and then I'll see if it will still be reasonable to merge.


def run_server():
TangoBenchmarker.run_server(
args=['name', '-ORBendPoint', 'giop:tcp::1235', '-v4', '-nodb', '-dlist', 'a/b/c'],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this everywhere to a proper DB-based start.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe run them all without the DB access? Then we can spawn them without adding them to the DB. This depends how "fixed" a pipeline will be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable on startup. So one must specify that the dev-server runs without db (-nodb) and a port, or the user must provide a class name (or automatically get it, since this should be only Writer or OnlineReco) and an instance name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a db is present we should also use device_properties for default settings (e.g. root folder for the writer).

@tfarago tfarago force-pushed the decentralize branch 3 times, most recently from 0c99603 to 3da5890 Compare July 8, 2022 12:11
@tfarago tfarago marked this pull request as draft July 12, 2022 06:48
Tango device for the dummy concert camera sending images over a zmq socket.
"""

endpoint = attribute(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcusZuber do you know why Tango doesn't like inheriting attributes? If you run the TangoDummyCamera server you can set the endpoint without any problem, but try reading it and it just crashes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into this. Could this be caused by the metaclass, that you define here and in the Device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original comment is pretty old and I am not sure it's a problem anymore.

self._params_initialized = True

def dynamic_getter(self, attr):
# If this were async def Tango would complain about it never being awaited
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the reasons I will not change the format strings + eval in the reco server right now.

@tfarago
Copy link
Contributor Author

tfarago commented Jul 29, 2022

@MarcusZuber in experiments.base and addons package is a draft of how I see things could go, if you are too bored in the next three weeks you can take a look, I will get back to it then.

tfarago added a commit that referenced this pull request Aug 26, 2022
which will be dealt with in PR #481.
@tfarago tfarago mentioned this pull request Aug 26, 2022
tfarago added a commit that referenced this pull request Sep 12, 2022
which will be dealt with in PR #481.
@@ -124,7 +126,10 @@ async def __ainit__(self, device, experiment, acquisitions=None, do_normalizatio
await TangoMixin.__ainit__(self, device)

# Lock the device to prevent other processes from using it
self._device.lock()
try:
self._device.lock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need await?

@MarcusZuber
Copy link
Member

Last weekend I was no able to get the remote continuous sprial CT to run. Somehow the stream was started twice and then everything was messed up. We need integration tests for all typical use cases of experiments. (I was also quite tired, so there is a chance that I did something stupid.)

@MarcusZuber
Copy link
Member

After working a few days with the remote-addons I have this suggestion (I needed to attach/detatch addons often in run-time, and keeping track of the enpoints is a bit annoying. Especially, since everything breaks if you run an experiment with not matching addons/enpoints):

The experiment has a defined property 'camera' or implements a register/unregister enpoint for the camera.
The remote-addons store a CommData for "their" stream and do the register/unresgister in the attach/detatch functions.

@MarcusZuber
Copy link
Member

About the tango-timeouts: I'm still favoring the buisy-wait a bit since it can handle network problems better than setting the timeout to infinity.
If we stay with the blocking-call we can use this command-call https://pytango.readthedocs.io/en/stable/client_api/device_proxy.html#tango.DeviceProxy.command_inout to specify the timeout only for one command call.

@MarcusZuber
Copy link
Member

I would like to have a camera.get_endpoints and/or camea.remove_all_endpints for "house-keeping". Also a way to stop a RemoteWalker when it is writing. Currently, when the camera crashes you have to kill the walker with -9 since there is no other way to get out of the blocking frame-catching.

@tfarago
Copy link
Contributor Author

tfarago commented Sep 13, 2024

About the tango-timeouts: I'm still favoring the buisy-wait a bit since it can handle network problems better than setting the timeout to infinity. If we stay with the blocking-call we can use this command-call https://pytango.readthedocs.io/en/stable/client_api/device_proxy.html#tango.DeviceProxy.command_inout to specify the timeout only for one command call.

Consensus: motors have timeouts, processing tango servers not and if this becomes a problem future us will take care of this.

@tfarago
Copy link
Contributor Author

tfarago commented Sep 13, 2024

I would like to have a camera.get_endpoints and/or camea.remove_all_endpints for "house-keeping". Also a way to stop a RemoteWalker when it is writing. Currently, when the camera crashes you have to kill the walker with -9 since there is no other way to get out of the blocking frame-catching.

I will do this on the uca-net and concert sides and also implement zmq timeouts to detect dead streams.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants