
docs: Write bigger picture #232

Open · giulioungaretti opened this issue Jun 10, 2016 · 12 comments
Labels
docs Related to docs improvements

Comments

@giulioungaretti (Contributor)

We definitely need to fill in the docs in the guide before adding features. The reason for the p1 label is that, just by using qcodes, it can be hard to figure out what things do, why they are here, or why they are implemented a certain way.

See e.g. #223.

@giulioungaretti giulioungaretti added this to the Documentation milestone Jun 10, 2016
@MerlinSmiles (Contributor) commented Jun 10, 2016

Yes, I would love to see a big-picture explanation of the qcodes magics 😍
It will be crucial for getting new users on track.

I guess @alexcjohnson is the only one who can do this at the moment?

@alexcjohnson (Contributor)

I'm hoping @giulioungaretti can write it with me kibitzing; that way he learns the code better, we get a chance to talk about what might want to change, and we make sure that the explanations are clear to at least two people.

@giulioungaretti (Contributor, Author) commented Jun 10, 2016

Yeah @alexcjohnson, that is a good idea. Maybe you could start with an overview of the main ideas of the current design, so it's easier for everybody to just start diving into the code.
Then I will update and deepen it as I go. What do you think?

@giulioungaretti (Contributor, Author)

Actually, it would take me forever to figure out a lot of the whys just by looking at the code.
The what is rather easy, but a lot of decisions were taken that I don't think I can reconstruct from the code.

@giulioungaretti (Contributor, Author)

First list of whys for @alexcjohnson:

  • re: instrument/function
    • existence
    • use cases
    • how it differs from Parameter
  • re: command.py
    • why such a wrapper is required
    • example use cases
  • re: instrument/parameter.py
    • design of the functionality (what does it enable, etc.)
    • why a class not a property
    • example use cases

General:

  • overview of moving parts, who's responsible for what.

PS: Some of these whys could also be extracted from the code, but I think it's neater if we start with this maybe dumber/more verbose approach and slim it down later.
PPS: 🍦 🍦 🍦 🍦 🍦 🍦

@alexcjohnson (Contributor)

re: instrument/parameter

A Parameter is the interface to some state variable(s) of your system. Any quantity you want to set or measure as part of an experiment loop should be a Parameter. Most often a Parameter is connected to an Instrument, but they can also be independent, like "meta-parameters" that measure various values from various instruments and do some calculations to return a higher-level result.

Why a class rather than a property: Parameter combines the actions (get and/or set methods) to use it, plus information (label, units...) that associate meaning with the Parameter (and can be copied to the data structures we make from them). It also handles functionality like validating setpoints, creating sweeps, and (via DeferredOperations) callable calculated values. A property would handle get and set but would be awkward with the rest. There's also a philosophical argument that Python properties, as they look like they're just attributes, would not be expected to do something as active (and potentially time consuming) as calling out to hardware.

A Parameter that is connected to an instrument may have its get and/or set commands defined by a command string (to be sent to the instrument's communication channel) rather than by writing a completely new method. This means most parameters (anything that maps directly onto an instrument command) can be defined just by a self.add_parameter call in the instrument constructor.
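The actions-plus-metadata idea can be sketched in a few lines. This is a simplified standalone illustration of the pattern, not the real qcodes Parameter class; the `state` dict, the `freq` example, and the lambda-based validator are all invented for the sketch.

```python
# Simplified sketch (NOT the real qcodes API): a Parameter bundles get/set
# actions with metadata and setpoint validation.

class Parameter:
    def __init__(self, name, label=None, units='', get_cmd=None,
                 set_cmd=None, vals=None):
        self.name = name
        self.label = label or name   # metadata that travels into data structures
        self.units = units
        self._get_cmd = get_cmd      # callable (in qcodes, may be a command string)
        self._set_cmd = set_cmd
        self._vals = vals            # validator: callable returning bool

    def get(self):
        return self._get_cmd()

    def set(self, value):
        if self._vals is not None and not self._vals(value):
            raise ValueError(f'{value!r} is not a valid setpoint for {self.name}')
        self._set_cmd(value)


# Hypothetical usage: a parameter backed by a plain dict instead of hardware.
state = {'freq': 0.0}
freq = Parameter('frequency', label='Drive frequency', units='Hz',
                 get_cmd=lambda: state['freq'],
                 set_cmd=lambda v: state.update(freq=v),
                 vals=lambda v: 0 <= v <= 1e9)
freq.set(5e6)
print(freq.get())   # → 5000000.0
```

A property could only give you the get/set pair; the label, units, and validator riding along are what make the class worthwhile.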

re: instrument/function

A Function is the interface to some instrument capability that does not relate to a single state variable - like reset, or trigger. A Function only has one operation it supports: __call__, and it can't be used (directly) as a data source, so you might ask why you'd make a Function and not just a method of the instrument. In fact, there isn't a hard distinction between the two, but Function is primarily intended for actions that map directly onto instrument commands, so you can use a command string and avoid writing a new method at all.

Function does support callables for commands, but if you need to make a callable anyway you're probably better off just making a method in the first place; you don't gain much by wrapping it in a Function. Also, Function only supports a fixed number of positional arguments and no kwargs. But one thing you do gain from a Function over a plain method is simpler argument validation: just specify a Validator object for each argument and avoid the boilerplate of type and value testing.
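The fixed-positional-arguments-plus-validators idea looks roughly like this. Again a standalone sketch, not the actual qcodes Function class; the `trigger` example and plain-lambda validators are invented.

```python
# Simplified sketch (NOT the real qcodes API): a Function supports only
# __call__, with a fixed number of positional args, each checked by a validator.

class Function:
    def __init__(self, name, call_cmd, args=()):
        self.name = name
        self._call_cmd = call_cmd    # callable (in qcodes, may be a command string)
        self._arg_validators = args  # one validator per positional argument

    def __call__(self, *args):
        if len(args) != len(self._arg_validators):
            raise TypeError(f'{self.name} takes '
                            f'{len(self._arg_validators)} arguments')
        for validate, arg in zip(self._arg_validators, args):
            if not validate(arg):
                raise ValueError(f'invalid argument {arg!r} for {self.name}')
        return self._call_cmd(*args)


# Hypothetical usage: a 'trigger' that records how it was called.
log = []
trigger = Function('trigger', call_cmd=lambda n: log.append(n),
                   args=(lambda n: isinstance(n, int) and n > 0,))
trigger(3)
print(log)   # → [3]
```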

re: Command

Command is not really intended for end users - just used internally by Parameter and Function to abstract the creation of a callable from an input command that may either be a callable already, or a string, and if it's a string it may have other pieces for pre- or post-processing.
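A rough sketch of that abstraction, with invented names (`exec_str` stands in for the instrument's string channel; the real qcodes internals differ):

```python
# Sketch of the idea behind Command: normalize a callable OR a command
# string into one uniform callable.

class Command:
    def __init__(self, cmd, exec_str=None):
        if callable(cmd):
            self._call = cmd
        elif isinstance(cmd, str):
            # format the string with the call arguments, then hand it to the
            # channel's string executor (e.g. a VISA write)
            self._call = lambda *args: exec_str(cmd.format(*args))
        else:
            raise TypeError('cmd must be a callable or a string')

    def __call__(self, *args):
        return self._call(*args)


sent = []
write = sent.append                      # stand-in for an instrument channel
set_freq = Command('FREQ {}', exec_str=write)
set_freq(5e6)
print(sent)   # → ['FREQ 5000000.0']
```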

@giulioungaretti (Contributor, Author) commented Jun 18, 2016

Nice! Will start expanding on this soon.
Two more things that could be useful (maybe even to sketch?):

  • data flows
  • control flows

@giulioungaretti (Contributor, Author)

Also, do you think you could sketch a class diagram of some sort?

@alexcjohnson (Contributor)

Alright, let me try to describe some of these data and control flows.

Control flow when you create an Instrument

If you specify server_name=None

This makes a local instrument. So if you say instrument = MyInstrument(*args, server_name=None, **kwargs) the normal things happen:

  • instrument is created as an instance of MyInstrument, with all the methods in its and its superclasses' class definitions.
  • MyInstrument.__init__ adds parameters and functions and initializes the communication channel (in the main process), and the DelegateAttributes inheritance ensures that these are accessible as attributes of the instrument.
  • Most likely you can't run a background Loop with this instrument, because something in the communication channel will not be picklable to pass to the Loop. But even if you could, you shouldn't: any state held by the instrument (generally kept to mirror what we know is on the hardware after we've set or measured it) would be copied, and would therefore get out of sync with the state held in the main-process copy of the instrument.
  • This instance is recorded in MyInstrument.instances, and (if it's the most recent one) will be used if you ask to run a test suite for MyInstrument

If you specify a server, or use the default server

This makes a remote instrument. So if you say instrument = MyInstrument(*args, server_name=<'' (default) or 'some_string'>, **kwargs) things are more complicated:

  • instrument is created as an instance of RemoteInstrument (NOT MyInstrument even though that's what you asked to instantiate). RemoteInstrument does NOT inherit from Instrument or MyInstrument, and has very few methods and attributes of its own.
  • RemoteInstrument.__init__ looks for an InstrumentServerManager with the appropriate server_name.
    • If server_name is not specified, the class method MyInstrument.default_server_name is used to generate the name.
    • If a manager with this name is not found, one is created, which spawns a separate process that runs an InstrumentServer. The InstrumentServerManager talks to the InstrumentServer through a pair of multiprocessing.Queue objects. shared_kwargs are passed along to the InstrumentServer on process initialization, so they can contain some things (like Queues) that cannot normally be pickled. Thus the first instrument added to a given server must contain all the shared_kwargs that any instrument you will add to this server will need (but this can be a superset of what later instruments will need).
  • RemoteInstrument.__init__ asks the InstrumentServer (via its manager) to instantiate MyInstrument, with all the same args and kwargs as the initial construction except that it will get server_name=None.
  • So in the server process, an object is created identical to the local instrument you would have gotten. This happens in InstrumentServer.handle_new, and this object includes all the methods, parameters, and functions as a local one would have, and opens the communication channel.
  • InstrumentServer.handle_new calls Instrument.connection_attrs to enumerate the API of this instrument, including its methods, parameters, and functions. Each item in each of those three sets reports a list of attributes that should be proxied in the remote.
  • RemoteInstrument.__init__ then takes those three lists and attempts to recreate the API of the real instrument:
    • Each method of the real instrument becomes a RemoteMethod that you can call with instrument.<method_name>(*args, **kwargs) just as you would a local method. RemoteInstrument delegates method calls the same way Instrument delegates parameters and functions (and RemoteInstrument delegates its parameters and functions this way as well).
    • Each parameter of the real instrument becomes a RemoteParameter, that supports the normal methods of local parameters, and any (non-method) attributes of the real parameter get proxied so if you get (units = param.units), set (param.units = newunits) or delete (del param.units) them, you are really accessing the server copy of the parameter.
    • Each function of the real instrument becomes a RemoteFunction that you can call with instrument.<function_name>(*args) as you would a local function.
    • Some parts of the regular API are not explicitly proxied - this includes non-method attributes of the instrument and nonstandard methods of parameters (probably some standard ones too). You can get at any of these using NestedAttrAccess (one of the Instrument superclasses) whose methods getattr, setattr, callattr, and delattr get proxied to the instrument. For example, even though visa_handle itself cannot be sent back to the main process, you can call instrument.callattr('visa_handle.close')
    • Docstrings for remote components are not proxied (because their access is handled differently by Python and does not go through the normal attribute lookup process) but they are copied on instantiation and decorated with a note that this object is a remote copy.
  • So now as much as possible the API of this remote instrument matches the API of the real instrument it proxies, so you can use it just like you would the original. But you can also use it in a background Loop, and because the loop process and the main process both just ask the server process whenever they want to know state variables, there is nothing to get out of sync.
  • This instance IS recorded in MyInstrument.instances - even though it is NOT really an instance of MyInstrument, it's just mimicking one. Therefore it will still be used if you ask to run a test suite for MyInstrument - so if instrument test suites want to access any parts of the API which are NOT proxied, they should use the NestedAttrAccess interface.
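The proxying scheme above can be caricatured in-process. Everything here is simplified for illustration: the 'server' is just a local object and the queue round-trip is a direct call, so none of these class internals match the real qcodes implementation.

```python
# In-process sketch of the proxy idea: RemoteInstrument holds no real methods
# of MyInstrument; it builds one forwarding stub per name in the API the
# server reports. (The real qcodes version sends each call through
# multiprocessing Queues to the InstrumentServer process.)

class RemoteMethod:
    def __init__(self, name, ask_server):
        self._name = name
        self._ask_server = ask_server   # stand-in for the queue round-trip

    def __call__(self, *args, **kwargs):
        return self._ask_server(self._name, args, kwargs)

class RemoteInstrument:
    def __init__(self, server_instrument):
        # 'connection_attrs' step: enumerate the API of the real instrument
        api = [n for n in dir(server_instrument)
               if not n.startswith('_')
               and callable(getattr(server_instrument, n))]
        ask = lambda name, a, kw: getattr(server_instrument, name)(*a, **kw)
        for name in api:
            setattr(self, name, RemoteMethod(name, ask))

class MyInstrument:
    def identify(self):
        return 'MyInstrument v1'

remote = RemoteInstrument(MyInstrument())
print(remote.identify())   # → MyInstrument v1
```

Note that `remote` is not an instance of MyInstrument, exactly as the bullet above describes; it only mimics its API.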

@alexcjohnson (Contributor)

Control and data flow when you run a Loop

Defining the loop and actions

Before you run a measurement loop you do two things:

  1. You describe what parameter(s) to vary and how. This is the creation of a Loop object: loop = Loop(sweep_values, ...)
  2. You describe what to do at each step in the loop. This is loop.each(*actions) which converts the Loop object into an ActiveLoop object. Actions can be:
  • measurements (any object with a .get method will be interpreted as a measurement)
  • Task: some callable (which can have arguments with it) to be executed each time through the loop. Does not generate data.
  • Wait: a specialized Task just to wait a certain time.
  • BreakIf: some condition that, if it returns truthy, breaks (this level of) the loop
  • Another ActiveLoop to nest inside the outer one.
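The Loop-to-ActiveLoop split can be sketched as follows. This is a minimal illustration of the two-step pattern, not the qcodes API: the `Counter` class and the run loop body are invented, and the real classes handle data containers, nesting, and processes.

```python
# Sketch: Loop fixes the sweep; .each() fixes the per-step actions and
# returns an ActiveLoop that can be run.

class ActiveLoop:
    def __init__(self, sweep_values, actions):
        self.sweep_values = sweep_values
        self.actions = actions

    def run(self):
        data = []
        for value in self.sweep_values:
            row = [value]
            for action in self.actions:
                if hasattr(action, 'get'):   # anything with .get is a measurement
                    row.append(action.get())
                else:                        # Task/Wait/BreakIf-like callables
                    action()
            data.append(row)
        return data

class Loop:
    def __init__(self, sweep_values):
        self.sweep_values = sweep_values

    def each(self, *actions):
        return ActiveLoop(self.sweep_values, actions)

class Counter:                               # hypothetical measured parameter
    def __init__(self):
        self.n = 0
    def get(self):
        self.n += 1
        return self.n

# usage: sweep three setpoints, measure the counter, run a side-effect task
meter = Counter()
tasks = []
loop = Loop([0, 1, 2]).each(meter, lambda: tasks.append('tick'))
result = loop.run()
print(result)   # → [[0, 1], [1, 2], [2, 3]]
```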

Running the loop - preparation

When you run the loop, there are several arguments that determine if and how extra processes get used:

  • background: does the measurement itself run in a separate process? If your loop involves local instruments (either setting or getting values) then you almost certainly cannot run the loop in the background. Generally we'd like to run it in the background though, so the main process is not blocked, and we can do live plotting and analysis during data acquisition.
  • use_threads: not really about extra processes, but about threads, as the name says: if the loop actions contain several measurements back to back, we will attempt to parallelize them using threading.
  • data_manager: if not False, we use a separate process to hold the data, store it to disk, and provide it to other processes on demand. This offloads the IO tasks from the measurement process so it will run faster and more consistently, and allows live plotting in the main process without constantly reloading the data from disk.

The first thing the loop does is check if there's another background loop already running. If there is, this one blocks until that one finishes. This allows cheap queuing - you lose the benefits of running the measurement in the background (the main process is blocked by this) but you can sequence a whole bunch of loops in one execution cell, like we used to do with nightsweep procedures in Igor, and each one will run in turn.

Next, ActiveLoop.containers() inspects all the measurement actions to create the necessary DataArray(s) to hold the data we will generate. It uses the attributes .name or .names to determine how many arrays to make and what to call them, and the optional .shape or .shapes to figure out whether the values that will come back from .get() are single numbers or arrays, then adds a dimension for this measurement loop (if there are nested measurement loops, ActiveLoop.containers() is called recursively). For now we need to know the exact sizes of the parameters and the measurement loops, although the numpy.ndarray to hold this data is not created right away. Later on we should only create an array of the right dimensionality, and expand it as necessary to hold whatever data arrives.

Now we construct our DataSet by passing all of these DataArrays to new_data. Assuming
data_manager is not False, the new DataSet is created in PUSH_TO_SERVER mode. In this mode (see DataSet._init_push_to_server()), the DataArrays are not initialized - meaning we do not create a numpy.ndarray in them at all, so they do not take much memory and cannot actually hold any data. Instead, we ask the DataManager to copy the new DataSet to the DataServer, where it is created in LOCAL mode which does create the appropriate numpy.ndarrays, allocated to the full size, and filled with NaN. If instead data_manager=False, the DataSet is created in LOCAL mode and initialized right away. After this, the loop itself doesn't care whether a DataManager is being used, the DataSet handles that.

Next, if you're using background=True, the measurement process is started. This ends up copying all the parameters that are involved in the measurement (which is one reason it's important these are all proxies) as well as the DataSet to the new process. If background=False, the measurement loop starts and blocks the main process.

Running the loop - measurement

At the beginning of each level of the loop, the loop actions are 'compiled' into callables, so that each time through the loop you only need to call each of them, all the branching logic is done already. Compiling consists of:

  • collecting sequential measurements into a single _Measure object (so if you have two parameters in a row, they make one _Measure, but if you have a parameter, then a Task or something, then another parameter, they do not get combined).
  • making nested loops into callables that know about where we are in the loop right now.
  • converting a Wait into a Task that can watch the signal_queue for break signals, and will (eventually, when we get to it) be able to call the monitor in idle times.
  • passing Task and BreakIf through, as they're already the right kind of callable.
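The first compiling step, collecting sequential measurements, is essentially a run-grouping pass. A sketch with invented names (`compile_actions`, `Const`; the real qcodes `_Measure` does more, e.g. threading):

```python
# Sketch: consecutive measurement actions (things with .get) collapse into a
# single _Measure-like callable; other callables pass through unchanged.

def compile_actions(actions):
    compiled, run = [], []

    def flush():
        # close the current run of measurements into one combined callable
        if run:
            group = list(run)
            compiled.append(lambda g=group: [m.get() for m in g])
            run.clear()

    for action in actions:
        if hasattr(action, 'get'):
            run.append(action)   # gather into the current measurement group
        else:
            flush()              # a Task/BreakIf breaks the run
            compiled.append(action)
    flush()
    return compiled

class Const:                     # stand-in for a measurable parameter
    def __init__(self, v):
        self.v = v
    def get(self):
        return self.v

task = lambda: None
compiled = compile_actions([Const(1), Const(2), task, Const(3)])
# two parameters in a row make one group; the task splits off the third
print(len(compiled))   # → 3
print(compiled[0]())   # → [1, 2]
```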

Then the measurement loop starts and at each step we:

  • set the next sweep value
  • call DataSet.store to store the value you set (more below)
  • call each of the actions in turn.

For measurement actions:

  • If the loop was run with use_threads=True and there are multiple values to get, we start a separate thread to get each value. Otherwise we get them in order in the same thread. Threads only really run the measurements simultaneously if the instruments they call are (a) either local or in separate servers (each server is single-threaded and will not start processing one query until it is done with the previous), and (b) their interfaces do not inherently block each other.
  • calls DataSet.store to store the new measurements
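The use_threads=True idea, one thread per .get() so slow instruments can answer in parallel, can be sketched like this. The helper name `get_all_threaded` and the `Slow` class are invented for illustration.

```python
import threading
import time

def get_all_threaded(parameters):
    """Run every parameter's .get() in its own thread; keep result order."""
    results = [None] * len(parameters)

    def worker(i, param):
        results[i] = param.get()

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate(parameters)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

class Slow:                        # stand-in for a slow hardware query
    def __init__(self, value):
        self.value = value
    def get(self):
        time.sleep(0.05)           # simulated instrument response time
        return self.value

print(get_all_threaded([Slow(1), Slow(2)]))   # → [1, 2]
```

If both parameters talked to the same single-threaded server, the threads would simply queue up, which is exactly the caveat in the bullet above.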

DataSet.store

All measured and set values go through this routine to get entered into the DataSet. You could imagine having the setpoints already there, but this way we ensure that the stored data is what actually happened, not what we planned to happen, as well as supporting adaptive sweeps.

If you used a DataManager, DataSet.store proxies this call to the DataServer, where the exact same method call is made but on a LOCAL mode DataSet.

A single DataSet.store call can enter data in multiple DataArrays, as long as all of this data corresponds to the same array indices. Note that these indices need not be complete. For example, say you have one parameter 'a' that returns a scalar, and another 'b' that returns a 1D array, and you measure both inside a 1D loop. The array for 'a' will be 1D, and 'b' 2D, but the loop will make calls like:
data_set.store((3,), {'a': 9.9, 'b': [1, 2, 3, 4, 5]})
i.e. enter 9.9 in element 3 of array 'a', and fill row 3 of array 'b' with the given 1D array.
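With plain numpy arrays, that store call behaves like the following sketch (illustrative only, not the real DataSet internals): the same index tuple addresses one element of the 1D array and one whole row of the 2D array.

```python
import numpy as np

# containers as the loop would allocate them: full size, filled with NaN
arrays = {'a': np.full(5, np.nan),        # scalar measurement in a 1D loop
          'b': np.full((5, 5), np.nan)}   # 1D-array measurement in a 1D loop

def store(loop_indices, ids_values):
    """Enter values into several arrays at the same (partial) loop indices."""
    for array_id, value in ids_values.items():
        arrays[array_id][loop_indices] = value

store((3,), {'a': 9.9, 'b': [1, 2, 3, 4, 5]})
print(arrays['a'][3])   # → 9.9
print(arrays['b'][3])   # → [1. 2. 3. 4. 5.]
```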

Finally, subject to rate limiting, DataSet.store writes the updated data to disk.

DataSet.finalize() is called at the very end of the measurement, to ensure the DataSet is marked complete and any last data is written to disk.

Visualization and analysis

ActiveLoop.run() returns the DataSet it created. It is in somewhat of a different state depending on the flags you run with:

  • background=False, data_manager=False: the DataSet is already complete and filled with data by the measurement loop. You can immediately plot or analyze it, it will not change. The DataSet in the main process is the only copy that has ever existed (in memory... all of these options store the data to disk as it arrives).
  • background=False, data_manager=default: during acquisition the data is only kept in the copy on the DataServer, but at the end of the loop, the main process calls data_set.sync() that pulls all the data into the main process copy, and because the measurement is complete, sync() marks it as a LOCAL copy, so again you can immediately plot or analyze it, it will not change.
  • background=True, data_manager=default (this is the default): run() returns the DataSet while the measurement loop is running, but has marked it as PULL_FROM_SERVER and called sync() once to initialize the arrays, but they are mostly empty at that point. You can call sync() manually from time to time, or if you make a plot (qc.QTPlot or qc.MatPlot) in a jupyter notebook, these will automatically call sync() periodically to pull new data into the main process copy of the DataSet. When the measurement is finally done, sync() sees this and marks the DataSet as LOCAL.
  • background=True, data_manager=False: run() returns the DataSet while the measurement loop is running, but has marked it as LOCAL. I'm not sure why you would use this combination of modes; the only way to get the data into the main process is then to read it in from disk (where it was stored by the DataSet residing in the measurement process). As it is now, I don't think plots will automatically keep syncing in this situation, because the DataSet is not "live" on the DataServer. We could change this, based on file modification times, but it would still be awkward because you would need to reread the entire dataset with every sync.

@giulioungaretti (Contributor, Author)

@alexcjohnson re: Instrument, do you think you could write a little extra now that the proxy feature has been added? Like any gotchas, or what gets proxied where? (Kind of a slightly longer version of the pull request message in #244.)

@guenp guenp modified the milestones: V0.1, Documentation Jul 15, 2016
@astafan8 astafan8 added docs Related to docs improvements and removed enhancement p1 labels Aug 14, 2018