Data processing #22
Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
@eggerdj Can you post some comments on the design and code examples of how this is intended to be used to make this easier to parse?
* Removed the node_type and root node.
* Amended tests accordingly.
Thanks @eggerdj, this looks really great. A few comments on implementation details.
* Adapted unit tests accordingly.
LGTM! Thanks for addressing all of my concerns and comments.
Co-authored-by: Will Shanks <wshaos@posteo.net>
Looks good. I have several suggestions that we can discuss more. My main ones are to do with making this perhaps a little more Functional (as in programming), so that individual nodes or the whole data processor could be used interchangeably with regular functions / callables in the code base. (Maybe this is similar to how numpy ufuncs are actually classes that store metadata about their input and output dimensions and dtypes.)
Just minor comments before approval, overall looking good.
* Carved out data processor from PR qiskit-community#20
  Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
* Added node_output property.
* Ran Black. Fixed unit tests.
* Added a better methodology for checking the input requirements.
* Lint fix.
* Added population shots fix and corresponding tests.
* Unified ToReal and ToImag.
* Reformatted the DataProcessor to a list of nodes rather than pointers. Removed the node_type and root node. Amended tests accordingly.
* Added history functionality to the data processor.
* Made history of data processor a property. Adapted unit tests accordingly.
* Fixed docstring.
* Added _process to the IQPart data actions.
* Changed node_output to a class variable.
* Added the option to initialize the DataProcessor with given DataActions.
* Removed Kernel and Discriminator. They will be for a future PR.
* Added docstring from Will.
  Co-authored-by: Will Shanks <wshaos@posteo.net>
* Moved docstring.
* Update qiskit_experiments/data_processing/base.py
  Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
* Update qiskit_experiments/data_processing/base.py
  Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
* Aligned code to _process.
* Made data processor callable.
* Renamed base.py to data_action.py.
* Made nodes callable.
* Removed history property, added call_with_history.
* Renamed Population to Probability.
* Metadata in processed_data.
* Refactored _process(Dict[str, Any]) -> Dict[str, Any] to _process(Any) -> Any.
* Added option to specify which nodes to include in the history.
* Update qiskit_experiments/data_processing/nodes.py
  Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
* Removed __init__ from DataAction.
* Added the option to turn off validation.
* Update qiskit_experiments/data_processing/data_processor.py
  Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
* Simplified validation of IQ data.
* Update qiskit_experiments/data_processing/nodes.py
  Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
* Update qiskit_experiments/data_processing/nodes.py
  Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
* Removed unnecessary wrapping of _process.
* Polished docstrings and ran black.
* Update qiskit_experiments/data_processing/data_action.py
  Co-authored-by: Will Shanks <wshaos@posteo.net>
* Removed unnecessary code in DataProcessingError.
* Rewrote doc string.
* IQ data is now of type float and not complex.
* Fixed validate issue.
* Added error message to __call__ and call_with_history.
* Improved docstring.
* Improved class docstring.
* Changed how DataProcessor._nodes are initialized in __init__.
* Changed behavior of empty data processor.
* Refactored call and call_with_history to use the call_internal function.
* Fixed lint, black, and docstrings.
* Update qiskit_experiments/data_processing/data_action.py
* Added type hint to call_with_history
* Update qiskit_experiments/data_processing/data_processor.py

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
Co-authored-by: Will Shanks <wshaos@posteo.net>
Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
Co-authored-by: Naoki Kanazawa nkanazawa1989@gmail.com
Summary
This PR introduces the data processing package which was carved out from PR #20.
Details and comments
Data processing comprises the steps required to prepare the measured data for analysis. This is done using a `DataProcessor`, which is a chain of `DataAction`s, i.e. transformations applied to the data in place. A user specifies the actions to apply to the data in order to process it. For example, a data processor can take level 0 data, apply a kernel to create IQ data, and then take the real part of this IQ data while scaling it by a factor of 1e-3. Similarly, a data processor can take IQ data as input, discriminate it into counts, and then convert these counts to a population.
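The first chain described above can be illustrated with a minimal, self-contained sketch. It does not use the classes from this PR: `SketchProcessor`, `mean_kernel`, `to_real`, and the method name `apply` are hypothetical stand-ins, the "kernel" is simply an average over each raw trace, and the key names `memory_raw` and `memory` are assumptions made for illustration (only `memory_real` is named in the description).

```python
from typing import Any, Callable, Dict, List

Data = Dict[str, Any]
Node = Callable[[Data], None]


class SketchProcessor:
    """Hypothetical stand-in for a chain of data actions (not the PR's class)."""

    def __init__(self) -> None:
        self._nodes: List[Node] = []

    def append(self, node: Node) -> None:
        """Add a transformation to the end of the chain."""
        self._nodes.append(node)

    def apply(self, data: Data) -> None:
        """Run every node in order; each node modifies `data` in place."""
        for node in self._nodes:
            node(data)


def mean_kernel(data: Data) -> None:
    """Toy kernel: average each shot's raw trace into one complex IQ point."""
    data["memory"] = [sum(trace) / len(trace) for trace in data["memory_raw"]]


def to_real(scale: float) -> Node:
    """Build a node that keeps the scaled real part of each IQ point."""

    def node(data: Data) -> None:
        data["memory_real"] = [scale * point.real for point in data["memory"]]

    return node


processor = SketchProcessor()
processor.append(mean_kernel)
processor.append(to_real(scale=1e-3))

# Two shots, each with a two-sample raw trace (made-up numbers).
data = {"memory_raw": [[1000 + 2000j, 3000 + 4000j], [500 - 500j, 1500 + 500j]]}
processor.apply(data)

# The intermediate IQ points and the final real parts are both kept in `data`.
print(data["memory"])
print(data["memory_real"])
```

Because every node writes its result under its own key, the intermediate IQ points remain available next to the final values after the chain has run.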
An instance of `DataProcessor` is then used to process the data of `exp_data`, an instance of `ExperimentData`. The data processor modifies `data`, an instance of `Dict[str, Any]`, in place. Each node in the processor looks in `data` for the type of data it uses as input. For example, the `Population()` node uses `data['counts']` to create populations, which it inserts into the data by doing `data['populations'] = ...`. This makes an instance of `DataProcessor` reusable on different input data. Furthermore, since the results of the different steps are stored in the data, the outcome of each processing step can easily be checked. Finally, each node in the `DataProcessor` defines the key under which it stores its output data, so the output data can easily be retrieved through the processor: `processor.output_key()` returns the output key of the last node in the processing chain. For the examples above this output key would be `memory_real` and `populations`, respectively.
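The key-based lookup and `output_key()` behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `SketchNode`, `ThresholdDiscriminator`, `SketchProcessor`, and the method name `apply` are hypothetical, the discriminator is a simple sign threshold on the real part, and the population is taken to be the fraction of shots counted as `"1"` for a single qubit.

```python
from typing import Any, Dict, List, Optional

Data = Dict[str, Any]


class SketchNode:
    """Hypothetical base class: a node names the key it reads and the key it writes."""

    input_key: str = ""
    output_key: str = ""

    def process(self, data: Data) -> None:
        raise NotImplementedError


class ThresholdDiscriminator(SketchNode):
    """Toy discriminator (assumption): a positive real part is counted as '1'."""

    input_key = "memory"
    output_key = "counts"

    def process(self, data: Data) -> None:
        counts = {"0": 0, "1": 0}
        for point in data[self.input_key]:
            counts["1" if point.real > 0 else "0"] += 1
        data[self.output_key] = counts


class Population(SketchNode):
    """Toy population (assumption): fraction of shots counted as '1'."""

    input_key = "counts"
    output_key = "populations"

    def process(self, data: Data) -> None:
        counts = data[self.input_key]
        data[self.output_key] = counts.get("1", 0) / sum(counts.values())


class SketchProcessor:
    """Hypothetical chain of nodes that modifies a data dictionary in place."""

    def __init__(self, nodes: List[SketchNode]) -> None:
        self._nodes = list(nodes)

    def output_key(self) -> Optional[str]:
        """Return the output key of the last node, or None for an empty chain."""
        return self._nodes[-1].output_key if self._nodes else None

    def apply(self, data: Data) -> None:
        for node in self._nodes:
            if node.input_key not in data:
                raise KeyError(f"missing input '{node.input_key}'")
            node.process(data)


processor = SketchProcessor([ThresholdDiscriminator(), Population()])

# The same processor is reused on two independent data dictionaries.
first = {"memory": [1 + 0j, -1 + 0j, 2 + 1j, 0.5 - 1j]}
second = {"memory": [-1 + 0j, -2 + 0j]}
for data in (first, second):
    processor.apply(data)
    print(processor.output_key(), data[processor.output_key()])
```

Since the intermediate `counts` entry stays in each dictionary, the outcome of the discrimination step can be inspected alongside the final population.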