Data processing #22

Merged
merged 56 commits into from
Apr 23, 2021
Conversation

eggerdj
Contributor

@eggerdj eggerdj commented Mar 26, 2021

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>

Summary

This PR introduces the data processing package which was carved out from PR #20.

Details and comments

Data processing comprises the steps required to prepare measured data for analysis. It is performed by a DataProcessor, which is a chain of DataActions, i.e. transformations applied to the data in place. A user specifies which actions to apply to the data. For example, the code

processor = DataProcessor()
processor.append(Kernel(my_kernel))
processor.append(ToReal(scale=1e-3))

creates a data processor that takes level 0 data, applies a kernel to create IQ data, and then takes the real part of this IQ data, scaled by a factor of 1e-3. Similarly, the data processor

processor = DataProcessor()
processor.append(Discriminator(my_discriminator))
processor.append(Population())

would take IQ data as input, discriminate it into counts, and then convert these counts to a population.
An instance of DataProcessor is then used to process data, for example:

data = exp_data.data[0]
processor.format_data(data)

Here, exp_data is an instance of ExperimentData. The data processor modifies data, an instance of Dict[str, Any], in place. Each node in the processor looks in data for the type of data it takes as input. For example, the Population() node uses data['counts'] to compute populations, which it inserts into data by doing data['populations'] = .... Because the nodes only rely on keys in data, an instance of DataProcessor is reusable on different input data. Furthermore, since the result of each step is kept in data, we can easily check the outcome of each processing step. Finally, each node in the DataProcessor defines the key under which it stores its output, so the output data can be retrieved through the processor: processor.output_key() returns the output key of the last node in the processing chain. For the examples above, this output key would be memory_real and populations, respectively.
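The in-place, key-based design described above can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: the classes below are hypothetical stand-ins written only for illustration, the input key "memory" and the [I, Q] pair layout are assumptions, and Population assumes single-qubit counts keyed by "0" and "1". Only the names DataProcessor, append, format_data, output_key, and the keys counts, populations, and memory_real come from the description.

```python
from typing import Any, Dict, List


class DataAction:
    """Hypothetical base class: one in-place transformation of a data dict."""

    input_key: str = ""
    output_key: str = ""

    def process(self, data: Dict[str, Any]) -> None:
        raise NotImplementedError


class ToReal(DataAction):
    """Keep the real (I) part of IQ points, assumed stored as [I, Q] pairs."""

    input_key = "memory"  # assumption: the key holding the IQ data
    output_key = "memory_real"

    def __init__(self, scale: float = 1.0):
        self.scale = scale

    def process(self, data: Dict[str, Any]) -> None:
        data[self.output_key] = [self.scale * iq[0] for iq in data[self.input_key]]


class Population(DataAction):
    """Convert counts to the fraction of '1' outcomes (single-qubit assumption)."""

    input_key = "counts"
    output_key = "populations"

    def process(self, data: Dict[str, Any]) -> None:
        counts = data[self.input_key]
        shots = sum(counts.values())
        data[self.output_key] = counts.get("1", 0) / shots


class DataProcessor:
    """Hypothetical chain of DataActions applied in order to the same dict."""

    def __init__(self) -> None:
        self._nodes: List[DataAction] = []

    def append(self, node: DataAction) -> None:
        self._nodes.append(node)

    def output_key(self) -> str:
        # The key written by the last node in the chain.
        return self._nodes[-1].output_key

    def format_data(self, data: Dict[str, Any]) -> None:
        for node in self._nodes:
            if node.input_key not in data:
                raise KeyError(f"{type(node).__name__} needs data['{node.input_key}']")
            node.process(data)  # modifies data in place


# Counts -> populations; the input counts stay in the dict for inspection.
processor = DataProcessor()
processor.append(Population())
data = {"counts": {"0": 750, "1": 250}}
processor.format_data(data)
population = data[processor.output_key()]

# The same kind of processor works on IQ-like data with a different node.
iq_processor = DataProcessor()
iq_processor.append(ToReal(scale=1e-3))
iq_data = {"memory": [[1000.0, 2.0], [-500.0, 3.0]]}
iq_processor.format_data(iq_data)
real_part = iq_data[iq_processor.output_key()]
```

Since every node writes under its own key, intermediate results remain available in data after processing, which is what makes each step easy to check.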

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
@eggerdj eggerdj changed the title from "* Carved out data processor from PR #20" to "Data processing" on Mar 26, 2021
@chriseclectic
Collaborator

@eggerdj Can you post some comments on the design, along with code examples of how this is intended to be used, to make this easier to parse?

Collaborator

@nkanazawa1989 nkanazawa1989 left a comment


Thanks @eggerdj, this looks really great. A few comments on implementation details.

Resolved review threads (outdated): qiskit_experiments/data_processing/data_processor.py (3), qiskit_experiments/data_processing/nodes.py (4), qiskit_experiments/data_processing/base.py (1)
Collaborator

@nkanazawa1989 nkanazawa1989 left a comment


LGTM! Thanks for addressing all of my concerns and comments.

Resolved review threads (outdated): qiskit_experiments/data_processing/data_processor.py (3), qiskit_experiments/data_processing/nodes.py (3)
Co-authored-by: Will Shanks <wshaos@posteo.net>
Collaborator

@chriseclectic chriseclectic left a comment


Looks good. I have several suggestions that we can discuss more. My main ones concern making this perhaps a little more functional (in the programming sense), so that individual nodes or the whole data processor could be used interchangeably with regular functions / callables in the code base. (Maybe this is similar to how numpy ufuncs are actually classes that store metadata about their input and output dimensions and dtypes.)
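One way to read this suggestion, as a minimal sketch rather than the implementation that was merged: each node implements _process and exposes it through __call__, and the processor is itself a callable that threads a value through its steps, so plain functions and nodes can be mixed freely. The class bodies, the steps argument, and the [I, Q] pair layout below are assumptions made for illustration.

```python
from typing import Any, Callable, List, Optional


class DataAction:
    """Hypothetical callable node: subclasses implement _process."""

    def _process(self, datum: Any) -> Any:
        raise NotImplementedError

    def __call__(self, datum: Any) -> Any:
        return self._process(datum)


class ToReal(DataAction):
    """Return the scaled real (I) part of IQ points given as [I, Q] pairs."""

    def __init__(self, scale: float = 1.0):
        self.scale = scale

    def _process(self, datum: Any) -> Any:
        return [self.scale * i for i, _q in datum]


class DataProcessor:
    """Hypothetical callable chain; each step is a node or any other callable."""

    def __init__(self, steps: Optional[List[Callable[[Any], Any]]] = None):
        self._steps: List[Callable[[Any], Any]] = list(steps or [])

    def append(self, step: Callable[[Any], Any]) -> None:
        self._steps.append(step)

    def __call__(self, datum: Any) -> Any:
        # Pass the output of each step to the next one; nothing is mutated.
        for step in self._steps:
            datum = step(datum)
        return datum


# A node can be used on its own, exactly like a function.
to_real = ToReal(scale=2.0)
single = to_real([[1.5, 0.0]])

# A node and a builtin function are chained in the same processor.
processor = DataProcessor([ToReal(scale=2.0), sum])
total = processor([[1.0, 5.0], [2.0, 6.0]])
```

With this shape, a processor can be passed anywhere the code base accepts a callable, at the cost of no longer keeping intermediate results unless they are recorded separately.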

Resolved review threads (outdated): qiskit_experiments/data_processing/__init__.py (1), qiskit_experiments/data_processing/base.py (4), qiskit_experiments/data_processing/data_processor.py (2), qiskit_experiments/data_processing/nodes.py (3)
This was referenced Apr 18, 2021
Collaborator

@nkanazawa1989 nkanazawa1989 left a comment


Just minor comments before approval, overall looking good.

Resolved review threads (outdated): qiskit_experiments/data_processing/data_processor.py (3)
@coruscating coruscating merged commit d387f67 into qiskit-community:main Apr 23, 2021
@yaelbh yaelbh mentioned this pull request May 2, 2021
@eggerdj eggerdj mentioned this pull request May 16, 2021
4 tasks
@coruscating coruscating added this to the Release 0.1 milestone Jun 16, 2021
@eggerdj eggerdj deleted the data_processor branch July 16, 2021 13:50
paco-ri pushed a commit to paco-ri/qiskit-experiments that referenced this pull request Jul 11, 2022
* * Carved out data processor from PR qiskit-community#20

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>

* * Added node_output property.

* * Ran Black.
* Fixed unit tests.

* * Added a better methodology for checking the input requirements.

* * Lint fix.

* * Added population shots fix and corresponding tests.

* * Unified ToReal and ToImag.

* * Reformatted the DataProcessor to a list of nodes rather than pointers.
* Removed the node_type and root node.
* Amended tests accordingly.

* * Added history functionality to the data processor.

* * Made history of data processor a property.
* Adapted unit tests accordingly.

* * Fixed docstring.

* * Added _process to the IQPart data actions.

* * Changed node_output to a class variable.

* * Added the option to initialize the DataProcessor with given DataActions.

* * Removed Kernel and Discriminator. They will be for a future PR.

* * Added docstring from Will.

Co-authored-by: Will Shanks <wshaos@posteo.net>

* * Moved docstring.

* Update qiskit_experiments/data_processing/base.py

Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>

* Update qiskit_experiments/data_processing/base.py

Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>

* * Aligned code to _process.

* * Made data processor callable.

* * Renamed base.py to data_action.py.

* * Made nodes callable.

* * Removed history property, added call_with_history.

* * Renamed Population to Probability.

* * Metadata in processed_data.

* * Refactored _process(Dict[str, Any]) -> Dict[str, Any] to _process(Any) -> Any.

* * Added option to specify which nodes to include in the history.

* Update qiskit_experiments/data_processing/nodes.py

Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>

* * Removed __init__ from DataAction.

* * Added the option to turn off validation.

* Update qiskit_experiments/data_processing/data_processor.py

Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>

* * Simplified validation of IQ data.

* Update qiskit_experiments/data_processing/nodes.py

Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>

* Update qiskit_experiments/data_processing/nodes.py

Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>

* * Removed unnecessary wrapping of _process.

* * Polished docstrings and ran black.

* Update qiskit_experiments/data_processing/data_action.py

Co-authored-by: Will Shanks <wshaos@posteo.net>

* * Removed unnecessary code in DataProcessingError.

* * Rewrote doc string.

* * IQ data is now of type float and not complex.

* * Fixed validate issue.

* * Added error message to __call__ and call_with_history.

* * Improved docstring.

* * Improved class docstring.

* * Changed how DataProcessor._nodes are initialized in __init__.

* * Changed behavior of empty data processor.

* * Refactored call and call_with_history to use the call_internal function.

* * Fixed lint, black, and docstrings.

* Update qiskit_experiments/data_processing/data_action.py

* * Added type hint to call_with_history

* Update qiskit_experiments/data_processing/data_processor.py

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
Co-authored-by: Will Shanks <wshaos@posteo.net>
Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>

6 participants