M. Sohaib Alam, Noah F. Berthusen, Peter P. Orth
Reinforcement learning has witnessed recent applications to a variety of tasks in quantum programming. The underlying assumption is that those tasks could be modeled as Markov Decision Processes (MDPs). Here, we investigate the feasibility of this assumption by exploring its consequences for two fundamental tasks in quantum programming: state preparation and gate compilation. By forming discrete MDPs, focusing exclusively on the single-qubit case (both with and without noise), we solve for the optimal policy exactly through policy iteration. We find optimal paths that correspond to the shortest possible sequence of gates to prepare a state, or compile a gate, up to some target accuracy. As an example, we find sequences of
This repository includes information, code, and data to generate the figures in the paper.
The main files that implement the algorithm detailed in the paper are described below. Generated data can be found in the results folder. The following files were designed to be run on a computing cluster and may need to be modified to run on other systems.
transitions.py
Generates conditional transition probability distributions for a noiseless MDP. The Bloch sphere discretization and the available actions can be specified.
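As a rough illustration (not the repository's actual interface), the sketch below builds a deterministic transition table for a single qubit: pure states are discretized on a (theta, phi) grid, each gate in an assumed {H, T} action set is applied, and the resulting state is snapped to the nearest grid point. The grid resolution and gate set here are placeholder assumptions.

```python
# Minimal sketch (not the repository's API): noiseless transition table for a
# single qubit whose pure states live on a discretized Bloch sphere.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)            # Hadamard gate
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])     # T gate
GATES = {"H": H, "T": T}

N_THETA, N_PHI = 16, 32  # hypothetical Bloch-sphere discretization

def ket(theta, phi):
    """Pure single-qubit state |psi(theta, phi)> on the Bloch sphere."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

def nearest_grid_index(psi, grid):
    """Snap a state to the closest grid point by fidelity |<grid|psi>|^2."""
    fidelities = np.abs(grid.conj() @ psi) ** 2
    return int(np.argmax(fidelities))

# Enumerate the discrete states.
thetas = np.linspace(0, np.pi, N_THETA)
phis = np.linspace(0, 2 * np.pi, N_PHI, endpoint=False)
states = np.array([ket(t, p) for t in thetas for p in phis])

# transitions[a][s] = s': applying gate a in state s lands (deterministically,
# since there is no noise) in the nearest discrete state s'.
transitions = {
    name: np.array([nearest_grid_index(gate @ psi, states) for psi in states])
    for name, gate in GATES.items()
}
print({name: table[:5] for name, table in transitions.items()})
```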
noisy_transitions.py
Same as above, but now a noise channel is allowed to affect the transitions.
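The key change in the noisy case is that the gate update acts on a density matrix and is followed by a noise channel. The sketch below assumes a single-qubit depolarizing channel of strength p purely for illustration; the noise model used in the paper may differ.

```python
# Minimal sketch (illustrative, not the repository's implementation): a noisy
# step replaces |psi> -> U|psi> with a channel acting on the density matrix.
import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

def depolarize(rho, p):
    """Depolarizing channel: rho -> (1-p) rho + (p/3)(X rho X + Y rho Y + Z rho Z)."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

def noisy_step(rho, gate, p=0.01):
    """Apply a unitary gate, then the assumed noise channel."""
    rho = gate @ rho @ gate.conj().T
    return depolarize(rho, p)

# Example: one noisy Hadamard applied to |0><0|.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)
print(noisy_step(rho0, H))
```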
iteration.py
Given a transition probability distribution and a goal state, runs policy iteration to find the optimal policy.
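Policy iteration here is the standard tabular algorithm: alternate exact policy evaluation with greedy policy improvement until the policy stops changing. The sketch below is a generic version with a state-only reward and a discount factor gamma as illustrative assumptions; it is not the code in iteration.py.

```python
# Minimal sketch of tabular policy iteration (textbook algorithm; the reward
# convention and variable names are assumptions, not the repository's code).
# P[a, s, s'] is the transition probability, R[s] a state reward, gamma the discount.
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R for the current policy.
        P_pi = P[policy, np.arange(n_states), :]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
        # Policy improvement: act greedily with respect to the evaluated V.
        Q = R[None, :] + gamma * P @ V          # shape (n_actions, n_states)
        new_policy = np.argmax(Q, axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Toy 2-state, 2-action example: action 1 always moves to the rewarded state 1.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([0.0, 1.0])
print(policy_iteration(P, R))
```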