QPager and approximation techniques
The QPager type from earlier development has been revived and better integrated, giving major performance boosts and typically 2 additional qubits of maximum GPU width on common 4-segment GPU RAM architectures. It is now part of the default layer stack, and it does not require a multi-device OpenCL environment to deliver a performance boost.
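The "2 additional qubits" figure follows from simple arithmetic if one assumes that a single allocation is limited to one quarter of device RAM and that paging lets the simulator use all four segments. The sketch below is illustrative only: the function name `max_qubits`, the 8 GiB device size, and the 8-byte amplitude size are assumptions made for the example, not values taken from Qrack.

```python
def max_qubits(available_bytes, bytes_per_amplitude=8):
    """Largest n such that 2**n amplitudes fit in available_bytes."""
    amplitudes = available_bytes // bytes_per_amplitude
    if amplitudes < 1:
        raise ValueError("not enough memory for a single amplitude")
    # bit_length() - 1 is floor(log2(amplitudes)) for positive integers.
    return amplitudes.bit_length() - 1


GIB = 1 << 30
device_ram = 8 * GIB   # hypothetical device size
segments = 4           # assumed: one allocation is capped at RAM / 4

single_segment = max_qubits(device_ram // segments)
paged = max_qubits(device_ram)
extra_qubits = paged - single_segment
```

Four segments hold four times as many amplitudes as one, and every doubling of the amplitude count corresponds to one more qubit, so `extra_qubits` is log2(4) = 2 regardless of the device size chosen.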
Besides the major improvements in exact techniques, we have also begun to expand into approximate simulation techniques. TrySeparate() has finally demonstrated returns under the right conditions: it permutes through X/Y/Z single-qubit bases and Bell pair bases as a "reactive" separability method (as opposed to the "proactive" methods QUnit has always prioritized). The rounding tolerance can be set via a constructor argument or universally overridden with the QRACK_QUNIT_SEPARABILITY_THRESHOLD environment variable.
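To make the idea of a tolerance-based separability check concrete, here is a minimal sketch in plain Python. It is not Qrack's implementation: the helper names (`resolve_threshold`, `bloch_vector`, `is_separable`) are hypothetical, and the test used here (a qubit of a pure state is separable when its Bloch vector in the X/Y/Z bases has length 1, up to the tolerance) is a standard textbook criterion chosen for illustration. Only the environment variable name comes from the text above; the precedence rule (environment value wins over the constructor value) mirrors the "universally overridden" wording.

```python
import math
import os

ENV_VAR = "QRACK_QUNIT_SEPARABILITY_THRESHOLD"


def resolve_threshold(constructor_value, environ=None):
    """Return the environment override if set, else the constructor value."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    return float(raw) if raw else constructor_value


def bloch_vector(state, target):
    """Bloch vector (x, y, z) of qubit `target` in a pure state vector.

    Amplitudes are indexed so that qubit k is bit k of the index.
    """
    rho00 = rho11 = 0.0
    rho01 = 0j
    mask = 1 << target
    for i, amp0 in enumerate(state):
        if i & mask:
            continue  # visit each (|0>, |1>) amplitude pair once
        amp1 = state[i | mask]
        rho00 += abs(amp0) ** 2
        rho11 += abs(amp1) ** 2
        rho01 += amp0 * amp1.conjugate()
    return (2 * rho01.real, -2 * rho01.imag, rho00 - rho11)


def is_separable(state, target, threshold):
    """True if the qubit's reduced state is pure to within `threshold`."""
    x, y, z = bloch_vector(state, target)
    return 1.0 - math.sqrt(x * x + y * y + z * z) <= threshold
```

With a small threshold, a product state such as |+>|0> passes the check for both qubits, while either qubit of a Bell pair has a zero-length Bloch vector and fails it. Raising the threshold trades accuracy for more aggressive separation, which is what makes the technique approximate.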
16-bit floating point accuracy is now supported for both CPU and GPU (when available for the OpenCL device and environment), and an FP16 build has been added to CI.
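The trade-off behind a 16-bit build can be illustrated without any simulator code. The sketch below uses the IEEE 754 binary16 format exposed by Python's `struct` module (format code `e`) to round a typical amplitude to half precision, and it computes state vector sizes under the assumption that each amplitude stores two real components. The helper names are hypothetical and nothing here reflects how Qrack stores amplitudes internally.

```python
import math
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def state_vector_bytes(qubits, bytes_per_real):
    """Bytes for 2**qubits complex amplitudes (two reals per amplitude)."""
    return (1 << qubits) * 2 * bytes_per_real


amplitude = 1 / math.sqrt(2)
half_amplitude = to_half(amplitude)
relative_error = abs(half_amplitude - amplitude) / amplitude

fp32_bytes = state_vector_bytes(20, 4)
fp16_bytes = state_vector_bytes(20, 2)
```

Half precision keeps roughly three decimal digits, so `relative_error` is bounded by 2**-11, while the state vector takes half the memory of a 32-bit build. Under these assumptions, the same memory budget holds one more qubit at FP16 than at FP32.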