
Add Jittor backend #210

xiazhuo opened this issue Apr 19, 2024 · 7 comments

xiazhuo commented Apr 19, 2024

Issue Description

The TensorCircuit library currently supports the PyTorch and TensorFlow backends. We are interested in extending this support to include Jittor, a deep learning framework developed at Tsinghua University that relies on just-in-time (JIT) compilation. Jittor is promising but still in its early stages, particularly in its support for complex numbers.

To facilitate the integration of Jittor with TensorCircuit, we need to identify the specific complex-number functionalities that are essential. Given that tensorcircuit.backends already provides a versatile API compatible with NumPy, JAX, TensorFlow, and PyTorch, I am optimistic that Jittor can be included with relative ease once these complex-number capabilities are in place.
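For orientation, below is a very rough, hypothetical sketch of the direction a Jittor backend could take, loosely mirroring the layout of the existing modules in tensorcircuit/backends/. The class name, the use of a plain class instead of the package's actual abstract backend base class, and the choice of methods are illustrative assumptions only; the missing complex-number support is exactly the gap this issue is about.

    import numpy as np
    import jittor as jt

    # Hypothetical skeleton only: a real backend would subclass the package's
    # abstract backend class and implement the full method list discussed in
    # the replies below.
    class JittorBackend:
        name = "jittor"

        def convert_to_tensor(self, a):
            return jt.array(np.asarray(a))

        def matmul(self, a, b):
            return jt.matmul(a, b)

        def conj(self, a):
            # blocked on complex-number support in Jittor, the main gap
            # discussed in this issue
            raise NotImplementedError("requires complex support in Jittor")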

Contribution and Collaboration

Could you provide some insights or suggestions on the critical complex number functionalities that Jittor needs to support for a seamless integration with TensorCircuit? Your expertise and suggestions will be invaluable as we work towards this extension.

I am eager to contribute to this development and would greatly appreciate your guidance and collaboration.

Additional References

Jittor: https://cg.cs.tsinghua.edu.cn/jittor/

@refraction-ray (Contributor) commented:

For the list of functionalities that a backend framework has to support, try:

    [s for s in dir(tc.backend) if not s.startswith("_")]

The critical operations include support for complex-valued matrix multiplication, addition, inversion, eigendecomposition, QR decomposition, and singular value decomposition. In addition, automatic differentiation for complex operations and complex inputs is subtle: the gradients and derivatives can differ by complex conjugations between frameworks.
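To make the conjugation subtlety concrete, a quick check along the lines of the sketch below (a standalone snippet, not part of the library or its test suite) can be run against the existing backends. For f(z) = sum(|z|^2) the Wirtinger derivative with respect to z is conj(z); different frameworks may report the gradient with or without an extra conjugation, so the printed values should be compared with that in mind.

    import numpy as np
    import tensorcircuit as tc

    # Compare complex-gradient conventions across backends for a real-valued
    # function of a complex input, f(z) = sum(z * conj(z)) = sum(|z|^2).
    def f(z):
        return tc.backend.real(tc.backend.sum(z * tc.backend.conj(z)))

    for name in ["tensorflow", "jax"]:
        tc.set_backend(name)
        z = tc.backend.convert_to_tensor(np.array([1.0 + 2.0j], dtype=np.complex64))
        print(name, tc.backend.grad(f)(z))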


xiazhuo commented May 7, 2024

Do eig, qr, and svd need to support backpropagation? We noticed that the backward pass of the complex eig decomposition in PyTorch is ill-conditioned.

@refraction-ray (Contributor) commented:

Yes, they need to support AD. For numerical stability, we can further customize their AD rules; see https://github.com/tencent-quantum-lab/tensorcircuit/blob/master/tensorcircuit/backends/pytorch_ops.py
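For reference, the pattern in that file is the standard torch.autograd.Function mechanism. A minimal sketch follows, using the complex matrix inverse as an illustrative stand-in; the actual eig/qr/svd rules in pytorch_ops.py are more involved, precisely to deal with issues like the ill-conditioning mentioned above.

    import torch

    # Sketch of customizing an AD rule via torch.autograd.Function, the same
    # mechanism used in tensorcircuit/backends/pytorch_ops.py.  The matrix
    # inverse here is only a simple stand-in for eig/qr/svd.
    class Inv(torch.autograd.Function):
        @staticmethod
        def forward(ctx, a):
            inv = torch.linalg.inv(a)
            ctx.save_for_backward(inv)
            return inv

        @staticmethod
        def backward(ctx, grad_out):
            (inv,) = ctx.saved_tensors
            # from d(A^{-1}) = -A^{-1} dA A^{-1}, the reverse-mode rule is
            # grad_A = -A^{-H} @ grad_out @ A^{-H}
            inv_h = inv.conj().transpose(-2, -1)
            return -inv_h @ grad_out @ inv_h

    a = torch.randn(3, 3, dtype=torch.complex64, requires_grad=True)
    Inv.apply(a).abs().sum().backward()
    print(a.grad)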


xiazhuo commented Jun 15, 2024

Hello,

I'm currently contributing to the project and attempting to set up my local development environment. I've encountered some issues while running the tests using pytest. Below are the details of my environment and the steps I've taken:

  • OS: Ubuntu 18.04
  • Python Version: 3.10.14
  • GPU Driver Version: 550.67

Steps Taken:

  1. Cloned the latest version of the repository.

  2. Followed the latest GitHub CI configuration for "test (ubuntu-20.04, 3.10)"

  3. Installed dependencies using the following commands:

    python -m pip install --upgrade pip
    pip install --no-cache-dir -r requirements/requirements.txt
    pip install --no-cache-dir -r requirements/requirements-extra.txt
    pip install --no-cache-dir -r requirements/requirements-dev.txt
    pip install --no-cache-dir -r requirements/requirements-types.txt
  4. Set the following environment variables:

    export XLA_PYTHON_CLIENT_PREALLOCATE=false
    export TF_FORCE_GPU_ALLOW_GROWTH=true
  5. Ran the tests using:

    pytest --cov=tensorcircuit --cov-report=xml -svv --benchmark-skip

Issue:

Despite following these steps, I encountered multiple errors during the test execution. Below are the relevant parts of the error logs:

2024-06-15 22:59:56.019344: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-15 22:59:56.019428: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-15 22:59:56.021177: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-15 22:59:57.493948: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
...
================================================ test session starts ================================================
platform linux -- Python 3.10.14, pytest-6.2.4, py-1.11.0, pluggy-0.13.1 -- /home/xiazhuo/.miniconda3/envs/jittorquantum/bin/python
cachedir: .pytest_cache
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /mnt/nas2/home/xiazhuo/tensorcircuit, configfile: pytest.ini
plugins: anyio-4.4.0, xdist-3.5.0, lazy-fixture-0.6.3, cov-5.0.0, benchmark-4.0.0
collecting ... 2024-06-15 23:00:07.331048: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.331783: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.332353: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.332898: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.333541: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
collected 580 items / 1 skipped / 579 selected                                                                       
.......
.......
.......
---------- coverage: platform linux, python 3.10.14-final-0 ----------
Coverage XML written to file coverage.xml

============================================== short test summary info ==============================================

FAILED tests/test_backends.py::test_device_cpu_gpu[jaxb] - RuntimeError: Unknown backend: 'gpu' requested, but no ...

FAILED tests/test_backends.py::test_qr[torchb] - RuntimeError: clone is not supported by NestedIntSymNode

FAILED tests/test_backends.py::test_optimizers[torchb] - AttributeError: partially initialized module 'torch._dyna...

FAILED tests/test_circuit.py::test_circuit_inverse_2[npb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'p...

FAILED tests/test_circuit.py::test_circuit_inverse_2[tfb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'p...

FAILED tests/test_circuit.py::test_circuit_inverse_2[jaxb] - qiskit.exceptions.MissingOptionalLibraryError: "The '...

FAILED tests/test_circuit.py::test_draw_cond_measure - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylate...

FAILED tests/test_circuit.py::test_circuit_to_json[npb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'pyl...

FAILED tests/test_circuit.py::test_circuit_to_json[tfb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'pyl...

FAILED tests/test_circuit.py::test_circuit_to_json[jaxb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'py...

FAILED tests/test_circuit.py::test_to_openqasm - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatexenc' ...

FAILED tests/test_circuit.py::test_initial_mapping - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatexe...

FAILED tests/test_compiler.py::test_qsikit_compiler - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatex...

FAILED tests/test_compiler.py::test_composed_compiler - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylat...

FAILED tests/test_compiler.py::test_replace_r - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatexenc' l...

FAILED tests/test_compiler.py::test_default_compiler - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylate...

FAILED tests/test_dmcircuit.py::test_dm_circuit_draw - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylate...

FAILED tests/test_interfaces.py::test_dlpack_transformation[tfb] - jaxlib.xla_extension.XlaRuntimeError: INVALID_A...

FAILED tests/test_shadows.py::test_jit[tfb] - tensorflow.python.framework.errors_impl.ResourceExhaustedError: Grap...

======================== 19 failed, 543 passed, 17 skipped, 2 xfailed in 1295.04s (0:21:35) =========================

Request:

Could you please help me identify what might be causing these issues and how I can resolve them? Any guidance on additional steps or configurations that I might need to set up would be greatly appreciated.

If you need more information about my configuration and the full logs, please let me know.

Thank you for your assistance and for all your hard work on this project!

@refraction-ray (Contributor) commented:

It seems these errors are from different sources:

  • qiskit.exceptions.MissingOptionalLibraryError: "The 'pyl...": these are due to the missing package pylatexenc; installing it with pip will resolve most of the errors.

  • RuntimeError: clone is not supported by NestedIntSymNode and AttributeError: partially initialized module 'torch._dyna...: these two seem related to an incompatibility with torch>=2.3; you can try lowering the PyTorch version.

For the remaining errors, I would like to see the full exception and error output to figure out their source. I guess some of them might be related to breaking changes in the device management APIs of these ML packages, since GPU-related code is not tested on GitHub CI.


xiazhuo commented Jun 16, 2024

Thank you very much for your patience! Following your previous suggestions, I have installed the pylatexenc package and downgraded torch to version 2.1. However, I am still encountering several errors. Below is the error output:

====================================================== FAILURES =======================================================
______________________________________________ test_device_cpu_gpu[jaxb] ______________________________________________
backend = None

    @pytest.mark.skipif(
        len(tf.config.list_physical_devices()) == 1, reason="no GPU detected"
    )
    @pytest.mark.parametrize("backend", [lf("tfb"), lf("jaxb"), lf("torchb")])
    def test_device_cpu_gpu(backend):
        a = tc.backend.ones([])
>       a1 = tc.backend.device_move(a, "gpu:0")

tests/test_backends.py:330: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tensorcircuit/backends/jax_backend.py:639: in device_move
    dev = self._str2dev(dev)
tensorcircuit/backends/jax_backend.py:654: in _str2dev
    return libjax.devices("gpu")[_id]
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:1077: in devices
    return get_backend(backend).devices()
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:1011: in get_backend
    return _get_backend_uncached(platform)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:992: in _get_backend_uncached
    platform = canonicalize_platform(platform)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
platform = 'gpu'

    def canonicalize_platform(platform: str) -> str:
      """Replaces platform aliases with their concrete equivalent.
    
      In particular, replaces "gpu" with either "cuda" or "rocm", depending on which
      hardware is actually present. We want to distinguish "cuda" and "rocm" for
      purposes such as MLIR lowering rules, but in many cases we don't want to
      force users to care.
      """
      platforms = _alias_to_platforms.get(platform, None)
      if platforms is None:
        return platform
    
      b = backends()
      for p in platforms:
        if p in b.keys():
          return p
>     raise RuntimeError(f"Unknown backend: '{platform}' requested, but no "
                         f"platforms that are instances of {platform} are present. "
                         "Platforms present are: " + ",".join(b.keys()))
E     RuntimeError: Unknown backend: 'gpu' requested, but no platforms that are instances of gpu are present. Platforms present are: cpu

/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:793: RuntimeError
___________________________________________ test_dlpack_transformation[tfb] ___________________________________________
backend = None

    @pytest.mark.parametrize("backend", [lf("tfb"), lf("jaxb"), lf("torchb")])
    def test_dlpack_transformation(backend):
        blist = ["tensorflow", "jax"]
        if is_torch is True:
            blist.append("pytorch")
        for b in blist:
>           ans = tc.interfaces.general_args_to_backend(
                args=tc.backend.ones([2], dtype="float32"),
                target_backend=b,
                enable_dlpack=True,
            )

tests/test_interfaces.py:363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tensorcircuit/interfaces/tensortrans.py:136: in general_args_to_backend
    return backend.tree_map(target_backend.from_dlpack, caps)
tensorcircuit/backends/abstract_backend.py:841: in tree_map
    return tf.nest.map_structure(f, *pytrees)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest.py:631: in map_structure
    return nest_util.map_structure(
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest_util.py:1066: in map_structure
    return _tf_core_map_structure(func, *structure, **kwargs)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest_util.py:1106: in _tf_core_map_structure
    [func(*x) for x in entries],
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest_util.py:1106: in <listcomp>
    [func(*x) for x in entries],
tensorcircuit/backends/jax_backend.py:434: in from_dlpack
    return jax.dlpack.from_dlpack(a)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/dlpack.py:278: in from_dlpack
    return _legacy_from_dlpack(external_array, device, copy)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dlpack = <capsule object "dltensor" at 0x7f271efb9c80>, device = None, copy = None

    def _legacy_from_dlpack(dlpack, device: xla_client.Device | None = None,
                            copy: bool | None = None):
      preferred_platform = getattr(device, "platform", None)
      if device and preferred_platform == "gpu":
        preferred_platform = "cuda" if "cuda" in device.client.platform_version else "rocm"
    
      cpu_backend = xla_bridge.get_backend("cpu")
      gpu_backend = None
    
      if preferred_platform in {"cuda", "rocm"}:
        try:
          gpu_backend = xla_bridge.get_backend(preferred_platform)
        except RuntimeError:
          raise TypeError(
            f"A {str.upper(preferred_platform)} device was specified, however no "
            f"{str.upper(preferred_platform)} backend was found."
          )
    
      if preferred_platform is None:
        try:
          gpu_backend = xla_bridge.get_backend("cuda")
        except RuntimeError:
          pass
        # Try ROCm if CUDA backend not found
        if gpu_backend is None:
          try:
            gpu_backend = xla_bridge.get_backend("rocm")
          except RuntimeError:
            pass
    
>     _arr = jnp.asarray(xla_client._xla.dlpack_managed_tensor_to_buffer(
          dlpack, cpu_backend, gpu_backend)) # type: ignore
E     jaxlib.xla_extension.XlaRuntimeError: INVALID_ARGUMENT: DLPack tensor is on GPU, but no GPU backend was provided.
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/dlpack.py:195: XlaRuntimeError

---------- coverage: platform linux, python 3.10.14-final-0 ----------
Coverage XML written to file coverage.xml

================================================ short test summary info ================================================
FAILED tests/test_backends.py::test_device_cpu_gpu[jaxb] - RuntimeError: Unknown backend: 'gpu' requested, but no plat...
FAILED tests/test_interfaces.py::test_dlpack_transformation[tfb] - jaxlib.xla_extension.XlaRuntimeError: INVALID_ARGUM...
=========================== 2 failed, 560 passed, 17 skipped, 2 xfailed in 1168.15s (0:19:28) ===========================

Additionally, if it is convenient, could you update the contribution guidelines and the requirements files to reflect the new steps and dependencies for setting up the environment?

@refraction-ray (Contributor) commented:

The above errors seem to come from a misconfiguration of jax+GPU, i.e. the installed jax somehow does not have a properly configured GPU backend.
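A quick way to confirm whether the installed jax sees the GPU at all (a standalone check; a CUDA-enabled jaxlib, e.g. installed via one of the jax[cuda...] extras, is assumed for a passing result):

    import jax

    # If only CPU devices are listed here, jaxlib was installed without CUDA
    # support and the GPU-related failures above are expected.
    print(jax.devices())
    print(jax.default_backend())  # typically "gpu" or "cuda" on a working CUDA install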

> Additionally, if it is convenient, could you update the contribution guidelines and the requirements files to reflect the new steps and dependencies for setting up the environment?

Will do, thanks for the advice.
