DOC: Using Zarr with GPU arrays #2574

Open · jhamman opened this issue Dec 18, 2024 · 14 comments
Labels: documentation (Improvements to the documentation), help wanted (Issue could use help from someone with familiarity on the topic)

Comments

@jhamman (Member) commented Dec 18, 2024

Describe the issue linked to the documentation

Zarr Python 3 has new support for working with CuPy arrays. The only problem is no one knows how to use it!

@akshaysubr / @madsbk / @jakirkham - would you be up for adding a section to the docs? See #2568 for an in-progress docs refactor that should make it easy to add a page on this.

Suggested fix for documentation

No response

@jhamman added the documentation and help wanted labels Dec 18, 2024
@akshaysubr (Contributor)

@jhamman I can work on this section. I'd like to keep it brief for now and expand it once we integrate GPU compression and storage solutions since that's when users will see the full benefit of this pipeline.

@jhamman (Member, Author) commented Jan 4, 2025

Sounds great @akshaysubr! The docs refactor landed this week, so you can pick this up and add something to the advanced topics section of the user guide.

@TomAugspurger (Contributor) commented Jan 21, 2025

I looked into this a bit today. I think there are ~3 things to implement / document:

  1. Using Zarr with GPU memory (CuPy arrays). This is implemented.
  2. GPU-accelerated I/O (GPU direct storage / kvikio). This isn't implemented yet.
  3. GPU-accelerated decompression / decoding. This isn't implemented yet.

I can put up a PR for the first one. But I wanted to confirm that using zarr.config.set({"buffer": "zarr.core.buffer.gpu.Buffer", "ndbuffer": "zarr.core.buffer.gpu.NDBuffer"}) is the recommended approach (edit: see below).
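
For reference, here is that config call as a runnable snippet (a minimal sketch; assumes zarr was installed with CuPy available so the GPU buffer classes import):

import zarr

# Point the default buffer prototype at the GPU implementations.
zarr.config.set({
    "buffer": "zarr.core.buffer.gpu.Buffer",
    "ndbuffer": "zarr.core.buffer.gpu.NDBuffer",
})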

If it is, does zarr.storage.GpuMemoryStore provide anything over MemoryStore? If I don't update the config to use GPU buffers, I think we get some weird behavior with GpuMemoryStore, where data is automatically converted from host to device memory on set, but converted back to host memory on get through the Zarr API:

In [1]: import zarr

In [2]: store = zarr.storage.GpuMemoryStore()

In [3]: z = zarr.create_array(store=store, shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
/home/coder/zarr-python/src/zarr/core/buffer/gpu.py:99: UserWarning: Creating a zarr.buffer.gpu.Buffer with an array that does not support the __cuda_array_interface__ for zero-copy transfers, falling back to slow copy based path
  return cls(buffer.as_array_like())

In [4]: z[:100, :100] = 1
/home/coder/zarr-python/src/zarr/core/buffer/gpu.py:99: UserWarning: Creating a zarr.buffer.gpu.Buffer with an array that does not support the __cuda_array_interface__ for zero-copy transfers, falling back to slow copy based path
  return cls(buffer.as_array_like())

In [5]: type(z[:1000, :1000])
Out[5]: numpy.ndarray

In [6]: type(store._store_dict["c/0/0"])
Out[6]: zarr.core.buffer.gpu.Buffer

I think that's not what we want. Using a zarr.storage.MemoryStore (with the updated buffer prototype) seems to give us what we want: data stored in device memory, and CuPy arrays when reading.

Edit: Reading through #2473 (comment), this might not be ideal:

Agree, you do not want to change the global default_buffer_prototype. This was never my intention; maybe we should rename it to cpu_buffer_prototype?

Since IIUC that will also put metadata in GPU memory. I'll do a bit more exploration.

@dcherian (Contributor)

So does the current implementation only include GpuMemoryStore? For some reason, I thought this was superseding Kvikio.

@TomAugspurger (Contributor)

Here's what currently works:

import cupy as cp

import zarr.api.asynchronous
import zarr.core.buffer.cpu
import zarr.core.buffer.gpu
import zarr.storage


async def main() -> None:
    src = cp.random.uniform(size=(100, 100))  # allocate on the device
    store = zarr.storage.LocalStore("data.zarr")
    z = await zarr.api.asynchronous.create_array(
        store=store, name="a", shape=src.shape, chunks=(10, 10), dtype=src.dtype, overwrite=True
    )

    await z.setitem(
        (slice(10), slice(10)), src[:10, :10], prototype=zarr.core.buffer.gpu.buffer_prototype
    )

    result = await z.getitem(
        (slice(10), slice(10)), prototype=zarr.core.buffer.gpu.buffer_prototype
    )
    print(type(result))  # <class 'cupy.ndarray'>

if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

There's also GpuMemoryStore, but right now I think that's mostly just a convenience that automatically moves incoming data to on-device memory.

Notably, this uses the prototype argument to getitem and setitem, and so is only available through the async API. I think we'll want a single configuration option to control this, so that it's usable through the rest of Zarr's API (I'll write up a separate issue for designing this).

For some reason, I thought this was superseding Kvikio.

I think kvikio will still be useful for users who want to read data directly into device memory (avoiding any intermediate allocations on the host before copying to the device). kvikio isn't currently compatible with zarr v3, but I'm hoping to look into that sometime.

@akshaysubr (Contributor)

Yes, the prototype argument to getitem and setitem is how to control what array type to use internally. This should automatically be changed to a CuPy array by setting the config default buffer and ndbuffer to the GPU versions.

from zarr.core.config import config
import zarr.core.buffer.gpu as gpu

# GPU buffers
gpu_buffer_mapping = {"buffer": gpu.Buffer, "ndbuffer": gpu.NDBuffer}

config.set(gpu_buffer_mapping)

This will then set the default_buffer_prototype to the GPU version and therefore all API calls should be able to automatically use it (assuming they are all using this method to get the prototype argument).
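
An end-to-end sketch of that flow through the synchronous API (assumes cupy and a CUDA device; the register calls mirror the working example further down the thread):

import cupy as cp

import zarr
import zarr.core.buffer.gpu
import zarr.storage
from zarr.registry import register_buffer, register_ndbuffer

# Register the GPU buffer classes and make them the configured defaults.
register_buffer(zarr.core.buffer.gpu.Buffer)
register_ndbuffer(zarr.core.buffer.gpu.NDBuffer)
zarr.config.set({"buffer": "zarr.core.buffer.gpu.Buffer"})
zarr.config.set({"ndbuffer": "zarr.core.buffer.gpu.NDBuffer"})

z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(100, 100),
    chunks=(10, 10),
    dtype="float64",
)
z[:10, :10] = cp.ones((10, 10))  # __setitem__ resolves the GPU prototype
print(type(z[:10, :10]))         # expected: cupy.ndarray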

@akshaysubr (Contributor)

Looks like Array.__setitem__ calls Array.set_basic_selection, which internally uses the default_buffer_prototype function to get the buffer classes set in the config. So just setting up the default configs should be sufficient:

prototype = default_buffer_prototype()
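
For reference, default_buffer_prototype is roughly this lookup (a simplified sketch; the real code lives in zarr.core.buffer and resolves through zarr.registry):

from zarr.core.buffer import BufferPrototype
from zarr.registry import get_buffer_class, get_ndbuffer_class

def default_buffer_prototype() -> BufferPrototype:
    # Return whichever classes the "buffer"/"ndbuffer" config keys
    # currently name in the registry.
    return BufferPrototype(buffer=get_buffer_class(), nd_buffer=get_ndbuffer_class())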

@madsbk (Contributor) commented Jan 21, 2025

The problem is that default_buffer_prototype also changes how the metadata is read, I think?
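
(For instance, the v3 metadata fetch is a plain store get whose prototype defaults to default_buffer_prototype(), so with the GPU config active the zarr.json bytes would land in device memory. This is the line patched in the diff further down:)

zarr_json_bytes = await (store_path / ZARR_JSON).get()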

@akshaysubr (Contributor)

Would it make sense to always use the numpy prototype for metadata? So essentially pass in zarr.core.buffer.cpu.numpy_buffer_prototype in this get call?

@TomAugspurger (Contributor)

Would it make sense to always use the numpy prototype for metadata?

That's what I'm leaning towards. The one thing I wanted to understand before suggesting that is the difference between BufferPrototype.buffer and BufferPrototype.nd_buffer. Is it possible that .buffer is intended to be used for metadata, and nd_buffer is intended for data chunks? I haven't had a chance to go through the usage and history yet.
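
For context, BufferPrototype itself is just the pair of classes (paraphrased from zarr.core.buffer):

from typing import NamedTuple

from zarr.core.buffer import Buffer, NDBuffer

class BufferPrototype(NamedTuple):
    buffer: type[Buffer]       # flat, bytes-like memory (metadata, encoded chunks)
    nd_buffer: type[NDBuffer]  # shaped, dtyped memory (decoded chunk data)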

@jhamman (Member, Author) commented Jan 21, 2025

Would it make sense to always use the numpy prototype for metadata? So essentially pass in zarr.core.buffer.cpu.numpy_buffer_prototype in this get call?

I would actually like to see us implement a set of metadata-specific methods on the store. I started down this path in #1851.

@TomAugspurger (Contributor)

Is it possible that .buffer is intended to be used for metadata, and .nd_buffer is intended for data chunks? I haven't had a chance to go through the usage and history yet.

I don't think this is an option. Some data buffers (those passing through codecs before the ArrayBytesCodec) will just be dealing with plain buffers rather than ndarrays.
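
Schematically, the v3 codec stages are (class names from zarr.abc.codec):

# ArrayArrayCodec: NDBuffer -> NDBuffer  (e.g. transpose)
# ArrayBytesCodec: NDBuffer -> Buffer    (the array-to-bytes step)
# BytesBytesCodec: Buffer   -> Buffer    (e.g. gzip, zstd)
#
# Compression therefore only ever sees flat Buffers, so both halves of
# the prototype have to be GPU-aware.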

@akshaysubr your suggestion works, with one other tweak to ensure that host memory is used when writing:

diff --git a/src/zarr/core/array.py b/src/zarr/core/array.py
index 632e8221..75bb344b 100644
--- a/src/zarr/core/array.py
+++ b/src/zarr/core/array.py
@@ -38,6 +38,7 @@ from zarr.core.buffer import (
     NDBuffer,
     default_buffer_prototype,
 )
+from zarr.core.buffer.cpu import buffer_prototype as cpu_buffer_prototype
 from zarr.core.chunk_grids import RegularChunkGrid, _auto_partition, normalize_chunks
 from zarr.core.chunk_key_encodings import (
     ChunkKeyEncoding,
@@ -168,7 +169,7 @@ async def get_array_metadata(
         if zarray_bytes is None:
             raise FileNotFoundError(store_path)
     elif zarr_format == 3:
-        zarr_json_bytes = await (store_path / ZARR_JSON).get()
+        zarr_json_bytes = await (store_path / ZARR_JSON).get(prototype=cpu_buffer_prototype)
         if zarr_json_bytes is None:
             raise FileNotFoundError(store_path)
     elif zarr_format is None:
@@ -1295,7 +1296,7 @@ class AsyncArray(Generic[T_ArrayMetadata]):
         """
         Asynchronously save the array metadata.
         """
-        to_save = metadata.to_buffer_dict(default_buffer_prototype())
+        to_save = metadata.to_buffer_dict(cpu_buffer_prototype)
         awaitables = [set_or_delete(self.store_path / key, value) for key, value in to_save.items()]
 
         if ensure_parents:
@@ -1307,7 +1308,7 @@ class AsyncArray(Generic[T_ArrayMetadata]):
                     [
                         (parent.store_path / key).set_if_not_exists(value)
                         for key, value in parent.metadata.to_buffer_dict(
-                            default_buffer_prototype()
+                            cpu_buffer_prototype
                         ).items()
                     ]
                 )

With that, we get cpu.Buffer used for metadata and gpu.Buffer used for data when users configure things properly.

import cupy as cp

import zarr.core.buffer.cpu
import zarr.core.buffer.gpu
import zarr.storage
from zarr.registry import (
    register_buffer,
    register_ndbuffer,
)

register_buffer(zarr.core.buffer.gpu.Buffer)
register_ndbuffer(zarr.core.buffer.gpu.NDBuffer)

zarr.config.set({"buffer": "zarr.core.buffer.gpu.Buffer"})
zarr.config.set({"ndbuffer": "zarr.core.buffer.gpu.NDBuffer"})

src = cp.random.uniform(size=(100, 100))  # allocate on the device
store = zarr.storage.MemoryStore()
z = zarr.create_array(
    store=store, name="a", shape=src.shape, chunks=(10, 10), dtype=src.dtype, overwrite=True
)

z[:10, :10] = src[:10, :10]  # errors without the diff above
assert isinstance(store._store_dict["zarr.json"], zarr.core.buffer.cpu.Buffer), type(store._store_dict["zarr.json"])
assert type(z[:10, :10]) is cp.ndarray

If we're OK with that (host memory for metadata, GPU memory for data if requested) then I can clean this up and make a PR.

I would actually like to see us implement a set of metadata-specific methods on the store.

I don't think we need GPU stuff to motivate that, but IMO it's probably worth doing.

@akshaysubr (Contributor)

@TomAugspurger This looks good. Would it also be useful to add a helper method to set up the zarr configs for the GPU? Something like zarr.config.set(default_gpu_configs) that does all of the setup? This would be useful when we have GPU codecs so the codec mapping can also be updated easily this way.
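
For example (a hypothetical sketch; enable_gpu is an illustrative name, not an existing API):

import zarr

def enable_gpu() -> None:
    # One switch that applies all the GPU defaults; GPU codec entries
    # could be added to the same dict later.
    zarr.config.set({
        "buffer": "zarr.core.buffer.gpu.Buffer",
        "ndbuffer": "zarr.core.buffer.gpu.NDBuffer",
    })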

@TomAugspurger (Contributor)

Would it also be useful to add a helper method to set up the zarr configs for the GPU?

Yeah, I think so. It wasn't obvious to me why you might need to update both BufferPrototype.buffer and .nd_buffer, so others might be confused as well. Having a single configuration switch for "do the right thing if you have a gpu" will be valuable I think.
