allow ReferenceFileSystem to hold dicts, which are treated as JSON files #1562

bendichter · 2024-04-04T02:15:05Z

Currently, when a ReferenceFileSystem wants to create an inline JSON file, the value needs to be a JSON string, e.g.

{
  "version": 1,
  "refs": {
    ".zgroup": "{\"zarr_format\": 2}",
    "data/.zarray": "{\"chunks\": [100, 100], \"compressor\": null, \"dtype\": \"<i8\", \"fill_value\": null, \"filters\": null, \"order\": \"C\", \"shape\": [100, 100], \"zarr_format\": 2}",
    "data/.zattrs": "{\"_ARRAY_DIMENSIONS\": [\"a\", \"b\"]}",
    "data/0.0": [
      "example4.h5",
      2048,
      80000
    ]
  }
}

The proposed change allows these JSON strings to be dicts instead, so the RFS could be written as:

{
  "version": 1,
  "refs": {
    ".zgroup": {
      "zarr_format": 2
    },
    "data/.zarray": {
      "chunks": [
        100,
        100
      ],
      "compressor": null,
      "dtype": "<i8",
      "fill_value": null,
      "filters": null,
      "order": "C",
      "shape": [
        100,
        100
      ],
      "zarr_format": 2
    },
    "data/.zattrs": {
      "_ARRAY_DIMENSIONS": [
        "a",
        "b"
      ]
    },
    "data/0.0": [
      "example4.h5",
      2048,
      80000
    ]
  }
}

This makes the references easier to read, write, and manipulate, and lets JSON-specific search tools work on them directly.
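To illustrate the idea of treating dict values as JSON files, here is a minimal sketch in plain Python. It is not the fsspec implementation: `normalize_refs` is a hypothetical helper name, and the assumption is simply that a dict value is serialized with `json.dumps` when the references are processed, while strings and `[url, offset, length]` lists pass through unchanged.

```python
import json


def normalize_refs(refs):
    """Return a copy of refs with dict values serialized to JSON strings.

    Hypothetical helper for illustration only, not part of fsspec.
    """
    out = {}
    for key, value in refs.items():
        if isinstance(value, dict):
            # A dict stands for an inline JSON file, so serialize it.
            out[key] = json.dumps(value)
        else:
            # Inline strings and [url, offset, length] lists pass through.
            out[key] = value
    return out


refs = {
    ".zgroup": {"zarr_format": 2},
    "data/.zattrs": {"_ARRAY_DIMENSIONS": ["a", "b"]},
    "data/0.0": ["example4.h5", 2048, 80000],
}

# With dict values, metadata can be edited without parsing a string first.
refs["data/.zattrs"]["_ARRAY_DIMENSIONS"] = ["x", "y"]

normalized = normalize_refs(refs)
```

After normalization every metadata entry is a JSON string again, which is the form the existing string-based code path already expects.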

bendichter · 2024-04-04T02:27:41Z

This would clearly benefit from tests, but I just wanted to see if there was interest in supporting this feature before continuing.

rly · 2024-04-04T02:38:53Z

The Neurodata Without Borders project is moving toward using Zarr and ReferenceFileSystem for accessing large-scale neurophysiology data stored in the cloud and locally. This feature would make reading/inspecting, writing, editing, and querying these data much easier.

martindurant · 2024-04-04T15:03:48Z

I am fine with this, but a test would be nice. Since it only activates for the small JSON metadata files within a zarr dataset, the cost at runtime should be minimal. I suppose this JSON-as-dict representation doesn't survive loading into ReferenceFileSystem and saving again; preserving it on save could be a valid option you might want to provide.
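The round-trip concern above could be addressed by an optional step at save time. The following is only a sketch under stated assumptions: `inline_json_as_dicts` is a hypothetical helper, not an existing fsspec function, and it assumes that any string value which parses to a JSON object should be written back as a dict.

```python
import json


def inline_json_as_dicts(refs):
    """Return a copy of refs where JSON-object strings become dicts.

    Hypothetical helper for illustration only, not part of fsspec.
    """
    out = {}
    for key, value in refs.items():
        if isinstance(value, str):
            try:
                parsed = json.loads(value)
            except ValueError:
                # Not JSON (e.g. plain or encoded inline data): keep as-is.
                parsed = None
            if isinstance(parsed, dict):
                out[key] = parsed
                continue
        out[key] = value
    return out


stored = {
    ".zgroup": '{"zarr_format": 2}',
    "data/0.0": ["example4.h5", 2048, 80000],
    "note": "plain text",
}
restored = inline_json_as_dicts(stored)
```

One caveat of this approach: an inline data chunk that happens to contain a JSON object would also be converted, so a real option might restrict the conversion to known metadata keys such as `.zgroup`, `.zarray`, and `.zattrs`.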

Would love to hear more about the Neurodata Without Borders use case.

magland · 2024-04-04T16:03:47Z

> Would love to hear more about the Neurodata Without Borders use case.

@martindurant This is the early-stage project that uses reference file systems (.zarr.json) for NWB. NWB is traditionally built on HDF5, but there are advantages to using Zarr as the backend and the kerchunk approach for utilizing data chunks from existing files on DANDI.

martindurant · 2024-05-30T00:14:45Z

Sorry I forgot about this! Looks good.

allow ReferenceFileSystem to hold dicts, which are treated as JSON files

3efcbeb

bendichter added 2 commits April 4, 2024 11:33

add json-as-dicts for rfs ver0

eb45ae8

add tests json-as-dict for RFS ver0 and ver1

blackify

06e57d8

magland mentioned this pull request Apr 4, 2024

support dicts in reference file system for json files NeurodataWithoutBorders/lindi#40

Merged

martindurant merged commit 463e2ce into fsspec:master May 30, 2024
10 checks passed

DarkLight1337 mentioned this pull request May 30, 2024

Add dependencies for building docs #1613

Merged

rly mentioned this pull request Jun 2, 2024

KeyError when dict values and templates are used in ReferenceFileSystem #1615

Closed

magland mentioned this pull request Jun 12, 2024

proposed functionality for loading/saving analyses flatironinstitute/stan-playground#50

Closed
