[Docs] Improve virtual ref docs #284

Merged: 3 commits, Oct 16, 2024
13 changes: 7 additions & 6 deletions docs/docs/icechunk-python/configuration.md
@@ -1,8 +1,9 @@
# Configuration

When creating and opening Icechunk stores, there are two different sets of configuration to be aware of:
- `StorageConfig` - for configuring access to the object store or filesystem
- `StoreConfig` - for configuring the behavior of the Icechunk Store itself

- [`StorageConfig`](./reference.md#icechunk.StorageConfig) - for configuring access to the object store or filesystem
- [`StoreConfig`](./reference.md#icechunk.StoreConfig) - for configuring the behavior of the Icechunk Store itself

## Storage Config

@@ -15,7 +16,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov
=== "From environment"

With this option, the credentials for connecting to S3 are detected automatically from your environment.
This is usually the best choice if you are connecting from within an AWS environment (e.g. from EC2).
This is usually the best choice if you are connecting from within an AWS environment (e.g. from EC2). [See the API](./reference.md#icechunk.StorageConfig.s3_from_env)

```python
icechunk.StorageConfig.s3_from_env(
@@ -26,7 +27,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov

=== "Provide credentials"

With this option, you provide your credentials and other details explicitly.
With this option, you provide your credentials and other details explicitly. [See the API](./reference.md#icechunk.StorageConfig.s3_from_config)

```python
icechunk.StorageConfig.s3_from_config(
@@ -47,7 +48,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov
=== "Anonymous"

With this option, you connect to S3 anonymously (without credentials).
This is suitable for public data.
This is suitable for public data. [See the API](./reference.md#icechunk.StorageConfig.s3_anonymous)

```python
icechunk.StorageConfig.s3_anonymous(
@@ -59,7 +60,7 @@ When using Icechunk with s3 compatible storage systems, credentials must be prov

### Filesystem Storage

Icechunk can also be used on a local filesystem by providing a path to the location of the store
Icechunk can also be used on a [local filesystem](./reference.md#icechunk.StorageConfig.filesystem) by providing a path to the location of the store

=== "Local filesystem"

45 changes: 44 additions & 1 deletion docs/docs/icechunk-python/virtual.md
@@ -156,4 +156,47 @@ Finally, let's make a plot of the sea surface temperature!
ds.sst.isel(time=26, zlev=0).plot(x='lon', y='lat', vmin=0)
```

![oisst](../assets/datasets/oisst.png)
![oisst](../assets/datasets/oisst.png)

## Virtual Reference API

While `VirtualiZarr` is the easiest way to create virtual datasets with Icechunk, the Store API it uses to create those datasets is public. `IcechunkStore` provides a [`set_virtual_ref`](./reference.md#icechunk.IcechunkStore.set_virtual_ref) method that sets a virtual reference for a given chunk key.

### Virtual Reference Storage Support

Currently, Icechunk supports two types of storage for virtual references:

#### S3 Compatible

References to files accessible via S3 compatible storage.

##### Example

Here is how we can set the chunk at key `c/0` to point to a file on an S3 bucket, `mybucket`, with the prefix `my/data/file.nc`:

```python
store.set_virtual_ref('c/0', 's3://mybucket/my/data/file.nc', offset=1000, length=200)
```
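
To make the bucket and prefix split in the URL above explicit, here is a minimal sketch using only the Python standard library. The helper name `split_s3_location` is hypothetical and is not part of the Icechunk API; it only illustrates how an `s3://` location breaks down.

```python
from urllib.parse import urlparse


def split_s3_location(location: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket, prefix).

    Hypothetical helper for illustration only; not part of Icechunk.
    """
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {location!r}")
    # The bucket is the URL authority; the prefix is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_location("s3://mybucket/my/data/file.nc")
print(bucket, prefix)
```

A check like this can catch a malformed location before it is passed to `set_virtual_ref`.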

##### Configuration

S3 virtual references require configuring credentials so that the store can access the specified S3 bucket. See [the configuration docs](./configuration.md#virtual-reference-storage-config) for instructions.


#### Local Filesystem

References to files accessible via the local filesystem. File paths must currently be **absolute**.

##### Example

Here is how we can set the chunk at key `c/0` to point to a file on my local filesystem located at `/path/to/my/file.nc`:

```python
store.set_virtual_ref('c/0', 'file:///path/to/my/file.nc', offset=20, length=100)
```

No extra configuration is necessary for local filesystem references.
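
Because local references must use absolute paths, it can help to normalize a path before building the `file://` location. The following is a minimal sketch using `pathlib`; the helper name `to_file_url` is an assumption made for illustration and is not part of Icechunk.

```python
from pathlib import Path


def to_file_url(path: str) -> str:
    """Return an absolute file:// URL for `path`.

    Hypothetical helper for illustration only; not part of Icechunk.
    """
    # resolve() makes the path absolute (relative paths are resolved against
    # the current working directory); as_uri() renders it as a file:// URL.
    return Path(path).resolve().as_uri()


print(to_file_url("data/file.nc"))
```

The resulting string could then be passed as the location argument, for example `store.set_virtual_ref('c/0', to_file_url('data/file.nc'), offset=20, length=100)`, assuming `store` is an open `IcechunkStore`.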

### Virtual Reference File Format Support

Currently, Icechunk supports `HDF5` and `netcdf4` files for use in virtual references. See the [tracking issue](https://github.com/earth-mover/icechunk/issues/197) for more info.
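
Since netCDF4 files use HDF5 as their on-disk container, both supported formats normally begin with the 8-byte HDF5 signature. The sketch below is a quick pre-check you might run on a file before creating references to it. It is not something Icechunk does itself, the function names are hypothetical, and it only inspects offset 0 (the HDF5 format also allows the signature at later offsets when a user block is present).

```python
# The fixed 8-byte signature that starts an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5_container(header: bytes) -> bool:
    """Return True if `header` starts with the HDF5 signature.

    Hypothetical helper for illustration only; checks offset 0 only.
    """
    return header.startswith(HDF5_SIGNATURE)


def file_is_hdf5_container(path: str) -> bool:
    """Read the first 8 bytes of the file at `path` and check the signature."""
    with open(path, "rb") as f:
        return is_hdf5_container(f.read(len(HDF5_SIGNATURE)))
```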
25 changes: 24 additions & 1 deletion icechunk-python/python/icechunk/_icechunk_python.pyi
@@ -253,6 +253,8 @@ class KeyNotFound(Exception):
): ...

class StoreConfig:
"""Configuration for an IcechunkStore"""

# The number of concurrent requests to make when fetching partial values
get_partial_values_concurrency: int | None
# The threshold at which to inline chunks in the store in bytes. When set,
@@ -270,7 +272,28 @@ class StoreConfig:
inline_chunk_threshold_bytes: int | None = None,
unsafe_overwrite_refs: bool | None = None,
virtual_ref_config: VirtualRefConfig | None = None,
): ...
):
"""Create a StoreConfig object with the given configuration options

Parameters
----------
get_partial_values_concurrency: int | None
The number of concurrent requests to make when fetching partial values
inline_chunk_threshold_bytes: int | None
The threshold at which to inline chunks in the store in bytes. When set,
chunks smaller than this threshold will be inlined in the store. Default is
512 bytes when not specified.
unsafe_overwrite_refs: bool | None
Whether to allow overwriting refs in the store. Default is False. Experimental.
virtual_ref_config: VirtualRefConfig | None
Configurations for virtual references such as credentials and endpoints

Returns
-------
StoreConfig
A StoreConfig object with the given configuration options
"""
...

async def async_pyicechunk_store_exists(storage: StorageConfig) -> bool: ...
def pyicechunk_store_exists(storage: StorageConfig) -> bool: ...