(fix): num_workers->max_workers (#49)
ilan-gold authored Nov 18, 2024
1 parent 36f3d62 commit fca5fc3
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -28,15 +28,15 @@ We export a `ZarrsCodecPipeline` class so that `zarr-python` can use the class b
`ZarrsCodecPipeline` options are exposed through `zarr.config`.

Standard `zarr.config` options control some functionality (see the defaults in the [config.py](https://github.com/zarr-developers/zarr-python/blob/main/src/zarr/core/config.py) of `zarr-python`):
-- `threading.num_workers`: the maximum number of threads used internally by the `ZarrsCodecPipeline` on the Rust side.
+- `threading.max_workers`: the maximum number of threads used internally by the `ZarrsCodecPipeline` on the Rust side.
- Defaults to the number of threads in the global `rayon` thread pool if set to `None`, which is [typically the number of logical CPUs](https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html#method.num_threads).
- `array.write_empty_chunks`: whether or not to store empty chunks.
- Defaults to false if `None`. Note that checking for emptiness has some overhead, see [here](https://docs.rs/zarrs/latest/zarrs/config/struct.Config.html#store-empty-chunks) for more info.
- This option name is proposed in [zarr-python #2429](https://github.com/zarr-developers/zarr-python/pull/2429)

The `ZarrsCodecPipeline` specific options are:
- `codec_pipeline.chunk_concurrent_maximum`: the maximum number of chunks stored/retrieved concurrently.
-- Defaults to the number of logical CPUs if `None`. It is constrained by `threading.num_workers` as well.
+- Defaults to the number of logical CPUs if `None`. It is constrained by `threading.max_workers` as well.
- `codec_pipeline.chunk_concurrent_minimum`: the minimum number of chunks retrieved/stored concurrently when balancing chunk/codec concurrency.
- Defaults to 4 if `None`. See [here](https://docs.rs/zarrs/latest/zarrs/config/struct.Config.html#chunk-concurrent-minimum) for more info
- `codec_pipeline.validate_checksums`: enable checksum validation (e.g. with the CRC32C codec).
@@ -45,7 +45,7 @@ The `ZarrsCodecPipeline` specific options are:
For example:
```python
zarr.config.set({
-"threading.num_workers": None,
+"threading.max_workers": None,
"array.write_empty_chunks": False,
"codec_pipeline": {
"path": "zarrs.ZarrsCodecPipeline",
@@ -66,7 +66,7 @@ Concurrency can be classified into two types:
- codec (inner) concurrency: the number of threads encoding/decoding a chunk.
- This is chosen automatically in combination with the chunk concurrency.

-The product of the chunk and codec concurrency will approximately match `threading.num_workers`.
+The product of the chunk and codec concurrency will approximately match `threading.max_workers`.
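
As a rough illustration of that relationship, the sketch below splits a thread budget between the two levels of concurrency. It is not the algorithm used by `zarrs`: the `split_concurrency` function and its parameters are hypothetical, and it ignores `codec_pipeline.chunk_concurrent_minimum` for simplicity.

```python
import os


def split_concurrency(max_workers, num_chunks, chunk_concurrent_maximum=None):
    """Hypothetical split of a thread budget into (chunk, codec) concurrency.

    Illustrative only; this is an assumption, not the zarrs implementation.
    """
    # A missing budget falls back to the number of logical CPUs.
    budget = max_workers or os.cpu_count() or 1
    # The chunk-level maximum is itself constrained by the thread budget.
    if chunk_concurrent_maximum is None:
        chunk_max = budget
    else:
        chunk_max = min(chunk_concurrent_maximum, budget)
    # Favor chunk (outer) concurrency: one worker per chunk, up to the maximum.
    chunk_concurrency = max(1, min(num_chunks, chunk_max))
    # Leftover threads go to the codecs (inner concurrency) of each chunk.
    codec_concurrency = max(1, budget // chunk_concurrency)
    return chunk_concurrency, codec_concurrency


# Many chunks: the whole budget goes to chunk concurrency.
print(split_concurrency(8, num_chunks=100))
# Few chunks: the remaining threads are used inside each chunk.
print(split_concurrency(8, num_chunks=2))
```

In both calls the product of the two returned values equals the budget of 8. With three chunks the result is `(3, 2)`, a product of 6, which is why the match is only approximate.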

Chunk concurrency is typically favored because:
- parallel encoding/decoding can have a high overhead with some codecs, especially with small chunks, and