Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Public s3 prefix without auth #26

Merged
merged 10 commits into from
Jul 24, 2024
Merged

Conversation

pgarrison
Copy link
Collaborator

Trying again to do #25 but with the tests fixed this time!

Context

Previously bioio could not read from a public s3:// URL without AWS authentication due to this ome-zarr issue. As of 0.9.0 the issue is resolved.

Changes

  • Upgrade ome-zarr dependency
  • Pass an fsspec instance to ome-zarr instead of just a path
  • Add a test with a public s3:// URL

Testing

pdm run pytest bioio_ome_zarr/tests/test_s3_read.py

@pgarrison
Copy link
Collaborator Author

Draft PR until #24 is merged.

@pgarrison
Copy link
Collaborator Author

Previously Dan asked:

LGTM. So if someone wants BioImage(zarr_s3_url), do they really have to do BioImage(zarr_s3_url, anon=True) ?

And I answered:

They must either provide AWS credentials or set anon=True. I agree the usability could be improved because setting anon=True for the same with an https URL will cause an error. I believe this should be the case not just for OME ZARRs but for any file read from a public s3:// path, since I expect all our readers use s3fs under the hood.

So if we wanted to improve the usability I think it happens in bioio or bioio-base by wrapping the BioImage __init__ logic with something like this:

try:
    # __init__ with user's fs_kwargs
except SomethingSpecific as e:
    if protocol == "s3://":
        # __init__ with user's fs_kwargs plus {anon: True}
    else:
        raise e

@pgarrison pgarrison marked this pull request as ready for review July 23, 2024 20:26
@pgarrison pgarrison requested a review from a team as a code owner July 23, 2024 20:26
"fsspec>=2022.8.0",
"ome-zarr>=0.9.0",
"xarray>=0.16.1",
"zarr>=2.18.2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt this a dependency of ome-zarr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe so, but now we are using it directly

```python
from bioio import BioImage
path = "https://allencell.s3.amazonaws.com/aics/nuc-morph-dataset/hipsc_fov_nuclei_timelapse_dataset/hipsc_fov_nuclei_timelapse_data_used_for_analysis/baseline_colonies_fov_timelapse_dataset/20200323_09_small/raw.ome.zarr"
image = BioImage(path)
print(image.get_image_dask_data())
```
If using an `s3://` path to access a public S3 bucket, the `BioImage` constructor must be given a dictionary with `anon: True` in the `fs_kwargs` argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good and valuable documentation but slightly unfortunate that it lives here in the reader's README. It would be great if somewhere in the bioio docs we said something like "be sure to check the readmes or documentation for individual readers, for their specific parameters." It might already be there... This is just an occupational hazard of having readers in separate packages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular piece of documentation may apply to all/many readers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed an issue for it bioio-devs/bioio#60

@pgarrison pgarrison merged commit b5f7f74 into main Jul 24, 2024
13 checks passed
@pgarrison pgarrison deleted the feature/public-s3-prefix-without-auth branch July 24, 2024 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants