-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docs on accessing Azure blob storage through fsspec #836
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thank you. And, do you mind opening an issue about your minor note? I think it's a bug.
And, adding original author @NivekT of this tutorial to review.
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit change. LGTM! Thank you so much for submitting this!
cc: @msaroufim
Thank you for the speed @sgrigory! Let me know if you want more stuff to do! |
Co-authored-by: Kevin Tse <NivekT@users.noreply.github.com>
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@sgrigory Ops, I thought this PR came from OSS. Feel free to take over the imported Diff and land it. I don't want to steal credit from your valuable work! Thanks! |
Summary: ### Changes Adding an example of DataPipe usage with Azure Blob storage via `fsspec`, similar to pytorch#812. The example is placed into a new section in `docs/source/tutorial.rst` Here is the screenshot showing that code snippets in the tutorial work as expected: <img width="1569" alt="Screenshot 2022-10-18 at 19 33 49" src="https://user-images.githubusercontent.com/23200558/196503562-034162c0-6dde-4749-adc7-5e081ff2c19f.png"> #### Minor note Technically, `fsspec` [allows both path prefixes `abfs://` or `az://`](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/README.md?plain=1#L33) for Azure Blob storage Gen2 as synonyms. However, only `abfs://` works for us for the following reason: - If a path starts with `az`, the variable `fs.protocol` [here](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L82) is still `abfs` - So the condition `root.startswith(protocol)` is false, and `is_local` is true - As a result the path "doubles" in [this line](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L95), like on this screenshot: <img width="754" alt="Screenshot 2022-10-18 at 19 50 56" src="https://user-images.githubusercontent.com/23200558/196506965-697eb2d7-8f84-4536-972b-7081e55e1ff5.png"> This won't have any effect for the users, however, as long as they use `abfs://` prefix recommended in the tutorial Pull Request resolved: pytorch#836 Reviewed By: NivekT Differential Revision: D40483505 Pulled By: sgrigory fbshipit-source-id: f03373aa4b376af8ea2ac3480fc133067caaa0ce
Summary: ### Changes Adding an example of DataPipe usage with Azure Blob storage via `fsspec`, similar to #812. The example is placed into a new section in `docs/source/tutorial.rst` Here is the screenshot showing that code snippets in the tutorial work as expected: <img width="1569" alt="Screenshot 2022-10-18 at 19 33 49" src="https://user-images.githubusercontent.com/23200558/196503562-034162c0-6dde-4749-adc7-5e081ff2c19f.png"> #### Minor note Technically, `fsspec` [allows both path prefixes `abfs://` or `az://`](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/README.md?plain=1#L33) for Azure Blob storage Gen2 as synonyms. However, only `abfs://` works for us for the following reason: - If a path starts with `az`, the variable `fs.protocol` [here](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L82) is still `abfs` - So the condition `root.startswith(protocol)` is false, and `is_local` is true - As a result the path "doubles" in [this line](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L95), like on this screenshot: <img width="754" alt="Screenshot 2022-10-18 at 19 50 56" src="https://user-images.githubusercontent.com/23200558/196506965-697eb2d7-8f84-4536-972b-7081e55e1ff5.png"> This won't have any effect for the users, however, as long as they use `abfs://` prefix recommended in the tutorial Pull Request resolved: #836 Reviewed By: NivekT Differential Revision: D40483505 Pulled By: sgrigory fbshipit-source-id: f03373aa4b376af8ea2ac3480fc133067caaa0ce
Summary: ### Changes Adding an example of DataPipe usage with Azure Blob storage via `fsspec`, similar to pytorch#812. The example is placed into a new section in `docs/source/tutorial.rst` Here is the screenshot showing that code snippets in the tutorial work as expected: <img width="1569" alt="Screenshot 2022-10-18 at 19 33 49" src="https://user-images.githubusercontent.com/23200558/196503562-034162c0-6dde-4749-adc7-5e081ff2c19f.png"> #### Minor note Technically, `fsspec` [allows both path prefixes `abfs://` or `az://`](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/README.md?plain=1#L33) for Azure Blob storage Gen2 as synonyms. However, only `abfs://` works for us for the following reason: - If a path starts with `az`, the variable `fs.protocol` [here](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L82) is still `abfs` - So the condition `root.startswith(protocol)` is false, and `is_local` is true - As a result the path "doubles" in [this line](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L95), like on this screenshot: <img width="754" alt="Screenshot 2022-10-18 at 19 50 56" src="https://user-images.githubusercontent.com/23200558/196506965-697eb2d7-8f84-4536-972b-7081e55e1ff5.png"> This won't have any effect for the users, however, as long as they use `abfs://` prefix recommended in the tutorial Pull Request resolved: pytorch#836 Reviewed By: NivekT Differential Revision: D40483505 Pulled By: sgrigory fbshipit-source-id: f03373aa4b376af8ea2ac3480fc133067caaa0ce
Changes
Adding an example of DataPipe usage with Azure Blob storage via
fsspec
, similar to #812. The example is placed into a new section indocs/source/tutorial.rst
Here is the screenshot showing that code snippets in the tutorial work as expected:
Minor note
Technically,
fsspec
allows both path prefixesabfs://
oraz://
for Azure Blob storage Gen2 as synonyms. However, onlyabfs://
works for us for the following reason:az
, the variablefs.protocol
here is stillabfs
root.startswith(protocol)
is false, andis_local
is trueThis won't have any effect for the users, however, as long as they use
abfs://
prefix recommended in the tutorial