Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docs on accessing Azure blob storage through fsspec #836

Closed
wants to merge 5 commits into from

Conversation

sgrigory
Copy link

@sgrigory sgrigory commented Oct 18, 2022

Changes

Adding an example of DataPipe usage with Azure Blob storage via fsspec, similar to #812. The example is placed into a new section in docs/source/tutorial.rst

Here is the screenshot showing that code snippets in the tutorial work as expected:

Screenshot 2022-10-18 at 19 33 49

Minor note

Technically, fsspec allows both path prefixes abfs:// or az:// for Azure Blob storage Gen2 as synonyms. However, only abfs:// works for us for the following reason:

  • If a path starts with az, the variable fs.protocol here is still abfs
  • So the condition root.startswith(protocol) is false, and is_local is true
  • As a result the path "doubles" in this line, like on this screenshot:

Screenshot 2022-10-18 at 19 50 56

This won't have any effect for the users, however, as long as they use abfs:// prefix recommended in the tutorial

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 18, 2022
@sgrigory sgrigory changed the title [WIP] Add docs on accessing Azure blob storage through fsspec Add docs on accessing Azure blob storage through fsspec Oct 18, 2022
@sgrigory sgrigory marked this pull request as ready for review October 18, 2022 18:27
Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thank you. And, do you mind opening an issue about your minor note? I think it's a bug.

And, adding original author @NivekT of this tutorial to review.

@ejguan ejguan requested a review from NivekT October 18, 2022 18:36
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@NivekT NivekT mentioned this pull request Oct 18, 2022
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit change. LGTM! Thank you so much for submitting this!

cc: @msaroufim

docs/source/tutorial.rst Outdated Show resolved Hide resolved
@msaroufim
Copy link
Member

Thank you for the speed @sgrigory! Let me know if you want more stuff to do!

@msaroufim msaroufim self-requested a review October 18, 2022 20:51
Co-authored-by: Kevin Tse <NivekT@users.noreply.github.com>
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan
Copy link
Contributor

ejguan commented Oct 18, 2022

@sgrigory Ops, I thought this PR came from OSS. Feel free to take over the imported Diff and land it. I don't want to steal credit from your valuable work! Thanks!

@sgrigory
Copy link
Author

This is great! Thank you. And, do you mind opening an issue about your minor note? I think it's a bug.

And, adding original author @NivekT of this tutorial to review.

Thanks for the review - opened an issue #840

ejguan pushed a commit to ejguan/data that referenced this pull request Oct 21, 2022
Summary:
### Changes

Adding an example of DataPipe usage with Azure Blob storage via `fsspec`, similar to pytorch#812. The example is placed into a new section in `docs/source/tutorial.rst`

Here is the screenshot showing that code snippets in the tutorial work as expected:

<img width="1569" alt="Screenshot 2022-10-18 at 19 33 49" src="https://user-images.githubusercontent.com/23200558/196503562-034162c0-6dde-4749-adc7-5e081ff2c19f.png">

####  Minor note

Technically, `fsspec` [allows both path prefixes `abfs://` or `az://`](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/README.md?plain=1#L33) for Azure Blob storage Gen2 as synonyms. However, only `abfs://` works for us for the following reason:
- If a path starts with `az`, the variable `fs.protocol` [here](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L82) is still `abfs`
- So the condition `root.startswith(protocol)` is false, and `is_local` is true
- As a result the path "doubles" in [this line](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L95), like on this screenshot:
<img width="754" alt="Screenshot 2022-10-18 at 19 50 56" src="https://user-images.githubusercontent.com/23200558/196506965-697eb2d7-8f84-4536-972b-7081e55e1ff5.png">

This won't have any effect for the users, however, as long as they use `abfs://` prefix recommended in the tutorial

Pull Request resolved: pytorch#836

Reviewed By: NivekT

Differential Revision: D40483505

Pulled By: sgrigory

fbshipit-source-id: f03373aa4b376af8ea2ac3480fc133067caaa0ce
ejguan pushed a commit that referenced this pull request Oct 21, 2022
Summary:
### Changes

Adding an example of DataPipe usage with Azure Blob storage via `fsspec`, similar to #812. The example is placed into a new section in `docs/source/tutorial.rst`

Here is the screenshot showing that code snippets in the tutorial work as expected:

<img width="1569" alt="Screenshot 2022-10-18 at 19 33 49" src="https://user-images.githubusercontent.com/23200558/196503562-034162c0-6dde-4749-adc7-5e081ff2c19f.png">

####  Minor note

Technically, `fsspec` [allows both path prefixes `abfs://` or `az://`](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/README.md?plain=1#L33) for Azure Blob storage Gen2 as synonyms. However, only `abfs://` works for us for the following reason:
- If a path starts with `az`, the variable `fs.protocol` [here](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L82) is still `abfs`
- So the condition `root.startswith(protocol)` is false, and `is_local` is true
- As a result the path "doubles" in [this line](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L95), like on this screenshot:
<img width="754" alt="Screenshot 2022-10-18 at 19 50 56" src="https://user-images.githubusercontent.com/23200558/196506965-697eb2d7-8f84-4536-972b-7081e55e1ff5.png">

This won't have any effect for the users, however, as long as they use `abfs://` prefix recommended in the tutorial

Pull Request resolved: #836

Reviewed By: NivekT

Differential Revision: D40483505

Pulled By: sgrigory

fbshipit-source-id: f03373aa4b376af8ea2ac3480fc133067caaa0ce
ejguan pushed a commit to ejguan/data that referenced this pull request Oct 23, 2022
Summary:
### Changes

Adding an example of DataPipe usage with Azure Blob storage via `fsspec`, similar to pytorch#812. The example is placed into a new section in `docs/source/tutorial.rst`

Here is the screenshot showing that code snippets in the tutorial work as expected:

<img width="1569" alt="Screenshot 2022-10-18 at 19 33 49" src="https://user-images.githubusercontent.com/23200558/196503562-034162c0-6dde-4749-adc7-5e081ff2c19f.png">

####  Minor note

Technically, `fsspec` [allows both path prefixes `abfs://` or `az://`](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/README.md?plain=1#L33) for Azure Blob storage Gen2 as synonyms. However, only `abfs://` works for us for the following reason:
- If a path starts with `az`, the variable `fs.protocol` [here](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L82) is still `abfs`
- So the condition `root.startswith(protocol)` is false, and `is_local` is true
- As a result the path "doubles" in [this line](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L95), like on this screenshot:
<img width="754" alt="Screenshot 2022-10-18 at 19 50 56" src="https://user-images.githubusercontent.com/23200558/196506965-697eb2d7-8f84-4536-972b-7081e55e1ff5.png">

This won't have any effect for the users, however, as long as they use `abfs://` prefix recommended in the tutorial

Pull Request resolved: pytorch#836

Reviewed By: NivekT

Differential Revision: D40483505

Pulled By: sgrigory

fbshipit-source-id: f03373aa4b376af8ea2ac3480fc133067caaa0ce
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants