[DataPipe] Small doc improvement for S3
ghstack-source-id: 3892ecb9a7da3e8b7833347dafae3defcfd09366
Pull Request resolved: #784
NivekT committed Sep 20, 2022
1 parent 2212cb7 commit 0988ad6
Showing 2 changed files with 14 additions and 1 deletion.
5 changes: 4 additions & 1 deletion torchdata/datapipes/iter/load/README.md
@@ -42,13 +42,16 @@ Please refer to the documentation:

### Note

Your environment must be properly configured for AWS before you can use the DataPipes. You can do this with the AWS
Command Line Interface (`aws configure`).

We recommend pointing the `AWS_CONFIG_FILE` environment variable at a detailed configuration file. The following
environment variables are also parsed: `HOME`, `S3_USE_HTTPS`, `S3_VERIFY_SSL`, `S3_ENDPOINT_URL`, and `AWS_REGION`
(which is overridden by the DataPipes' `region` argument).
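A minimal sketch of preparing these variables from Python before the DataPipes are constructed. The helper name `configure_s3_environment` and the example values are hypothetical, and the `"1"`/`"0"` encoding for `S3_USE_HTTPS` and `S3_VERIFY_SSL` is an assumption (check the DataPipes' C++ source for the exact parsing). It only sets environment variables; it does not validate credentials.

```python
import os
from typing import Dict, Optional


def configure_s3_environment(
    config_file: str,
    region: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    use_https: bool = True,
    verify_ssl: bool = True,
) -> Dict[str, str]:
    """Hypothetical helper: set the variables the S3 IO DataPipes parse.

    Call it before constructing the DataPipes so the AWS SDK sees the values.
    Returns the variables that were set, e.g. for logging.
    """
    settings = {
        # Expand "~" so the SDK receives an absolute path.
        "AWS_CONFIG_FILE": os.path.expanduser(config_file),
        # Assumption: "0" disables and "1" enables these flags.
        "S3_USE_HTTPS": "1" if use_https else "0",
        "S3_VERIFY_SSL": "1" if verify_ssl else "0",
    }
    if region is not None:
        # A `region` argument passed to the DataPipe still overrides this value.
        settings["AWS_REGION"] = region
    if endpoint_url is not None:
        # Only needed for non-default endpoints (e.g. S3-compatible storage).
        settings["S3_ENDPOINT_URL"] = endpoint_url
    os.environ.update(settings)
    return settings
```

With the environment prepared, a pipeline such as `S3FileLister(IterableWrapper(["s3://bucket/prefix/"]), region="us-west-2")` would pick these settings up; the bucket, prefix, and region here are placeholders.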

### Troubleshooting

-If you get `Access Denied`, it's very possibly a
+If you get `Access Denied` or no response, it's very possibly a
[wrong region configuration](https://github.com/aws/aws-sdk-cpp/issues/1211) or an
[access issue with `aws-sdk-cpp`](https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-aws-sdk/).
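The troubleshooting advice above can be turned into a quick pre-flight check. This is a hypothetical sketch (`diagnose_s3_setup` is not part of torchdata) that only inspects environment variables; it assumes that, without `AWS_CONFIG_FILE`, the default config file is located via `HOME`, and it cannot detect a region set inside the config file itself.

```python
import os
from typing import List, Mapping, Optional


def diagnose_s3_setup(env: Mapping[str, str], region_arg: Optional[str] = None) -> List[str]:
    """Hypothetical check for common causes of `Access Denied` or empty responses."""
    problems: List[str] = []
    config_file = env.get("AWS_CONFIG_FILE")
    if config_file and not os.path.isfile(config_file):
        problems.append(f"AWS_CONFIG_FILE points to a missing file: {config_file}")
    if not config_file and "HOME" not in env:
        # Assumption: the default config location is resolved relative to HOME.
        problems.append("neither AWS_CONFIG_FILE nor HOME is set; no config file can be located")
    # The `region` argument overrides AWS_REGION, so report the effective value.
    effective_region = region_arg or env.get("AWS_REGION")
    if not effective_region:
        problems.append(
            "region is not set via the `region` argument or AWS_REGION; "
            "verify that any region in the config file matches the bucket's region"
        )
    return problems
```

For example, `diagnose_s3_setup(os.environ, region_arg="us-west-2")` returns an empty list when nothing obviously wrong is found.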

10 changes: 10 additions & 0 deletions torchdata/datapipes/iter/load/s3io.py
@@ -25,6 +25,11 @@ class S3FileListerIterDataPipe(IterDataPipe[str]):
until all files are iterated.
3. ``request_timeout_ms`` and ``region`` override settings in the configuration file or
environment variables.
4. A missing or improper AWS configuration can lead to an empty response. For more details on S3 IO DataPipe
setup and AWS configuration, please see the `README file`_.
.. _README file:
    https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#s3-io-datapipe-documentation
Args:
source_datapipe: a DataPipe that contains URLs/URL prefixes to s3 files
@@ -77,6 +82,11 @@ class S3FileLoaderIterDataPipe(IterDataPipe[Tuple[str, StreamWrapper]]):
1. ``source_datapipe`` **must** contain a list of valid S3 URLs.
2. ``request_timeout_ms`` and ``region`` override settings in the
configuration file or environment variables.
3. A missing or improper AWS configuration can lead to an empty response. For more details on S3 IO DataPipe
setup and AWS configuration, please see the `README file`_.
.. _README file:
    https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#s3-io-datapipe-documentation
Args:
source_datapipe: a DataPipe that contains URLs to s3 files
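Because both notes warn that a misconfigured environment can produce an empty response rather than an error, it can help to fail loudly when a listing you expect to be non-empty yields nothing. The generator below is a hypothetical guard, not part of torchdata; it works on any iterable, so it could wrap the output of `S3FileLister` or `S3FileLoader`.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def non_empty(items: Iterable[T], what: str = "S3 listing") -> Iterator[T]:
    """Hypothetical guard: yield from ``items`` and raise if nothing was produced."""
    produced = False
    for item in items:
        produced = True
        yield item
    if not produced:
        # An empty result may mean wrong region or credentials, not an empty bucket.
        raise RuntimeError(
            f"{what} returned no items; check the AWS region and credentials "
            "(see the S3 IO DataPipe README)."
        )
```

A usage sketch would be `for url in non_empty(S3FileLister(IterableWrapper(["s3://bucket/prefix/"]))): ...`, with placeholder bucket and prefix. Note that a prefix that genuinely matches no objects also triggers the error, so the guard only fits cases where files are expected.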