From 755a18609e74ac6797a2bfad4f5d1c23d51c79fb Mon Sep 17 00:00:00 2001
From: Kevin Liu
Date: Mon, 31 Jul 2023 22:14:27 -0700
Subject: [PATCH] GH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 (#35312)

### Rationale for this change

When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mentions this specific case for EC2.

### What changes are included in this PR?

Documentation for the behavior described above.

#### Technical Details

`S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213).

The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html).

### Are these changes tested?

No, just documentation changes

### Are there any user-facing changes?

Yes, changing public documentation

* Closes: #35409

### Render Changes

Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs):

`docs/source/python/filesystems.rst`:
![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6)

`python/pyarrow/_s3fs.pyx`:
![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d)

Lead-authored-by: Kevin Liu
Co-authored-by: Sutou Kouhei
Signed-off-by: Sutou Kouhei
---
 docs/source/python/filesystems.rst |  5 +++--
 python/pyarrow/_s3fs.pyx           | 12 +++++++++---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 40656f6b76f43..3fc10dc7718d3 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -153,8 +153,9 @@ PyArrow implements natively a S3 filesystem for S3 compatible storage.
 The :class:`S3FileSystem` constructor has several options to configure the S3
 connection (e.g. credentials, the region, an endpoint override, etc). In
 addition, the constructor will also inspect configured S3 credentials as
-supported by AWS (for example the ``AWS_ACCESS_KEY_ID`` and
-``AWS_SECRET_ACCESS_KEY`` environment variables).
+supported by AWS (such as the ``AWS_ACCESS_KEY_ID`` and
+``AWS_SECRET_ACCESS_KEY`` environment variables, AWS configuration files,
+and EC2 Instance Metadata Service for EC2 nodes).
 
 Example how you can read contents from a S3 bucket::
 
diff --git a/python/pyarrow/_s3fs.pyx b/python/pyarrow/_s3fs.pyx
index e76c7b9ffa730..51c248d147828 100644
--- a/python/pyarrow/_s3fs.pyx
+++ b/python/pyarrow/_s3fs.pyx
@@ -140,14 +140,20 @@ cdef class S3FileSystem(FileSystem):
     """
     S3-backed FileSystem implementation
 
-    If neither access_key nor secret_key are provided, and role_arn is also not
-    provided, then attempts to initialize from AWS environment variables,
-    otherwise both access_key and secret_key must be provided.
+    AWS access_key and secret_key can be provided explicitly.
 
     If role_arn is provided instead of access_key and secret_key, temporary
     credentials will be fetched by issuing a request to STS to assume the
     specified role.
 
+    If neither access_key nor secret_key are provided, and role_arn is also not
+    provided, then attempts to establish the credentials automatically.
+    S3FileSystem will try the following methods, in order:
+
+    * ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``, and ``AWS_SESSION_TOKEN`` environment variables
+    * configuration files such as ``~/.aws/credentials`` and ``~/.aws/config``
+    * for nodes on Amazon EC2, the EC2 Instance Metadata Service
+
     Note: S3 buckets are special and the operations available on them may be
     limited or more expensive than desired.
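
#### Illustration of the lookup order

The lookup order described in the new docstring can be sketched in plain Python. This is a rough illustration only, not the AWS SDK's or pyarrow's implementation. The function name `guess_credential_source` and its return labels are hypothetical, and the sketch assumes a simplified chain: it checks only the two key environment variables and a `[default]` profile in `~/.aws/credentials`, ignores `~/.aws/config` and profile selection, and never contacts the EC2 Instance Metadata Service.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def guess_credential_source(
    env: Optional[Mapping[str, str]] = None,
    credentials_file: Optional[Path] = None,
    profile: str = "default",
) -> str:
    """Name the first step of the documented order that looks usable.

    Hypothetical helper for illustration; it performs no network calls
    and does not validate the credentials it finds.
    """
    env = os.environ if env is None else env
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"

    # Step 1: environment variables take precedence.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # Step 2: shared credentials file (a missing file is silently skipped).
    parser = configparser.ConfigParser()
    parser.read(credentials_file)
    if parser.has_section(profile) and parser.has_option(profile, "aws_access_key_id"):
        return "config-file"

    # Step 3: on EC2 the SDK would fall back to the Instance Metadata Service.
    # This sketch only reports that fallback; it does not query the service.
    return "instance-metadata"


if __name__ == "__main__":
    print(guess_credential_source())
```

In pyarrow itself none of this needs to be written by hand: per the docstring above, constructing `S3FileSystem` without `access_key`, `secret_key`, or `role_arn` is what triggers the automatic lookup.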