apacheGH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 (apache#35312)

### Rationale for this change

When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. 

I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) already mentions this specific case for EC2.

### What changes are included in this PR?

Documentation for the behavior described above. 

#### Technical Details
`S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. That option, in turn, uses the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213).

The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads environment variables](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at the profile config file](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html).
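
To make the "try each source in turn" idea concrete, here is a minimal, standard-library-only sketch of a provider chain. It is an illustration of the pattern, not the AWS SDK's code: the function names (`from_environment`, `from_profile`, `from_instance_metadata`, `resolve`) are hypothetical, and the metadata-service step is a stub that always reports "nothing found" instead of making an HTTP request.

```python
import configparser
import os
from pathlib import Path
from typing import Callable, Mapping, Optional, Sequence, Tuple

# (access key, secret key, optional session token)
Credentials = Tuple[str, str, Optional[str]]
Provider = Callable[[], Optional[Credentials]]


def from_environment(env: Mapping[str, str] = os.environ) -> Optional[Credentials]:
    """Read credentials from the standard AWS environment variables."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return key, secret, env.get("AWS_SESSION_TOKEN")
    return None


def from_profile(path: Path, profile: str = "default") -> Optional[Credentials]:
    """Read credentials from an INI-style file such as ~/.aws/credentials."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    if not parser.has_section(profile):
        return None
    section = parser[profile]
    key = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    if key and secret:
        return key, secret, section.get("aws_session_token")
    return None


def from_instance_metadata() -> Optional[Credentials]:
    """Stub (assumption): a real provider would query the EC2 Instance
    Metadata Service here. This sketch never touches the network."""
    return None


def resolve(providers: Sequence[Provider]) -> Optional[Credentials]:
    """Return the first credentials any provider yields, in order."""
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    return None


def default_chain() -> Sequence[Provider]:
    """Environment first, then the profile file, then instance metadata."""
    credentials_file = Path.home() / ".aws" / "credentials"
    return [
        from_environment,
        lambda: from_profile(credentials_file),
        from_instance_metadata,
    ]
```

Calling `resolve(default_chain())` returns the first match or `None`. The ordering matters: a value set in the environment shadows whatever the profile file or the instance would otherwise provide.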

### Are these changes tested?

No, these are documentation-only changes.

### Are there any user-facing changes?

Yes, this changes public documentation.

* Closes: apache#35409

### Render Changes
The changes as rendered locally, following [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs):
`docs/source/python/filesystems.rst`:
![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6)

`python/pyarrow/_s3fs.pyx`:
![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d)

Lead-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kevinjqliu and kou authored Aug 1, 2023
1 parent 112f949 commit 334b46d
Showing 2 changed files with 12 additions and 5 deletions.
5 changes: 3 additions & 2 deletions docs/source/python/filesystems.rst
@@ -153,8 +153,9 @@ PyArrow implements natively a S3 filesystem for S3 compatible storage.
 The :class:`S3FileSystem` constructor has several options to configure the S3
 connection (e.g. credentials, the region, an endpoint override, etc). In
 addition, the constructor will also inspect configured S3 credentials as
-supported by AWS (for example the ``AWS_ACCESS_KEY_ID`` and
-``AWS_SECRET_ACCESS_KEY`` environment variables).
+supported by AWS (such as the ``AWS_ACCESS_KEY_ID`` and
+``AWS_SECRET_ACCESS_KEY`` environment variables, AWS configuration files,
+and EC2 Instance Metadata Service for EC2 nodes).


 Example how you can read contents from a S3 bucket::
12 changes: 9 additions & 3 deletions python/pyarrow/_s3fs.pyx
@@ -140,14 +140,20 @@ cdef class S3FileSystem(FileSystem):
 """
 S3-backed FileSystem implementation
-If neither access_key nor secret_key are provided, and role_arn is also not
-provided, then attempts to initialize from AWS environment variables,
-otherwise both access_key and secret_key must be provided.
+AWS access_key and secret_key can be provided explicitly.
 If role_arn is provided instead of access_key and secret_key, temporary
 credentials will be fetched by issuing a request to STS to assume the
 specified role.
+If neither access_key nor secret_key are provided, and role_arn is also not
+provided, then attempts to establish the credentials automatically.
+S3FileSystem will try the following methods, in order:
+* ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``, and ``AWS_SESSION_TOKEN`` environment variables
+* configuration files such as ``~/.aws/credentials`` and ``~/.aws/config``
+* for nodes on Amazon EC2, the EC2 Instance Metadata Service
 Note: S3 buckets are special and the operations available on them may be
 limited or more expensive than desired.
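
The three modes the docstring describes (explicit keys, `role_arn`, or automatic resolution) map onto constructor keyword arguments of `pyarrow.fs.S3FileSystem`. The sketch below shows one way to pick between them. The helper names `s3_filesystem_kwargs` and `open_s3` are hypothetical, the up-front validation is this sketch's own check rather than a description of PyArrow's internals, and `pyarrow` is imported lazily so the argument-selection logic can be read and exercised on its own.

```python
from typing import Any, Dict, Optional


def s3_filesystem_kwargs(
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    session_token: Optional[str] = None,
    role_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Choose the S3FileSystem keyword arguments for one credential mode."""
    if access_key is not None or secret_key is not None:
        # Mode 1: explicit keys. Both halves of the pair are required.
        if access_key is None or secret_key is None:
            raise ValueError("access_key and secret_key must be provided together")
        kwargs: Dict[str, Any] = {"access_key": access_key, "secret_key": secret_key}
        if session_token is not None:
            kwargs["session_token"] = session_token
        return kwargs
    if role_arn is not None:
        # Mode 2: temporary credentials obtained by assuming a role via STS.
        return {"role_arn": role_arn}
    # Mode 3: pass nothing, so the credentials are established automatically
    # (environment variables, configuration files, EC2 instance metadata).
    return {}


def open_s3(**credentials: Optional[str]):
    """Build a pyarrow S3FileSystem from whichever credentials were given."""
    from pyarrow import fs  # lazy import: only needed when a filesystem is built

    return fs.S3FileSystem(**s3_filesystem_kwargs(**credentials))
```

On an EC2 node with an instance profile attached, `open_s3()` with no arguments is the case this change documents: nothing is passed explicitly and the automatic chain supplies the credentials.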