Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 #35312

Merged
merged 2 commits into from
Aug 1, 2023

Conversation

kevinjqliu
Copy link
Contributor

@kevinjqliu kevinjqliu commented Apr 24, 2023

Rationale for this change

When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service.

I want to document this behavior for pyarrow. The s3fs documentation mention this specific case for EC2.

What changes are included in this PR?

Documentation for the behavior described above.

Technical Details

S3FileSystem uses the CS3Options.Defaults() option when no credentials are passed into the constructor. It utilizes the Aws::Auth::DefaultAWSCredentialsProviderChain

The C++ implementation of DefaultAWSCredentialsProviderChain not only reads the environment variable when trying to resolve AWS credentials, but also looks at profile config and the EC2 Instance Metadata Service.

Are these changes tested?

No, just documentation changes

Are there any user-facing changes?

Yes, changing public documentation

Render Changes

Render the changes locally via Building the doc:
docs/source/python/filesystems.rst:
Screenshot 2023-07-30 at 6 22 02 PM

python/pyarrow/_s3fs.pyx:
Screenshot 2023-07-31 at 3 31 30 PM

@github-actions

This comment was marked as outdated.

@kevinjqliu kevinjqliu changed the title Clarify S3FileSystem Credentials chain for EC2 MINOR: [S3FileSystem] Clarify S3FileSystem Credentials chain for EC2 Apr 24, 2023
@jorisvandenbossche jorisvandenbossche changed the title MINOR: [S3FileSystem] Clarify S3FileSystem Credentials chain for EC2 GH-35409: [Docs] Clarify S3FileSystem Credentials chain for EC2 May 4, 2023
@github-actions
Copy link

github-actions bot commented May 4, 2023

@github-actions
Copy link

github-actions bot commented May 4, 2023

⚠️ GitHub issue #35409 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Copy link
Member

@kevinjqliu thanks for the contribution!

We have a section about S3FileSystem in the user guide as well: https://arrow.apache.org/docs/dev/python/filesystems.html#s3. Could you have a look to see if that might require an equivalent update?

python/pyarrow/_s3fs.pyx Outdated Show resolved Hide resolved
python/pyarrow/_s3fs.pyx Outdated Show resolved Hide resolved
@kevinjqliu
Copy link
Contributor Author

@jorisvandenbossche thanks for the review. I've updated the formatting issues and made changes to the S3 section of filesystems.html docs as well.

PTAL

@kevinjqliu
Copy link
Contributor Author

@jorisvandenbossche fixed a formatting issue. there was something wrong with CI too, hence the force pushes.

@pitrou pitrou changed the title GH-35409: [Docs] Clarify S3FileSystem Credentials chain for EC2 GH-35409: [Doc] Clarify S3FileSystem Credentials chain for EC2 May 16, 2023
@pitrou
Copy link
Member

pitrou commented May 16, 2023

@jorisvandenbossche Would you like to take another look?

@kevinjqliu
Copy link
Contributor Author

hey, @jorisvandenbossche wanted to ping on this again. thanks!

@kou kou changed the title GH-35409: [Doc] Clarify S3FileSystem Credentials chain for EC2 GH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 Jul 31, 2023
@github-actions
Copy link

⚠️ GitHub issue #35409 has been automatically assigned in GitHub to PR creator.

@kou
Copy link
Member

kou commented Jul 31, 2023

@github-actions crossbow submit docs-preview

@github-actions
Copy link

Unable to match any tasks for `docs-preview`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/5709116257

@kou
Copy link
Member

kou commented Jul 31, 2023

@github-actions crossbow submit preview-docs

@github-actions
Copy link

Revision: a2748ff

Submitted crossbow builds: ursacomputing/crossbow @ actions-4f25ed291d

Task Status
preview-docs Github Actions

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 31, 2023
@kevinjqliu
Copy link
Contributor Author

kevinjqliu commented Jul 31, 2023

thanks, @kou I've applied your suggestions. Do the commits need to be squashed?

@kevinjqliu kevinjqliu requested a review from kou July 31, 2023 00:43
@kou
Copy link
Member

kou commented Jul 31, 2023

@github-actions crossbow submit preview-docs

@github-actions
Copy link

Revision: 4cdd0d2

Submitted crossbow builds: ursacomputing/crossbow @ actions-889dfc34d1

Task Status
preview-docs Github Actions

@kevinjqliu
Copy link
Contributor Author

@kou built the changes locally, I've attached a screenshot above

@kou
Copy link
Member

kou commented Jul 31, 2023

Thanks but we have 2 changed pages but the screenshot shows only 1 page.

@kevinjqliu
Copy link
Contributor Author

The visible change is from docs/source/python/filesystems.rst.
python/pyarrow/_s3fs.pyx is a code comment change.

@kou
Copy link
Member

kou commented Jul 31, 2023

@kevinjqliu
Copy link
Contributor Author

We use it for API reference: https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html#pyarrow.fs.S3FileSystem

I see. I'm following Building a single directory for dev purposes without all the pre-requisites to render the changes for docs/source/python.

The changes in python/pyarrow/_s3fs.pyx does not show up. How can I render that part?

@kou
Copy link
Member

kou commented Jul 31, 2023

We need pyarrow to build documents in python/pyarrow/_s3fs.pyx. So we can't use the approach.

We need to use the full (at least including pyarrow) build approach.
https://arrow.apache.org/docs/developers/documentation.html#building-with-docker is easier than https://arrow.apache.org/docs/developers/documentation.html#building .

@kou
Copy link
Member

kou commented Jul 31, 2023

Could you rebase on main to use #36943 ?

kevinjqliu and others added 2 commits July 31, 2023 22:00
The C implementation of `DefaultAWSCredentialsProviderChain` not only reads the environment variable when trying to resolve AWS credentials, but also looks at profile config and the EC2 Instance Metadata Service.

I want to document this behavior in the documentation.
The `s3fs` documentation mentions the specific case for EC2
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@kevinjqliu
Copy link
Contributor Author

@kou I've rebased the branch and was able to build it locally using docker. Updated the description above to render changes to both files.

Thanks for guiding me through the process

@kou
Copy link
Member

kou commented Aug 1, 2023

@github-actions crossbow submit preview-docs

@github-actions
Copy link

github-actions bot commented Aug 1, 2023

Revision: f4b66d7

Submitted crossbow builds: ursacomputing/crossbow @ actions-84183cdf1d

Task Status
preview-docs Github Actions

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit 334b46d into apache:main Aug 1, 2023
14 of 15 checks passed
@kou kou removed the awaiting change review Awaiting change review label Aug 1, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Aug 1, 2023
@kevinjqliu kevinjqliu deleted the patch-1 branch August 1, 2023 15:46
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 334b46d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
… for EC2 (apache#35312)

### Rationale for this change

When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. 

I want to document this behavior for `pyarrow`.  The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mention this specific case for EC2.

### What changes are included in this PR?

Documentation for the behavior described above. 

#### Technical Details
`S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor.  It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213)

The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html). 

### Are these changes tested?

No, just documentation changes

### Are there any user-facing changes?

Yes, changing public documentation

* Closes: apache#35409

### Render Changes
Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): 
`docs/source/python/filesystems.rst`:
![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6)

`python/pyarrow/_s3fs.pyx`:
![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d)

Lead-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
… for EC2 (apache#35312)

### Rationale for this change

When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. 

I want to document this behavior for `pyarrow`.  The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mention this specific case for EC2.

### What changes are included in this PR?

Documentation for the behavior described above. 

#### Technical Details
`S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor.  It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213)

The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html). 

### Are these changes tested?

No, just documentation changes

### Are there any user-facing changes?

Yes, changing public documentation

* Closes: apache#35409

### Render Changes
Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): 
`docs/source/python/filesystems.rst`:
![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6)

`python/pyarrow/_s3fs.pyx`:
![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d)

Lead-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Docs] Document S3FileSystem behavior in EC2
4 participants