Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44286: [C++][FS][Azure] Improve error handling #44287

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lupko
Copy link
Contributor

@lupko lupko commented Oct 2, 2024

Rationale for this change

The AzureFileSystem implementation does not catch all the possible exceptions that may be thrown by the Azure SDK - the code only catches StorageException, while there may be other reasons for the calls to fail. Most notably, Azure SDK communicates network-related errors via a TransportException.

Since the implementation does not capture these exceptions, the caller code has to do that otherwise these uncaught exceptions will cause process to abort.

What changes are included in this PR?

  • Added general purpose error converter that can convert std::exception_ptr to Arrow Status
  • Modified the implementation so that:
    • Any catch (StorageException) statements which merely convert the exc to status now become catch (...) and call out to the error converter and pass the current exception ptr
    • Any catch (StorageException) statements which also inspect & act on the information included in the exception are left as is. An additional catch (...) is added, calls to converter and passes the current exception ptr.

Are these changes tested?

TODO: need extra tests

Are there any user-facing changes?

No

Copy link

github-actions bot commented Oct 2, 2024

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@lupko lupko changed the title Azure err GD-44286: [C++][FS][Azure] Improve error handling Oct 2, 2024
@lupko
Copy link
Contributor Author

lupko commented Oct 2, 2024

hi @kou , here is followup from #44274 .

From the PoC, I have tweaked the code changes:

  • the general purpose convertor always passes prefix_args to Status. This is to have as much detail as possible in the exception.
    • for StorageException, it just adds the azure error & what() as before
    • for other exceptions from azure hierarchy, it adds info about exception type and then error & what
    • for std exceptions, it adds the typeid (lifted this from your merged changes)
    • for everything else, there is just extra generic message (this could perhaps be improved by i don't know how :))
  • all the newly added catch(...) blocks modified so that they pass extra error details (usually same as what is sent on StorageException)

I think this makes somewhat more sense compared to what was in the PoC.

What is outstanding are the tests. I have not run the existing tests locally because i don't know how / need to read up on that in the docs (i have build working and is green). Once i get that going, i will add a couple of new tests - any suggestions from your side?

@kou kou changed the title GD-44286: [C++][FS][Azure] Improve error handling GH-44286: [C++][FS][Azure] Improve error handling Oct 2, 2024
Copy link

github-actions bot commented Oct 2, 2024

⚠️ GitHub issue #44286 has been automatically assigned in GitHub to PR creator.

@lupko
Copy link
Contributor Author

lupko commented Oct 2, 2024

i also went through the FS implementation, looking for any methods that may be using azure client's methods that can lead to exceptions and are outside of try-catch. have not discovered any gaps.

@kou
Copy link
Member

kou commented Oct 3, 2024

Here are generic developer documents:

FYI: You can build and run only Azure FS tests by the following command line (note that I use cpp.build as my build directory):

ninja -C cpp.build debug/arrow-azurefs-test && LD_LIBRARY_PATH=$PWD/cpp.build/debug cpp.build/debug/arrow-azurefs-test

@kou
Copy link
Member

kou commented Oct 3, 2024

New tests should cover at least Http::TransportException and non std::exception (is it happen?) cases.
Core::RequestFailedException and std::exception cases should be covered with existing tests.

template <typename... PrefixArgs>
Status ExceptionToStatus(const std::exception_ptr& ex_ptr, PrefixArgs&&... prefix_args) {
try {
if (ex_ptr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 305 to 307
return Status::UnknownError(
std::forward<PrefixArgs>(prefix_args)...,
" An unexpected exception has occurred. No further detail is available.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to the last catch (...) {...} block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Oct 3, 2024
@github-actions github-actions bot added Component: Python awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 3, 2024
@lupko
Copy link
Contributor Author

lupko commented Oct 3, 2024

hi @kou , sorry for delays on my end - i had other engagements. i started looking into the extra tests yesterday. Turns out i had build of tests already setup from before and just had to find how to run them :) Thanks for the pointers.

I will look into the C++ tests as you suggested - hopefully tomorrow if time allows. So far, I have tried and added a test to PyArrow; dumb part is it takes about 20secs to run because of the underlying timeouts/retries or whatever Azure SDK does before it raises TransportError. I can keep it or drop it - let me know.

@kou
Copy link
Member

kou commented Oct 3, 2024

No problem. Thanks for working on this.

Sorry, I missed one part for testing.
You need to install Azurite https://github.com/Azure/Azurite for local testing.
See also: https://github.com/Azure/Azurite?tab=readme-ov-file#npm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants