Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(4981): incorrect error wrapping in OnceFut #4983

Merged
merged 2 commits into from
Jan 20, 2023

Conversation

DDtKey
Copy link
Contributor

@DDtKey DDtKey commented Jan 19, 2023

Which issue does this PR close?

Closes #4981

Rationale for this change

I used approach with errors wrapped to Arc (I see it's common pattern in arrow & datafusion), because it should be shareable.
I also created separate type alias for that

What changes are included in this PR?

Are these changes tested?

I've added test which failed before these changes.

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jan 19, 2023
@avantgardnerio
Copy link
Contributor

@DDtKey does #4992 accomplish the same desired outcome?

@DDtKey
Copy link
Contributor Author

DDtKey commented Jan 19, 2023

@DDtKey does #4992 accomplish the same desired outcome?

Don't think so, it was literally string instead of error. Take a look at code before these changes. It had e.to_string(), it was impossible to take root error at all by downcasting

But #4992 looks really helpful as well 🔥

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, I think the string hack was a clear shortcut, and removing it LGTM.

If it is possible to remove the double-arc, I'd like to do that before merging.

@@ -438,7 +438,7 @@ impl<T: 'static> OnceAsync<T> {
}

/// The shared future type used internally within [`OnceAsync`]
type OnceFutPending<T> = Shared<BoxFuture<'static, Arc<Result<T>>>>;
type OnceFutPending<T> = Shared<BoxFuture<'static, Arc<SharedResult<T>>>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that SharedResult has an Arc, can we avoid the double-arc here?

Copy link
Contributor Author

@DDtKey DDtKey Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OnceFut is cloneable, while SharedResult wraps only Err into Arc to allow cloning the error itself 🤔

I decided to declare it as SharedResult<Arc<T>> instead, so it won't be Arc inside Arc. It will be either content of Ok or Error wrapped into Arc.

datafusion/core/src/physical_plan/joins/utils.rs Outdated Show resolved Hide resolved
@@ -36,6 +36,9 @@ use sqlparser::parser::ParserError;
/// Result type for operations that could result in an [DataFusionError]
pub type Result<T> = result::Result<T, DataFusionError>;

/// Result type for operations that could result in an [DataFusionError] and needs to be shared (wrapped into `Arc`).
pub type SharedResult<T> = result::Result<T, Arc<DataFusionError>>;
Copy link
Contributor

@tustvold tustvold Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this PR never actually returns this type, do we need to declare it here? Or could we just move it into utils?

Copy link
Contributor Author

@DDtKey DDtKey Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I observed that Result<T, Arc<Error> is quite common approach to allow clonning errors in datafustion & arrow.

And not to miss such kind of errors, I think it would be nice to have it as general type.
Take a look at #4992 and even current find_root impl, it's easy to forget that error could be wrapped into Arc actually (IMO)

I believe this type should be used when it's needed to explicitly declare that error is wrapped into Arc. And could be used in other places later.

And this PR doesn't return this type, however it will be inside ArrowError::ExternalError(..), so actually it's way of explicit usage to be able to check all usages of this pattern

Copy link
Contributor

@tustvold tustvold Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow clonning errors in datafustion & arrow.

I don't see any other examples of this pattern?

so actually it's way of explicit usage to be able to check all usages of this pattern

I'd hope that any codepath that cares about these variants would walk the source tree and effectively blindly skip over the Arc...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #4992 for the code (that does indeed walk over Arcs). It was non ideal however. Figuring out some way to avoid Arc'ing errors would be better in my opinion (not for this PR)

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I would suggest keeping SharedResult as an implementation detail, i.e. within utils have something like

type SharedResult<T> = std::result::Result<Arc<T>,Arc<DataFusionError>>;

@alamb
Copy link
Contributor

alamb commented Jan 20, 2023

Thanks @DDtKey

@alamb alamb merged commit 22d106a into apache:master Jan 20, 2023
@ursabot
Copy link

ursabot commented Jan 20, 2023

Benchmark runs are scheduled for baseline = 6dce728 and contender = 22d106a. 22d106a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@DDtKey DDtKey deleted the fix-nested-error branch January 20, 2023 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect nested error wrapped to ArrowError:External variant for joins
5 participants