Allow ObjectStoreProvider to return None (return Result<Option> rather than Result) #3595
Codecov Report
@@ Coverage Diff @@
## master #3595 +/- ##
==========================================
- Coverage 86.06% 86.05% -0.01%
==========================================
Files 300 300
Lines 56328 56333 +5
==========================================
Hits 48479 48479
- Misses 7849 7854 +5
Why is this necessary, if returning None will just get converted into an error anyway?
I don't think converting None into an error is a good idea. Otherwise, why not just catch the error in the provider and convert it into None? One example usage is apache/datafusion-ballista#260. For some use cases, returning None is reasonable.
Because you want to provide context on what the error was? As far as I can see, the provider has more context about why an object store couldn't be found and will produce strictly better errors than returning
The ObjectStoreProvider is a newly added way to find an object store. In the future, there may be other ways besides ObjectStoreProvider to find an object store. Then we can add similar logic as follows:
Therefore, I don't think it's a good idea for the provider to throw an error directly if no object store is found. We should not mix None and errors in ObjectStoreProvider. It's the ObjectStoreRegistry's duty to decide whether to convert None into an error.
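To make the "the registry decides" argument concrete, here is a minimal sketch assuming the `Result<Option<...>>` provider signature from this PR's title. `ObjectStore`, `NamedStore`, `SchemeProvider`, `Registry` and the scheme-keyed lookup are all hypothetical stand-ins, not DataFusion's actual types. `Ok(None)` means "this provider does not handle the URL", and only the registry turns that into an error.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the real `ObjectStore` trait.
pub trait ObjectStore: Send + Sync {
    fn name(&self) -> &str;
}

/// Trivial store that only remembers a name.
pub struct NamedStore(pub String);

impl ObjectStore for NamedStore {
    fn name(&self) -> &str {
        &self.0
    }
}

/// Provider signature assumed from this PR: `Ok(None)` = "not handled by me",
/// `Err` = a real failure while trying to build a store.
pub trait ObjectStoreProvider: Send + Sync {
    fn get_by_url(&self, url: &str) -> Result<Option<Arc<dyn ObjectStore>>, String>;
}

/// Example provider (hypothetical) that only handles a single scheme prefix.
pub struct SchemeProvider(pub &'static str);

impl ObjectStoreProvider for SchemeProvider {
    fn get_by_url(&self, url: &str) -> Result<Option<Arc<dyn ObjectStore>>, String> {
        if url.starts_with(self.0) {
            let store: Arc<dyn ObjectStore> = Arc::new(NamedStore(self.0.to_string()));
            Ok(Some(store))
        } else {
            Ok(None)
        }
    }
}

/// Simplified registry: manual registrations first, then the provider.
pub struct Registry {
    stores: HashMap<String, Arc<dyn ObjectStore>>,
    provider: Option<Arc<dyn ObjectStoreProvider>>,
}

impl Registry {
    pub fn new(provider: Option<Arc<dyn ObjectStoreProvider>>) -> Self {
        Self { stores: HashMap::new(), provider }
    }

    /// Manually register a store under a scheme such as "s3".
    pub fn register(&mut self, scheme: &str, store: Arc<dyn ObjectStore>) {
        self.stores.insert(scheme.to_string(), store);
    }

    pub fn get_by_url(&self, url: &str) -> Result<Arc<dyn ObjectStore>, String> {
        // Key by scheme, i.e. the text before "://" (a simplification).
        let scheme = url.split("://").next().unwrap_or("");
        if let Some(store) = self.stores.get(scheme) {
            return Ok(Arc::clone(store));
        }
        if let Some(provider) = &self.provider {
            // Real provider errors propagate; Ok(None) falls through.
            if let Some(store) = provider.get_by_url(url)? {
                return Ok(store);
            }
        }
        // Further discovery mechanisms could be tried here.
        // Only at this point does "nothing found" become an error.
        Err(format!("no object store found for {url}"))
    }
}
```

In this design the provider never has to know whether some other mechanism might still find a store; that knowledge lives in one place, the registry.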
A result is just an option with context on why it is empty? I think I'm missing something; what you describe would be easier without another layer of optionality? On error you could easily try a different approach, and if that also fails, return the union of the errors? Perhaps we could make this change as and when we add this new discovery mechanism, if it proves necessary. FWIW I'm sceptical that composition within ObjectStoreProvider wouldn't be sufficient, as opposed to adding more options to ObjectStoreRegistry.
Could you take a look at apache/datafusion-ballista#260? It's one example usage of the ObjectStoreProvider. I just don't think it's a good idea to mix errors and None in the ObjectStoreProvider.
I already left a comment on it - apache/datafusion-ballista#260 (comment) 😅 It would be superior, IMO, if it matched the scheme and returned an error saying "need to compile with hdfs feature" if an hdfs URL was requested but the feature was not enabled at compile time. Similar approaches could be taken for S3, GCP, etc.
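A rough sketch of the suggestion above: the provider matches on the scheme and returns a descriptive error instead of `None`. `FeatureAwareProvider` and its `hdfs_enabled` field are hypothetical; the field stands in for a compile-time check such as `cfg!(feature = "hdfs")` so the sketch runs without Cargo features.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the real `ObjectStore` trait.
pub trait ObjectStore: Send + Sync {
    fn name(&self) -> &str;
}

struct NamedStore(&'static str);

impl ObjectStore for NamedStore {
    fn name(&self) -> &str {
        self.0
    }
}

fn store(name: &'static str) -> Arc<dyn ObjectStore> {
    Arc::new(NamedStore(name))
}

/// Hypothetical provider that inspects the scheme and explains *why* it
/// cannot provide a store, instead of returning `None`.
pub struct FeatureAwareProvider {
    /// Stand-in for `cfg!(feature = "hdfs")`, so the sketch runs without Cargo.
    pub hdfs_enabled: bool,
}

impl FeatureAwareProvider {
    pub fn get_by_url(&self, url: &str) -> Result<Arc<dyn ObjectStore>, String> {
        let (scheme, _) = url
            .split_once("://")
            .ok_or_else(|| format!("invalid URL, missing scheme: {url}"))?;
        match scheme {
            "file" => Ok(store("file")),
            "hdfs" if self.hdfs_enabled => Ok(store("hdfs")),
            // The error tells the user how to fix the problem.
            "hdfs" => Err("need to compile with hdfs feature".to_string()),
            other => Err(format!("unsupported object store scheme: {other}")),
        }
    }
}
```

Every failure path here carries a reason (missing scheme, feature not compiled in, unknown scheme) that a bare `None` could not express.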
For a standalone system, we should know which object stores will be used before running it. Therefore, it's OK to make the decision at compile time by enabling different features. If we want to support multiple object stores, like S3 and HDFS, we can just enable multiple features and let the ObjectStoreProvider parse the URL and decide which one to return.
An error is just an option with more context; if you want to discard that context and use it as an option, you can use
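The comment above breaks off before naming a method; one standard way in Rust to discard the error context is `Result::ok`, which turns `Result<T, E>` into `Option<T>`. A minimal sketch, with `lookup` and `lookup_opt` as hypothetical helpers:

```rust
/// Hypothetical lookup that explains why it failed.
pub fn lookup(url: &str) -> Result<&'static str, String> {
    if url.starts_with("s3://") {
        Ok("s3")
    } else {
        Err(format!("no object store registered for {url}"))
    }
}

/// Callers that only care whether a store exists can drop the context:
/// `Result::ok` maps `Ok(v)` to `Some(v)` and `Err(_)` to `None`.
pub fn lookup_opt(url: &str) -> Option<&'static str> {
    lookup(url).ok()
}
```

The reverse direction, `Option::ok_or_else`, is how a caller holding a `None` would have to invent the missing context after the fact.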
What if we could encode the same information of
So if we had:

    fn get_by_url(&self, url: &Url) -> Result<Arc<dyn ObjectStore>> {
        ...
    }

Then the code in ballista might look like:

    // If no store is found, then try to find and register one through XXXXX.
    match store {
        Err(DataFusionError::StoreNotFound(_)) => {
            store = self.register_through_XXXXX(url)?;
        }
        _ => return store,
    }

Or something similar. The benefit of this, as @tustvold mentions, is that the reason for the store not being found (like support not being compiled in) could be included.

That being said, I don't personally have a preference between
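A self-contained sketch of the structured-error idea, assuming a hypothetical `StoreError` enum in place of the `DataFusionError::StoreNotFound` variant proposed above. The names `get_by_url`, `register_through_fallback` and `get_or_register` are likewise illustrative. Only the not-found variant triggers the fallback; any other error propagates unchanged.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the real `ObjectStore` trait.
pub trait ObjectStore: Send + Sync {
    fn name(&self) -> &str;
}

struct NamedStore(String);

impl ObjectStore for NamedStore {
    fn name(&self) -> &str {
        &self.0
    }
}

fn named(name: &str) -> Arc<dyn ObjectStore> {
    Arc::new(NamedStore(name.to_string()))
}

/// Hypothetical error type with a dedicated "not found" variant.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    /// No store is known for this URL; the string says why.
    StoreNotFound(String),
    /// Any other failure, e.g. a malformed URL.
    Other(String),
}

/// Primary lookup: only "s3" is registered in this sketch.
pub fn get_by_url(url: &str) -> Result<Arc<dyn ObjectStore>, StoreError> {
    match url.split_once("://") {
        Some(("s3", _)) => Ok(named("s3")),
        Some((scheme, _)) => Err(StoreError::StoreNotFound(format!(
            "no store registered for scheme {scheme}"
        ))),
        None => Err(StoreError::Other(format!("invalid URL: {url}"))),
    }
}

/// Stand-in for `register_through_XXXXX`: a secondary discovery mechanism.
fn register_through_fallback(url: &str) -> Result<Arc<dyn ObjectStore>, StoreError> {
    let scheme = url.split_once("://").map(|(s, _)| s).unwrap_or("");
    Ok(named(&format!("{scheme} (fallback)")))
}

/// Only the not-found variant triggers the fallback; other errors propagate.
pub fn get_or_register(url: &str) -> Result<Arc<dyn ObjectStore>, StoreError> {
    match get_by_url(url) {
        Err(StoreError::StoreNotFound(_)) => register_through_fallback(url),
        other => other,
    }
}
```

A malformed URL is reported as `Other` and never reaches the fallback, which is the distinction a structured error buys over `Result<Option<...>>`.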
A structured error would be a good solution if we need to respond differently to the not-found error 👍. We may be able to keep it even simpler, though, if all error variants are to be handled in the same manner, which I suspect is the case.
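If every error really is handled the same way, the fallback needs no special variant: try the next source on any error and, if that also fails, return the combined errors, as suggested earlier in the thread. A minimal sketch with hypothetical `from_registry`, `from_provider` and `resolve` functions, using plain `String` errors for brevity:

```rust
/// Hypothetical primary lookup that fails with a reason.
fn from_registry(url: &str) -> Result<String, String> {
    if url.starts_with("s3://") {
        Ok("s3".to_string())
    } else {
        Err(format!("{url} not registered"))
    }
}

/// Hypothetical provider that is tried next.
fn from_provider(url: &str) -> Result<String, String> {
    if url.starts_with("hdfs://") {
        Ok("hdfs".to_string())
    } else {
        Err(format!("{url}: unsupported scheme"))
    }
}

/// No special variant needed: on any error, try the next source,
/// and if that fails too, report both reasons together.
pub fn resolve(url: &str) -> Result<String, String> {
    from_registry(url).or_else(|registry_err| {
        from_provider(url).map_err(|provider_err| format!("{registry_err}; {provider_err}"))
    })
}
```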
Thanks @alamb. A structured error is better than throwing a general error. However, I still don't think it should be an error when no object store is found through the ObjectStoreProvider. It's the ObjectStoreRegistry's duty to decide when to throw an error and when to continue, in case we add new ways to find and register object stores besides ObjectStoreProvider in the future.
What do you mean by not found? I think this is key. I was under the (perhaps mistaken) impression that it was any condition that meant the ObjectStoreProvider was unable to provide an ObjectStore, be it an invalid URL, scheme, environment, etc.; basically any error. What is the special "not found" semantic, and why is it special?
Hi @tustvold, thanks for pointing it out. I think the main point of divergence is the role of the ObjectStoreProvider. The initial purpose of introducing ObjectStoreProvider was to make it act as a supplementary way to find an object store, besides manual registration. However, if we want ObjectStoreProvider to be able to find all of the object stores used, then I agree with your suggestion.
Aah, I hadn't understood this. I was viewing
That also makes sense. As long as we agree on the role of ObjectStoreProvider, I'm OK with the related implementation. Then maybe it's better for us to add some comments to the ObjectStoreProvider.
Sounds good to me; I would be happy to do this if you would like. Sorry for the somewhat protracted discussion, it would appear we were both operating with differing sets of assumptions 😅
Assumptions are important 😅. It's my fault for not clarifying my assumption at the beginning. I'm OK with your assumption, if you can refine the current comments.
Which issue does this PR close?
Closes #3594.