-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(csharp/src/Drivers): introduce drivers for Apache systems built on Thrift #1710
Conversation
…dbc into dev/introdrivers
Added implementation for table schema in SparkConnection
Adding getObjects impl
Adding back vikrant's changes
Fixing compile error
…nto dev/apache-drivers
…ldb/arrow-adbc into dev/apache-drivers"" This reverts commit f7ca693.
I think one of the nodes in the plan is just an arbitrary SQL statement. #323 (comment) |
I'd be interested in the relative performance. But given this works, I'd also like to figure out if we can package a standalone/AOT compiled version of this and ship a new Python package... |
It looks like the current code can indeed be AOT-compiled, but the currently-checked-in "export C API" code in the base library is still experimental quality at best. I haven't had much incentive to go back and clean it up because the BigQuery dependencies don't currently work with AOT compilation so this will be the first C# implementation where AOT actually makes sense. |
well, that was a fun exercise to get the checks to pass. should be ready now. |
Add tests for * Binary/Boolean * Date/Timestamp/Interval * Complex Types (ARRAY/MAP/STRUCT)
* Removed tests for complex types. * Added tests for long and null.
Added tests for complex types
…e-tests test(csharp/test/Drivers/Apache/Spark): more coverage for value tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a good start! I think the single biggest piece of feedback I have is that it's hard to tell what's done/implemented and what's not yet done. This is a problem we also have for the initial C API export code, and I very much regret not having done a better job of documenting that when I implemented it rather than rediscovering it later. I think the best way to address this is with a README.md in the Apache directory which says more explicitly what still needs to be implemented.
It would also be good to mention explicitly that this only supports little-endian platforms, though I think that's actually true of the C# Arrow implementation in general.
I know the Impala functionality isn't testing yet, but has it been shown to work at all? If not, this too shoud be mentioned.
Once the code is checked in, we can turn the gaps into follow-on issues -- perhaps with a single omnibus issue listing the details -- and track their completion that way.
csharp/test/Drivers/Apache/Apache - Backup.Arrow.Adbc.Tests.Drivers.Apache.csproj
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,237 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a README.md to this directory indicating which files have been hand-edited and perhaps how the other files were generated. (If these were the files I originally generated, then I guess I'll have to remember... :/.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added information in the README about the manually maintained files.
for (int i = 0; i < thriftSchema.Columns.Count; i++) | ||
{ | ||
TColumnDesc column = thriftSchema.Columns[i]; | ||
fields[i] = new Field(column.ColumnName, GetArrowType(column.TypeDesc.Types[0]), nullable: true /* ??? */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Thrift type doesn't include a nullable indicator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no nullable metadata returned in the Thrift API call to GetResultSetMetadata
. Will add a comment to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a code comment and documentation in README about possible inaccurate nullable indicator.
{ | ||
return GetArrowType(thriftType.PrimitiveEntry); | ||
} | ||
throw new InvalidOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For gaps like "doesn't support structured types", please make sure there are issues filed in GitHub. For major gaps (this may be one) it would also be good to call them out in a README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added documentation in README on how structured interval data types are handled.
protected abstract TProtocol CreateProtocol(); | ||
protected abstract TOpenSessionReq CreateSessionRequest(); | ||
|
||
public override IArrowArrayStream GetObjects(GetObjectsDepth depth, string catalogPattern, string dbSchemaPattern, string tableNamePattern, List<string> tableTypes, string columnNamePattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be quite incomplete. Consider removing it from this PR and submitting separately once it's complete and tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't remove it. HiveServer2 is the base class that Spark and Impala build on, but I will add details to the readme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the body, or at least the parts of the body that aren't implemented. It could always throw a NotImplementedException for when e.g. depth != GetObjectsDepth.All.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, removed the implementation here. It was not working and not useful as base functionality.
TGetOperationStatusResp statusResponse = null; | ||
do | ||
{ | ||
if (statusResponse != null) { Thread.Sleep(500); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good reminder for me that we really need a more async-friendly API, ideally a cross-process one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean at the ADBC API definition level? CC @zeroshade who has been thinking about this. It's something I would like to tackle, but there are questions about compatibility and what happens to the sync API afterwards and if we also want to try to 'fix' other things at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put some thoughts into #811 which is the only existing issue we have that sort of tracks async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CurtHagenlocher - I have some private WIP to use async
as much as possible in this C# implementation. It covers end-to-end async support for ExecuteQueryAsync
/ExecuteUpdateAsync
. I've put it to side to work on the GetObjects implementation.
* Code Review Improvements * Fixed line ending.
@CurtHagenlocher - I believe we've responded to all your comments, so far. Let us know if there is anything else you think we should address in this PR. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think that we'll check this in as-is and then I'll start filing followup issues.
…on Thrift (apache#1710) This PR introduces new drivers built on the Thrift protocol: Hive, Impala, and Spark. The main focus has been on Spark (including Databricks) for the initial set of tests. --------- Co-authored-by: David Coe <coedavid@umich.edu> Co-authored-by: vikrantpuppala <vikrant.puppala@databricks.com> Co-authored-by: Gopal Lal <gopal.lal@databricks.com> Co-authored-by: Vikrant Puppala <vikrantpuppala@gmail.com> Co-authored-by: Gopal Lal <135012033+gopalldb@users.noreply.github.com> Co-authored-by: Jade Wang <111902719+jadewang-db@users.noreply.github.com> Co-authored-by: yunbodeng-db <104732431+yunbodeng-db@users.noreply.github.com> Co-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Bruce Irschick <bruce.irschick@improving.com>
…on Thrift (apache#1710) This PR introduces new drivers built on the Thrift protocol: Hive, Impala, and Spark. The main focus has been on Spark (including Databricks) for the initial set of tests. --------- Co-authored-by: David Coe <coedavid@umich.edu> Co-authored-by: vikrantpuppala <vikrant.puppala@databricks.com> Co-authored-by: Gopal Lal <gopal.lal@databricks.com> Co-authored-by: Vikrant Puppala <vikrantpuppala@gmail.com> Co-authored-by: Gopal Lal <135012033+gopalldb@users.noreply.github.com> Co-authored-by: Jade Wang <111902719+jadewang-db@users.noreply.github.com> Co-authored-by: yunbodeng-db <104732431+yunbodeng-db@users.noreply.github.com> Co-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Bruce Irschick <bruce.irschick@improving.com>
This PR introduces new drivers built on the Thrift protocol: Hive, Impala, and Spark. The main focus has been on Spark (including Databricks) for the initial set of tests.