docs(csharp/src/Drivers/Apache/Spark): document connection properties (apache#2019)

Add documentation for connection properties

* updates existing documentation for Apache/Thrift-based drivers
birschick-bq authored Jul 18, 2024
1 parent db0852c commit 2435619
Showing 2 changed files with 90 additions and 33 deletions.
84 changes: 84 additions & 0 deletions csharp/src/Drivers/Apache/Spark/README.md
@@ -0,0 +1,84 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Spark Driver

## Database and Connection Properties

Properties should be passed in the call to `SparkDriver.Open`,
but can also be passed in the call to `AdbcDatabase.Connect`. A minimal usage sketch follows the property table below.

| Property | Description | Default |
| :--- | :--- | :--- |
| `adbc.spark.host` | Host name for the data source. Do not include the scheme or port number. Example: `sparkserver.region.cloudapp.azure.com` | |
| `adbc.spark.port` | The port number the data source listens on for new connections. | `443` |
| `adbc.spark.path` | The URI path on the data source server. Example: `sql/protocolv1/o/0123456789123456/01234-0123456-source` | |
| `adbc.spark.token` | For token-based authentication, the token used to authenticate with the data source. Example: `abcdef0123456789` | |
<!-- Add these properties when basic authentication is available.
| `adbc.spark.scheme` | The HTTP or HTTPS scheme to use. Allowed values: `http`, `https`. | `https` when the port is 443 or empty; otherwise `http`. |
| `auth_type` | An indicator of the intended type of authentication. Allowed values: `basic`, `token`. This property is optional; the authentication type can be inferred from `token`, `username`, and `password`. If a `token` value is provided, token authentication is used. Otherwise, if both `username` and `password` values are provided, basic authentication is used. | |
| `username` | The user name used for basic authentication. | |
| `password` | The password for the user name used for basic authentication. | |
-->
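
The sketch below shows how these properties might be passed. It assumes the usual ADBC C# pattern in which `SparkDriver` derives from `AdbcDriver`, `Open` accepts a dictionary of properties and returns an `AdbcDatabase`, and `Connect` accepts an optional dictionary of connection properties; the namespace and all property values are placeholders for illustration.

```csharp
using System.Collections.Generic;
using Apache.Arrow.Adbc;
using Apache.Arrow.Adbc.Drivers.Apache.Spark; // assumed namespace for the Spark driver

// Placeholder property values; replace with values for your environment.
var properties = new Dictionary<string, string>
{
    ["adbc.spark.host"] = "sparkserver.region.cloudapp.azure.com",
    ["adbc.spark.port"] = "443",
    ["adbc.spark.path"] = "sql/protocolv1/o/0123456789123456/01234-0123456-source",
    ["adbc.spark.token"] = "abcdef0123456789",
};

AdbcDriver driver = new SparkDriver();
AdbcDatabase database = driver.Open(properties);

// Connection-level properties may also be supplied here instead of, or in
// addition to, the database-level properties above.
AdbcConnection connection = database.Connect(new Dictionary<string, string>());
```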

## Spark Types

The following table depicts how the Spark ADBC driver converts a Spark type to an Arrow type and a .NET type:

| Spark Type | Arrow Type | C# Type |
| :--- | :---: | :---: |
| ARRAY* | String | string |
| BIGINT | Int64 | long |
| BINARY | Binary | byte[] |
| BOOLEAN | Boolean | bool |
| CHAR | String | string |
| DATE | Date32 | DateTime |
| DECIMAL | Decimal128 | SqlDecimal |
| DOUBLE | Double | double |
| FLOAT | Float | float |
| INT | Int32 | int |
| INTERVAL_DAY_TIME+ | String | string |
| INTERVAL_YEAR_MONTH+ | String | string |
| MAP* | String | string |
| NULL | Null | null |
| SMALLINT | Int16 | short |
| STRING | String | string |
| STRUCT* | String | string |
| TIMESTAMP | Timestamp | DateTimeOffset |
| TINYINT | Int8 | sbyte |
| UNION | String | string |
| USER_DEFINED | String | string |
| VARCHAR | String | string |

\* Complex types are returned as strings<br>
\+ Interval types are returned as strings
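
As an illustration of this mapping, the sketch below reads the first row of an Arrow `RecordBatch` produced by the driver, using accessors from the Apache.Arrow C# library. The query, column names, and column positions are hypothetical.

```csharp
using System;
using Apache.Arrow;

internal static class SparkTypeMappingExample
{
    // Assumes the batch comes from a query such as
    //   SELECT id, amount, created_at FROM some_table
    // where id is INT, amount is DECIMAL, and created_at is TIMESTAMP.
    public static void PrintFirstRow(RecordBatch batch)
    {
        // INT -> Int32Array -> int
        int? id = ((Int32Array)batch.Column(0)).GetValue(0);

        // DECIMAL -> Decimal128Array -> System.Data.SqlTypes.SqlDecimal
        var amount = ((Decimal128Array)batch.Column(1)).GetSqlDecimal(0);

        // TIMESTAMP -> TimestampArray -> DateTimeOffset
        var createdAt = ((TimestampArray)batch.Column(2)).GetTimestamp(0);

        Console.WriteLine($"id={id}, amount={amount}, created_at={createdAt}");
    }
}
```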

## Supported Variants

### Spark on Databricks

Support for Spark on Databricks is the most mature.

The Spark ADBC driver supports token-based authentication using a
[Databricks personal access token](https://docs.databricks.com/en/dev-tools/auth/pat.html).
Basic (username and password) authentication is not supported at this time.

### Native Apache Spark

This is currently unsupported.
39 changes: 6 additions & 33 deletions csharp/src/Drivers/Apache/readme.md
@@ -18,6 +18,7 @@
-->

# Thrift-based Apache connectors

This library contains code for ADBC drivers built on top of the Thrift protocol with Arrow support:

- Hive
@@ -27,6 +28,7 @@ This library contains code for ADBC drivers built on top of the Thrift protocol
Each driver is at a different state of implementation.

## Custom generation

Typically, [Thrift](https://thrift.apache.org/) code is generated by the Thrift compiler, and that is mostly true here as well. However, some files were further edited to include Arrow support. These contain the phrase `BUT THIS FILE HAS BEEN HAND EDITED TO SUPPORT ARROW SO REGENERATE AT YOUR OWN RISK` at the top. Some of these files include:

```
@@ -41,55 +43,26 @@
arrow-adbc/csharp/src/Drivers/Apache/Thrift/Service/Rpc/Thrift/TStringColumn.cs
```

# Hive

The Hive classes serve as the base classes for Spark and Impala, since both of those platforms implement Hive capabilities.

Core functionality of the Hive classes beyond the base library implementation is under development, is limited, and may produce errors.

# Impala

The Impala classes are under development, have limited functionality, and may produce errors.

# Spark
The Spark classes are intended for use against native Spark and Spark on Databricks.

For more details, see [Spark Driver](Spark/README.md).

## Known Limitations

1. The API `SparkConnection.GetObjects` is not fully tested at this time.
   1. It may not return all catalogs and schemas on the server.
   1. It may throw an exception when returning object metadata from multiple catalogs and schemas.
1. The API `Connection.GetTableSchema` does not return the correct precision and scale for `NUMERIC`/`DECIMAL` types.
1. When a `NULL` value is returned for a `BINARY` type, it is returned as an empty array instead of the expected `null`.
1. Result set metadata does not provide information about the nullability of each column; columns are marked as `nullable` by default, which may not be accurate.
1. The **Impala** driver is untested and is currently unsupported.