- 
                Notifications
    
You must be signed in to change notification settings  - Fork 161
 
feat(csharp/src/Drivers/Databricks): Optimize GetColumnsExtendedAsync via DESC TABLE EXTENDED #2953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…xtended to implement GetColumnsExtendedAsync
81b326f    to
    7d83b75      
    Compare
  
    49b3f6e    to
    8c73c0b      
    Compare
  
    | * limitations under the License. | ||
| */ | ||
| 
               | 
          ||
| using Apache.Arrow.Adbc.Drivers.Databricks.Result; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mocify/improve the existing GetColumnExtended tests to verify that:
With the flag on/off, they get same number of columns, same schema, and same data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it to support test for both flag on/off as below, they are almost same except for the IS_NULLABLE on primary key.
        [InlineData("all_column_types", "Resources/create_table_all_types.sql", "Resources/result_get_column_extended_all_types.json", true, new[] { "PK_IS_NULLABLE:NO" })]
        [InlineData("all_column_types", "Resources/create_table_all_types.sql", "Resources/result_get_column_extended_all_types.json", false, new[] { "PK_IS_NULLABLE:YES" })]
        public async Task CanGetColumnsExtended(string tableName,string createTableSqlLocation, string resultLocation, bool useDescTableExtended, string[] ?extraPlaceholdsInResult = null)4d4af63    to
    4a0d05a      
    Compare
  
    4a0d05a    to
    bb93562      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good! I've just got a few nits about formatting and casing of method names.
        
          
                csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
      ba704ef    to
    aa7a763      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The changes look good but there's a merge conflict. Could you rebase on top of the latest sourced?
          
 Done.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Optimize GetColumnsExtendedAsync in Databricks via
DESC TABLE EXTENDEDMotivation
Currently,
HiveServer2Statement.GetColumnsExtendedAsyncwill make 3 thrift callsGetColumns,GetPrimaryKeysandGetCrossReferenceto get all the information and then join them into one result set. Now Databricks introduces a SQLDESC TABLE EXTENDED <table> AS JSONthat will return all of these information in one query, this can save 2 extra roundtrips and improve the performance.Description
Override
HiveServer2Statement.GetColumnsExtendedAsyncinDatabricksStatementby executing single SQLDESC TABLE EXTENDED <table> AS JSONto get all the required column info forGetColumnsExtendedAsyncthen combine and join these info into the result. As this SQLDESC TABLE EXTENDED <table> AS JSONis only available in Databricks Runtime 16.2 or above, it will check theServerProtocolVersion. if it does not meet the requirement, it will fallback the implementation of base class.Change
DescTableExtendedResultthat represents the result ofDESC TABLE EXTENDED <table> AS JSON(see format here). It alsotable_constraintsto the structured primary key and foreign key constraintsDataType,ColumnSize,FullTypeNamewhich are calculated from the column type and type specific propertiesHiveServer2Statementto protected so that they can be used/overridden in subclassDatabricksStatementPrimaryKeyFieldsForeignKeyFieldsPrimaryKeyPrefixForeignKeyPrefixGetColumnsExtendedAsyncCreateEmptyExtendedColumnsResultDatabricksStatementwith the changes belowGetColumnsExtendedAsyncinDatabricksStatement, it executes queryDESC TABLE EXTENDED <table> AS JSONto get all the required info and then json them into the QueryResultGetColumnsAsyncto a reusable methodCreateColumnMetadataSchemaGetColumnsAsyncto a reusable methodCreateColumnMetadataEmptyArrayadbc.databricks.use_desc_table_extendedCanUseDescTableExtendedinDatabricksConnection,DatabricksStatementwill call it to decide if it will overrideGetColumnsExtendedAsyncStatementTest:CanGetColumnsExtendedTesting
DescTableExtendedResultTestto cover all the deserialization and parsing cases from raw result ofDESC TABLE EXTENDED <table> AS JSONColumnsExtendedrelevant test cases inDatabricks/StatementTest