Commit 640cc82

SEA: add support for Hybrid disposition (#631)
* Revert "Merge branch 'sea-migration' into exec-models-sea" This reverts commit 8bd12d8, reversing changes made to 030edf8.
* Revert "Merge branch 'exec-models-sea' into exec-phase-sea" This reverts commit be1997e, reversing changes made to 37813ba.
* change logging level
* remove un-necessary changes
* remove excess changes
* remove excess changes
* remove _get_schema_bytes (for now)
* redundant comments
* remove fetch phase methods
* reduce code repetition + introduce gaps after multi line pydocs
* remove unused imports
* move description extraction to helper func
* formatting (black)
* add more unit tests
* streamline unit tests
* test getting the list of allowed configurations
* reduce diff
* reduce diff
* house constants in enums for readability and immutability
* add note on hybrid disposition
* [squashed from cloudfetch-sea] introduce external links + arrow functionality
* reduce responsibility of Queue
* reduce repetition in arrow table creation
* reduce redundant code in CloudFetchQueue
* move chunk link progression to separate func
* remove redundant log
* improve logging
* remove reliance on schema_bytes in SEA
* remove redundant note on arrow_schema_bytes
* use more fetch methods
* remove redundant schema_bytes from parent constructor
* only call get_chunk_link with non null chunk index
* align SeaResultSet structure with ThriftResultSet
* remove _fill_result_buffer from SeaResultSet
* reduce code repetition
* align SeaResultSet with ext-links-sea
* remove redundant methods
* update unit tests
* remove accidental venv changes
* pre-fetch next chunk link on processing current
* reduce nesting
* line break after multi line pydoc
* re-introduce schema_bytes for better abstraction (likely temporary)
* add fetchmany_arrow and fetchall_arrow
* remove accidental changes in sea backend tests
* remove irrelevant changes
* remove un-necessary test changes
* remove un-necessary changes in thrift backend tests
* remove unimplemented methods test
* remove unimplemented method tests
* modify example scripts to include fetch calls
* add GetChunksResponse
* remove changes to sea test
* re-introduce accidentally removed description extraction method
* fix type errors (ssl_options, CHUNK_PATH_WITH_ID..., etc.)
* access ssl_options through connection
* DEBUG level
* remove explicit multi chunk test
* move cloud fetch queues back into utils.py
* remove excess docstrings
* move ThriftCloudFetchQueue above SeaCloudFetchQueue
* fix sea connector tests
* correct patch module path in cloud fetch queue tests
* remove unimplemented methods test
* correct add_link docstring
* remove invalid import
* better align queries with JDBC impl
* line breaks after multi-line PRs
* remove unused imports
* fix: introduce ExecuteResponse import
* remove unimplemented metadata methods test, un-necessary imports
* introduce unit tests for metadata methods
* remove verbosity in ResultSetFilter docstring
  Co-authored-by: jayant <167047871+jayantsing-db@users.noreply.github.com>
* remove un-necessary info in ResultSetFilter docstring
* remove explicit type checking, string literals around forward annotations
* house SQL commands in constants
* convert complex types to string if not _use_arrow_native_complex_types
* introduce unit tests for altered functionality
* Revert "Merge branch 'fetch-json-inline' into ext-links-sea" This reverts commit dabba55, reversing changes made to dd7dc6a.
* reduce verbosity of ResultSetFilter docstring
* remove unused imports
* Revert "Merge branch 'fetch-json-inline' into ext-links-sea" This reverts commit 3a999c0, reversing changes made to a1f9b9c.
* Revert "reduce verbosity of ResultSetFilter docstring" This reverts commit a1f9b9c.
* Reapply "Merge branch 'fetch-json-inline' into ext-links-sea" This reverts commit 48ad7b3.
* Revert "Merge branch 'fetch-json-inline' into ext-links-sea" This reverts commit dabba55, reversing changes made to dd7dc6a.
* remove un-necessary filters changes
* remove un-necessary backend changes
* remove constants changes
* remove changes in filters tests
* remove unit test backend and JSON queue changes
* remove changes in sea result set testing
* Revert "remove changes in sea result set testing" This reverts commit d210ccd.
* Revert "remove unit test backend and JSON queue changes" This reverts commit f6c5950.
* Revert "remove changes in filters tests" This reverts commit f3f795a.
* Revert "remove constants changes" This reverts commit 802d045.
* Revert "remove un-necessary backend changes" This reverts commit 20822e4.
* Revert "remove un-necessary filters changes" This reverts commit 5e75fb5.
* remove unused imports
* working version
* adopt _wait_until_command_done
* introduce metadata commands
* use new backend structure
* constrain backend diff
* remove changes to filters
* make _parse methods in models internal
* reduce changes in unit tests
* run small queries with SEA during integration tests
* run some tests for sea
* allow empty schema bytes for alignment with SEA
* pass is_vl_op to Sea backend ExecuteResponse
* remove catalog requirement in get_tables
* move filters.py to SEA utils
* ensure SeaResultSet
* prevent circular imports
* remove unused imports
* remove cast, throw error if not SeaResultSet
* pass param as TSparkParameterValue
* remove failing test (temp)
* remove SeaResultSet type assertion
* change errors to align with spec, instead of arbitrary ValueError
* make SEA backend methods return SeaResultSet
* use spec-aligned Exceptions in SEA backend
* remove defensive row type check
* raise ProgrammingError for invalid id
* make is_volume_operation strict bool
* remove complex types code
* Revert "remove complex types code" This reverts commit 138359d.
* introduce type conversion for primitive types for JSON + INLINE
* remove SEA running on metadata queries (known failures)
* remove un-necessary docstrings
* align expected types with databricks sdk
* link rest api reference to validate types
* remove test_catalogs_returns_arrow_table test metadata commands not expected to pass
* fix fetchall_arrow and fetchmany_arrow
* remove thrift aligned test_cancel_during_execute from SEA tests
* remove un-necessary changes in example scripts
* remove un-necessary changes in example scripts
* _convert_json_table -> _create_json_table
* remove accidentally removed test
* remove new unit tests (to be re-added based on new arch)
* remove changes in sea_result_set functionality (to be re-added)
* introduce more integration tests
* remove SEA tests in parameterized queries
* remove partial parameter fix changes
* remove un-necessary timestamp tests (pass with minor disparity)
* slightly stronger typing of _convert_json_types
* stronger typing of json utility funcs
* stronger typing of fetch*_json
* remove unused helper methods in SqlType
* line breaks after multi line pydocs, remove excess logs
* line breaks after multi line pydocs, reduce diff of redundant changes
* reduce diff of redundant changes
* mandate ResultData in SeaResultSet constructor
* remove complex type conversion
* correct fetch*_arrow
* recover old sea tests
* move queue and result set into SEA specific dir
* pass ssl_options into CloudFetchQueue
* reduce diff
* remove redundant conversion.py
* fix type issues
* ValueError not ProgrammingError
* reduce diff
* introduce SEA cloudfetch e2e tests
* allow empty cloudfetch result
* add unit tests for CloudFetchQueue and SeaResultSet
* skip pyarrow dependent tests
* simplify download process: no pre-fetching
* correct class name in logs
* align with old impl
* align next_n_rows with prev impl
* align remaining_rows with prev impl
* remove un-necessary Optional params
* remove un-necessary changes in thrift field if tests
* remove unused imports
* init hybrid
* run large queries
* hybrid disposition
* remove un-necessary log
* formatting (black)
* remove redundant tests
* multi frame decompression of lz4
* remove custom multi-frame decompressor for lz4
* move link fetching immediately before table creation so link expiry is not an issue
* formatting (black)
* fix types
* fix param type in unit tests
* correct param extraction
* remove common constructor for databricks client abc
* make SEA Http Client instance a private member
* make GetChunksResponse model more robust
* add link to doc of GetChunk response model
* pass result_data instead of "initial links" into SeaCloudFetchQueue
* move download_manager init into parent CloudFetchQueue
* raise ServerOperationError for no 0th chunk
* unused imports
* return None in case of empty response
* ensure table is empty on no initial links
* account for total chunk count
* iterate over chunk indexes instead of link
* stronger typing
* remove string literals around type defs
* introduce DownloadManager import
* return None for immediate out of bounds
* iterate by chunk index instead of link
* improve docstring
* remove un-necessary (?) changes
* get_chunk_link -> get_chunk_links in unit tests
* align tests with old message
* simplify attachment handling
* add unit tests for hybrid disposition
* remove duplicate total_chunk_count assignment

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
1 parent c07beb1 commit 640cc82

9 files changed, +236 -48 lines changed

examples/experimental/tests/test_sea_async_query.py

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ def test_sea_async_query_with_cloud_fetch():
         use_sea=True,
         user_agent_entry="SEA-Test-Client",
         use_cloud_fetch=True,
+        enable_query_result_lz4_compression=False,
     )

     logger.info(

examples/experimental/tests/test_sea_sync_query.py

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ def test_sea_sync_query_with_cloud_fetch():
         use_sea=True,
         user_agent_entry="SEA-Test-Client",
         use_cloud_fetch=True,
+        enable_query_result_lz4_compression=False,
     )

     logger.info(

src/databricks/sql/backend/sea/backend.py

Lines changed: 11 additions & 13 deletions
@@ -130,6 +130,8 @@ def __init__(
             "_use_arrow_native_complex_types", True
         )

+        self.use_hybrid_disposition = kwargs.get("use_hybrid_disposition", True)
+
         # Extract warehouse ID from http_path
         self.warehouse_id = self._extract_warehouse_id(http_path)

@@ -456,7 +458,11 @@ def execute_command(
             ResultFormat.ARROW_STREAM if use_cloud_fetch else ResultFormat.JSON_ARRAY
         ).value
         disposition = (
-            ResultDisposition.EXTERNAL_LINKS
+            (
+                ResultDisposition.HYBRID
+                if self.use_hybrid_disposition
+                else ResultDisposition.EXTERNAL_LINKS
+            )
             if use_cloud_fetch
             else ResultDisposition.INLINE
         ).value
@@ -637,7 +643,9 @@ def get_execution_result(
             arraysize=cursor.arraysize,
         )

-    def get_chunk_link(self, statement_id: str, chunk_index: int) -> ExternalLink:
+    def get_chunk_links(
+        self, statement_id: str, chunk_index: int
+    ) -> List[ExternalLink]:
         """
         Get links for chunks starting from the specified index.
         Args:
@@ -654,17 +662,7 @@ def get_chunk_link(self, statement_id: str, chunk_index: int) -> ExternalLink:
         response = GetChunksResponse.from_dict(response_data)

         links = response.external_links or []
-        link = next((l for l in links if l.chunk_index == chunk_index), None)
-        if not link:
-            raise ServerOperationError(
-                f"No link found for chunk index {chunk_index}",
-                {
-                    "operation-id": statement_id,
-                    "diagnostic-info": None,
-                },
-            )
-
-        return link
+        return links

     # == Metadata Operations ==
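
The disposition choice in `execute_command` can be read as a small decision table. The following is a minimal standalone sketch of that logic, not the connector's code: the function name `select_format_and_disposition` is hypothetical, and the `ResultFormat` string values are assumed to equal their member names (the diff shows only the member names). The `ResultDisposition` values are taken from the `constants.py` hunk in this commit. Note that this hunk reads `use_hybrid_disposition` with a default of `True` and substitutes `HYBRID` for `EXTERNAL_LINKS`, whereas the `client.py` docstring added in the same commit describes a default of `False` and a replacement of the inline disposition.

```python
from enum import Enum
from typing import Tuple


class ResultFormat(Enum):
    # Assumption: values mirror the member names; the diff shows names only.
    ARROW_STREAM = "ARROW_STREAM"
    JSON_ARRAY = "JSON_ARRAY"


class ResultDisposition(Enum):
    # Values as they appear in the constants.py hunk of this commit.
    HYBRID = "INLINE_OR_EXTERNAL_LINKS"
    EXTERNAL_LINKS = "EXTERNAL_LINKS"
    INLINE = "INLINE"


def select_format_and_disposition(
    use_cloud_fetch: bool, use_hybrid_disposition: bool = True
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the conditional in execute_command."""
    result_format = (
        ResultFormat.ARROW_STREAM if use_cloud_fetch else ResultFormat.JSON_ARRAY
    )
    if not use_cloud_fetch:
        # Without cloud fetch, the hybrid flag has no effect.
        disposition = ResultDisposition.INLINE
    elif use_hybrid_disposition:
        disposition = ResultDisposition.HYBRID
    else:
        disposition = ResultDisposition.EXTERNAL_LINKS
    return result_format.value, disposition.value
```

Under these assumptions, the hybrid flag only matters when cloud fetch is enabled; with `use_cloud_fetch=False` the request stays `JSON_ARRAY` with `INLINE`.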

src/databricks/sql/backend/sea/models/responses.py

Lines changed: 7 additions & 1 deletion
@@ -4,6 +4,7 @@
 These models define the structures used in SEA API responses.
 """

+import base64
 from typing import Dict, Any, List, Optional
 from dataclasses import dataclass

@@ -91,6 +92,11 @@ def _parse_result(data: Dict[str, Any]) -> ResultData:
                 )
             )

+    # Handle attachment field - decode from base64 if present
+    attachment = result_data.get("attachment")
+    if attachment is not None:
+        attachment = base64.b64decode(attachment)
+
     return ResultData(
         data=result_data.get("data_array"),
         external_links=external_links,
@@ -100,7 +106,7 @@ def _parse_result(data: Dict[str, Any]) -> ResultData:
         next_chunk_internal_link=result_data.get("next_chunk_internal_link"),
         row_count=result_data.get("row_count"),
         row_offset=result_data.get("row_offset"),
-        attachment=result_data.get("attachment"),
+        attachment=attachment,
     )

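
The attachment handling above amounts to "decode once at parse time, so downstream code sees raw bytes". A minimal sketch of that step in isolation follows; `ResultDataSketch` and `parse_result_sketch` are hypothetical stand-ins for the real `ResultData` model and `_parse_result`, trimmed to two fields for illustration.

```python
import base64
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ResultDataSketch:
    """Hypothetical, trimmed-down stand-in for the ResultData model."""

    attachment: Optional[bytes] = None
    row_count: Optional[int] = None


def parse_result_sketch(result_data: Dict[str, Any]) -> ResultDataSketch:
    """Decode the base64 attachment if present; leave it as None otherwise."""
    attachment = result_data.get("attachment")
    if attachment is not None:
        attachment = base64.b64decode(attachment)
    return ResultDataSketch(
        attachment=attachment,
        row_count=result_data.get("row_count"),
    )
```

Keeping `None` distinct from empty bytes matters here because `build_queue` in `queue.py` branches on `result_data.attachment is not None`.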

src/databricks/sql/backend/sea/queue.py

Lines changed: 34 additions & 8 deletions
@@ -5,6 +5,8 @@

 from databricks.sql.cloudfetch.download_manager import ResultFileDownloadManager

+from databricks.sql.cloudfetch.downloader import ResultSetDownloadHandler
+
 try:
     import pyarrow
 except ImportError:
@@ -23,7 +25,12 @@
 from databricks.sql.exc import ProgrammingError, ServerOperationError
 from databricks.sql.thrift_api.TCLIService.ttypes import TSparkArrowResultLink
 from databricks.sql.types import SSLOptions
-from databricks.sql.utils import CloudFetchQueue, ResultSetQueue
+from databricks.sql.utils import (
+    ArrowQueue,
+    CloudFetchQueue,
+    ResultSetQueue,
+    create_arrow_table_from_arrow_file,
+)

 import logging

@@ -62,6 +69,18 @@ def build_queue(
             # INLINE disposition with JSON_ARRAY format
             return JsonQueue(result_data.data)
         elif manifest.format == ResultFormat.ARROW_STREAM.value:
+            if result_data.attachment is not None:
+                arrow_file = (
+                    ResultSetDownloadHandler._decompress_data(result_data.attachment)
+                    if lz4_compressed
+                    else result_data.attachment
+                )
+                arrow_table = create_arrow_table_from_arrow_file(
+                    arrow_file, description
+                )
+                logger.debug(f"Created arrow table with {arrow_table.num_rows} rows")
+                return ArrowQueue(arrow_table, manifest.total_row_count)
+
             # EXTERNAL_LINKS disposition
             return SeaCloudFetchQueue(
                 result_data=result_data,
@@ -150,7 +169,11 @@ def __init__(
         )

         initial_links = result_data.external_links or []
-        first_link = next((l for l in initial_links if l.chunk_index == 0), None)
+        self._chunk_index_to_link = {link.chunk_index: link for link in initial_links}
+
+        # Track the current chunk we're processing
+        self._current_chunk_index = 0
+        first_link = self._chunk_index_to_link.get(self._current_chunk_index, None)
         if not first_link:
             # possibly an empty response
             return None
@@ -173,21 +196,24 @@ def _convert_to_thrift_link(self, link: ExternalLink) -> TSparkArrowResultLink:
             httpHeaders=link.http_headers or {},
         )

-    def _get_chunk_link(self, chunk_index: int) -> Optional[ExternalLink]:
-        """Progress to the next chunk link."""
+    def _get_chunk_link(self, chunk_index: int) -> Optional["ExternalLink"]:
         if chunk_index >= self._total_chunk_count:
             return None

-        try:
-            return self._sea_client.get_chunk_link(self._statement_id, chunk_index)
-        except Exception as e:
+        if chunk_index not in self._chunk_index_to_link:
+            links = self._sea_client.get_chunk_links(self._statement_id, chunk_index)
+            self._chunk_index_to_link.update({l.chunk_index: l for l in links})
+
+        link = self._chunk_index_to_link.get(chunk_index, None)
+        if not link:
             raise ServerOperationError(
-                f"Error fetching link for chunk {chunk_index}: {e}",
+                f"Error fetching link for chunk {chunk_index}",
                 {
                     "operation-id": self._statement_id,
                     "diagnostic-info": None,
                 },
             )
+        return link

     def _create_table_from_link(
         self, link: ExternalLink
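
The reworked `_get_chunk_link` replaces a per-chunk request with a dictionary keyed by chunk index that is refilled only on a miss. A minimal sketch of that caching pattern, detached from the connector, might look as follows. `LinkSketch`, `ChunkLinkCache`, and the injected `fetch_links` callable are hypothetical names introduced for illustration; `LookupError` stands in for the connector's `ServerOperationError`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LinkSketch:
    """Hypothetical stand-in for ExternalLink with only the fields used here."""

    chunk_index: int
    external_link: str


class ChunkLinkCache:
    """Cache links by chunk index and fetch more only when an index is missing."""

    def __init__(
        self,
        initial_links: List[LinkSketch],
        total_chunk_count: int,
        fetch_links: Callable[[int], List[LinkSketch]],
    ) -> None:
        self._chunk_index_to_link: Dict[int, LinkSketch] = {
            link.chunk_index: link for link in initial_links
        }
        self._total_chunk_count = total_chunk_count
        self._fetch_links = fetch_links  # plays the role of get_chunk_links

    def get(self, chunk_index: int) -> Optional[LinkSketch]:
        # Past the last chunk: signal the end of the result instead of failing.
        if chunk_index >= self._total_chunk_count:
            return None

        # Cache miss: one request may return several links, so store them all.
        if chunk_index not in self._chunk_index_to_link:
            links = self._fetch_links(chunk_index)
            self._chunk_index_to_link.update({l.chunk_index: l for l in links})

        link = self._chunk_index_to_link.get(chunk_index)
        if link is None:
            raise LookupError(f"Error fetching link for chunk {chunk_index}")
        return link
```

In this sketch a single fetch that returns several links satisfies later lookups without another request, which is the apparent motivation for changing `get_chunk_link` to return a list.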

src/databricks/sql/backend/sea/utils/constants.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ class ResultFormat(Enum):
 class ResultDisposition(Enum):
     """Enum for result disposition values."""

-    # TODO: add support for hybrid disposition
+    HYBRID = "INLINE_OR_EXTERNAL_LINKS"
     EXTERNAL_LINKS = "EXTERNAL_LINKS"
     INLINE = "INLINE"

src/databricks/sql/client.py

Lines changed: 4 additions & 0 deletions
@@ -99,6 +99,10 @@ def __init__(
         Connect to a Databricks SQL endpoint or a Databricks cluster.

         Parameters:
+            :param use_sea: `bool`, optional (default is False)
+                Use the SEA backend instead of the Thrift backend.
+            :param use_hybrid_disposition: `bool`, optional (default is False)
+                Use the hybrid disposition instead of the inline disposition.
             :param server_hostname: Databricks instance host name.
             :param http_path: Http path either to a DBSQL endpoint (e.g. /sql/1.0/endpoints/1234567890abcdef)
                 or to a DBR interactive cluster (e.g. /sql/protocolv1/o/1234567890123456/1234-123456-slid123)

tests/unit/test_sea_backend.py

Lines changed: 16 additions & 25 deletions
@@ -959,8 +959,8 @@ def test_get_columns(self, sea_client, sea_session_id, mock_cursor):
             )
         assert "Catalog name is required for get_columns" in str(excinfo.value)

-    def test_get_chunk_link(self, sea_client, mock_http_client, sea_command_id):
-        """Test get_chunk_link method."""
+    def test_get_chunk_links(self, sea_client, mock_http_client, sea_command_id):
+        """Test get_chunk_links method when links are available."""
         # Setup mock response
         mock_response = {
             "external_links": [
@@ -979,7 +979,7 @@ def test_get_chunk_link(self, sea_client, mock_http_client, sea_command_id):
         mock_http_client._make_request.return_value = mock_response

         # Call the method
-        result = sea_client.get_chunk_link("test-statement-123", 0)
+        results = sea_client.get_chunk_links("test-statement-123", 0)

         # Verify the HTTP client was called correctly
         mock_http_client._make_request.assert_called_once_with(
@@ -989,7 +989,10 @@ def test_get_chunk_link(self, sea_client, mock_http_client, sea_command_id):
             ),
         )

-        # Verify the result
+        # Verify the results
+        assert isinstance(results, list)
+        assert len(results) == 1
+        result = results[0]
         assert result.external_link == "https://example.com/data/chunk0"
         assert result.expiration == "2025-07-03T05:51:18.118009"
         assert result.row_count == 100
@@ -999,30 +1002,14 @@ def test_get_chunk_link(self, sea_client, mock_http_client, sea_command_id):
         assert result.next_chunk_index == 1
         assert result.http_headers == {"Authorization": "Bearer token123"}

-    def test_get_chunk_link_not_found(self, sea_client, mock_http_client):
-        """Test get_chunk_link when the requested chunk is not found."""
+    def test_get_chunk_links_empty(self, sea_client, mock_http_client):
+        """Test get_chunk_links when no links are returned (empty list)."""
         # Setup mock response with no matching chunk
-        mock_response = {
-            "external_links": [
-                {
-                    "external_link": "https://example.com/data/chunk1",
-                    "expiration": "2025-07-03T05:51:18.118009",
-                    "row_count": 100,
-                    "byte_count": 1024,
-                    "row_offset": 100,
-                    "chunk_index": 1,  # Different chunk index
-                    "next_chunk_index": 2,
-                    "http_headers": {"Authorization": "Bearer token123"},
-                }
-            ]
-        }
+        mock_response = {"external_links": []}
         mock_http_client._make_request.return_value = mock_response

-        # Call the method and expect an exception
-        with pytest.raises(
-            ServerOperationError, match="No link found for chunk index 0"
-        ):
-            sea_client.get_chunk_link("test-statement-123", 0)
+        # Call the method
+        results = sea_client.get_chunk_links("test-statement-123", 0)

         # Verify the HTTP client was called correctly
         mock_http_client._make_request.assert_called_once_with(
@@ -1031,3 +1018,7 @@ def test_get_chunk_link_not_found(self, sea_client, mock_http_client):
                 "test-statement-123", 0
             ),
         )
+
+        # Verify the results are empty
+        assert isinstance(results, list)
+        assert results == []
