PySQL Connector split into core and non core part #444
Open
jprakash-db wants to merge 14 commits into main from PECO-1803/connector-split
+2,299
−5,539
…ore part (#417)
* Implemented ColumnQueue to test the fetchall without pyarrow Removed token removed token
* order of fields in row corrected
* Changed the folder structure and tested the basic setup to work
* Refractored the code to make connector to work
* Basic Setup of connector, core and sqlalchemy is working
* Basic integration of core, connect and sqlalchemy is working
* Setup working dynamic change from ColumnQueue to ArrowQueue
* Refractored the test code and moved to respective folders
* Added the unit test for column_queue Fixed __version__ Fix
* venv_main added to git ignore
* Added code for merging columnar table
* Merging code for columnar
* Fixed the retry_close sesssion test issue with logging
* Fixed the databricks_sqlalchemy tests and introduced pytest.ini for the sqla_testing
* Added pyarrow_test mark on pytest
* Fixed databricks.sqlalchemy to databricks_sqlalchemy imports
* Added poetry.lock
* Added dist folder
* Changed the pyproject.toml
* Minor Fix
* Added the pyarrow skip tag on unit tests and tested their working
* Fixed the Decimal and timestamp conversion issue in non arrow pipeline
* Removed not required files and reformatted
* Fixed test_retry error
* Changed the folder structure to src / databricks
* Removed the columnar non arrow flow to another PR
* Moved the README to the root
* removed columnQueue instance
* Revmoved databricks_sqlalchemy dependency in core
* Changed the pysql_supports_arrow predicate, introduced changes in the pyproject.toml
* Ran the black formatter with the original version
* Extra .py removed from all the __init__.py files names
* Undo formatting check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* Check
* BIG UPDATE
* Refeactor code
* Refractor
* Fixed versioning
* Minor refractoring
* Minor refractoring
jprakash-db requested review from rcypher-databricks, yunbodeng-db, andrefurlan-db, jackyhu-db, benc-db and kravets-levko as code owners on September 24, 2024 05:04
jprakash-db had a problem deploying to azure-prod on September 25, 2024 17:11, with GitHub Actions: Failure
jprakash-db had a problem deploying to azure-prod on October 8, 2024 19:02, with GitHub Actions: Failure
jprakash-db had a problem deploying to azure-prod on October 8, 2024 19:28, with GitHub Actions: Failure
jprakash-db had a problem deploying to azure-prod on October 8, 2024 19:34, with GitHub Actions: Failure
jprakash-db had a problem deploying to azure-prod on October 17, 2024 05:22, with GitHub Actions: Failure
…ave pyarrow as optional
jprakash-db had a problem deploying to azure-prod on November 6, 2024 08:04, with GitHub Actions: Failure
jprakash-db temporarily deployed to azure-prod on November 11, 2024 17:07, with GitHub Actions: Inactive
Print warning message if pyarrow is not installed Signed-off-by: Jacky Hu <jacky.hu@databricks.com>
jprakash-db temporarily deployed to azure-prod on November 13, 2024 04:48, with GitHub Actions: Inactive
Remove sqlalchemy and update README.md Signed-off-by: Jacky Hu <jacky.hu@databricks.com>
jprakash-db temporarily deployed to azure-prod on November 13, 2024 05:12, with GitHub Actions: Inactive
jprakash-db temporarily deployed to azure-prod on November 13, 2024 05:47, with GitHub Actions: Inactive
Related Links
The databricks_sqlalchemy split is handled in this PR: https://github.com/databricks/databricks-sqlalchemy/pull/1
Description
The databricks-sql-python library is split so that end users can install a smaller package that matches their requirements.
In particular, pyarrow is the heavy component, and the plan is to make it optional.
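A common way to make a heavy dependency optional is to guard its import and fall back gracefully when it is missing. The sketch below illustrates that general pattern only; `load_optional` and `supports_arrow` are hypothetical names and are not taken from this PR's code.

```python
import importlib
import importlib.util
import warnings


def load_optional(module_name: str):
    """Import and return a module, or warn and return None if it is missing.

    Hypothetical helper for illustration; not the connector's actual code.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"{module_name} is not installed; features that need it are disabled",
            stacklevel=2,
        )
        return None


def supports_arrow() -> bool:
    """Report whether pyarrow can be imported, without importing it."""
    return importlib.util.find_spec("pyarrow") is not None
```

Callers can then branch on `supports_arrow()` and import pyarrow only on the code paths that actually need it.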
Users View
End users who only need the core functionality can use databricks_sql_connector_core, which does not include pyarrow and is therefore much smaller. These users will primarily be working with small amounts of data.
All other users can continue to use the package as it is.
existing library split into
Tasks Completed
How to Test
The testing pipeline remains the same as before the split.
pytest can be used to run both the integration tests and the unit tests directly:
pytest [directory_name or file_name]
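When pyarrow is optional, tests that depend on it need to be skipped where it is not installed. The following is a minimal sketch of one way to do that with `pytest.mark.skipif`; the `requires_pyarrow` marker and both test functions are hypothetical and do not reproduce the marks used in this PR.

```python
import importlib.util

import pytest

# True only when pyarrow can be imported in the current environment.
PYARROW_INSTALLED = importlib.util.find_spec("pyarrow") is not None

# Hypothetical reusable marker: skips a test when pyarrow is absent.
requires_pyarrow = pytest.mark.skipif(
    not PYARROW_INSTALLED, reason="pyarrow is not installed"
)


@requires_pyarrow
def test_arrow_table_row_count():
    import pyarrow as pa  # imported lazily so collection works without pyarrow

    table = pa.table({"id": [1, 2, 3]})
    assert table.num_rows == 3


def test_plain_rows():
    # Runs in every environment; uses no pyarrow.
    rows = [(1, "a"), (2, "b")]
    assert [row[0] for row in rows] == [1, 2]
```

Running `pytest` on a file like this reports the first test as skipped when pyarrow is missing and runs the second in either case.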
Performance Comparison - Benchmarking
Pre-split and post-split performance has been compared using large and small queries to make sure there is no performance regression.
A dashboard has been created so that every time the benchmarking is run, the results are stored in the benchfood and comparisons can be made easily.
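As a rough illustration of what a pre-split versus post-split comparison can look like, the sketch below times a callable several times and flags a regression when the candidate's median is slower than the baseline by more than a tolerance. It is not the benchmarking setup used for this PR; `median_runtime`, `is_regression`, and the 10% tolerance are assumptions chosen for the example.

```python
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def is_regression(baseline_s: float, candidate_s: float, tolerance: float = 0.10) -> bool:
    """True if the candidate is slower than the baseline by more than `tolerance`."""
    return candidate_s > baseline_s * (1.0 + tolerance)


if __name__ == "__main__":
    # Stand-in workloads; a real run would execute a small and a large query.
    baseline = median_runtime(lambda: sum(range(100_000)))
    candidate = median_runtime(lambda: sum(range(100_000)))
    print("regression" if is_regression(baseline, candidate) else "no regression")
```

Using the median rather than the mean keeps a single slow run from dominating the comparison.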