FEAT: [POC] BCP implementation in mssql-python driver - #397
- Created rust_bindings/ with PyO3 sample module (RustConnection, helper functions)
- Built mssql-tds project from BCPRust repository
- Copied compiled binaries to BCPRustLib/:
  * libmssql_core_tds.so - Core TDS Python extension
  * libmssql_js.so - JavaScript bindings
  * mssql-tds-cli - CLI tool
  * mssql_py_core wheel package
- Updated requirements.txt with maturin and setuptools-rust
- Modified setup.py to support hybrid C++/Rust builds
- Added test scripts and documentation
# Use local SQL Server or environment variable
conn_str = os.getenv("DB_CONNECTION_STRING",
    "Server=localhost,1433;Database=master;UID=sa;PWD=uvFvisUxK4En7AAV;TrustServerCertificate=yes;")
Check notice (Code scanning / devskim): Accessing localhost could indicate debug code, or could hinder scaling.
# Check if bulk_copy is implemented in mssql_core_tds
conn_dict = {
    'server': 'localhost',
Check notice (Code scanning / devskim): Accessing localhost could indicate debug code, or could hinder scaling.
📊 Code Coverage Report
Diff Coverage
Diff: main...HEAD, staged and unstaged changes
Summary
mssql_python/cursor.py
Lines 2850-2906
  2850             >>> cursor = conn.cursor()
  2851             >>> data = [[1, 'Alice', 100.50], [2, 'Bob', 200.75]]
  2852             >>> cursor.bulk_copy('employees', data)
  2853         """
! 2854         self._check_closed()
  2855
! 2856         try:
! 2857             import mssql_rust_bindings
! 2858         except ImportError as e:
! 2859             raise ImportError(
  2860                 f"Bulk copy requires mssql_rust_bindings module: {e}"
  2861             ) from e
  2862
  2863         # Parse connection string to extract parameters
! 2864         conn_str = self._connection.connection_str
! 2865         params = {}
  2866
! 2867         for part in conn_str.split(';'):
! 2868             if '=' in part:
! 2869                 key, value = part.split('=', 1)
! 2870                 key = key.strip().lower()
! 2871                 value = value.strip()
  2872
! 2873                 if key in ['server', 'data source']:
! 2874                     params['server'] = value.split(',')[0] # Remove port if present
! 2875                 elif key in ['database', 'initial catalog']:
! 2876                     params['database'] = value
! 2877                 elif key in ['uid', 'user id', 'user']:
! 2878                     params['user_name'] = value
! 2879                 elif key in ['pwd', 'password']:
! 2880                     params['password'] = value
! 2881                 elif key == 'trustservercertificate':
! 2882                     params['trust_server_certificate'] = value
  2883
  2884         # Set defaults if not found
! 2885         params.setdefault('server', 'localhost')
! 2886         params.setdefault('database', 'master')
! 2887         params.setdefault('user_name', 'sa')
! 2888         params.setdefault('password', '')
! 2889         params.setdefault('trust_server_certificate', 'yes')
  2890
! 2891         try:
  2892             # BulkCopyWrapper handles mssql_core_tds connection internally
! 2893             bulk_wrapper = mssql_rust_bindings.BulkCopyWrapper(params)
! 2894             result = bulk_wrapper.bulk_copy(table_name, data)
! 2895             bulk_wrapper.close()
! 2896             return result
! 2897         except AttributeError as e:
! 2898             raise AttributeError(
  2899                 "bulk_copy method not implemented in mssql_core_tds.DdbcConnection"
  2900             ) from e
! 2901         except Exception as e:
! 2902             raise DatabaseError(
  2903                 driver_error=f"Bulk copy operation failed: {e}",
  2904                 ddbc_error=str(e)
  2905             ) from e
📋 Files Needing Attention
📉 Files with overall lowest coverage
mssql_python.pybind.logger_bridge.hpp: 58.8%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.row.py: 66.2%
mssql_python.helpers.py: 67.5%
mssql_python.pybind.ddbc_bindings.cpp: 69.4%
mssql_python.pybind.ddbc_bindings.h: 71.7%
mssql_python.pybind.connection.connection.cpp: 73.6%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.cursor.py: 81.7%
"""
# This is a no-op - buffer sizes are managed automatically

def bulk_copy(self, table_name: str, data: List[List[Any]]) -> Any:
This looks like a great start!
An important thing that's missing is a target column list. Imagine that your table has columns (a, b, c, d). "b" is nullable and you wish to bulk copy values for only columns "a", "c", and "d". As a user you should be able to supply a positional list of columns that the input data should correspond to.
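To make the positional correspondence concrete, here is a minimal sketch in plain Python. The helper name align_rows and its signature are hypothetical and not part of this PR; a real implementation would more likely send only the listed columns and let the server fill in the rest, so this only illustrates how input values line up with target columns.

```python
from typing import Any, Iterable, Iterator, Sequence, Tuple


def align_rows(
    table_columns: Sequence[str],
    target_columns: Sequence[str],
    rows: Iterable[Sequence[Any]],
) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical helper: expand each row to the table's full column order."""
    unknown = [c for c in target_columns if c not in table_columns]
    if unknown:
        raise ValueError(f"Unknown target columns: {unknown}")
    if len(set(target_columns)) != len(target_columns):
        raise ValueError("Duplicate target columns")

    # Index of each target column within a supplied row.
    position = {name: i for i, name in enumerate(target_columns)}

    for row_number, row in enumerate(rows):
        if len(row) != len(target_columns):
            raise ValueError(
                f"Row {row_number} has {len(row)} values, "
                f"expected {len(target_columns)}"
            )
        # Columns that were not targeted are filled with None (NULL).
        yield tuple(
            row[position[name]] if name in position else None
            for name in table_columns
        )


rows = [(1, "x", 10.0), (2, "y", 20.0)]
aligned = list(align_rows(["a", "b", "c", "d"], ["a", "c", "d"], rows))
print(aligned)  # [(1, None, 'x', 10.0), (2, None, 'y', 20.0)]
```

The skipped column "b" ends up as None in every row, which is only valid because "b" is nullable in this example.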
Another thing: The outer type for the "data" input should not be List, but rather Iterable, and the inner type should be either List or Tuple. The latter is more of a type hinting issue; I suspect either List or Tuple will work already given the implementation. The Iterable, however, might require some work on the Rust side. This is really important from a bulk copy perspective as a user might wish to send many more rows than will fit in memory at once.
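As a rough illustration of why Iterable matters, the sketch below consumes a generator in fixed-size batches so that only one batch is held in memory at a time. The function iter_batches and the batch size are assumptions made for illustration; how batches would actually be handed to the Rust side is not something this PR defines.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Tuple, Union

# Inner rows may be lists or tuples, per the type-hinting suggestion.
Row = Union[List[Any], Tuple[Any, ...]]


def iter_batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Hypothetical helper: yield lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        # islice pulls at most batch_size rows without materializing the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def generate_rows(count: int) -> Iterator[Row]:
    """Stand-in for a large data source that should not be loaded at once."""
    for i in range(count):
        yield (i, f"name-{i}", i * 1.5)


batch_sizes = [len(b) for b in iter_batches(generate_rows(10), batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```

A list input still works unchanged, since a list is itself an Iterable.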
Outside of those things it would be nice to see some common control flags, such as the ability to enable or disable trigger firing, but I'd say that those are less important than the top two.
Great PR so far!
Work Item / Issue Reference
Summary
This pull request introduces Rust-based Python bindings for SQL Server bulk copy operations, providing an alternative to the existing C++ bindings. It adds a new mssql_rust_bindings module using PyO3, integrates a BulkCopyWrapper for efficient data transfer, and includes comprehensive documentation and build scripts for both Linux and Windows. The main demo script is updated to showcase usage of the new Rust bindings for bulk copy.
Key Components:
User/Application: Calls the cursor's bulk_copy method
Cursor (cursor.py):
- Parses connection string
- Creates BulkCopyWrapper with parameters
- Manages the bulk copy lifecycle
BulkCopyWrapper (Rust/PyO3):
- Creates and manages mssql_core_tds connection internally
- Checks if bulk_copy is implemented
- Delegates to mssql_core_tds
DdbcConnection (mssql_core_tds):
- Python bindings to TDS protocol
- Handles actual database communication
SQL Server: Processes the bulk insert
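The cursor's connection-string parsing step can be sketched as a standalone function, which makes it easier to test in isolation. This is a minimal sketch under stated assumptions: it handles only simple key=value pairs (no brace-quoted values such as PWD={a;b}), and the "port" key is hypothetical, since it is not confirmed that BulkCopyWrapper accepts one. Unlike the inline version, it keeps the port instead of discarding it and applies no default credentials.

```python
from typing import Dict

# Connection-string keywords mapped to the parameter names used by the PR.
_KEY_ALIASES = {
    "server": "server",
    "data source": "server",
    "database": "database",
    "initial catalog": "database",
    "uid": "user_name",
    "user id": "user_name",
    "user": "user_name",
    "pwd": "password",
    "password": "password",
    "trustservercertificate": "trust_server_certificate",
}


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Hypothetical helper: extract the parameters needed for bulk copy."""
    params: Dict[str, str] = {}
    for part in conn_str.split(";"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = part.partition("=")
        if not sep:
            continue
        name = _KEY_ALIASES.get(key.strip().lower())
        if name is None:
            continue  # Ignore keywords this sketch does not know about.
        value = value.strip()
        if name == "server":
            host, comma, port = value.partition(",")
            params["server"] = host.strip()
            if comma and port.strip():
                params["port"] = port.strip()  # Hypothetical key, see lead-in.
        else:
            params[name] = value
    return params


parsed = parse_connection_string(
    "Server=localhost,1433;Database=master;UID=sa;PWD=secret;"
    "TrustServerCertificate=yes;"
)
print(parsed["server"], parsed["port"])  # localhost 1433
```

Leaving defaults out of the parser means a missing user name or password surfaces as a missing key rather than silently falling back to 'sa' with an empty password.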
Sequence Diagram:
Key changes:
1. Rust Bindings for Python (PyO3 Integration)
- Added mssql_rust_bindings using PyO3, featuring a RustConnection class, utility functions, and a BulkCopyWrapper for bulk copy operations. Includes module initialization and versioning.
- Added Cargo.toml for Rust package configuration, specifying dependencies, build profiles, and metadata.
2. Bulk Copy Functionality
- Implemented BulkCopyWrapper in Rust, exposing a Python class that wraps the bulk_copy method from the underlying TDS connection, enabling high-performance bulk data transfer.
- Updated main.py to demonstrate connecting to SQL Server, creating a temp table, generating test data, and performing a bulk copy using the new Rust bindings. Includes verification and cleanup steps.
3. Documentation and Build Scripts
- Added README.md files for both the core Rust bindings directory and the distributed binaries, providing build, usage, and integration instructions.
- Added build scripts (build.sh for Linux/macOS, build.bat for Windows) to streamline development and release builds using maturin.
4. Distribution Artifacts
- Placed the compiled binaries in the BCPRustLib directory for easy installation and integration.
These changes lay the groundwork for integrating Rust-based performance and safety into SQL Server Python workflows, with clear paths for future expansion and migration from C++ bindings.