-
Couldn't load subscription status.
- Fork 316
Feature | Support SQL Graph column aliases in SqlBulkCopy #3677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Aliases are: $to_id, $from_id, $node_id, $edge_id
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements support for SQL Graph column aliases in SqlBulkCopy, allowing users to map data to SQL Server's pseudo-columns ($edge_id, $from_id, $to_id, $node_id) which are synthesized in SQL Graph tables but map to dynamically-named physical columns internally.
- Adds column alias resolution logic that maps virtual column names to their physical counterparts
- Implements backward compatibility by prioritizing physical columns when they have the same name as aliases
- Updates the initial query to retrieve column alias mappings from SQL Server metadata
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| SqlGraphTables.cs | Adds comprehensive test coverage for column alias functionality and edge cases |
| CopyAllFromReader.cs | Updates expected statistics values to account for additional database queries |
| SqlBulkCopyColumnMapping.cs | Introduces MappedDestinationColumn property to handle alias-to-physical column mapping |
| SqlBulkCopy.cs | Implements core column alias resolution logic and updates metadata query |
| { | ||
| nodeCopy.DestinationTableName = destinationGraphTable; | ||
| nodeCopy.ColumnMappings.Add("Name", "Name"); | ||
| nodeCopy.ColumnMappings.Add("Name", "$node_id"); |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the same source column 'Name' to two different destination columns will cause the second mapping to overwrite the first. This test appears to be verifying that physical columns take precedence over aliases, but the mapping logic may not work as expected with duplicate source column mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This operation is supported; the second mapping will not overwrite the first. Instead, it will map the same source field to two destination fields.
| bool canonicalNameExists = unmatchedColumns.Contains(canonical) | ||
| // The destination columns might be escaped. If so, search for those instead | ||
| || unmatchedColumns.Contains(SqlServerEscapeHelper.EscapeIdentifier(canonical)); |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment mentions escaping but the logic checks if escaped canonical names exist in unmatchedColumns. However, unmatchedColumns is populated from DestinationColumn which may or may not be escaped. Consider clarifying the logic or adding a more comprehensive check for both escaped and unescaped variants.
| bool canonicalNameExists = unmatchedColumns.Contains(canonical) | |
| // The destination columns might be escaped. If so, search for those instead | |
| || unmatchedColumns.Contains(SqlServerEscapeHelper.EscapeIdentifier(canonical)); | |
| // Check for both escaped and unescaped variants using the Comparer for consistency | |
| bool canonicalNameExists = false; | |
| string escapedCanonical = SqlServerEscapeHelper.EscapeIdentifier(canonical); | |
| foreach (string col in unmatchedColumns) | |
| { | |
| if (unmatchedColumns.Comparer.Equals(col, canonical) || | |
| unmatchedColumns.Comparer.Equals(col, escapedCanonical)) | |
| { | |
| canonicalNameExists = true; | |
| break; | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replacement code snippet has identical functionality, although calling Contains may be slightly faster. In both the original source code and the replacement, we check for unescaped, then for escaped column names.
| // this local column (by ordinal or name)? | ||
| if (localColumn._destinationColumnOrdinal != metadata.ordinal | ||
| && UnquotedName(localColumn._destinationColumnName) != metadata.column) | ||
| && UnquotedName(localColumn.MappedDestinationColumn) != metadata.column) |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using MappedDestinationColumn here could cause issues if _mappedDestinationColumn is null and falls back to DestinationColumn, but DestinationColumn was already checked for mapping. This could lead to inconsistent behavior when aliases are involved.
| && UnquotedName(localColumn.MappedDestinationColumn) != metadata.column) | |
| && (localColumn._mappedDestinationColumn == null || UnquotedName(localColumn._mappedDestinationColumn) != metadata.column)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mapping two source columns to the same destination column is already an invalid use case.
This specific case also can't happen as described by Copilot - if a physical column with the same name as an alias exists, we don't return the alias from the SQL query to attempt mapping.
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## main #3677 +/- ##
===========================================
- Coverage 90.82% 70.19% -20.64%
===========================================
Files 6 268 +262
Lines 316 45291 +44975
===========================================
+ Hits 287 31791 +31504
- Misses 29 13500 +13471
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
This fixes an older issue requesting support for using SQL Graph column aliases as destinations when mapping columns in
SqlBulkCopy.SQL Server synthesises four pseudo-columns in SQL Graph:
$edge_id,$from_id,$to_id,$node_id. These appear in node and edge tables. It's possible to query these columns, but internally they map to dynamically-named physical columns. Bulk copies only accept physical column names, so this PR introduces the concept of column aliases to bridge that gap when importing data in a similar way to SQL Server (running statements against the destination table manually.)Column aliases are designed to be additive: if a physical column named
$to_idalready exists in a SQL Graph table, the physical column will always be used as the destination. For backwards compatibility purposes, if another source field has already been mapped to a column alias' physical column name, other source fields which might be mapped to the column alias aren't re-mapped;SqlBulkCopywill assume that the user has already done the work needed to get the physical column names and won't trample it.The end result should be as defined in
SqlGraphTables.WriteToServer_CopyToAliasedColumnName_Succeeds: mapping from a field in a DataTable/etc. to$to_idor$from_idwill "just work".The only difference between the way that SQL Server handles these column aliases and the way we handle them is in an edge case: a user creates a SQL Graph table with a column which has an identical name to the column alias. In that specific case, a user can access the column alias by querying
$to_idand the physical column by querying[$to_id]. On the other hand,SqlBulkCopywill handle this case by always mapping the data to the physical column. This difference is so that people presently mapping fields to$to_idaren't forced to change their code so that it maps to[$to_id]. I don't think this is a common use case though - I'd struggle to imagine a situation where anyone needs to do this.Issues
Fixes #123.
Testing
All existing tests pass. I've added two more which check the use cases I can think of: