Skip to content

Conversation

@edwardneal
Copy link
Contributor

Description

This fixes an older issue requesting support for using SQL Graph column aliases as destinations when mapping columns in SqlBulkCopy.

SQL Server synthesises four pseudo-columns in SQL Graph: $edge_id, $from_id, $to_id, $node_id. These appear in node and edge tables. It's possible to query these columns, but internally they map to dynamically-named physical columns. Bulk copies only accept physical column names, so this PR introduces the concept of column aliases to bridge that gap when importing data in a similar way to SQL Server (running statements against the destination table manually.)

Column aliases are designed to be additive: if a physical column named $to_id already exists in a SQL Graph table, the physical column will always be used as the destination. For backwards compatibility purposes, if another source field has already been mapped to a column alias' physical column name, other source fields which might be mapped to the column alias aren't re-mapped; SqlBulkCopy will assume that the user has already done the work needed to get the physical column names and won't trample it.

The end result should be as defined in SqlGraphTables.WriteToServer_CopyToAliasedColumnName_Succeeds: mapping from a field in a DataTable/etc. to $to_id or $from_id will "just work".

The only difference between the way that SQL Server handles these column aliases and the way we handle them is in an edge case: a user creates a SQL Graph table with a column which has an identical name to the column alias. In that specific case, a user can access the column alias by querying $to_id and the physical column by querying [$to_id]. On the other hand, SqlBulkCopy will handle this case by always mapping the data to the physical column. This difference is so that people presently mapping fields to $to_id aren't forced to change their code so that it maps to [$to_id]. I don't think this is a common use case though - I'd struggle to imagine a situation where anyone needs to do this.

Issues

Fixes #123.

Testing

All existing tests pass. I've added two more which check the use cases I can think of:

  • Can we copy to a column name using its alias?
  • If a table has a physical column with the same name as an alias, does the physical column continue to take priority?

Aliases are: $to_id, $from_id, $node_id, $edge_id
@edwardneal edwardneal requested a review from a team as a code owner October 9, 2025 14:14
@paulmedynski paulmedynski added Enhancement 💡 Issues that are feature requests for the drivers we maintain. Public API 🆕 Issues/PRs that introduce new APIs to the driver. labels Oct 10, 2025
@paulmedynski
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@apoorvdeshmukh apoorvdeshmukh self-assigned this Oct 14, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements support for SQL Graph column aliases in SqlBulkCopy, allowing users to map data to SQL Server's pseudo-columns ($edge_id, $from_id, $to_id, $node_id) which are synthesized in SQL Graph tables but map to dynamically-named physical columns internally.

  • Adds column alias resolution logic that maps virtual column names to their physical counterparts
  • Implements backward compatibility by prioritizing physical columns when they have the same name as aliases
  • Updates the initial query to retrieve column alias mappings from SQL Server metadata

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
SqlGraphTables.cs Adds comprehensive test coverage for column alias functionality and edge cases
CopyAllFromReader.cs Updates expected statistics values to account for additional database queries
SqlBulkCopyColumnMapping.cs Introduces MappedDestinationColumn property to handle alias-to-physical column mapping
SqlBulkCopy.cs Implements core column alias resolution logic and updates metadata query

{
nodeCopy.DestinationTableName = destinationGraphTable;
nodeCopy.ColumnMappings.Add("Name", "Name");
nodeCopy.ColumnMappings.Add("Name", "$node_id");
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the same source column 'Name' to two different destination columns will cause the second mapping to overwrite the first. This test appears to be verifying that physical columns take precedence over aliases, but the mapping logic may not work as expected with duplicate source column mappings.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This operation is supported; the second mapping will not overwrite the first. Instead, it will map the same source field to two destination fields.

Comment on lines +623 to +625
bool canonicalNameExists = unmatchedColumns.Contains(canonical)
// The destination columns might be escaped. If so, search for those instead
|| unmatchedColumns.Contains(SqlServerEscapeHelper.EscapeIdentifier(canonical));
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions escaping but the logic checks if escaped canonical names exist in unmatchedColumns. However, unmatchedColumns is populated from DestinationColumn which may or may not be escaped. Consider clarifying the logic or adding a more comprehensive check for both escaped and unescaped variants.

Suggested change
bool canonicalNameExists = unmatchedColumns.Contains(canonical)
// The destination columns might be escaped. If so, search for those instead
|| unmatchedColumns.Contains(SqlServerEscapeHelper.EscapeIdentifier(canonical));
// Check for both escaped and unescaped variants using the Comparer for consistency
bool canonicalNameExists = false;
string escapedCanonical = SqlServerEscapeHelper.EscapeIdentifier(canonical);
foreach (string col in unmatchedColumns)
{
if (unmatchedColumns.Comparer.Equals(col, canonical) ||
unmatchedColumns.Comparer.Equals(col, escapedCanonical))
{
canonicalNameExists = true;
break;
}
}

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replacement code snippet has identical functionality, although calling Contains may be slightly faster. In both the original source code and the replacement, we check for unescaped, then for escaped column names.

// this local column (by ordinal or name)?
if (localColumn._destinationColumnOrdinal != metadata.ordinal
&& UnquotedName(localColumn._destinationColumnName) != metadata.column)
&& UnquotedName(localColumn.MappedDestinationColumn) != metadata.column)
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using MappedDestinationColumn here could cause issues if _mappedDestinationColumn is null and falls back to DestinationColumn, but DestinationColumn was already checked for mapping. This could lead to inconsistent behavior when aliases are involved.

Suggested change
&& UnquotedName(localColumn.MappedDestinationColumn) != metadata.column)
&& (localColumn._mappedDestinationColumn == null || UnquotedName(localColumn._mappedDestinationColumn) != metadata.column))

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping two source columns to the same destination column is already an invalid use case.

This specific case also can't happen as described by Copilot - if a physical column with the same name as an alias exists, we don't return the alias from the SQL query to attempt mapping.

@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.19%. Comparing base (b555adf) to head (e00077d).
⚠️ Report is 6 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (b555adf) and HEAD (e00077d). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (b555adf) HEAD (e00077d)
addons 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3677       +/-   ##
===========================================
- Coverage   90.82%   70.19%   -20.64%     
===========================================
  Files           6      268      +262     
  Lines         316    45291    +44975     
===========================================
+ Hits          287    31791    +31504     
- Misses         29    13500    +13471     
Flag Coverage Δ
addons ?
netcore 70.99% <100.00%> (?)
netfx 69.98% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cheenamalhotra cheenamalhotra removed the Public API 🆕 Issues/PRs that introduce new APIs to the driver. label Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Enhancement 💡 Issues that are feature requests for the drivers we maintain.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQL Bulk Copy does not support SQL Graph edge pseudo-column names

4 participants