Skip to content

Use UTF8 instance that doesn't emit BOM #3399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 10, 2025
Merged

Use UTF8 instance that doesn't emit BOM #3399

merged 8 commits into from
Jun 10, 2025

Conversation

apoorvdeshmukh
Copy link
Contributor

@apoorvdeshmukh apoorvdeshmukh commented Jun 6, 2025

Description

Fixes #3397. Used UTF8Encoding(false) instance that doesn't emit BOM instead of
using System.Encoding.UTF8 which adds BOM while transmitting the data.
Added test case based on the available repro which fails without the fix.

Issues

#3397

Testing

Added sync and async SqlBulkCopy testcase that does bulkcopy from UTF8 source table to another table with UTF8 collation
with MARS = true/false, EnableStraming = true/false.
Without the fix, when the destination table is read after bulkcopy, contains BOM in the data.

@Copilot Copilot AI review requested due to automatic review settings June 6, 2025 20:20
@apoorvdeshmukh apoorvdeshmukh requested a review from a team as a code owner June 6, 2025 20:20
@apoorvdeshmukh apoorvdeshmukh added this to the 6.1-preview2 milestone Jun 6, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces uses of Encoding.UTF8 (which emits a BOM) with a dedicated UTF8Encoding(false) instance that suppresses the BOM, and adds a manual test to verify bulk copy behavior with UTF8 data.

  • Introduced s_utf8EncodingWithoutBom and updated all Encoding.UTF8 references in TdsParser (netfx & netcore)
  • Added TestBulkCopyWithUTF8.cs manual test for streaming bulk copy without BOM
  • Updated test project and solution to include the new test file

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParser.cs Added BOM-less UTF8 static field and replaced Encoding.UTF8 usages
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParser.cs Same BOM-less UTF8 changes for .NET Core
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs New manual test to verify UTF8 bulk copy without BOM
src/Microsoft.Data.SqlClient/tests/ManualTests/Microsoft.Data.SqlClient.ManualTesting.Tests.csproj Included the new test file
src/Microsoft.Data.SqlClient.sln Added Common project entries
Comments suppressed due to low confidence (2)

src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs:12

  • [nitpick] The class name casing (TestBulkCopyWithUtf8) doesn’t match the file name (TestBulkCopyWithUTF8.cs). Consider renaming the class to TestBulkCopyWithUTF8 for consistency.
public class TestBulkCopyWithUtf8

src/Microsoft.Data.SqlClient.sln:307

  • Entries for a Common project were added to the solution, but the PR description and related changes don't reference this project. If unintentional, consider removing it or splitting it into a separate PR to keep scope focused.
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Common", "Microsoft.Data.SqlClient\tests\Common\Common.csproj", "{67128EC0-30F5-6A98-448B-55F88A1DE707}"

Copy link
Contributor

@saurabh500 saurabh500 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks good.
Left some comments on the test approach

@edwardneal
Copy link
Contributor

It might be a good idea to make sure that we have test coverage for scenarios where v6.1 tries to read UTF8 data which starts with a BOM, to handle cases where v6.1+ has to read data which was copied by v6.0...

@saurabh500
Copy link
Contributor

saurabh500 commented Jun 6, 2025

@edwardneal The scenario you're describing is not something we intend to support. Adding a BOM (Byte Order Mark) can make the data unusable in many cases.

We're operating under the assumption that no one has relied on this buggy behavior for bulk copy operations. As such, ensuring interoperability between the old and corrected behavior is not within the scope of this fix.

@saurabh500 saurabh500 self-requested a review June 7, 2025 06:52
@apoorvdeshmukh apoorvdeshmukh requested a review from a team June 7, 2025 12:39
saurabh500
saurabh500 previously approved these changes Jun 7, 2025
Copy link

codecov bot commented Jun 8, 2025

Codecov Report

Attention: Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.

Project coverage is 65.57%. Comparing base (0444198) to head (41cea80).

Files with missing lines Patch % Lines
.../netcore/src/Microsoft/Data/SqlClient/TdsParser.cs 0.00% 5 Missing ⚠️
...nt/netfx/src/Microsoft/Data/SqlClient/TdsParser.cs 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3399      +/-   ##
==========================================
- Coverage   68.04%   65.57%   -2.48%     
==========================================
  Files         301      301              
  Lines       65625    65631       +6     
==========================================
- Hits        44654    43036    -1618     
- Misses      20971    22595    +1624     
Flag Coverage Δ
addons 92.58% <ø> (ø)
netcore 68.73% <16.66%> (-3.67%) ⬇️
netfx 66.96% <16.66%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@apoorvdeshmukh apoorvdeshmukh added the Triage Needed 🆕 For new issues, not triaged yet. label Jun 8, 2025
Copy link
Contributor

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion to harden the tests.

paulmedynski
paulmedynski previously approved these changes Jun 9, 2025
saurabh500
saurabh500 previously approved these changes Jun 9, 2025
@apoorvdeshmukh apoorvdeshmukh added Bug! 🐛 Issues that are bugs in the drivers we maintain. and removed Triage Needed 🆕 For new issues, not triaged yet. labels Jun 9, 2025
Copy link
Member

@cheenamalhotra cheenamalhotra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix tests to cleanup tables as well.

@apoorvdeshmukh apoorvdeshmukh dismissed stale reviews from saurabh500 and paulmedynski via c2021f7 June 10, 2025 05:53
@apoorvdeshmukh apoorvdeshmukh requested review from cheenamalhotra and a team June 10, 2025 05:58
@apoorvdeshmukh apoorvdeshmukh self-assigned this Jun 10, 2025
Copy link
Contributor

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes requested.

Copy link
Contributor

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So close! :)

@cheenamalhotra cheenamalhotra merged commit e649a7c into main Jun 10, 2025
248 of 250 checks passed
@cheenamalhotra cheenamalhotra deleted the dev/ad/3397 branch June 10, 2025 16:52
apoorvdeshmukh added a commit that referenced this pull request Jun 11, 2025
This was referenced Jun 11, 2025
apoorvdeshmukh added a commit that referenced this pull request Jun 11, 2025
apoorvdeshmukh added a commit that referenced this pull request Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug! 🐛 Issues that are bugs in the drivers we maintain.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SqlBulkCopy streaming ends up prefixing BOM to UTF8 source column with VARCHAR(MAX) type
6 participants