-
Notifications
You must be signed in to change notification settings - Fork 310
Use UTF8 instance that doesn't emit BOM #3399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR replaces uses of Encoding.UTF8
(which emits a BOM) with a dedicated UTF8Encoding(false)
instance that suppresses the BOM, and adds a manual test to verify bulk copy behavior with UTF8 data.
- Introduced
s_utf8EncodingWithoutBom
and updated allEncoding.UTF8
references in TdsParser (netfx & netcore) - Added
TestBulkCopyWithUTF8.cs
manual test for streaming bulk copy without BOM - Updated test project and solution to include the new test file
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParser.cs | Added BOM-less UTF8 static field and replaced Encoding.UTF8 usages |
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParser.cs | Same BOM-less UTF8 changes for .NET Core |
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs | New manual test to verify UTF8 bulk copy without BOM |
src/Microsoft.Data.SqlClient/tests/ManualTests/Microsoft.Data.SqlClient.ManualTesting.Tests.csproj | Included the new test file |
src/Microsoft.Data.SqlClient.sln | Added Common project entries |
Comments suppressed due to low confidence (2)
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs:12
- [nitpick] The class name casing (
TestBulkCopyWithUtf8
) doesn’t match the file name (TestBulkCopyWithUTF8.cs
). Consider renaming the class toTestBulkCopyWithUTF8
for consistency.
public class TestBulkCopyWithUtf8
src/Microsoft.Data.SqlClient.sln:307
- Entries for a
Common
project were added to the solution, but the PR description and related changes don't reference this project. If unintentional, consider removing it or splitting it into a separate PR to keep scope focused.
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Common", "Microsoft.Data.SqlClient\tests\Common\Common.csproj", "{67128EC0-30F5-6A98-448B-55F88A1DE707}"
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParser.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks good.
Left some comments on the test approach
It might be a good idea to make sure that we have test coverage for scenarios where v6.1 tries to read UTF8 data which starts with a BOM, to handle cases where v6.1+ has to read data which was copied by v6.0... |
@edwardneal The scenario you're describing is not something we intend to support. Adding a BOM (Byte Order Mark) can make the data unusable in many cases. We're operating under the assumption that no one has relied on this buggy behavior for bulk copy operations. As such, ensuring interoperability between the old and corrected behavior is not within the scope of this fix. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3399 +/- ##
==========================================
- Coverage 68.04% 65.57% -2.48%
==========================================
Files 301 301
Lines 65625 65631 +6
==========================================
- Hits 44654 43036 -1618
- Misses 20971 22595 +1624
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion to harden the tests.
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix tests to cleanup tables as well.
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
c2021f7
c2021f7
to
03413c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes requested.
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So close! :)
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/TestBulkCopyWithUTF8.cs
Outdated
Show resolved
Hide resolved
e67b68c
to
f32f2bf
Compare
Description
Fixes #3397. Used UTF8Encoding(false) instance that doesn't emit BOM instead of
using System.Encoding.UTF8 which adds BOM while transmitting the data.
Added test case based on the available repro which fails without the fix.
Issues
#3397
Testing
Added sync and async SqlBulkCopy testcase that does bulkcopy from UTF8 source table to another table with UTF8 collation
with MARS = true/false, EnableStraming = true/false.
Without the fix, when the destination table is read after bulkcopy, contains BOM in the data.