Breaking change with string.IndexOf(string) from .NET Core 3.0 -> .NET 5.0 #43736
Comments
This is by design: in .NET 5.0 we switched to using ICU instead of NLS. See https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu for more details. You have the option to use the config switch |
forgot to say, if you want actual.IndexOf(expected, StringComparison.Ordinal) |
Yeah, if you run this code in Unix targeting
and as @tarekgh with
|
I think this is failing because the mix of |
Edit: Above was a misunderstanding. |
@GrabYourPitchforks can we update the issue title then? This is technically a breaking change, but it happens on both Windows and Unix... right? |
I pinged Jimmy offline for clarification. It's possible I misunderstood his original issue report. 280-char forums aren't always efficient at communicating bugs clearly. ;) |
Just to clarify, |
I received clarification on Twitter. The app isn't calling |
@GrabYourPitchforks could you share the link https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu in your Twitter replies and mention that we have a config switch to go back to the old behavior? |
Thanks, @GrabYourPitchforks... based on this, closing it as by design. |
To add more here, if you want to get the old behavior without switching back to NLS, you can do CultureInfo.CurrentCulture.CompareInfo.IndexOf(actual, expected, CompareOptions.IgnoreSymbols) or actual.IndexOf(expected, StringComparison.Ordinal) instead of actual.IndexOf(expected) and you should get the desired behavior. |
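A minimal sketch of the two alternatives mentioned above. The sample strings and the class and method names are hypothetical, chosen only for illustration. The ordinal result is fixed by the UTF-16 contents of the strings; the `IgnoreSymbols` result depends on the globalization library in use, so no expected value is shown for it.

```csharp
using System;
using System.Globalization;

public static class IndexOfAlternatives
{
    // Ordinal search: compares UTF-16 code units, so the result does not
    // depend on whether the runtime uses ICU or NLS.
    public static int OrdinalIndexOf(string source, string value) =>
        source.IndexOf(value, StringComparison.Ordinal);

    // Linguistic search that ignores symbols, as suggested in the comment above.
    // The result depends on the current culture and globalization library.
    public static int IgnoreSymbolsIndexOf(string source, string value) =>
        CultureInfo.CurrentCulture.CompareInfo.IndexOf(source, value, CompareOptions.IgnoreSymbols);

    public static void Main()
    {
        string actual = "line one\r\nline two"; // hypothetical sample input
        string expected = "\nline two";

        Console.WriteLine(OrdinalIndexOf(actual, expected));       // prints 9
        Console.WriteLine(IgnoreSymbolsIndexOf(actual, expected)); // platform-dependent
    }
}
```

The ordinal overload is usually the safer choice when the strings are data (file contents, protocol text) rather than text that needs linguistic matching.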
I can't see anything about Was it really a planned change? |
@ForNeVeR it will be hard to list every single difference between ICU and NLS. The document talks about the main change, which is the switch to ICU. As I pointed out earlier, it is not right to compare the results of
Yes, switching to ICU is an intentional change, made for several reasons. Windows is currently promoting ICU over NLS, and ICU is the future anyway. ICU also gives us the opportunity to have consistent behavior across Windows, Linux, macOS, and any other supported platform, and it lets apps customize the globalization behavior if they want to. As the document indicates, you still have the option to switch back to the old behavior. |
Ouch, the referenced doc says that ICU/NLS behavior on Windows might silently switch based on the |
ICU is now published as a NuGet package. Apps can use that package so that a self-contained app is guaranteed to have ICU; look at the app-local section in the doc. In short, the app has full control over the behavior it wants to get. |
@tarekgh, I agree that the different results between The problem is clearly This isn't something I would expect from any locale/NLS/ICU-related changes; in fact, I couldn't think of any other programming language/runtime behaving like that. Here's a simplified test case, broken (I mean, giving me a totally unexpected result) on .NET 5 RC 2:
var actual = "\n\r\nTest";
var expected = "\nTest";
Console.WriteLine($"actual.IndexOf(expected): {actual.IndexOf(expected)}"); // => -1
Should it really work like that? Also, why? What is it actually trying to do?
I'm sorry, but I don't believe this was a planned change, so I'd like to emphasize: I couldn't imagine anyone planning such a change. Like, folks from .NET team sat together and discussed:
And nobody complained? Not a chance! This doesn't look like a planned or expected change, and instead looks like a very serious bug, a big compatibility blocker. Because of that, new and ported .NET applications won't work properly on the new runtime, because they won't be able to find substrings inside a string! Why does ICU care about the line endings anyway? Do some locales have their own locale-specific line endings? P.S. Yes, you could argue that one should really always call some variant of culture-independent Also, I think we all understand that, despite |
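For readers following along, here is a hedged sketch built on the simplified test case above. It relies on two documented defaults: `string.Contains(string)` performs an ordinal comparison, while `string.IndexOf(string)` uses the current culture. The class and method names are hypothetical. Only the ordinal results are annotated, because the culture-sensitive result varies between ICU and NLS.

```csharp
using System;

public static class DefaultComparisonDemo
{
    public const string Actual = "\n\r\nTest";
    public const string Expected = "\nTest";

    // string.Contains(string) is ordinal by default.
    public static bool ContainsDefault() => Actual.Contains(Expected);

    // Explicit ordinal IndexOf agrees with Contains.
    public static int IndexOfOrdinal() => Actual.IndexOf(Expected, StringComparison.Ordinal);

    // string.IndexOf(string) is culture-sensitive by default; the thread
    // reports -1 here on .NET 5 with ICU.
    public static int IndexOfDefault() => Actual.IndexOf(Expected);

    public static void Main()
    {
        Console.WriteLine(ContainsDefault()); // prints True
        Console.WriteLine(IndexOfOrdinal());  // prints 2
        Console.WriteLine(IndexOfDefault());  // depends on ICU vs NLS
    }
}
```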
@petarrepac Don't get me wrong, I understand that. But as has been pointed out multiple times in this thread:
Given the last two points it is probably reasonable to assume this affects a fairly small percentage of projects. 100% it is fair to ask about this, but the people who write comments like the one I quoted are often just assuming that no consideration was put in and written before trying to understand the bigger picture behind the change. |
Hello, all. We wanted to give a brief summary of the actions we took while this issue was open and, in the end, why we decided to keep ICU as the default on Windows 10 May 2019 Update or later for .NET 5.0. When the issue was opened, we started some internal discussions about the potential impact and pain that this could have for our customers, given the inconsistency between
From these 2040 packages that had an at-risk callsite, only 539 support some version of .NET Standard or .NET Core, which means that only 0.54% of the packages listed on NuGet.org are likely to be exposed to the break. We looked at the list of 539 potentially affected package IDs to get an idea of the actual impact. Of the top 70 (by download count), 20 didn't expose the pattern in the latest version; of the ones that did expose the pattern, we could only look at the 32 that had a permissive license:
This means that, of the top 70 packages by download, only 18% were potentially impacted. These percentages are stacked and are not measured against the total number of packages on NuGet.org, which is 229,536. Measured against that total, 539 potentially affected packages out of 229,536 is about 0.23%.
And while it's great for us to analyze libraries, NuGet represents only a small fraction of the C# code out there. Even if someone owns the code, a) bugs may not be easy to track down, and b) they may not actually have the source anymore.
However, this was a good source of data to conclude that, while this could be a very notable change in behavior, it is a break that already happened on Unix when reading inputs that might contain Windows line endings (which might not be that common). In .NET Core and .NET 5+, we're striving for consistency across OSs, and given the impact of this change, it felt like the right thing to do. We do care about compatibility, and hence we are providing a compat runtime switch so that people can go back to the legacy behavior.
Another conclusion from the packages we could inspect: given that the behavior is already different on Unix, we saw a lot of defensive programming against this issue to mitigate potential breaks across OSs. To add to this, globalization can change at any time because we're a thin wrapper around the OS, so it felt like the right thing at the moment to be the same wrapper on all the OSs that we support. As part of this, we improved our documentation with practical examples, Roslyn analyzer rules, and guidance on how affected code can mitigate this.
Thank you for all your valuable feedback; it always takes us to a better place, and we will try to keep improving this experience for .NET 6, as discussed in #43956. Since we understand the pain that this may cause because of the differences in line endings between Unix and Windows, we're keeping this issue open and we will investigate a possible way to mitigate the |
There is also a difference with char and string:
Console.WriteLine("com/test/test/test/a/b/ʹ$ʹ".IndexOf("$"));
Console.WriteLine("com/test/test/test/a/b/ʹ$ʹ".IndexOf('$'));
|
@mattleibow when using a character search, it performs an ordinal search. The doc https://docs.microsoft.com/en-us/dotnet/api/system.string.indexof?view=net-5.0#System_String_IndexOf_System_Char_ which is stating Console.WriteLine("com/test/test/test/a/b/ʹ$ʹ".IndexOf("$", StringComparison.Ordinal)); |
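The char-versus-string difference can be sketched as follows. This assumes the prime-like characters in the sample are U+02B9 (MODIFIER LETTER PRIME), written as escapes so the code does not depend on file encoding; the class and method names are hypothetical. The char overload and the explicit ordinal string overload agree, while the culture-sensitive string overload may not, so its result is left unannotated.

```csharp
using System;

public static class CharVersusStringSearch
{
    // Assumption: the marks around '$' are U+02B9 (MODIFIER LETTER PRIME).
    public const string Sample = "com/test/test/test/a/b/\u02B9$\u02B9";

    // IndexOf(char) always performs an ordinal search.
    public static int ByChar() => Sample.IndexOf('$');

    // IndexOf(string, StringComparison.Ordinal) matches the char overload.
    public static int ByStringOrdinal() => Sample.IndexOf("$", StringComparison.Ordinal);

    // IndexOf(string) uses the current culture, so it may differ.
    public static int ByStringCulture() => Sample.IndexOf("$");

    public static void Main()
    {
        Console.WriteLine(ByChar());          // prints 24
        Console.WriteLine(ByStringOrdinal()); // prints 24
        Console.WriteLine(ByStringCulture()); // depends on ICU vs NLS
    }
}
```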
Found it, it's rule CA1310. Our docs are wrong for https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu#use-nls-instead-of-icu and don't mention this specific variation. I'll update those docs. |
@xanatos I believe @mattleibow's report was regarding when using the <RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" /> |
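For context, a sketch of where the item quoted above would typically sit in a project file. The surrounding `Project` and `PropertyGroup` contents are assumptions for a hypothetical console app; only the `RuntimeHostConfigurationOption` line comes from the thread.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <!-- Hypothetical console app settings, shown only for context. -->
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <!-- Opts the app back into NLS on Windows, as quoted in the comment above. -->
  <ItemGroup>
    <RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" />
  </ItemGroup>

</Project>
```

This setting affects the whole app, so call sites that only need byte-for-byte matching may be better served by passing `StringComparison.Ordinal` explicitly.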
Description
I'm extending a package to support .NET 5.0 and ran into a breaking change. Given the console application:
I get different results based on the runtime from .NET Core 3.0 -> .NET 5.0:
.NET Core 3.0:
.NET 5.0:
Configuration
Windows 10 Pro Build 19041 x64
.NET Core 3.1.9
.NET 5.0.0-rc.2.20475.5
Regression?
Yes, it worked through .NET Core 3.1.9