Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False positive in CA1307 for string.IndexOf(char) #2581

Closed
KrisVandermotten opened this issue Jun 14, 2019 · 26 comments · Fixed by #4035 or #4043
Closed

False positive in CA1307 for string.IndexOf(char) #2581

KrisVandermotten opened this issue Jun 14, 2019 · 26 comments · Fixed by #4035 or #4043

Comments

@KrisVandermotten
Copy link

Analyzer package

Microsoft.CodeAnalysis.FxCopAnalyzers

Package Version

v2.9.3

Diagnostic ID

CA1307: Specify StringComparison

Repro steps

Compile the following code against .NET Core 2.2:

class C
{
    int M(string s) => s.IndexOf('?');
}

Expected behavior

No warning.

Actual behavior

Warning: The behavior of 'string.IndexOf(char)' could vary based on the current user's locale settings. Replace this call in 'C.M(string)' with a call to 'string.IndexOf(char, System.StringComparison)'.

The premisse of the diagnostic, that string.indexOf(char) could vary is incorrect. The method documentation clearly states:

This method performs an ordinal (culture-insensitive) search, where a character is considered equivalent to another character only if their Unicode scalar values are the same.

Note also that replacing the s.IndexOf('?') by s.IndexOf('?', StringComparison.Ordinal) is undesirable, as the method is less efficient.

@KrisVandermotten KrisVandermotten changed the title False positive for CA1307 in .NET Core False positive in CA1307 for string.IndexOf(char) Jun 14, 2019
@KrisVandermotten
Copy link
Author

The same is true for string.Contains(char).

@oahrens
Copy link

oahrens commented Jun 24, 2019

#2394 falls in the same category: CA1305 is throwing warnings on Boolean.ToString and Char.ToString.

@mavasani
Copy link
Contributor

The premise of the rule is to pass in an explicit value for StringComparison argument when an overload with this parameter exists, ensuring that the user chose a specific string comparison behavior on intent, but not the default one by accident. Users should add suppressions where they prefer suppression to document the intent rather then an explicit argument.

@KrisVandermotten
Copy link
Author

The premise of the rule is to pass in an explicit value for StringComparison argument when an overload with this parameter exists

That may be true, but it is not what I said. The diagnostic reports "The behavior of 'string.IndexOf(char)' could vary based on the current user's locale settings." That is simply not true. Reporting incorrect information in a diagnostic message seems undesirable.

@stephentoub
Copy link
Member

From @mavasani in dotnet/runtime#30740 (comment):

seems reasonable to update the defaults for the rule, while also making the rule configurable so end users can specify additional method names/signatures which should be validated.

@GrabYourPitchforks
Copy link
Member

Here's my proposal for how to modify the rule to increase the signal to noise ratio.

For the methods string.StartsWith, string.EndsWith, string.IndexOf, and string.LastIndexOf specifically:

  • If the first parameter (after this) to the method is a string, and if the target overload doesn't accept a CultureInfo or a StringComparison parameter, produce warning CA1307 and tell the developer that they should call an overload that takes a CultureInfo or a StringComparison.

  • If the first parameter (after this) to the method is a char (or anything other than string), or if they're calling an overload which accepts a CultureInfo or a StringComparison parameter, then CA1307 should not trigger.

For the methods string.ToUpper and string.ToLower specifically:

  • If calling the parameterless overload, produce warning CA1307 and tell the developer that they should instead call ToUpperInvariant or ToLowerInvariant or that they should call a ToUpper or ToLower overload which takes a CultureInfo.

For the method string.Compare specifically:

  • If calling an overload that does not take a CultureInfo or StringComparison parameter, produce warning CA1307 and tell the developer that they should call an overload which accepts a CultureInfo or StringComparison parameter.

  • If calling an overload that takes a CultureInfo or StringComparison parameter, warning CA1307 should not trigger.

For the method string.CompareTo specifically:

  • Produce warning CA1307 and tell the developer to call string.Compare instead, calling an overload which accepts a CultureInfo or StringComparison parameter.

No method on string other than those methods specifically called out above should produce warning CA1307.

The intent of this proposal is to identify places in the string API surface where the current culture is implicitly captured and used as part of the operation. It is not always intuitive for the developer to know which string APIs are culture-sensitive by default and which APIs are ordinal by default. These warnings also make it easier to port existing string-based code to ReadOnlySpan<char>-based patterns, as all of the span-based APIs operate under an ordinal comparison by default unless an explicit comparison has been passed to the API.

@KrisVandermotten
Copy link
Author

@GrabYourPitchforks I like your proposal a lot. Seems a lot better than the current behavior.

I just want to point out that string has more methods on .NET Core than on .NET Framework. For example, string.Contains has four overloads on .NET Core, but only one on .NET Framework.

I think that string.Contains should be treated the same way as string.StartsWith, string.EndsWith, string.IndexOf, and string.LastIndexOf, assuming that the analyzer verifies the existence of an overload on the target framework before recommending to call it.

@GrabYourPitchforks
Copy link
Member

Ho @KrisVandermotten, thanks for the feedback! In my proposal I had suggested not triggering on string.Contains at all. The reason for this is that string.Contains defaults to an ordinal search, which I believe is what most developers expect. The methods I called out above are the only methods that default to a linguistic search.

@KrisVandermotten
Copy link
Author

In my proposal I had suggested not triggering on string.Contains at all.

That makes a lot of sense. Thanks for explaining.

@jnm2
Copy link
Contributor

jnm2 commented Mar 14, 2020

I was going to suggest adding a separate CA number that warns when not being explicit even for String methods where the default is ordinal, but I'd even be turning that off if I wasn't multitargeting. I expected the analyzer to help me navigate which methods have which default, by not warning on the ones that default to ordinal.

@GrabYourPitchforks
Copy link
Member

I expected the analyzer to help me navigate which methods have which default, by not warning on the ones that default to ordinal.

That's an interesting idea, but I don't know if we would turn such an analyzer on by default. Experience has told us that the reason people trip up is that they expect the default behavior to be an exact match, since that's what most other languages also do by default.

@mavasani
Copy link
Contributor

It seems reasonable to switch the rule defaults as per #2581 (comment), and then have a user option to be able to switch to the current behavior (always requiring an explicit StringComparison argument where an overload that takes one is available).

@mavasani
Copy link
Contributor

Also tagging @Evangelink in case he is interested in picking this one...

@jnm2
Copy link
Contributor

jnm2 commented May 14, 2020

@GrabYourPitchforks Sorry about the confusing wording. What I meant was that I'd prefer to be explicit for non-ordinal comparisons and to be implicit where possible for ordinal comparisons, since ordinal is what everyone expects the default to be.

@mavasani
Copy link
Contributor

mavasani commented Aug 6, 2020

@Evangelink Let me know if you'd like to pick this up, otherwise I can take it up early next week.

@KrisVandermotten
Copy link
Author

Is there any progress on resolving this bug?

It' been well over a year now since it was reported. It would be great to see it fixed, if possible no later than in the .NET 5 wave.

Again, this is a bug report, not a feature request. A diagnostic message that claims that "The behavior of 'string.IndexOf(char)' could vary based on the current user's locale settings." is simply not correct.

Correct behavior has been described six months ago.

@manfred-brands
Copy link
Contributor

Would we then also need a new rule where the previous fixes for CA1307 are marked as unnecessary or more explicitly: any place for StringComparison.Ordinal is passed in when that is the default for the method.
We added StringComparison.Ordinal to string.Contains(char) because of CA1307 but as this thread say it no longer should be, can we get a rule which says to remove this?

mavasani added a commit to mavasani/roslyn-analyzers that referenced this issue Aug 18, 2020
Fixes dotnet#2581
When we know that a comparison is between a string type receiver and a non-string type parameter, do not report CA1307
@mavasani
Copy link
Contributor

@manfred-brands Feel free to file a separate feature request for it.

@KrisVandermotten
Copy link
Author

KrisVandermotten commented Aug 18, 2020

@mavasani, you said

It seems reasonable to switch the rule defaults as per #2581 (comment)

Unless I'm mistaken, that is not what happened in #4035. Did you change your mind, and if so, why?

Just to be clear, if I understand #4035 correctly, the analyzer will still report that "The behavior of 'string.Contains(string)' could vary based on the current user's locale settings." This is not true, and producing such a message is a bug.

@GrabYourPitchforks provided a good description of what the behavior of the rule should be.

@mavasani
Copy link
Contributor

@KrisVandermotten The rule is now much more conservative - for any invocations to string APIs, other then the known signatures (Compare and CompareTo), analyzer bails out if the target overload's first parameter type is not string type or has no parameters. It should handle all false positives - can you point out any specific CA1307 false positives after the fix?

@KrisVandermotten
Copy link
Author

can you point out any specific CA1307 false positives after the fix?

The behavior of 'string.Contains(string)' could vary based on the current user's locale settings.

@mavasani
Copy link
Contributor

I don't believe that is a false positive or noise - I agree that string methods that do not take a string parameter the rule was generating some noise, but for places where we are doing string-to-string checks or comparisons, we absolutely want to recommend use of StringComparison.

@mavasani
Copy link
Contributor

I am actually reading through https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#specifying-string-comparisons-explicitly and it seems that section explicitly suggested sticking to the prior implementation of CA1307, and never rely on the default StringComparison of any string API. It confused me to find out why string.StartsWith(string), string.EndsWith(string) and string.Contains(string) would not use the same default StringComparison.

The changes recommended on this issue seem to indicate the analyzer mostly hardcodes the methods where culture dependent StringComparison is the default (basically the list here: https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#string-comparisons-that-use-the-current-culture), but the doc then states: In any case, we recommend that you call an overload that has a StringComparison parameter to make the intent of the method call clear.

So, my core question is:

  1. Do we want just re-purpose CA1307 to fire only for cases where ordinal compare is not the default OR
  2. Do we want 2 separate rules here: one rule which does not rely on default string comparison value used by individual APIs (basically the prior implementation of CA1307) + one rule with the more restrictive analysis for only known methods that do not use ordinal comparison by default.

@mavasani mavasani reopened this Aug 18, 2020
@manfred-brands
Copy link
Contributor

The downside of relying on a default instead of explicitly specifying this is that there is no consistent default, it is different even for methods in the same class. As it says in your referenced article: It is difficult to remember which method uses which default value, and easy to confuse the overloads.

If for whatever reason the default gets changed in a future version it would break that code that doesn't specify the default, but not the code that has explicitly stated the comparison making the latter future proof.

My vote would be for keeping CA1307 as it was and fix the code fix for CA2249 (#3839)

@KrisVandermotten
Copy link
Author

KrisVandermotten commented Aug 19, 2020

My main problem with the current rule is not what it recommends. I see value in a rule that helps people write readable code.

My main problem with this rule is that it provides incorrect information in the diagnostic message. I do think that reporting "The behavior of 'string.Contains(string)' could vary based on the current user's locale settings." is a bug, but it can be fixed in two different ways:

  • don't report a diagnostic for that case, OR
  • change the message, in such a way that it does not make false statements.

I'm fine with the second option: just have a message that says that it is better for the readability of the code to be explicit about the comparison being used.

Just to be clear: the first option is what I would prefer.

Put differently: we have disabled CA1307 in our entire codebase. If it would be changed as proposed by @GrabYourPitchforks, we would reenable it, because then it becomes a rule that helps prevent bugs. It's about correctness then, rather then about maintainability. If only the message is changed, I'm not so sure.

mavasani added a commit to mavasani/roslyn-analyzers that referenced this issue Aug 19, 2020
Fixes dotnet#2581, again!

1. CA1310: Correctness rule that only flags string compare methods that are known to use culture specific string comparison by default
2. CA1307: Maintainability/Readability rule that flags the remainder methods that have an overload with an additional StringComparison parameter at the end. This rule does not care about the defaault string comparison used by the API.
@jmarolf jmarolf modified the milestones: vNext, .NET 5 RC1 Sep 11, 2020
@jnm2
Copy link
Contributor

jnm2 commented Nov 27, 2020

https://docs.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1310 is very pleasant to use. Thanks very much, everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment