Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzer proposal: treat calls to ROS<char> op_Equality as warning #54794

Open
Tracked by #79053
GrabYourPitchforks opened this issue Jun 27, 2021 · 21 comments
Open
Tracked by #79053
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

Description

There have been a few issues filed (#10151, #54793) where developers are confused as to why two ROS<char> sequences can't be compared for content equality using operator ==, like normal string instances can be.

We should consider introducing an analyzer that detects operator == and operator != calls on Span<char> and ReadOnlySpan<char> specifically and introduces a warning, also offering to fix the call sites by rewriting this in terms of SequenceEqual.

Examples

ReadOnlySpan<char> span1;
ReadOnlySpan<char> span2;

// produces warnings
if (span1 == span2) { /* ... */ }
if (span1 != span2) { /* ... */ }

// suggested fixer replacements
if (span1.SequenceEqual(span2)) { /* ... */ }
if (!span1.SequenceEqual(span2)) { /* ... */ }

ReadOnlySpan<T> span3;
ReadOnlySpan<T> span4;

// DOES NOT produce warning
if (span3 == span4) { /* ... */ }

Discussion

This analyzer assumes that the majority use case of ROS<char> is as a substitute for string segments, so these callers would want API behaviors that match the behaviors they're used to on string. However, we shouldn't assume that all callers are treating Span<char> in this fashion; as some may really want to treat these as slices into an arbitrary data buffer and maintain the existing pointer / length equality comparisons. This may affect whether a fixer is enabled by default.

Additionally, this analyzer & fixer would only capture ROS<char> and Span<char>, not any other T. If the caller is using a generic T, they're almost certainly intending this to be a data buffer slice, not stringy data.

Open questions

Should the analyzer also trap calls to MemoryExtensions.IndexOf<char>(ReadOnlySpan<char>, ReadOnlySpan<char>) and related APIs? These have different behaviors than their corresponding string methods, as shown in the below examples.

string str = GetString();
ReadOnlySpan<char> span = str;

int idx1 = str.IndexOf("hello"); // linguistic, current-culture
int idx2 = span.IndexOf("hello"); // ordinal

Related to this last point: #43956

@GrabYourPitchforks GrabYourPitchforks added area-System.Runtime code-analyzer Marks an issue that suggests a Roslyn analyzer labels Jun 27, 2021
@ghost
Copy link

ghost commented Jun 27, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

There have been a few issues filed (#10151, #54793) where developers are confused as to why two ROS<char> sequences can't be compared for content equality using operator ==, like normal string instances can be.

We should consider introducing an analyzer that detects operator == and operator != calls on Span<char> and ReadOnlySpan<char> specifically and introduces a warning, also offering to fix the call sites by rewriting this in terms of SequenceEqual.

Examples

ReadOnlySpan<char> span1;
ReadOnlySpan<char> span2;

// produces warnings
if (span1 == span2) { /* ... */ }
if (span1 != span2) { /* ... */ }

// suggested fixer replacements
if (span1.SequenceEqual(span2)) { /* ... */ }
if (!span1.SequenceEqual(span2)) { /* ... */ }

ReadOnlySpan<T> span3;
ReadOnlySpan<T> span4;

// DOES NOT produce warning
if (span3 == span4) { /* ... */ }

Discussion

This analyzer assumes that the majority use case of ROS<char> is as a substitute for string segments, so these callers would want API behaviors that match the behaviors they're used to on string. However, we shouldn't assume that all callers are treating Span<char> in this fashion; as some may really want to treat these as slices into an arbitrary data buffer and maintain the existing pointer / length equality comparisons. This may affect whether a fixer is enabled by default.

Additionally, this analyzer & fixer would only capture ROS<char> and Span<char>, not any other T. If the caller is using a generic T, they're almost certainly intending this to be a data buffer slice, not stringy data.

Open questions

Should the analyzer also trap calls to MemoryExtensions.IndexOf<char>(ReadOnlySpan<char>, ReadOnlySpan<char>) and related APIs? These have different behaviors than their corresponding string methods, as shown in the below examples.

string str = GetString();
ReadOnlySpan<char> span = str;

int idx1 = str.IndexOf("hello"); // linguistic, current-culture
int idx2 = span.IndexOf("hello"); // ordinal

Related to this last point: #43956

Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Runtime, code-analyzer

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jun 27, 2021
@Symbai
Copy link

Symbai commented Jun 28, 2021

Thanks, this was very confusing. Is there a reason why ReadOnlySpan == string is allowed? It sounds like this shouldn't be used so I wonder why this is possible. Maybe the analyze should treat this as error instead of a warning (it could be fixed by calling AsSpan method on the string).

@Joe4evr
Copy link
Contributor

Joe4evr commented Jun 29, 2021

@Symbai Since you repeated your question, I'll repeat my answer:

It happens to fall out from string's implicit conversion to ROS<char>, so the == operator is eligible to resolve to.

@GrabYourPitchforks
Copy link
Member Author

/cc @tarekgh for his thoughts on globalization (see Open questions at the bottom of the issue description).

@tarekgh
Copy link
Member

tarekgh commented Jun 29, 2021

Should the analyzer also trap calls to MemoryExtensions.IndexOf(ReadOnlySpan, ReadOnlySpan)

I am not seeing any reason to have the analyzer trap such calls. is there a concern that users can do something wrong here?

@GrabYourPitchforks
Copy link
Member Author

@tarekgh This issue was primarily that equality (operator ==) for string instances compares their contents, while equality for ROS<char> instances compares the memory addresses. I also pointed out that certain methods like IndexOf have different behaviors when both parameters are string vs. ROS<char>, and I wondered if this could similarly trip people up if they're expecting one behavior but they get another. It's less of an issue, but it seemed like if we did want to track this, this analyzer would be able to encompass all such scenarios as we see fit.

@tarekgh
Copy link
Member

tarekgh commented Jun 29, 2021

@GrabYourPitchforks For such specific cases, the analyzer should trap the IndexOf calls which take a string and not passing the StringComparison option. but for the methods that takes ROS, the default behavior is the desired one and better avoid flagging these calls. Of course, if we find any usage in the future suggest adding the analyzer rule, we can do it by then.

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Jul 12, 2021
@tannergooding tannergooding added this to the Future milestone Jul 12, 2021
@tannergooding
Copy link
Member

This probably extends to ArraySegment<T> and Memory<T> as well.

@GrabYourPitchforks
Copy link
Member Author

If you warn on the equality operator regardless of T, I agree it would make sense to extend this to ArraySegment<T> and Memory<T>. If you warn only for T = char, I agree that catching Memory<T> would make sense, but not quite sold on catching ArraySegment<T>. I don't think it's all that common to use the latter type as a substring mechanism, especially since it doesn't even implement ToString.

@HobbsCode
Copy link

This analyzer should not trigger a warning on comparisons against 'default'.

@buyaa-n buyaa-n added the code-fixer Marks an issue that suggests a Roslyn code fixer label Feb 1, 2022
@carlossanlop
Copy link
Member

@GrabYourPitchforks the proposal sounds reasonable. Do you want to mark it as ready for review?

Pending adding to the main description:

  • If the proposal should extend to ArraySegment<T>, Memory<T>.
  • If the proposal should extend to Memory<char>.
  • Confirm if ArraySegment<char> should also be included, since you said you were not yet sold on that. @tannergooding
  • The analyzer/fixer suggested info:
    • Suggested severity: Warning
    • Suggested category: Reliability
    • Suggested milestone: Future

@carlossanlop carlossanlop added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 1, 2022
@carlossanlop
Copy link
Member

Ping @tannergooding for the ask above. And @GrabYourPitchforks let me know if this is ready for review.

@tannergooding
Copy link
Member

I don't have a strong preference on ArraySegment and it can be covered in API review as a side note/consideration.

@GrabYourPitchforks
Copy link
Member Author

Similarly, I don't have a strong preference on Memory<T> and we can discuss it in API review.

@GrabYourPitchforks GrabYourPitchforks added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 18, 2022
@terrajobst
Copy link
Member

terrajobst commented Jul 26, 2022

Video

  • We'd prefer an analyzer that warns on any T, thus effectively deprecating == and != for spans
    • We should push people to use SequenceEqual and a to-be-added ReferenceEqual extension method.
  • Severity: Warning
  • Category: Reliability?

It seems we should have a method:

namespace System;

public static class MemoryExtensions
{
    public static bool ReferenceEqual<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> other);
    public static bool ReferenceEqual<T>(this Span<T> span, ReadOnlySpan<T> other);
}

@GrabYourPitchforks please file an issue for the API.

@terrajobst terrajobst added the api-approved API was approved in API review, it can be implemented label Jul 26, 2022
@teo-tsirpanis

This comment was marked as resolved.

@jkotas
Copy link
Member

jkotas commented Jul 26, 2022

bool ReferenceEqual(this ReadOnlySpan span, ReadOnlySpan other)
where T: IEquatable!

Why does it need IEquatable constraints at all?

@terrajobst
Copy link
Member

terrajobst commented Jul 27, 2022

It doesn't. It's a copy paste bug because I started with SequenceEqual. Fixed.

@teo-tsirpanis
Copy link
Contributor

Removing the api-ready-for-review label, the API has been approved with outstanding questions moved to another issue; seems to have been kept by mistake.

@teo-tsirpanis teo-tsirpanis removed the api-ready-for-review API is ready for review, it is NOT ready for implementation label Aug 16, 2022
@buyaa-n buyaa-n added the help wanted [up-for-grabs] Good issue for external contributors label Nov 9, 2022
@buyaa-n buyaa-n modified the milestones: Future, 8.0.0 Nov 11, 2022
@buyaa-n
Copy link
Contributor

buyaa-n commented Nov 16, 2022

Estimates:

  • Analyzer: Small
  • Fixer: Small

@Neme12
Copy link

Neme12 commented Jul 1, 2023

  • We'd prefer an analyzer that warns on any T, thus effectively deprecating == and != for spans

@terrajobst @tannergooding The argument that the analyzer should work for any T because people coming from other languages might expect it to do SequenceEqual as opposed to reference equality, applies not only to ReadOnlySpan, but for arrays and all other collections as well. By that logic, the analyzer should warn for all collection types because they all default to reference equality. Span shouldn't be special in that. I don't think we should do this unless we're sure that using == is discouraged for all collection types. But I'm not sure that's a good thing to do. C# defaults to reference equality and everyone is used to that. If people changed to Span not from string but from an array, they already use equality correctly because they know that array == uses reference comparison. I don't think it provides value to warn for all Ts. And it would only make Span special in that it's the only collection where you shouldn't use == and have to do something else. I think it should only warn for char because that's the only case where people get it wrong because they're coming from string.

I'm really not keen on the idea of making Span special as the only collection where it's discouraged to use ==. The argument about people coming from other languages applies to any collection type, not just Span. In fact it applies to other collections even more than to Span because other collections are much more widespread, especially among beginners to C#. It's a completely separate concern from the case of Span. If we think that that's a good argument and it provides value to warn for all Ts, it should be done for all collections, not just Span. But since that would add a lot of warnings to people's code that is most likely correct, I would have a separate diagnostic ID for the case with Span that defaults to a warning and another one for Span of anything else and for other collections that defaults to a suggestion.

@tannergooding tannergooding modified the milestones: 8.0.0, Future Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests