Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code style to convert byte arrays to UTF8 strings #60647

Merged
merged 47 commits into from
May 3, 2022
Merged
Show file tree
Hide file tree
Changes from 37 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
e997bd4
Initial analyzer to report diagnostics for byte arrays that can be co…
davidwengier Apr 7, 2022
7a7e785
Add a code fix provider that doesn't work
davidwengier Apr 7, 2022
8ba815c
Move more to the analyzer since it has the constant values already an…
davidwengier Apr 8, 2022
a648a47
More tests
davidwengier Apr 8, 2022
dee8140
Ensure we only offer when we can generate a valid string
davidwengier Apr 8, 2022
0891597
Fix order
davidwengier Apr 9, 2022
ed33897
FIx correctness
davidwengier Apr 9, 2022
3748700
Fix attributes
davidwengier Apr 9, 2022
527c33f
Fix declarations
davidwengier Apr 9, 2022
693f225
Apply trivia directly to the token
davidwengier Apr 9, 2022
b40d389
Simplify array type check
davidwengier Apr 9, 2022
9a04491
Cleanup
davidwengier Apr 9, 2022
9ec2f4d
Fix diagnostic id tests
davidwengier Apr 9, 2022
2885ee3
Move
davidwengier Apr 9, 2022
35c7ceb
Report diagnostic at first token
davidwengier Apr 9, 2022
f02a6b9
Add some more specific tests for real byte arrays found around the place
davidwengier Apr 9, 2022
9193027
Add test for collection initializer edge case
davidwengier Apr 9, 2022
55053d7
Add test for using statement edge case
davidwengier Apr 9, 2022
c8932ed
Support implicit arrays in parameter arrays
davidwengier Apr 9, 2022
80b20ad
Correctness
davidwengier Apr 9, 2022
7dbacfe
Fix code cleanup
davidwengier Apr 10, 2022
c12468f
Fix argument list rewriting
davidwengier Apr 10, 2022
ea42b95
More parameter array cases
davidwengier Apr 10, 2022
d145d63
Merge remote-tracking branch 'upstream/main' into UseUTF8Strings
davidwengier Apr 10, 2022
ad88b8e
Fix API break
davidwengier Apr 10, 2022
017037a
Fix
davidwengier Apr 10, 2022
b8a66a7
Move UTF8 string creation to the code fix, and just validate in the a…
davidwengier Apr 11, 2022
1dc375b
Share code for escaping characters
davidwengier Apr 11, 2022
9a78ad2
Fix language version check
davidwengier Apr 11, 2022
28daf9d
PR Feedback
davidwengier Apr 11, 2022
69afe46
Fix editorconfig tests
davidwengier Apr 12, 2022
0cddacd
Show option in VS and editor config UI
davidwengier Apr 12, 2022
0648b7a
Fixes
davidwengier Apr 12, 2022
2e8d442
Rename - plural sounds better
davidwengier Apr 12, 2022
437e576
Update UseUTF8StringLiteralDiagnosticAnalyzer.cs
davidwengier Apr 12, 2022
374be8f
GitHub editor swallowed my whitespace
davidwengier Apr 12, 2022
96d9b7b
Typo
davidwengier Apr 12, 2022
fa6112e
UTF8 -> UTF-8 is string resources
davidwengier Apr 26, 2022
4d10c90
Remove intermediate array(s)
davidwengier Apr 26, 2022
2919e48
Validate syntax directly, and use properties to tell the code fix wha…
davidwengier Apr 26, 2022
f2f952b
Multidimensional arrays
davidwengier Apr 26, 2022
f6dc999
Fix xlf files
davidwengier Apr 26, 2022
90bff0f
Update string for consistency
davidwengier Apr 27, 2022
7f76fff
Update tests
davidwengier Apr 27, 2022
b498cf2
Reduce allocations
davidwengier Apr 27, 2022
24ddd14
PR feedback
davidwengier May 3, 2022
c5844b4
Merge remote-tracking branch 'upstream/main' into UseUTF8Strings
davidwengier May 3, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/Analyzers/CSharp/Analyzers/CSharpAnalyzers.projitems
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@
<Compile Include="$(MSBuildThisFileDirectory)UsePatternMatching\CSharpUseNotPatternDiagnosticAnalyzer.cs" />
<Compile Include="$(MSBuildThisFileDirectory)UseSimpleUsingStatement\UseSimpleUsingStatementDiagnosticAnalyzer.cs" />
<Compile Include="$(MSBuildThisFileDirectory)UseThrowExpression\CSharpUseThrowExpressionDiagnosticAnalyzer.cs" />
<Compile Include="$(MSBuildThisFileDirectory)UseUTF8StringLiteral\UseUTF8StringLiteralDiagnosticAnalyzer.cs" />
<Compile Include="$(MSBuildThisFileDirectory)ValidateFormatString\CSharpValidateFormatStringDiagnosticAnalyzer.cs" />
<Compile Include="$(MSBuildThisFileDirectory)NewLines\EmbeddedStatementPlacement\EmbeddedStatementPlacementDiagnosticAnalyzer.cs" />
</ItemGroup>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -365,4 +365,10 @@
<data name="Convert_to_top_level_statements" xml:space="preserve">
<value>Convert to top-level statements</value>
</data>
<data name="Convert_to_UTF8_string_literal" xml:space="preserve">
<value>Convert to UTF8 string literal</value>
davidwengier marked this conversation as resolved.
Show resolved Hide resolved
</data>
<data name="Use_UTF8_string_literal" xml:space="preserve">
<value>Use UTF8 string literal</value>
davidwengier marked this conversation as resolved.
Show resolved Hide resolved
</data>
</root>
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
using System.Collections.Immutable;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis.CodeStyle;
using Microsoft.CodeAnalysis.Collections;
using Microsoft.CodeAnalysis.CSharp.CodeStyle;
using Microsoft.CodeAnalysis.CSharp.EmbeddedLanguages.VirtualChars;
using Microsoft.CodeAnalysis.CSharp.Extensions;
using Microsoft.CodeAnalysis.CSharp.Shared.Extensions;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;
using Microsoft.CodeAnalysis.PooledObjects;
using Microsoft.CodeAnalysis.Shared.Collections;
using Microsoft.CodeAnalysis.Text;
using Roslyn.Utilities;

namespace Microsoft.CodeAnalysis.CSharp.UseUTF8StringLiteral
{
[DiagnosticAnalyzer(LanguageNames.CSharp)]
internal sealed class UseUTF8StringLiteralDiagnosticAnalyzer : AbstractBuiltInCodeStyleDiagnosticAnalyzer
{
public UseUTF8StringLiteralDiagnosticAnalyzer()
: base(IDEDiagnosticIds.UseUTF8StringLiteralDiagnosticId,
EnforceOnBuildValues.UseUTF8StringLiteral,
CSharpCodeStyleOptions.PreferUTF8StringLiterals,
LanguageNames.CSharp,
new LocalizableResourceString(nameof(CSharpAnalyzersResources.Convert_to_UTF8_string_literal), CSharpAnalyzersResources.ResourceManager, typeof(CSharpAnalyzersResources)),
new LocalizableResourceString(nameof(CSharpAnalyzersResources.Use_UTF8_string_literal), CSharpAnalyzersResources.ResourceManager, typeof(CSharpAnalyzersResources)))
{
}

public override DiagnosticAnalyzerCategory GetAnalyzerCategory()
=> DiagnosticAnalyzerCategory.SemanticSpanAnalysis;

protected override void InitializeWorker(AnalysisContext context)
=> context.RegisterCompilationStartAction(context =>
{
if (!context.Compilation.LanguageVersion().IsCSharp11OrAbove())
return;

var expressionType = context.Compilation.GetTypeByMetadataName(typeof(System.Linq.Expressions.Expression<>).FullName!);

context.RegisterOperationAction(c => AnalyzeOperation(c, expressionType), OperationKind.ArrayCreation);
});

private void AnalyzeOperation(OperationAnalysisContext context, INamedTypeSymbol? expressionType)
{
var arrayCreationOperation = (IArrayCreationOperation)context.Operation;

// Don't offer if the user doesn't want it
var option = context.GetOption(CSharpCodeStyleOptions.PreferUTF8StringLiterals);
if (!option.Value)
return;

// Only replace arrays with initializers
if (arrayCreationOperation.Initializer is null)
return;

// Must be a byte array
if (arrayCreationOperation.Type is not IArrayTypeSymbol { ElementType.SpecialType: SpecialType.System_Byte })
return;

// UTF8 strings are not valid to use in attributes
davidwengier marked this conversation as resolved.
Show resolved Hide resolved
if (arrayCreationOperation.Syntax.Ancestors().OfType<AttributeSyntax>().Any())
return;

// Can't use a UTF8 string inside an expression tree.
var semanticModel = context.Operation.SemanticModel;
Contract.ThrowIfNull(semanticModel);
if (arrayCreationOperation.Syntax.IsInExpressionTree(semanticModel, expressionType, context.CancellationToken))
return;

var elements = arrayCreationOperation.Initializer.ElementValues;

// If the compiler has constructed this array creation, then we don't want to do anything
// if there aren't any elements, as we could just end up inserting ""u8 somewhere.
if (arrayCreationOperation.IsImplicit && elements.Length == 0)
return;

// We need to ensure that each element is a byte, and that they are representable as a string
// but to avoid LOH allocations from large user data, we use a SegmentedList here, and
// only create a small array when necessary later.
var values = new SegmentedList<byte>(elements.Where(v => v.ConstantValue.Value is byte).Select(v => (byte)v.ConstantValue.Value!));

// If we couldn't get constant values for all elements then we can't offer
if (values.Count != elements.Length)
davidwengier marked this conversation as resolved.
Show resolved Hide resolved
return;

if (!TryConvertToUTF8String(builder: null, values))
return;

// Only raise the diagnostic on the first token (usually "new")
var location = arrayCreationOperation.Syntax.GetFirstToken().GetLocation();

// Store the original syntax location so the code fix can find the operation again
var additionalLocations = ImmutableArray.Create(arrayCreationOperation.Syntax.GetLocation());

// If this array creation is not an array creation expression, then it must be from
// a parameter array, and the Syntax will be the entire invocation. To issue better
davidwengier marked this conversation as resolved.
Show resolved Hide resolved
// diagnostics construct a different location for this case.
if (arrayCreationOperation.Syntax is not (ImplicitArrayCreationExpressionSyntax or ArrayCreationExpressionSyntax))
{
// Issue the diagnostic for all of the parameters that make up the array. We could do just
// the first element, but that might be odd seeing: M(1, 2, [|3|], 4, 5)
var span = TextSpan.FromBounds(elements.First().Syntax.SpanStart, elements.Last().Syntax.Span.End);
location = Location.Create(location.SourceTree!, span);
}

context.ReportDiagnostic(
DiagnosticHelper.Create(Descriptor, location, option.Notification.Severity, additionalLocations, properties: null));
}

internal static bool TryConvertToUTF8String(StringBuilder? builder, SegmentedList<byte> values)
{
// Since we'll only ever need to use up to 4 bytes to check/convert to UTF8
// we can just use one array and reuse it. Using an array pool would do the same
// thing but with more locks.
var array = new byte[4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a heap allocation be avoided here by using stackalloc?

for (var i = 0; i < values.Count;)
{
// We only need max 4 elements for a single Rune
var count = Math.Min(values.Count - i, 4);

// Need to copy to a regular array to get a ROS for Rune to process
values.CopyTo(i, array, 0, count);
var ros = new ReadOnlySpan<byte>(array, 0, count);

// If we can't decode a rune from the array then it can't be represented as a string
if (Rune.DecodeFromUtf8(ros, out var rune, out var bytesConsumed) != System.Buffers.OperationStatus.Done)
return false;

i += bytesConsumed;

if (builder is not null)
{
if (rune.TryGetEscapeCharacter(out var escapeChar))
{
builder.Append('\\');
builder.Append(escapeChar);
}
else
{
builder.Append(rune.ToString());
davidwengier marked this conversation as resolved.
Show resolved Hide resolved
davidwengier marked this conversation as resolved.
Show resolved Hide resolved
}
}
}

return true;
}
}
}

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading