Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEATURE: Add --normalize-for-ghas argument to the rewrite command #2581

Merged
merged 12 commits into from
Jan 18, 2023
1 change: 1 addition & 0 deletions src/ReleaseHistory.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
* BUGFIX: Eliminate `IOException` and `DirectoryNotFoundException` exceptions thrown by `merge` command when splitting by rule (due to invalid file characters in rule ids). [#2513](https://github.com/microsoft/sarif-sdk/pull/2513)
* BUGFIX: Fix classes inside NotYetAutoGenerated folder missing `virtual` keyword for public methods and properties, by regenerate and manually sync the changes. [#2537](https://github.com/microsoft/sarif-sdk/pull/2537)
* FEATURE: Allow external set of `MaxFileSizeInKilobytes`, which will allow SDK users to change the value. (Default value is 1024) [#2578](https://github.com/microsoft/sarif-sdk/pull/2578)
* FEATURE: Add `--normalize-for-github` argument to the `rewrite` command to get GitHub supported SARIF format. [#2581](https://github.com/microsoft/sarif-sdk/pull/2581)
Copy link
Member

@michaelcfanning michaelcfanning Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good release note item but let me add a bit more detail, try this:

  • FEATURE: Add --normalize-for-github argument to the rewrite command to ensure rewritten SARIF is compatible with GitHub Advanced Security ingestion requirements. #2581 #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed


## **v3.1.0** [Sdk](https://www.nuget.org/packages/Sarif.Sdk/3.1.0) | [Driver](https://www.nuget.org/packages/Sarif.Driver/3.1.0) | [Converters](https://www.nuget.org/packages/Sarif.Converters/3.1.0) | [Multitool](https://www.nuget.org/packages/Sarif.Multitool/3.1.0) | [Multitool Library](https://www.nuget.org/packages/Sarif.Multitool.Library/3.1.0)

Expand Down
20 changes: 20 additions & 0 deletions src/Sarif.Multitool.Library/RewriteCommand.cs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

using Microsoft.CodeAnalysis.Sarif.Driver;
using Microsoft.CodeAnalysis.Sarif.Readers;
Expand Down Expand Up @@ -64,6 +65,25 @@ public int Run(RewriteOptions options)
reformattedLog = new SortingVisitor().VisitSarifLog(reformattedLog);
}

if (options.NormalizeForGitHub)
{
if ((reformattedLog.Runs?.Any(r => r.Artifacts?.Any(a => a.Location?.Uri?.IsAbsoluteUri == true) == true) == true) ||
Copy link
Member

@michaelcfanning michaelcfanning Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reformattedLog

So I don't like these two lines of code which will traverse the log file twice.

Can't we simply invoke the RebaseUriVisitor in cases when those args are non-null.

And then make the GitHubIngestionVisitor throw new arg exception on encountering the any absolute uri as you detect here.
#Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made the suggested change.
Though we can not simply throw in GitHubIngestionVisitor, because it is a same method VisitArtifactLocation to visit both of it in result and also the originalUriBaseIds,
we allow one and not the other to be absolute.
Currently I does not check and throw, let me know your instruction.

(reformattedLog.Runs?.Any(r => r.Results?.Any(s => s.Locations?.Any(l => l.PhysicalLocation?.ArtifactLocation?.Uri?.IsAbsoluteUri == true) == true) == true) == true))
{
if (options.BasePath == null || options.BasePathToken == null)
{
throw new ArgumentException("The input SARIF file contains absolute Uri which will not work with GitHub. Please also provide `--base-path-value` and `--base-path-token` to convert them to relative Uri.");
}
else
{
var visitor = new RebaseUriVisitor(options.BasePathToken, new Uri(options.BasePath), options.RebaseRelativeUris);
reformattedLog = visitor.VisitSarifLog(reformattedLog);
}
}

reformattedLog = new GitHubIngestionVisitor().VisitSarifLog(reformattedLog);
}

if (options.SarifOutputVersion == SarifVersion.OneZeroZero)
{
var visitor = new SarifCurrentToVersionOneVisitor();
Expand Down
28 changes: 28 additions & 0 deletions src/Sarif.Multitool.Library/RewriteOptions.cs
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,33 @@ public class RewriteOptions : SingleFileOptionsBase
Default = false,
HelpText = "Sort results in the final output file.")]
public bool SortResults { get; set; }

[Option(
Copy link
Collaborator Author

@shaopeng-gh shaopeng-gh Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Option(

All these options are copied from other file with existing name and HelpText. #Closed

"normalize-for-github",
Copy link
Collaborator

@yongyan-gh yongyan-gh Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"normalize-for-github"

Question: the Convert command has the same normalize-for-github parameter, can we use Convert command to achieve the same? #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert means 'convert from some other non-SARIF format to SARIF.' This operation is a SARIF->SARIF transformation, not a 'conversion'.

One thing, we could use 'normalize-for-ghas' instead as a more precise designation (though 'ghas' may be more opaque to users.

I'm fine with either choice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use 'normalize-for-ghas' instead as a more precise designation
---- sounds good, will change it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use 'normalize-for-ghas' instead as a more precise designation
---- done.

HelpText = "Normalize SARIF to conform to GitHub Advanced Security code scanning ingestion requirements.")]
public bool NormalizeForGitHub { get; set; }

[Option(
'b',
"base-path-value",
Required = false,
HelpText = "Base path value to use while rebasing all paths. E.x. 'C:\\bld\\1234\\bin\\'"
)]
public string BasePath { get; set; }

[Option(
't',
"base-path-token",
Required = false,
HelpText = "Variable to use for the base path token (e.x. 'SRCROOT')"
)]
public string BasePathToken { get; set; }

[Option(
"rebase-relative-uris",
Default = false,
HelpText = "All relative uris will be rebased to the base-path-value if true. Default is false."
)]
public bool RebaseRelativeUris { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RebaseRelativeUris

as a next step, you need to make sure rewrite always honors this command in all cases.

}
}
68 changes: 34 additions & 34 deletions src/Sarif/Visitors/GitHubIngestionVisitor.cs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
// Licensed under the MIT license. See LICENSE file in the project root for full license information.

using System.Collections.Generic;
using System.Globalization;
using System.Linq;

namespace Microsoft.CodeAnalysis.Sarif.Visitors
Expand All @@ -12,42 +13,19 @@ public class GitHubIngestionVisitor : SarifRewritingVisitor
// Internal static rather than private const to allow a unit test with a practical limit.
internal static int s_MaxResults = 5000;

private Run run;
private int ruleIndex = -1;
private IList<Artifact> artifacts;
private IList<ThreadFlowLocation> threadFlowLocations;

public override Run VisitRun(Run node)
{
this.run = node;
this.artifacts = node.Artifacts;
this.threadFlowLocations = node.ThreadFlowLocations;

if (node.Results != null)
{
int errorsCount = 0;
Copy link
Member

@michaelcfanning michaelcfanning Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errorsCount

I wouldn't delete this until you confirm that GH can accept warnings. Also, if this visitor was correct, the validator should also fire on any warnings in a log file (warning that they will be stripped). #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i think it supports warning, please see my other reply with screenshot.

foreach (Result result in node.Results)
{
if (result.Level == FailureLevel.Error)
{
errorsCount++;
}
}

if (errorsCount != node.Results.Count)
{
var errors = new List<Result>();

foreach (Result result in node.Results)
{
if (result.Level == FailureLevel.Error)
{
errors.Add(result);
}

if (errors.Count == s_MaxResults) { break; }
}

node.Results = errors;
}
Copy link
Collaborator Author

@shaopeng-gh shaopeng-gh Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also see the test file "NonErrorResults.sarif'.
This does not seem right, why do we want to remove warning results?
#Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps GitHub only used to process errors? Have you tested this in their system?

Copy link
Collaborator Author

@shaopeng-gh shaopeng-gh Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from the current SDK repo:
image

and also those linters works I had been using warning to upload to GHAS, shows similar to above.


if (node.Results.Count > s_MaxResults)
{
node.Results = node.Results.Take(s_MaxResults).ToList();
Expand Down Expand Up @@ -89,10 +67,7 @@ public override ThreadFlowLocation VisitThreadFlowLocation(ThreadFlowLocation no
{
// Make sure there's a place to put the message. threadFlowLocation.Location
// is not required, so the shared object might not have had one.
if (node.Location == null)
{
node.Location = new Location();
}
node.Location ??= new Location();

node.Location.Message = message;
}
Expand Down Expand Up @@ -144,6 +119,8 @@ public override ArtifactLocation VisitArtifactLocation(ArtifactLocation node)

public override Result VisitResult(Result node)
{
this.ruleIndex = node.RuleIndex;

if (node.Fingerprints != null)
{
// GitHub appears to require that fingerprints be emitted to the
Copy link
Member

@michaelcfanning michaelcfanning Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s to require

We have a message string look-up helper that you should use. As Kalle notes, this is a pretty complex operation.

OK, I see you grabbed this code from the insert optional data visitor. We shouldn't copy code like this, minimally, it should be in a common helper somewhere (you could have created a static in the other visitor, for example).

Let me keep looking for a better helper, I could swear we have one. #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, nope. It appears that we don't. So the correct course here is to please create a common helper for both this code and the insert optional data visitor to call. Second, create an issue for the problem that Kalle notes: 'Message creation logic does not properly search tool extensions for format strings.'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, will work on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done both add helper method and create an issue
#2582

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VisitMessage

add unit test for result with rule id only, no index

Expand All @@ -161,6 +138,32 @@ public override Result VisitResult(Result node)
return base.VisitResult(node);
}

public override Message VisitMessage(Message node)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return

add a blank line.

if (node.Text == null)
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need the same for node.Markdown? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is form GitHub, I think GitHub requires .Text only

{
ReportingDescriptor rule = this.ruleIndex != -1 ? this.run.Tool.Driver.Rules[this.ruleIndex] : null;
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fully implement SARIF-2.1.0 §3.11.7 Message string lookup. I think this won't find messages of tool extensions. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct, created an issue to track.


if (rule != null &&
rule.MessageStrings != null &&
rule.MessageStrings.TryGetValue(node.Id, out MultiformatMessageString formatString))
{
node.Text = node.Arguments?.Count > 0
? rule.Format(node.Id, node.Arguments)
: formatString?.Text;
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the result of string.Format still contains substrings like "{0}", should this double the braces, in order to prevent the SARIF consumer from treating those as placeholders and attempting to expand them again? Also, should this reset node.Arguments to null? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the result is put in .Text.
.Text is for plain text, consumer should not try to expand any placeholders in it.

should this reset node.Arguments to null ---- currently, .text is added side by side with .id and .arguments.
The information is duplicated, the benefit is we can un-flatten if needed by simply remove the .text field.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Text is for plain text, consumer should not try to expand any placeholders in it.

That's not how I read the standard.

So the text property of a message object may contain placeholders. Does the SARIF consumer in GitHub Advanced Security deviate from the standard in this respect?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With more reading I think you are right! The Text can contain placeholder again!

Within both plain text and formatted message strings, the characters “{” and “}” SHALL be represented by the character sequences “{{” and “}}” respectively. ----- so, I think I will replace { with {{ and } with }}, let me know if sounds good to you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will replace { with {{ and } with }} -- this is done and updated the test case.

}

if (node.Text == null &&
this.run.Tool.Driver.GlobalMessageStrings?.TryGetValue(node.Id, out formatString) == true)
{
node.Text = node.Arguments?.Count > 0
? string.Format(CultureInfo.CurrentCulture, formatString.Text, node.Arguments.ToArray())

Check warning

Code scanning / CodeQL

Dereferenced variable may be null

Variable [formatString](1) may be null at this access as suggested by [this](2) null check.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a false alarm to me

: formatString?.Text;
}
}
return base.VisitMessage(node);
}

// Merge properties from a source to a target, preferring the existing properties
// on the target if there are any duplicates.
private static void MergeProperties(PropertyBagHolder target, PropertyBagHolder source)
Expand All @@ -169,10 +172,7 @@ private static void MergeProperties(PropertyBagHolder target, PropertyBagHolder
if (source.Properties != null)
{
// If so, make sure there's someplace to put them.
if (target.Properties == null)
{
target.Properties = new Dictionary<string, SerializedPropertyInfo>();
}
target.Properties ??= new Dictionary<string, SerializedPropertyInfo>();

target.Properties = target.Properties.MergePreferFirst(source.Properties);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,78 @@
using Newtonsoft.Json.Linq;

using Xunit;
using Xunit.Abstractions;

namespace Microsoft.CodeAnalysis.Sarif.Multitool
{
public class RewriteCommandTests
public class RewriteCommandTests : FileDiffingUnitTests
{
private RewriteOptions options;

public RewriteCommandTests(ITestOutputHelper outputHelper) : base(outputHelper) { }

private static RewriteOptions CreateDefaultOptions()
{
return new RewriteOptions
{
BasePath = @"C:\vs\src\2\s\",
BasePathToken = "SRCROOT",
Inline = true,
SarifOutputVersion = SarifVersion.Current,
PrettyPrint = true,
NormalizeForGitHub = true,
};
}

protected override string ConstructTestOutputFromInputResource(string testFilePath, object parameter)
{
return RunRewriteCommand(testFilePath, this.options);
}

[Fact]
public void RebaseUriCommand_RebaseRunWithArtifacts()
{
string testFilePath = "RunWithArtifacts.sarif";

this.options = CreateDefaultOptions();

RunTest(testFilePath);
}

private string RunRewriteCommand(string testFilePath, RewriteOptions options)
Fixed Show fixed Hide fixed
{
string inputSarifLog = GetInputSarifTextFromResource(testFilePath);

string logFilePath = Path.Combine(Directory.GetCurrentDirectory(), "mylog.sarif");
StringBuilder transformedContents = new StringBuilder();

options.InputFilePath = logFilePath;
options.OutputFilePath = null;

Mock<IFileSystem> mockFileSystem = ArrangeMockFileSystem(inputSarifLog, logFilePath, transformedContents);

var RewriteCommand = new RewriteCommand(mockFileSystem.Object);

int returnCode = RewriteCommand.Run(options);
string actualOutput = transformedContents.ToString();

returnCode.Should().Be(0);

return actualOutput;
}

private static Mock<IFileSystem> ArrangeMockFileSystem(string sarifLog, string logFilePath, StringBuilder transformedContents)
{
var mockFileSystem = new Mock<IFileSystem>();
mockFileSystem.Setup(x => x.FileReadAllText(logFilePath)).Returns(sarifLog);
mockFileSystem.Setup(x => x.FileOpenRead(logFilePath)).Returns(() => new MemoryStream(Encoding.UTF8.GetBytes(sarifLog)));
mockFileSystem.Setup(x => x.FileCreate(logFilePath)).Returns(() => new MemoryStreamToStringBuilder(transformedContents));
mockFileSystem.Setup(x => x.FileWriteAllText(logFilePath, It.IsAny<string>())).Callback<string, string>((path, contents) => { transformedContents.Append(contents); });
mockFileSystem.Setup(x => x.DirectoryExists(It.IsAny<string>())).Returns(true);
mockFileSystem.Setup(x => x.DirectoryGetFiles(It.IsAny<string>(), It.IsAny<string>())).Returns(new string[] { logFilePath });
return mockFileSystem;
}

public const string MinimalPrereleaseV2Text =
@"{
""$schema"": ""http://json.schemastore.org/sarif-2.0.0"",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,16 +20,6 @@
<None Remove="TestData\ExportRuleDocumentationCommand\**" />
</ItemGroup>

<ItemGroup>
<None Remove="TestData\ExportValidationRulesMetadataCommand\ExpectedOutputs\MarkdownFullDescription.md" />
<None Remove="TestData\ExportValidationRulesMetadataCommand\ExpectedOutputs\MarkdownShortDescription.md" />
<None Remove="TestData\ExportValidationRulesMetadataCommand\ExpectedOutputs\NoDescription.md" />
<None Remove="TestData\ExportValidationRulesMetadataCommand\ExpectedOutputs\NonStandardMessageStringKey.md" />
<None Remove="TestData\ExportValidationRulesMetadataCommand\ExpectedOutputs\StandardMessageStringKey.md" />
<None Remove="TestData\ExportValidationRulesMetadataCommand\ExpectedOutputs\TextShortDescription.md" />
<None Remove="TestData\MergeCommand\ExpectedOutputs\DuplicatedResults.sarif" />
</ItemGroup>

<ItemGroup>
<Content Include="xunit.runner.json">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
Expand All @@ -55,6 +45,8 @@
<EmbeddedResource Include="TestData\PageCommand\elfie-arriba.sarif" />
<EmbeddedResource Include="TestData\QueryCommand\elfie-arriba.sarif" />
<EmbeddedResource Include="TestData\QueryCommand\WithProperties.sarif" />
<EmbeddedResource Include="TestData\RewriteCommand\ExpectedOutputs\RunWithArtifacts.sarif" />
<EmbeddedResource Include="TestData\RewriteCommand\Inputs\RunWithArtifacts.sarif" />
<EmbeddedResource Include="TestData\RebaseUriCommand\ExpectedOutputs\RunWithArtifacts.sarif" />
<EmbeddedResource Include="TestData\RebaseUriCommand\Inputs\RunWithArtifacts.sarif" />
<EmbeddedResource Include="TestData\ValidateCommand\Configuration.json" />
Expand Down
Loading