Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SARIF Output Support #946

Merged
merged 2 commits into from
Feb 2, 2022
Merged

SARIF Output Support #946

merged 2 commits into from
Feb 2, 2022

Conversation

anthturner
Copy link
Contributor

Signed-off-by: Anthony Turner 225599+anthturner@users.noreply.github.com

What I did

Added SARIF-compatible output structures as an output option in the same vein as SonarQube

How to verify it

Use -o sarif as an option with Horusec to output a SARIF report

- Description for the changelog
Adds SARIF output support


Want to note that this is not necessarily complete; there are several things which just don't exist in Horusec right now. For example, I notice not all of the engine modules have RuleIDs populated, and there is other metadata (such as URL) which need to have a lookup table or some other place to pull them from. This might mean authoring a .csv file to track the metadata or maybe embedding it into code somehow is better.

Hopefully this at least helps get the conversation started.

@anthturner
Copy link
Contributor Author

I might take a crack at adding RuleId to the records for non-HorusecEngine scanners. I'll do that as a separate PR since it'll be a different set of potential issues to fix.

internal/controllers/printresults/print_results.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/entities/sarif/report.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
@wiliansilvazup wiliansilvazup added the kind/enhancement This issue is related to a new feature or request label Jan 24, 2022
@wiliansilvazup wiliansilvazup linked an issue Jan 24, 2022 that may be closed by this pull request
@matheusalcantarazup
Copy link
Contributor

matheusalcantarazup commented Jan 24, 2022

and there is other metadata (such as URL) which need to have a lookup table or some other place to pull them from.

I think that this would be hard to do, since we not control the results of the tools that we orchestrate.

For example, I notice not all of the engine modules have RuleIDs populated

This is because we not populate this field from tools that we orchestrate. This field should be unique, I don't know how we can control this "uniqueness" for this other tools.

BTW, the schema says that ruleId should be a object, not a string, I'm seeing the correct schema?

I would like to suggest to using git commit --amend to amed your changes on the current git commit instead create a new one, with this you don't need to add DCO on all commits (and when is PR will be merged all commits will be squashed anyway).

@matheusalcantarazup
Copy link
Contributor

Oh, I think that I was using the wrong schema, this implementation is based on this documentation right?

@anthturner
Copy link
Contributor Author

Oh, I think that I was using the wrong schema, this implementation is based on this documentation right?

Correct. RuleID is a string rather than object. It's mostly just used for correlation between multiple files/lines which have the same violation type.

@anthturner
Copy link
Contributor Author

anthturner commented Jan 24, 2022

For example, I notice not all of the engine modules have RuleIDs populated

This is because we not populate this field from tools that we orchestrate. This field should be unique, I don't know how we can control this "uniqueness" for this other tools.

See #949 for my thoughts on how we can introduce this.

@matheusalcantarazup
Copy link
Contributor

Correct. RuleID is a string rather than object. It's mostly just used for correlation between multiple files/lines which have the same violation type.

According to the documentation this field is optional, can we follow with this implementation separately from adding the rule id for all vulnerabilities?

Maybe for this initial implementation we can leave this rule id empty (for vulnerability that was not found by horusec engine) and when we found a solution for #949 we will automatically have this value on output for all tools.

I'm not finding the helpUri and informationUri fields on docs, these fields is optional too? If it yes we can use the same approach described above.

cc @wiliansilvazup @nathanmartinszup @iancardosozup

@anthturner
Copy link
Contributor Author

Correct. RuleID is a string rather than object. It's mostly just used for correlation between multiple files/lines which have the same violation type.

According to the documentation this field is optional, can we follow with this implementation separately from adding the rule id for all vulnerabilities?

Maybe for this initial implementation we can leave this rule id empty (for vulnerability that was not found by horusec engine) and when we found a solution for #949 we will automatically have this value on output for all tools.

I'm not finding the helpUri and informationUri fields on docs, these fields is optional too? If it yes we can use the same approach described above.

cc @wiliansilvazup @nathanmartinszup @iancardosozup

I'm fine dropping the URIs -- again, those were originally for my own purposes, since they can link out to corresponding doc pages. That's incredibly useful for developers to help them fix an error they may not understand.

A lot of how I'm thinking of this visually can be seen by using:
https://microsoft.github.io/sarif-web-component/

I've attached two sample SARIF files (as TXT, because of GitHub limitations)

sample-a.sarif.txt -- This one uses RuleId for grouping
sample-b.sarif.txt -- This one does not group

These files can be used with the above viewer to visualize what this might look like in an external tool.

You can probably see how scaling this out to what could be hundreds, if not thousands, of vulnerabilities across large targets (as in my use case) would be challenging without an adequate grouping mechanism. Just a thought. 😄

@iancardosozup
Copy link
Contributor

iancardosozup commented Jan 24, 2022

Maybe for this initial implementation we can leave this rule id empty (for vulnerability that was not found by horusec engine) and when we found a solution for #949 we will automatically have this value on output for all tools.

I guess this is a good approach to get sarif output in a quicker way, but i think #949 should be prioritized since we're gonna get an inconsistent output between our engine and tools that we orchestrate.

@matheusalcantarazup
Copy link
Contributor

You can probably see how scaling this out to what could be hundreds, if not thousands, of vulnerabilities across large targets (as in my use case) would be challenging without an adequate grouping mechanism. Just a thought.

This is definitely a problem, but I think that we should concentrate the discussion about the rule id on #949, and on this PR we ensure that we export the Sarif output correctly.

@matheusalcantarazup
Copy link
Contributor

I think that would be good add some unit tests, to ensure that we convert the analysis vulnerabilities to sarif.Report struct correctly.

To test this, we can basically create an analysis.Analysis with a bunch of vulnerabilities and them call ConvertVulnerabilityToSarif and assert that the fields from Report is filled correctly.

Please, ask any question if you have any problem to create these tests.

internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/schema.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
@matheusalcantarazup
Copy link
Contributor

There is some others linter warnings, you can check them here.

@iancardosozup
Copy link
Contributor

@anthturner i think if you run make format and make lint on root repository would be easier to fix those lint errors that @matheusalcantarazup mentioned, and please add your signature to commit to pass the DCO check (find more about on CONTRIBUTING.md)

@anthturner
Copy link
Contributor Author

@anthturner i think if you run make format and make lint on root repository would be easier to fix those lint errors that @matheusalcantarazup mentioned, and please add your signature to commit to pass the DCO check (find more about on CONTRIBUTING.md)

Aside from the challenges of line lengths and function lengths with this one, I'm stuck on a number of similar errors to this: importShadow: shadow of imported from 'github.com/ZupIT/horusec-devkit/pkg/entities/analysis' package 'analysis'

That occurs when I don't use the aliasing of the package on import; if I do, this doesn't appear.

Googling it is not much help; was hoping someone could help me along so I can pass the linter.

@nathanmartinszup
Copy link
Contributor

nathanmartinszup commented Jan 28, 2022

@anthturner i think if you run make format and make lint on root repository would be easier to fix those lint errors that @matheusalcantarazup mentioned, and please add your signature to commit to pass the DCO check (find more about on CONTRIBUTING.md)

Aside from the challenges of line lengths and function lengths with this one, I'm stuck on a number of similar errors to this: importShadow: shadow of imported from 'github.com/ZupIT/horusec-devkit/pkg/entities/analysis' package 'analysis'

That occurs when I don't use the aliasing of the package on import; if I do, this doesn't appear.

Googling it is not much help; was hoping someone could help me along so I can pass the linter.

This error occurs when a package and a variable have the same name, as @matheusalcantarazup suggested you can rename the variables as analysiss to avoid this.

Copy link
Contributor

@matheusalcantarazup matheusalcantarazup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a documentation about this private functions that do the parsing using this guide? They seems a bit tricky do understand in first place.

I'm not convinced about this methods should have to many arguments, but for now I don't have any better idea, maybe embed on Sarif struct, so these methods just use them?

cc @wiliansilvazup @nathanmartinszup @iancardosozup

internal/services/sarif/sarif_test.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
@iancardosozup
Copy link
Contributor

Can you please add a documentation about this private functions that do the parsing using this guide? They seems a bit tricky do understand in first place.

I'm not convinced about this methods should have to many arguments, but for now I don't have any better idea, maybe embed on Sarif struct, so these methods just use them?

cc @wiliansilvazup @nathanmartinszup @iancardosozup

i agree, if these fields are into Sarif struct the method initToolStructure could be refactored to fit in NewSarif method

@anthturner
Copy link
Contributor Author

Can you please add a documentation about this private functions that do the parsing using this guide? They seems a bit tricky do understand in first place.
I'm not convinced about this methods should have to many arguments, but for now I don't have any better idea, maybe embed on Sarif struct, so these methods just use them?
cc @wiliansilvazup @nathanmartinszup @iancardosozup

i agree, if these fields are into Sarif struct the method initToolStructure could be refactored to fit in NewSarif method

This was a great suggestion and helped considerably.

I added documentation as best as I could, including why each association map was necessary. It's admittedly a challenge, as the SARIF format requires very specific grouping of different elements which Horusec does not differentiate between in its report by default.

internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
internal/services/sarif/sarif.go Outdated Show resolved Hide resolved
Copy link
Contributor

@matheusalcantarazup matheusalcantarazup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks very much for your contribution @anthturner

@matheusalcantarazup
Copy link
Contributor

Some commits still failing because of DCO. I suggest you squash all 15 commits in a single one signed commit.

To fix linter warnings you can run make format and them commit those changes.

Copy link
Contributor

@iancardosozup iancardosozup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for yoru contribution, you rock!

@nathanmartinszup
Copy link
Contributor

Awesome job @anthturner.

Thanks for your contribution!

@anthturner
Copy link
Contributor Author

Some commits still failing because of DCO. I suggest you squash all 15 commits in a single one signed commit.

To fix linter warnings you can run make format and them commit those changes.

Will do. One thing I noticed, if you want to add it to docs, is that the goimports tool requires $HOME/go/bin be in PATH for the make format command to run properly. (On MacOS, this is not the case by default)

I'm currently struggling with the next part of this, which is that make format seems to be expecting a very specific path format:

❯ make format
go install golang.org/x/tools/cmd/goimports@latest
go install mvdan.cc/gofumpt@latest
go install github.com/daixiang0/gci@latest
gofmt -s -l -w $(find . -name '*.go' | grep -v vendor | grep -v /examples/)
goimports -w -local github.com/ZupIT/horusec/ $(find . -name '*.go' | grep -v vendor | grep -v /examples/)
gofumpt -l -w $(find . -name '*.go' | grep -v vendor | grep -v /examples/)
gci -w -local github.com/ZupIT/horusec/ $(find . -name '*.go' | grep -v vendor | grep -v /examples/)
Using the old parameters is deprecated, please use the named subcommands!
Error: stat github.com/ZupIT/horusec/: no such file or directory

Note that for me, I'm cloned into ~/github_forks/horusec.

Thoughts?

Signed-off-by: Anthony Turner <225599+anthturner@users.noreply.github.com>
Signed-off-by: Anthony Turner <225599+anthturner@users.noreply.github.com>
@anthturner
Copy link
Contributor Author

I'd appreciate your insight on an issue we've discussed in the past; "ruleId". For engines which don't have this, the SARIF validator fails with:

SARIF1010: runs[1].results[0]: This result contains neither of the properties 'ruleId' or 'rule.id'. The SARIF specification ([§3.27.5](https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317643)) requires at least one of these properties to be present.

In this case, would it make sense to make ruleId the same as the result text, just in the interest of meeting the specification?

@matheusalcantarazup
Copy link
Contributor

matheusalcantarazup commented Feb 1, 2022

I'm currently struggling with the next part of this, which is that make format seems to be expecting a very specific path format:

The gci release a new version with some breaking changes. I'll open a PR to fix this, but I found a bug on this new version too, so gci format will still broken after this PR. The tool that fix the linter issue still working, so you can run make format and commit the changes anyway. You already fix, linter is happy now.

In this case, would it make sense to make ruleId the same as the result text, just in the interest of meeting the specification?

I don't know about this, maybe tool name? I'm afraid of use the result text and make the rule id field to big.

cc @nathanmartinszup @iancardosozup

@iancardosozup
Copy link
Contributor

iancardosozup commented Feb 1, 2022

Reading the sarif converter error @anthturner mentioned i found some rules we should base on:

Not all existing analysis tools emit the equivalent of a ruleId in their output. A SARIF converter which converts the output of such an analysis tool to the SARIF format SHOULD synthesize ruleId from other information available in the analysis tool's output.

Each SARIF converter might synthesize ruleId in a different way. Therefore, a SARIF consumer SHOULD NOT attempt to compare or combine the output from different converters for the same analysis tool. See Appendix D for more information about production of SARIF by converters.

If the input data does not include an equivalent for any SARIF element, a converter MAY attempt to synthesize that element. (For example, a converter might heuristically extract a rule id from the text of an unstructured error message.)

Since each converter might synthesize SARIF elements differently (notably the rule id; see §3.27.5), a SARIF consumer SHOULD NOT attempt to combine results produced by different converters for the same tool.

based on that i suggest to review every tool that does not generate an rule id and generate id based on some retrieved information( i don't think we can be generic here ), but we have to ensure this information remains stable between our versions ( so we're back to #949 )

@anthturner
Copy link
Contributor Author

Reading the sarif converter error @anthturner mentioned i found some rules we should base on:

Not all existing analysis tools emit the equivalent of a ruleId in their output. A SARIF converter which converts the output of such an analysis tool to the SARIF format SHOULD synthesize ruleId from other information available in the analysis tool's output.

Each SARIF converter might synthesize ruleId in a different way. Therefore, a SARIF consumer SHOULD NOT attempt to compare or combine the output from different converters for the same analysis tool. See Appendix D for more information about production of SARIF by converters.

If the input data does not include an equivalent for any SARIF element, a converter MAY attempt to synthesize that element. (For example, a converter might heuristically extract a rule id from the text of an unstructured error message.)

Since each converter might synthesize SARIF elements differently (notably the rule id; see §3.27.5), a SARIF consumer SHOULD NOT attempt to combine results produced by different converters for the same tool.

based on that i suggest to review every tool that does not generate an rule id and generate id based on some retrieved information( i don't think we can be generic here ), but we have to ensure this information remains stable between our versions ( so we're back to #949 )

Reference #967 for addition of many RuleIDs

@wiliansilvazup
Copy link
Contributor

thank you very much for your contribution @anthturner, it was really an excellent job done here. I'm very happy that more people are interested in our project. You rock 🚀

@nathanmartinszup
Copy link
Contributor

Reading the sarif converter error @anthturner mentioned i found some rules we should base on:
Not all existing analysis tools emit the equivalent of a ruleId in their output. A SARIF converter which converts the output of such an analysis tool to the SARIF format SHOULD synthesize ruleId from other information available in the analysis tool's output.
Each SARIF converter might synthesize ruleId in a different way. Therefore, a SARIF consumer SHOULD NOT attempt to compare or combine the output from different converters for the same analysis tool. See Appendix D for more information about production of SARIF by converters.
If the input data does not include an equivalent for any SARIF element, a converter MAY attempt to synthesize that element. (For example, a converter might heuristically extract a rule id from the text of an unstructured error message.)
Since each converter might synthesize SARIF elements differently (notably the rule id; see §3.27.5), a SARIF consumer SHOULD NOT attempt to combine results produced by different converters for the same tool.
based on that i suggest to review every tool that does not generate an rule id and generate id based on some retrieved information( i don't think we can be generic here ), but we have to ensure this information remains stable between our versions ( so we're back to #949 )

Reference #967 for addition of many RuleIDs

I'm not sure how we should proceed with this. How useful will this information be for the user? Do we need to print this information? Maybe it should be something unique to the sarif parser.

@matheusalcantarazup
Copy link
Contributor

@anthturner can we merge this PR and focus the discussion about rule id on #967? There is something more that we need to discuss here about Sarif output specifically?

@anthturner
Copy link
Contributor Author

@anthturner can we merge this PR and focus the discussion about rule id on #967? There is something more that we need to discuss here about Sarif output specifically?

Absolutely feel free to merge this; I assumed we'd move to the #967 discussion anyways :)

@matheusalcantarazup matheusalcantarazup merged commit 522076a into ZupIT:main Feb 2, 2022
@matheusalcantarazup
Copy link
Contributor

Thanks very much @anthturner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This issue is related to a new feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SARIF Output Support
5 participants