SARIF Output Support #946
I might take a crack at adding …
I think this would be hard to do, since we do not control the output of the tools that we orchestrate.
This is because we do not populate this field for the tools that we orchestrate. This field should be unique, and I don't know how we could guarantee that uniqueness for these other tools. BTW, the schema says that … I would like to suggest using …
Oh, I think I was using the wrong schema. This implementation is based on this documentation, right?
Correct. RuleID is a string rather than an object. It's mostly used to correlate multiple files/lines that have the same violation type.
See #949 for my thoughts on how we can introduce this.
According to the documentation, this field is optional. Can we move forward with this implementation separately from adding the rule id for all vulnerabilities? For this initial implementation, we could leave the rule id empty (for vulnerabilities not found by the Horusec engine), and once we find a solution for #949, all tools will automatically have this value in the output. I'm not finding the …
I'm fine dropping the URIs; again, those were originally for my own purposes, since they can link out to corresponding doc pages. That's incredibly useful for developers trying to fix an error they may not understand. A lot of how I'm thinking of this visually can be seen by using: … I've attached two sample SARIF files (as TXT, because of GitHub limitations): sample-a.sarif.txt: This one uses … These files can be used with the above viewer to visualize what this might look like in an external tool. You can probably see how scaling this out to hundreds, if not thousands, of vulnerabilities across large targets (as in my use case) would be challenging without an adequate grouping mechanism. Just a thought. 😄
I guess this is a good approach to get SARIF output more quickly, but I think #949 should be prioritized, since otherwise the output will be inconsistent between our engine and the tools that we orchestrate.
This is definitely a problem, but I think we should concentrate the discussion about the rule id on #949 and make sure this PR exports the SARIF output correctly.
I think it would be good to add some unit tests to ensure that we convert the analysis vulnerabilities to … To test this, we can basically create an … Please ask if you have any problems creating these tests.
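As a sketch of what such a test could check, assuming a hypothetical `Vulnerability` type and `toSarifResult` converter (Horusec's real type and field names may differ): build one vulnerability, convert it, and assert that the rule id, message, file, and line all land in the expected SARIF fields. The checks sit in `main` so the block runs on its own; in a `_test.go` file they would go in a `TestXxx` function using `t.Errorf`.

```go
package main

import "fmt"

// Vulnerability is a hypothetical stand-in for a Horusec analysis entry.
type Vulnerability struct {
	RuleID  string
	Details string
	File    string
	Line    int
}

// SARIF shapes trimmed to what this conversion fills in.
type ArtifactLocation struct {
	URI string `json:"uri"`
}
type Region struct {
	StartLine int `json:"startLine"`
}
type PhysicalLocation struct {
	ArtifactLocation ArtifactLocation `json:"artifactLocation"`
	Region           Region           `json:"region"`
}
type Location struct {
	PhysicalLocation PhysicalLocation `json:"physicalLocation"`
}
type Message struct {
	Text string `json:"text"`
}
type Result struct {
	RuleID    string     `json:"ruleId,omitempty"`
	Message   Message    `json:"message"`
	Locations []Location `json:"locations"`
}

// toSarifResult maps one vulnerability onto one SARIF result.
func toSarifResult(v Vulnerability) Result {
	return Result{
		RuleID:  v.RuleID,
		Message: Message{Text: v.Details},
		Locations: []Location{{PhysicalLocation: PhysicalLocation{
			ArtifactLocation: ArtifactLocation{URI: v.File},
			Region:           Region{StartLine: v.Line},
		}}},
	}
}

func main() {
	v := Vulnerability{RuleID: "HS-GO-1", Details: "hardcoded secret", File: "cmd/main.go", Line: 42}
	r := toSarifResult(v)
	if r.RuleID != v.RuleID || r.Message.Text != v.Details {
		panic("rule id or message not copied")
	}
	loc := r.Locations[0].PhysicalLocation
	if loc.ArtifactLocation.URI != v.File || loc.Region.StartLine != v.Line {
		panic("location not copied")
	}
	fmt.Println("conversion ok")
}
```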
There are some other linter warnings; you can check them here.
@anthturner I think if you run …
Aside from the line-length and function-length challenges with this one, I'm stuck on a number of errors similar to this: … It occurs when I don't alias the package on import; if I do, it doesn't appear. Googling hasn't helped much, so I was hoping someone could help me along so I can pass the linter.
This error occurs when a package and a variable have the same name. As @matheusalcantarazup suggested, you can rename the variables as …
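A small illustration of the package/variable name clash, using the standard library's `net/url` instead of any Horusec package (the exact lint message is not reproduced here). Writing `url, err := url.Parse(raw)` would shadow the imported `url` package for the rest of the function; giving the variable a distinct name fixes the warning without aliasing the import. `parseTarget` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseTarget returns the host of a raw URL. Naming the local variable
// `url` would shadow the imported net/url package inside this function,
// which is what shadowing linters flag; `parsedURL` avoids the clash.
func parseTarget(raw string) (string, error) {
	parsedURL, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return parsedURL.Host, nil
}

func main() {
	host, err := parseTarget("https://example.com/repo")
	if err != nil {
		panic(err)
	}
	fmt.Println(host)
}
```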
Can you please add documentation for these private functions that do the parsing, following this guide? They seem a bit tricky to understand at first.
I'm not convinced these methods should take so many arguments, but for now I don't have a better idea. Maybe embed them in the Sarif struct, so these methods can just use them?
I agree; if these fields were in the Sarif struct, the method …
This was a great suggestion and helped considerably. I added documentation as best I could, including why each association map was necessary. It's admittedly a challenge, as the SARIF format requires very specific grouping of different elements that Horusec does not differentiate between in its report by default.
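The "embed the maps in the Sarif struct" idea could look roughly like this. It is a sketch under assumptions: the `Sarif`, `Run`, `Rule`, and `Result` types, their fields, and the `Add` method are hypothetical, not the PR's code, and the tool names are illustrative. The association maps live on the struct, so `Add` only takes the finding's own data while still grouping results into one run per tool and listing each rule once per run, with every result pointing back at its rule by index.

```go
package main

import "fmt"

type Rule struct {
	ID string `json:"id"`
}

type Result struct {
	RuleID    string `json:"ruleId"`
	RuleIndex int    `json:"ruleIndex"`
	Message   struct {
		Text string `json:"text"`
	} `json:"message"`
}

// Run holds one tool's rules and results.
type Run struct {
	ToolName string
	Rules    []Rule
	Results  []Result
}

// Sarif keeps the association maps as fields instead of passing them
// to every helper.
type Sarif struct {
	runsByTool map[string]*Run           // tool name -> run
	ruleIndex  map[string]map[string]int // tool name -> rule id -> index in Run.Rules
	order      []string                  // tool names in first-seen order
}

func NewSarif() *Sarif {
	return &Sarif{runsByTool: map[string]*Run{}, ruleIndex: map[string]map[string]int{}}
}

// Add records one finding, creating the tool's run and the rule entry on
// first sight.
func (s *Sarif) Add(tool, ruleID, text string) {
	run, ok := s.runsByTool[tool]
	if !ok {
		run = &Run{ToolName: tool}
		s.runsByTool[tool] = run
		s.ruleIndex[tool] = map[string]int{}
		s.order = append(s.order, tool)
	}
	idx, seen := s.ruleIndex[tool][ruleID]
	if !seen {
		idx = len(run.Rules)
		run.Rules = append(run.Rules, Rule{ID: ruleID})
		s.ruleIndex[tool][ruleID] = idx
	}
	r := Result{RuleID: ruleID, RuleIndex: idx}
	r.Message.Text = text
	run.Results = append(run.Results, r)
}

func main() {
	s := NewSarif()
	s.Add("HorusecEngine", "HS-GO-1", "a.go:10")
	s.Add("HorusecEngine", "HS-GO-1", "b.go:3")
	s.Add("GoSec", "G101", "c.go:7")
	for _, tool := range s.order {
		run := s.runsByTool[tool]
		fmt.Printf("%s: %d rule(s), %d result(s)\n", tool, len(run.Rules), len(run.Results))
	}
}
```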
LGTM
Thanks very much for your contribution @anthturner
Some commits are still failing the DCO check. I suggest you squash all 15 commits into a single signed commit. To fix the linter warnings, you can run …
LGTM. Thanks for your contribution, you rock!
Awesome job @anthturner. Thanks for your contribution!
Will do. One thing I noticed, if you want to add it to the docs, is that the … I'm currently struggling with the next part of this, which is that …
Note that for me, I'm cloned into … Thoughts?
Signed-off-by: Anthony Turner <225599+anthturner@users.noreply.github.com>
I'd appreciate your insight on an issue we've discussed in the past: "ruleId". For engines that don't provide it, the SARIF validator fails with: …
In this case, would it make sense to make …
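Before running an external validator, a quick local check can show which tools produce results without a ruleId. This is a minimal sketch, assuming only the standard SARIF layout `runs[].tool.driver.name` and `runs[].results[].ruleId`. It does not look at other ways a result can reference its rule, so it is a rough filter, not a replacement for the validator. `missingRuleIDs` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sarifLog decodes only the parts of a SARIF file this check needs.
type sarifLog struct {
	Runs []struct {
		Tool struct {
			Driver struct {
				Name string `json:"name"`
			} `json:"driver"`
		} `json:"tool"`
		Results []struct {
			RuleID string `json:"ruleId"`
		} `json:"results"`
	} `json:"runs"`
}

// missingRuleIDs counts, per tool, the results that carry no ruleId.
func missingRuleIDs(data []byte) (map[string]int, error) {
	var parsed sarifLog
	if err := json.Unmarshal(data, &parsed); err != nil {
		return nil, err
	}
	missing := map[string]int{}
	for _, run := range parsed.Runs {
		for _, res := range run.Results {
			if res.RuleID == "" {
				missing[run.Tool.Driver.Name]++
			}
		}
	}
	return missing, nil
}

func main() {
	sample := []byte(`{"version":"2.1.0","runs":[
	  {"tool":{"driver":{"name":"HorusecEngine"}},"results":[{"ruleId":"HS-GO-1"}]},
	  {"tool":{"driver":{"name":"SomeTool"}},"results":[{},{"ruleId":""}]}]}`)
	missing, err := missingRuleIDs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing)
}
```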
gci released a new version with some breaking changes. I'll open a PR to fix this, but I found a bug in the new version too, so gci formatting will still be broken after that PR.
I'm not sure about this; maybe the tool name? I'm afraid that using the result text would make the rule id field too big.
Reading the SARIF converter error @anthturner mentioned, I found some rules we should base this on: "Not all existing analysis tools emit the equivalent of a ruleId in their output. A SARIF converter which converts the output of such an analysis tool to the SARIF format SHOULD synthesize ruleId from other information available in the analysis tool's output. Each SARIF converter might synthesize ruleId in a different way. Therefore, a SARIF consumer SHOULD NOT attempt to compare or combine the output from different converters for the same analysis tool. See Appendix D for more information about production of SARIF by converters." and "If the input data does not include an equivalent for any SARIF element, a converter MAY attempt to synthesize that element. (For example, a converter might heuristically extract a rule id from the text of an unstructured error message.) Since each converter might synthesize SARIF elements differently (notably the rule id; see §3.27.5), a SARIF consumer SHOULD NOT attempt to combine results produced by different converters for the same tool." Based on that, I suggest reviewing every tool that does not generate a rule id and generating one from information retrieved from its output (I don't think we can be generic here), but we have to ensure this information remains stable between our versions (so we're back to #949).
See #967 for the addition of many RuleIDs.
Thank you very much for your contribution @anthturner; it was really excellent work. I'm very happy that more people are interested in our project. You rock 🚀
I'm not sure how we should proceed with this. How useful will this information be for the user? Do we need to print it? Maybe it should be something specific to the SARIF parser.
@anthturner can we merge this PR and focus the discussion about the rule id on #967? Is there anything else we need to discuss here about the SARIF output specifically?
Absolutely feel free to merge this; I assumed we'd move to the #967 discussion anyway :)
Thanks very much @anthturner |
Signed-off-by: Anthony Turner 225599+anthturner@users.noreply.github.com
What I did
Added SARIF-compatible output structures as an output option in the same vein as SonarQube
How to verify it
Use -o sarif as an option with Horusec to output a SARIF report.
Description for the changelog
Adds SARIF output support
Want to note that this is not necessarily complete; there are several things that just don't exist in Horusec right now. For example, I notice not all of the engine modules have RuleIDs populated, and there is other metadata (such as URLs) that needs a lookup table or some other place to pull it from. This might mean authoring a .csv file to track the metadata, or maybe embedding it into code somehow is better. Hopefully this at least helps get the conversation started.
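The two options mentioned above can be combined: keep the metadata in CSV form and parse it at startup. This minimal sketch assumes a hypothetical three-column table (`rule_id,name,help_uri`) with placeholder example.com URLs; in a real build the table could live in its own .csv file loaded with Go's `embed` package instead of a string constant. `loadRuleMetadata` and `RuleMetadata` are hypothetical names.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// ruleMetadataCSV is a hypothetical lookup table; columns and URLs are placeholders.
const ruleMetadataCSV = `rule_id,name,help_uri
HS-GO-1,Hardcoded credential,https://example.com/rules/HS-GO-1
HS-GO-2,Weak hash function,https://example.com/rules/HS-GO-2
`

type RuleMetadata struct {
	Name    string
	HelpURI string
}

// loadRuleMetadata parses the table into a ruleId -> metadata map,
// skipping the header row. csv.Reader rejects rows whose field count
// differs from the header's.
func loadRuleMetadata(data string) (map[string]RuleMetadata, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	meta := make(map[string]RuleMetadata, len(records))
	for i, rec := range records {
		if i == 0 {
			continue // header row
		}
		meta[rec[0]] = RuleMetadata{Name: rec[1], HelpURI: rec[2]}
	}
	return meta, nil
}

func main() {
	meta, err := loadRuleMetadata(ruleMetadataCSV)
	if err != nil {
		panic(err)
	}
	if m, ok := meta["HS-GO-1"]; ok {
		fmt.Println(m.Name, m.HelpURI)
	}
}
```

A missing entry simply means no help link for that rule, which matches the idea of treating URIs as optional.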