Feature: Add SARIF output support #9078

Merged
merged 15 commits into from
Dec 13, 2023
1 change: 1 addition & 0 deletions Cargo.lock

2 changes: 1 addition & 1 deletion crates/ruff_cli/src/lib.rs
@@ -424,7 +424,7 @@ pub fn check(args: CheckCommand, log_level: LogLevel) -> Result<ExitStatus> {
if cli.statistics {
printer.write_statistics(&diagnostics, &mut summary_writer)?;
} else {
- printer.write_once(&diagnostics, &mut summary_writer)?;
+ printer.write_once(&diagnostics, &mut summary_writer, &pyproject_config)?;
Member

So one potential issue here is that pyproject_config isn't guaranteed to contain the correct set of enabled rules for all files. Ruff supports hierarchical configuration, so you can have different configuration files that apply to different subdirectories in your project. The pyproject_config here really just represents the settings in the current working directory.

Could we infer the set of rules from the diagnostic messages? (What is this used for on the SARIF side?)
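
To illustrate the "infer from the diagnostics" option raised above, here is a minimal sketch. It assumes a hypothetical `Diagnostic` struct that carries only a rule code; it is not Ruff's real `Message` type, and `rules_from_diagnostics` is an invented helper name. The idea is to collect the distinct codes that actually fired, which avoids depending on any single directory's configuration.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a diagnostic; Ruff's real `Message` type is richer.
struct Diagnostic {
    code: &'static str,
}

/// Collect the distinct rule codes that fired, in sorted order.
/// (Hypothetical helper, not part of the PR.)
fn rules_from_diagnostics(diagnostics: &[Diagnostic]) -> Vec<&'static str> {
    // A BTreeSet removes duplicates and keeps the codes ordered.
    let codes: BTreeSet<&'static str> = diagnostics.iter().map(|d| d.code).collect();
    codes.into_iter().collect()
}

fn main() {
    let diagnostics = [
        Diagnostic { code: "F401" },
        Diagnostic { code: "E501" },
        Diagnostic { code: "F401" },
    ];
    for code in rules_from_diagnostics(&diagnostics) {
        println!("{}", code);
    }
}
```

The trade-off is that this list only names rules with at least one finding, so it cannot tell a consumer which rules ran cleanly.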

Contributor Author

Gotcha, thanks for that context. SARIF lists the rules applied as part of the tool description so that you can be sure you're getting an apples-to-apples comparison between runs, rather than only knowing that both runs used a particular Ruff version.

While the field is not mandatory in the SARIF specification (Section 3.19.23, rules property), it is required by GitHub. Also see: Understand Rules and Results.

I'm not sure how GitHub would handle a rules list that contains only the rules for which diagnostics were found, as that's not documented.

Member

Alternatively, could we just include all rules?

Contributor Author

Including all rules is probably the safest bet. In my experience with code scanning on GitHub, I've never looked at which checks were run, only at the findings, so this is probably closest to the SARIF expectation. My gut is that most people have most rules applied and only turn off a handful, but I could be wrong, especially for older/larger projects.

There are two downsides. First, if someone complains and we later change this, the rule set will change for anyone already using it in CI (again, I'm not sure how bad that is). Second, the file size will be a bit larger.

Member

Let's try all-rules for now. You can iterate over them with Rule::iter().
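
A minimal sketch of the "all rules" approach, under stated assumptions: the three-variant `Rule` enum, its `code` method, and `all_rule_codes` are hypothetical stand-ins, and `iter()` is written by hand here. The diff imports `EnumIter` from `strum_macros`, which is presumably how the real `Rule::iter()` is generated, but that is an inference rather than something this page confirms.

```rust
/// Hypothetical three-variant enum; Ruff's real `Rule` has many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rule {
    UnusedImport,
    LineTooLong,
    UndefinedName,
}

impl Rule {
    /// Every variant, listed once. A derive macro would normally generate this.
    const ALL: &'static [Rule] = &[Rule::UnusedImport, Rule::LineTooLong, Rule::UndefinedName];

    /// Hand-written stand-in for a derived `iter()`.
    fn iter() -> impl Iterator<Item = Rule> {
        Self::ALL.iter().copied()
    }

    /// Illustrative code for each variant.
    fn code(self) -> &'static str {
        match self {
            Rule::UnusedImport => "F401",
            Rule::LineTooLong => "E501",
            Rule::UndefinedName => "F821",
        }
    }
}

/// Build the rule list without consulting any per-directory configuration.
fn all_rule_codes() -> Vec<&'static str> {
    Rule::iter().map(Rule::code).collect()
}

fn main() {
    println!("{:?}", all_rule_codes());
}
```

Because the list no longer depends on settings, it is the same for every file regardless of which configuration file applies to it.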

}

if !cli.exit_zero {
9 changes: 8 additions & 1 deletion crates/ruff_cli/src/printer.rs
@@ -7,13 +7,14 @@ use anyhow::Result;
use bitflags::bitflags;
use colored::Colorize;
use itertools::{iterate, Itertools};
use ruff_workspace::resolver::PyprojectConfig;
use serde::Serialize;

use ruff_linter::fs::relativize_path;
use ruff_linter::logging::LogLevel;
use ruff_linter::message::{
AzureEmitter, Emitter, EmitterContext, GithubEmitter, GitlabEmitter, GroupedEmitter,
- JsonEmitter, JsonLinesEmitter, JunitEmitter, PylintEmitter, TextEmitter,
+ JsonEmitter, JsonLinesEmitter, JunitEmitter, PylintEmitter, SarifEmitter, TextEmitter,
};
use ruff_linter::notify_user;
use ruff_linter::registry::{AsRule, Rule};
@@ -210,6 +211,7 @@ impl Printer {
&self,
diagnostics: &Diagnostics,
writer: &mut dyn Write,
config: &PyprojectConfig,
) -> Result<()> {
if matches!(self.log_level, LogLevel::Silent) {
return Ok(());
@@ -291,6 +293,11 @@ impl Printer {
SerializationFormat::Azure => {
AzureEmitter.emit(writer, &diagnostics.messages, &context)?;
}
SerializationFormat::Sarif => {
SarifEmitter::default()
.with_applied_rules(config.settings.linter.rules)
.emit(writer, &diagnostics.messages, &context)?;
}
}

writer.flush()?;
1 change: 1 addition & 0 deletions crates/ruff_linter/Cargo.toml
@@ -71,6 +71,7 @@ toml = { workspace = true }
typed-arena = { version = "2.0.2" }
unicode-width = { workspace = true }
unicode_names2 = { workspace = true }
url = { version = "2.2.2" }
wsl = { version = "0.1.0" }

[dev-dependencies]
4 changes: 3 additions & 1 deletion crates/ruff_linter/src/codes.rs
@@ -4,13 +4,15 @@
/// `--select`. For pylint this is e.g. C0414 and E0118 but also C and E01.
use std::fmt::Formatter;

use serde::Serialize;

use crate::registry::{AsRule, Linter};
use crate::rule_selector::is_single_rule_selector;
use crate::rules;

use strum_macros::{AsRefStr, EnumIter};

- #[derive(PartialEq, Eq, PartialOrd, Ord)]
+ #[derive(PartialEq, Eq, PartialOrd, Ord, Serialize)]
pub struct NoqaCode(&'static str, &'static str);

impl NoqaCode {
2 changes: 2 additions & 0 deletions crates/ruff_linter/src/message/mod.rs
@@ -17,6 +17,7 @@ use ruff_diagnostics::{Diagnostic, DiagnosticKind, Fix};
use ruff_notebook::NotebookIndex;
use ruff_source_file::{SourceFile, SourceLocation};
use ruff_text_size::{Ranged, TextRange, TextSize};
pub use sarif::SarifEmitter;
pub use text::TextEmitter;

mod azure;
@@ -28,6 +29,7 @@ mod json;
mod json_lines;
mod junit;
mod pylint;
mod sarif;
mod text;

#[derive(Debug, PartialEq, Eq)]
205 changes: 205 additions & 0 deletions crates/ruff_linter/src/message/sarif.rs
@@ -0,0 +1,205 @@
use std::io::Write;
use url::Url;

use serde::{Serialize, Serializer};
use serde_json::json;

use ruff_source_file::OneIndexed;

use crate::fs::normalize_path;
use crate::message::{Emitter, EmitterContext, Message};
use crate::registry::{AsRule, Linter, Rule, RuleNamespace};
use crate::settings::rule_table::RuleTable;
use crate::VERSION;

#[derive(Default)]
pub struct SarifEmitter<'a> {
applied_rules: Vec<SarifRule<'a>>,
}

impl SarifEmitter<'_> {
#[must_use]
pub fn with_applied_rules(mut self, rule_table: RuleTable) -> Self {
let mut applied_rules = Vec::new();

for rule in rule_table.iter_enabled() {
applied_rules.push(SarifRule::from_rule(rule));
}
self.applied_rules = applied_rules;
self
}
}

impl Emitter for SarifEmitter<'_> {
fn emit(
&mut self,
writer: &mut dyn Write,
messages: &[Message],
_context: &EmitterContext,
) -> anyhow::Result<()> {
let results = messages
.iter()
.map(SarifResult::from_message)
.collect::<Vec<_>>();

let output = json!({
"$schema": "https://json.schemastore.org/sarif-2.1.0.json",
"version": "2.1.0",
"runs": [{
"tool": {
"driver": {
"name": "ruff",
"informationUri": "https://github.com/astral-sh/ruff",
"rules": self.applied_rules,
"version": VERSION.to_string(),
}
},
"results": results,
}],
});
serde_json::to_writer_pretty(writer, &output)?;
Ok(())
}
}

#[derive(Debug, Clone)]
struct SarifRule<'a> {
name: &'a str,
code: String,
linter: &'a str,
summary: &'a str,
explanation: Option<&'a str>,
url: Option<String>,
}

impl<'a> SarifRule<'a> {
fn from_rule(rule: Rule) -> Self {
let code = rule.noqa_code().to_string();
let (linter, _) = Linter::parse_code(&code).unwrap();
Self {
name: rule.into(),
code,
linter: linter.name(),
summary: rule.message_formats()[0],
explanation: rule.explanation(),
url: rule.url(),
}
}
}

impl Serialize for SarifRule<'_> {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
json!({
"id": self.code,
"shortDescription": {
"text": self.summary,
},
"fullDescription": {
"text": self.explanation,
},
"helpUri": self.url,
"properties": {
"id": self.code,
"kind": self.linter,
"name": self.name,
"problem.severity": "error".to_string(),
},
})
.serialize(serializer)
}
}

#[derive(Debug)]
struct SarifResult {
rule: Rule,
level: String,
message: String,
uri: String,
start_line: OneIndexed,
start_column: OneIndexed,
}

impl SarifResult {
#[cfg(not(target_arch = "wasm32"))]
fn from_message(message: &Message) -> Self {
let start_location = message.compute_start_location();
let abs_filepath = normalize_path(message.filename());
Self {
rule: message.kind.rule(),
level: "error".to_string(),
message: message.kind.name.clone(),
uri: Url::from_file_path(abs_filepath).unwrap().to_string(),
start_line: start_location.row,
start_column: start_location.column,
}
}
#[cfg(target_arch = "wasm32")]
fn from_message(message: &Message) -> Self {
let start_location = message.compute_start_location();
let abs_filepath = normalize_path(message.filename());
Self {
rule: message.kind.rule(),
level: "error".to_string(),
message: message.kind.name.clone(),
uri: abs_filepath.display().to_string(),
start_line: start_location.row,
start_column: start_location.column,
}
}
}

impl Serialize for SarifResult {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
json!({
"level": self.level,
"message": {
"text": self.message,
},
"locations": [{
"physicalLocation": {
"artifactLocation": {
"uri": self.uri,
},
"region": {
"startLine": self.start_line,
"startColumn": self.start_column,
}
}
}],
"ruleId": self.rule.noqa_code().to_string(),
})
.serialize(serializer)
}
}

#[cfg(test)]
mod tests {

use crate::message::tests::{capture_emitter_output, create_messages};
use crate::message::SarifEmitter;

fn get_output() -> String {
let mut emitter = SarifEmitter::default();
capture_emitter_output(&mut emitter, &create_messages())
}

#[test]
fn valid_json() {
let content = get_output();
serde_json::from_str::<serde_json::Value>(&content).unwrap();
}

#[test]
fn test_results() {
let content = get_output();
let sarif = serde_json::from_str::<serde_json::Value>(content.as_str()).unwrap();
let results = sarif["runs"][0]["results"].as_array().unwrap();
assert_eq!(results.len(), 3);
}
}
4 changes: 2 additions & 2 deletions crates/ruff_linter/src/registry/rule_set.rs
@@ -8,7 +8,7 @@ const RULESET_SIZE: usize = 12;
/// A set of [`Rule`]s.
///
/// Uses a bitset where a bit of one signals that the Rule with that [u16] is in this set.
- #[derive(Clone, Default, CacheKey, PartialEq, Eq)]
+ #[derive(Clone, Default, CacheKey, PartialEq, Eq, Copy)]
pub struct RuleSet([u64; RULESET_SIZE]);

impl RuleSet {
@@ -257,7 +257,7 @@ impl RuleSet {
/// ```
pub fn iter(&self) -> RuleSetIterator {
RuleSetIterator {
- set: self.clone(),
+ set: *self,
index: 0,
}
}
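
The hunks above derive `Copy` on `RuleSet`, so `iter()` can dereference `*self` instead of calling `self.clone()`. The following is a simplified sketch of that pattern; `BitSet`, `BitSetIter`, and `SIZE` are hypothetical names modeled on the `RuleSet([u64; RULESET_SIZE])` shape in the diff, not Ruff's actual implementation.

```rust
/// Number of 64-bit words in the hypothetical bitset.
const SIZE: usize = 2;

/// Simplified bitset; a fixed-size array of `u64` is cheap to copy.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
struct BitSet([u64; SIZE]);

impl BitSet {
    /// Set the bit at `index` (must be below `SIZE * 64`).
    fn insert(&mut self, index: usize) {
        self.0[index / 64] |= 1u64 << (index % 64);
    }

    /// Report whether the bit at `index` is set.
    fn contains(&self, index: usize) -> bool {
        (self.0[index / 64] & (1u64 << (index % 64))) != 0
    }

    /// Because `BitSet` is `Copy`, `*self` copies the words; no `.clone()` is needed.
    fn iter(&self) -> BitSetIter {
        BitSetIter { set: *self, index: 0 }
    }
}

/// Iterator that owns its own copy of the set.
struct BitSetIter {
    set: BitSet,
    index: usize,
}

impl Iterator for BitSetIter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Scan forward until the next set bit or the end of the set.
        while self.index < SIZE * 64 {
            let current = self.index;
            self.index += 1;
            if self.set.contains(current) {
                return Some(current);
            }
        }
        None
    }
}

fn main() {
    let mut set = BitSet::default();
    set.insert(3);
    set.insert(70);
    let members: Vec<usize> = set.iter().collect();
    println!("{:?}", members);
}
```

Since the iterator holds a copy, later changes to the original set do not affect an iterator that was already created.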
4 changes: 2 additions & 2 deletions crates/ruff_linter/src/settings/rule_table.rs
@@ -5,7 +5,7 @@ use ruff_macros::CacheKey;
use crate::registry::{Rule, RuleSet, RuleSetIterator};

/// A table to keep track of which rules are enabled and whether they should be fixed.
- #[derive(Debug, CacheKey, Default)]
+ #[derive(Debug, CacheKey, Default, Copy, Clone)]
pub struct RuleTable {
/// Maps rule codes to a boolean indicating if the rule should be fixed.
enabled: RuleSet,
@@ -66,7 +66,7 @@ impl FromIterator<Rule> for RuleTable {
fn from_iter<T: IntoIterator<Item = Rule>>(iter: T) -> Self {
let rules = RuleSet::from_iter(iter);
Self {
- enabled: rules.clone(),
+ enabled: rules,
should_fix: rules,
}
}
1 change: 1 addition & 0 deletions crates/ruff_linter/src/settings/types.rs
@@ -418,6 +418,7 @@ pub enum SerializationFormat {
Gitlab,
Pylint,
Azure,
Sarif,
}

impl Default for SerializationFormat {
2 changes: 1 addition & 1 deletion docs/configuration.md
@@ -481,7 +481,7 @@ Options:
--ignore-noqa
Ignore any `# noqa` comments
--output-format <OUTPUT_FORMAT>
- Output serialization format for violations [env: RUFF_OUTPUT_FORMAT=] [possible values: text, json, json-lines, junit, grouped, github, gitlab, pylint, azure]
+ Output serialization format for violations [env: RUFF_OUTPUT_FORMAT=] [possible values: text, json, json-lines, junit, grouped, github, gitlab, pylint, azure, sarif]
-o, --output-file <OUTPUT_FILE>
Specify file to write the linter output to (default: stdout)
--target-version <TARGET_VERSION>
3 changes: 2 additions & 1 deletion ruff.schema.json
