Conversation

mbg
Member

@mbg mbg commented Sep 29, 2025

This PR changes the approach used by upload-sarif to reduce the complexity of the implementation:

Previously, findAndUpload was called for Code Scanning and Code Quality in turn; each call found the files relevant to that analysis and uploaded them. The implementation in this PR is instead based around a new getGroupedSarifFilePaths function, which:

  • Finds all .sarif files.
  • Decides which analysis they belong to.

We then loop through the results of getGroupedSarifFilePaths and upload the SARIF files to the respective endpoint.
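
As a rough illustration of that loop (not the code from this PR), here is a minimal sketch. The `AnalysisKind` values, the `GroupedSarifFilePaths` type, `uploadGrouped`, and the `upload` callback are all assumptions made for the example. The callback is synchronous only to keep the sketch short; the real upload is asynchronous and takes more parameters.

```typescript
// Hypothetical, simplified stand-ins for the action's real types.
enum AnalysisKind {
  CodeScanning = "code-scanning",
  CodeQuality = "code-quality",
}

// Each analysis kind maps to the SARIF files that belong to it. Kinds
// without any matching files are simply absent from the object.
type GroupedSarifFilePaths = Partial<Record<AnalysisKind, string[]>>;

// Assumed uploader signature: takes a kind and its files, returns an upload ID.
type Uploader = (kind: AnalysisKind, files: string[]) => string;

// Fixed iteration order so uploads happen deterministically (an assumption).
const uploadOrder: AnalysisKind[] = [
  AnalysisKind.CodeScanning,
  AnalysisKind.CodeQuality,
];

function uploadGrouped(
  grouped: GroupedSarifFilePaths,
  upload: Uploader,
): Partial<Record<AnalysisKind, string>> {
  const uploadIds: Partial<Record<AnalysisKind, string>> = {};
  for (const kind of uploadOrder) {
    const files = grouped[kind];
    // Skip analysis kinds that did not receive any files.
    if (files === undefined || files.length === 0) {
      continue;
    }
    uploadIds[kind] = upload(kind, files);
  }
  return uploadIds;
}
```

The point of the shape is that finding and classifying files happens once, up front, and the upload step no longer needs to know how files were discovered.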

Risk assessment

For internal use only. Please select the risk level of this change:

  • Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding a changelog entry for this change.
  • Confirm the readme and docs have been updated if necessary.

@mbg mbg self-assigned this Sep 29, 2025
@mbg mbg force-pushed the mbg/upload-sarif/find-then-filter branch from 6b42eb8 to 2ba3c8b Compare September 29, 2025 12:04
Base automatically changed from mbg/upload-sarif/add-tests to main September 29, 2025 14:06
@mbg mbg force-pushed the mbg/upload-sarif/find-then-filter branch from 2ba3c8b to 93711d3 Compare September 29, 2025 14:07
@mbg mbg requested review from esbena and henrymercer September 29, 2025 14:11
@mbg mbg marked this pull request as ready for review September 29, 2025 14:11
@mbg mbg requested a review from a team as a code owner September 29, 2025 14:11
@Copilot Copilot AI review requested due to automatic review settings September 29, 2025 14:11
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors the upload-sarif Action's approach by introducing a "find, then filter" strategy that simplifies the implementation while maintaining backward compatibility.

  • Replaces the complex findAndUpload function with a new getGroupedSarifFilePaths function that finds all SARIF files first, then categorizes them by analysis type
  • Streamlines the upload logic by processing grouped SARIF files in a single loop
  • Moves category fixing logic into the analysis configuration objects for better organization

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/util.ts | Adds typed helper functions for Object.keys and Object.entries |
| src/upload-sarif.ts | Refactors main upload logic to use new grouping approach, removes findAndUpload function |
| src/upload-sarif.test.ts | Updates tests to reflect new API structure and removes findAndUpload tests |
| src/upload-lib.ts | Implements new getGroupedSarifFilePaths function and moves category fixing to uploadSpecifiedFiles |
| src/upload-lib.test.ts | Adds comprehensive tests for the new grouping functionality |
| src/analyze.ts | Updates to use analysis-specific category fixing method |
| src/analyze-action.ts | Simplifies quality upload by removing duplicate category fixing |
| src/analyses.ts | Adds fixCategory method to analysis configs and introduces helper functions |
| Generated JS files | Compiled output reflecting TypeScript changes |

} else {
for (const analysisConfig of analyses.SarifScanOrder) {
if (
analysisConfig.kind === analyses.AnalysisKind.CodeScanning ||

Copilot AI Sep 29, 2025

The hardcoded check for the CodeScanning analysis kind breaks the abstraction pattern. Consider adding an isDefaultAnalysis property to the analysis configuration or restructuring the logic to avoid special-casing specific analysis types.

Suggested change
- analysisConfig.kind === analyses.AnalysisKind.CodeScanning ||
+ analysisConfig.isDefaultAnalysis ||

Contributor

@esbena esbena left a comment

Great unification.
I got hung up on the clarity of getGroupedSarifFilePaths, but I'm confident that it is correct as is.

src/util.ts Outdated
export function entriesTyped<T extends Record<string, any>>(
object: T,
): Array<[keyof T, NonNullable<T[keyof T]>]> {
return Object.entries(object) as Array<[keyof T, NonNullable<T[keyof T]>]>;
Contributor

The NonNullable<T[keyof T]> is an undocumented refinement that is unsound.

Member Author

You're right. I thought about this a good bit yesterday and I am not entirely sure about the best solution.

I initially thought that maybe using Exclude<T[keyof T], undefined> instead would be better, but then remembered that we can explicitly set a key to undefined in which case Object.entries still returns a pair for the key and undefined as the value.

I am now thinking that perhaps it would be better to explicitly filter the results of these functions to exclude undefined values and keys that don't belong to T?
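
One way the filtering idea could look is sketched below. This is only an illustration, not what was merged: the name `definedEntries` is hypothetical, and the sketch only handles the `undefined` half of the problem. The key cast is still unchecked, because there is no runtime list of the keys of `T` to compare against.

```typescript
/**
 * Like `Object.entries`, but drops pairs whose value is `undefined`, so the
 * narrowed value type is backed by a runtime check rather than only a cast.
 * Note: keys are still cast to `keyof T` without verification.
 */
function definedEntries<T extends Record<string, any>>(
  object: T,
): Array<[keyof T & string, Exclude<T[keyof T], undefined>]> {
  const result: Array<[keyof T & string, Exclude<T[keyof T], undefined>]> = [];
  for (const [key, value] of Object.entries(object)) {
    // Keys that are explicitly set to `undefined` still show up in
    // `Object.entries`, so they have to be filtered out here.
    if (value !== undefined) {
      result.push([
        key as keyof T & string,
        value as Exclude<T[keyof T], undefined>,
      ]);
    }
  }
  return result;
}
```

Unlike `NonNullable`, this keeps `null` values in the result, since `Exclude<..., undefined>` only removes `undefined` from the type and the runtime check matches that.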

src/util.ts Outdated
Comment on lines 1290 to 1302

/** Like `Object.keys`, but infers the correct key type. */
export function keysTyped<T extends Record<string, any>>(
object: T,
): Array<keyof T> {
return Object.keys(object) as Array<keyof T>;
}

/** Like `Object.entries`, but infers the correct key type. */
export function entriesTyped<T extends Record<string, any>>(
object: T,
): Array<[keyof T, NonNullable<T[keyof T]>]> {
return Object.entries(object) as Array<[keyof T, NonNullable<T[keyof T]>]>;
Contributor

I'd like the naming / docstring to be different here for both keysTyped and entriesTyped. Perhaps something with strict/exact/...? And definitely not something with "correct", since the builtin Object.keys is actually the one that is correct already.
See https://stackoverflow.com/questions/55012174/why-doesnt-object-keys-return-a-keyof-type-in-typescript
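
To make the point about the builtin being correct concrete, here is a small sketch. `keysTyped` is copied from the diff above; the `Point` type and the `labelled` value are made up for the example. Because TypeScript types are structural, a value typed as `Point` may carry extra properties at runtime, which is why `Object.keys` is declared to return `string[]`.

```typescript
type Point = { x: number; y: number };

/** Copied from the diff under review. */
function keysTyped<T extends Record<string, any>>(object: T): Array<keyof T> {
  return Object.keys(object) as Array<keyof T>;
}

// Structural typing lets a value with extra properties be used as a `Point`.
const labelled = { x: 1, y: 2, label: "origin" };
const point: Point = labelled;

// Statically this is Array<"x" | "y">, but at runtime it also holds "label".
const pointKeys = keysTyped(point);
```

Any code that switches exhaustively over `"x" | "y"` would be handed `"label"` here without a compile-time error.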


const results = {};

if (stats.isDirectory()) {
Contributor

Please confirm my understanding here for the directory case.

  1. find all .sarif files recursively below sarifPath, define as sarifFiles
  2. pick the first analysis, and filter sarifFiles to those that match the first analysis' sarif predicate, define as files
  3. store files for the current analysis
  4. remove files from sarifFiles
  5. repeat steps 2-4 for the remaining analyses
  6. warn about a non-empty sarifFiles since that implies some .sarif files did not match any analysis' sarif predicate.
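
For readers following along, the six steps above can be sketched roughly as follows. This is a reconstruction from the description, not the code in the PR: `AnalysisConfig` is a simplified stand-in for an entry of `analyses.SarifScanOrder`, `groupByAnalysis`, `unassigned`, and the `warn` callback are hypothetical names, and storing only non-empty groups is an assumption.

```typescript
// Hypothetical, simplified stand-in for an entry of `analyses.SarifScanOrder`.
interface AnalysisConfig {
  kind: string;
  name: string;
  sarifPredicate: (path: string) => boolean;
}

function groupByAnalysis(
  allSarifFiles: string[], // Step 1 (the recursive search) is assumed done.
  scanOrder: AnalysisConfig[],
  warn: (message: string) => void,
): Record<string, string[]> {
  const results: Record<string, string[]> = {};
  let unassigned = allSarifFiles;

  // Step 5: repeat steps 2-4 for every analysis, in scan order.
  for (const config of scanOrder) {
    // Step 2: pick the files that match this analysis' predicate.
    const matching = unassigned.filter((file) => config.sarifPredicate(file));
    // Step 3: store them for the current analysis (only if there are any).
    if (matching.length > 0) {
      results[config.kind] = matching;
    }
    // Step 4: remove them so later analyses cannot claim the same files.
    unassigned = unassigned.filter((file) => !config.sarifPredicate(file));
  }

  // Step 6: whatever is left did not match any analysis.
  if (unassigned.length > 0) {
    warn(`Unmatched SARIF files: ${unassigned.join(", ")}`);
  }
  return results;
}
```

Because files are removed after each pass, the order of `scanOrder` matters: an analysis with a narrow predicate has to come before one with a broad predicate.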

I find the modification in step 4 and/or the naming of sarifFiles/files to be confusing. Abstractly, my issue is that sarifFiles is treated as a work-item queue and that files is really analysis-specific, but neither of those properties is clear from the naming.

I'm suggesting an alternative that processes the sarifFiles differently.
I am not worried about performance; this is purely about the human perspective.

Minimal version without any logging:

    sarifFiles.forEach((sarifFile) => {
      // classify each file into exactly one analysis kind
      for(const analysisConfig of analyses.SarifScanOrder) {
        if (analysisConfig.sarifPredicate(sarifFile)) {
          results[analysisConfig.kind] = results[analysisConfig.kind] || [];
          results[analysisConfig.kind].push(sarifFile);
          return;
        }
      }
    });

Full version with logging that is similar to the current setup:

    sarifFiles.forEach((sarifFile) => {
      // classify each file into exactly one analysis kind, or warn if it doesn't match any
      for(const analysisConfig of analyses.SarifScanOrder) {
        if (analysisConfig.sarifPredicate(sarifFile)) {
          logger.debug(
            `Using '${sarifFile}' as a SARIF file for ${analysisConfig.name}.`,
          );
          results[analysisConfig.kind] = results[analysisConfig.kind] || [];
          results[analysisConfig.kind].push(sarifFile);
          return;
        }
      }
      logger.debug(`'${sarifFile}' does not belong to any analysis.`);
    });

    // warn if any analysis didn't get any files
    for (const analysisConfig of analyses.SarifScanOrder) {
      if (!(analysisConfig.kind in results)) {
        logger.warning(
          `No SARIF files found for ${analysisConfig.name} in ${sarifPath}.`,
        );
      }
    }

Member Author

> Please confirm my understanding here for the directory case.

Your understanding is correct.

> I'm suggesting an alternative that processes the sarifFiles differently.

Your point about the naming is fair and your alternative is a further improvement over the current implementation, I think. I'll play around with it and push some changes.

Member Author

Having played around with this a bit yesterday, I think I'll stick with the current implementation and just rename the variables.

Your version is nicer on paper, I think, but I ran into some annoyances while trying to make it work: the logic relies on being able to return from the (anonymous) function given to forEach, but we don't allow forEach. So it has to be a for-loop, and then we need to track whether we have found an analysis for sarifFile. Then TypeScript isn't quite smart enough to know that if we assign results[analysisConfig.kind] to something in a loop, we can safely call push on the next line. So it ends up being just a bit clunkier than it should be.
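
For illustration, a for-loop variant along those lines might look like the sketch below. It is not the code that was tried in the PR: `Kind`, `AnalysisConfig`, `classifySarifFiles`, and the `debug` callback are hypothetical, simplified names. It shows both annoyances: the `matched` flag that replaces the early `return`, and the local `files` variable that is needed because indexing a partial record gives `string[] | undefined`.

```typescript
type Kind = "code-scanning" | "code-quality";

// Hypothetical, simplified stand-in for an entry of `analyses.SarifScanOrder`.
interface AnalysisConfig {
  kind: Kind;
  name: string;
  sarifPredicate: (path: string) => boolean;
}

function classifySarifFiles(
  sarifFiles: string[],
  scanOrder: AnalysisConfig[],
  debug: (message: string) => void,
): Partial<Record<Kind, string[]>> {
  const results: Partial<Record<Kind, string[]>> = {};
  for (const sarifFile of sarifFiles) {
    // Without `forEach` there is no callback to return from, so track
    // explicitly whether some analysis claimed this file.
    let matched = false;
    for (const config of scanOrder) {
      if (config.sarifPredicate(sarifFile)) {
        // `results[config.kind]` is `string[] | undefined`; going through a
        // local gives the compiler a definite array to call `push` on.
        const files = results[config.kind] ?? [];
        files.push(sarifFile);
        results[config.kind] = files;
        matched = true;
        break;
      }
    }
    if (!matched) {
      debug(`'${sarifFile}' does not belong to any analysis.`);
    }
  }
  return results;
}
```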

@mbg mbg requested a review from esbena October 1, 2025 14:29
@mbg mbg merged commit 10feb5d into main Oct 2, 2025
377 of 460 checks passed
@mbg mbg deleted the mbg/upload-sarif/find-then-filter branch October 2, 2025 10:51
@github-actions github-actions bot mentioned this pull request Oct 2, 2025
8 tasks