Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails to parse diff when a file has many (or no) changes #274

Closed
rturrado opened this issue Oct 17, 2024 · 8 comments · Fixed by #277
Closed

Fails to parse diff when a file has many (or no) changes #274

rturrado opened this issue Oct 17, 2024 · 8 comments · Fixed by #277
Labels
bug Something isn't working

Comments

@rturrado
Copy link

What events trigger your workflow?

on:
  push:
    branches:
      - develop
      - master
  pull_request:

What OS does your workflow use?

runs-on: ubuntu-latest

How is cpp-linter-action configured?

jobs:
  cpp-linters:
    name: "C++ linters"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Run C++ linters
        uses: cpp-linter/cpp-linter-action@v2
        id: linter
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          extensions: 'cpp,hpp'
          format-review: 'true'
          style: 'file'  # use .clang-format config file
          tidy-checks: '-*'  # disable clang-tidy checks
          version: 18
      - name: Fail fast
        if: steps.linter.outputs.clang-format-checks-failed > 0
        run: |
          echo "::notice::Try executing 'python3 ./scripts/run_cpp_linters.py .' to fix linter issues."
          exit 1

I initially had also the option `files-changed-only: 'false'`. I took it out but the error remains.

What was the unexpected behavior?

I'm getting this error from GitHub Actions:

ERROR:CPP Linter:response returned 406 from GET https://api.github.com/repos/QuTech-Delft/libqasm/pulls/263 with message: {"message":"Sorry, the diff exceeded the maximum number of files (300). Consider using 'List pull requests files' API or locally cloning the repository instead.","errors":

You can see the full log here: https://github.com/QuTech-Delft/libqasm/actions/runs/11381786455/job/31663943797

I'm updating a master branch from a develop branch, and the PR includes over 500 files. But I don't see how that can affect the C++ linters.

@rturrado
Copy link
Author

A possible solution from people experiencing the same issue: reviewdog/reviewdog#1696 (comment)

rturrado added a commit to QuTech-Delft/libqasm that referenced this issue Oct 17, 2024
Until we have a fix for this cpp-linter-action issue:
cpp-linter/cpp-linter-action#274
rturrado added a commit to QuTech-Delft/libqasm that referenced this issue Oct 17, 2024
@2bndy5
Copy link
Collaborator

2bndy5 commented Oct 17, 2024

According to the logs, the real problem is that a file in the list of changed files did not include the patch info. Let's walk through the logs.

ERROR:CPP Linter:response returned 406 from GET https://api.github.com/repos/QuTech-Delft/libqasm/pulls/263 with message: {"message":"Sorry, the diff exceeded the maximum number of files (300). Consider using 'List pull requests files' API or locally cloning the repository instead.","errors":[{"resource":"PullRequest","field":"diff","code":"too_large"}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#list-pull-requests-files","status":"406"}
INFO:CPP Linter:Could not get raw diff of the pull_request event. Perhaps there are too many changes?

This means the at you hit the problem reported in #249. The solution to that was to fall back to using GitHub's REST API in which each file's diff is parsed individually.

ERROR:CPP Linter:Using pure python to parse diff because pygit2 failed!
WARNING:CPP Linter:pygit2.Diff.parse_diff() threw invalid patch instruction at line ...

These repeated errors and warnings just mean that libgit2 (python binding) failed to parse the files' individual diff (which we have to assemble on the fly with limited information about each file's patch). This uses brute force regex patterns instead of libgit2's diff parsing algorithm.

But then we find a file that doesn't include any patch information:

Traceback (most recent call last):
    File "/home/runner/work/_actions/cpp-linter/cpp-linter-action/v2/venv/bin/cpp-linter", line 8, in <module>
      sys.exit(main())
    File "/home/runner/work/_actions/cpp-linter/cpp-linter-action/v2/venv/lib/python3.10/site-packages/cpp_linter/__init__.py", line 46, in main
      files = rest_api_client.get_list_of_changed_files(
    File "/home/runner/work/_actions/cpp-linter/cpp-linter-action/v2/venv/lib/python3.10/site-packages/cpp_linter/rest_api/github_api.py", line 108, in get_list_of_changed_files
      return self._get_changed_files_paginated(
    File "/home/runner/work/_actions/cpp-linter/cpp-linter-action/v2/venv/lib/python3.10/site-packages/cpp_linter/rest_api/github_api.py", line 144, in _get_changed_files_paginated
      assert "patch" in file
  AssertionError

This is one of the reasons we originally switched to getting the raw diff of the event instead of parsing each changed file's patch. It means that GitHub refused to give us the patch for a file with too many changes.

Workaround

If you expect all your PRs to have a large diff, then you should disable any features that rely on having diff information.

- uses: cpp-linter/cpp-linter-action@v2
  id: linter
  with:
    lines-changed-only: false
    files-changed-only: false
    format-review: false
    tidy-review: false

Possible solution on our end

I guess we could silently fail when a file's diff is unknown, but that will cause further unexpected behavior (like unreported syntax/format errors in files with too many changes). If GitHub won't give us the information we need, then we can't be expected to operate correctly. I have to investigate more...

I think it would simply be better to improve the error message output. We should point to the file that didn't include the patch in the REST API's response.

@2bndy5 2bndy5 changed the title Linters passing on develop but failing when syncing master from develop Fails to parse diff when a file has many changes Oct 17, 2024
@2bndy5
Copy link
Collaborator

2bndy5 commented Oct 17, 2024

A possible solution from people experiencing the same issue: reviewdog/reviewdog#1696 (comment)

This didn't help. The solution there was specific to reviewdog software.

@2bndy5
Copy link
Collaborator

2bndy5 commented Oct 17, 2024

The file in which Github didn't provide the patch info is named "include/libqasm/tree.hpp". According to the PR's diff, this file was renamed with no content changes. This means we need to add a check for content changes when trying to parse a diff for files without content changes.

Should renamed files (with no code changes) get skipped when lines-changed-only: true?

My gut reaction says "yes".

@2bndy5 2bndy5 added the bug Something isn't working label Oct 17, 2024
@2bndy5 2bndy5 changed the title Fails to parse diff when a file has many changes Fails to parse diff when a file has many (or no) changes Oct 17, 2024
@2bndy5 2bndy5 linked a pull request Oct 18, 2024 that will close this issue
@2bndy5
Copy link
Collaborator

2bndy5 commented Oct 18, 2024

I just published v2.13.3 with included fix. @rturrado you're next CI run in QuTech-Delft/libqasm#263 should automatically pull in the new release's changes.

@rturrado
Copy link
Author

rturrado commented Oct 18, 2024

@2bndy5 wow, great thanks! That was a lightning fast support!

Just a question. I saw your comments yesterday, but it was late to answer. In one of them you were suggesting to keep a few attributes set to false. In other one you were suggesting to keep lines-changed-only to true. Does anyone of those comments still hold after the fix?

@2bndy5
Copy link
Collaborator

2bndy5 commented Oct 18, 2024

In one of them you were suggesting to keep a few attributes set to false.

That suggestion was a workaround that avoids needing a diff.

In other one you were suggesting to keep line-changes-only to true.

I think that suggestion was from a separate discussion about using the PR review feature for more concise feedback.

Summary

The PR review features (format-review and tidy-review) needs diff information to understand where in the "files changed" tab to post suggestions.

The workaround that avoids using a diff should not be needed with v2.13.3. But it might be a good idea to keep in mind if you keep submitting PRs that have either

  • more than 300 files; 3000 is the absolute maximum number of changed files that GitHub will report via REST API
  • any single changed file with a massive amount of changes (like over 1000 lines); GitHub will simply not provide such patch information (via REST API) no matter what we do.

@rturrado
Copy link
Author

Perfect, I think I'll keep lines-changed-only, files-changed-only, and format-review set to false (default values) to avoid further problems.

With that and your last fix we're playing safe.

Thanks for all the detailed explanations!

rturrado added a commit to QuTech-Delft/libqasm that referenced this issue Oct 18, 2024
If we expect some of our PRs to have a large diff, then we should disable any features that rely on having diff information:
- lines-changed-only,
- files-changed-only,
- format-review, and
- tidy-review.

For a more detailed explanation: cpp-linter/cpp-linter-action#274 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants