
Fix deadlock in Git.handleBinary #3375

Closed · wants to merge 1 commit

@rgmz (Contributor) commented Oct 6, 2024

Description:

This closes #3339.

cmd and catCmd are the same reference. It seems that explicitly calling cmd.Wait() could cause a deadlock where the deferred stdout.Close() never runs. That said, this could be merely fixing the symptom rather than curing the underlying issue.

Edit: hmm, this seems to produce the beloved "signal: broken pipe" error when calling cmd.Wait().

2024-10-05T21:16:08-04:00	error	trufflehog	error while waiting.	{"source_manager_worker_id": "VksFa", "unit": "https://github.com/cohere-ai/tokenizer.git", "unit_kind": "repo", "repo": "https://github.com/cohere-ai/tokenizer.git", "commit": "6ead2eb", "path": "tokenizer/tokenizer_go.so", "error": "signal: broken pipe"}
2024-10-05T21:16:08-04:00	error	trufflehog	error handling binary file	{"source_manager_worker_id": "VksFa", "unit": "https://github.com/cohere-ai/tokenizer.git", "unit_kind": "repo", "repo": "https://github.com/cohere-ai/tokenizer.git", "filename": "tokenizer/tokenizer_go.so", "commit": "6ead2eb2b9c67ae9dea31cc859edad88e75ae6dc", "file": "tokenizer/tokenizer_go.so", "error": "signal: broken pipe"}
2024-10-05T21:16:08-04:00	info-0	trufflehog	finished scanning	{"chunks": 2381, "bytes": 11359546, "verified_secrets": 0, "unverified_secrets": 0, "scan_duration": "5.686829476s", "trufflehog_version": "dev"}

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@rgmz requested review from a team as code owners · October 6, 2024 00:51
@ahrav (Collaborator) commented Oct 6, 2024

I initially chose not to call stdout.Close() because I saw the following comment in the documentation for StdoutPipe, or at least I think that's why. 🤔

I'm guessing maybe the command wasn't exiting, which is what led to the rogue processes Dustin might've been trying to address with his PR.

// StdoutPipe returns a pipe that will be connected to the command's
// standard output when the command starts.
//
// [Cmd.Wait] will close the pipe after seeing the command exit, so most callers
// need not close the pipe themselves. It is thus incorrect to call Wait
// before all reads from the pipe have completed.
// For the same reason, it is incorrect to call [Cmd.Run] when using StdoutPipe.
// See the example for idiomatic usage.
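
For reference, the idiomatic usage that comment points to is: finish all reads from the pipe, then call Wait. A minimal runnable sketch (mirroring the os/exec example, with git version as a stand-in command):

package main

import (
	"fmt"
	"io"
	"os/exec"
)

func main() {
	cmd := exec.Command("git", "version")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// Finish all reads from the pipe first...
	out, err := io.ReadAll(stdout)
	if err != nil {
		panic(err)
	}
	// ...then call Wait, which reaps the process and closes the pipe.
	if err := cmd.Wait(); err != nil {
		panic(err)
	}
	fmt.Printf("%s", out)
}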

@rgmz (Contributor, Author) commented Oct 6, 2024

It seems like the process still doesn't exit after several hours, yet running the command manually completes in seconds. From what I've read, it's possible that the pipe buffer fills up and Wait() gets stuck because the process still has more to write.

FWIW, it doesn't show up as Z in ps either.
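
A minimal, self-contained sketch of that buffer-fill hypothesis (illustrative only, not trufflehog code): a child that writes more than the OS pipe buffer (around 64 KiB on Linux) blocks on write when nothing drains the pipe, and Wait() blocks with it.

package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// dd writes ~1 MiB to stdout, far more than the OS pipe buffer.
	cmd := exec.Command("dd", "if=/dev/zero", "bs=1024", "count=1024")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	_ = stdout // deliberately never read; this is the bug being illustrated

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		fmt.Println("exited:", err)
	case <-time.After(2 * time.Second):
		// The child is blocked writing to the full pipe, so Wait never
		// returns. Draining stdout (io.Copy(io.Discard, stdout)) before
		// calling Wait would prevent this.
		fmt.Println("hung: pipe buffer full, child blocked on write")
	}
}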

@rgmz (Contributor, Author) commented Oct 7, 2024

I can reliably reproduce this hang by making HandleFile return immediately.

if reader == nil {
	return fmt.Errorf("reader is nil")
}

This leads me to suspect that the hang is caused by the Read/Write of io.ReadCloser.

Edit: furthermore, changing executeCatFileCmd to simply write the files to disk causes it to run buttery smooth, as expected. No hanging.

func (s *Git) executeCatFileCmd(cmd *exec.Cmd, filepath string) (io.ReadCloser, error) {
	var stderr bytes.Buffer
	cmd.Stderr = &stderr

	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, fmt.Errorf("error running git cat-file: %w\n%s", err, stderr.Bytes())
	}

	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("error starting git cat-file: %w\n%s", err, stderr.Bytes())
	}

	// Create or open the file to write the output
	fileName := strings.ReplaceAll(filepath, "/", "_")
	outFile, err := os.Create("/tmp/thog/" + fileName)
	if err != nil {
		return nil, fmt.Errorf("error creating file: '%s', %w", filepath, err)
	}
	defer outFile.Close()

	// Copy the command's output to the file
	if _, err := io.Copy(outFile, stdout); err != nil {
		return nil, fmt.Errorf("error copying output to file: %s, %w", filepath, err)
	}

	// Wait for the command to finish
	if err := cmd.Wait(); err != nil {
		return nil, fmt.Errorf("error waiting for command: %w", err)
	}

	return nil, nil
}

@rgmz rgmz marked this pull request as draft October 7, 2024 00:30
@rgmz (Contributor, Author) commented Oct 7, 2024

Pulling this back into draft because I'm not comfortable that this actually fixes the problem.

@dustin-decker (Contributor) commented
Oh, ignore my approval; it looks like we have more to figure out.

I introduced this regression with #3339.

@rgmz (Contributor, Author) commented Oct 7, 2024

> I introduced this regression with #3339.

It looks like #3339 introduced the deadlock on top of a latent issue. I don't think the code was working 100% prior to that.

@rgmz (Contributor, Author) commented Oct 7, 2024

The issue doesn't occur if I revert #3351. I have no idea why.

Did that change introduce a regression or was it covering up a latent issue?

@ahrav (Collaborator) commented Oct 7, 2024

> I introduced this regression with #3339.

> It looks like #3339 introduced the deadlock on top of a latent issue. I don't think the code was working 100% prior to that.

That's what I thought too 😅. I’ll let Dustin respond. I vaguely remember mentions of processes not being killed, but I could be wrong.

@dustin-decker (Contributor) commented
Hmm, it does look like Wait should close all pipes, so I'm not sure #3339 actually introduced the issue, but the code there isn't correct either.

I've fixed that up in #3379 and added smoke tests for git and zombie processes remaining after a scan. It worked locally, let's see what happens in CI 🤞

@rgmz (Contributor, Author) commented Oct 7, 2024

> The issue doesn't occur if I revert #3351. I have no idea why.

I still don't understand why adding the call to .Size() in HandleFile prevents the issue from occurring.

It seems like the issue is within the handler or bufferedreader code and that Git.handleBinary is just incidental.

@ahrav (Collaborator) commented Oct 7, 2024

> The issue doesn't occur if I revert #3351. I have no idea why.

> I still don't understand why adding the call to .Size() in HandleFile prevents the issue from occurring.

> It seems like the issue is within the handler or bufferedreader code and that Git.handleBinary is just incidental.

I suspect there's an issue with reading from the stdout pipe and buffering the result. I noticed that reading the entire stdout content at once prevents it from hanging.

Ex:

// newFileReader creates a fileReader from an io.Reader, optionally using BufferedFileWriter for certain formats.
func newFileReader(r io.Reader) (fileReader, error) {
	var fReader fileReader

	data, err := io.ReadAll(r)
	if err != nil {
		return fReader, fmt.Errorf("unable to read file: %w", err)
	}

	fReader.BufferedReadSeeker = iobuf.NewBufferedReaderSeeker(bytes.NewReader(data))
..
}

I created some tests to see if I could reproduce it, but the tests pass even without consuming the entire reader with the snippet above.

func TestHandleGitCatFileLargeBlob(t *testing.T) {
	fileName := "largefile.bin"
	fileSize := 50 * 1024 * 1024 // 50 MB

	// Set up a temporary git repository with a large file
	gitDir := setupTempGitRepo(t, fileName, fileSize)
	defer os.RemoveAll(gitDir)

	cmd := exec.Command("git", "-C", gitDir, "rev-parse", "HEAD")
	hashBytes, err := cmd.Output()
	assert.NoError(t, err, "Failed to get commit hash")
	commitHash := strings.TrimSpace(string(hashBytes))

	// Create a pipe to simulate the git cat-file stdout
	cmd = exec.Command("git", "-C", gitDir, "cat-file", "blob", fmt.Sprintf("%s:%s", commitHash, fileName))

	var stderr bytes.Buffer
	cmd.Stderr = &stderr

	stdout, err := cmd.StdoutPipe()
	assert.NoError(t, err, "Failed to create stdout pipe")

	err = cmd.Start()
	assert.NoError(t, err, "Failed to start git cat-file command")

	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	chunkCh := make(chan *sources.Chunk, 1000) // Adjust buffer size as needed

	go func() {
		defer close(chunkCh)
		err := HandleFile(ctx, stdout, &sources.Chunk{}, sources.ChanReporter{Ch: chunkCh}, WithSkipArchives(false))
		assert.NoError(t, err, "HandleFile should not return an error")
	}()

	err = cmd.Wait()
	assert.NoError(t, err, "git cat-file command should complete without error")

	count := 0
	for range chunkCh {
		count++
	}

	expectedChunks := 100 // This needs to be updated
	assert.Equal(t, expectedChunks, count, "Number of chunks should match the expected value")
}

func setupTempGitRepo(t *testing.T, archiveName string, fileSize int) string {
	tempDir := t.TempDir()

	// Initialize the Git repository
	cmd := exec.Command("git", "init", tempDir)
	var initStderr bytes.Buffer
	cmd.Stderr = &initStderr
	err := cmd.Run()
	if err != nil {
		t.Fatalf("Failed to initialize git repository: %v, stderr: %s", err, initStderr.String())
	}

	archivePath := filepath.Join(tempDir, archiveName)

	zipFile, err := os.Create(archivePath)
	if err != nil {
		t.Fatalf("Failed to create archive file: %v", err)
	}

	zipWriter := zip.NewWriter(zipFile)

	innerFileName := "largefile.txt"

	// Create a new file entry in the ZIP archive with no compression
	header := &zip.FileHeader{
		Name:   innerFileName,
		Method: zip.Store, // No compression
	}
	zipFileWriter, err := zipWriter.CreateHeader(header)
	if err != nil {
		t.Fatalf("Failed to create file in ZIP archive: %v", err)
	}

	dataChunk := bytes.Repeat([]byte("A"), 1024) // 1KB chunk
	totalWritten := 0
	for totalWritten < fileSize {
		remaining := fileSize - totalWritten
		if remaining < len(dataChunk) {
			_, err = zipFileWriter.Write(dataChunk[:remaining])
			if err != nil {
				t.Fatalf("Failed to write to inner file in ZIP archive: %v", err)
			}
			totalWritten += remaining
		} else {
			_, err = zipFileWriter.Write(dataChunk)
			if err != nil {
				t.Fatalf("Failed to write to inner file in ZIP archive: %v", err)
			}
			totalWritten += len(dataChunk)
		}
	}

	if err := zipWriter.Close(); err != nil {
		zipFile.Close()
		t.Fatalf("Failed to close ZIP writer: %v", err)
	}

	if err := zipFile.Close(); err != nil {
		t.Fatalf("Failed to close ZIP file: %v", err)
	}

	// Verify the ZIP archive's integrity
	verifyCmd := exec.Command("unzip", "-t", archivePath)
	verifyOutput, err := verifyCmd.CombinedOutput()
	if err != nil {
		t.Fatalf("Failed to verify ZIP archive: %v, output: %s", err, string(verifyOutput))
	}

	// Add the archive to the Git repository
	cmd = exec.Command("git", "-C", tempDir, "add", archiveName)
	var addStderr bytes.Buffer
	cmd.Stderr = &addStderr
	err = cmd.Run()
	if err != nil {
		t.Fatalf("Failed to add archive to git: %v, stderr: %s", err, addStderr.String())
	}

	cmd = exec.Command("git", "-C", tempDir, "commit", "-m", "Add archive file")
	var commitStderr bytes.Buffer
	cmd.Stderr = &commitStderr
	err = cmd.Run()
	if err != nil {
		t.Fatalf("Failed to commit archive to git: %v, stderr: %s", err, commitStderr.String())
	}

	return tempDir
}

@ahrav (Collaborator) commented Oct 7, 2024

> The issue doesn't occur if I revert #3351. I have no idea why.

> I still don't understand why adding the call to .Size() in HandleFile prevents the issue from occurring.

> It seems like the issue is within the handler or bufferedreader code and that Git.handleBinary is just incidental.

I believe Size() consumes the entire reader if it's not seekable. It holds the data in memory if it's small enough; otherwise, I think it writes it to disk.
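
A hypothetical sketch of that behaviour (the type and field names here are assumptions for illustration, not the actual pkg/iobuf code):

package iobufsketch

import "io"

// bufferedReadSeeker stands in for the real BufferedReadSeeker.
type bufferedReadSeeker struct {
	src       io.Reader // e.g. the git cat-file stdout pipe
	totalSize int64
	sizeKnown bool
}

// Size on a non-seekable source can only be answered by reading to EOF.
// That full read is the side effect that drains the pipe and lets the
// child process exit. The real code presumably buffers what it reads (in
// memory, spilling to a temp file past a threshold); Discard stands in
// for that here.
func (br *bufferedReadSeeker) Size() (int64, error) {
	if br.sizeKnown {
		return br.totalSize, nil
	}
	n, err := io.Copy(io.Discard, br.src)
	if err != nil {
		return 0, err
	}
	br.totalSize, br.sizeKnown = n, true
	return br.totalSize, nil
}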

@rgmz (Contributor, Author) commented Oct 7, 2024

> I created some tests to see if I could reproduce it, but the tests pass even without consuming the entire reader with the snippet above.

I can get it to reproduce specifically with this file.

This issue isn't going anywhere, so probably something best looked at after a good night's rest.

=== RUN   TestHandleGitCatFileLargeBlob
    handlers_test.go:518: cat-file command still alive after 15 seconds
    handlers_test.go:522: 
        	Error Trace:	/home/gomez/dev/trufflehog/pkg/handlers/handlers_test.go:522
        	Error:      	Received unexpected error:
        	            	signal: broken pipe
        	Test:       	TestHandleGitCatFileLargeBlob
        	Messages:   	git cat-file command should complete without error
    handlers_test.go:530: 
        	Error Trace:	/home/gomez/dev/trufflehog/pkg/handlers/handlers_test.go:530
        	Error:      	Not equal: 
        	            	expected: 100
        	            	actual  : 0
        	Test:       	TestHandleGitCatFileLargeBlob
        	Messages:   	Number of chunks should match the expected value
--- FAIL: TestHandleGitCatFileLargeBlob (15.17s)

Updated test code:
func TestHandleGitCatFileLargeBlob(t *testing.T) {
	fileName := "_tokenizer.cpython-38-x86_64-linux-gnu.so"

	// Set up a temporary git repository with a large file
	gitDir := setupTempGitRepo(t, fileName)
	defer os.RemoveAll(gitDir)

	cmd := exec.Command("git", "-C", gitDir, "rev-parse", "HEAD")
	hashBytes, err := cmd.Output()
	assert.NoError(t, err, "Failed to get commit hash")
	commitHash := strings.TrimSpace(string(hashBytes))

	// Create a pipe to simulate the git cat-file stdout
	cmd = exec.Command("git", "-C", gitDir, "cat-file", "blob", fmt.Sprintf("%s:%s", commitHash, fileName))

	var stderr bytes.Buffer
	cmd.Stderr = &stderr

	stdout, err := cmd.StdoutPipe()
	assert.NoError(t, err, "Failed to create stdout pipe")

	err = cmd.Start()
	assert.NoError(t, err, "Failed to start git cat-file command")

	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	chunkCh := make(chan *sources.Chunk, 1000) // Adjust buffer size as needed

	go func() {
		defer close(chunkCh)
		err := HandleFile(ctx, stdout, &sources.Chunk{}, sources.ChanReporter{Ch: chunkCh}, WithSkipArchives(false))
		assert.NoError(t, err, "HandleFile should not return an error")
	}()

	time.AfterFunc(15*time.Second, func() {
		t.Errorf("cat-file command still alive after 15 seconds")
		stdout.Close()
	})
	err = cmd.Wait()
	assert.NoError(t, err, "git cat-file command should complete without error")

	count := 0
	for range chunkCh {
		count++
	}

	expectedChunks := 100 // This needs to be updated
	assert.Equal(t, expectedChunks, count, "Number of chunks should match the expected value")
}

func setupTempGitRepo(t *testing.T, archiveName string) string {
	tempDir := t.TempDir()

	// Initialize the Git repository
	cmd := exec.Command("git", "init", tempDir)
	var initStderr bytes.Buffer
	cmd.Stderr = &initStderr
	err := cmd.Run()
	if err != nil {
		t.Fatalf("Failed to initialize git repository: %v, stderr: %s", err, initStderr.String())
	}

	archivePath := filepath.Join(tempDir, archiveName)
	cmd = exec.Command("cp", "/tmp/_tokenizer.cpython-38-x86_64-linux-gnu.so", archivePath)
	var cpStderr bytes.Buffer
	cmd.Stderr = &cpStderr
	err = cmd.Run()
	if err != nil {
		t.Fatalf("Failed to copy file: %v, stderr: %s", err, cpStderr.String())
	}

	// Add the archive to the Git repository
	cmd = exec.Command("git", "-C", tempDir, "add", archiveName)
	var addStderr bytes.Buffer
	cmd.Stderr = &addStderr
	err = cmd.Run()
	if err != nil {
		t.Fatalf("Failed to add archive to git: %v, stderr: %s", err, addStderr.String())
	}

	cmd = exec.Command("git", "-C", tempDir, "commit", "-m", "Add archive file")
	var commitStderr bytes.Buffer
	cmd.Stderr = &commitStderr
	err = cmd.Run()
	if err != nil {
		t.Fatalf("Failed to commit archive to git: %v, stderr: %s", err, commitStderr.String())
	}

	return tempDir
}

@rgmz (Contributor, Author) commented Oct 7, 2024

> Edit: furthermore, changing executeCatFileCmd to simply write the files to disk causes it to run buttery smooth, as expected. No hanging.

> It seems like the issue is within the handler or bufferedreader code and that Git.handleBinary is just incidental.

> I believe Size() consumes the entire reader if it's not seekable. It holds the data in memory if it's small enough; otherwise, I think it writes it to disk.

I think this affirms that the issue is with NewBufferedReaderSeeker not consuming the entire output, which is why cmd.Wait() never completes.

> Wait waits for the command to exit and waits for any copying ... from stdout or stderr to complete.
> (https://pkg.go.dev/os/exec#Cmd.Wait)


Edit: if I add some debugging statements to bufferedreaderseeker.go, it looks like only 3072 bytes / 5360328 bytes are being read.

=== RUN   TestHandleGitCatFileLargeBlob
[BufferedReadSeeker#Read] Read '3072' bytes, err = <nil> <<<<<<<<<<<<<<<<<<<<<<<<
    handlers_test.go:518: cat-file command still alive after 15 seconds
    handlers_test.go:522: 
        	Error Trace:	/home/gomez/dev/trufflehog/pkg/handlers/handlers_test.go:522
        	Error:      	Received unexpected error:
        	            	signal: broken pipe
Change diff:
diff --git a/pkg/iobuf/bufferedreaderseeker.go b/pkg/iobuf/bufferedreaderseeker.go
index 47ba5119..f70361c9 100644
--- a/pkg/iobuf/bufferedreaderseeker.go
+++ b/pkg/iobuf/bufferedreaderseeker.go
@@ -141,6 +141,8 @@ func (br *BufferedReadSeeker) Read(out []byte) (int, error) {
 
        // If we've exceeded the in-memory threshold and have a temp file.
        if br.tempFile != nil && br.index < br.diskBufferSize {
+               fmt.Printf("[BufferedReadSeeker#Read] in-memory threshold exceeded, going to file '%s'\n", br.tempFile.Name())
+
                if _, err := br.tempFile.Seek(br.index-int64(br.buf.Len()), io.SeekStart); err != nil {
                        return totalBytesRead, err
                }
@@ -172,9 +174,11 @@ func (br *BufferedReadSeeker) Read(out []byte) (int, error) {
 
        if errors.Is(err, io.EOF) {
                br.totalSize = br.bytesRead
+               fmt.Printf("[BufferedReadSeeker#Read] Total size = %d\n", br.bytesRead)
                br.sizeKnown = true
        }
 
+       fmt.Printf("[BufferedReadSeeker#Read] Read '%d' bytes, err = %v\n", totalBytesRead, err)
        return totalBytesRead, err
 }

@rgmz (Contributor, Author) commented Oct 7, 2024

> I think this affirms that the issue is with NewBufferedReaderSeeker not consuming the entire output, which is why cmd.Wait() never completes.

I think I've figured it out: the file is skipped due to its extension/mimetype, therefore the reader is never read/drained.

if common.SkipFile(mimeExt) || common.IsBinary(mimeExt) {
	ctx.Logger().V(5).Info("skipping file", "ext", mimeExt)
	h.metrics.incFilesSkipped()
	return nil
}

I had to manually comment out the noisy "decoder not applicable for chunk" log to notice this line in the --trace output. Perhaps that one should be moved up from V(4) to V(5), and the "skipping file" log moved down from V(5) to V(3)?

[BufferedReadSeeker#Read] Read '3072' bytes, err = <nil>
[BufferedReadSeeker#Read] read position inside memory buffer matches out = '2'
[BufferedReadSeeker#Read] read position inside memory buffer matches out = '8'
[BufferedReadSeeker#Read] read position inside memory buffer matches out = '502'
>>>>>>>>>>>>>>>>>> 2024-10-07T09:04:53-04:00	info-5	trufflehog	skipping file	{"source_manager_worker_id": "Om7lm", "unit": "https://github.com/cohere-ai/tokenizer.git", "unit_kind": "repo", "repo": "https://github.com/cohere-ai/tokenizer.git", "mime": "application/x-sharedlib", "timeout": 60, "ext": ".so"}
2024-10-07T09:04:53-04:00	info-5	trufflehog	handler channel closed, all chunks processed	{"source_manager_worker_id": "Om7lm", "unit": "https://github.com/cohere-ai/tokenizer.git", "unit_kind": "repo", "repo": "https://github.com/cohere-ai/tokenizer.git", "mime": "application/x-sharedlib", "timeout": 60}

@rgmz (Contributor, Author) commented Oct 7, 2024

@ahrav what's the most efficient way to ensure the reader is flushed?

I would have thought it was handled by .Close():

defer rdr.Close()
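
One way Close() could give that guarantee (a hypothetical sketch continuing the bufferedReadSeeker stand-in from earlier; evidently not what the current code does):

// Hypothetical drain-on-close: consume any unread bytes so the child
// writing into the pipe can finish and cmd.Wait() can reap it.
func (br *bufferedReadSeeker) Close() error {
	if _, err := io.Copy(io.Discard, br.src); err != nil {
		return err
	}
	if c, ok := br.src.(io.Closer); ok {
		return c.Close()
	}
	return nil
}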

@ahrav (Collaborator) commented Oct 7, 2024

> specifically with this file

I was initially thinking something like:

	handler := selectHandler(mimeT, rdr.isGenericArchive)
	archiveChan, err := handler.HandleFile(processingCtx, rdr)
	if err != nil {
		// Ensure all data is read to prevent broken pipe
		_, _ = io.Copy(io.Discard, reader)
		if closeErr := rdr.Close(); closeErr != nil {
			ctx.Logger().Error(closeErr, "error closing reader after handler error")
		}
		return fmt.Errorf("error handling file: %w", err)
	}

	handleErr := handleChunks(processingCtx, archiveChan, chunkSkel, reporter)
	if handleErr != nil {
		// Ensure all data is read to prevent broken pipe
		_, _ = io.Copy(io.Discard, reader)
		if closeErr := rdr.Close(); closeErr != nil {
			ctx.Logger().Error(closeErr, "error closing reader after handleChunks error")
		}
		return fmt.Errorf("error processing chunks: %w", handleErr)
	}

But I'm not sure that handles all cases, given handleNonArchiveContent happens in a goroutine. We might need to add the io.Copy(io.Discard, reader) there as well.

if common.SkipFile(mimeExt) || common.IsBinary(mimeExt) {
    ctx.Logger().V(5).Info("skipping file", "ext", mimeExt)
    h.metrics.incFilesSkipped()
    // Make sure we consume the reader to avoid potentially blocking indefinitely.
    _, _ = io.Copy(io.Discard, reader)
    return nil
}

I'm working on a mechanism to propagate errors from the goroutines back to the caller, but I think using io.Copy should work.
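
One common shape for that propagation, sketched with golang.org/x/sync/errgroup (an assumption about the approach, not the eventual change; the function name is illustrative):

import (
	"io"

	"golang.org/x/sync/errgroup"
)

// handleWithPropagation shows goroutine errors reaching the caller
// instead of being dropped or only logged.
func handleWithPropagation(rdr io.Reader) error {
	var g errgroup.Group
	g.Go(func() error {
		// Stand-in for handleNonArchiveContent: do the work, return the
		// error, and make sure the reader is fully consumed either way.
		_, err := io.Copy(io.Discard, rdr)
		return err
	})
	return g.Wait() // first non-nil error from any goroutine
}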

@rgmz (Contributor, Author) commented Oct 7, 2024

That looks worthwhile, though it wouldn't fix this specific issue, as handleNonArchiveContent returns nil when a file is skipped. Unless your intention is to return a sentinel error for that?

Either way, I think this can be closed in favour of #3379 + your future change.

Edit: I misread the second example. That could work -- however, any future changes or new paths could cause a regression. I think an ideal solution would be to handle it at the top level (handlers.HandleFile).
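
For concreteness, the top-level guard could look like this (a hypothetical sketch against an assumed simplified signature; the real handlers.HandleFile takes more parameters):

// handleFileGuarded drains on every return path, so git cat-file can
// always finish writing and cmd.Wait() can reap it, regardless of which
// skip or error branch a handler takes.
func handleFileGuarded(reader io.Reader) (err error) {
	defer func() {
		if _, cerr := io.Copy(io.Discard, reader); cerr != nil && err == nil {
			err = cerr
		}
	}()
	// ... dispatch to archive / non-archive handlers as today ...
	return nil
}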

@rgmz closed this · Oct 7, 2024
@ahrav (Collaborator) commented Oct 7, 2024

> That looks worthwhile, though it wouldn't fix this specific issue, as handleNonArchiveContent returns nil when a file is skipped. Unless your intention is to return a sentinel error for that?

> Either way, I think this can be closed in favour of #3379 + your future change.

> Edit: I misread the second example. That could work -- however, any future changes or new paths could cause a regression. I think an ideal solution would be to handle it at the top level (handlers.HandleFile).

I agree. We either need to drain the reader or propagate the error. What do you think?

Ideally, I'd like a single place to handle error management, but that might not be practical.

handleNonArchiveContent seems central since all handlers eventually pass data to it. However, putting the logic in a specific implementation still doesn't feel ideal.

@rgmz deleted the fix/git-handlebinary branch · October 7, 2024 18:58
@rgmz (Contributor, Author) commented Oct 7, 2024

> handleNonArchiveContent seems central since all handlers eventually pass data to it. However, putting the logic in a specific implementation still doesn't feel ideal.

After looking through the code, it seems like the best place — for now. Adding test cases for top-level and ignored nested files for each handler should help mitigate anything being missed.

@dustin-decker (Contributor) commented
I've updated the smoke test in #3379 to include the problematic file in a test fixture repo.

@ahrav do you want to update that PR with the fix required?

@rgmz mentioned this pull request · Oct 7, 2024