Bleve Code Search: Cannot find binary files by full filename #34332

wbste opened this issue May 1, 2025 · 0 comments
wbste commented May 1, 2025

Description

Related to: #33828

When using code search, searching for the full or partial filename of binary files (e.g., my document.pdf, image.png, my document) yields no results.

Steps to Reproduce:

  1. Add a binary file (e.g., test report.pdf) to a repository.
  2. Wait for indexing.
  3. Search for test report.pdf or test report.

Actual Behavior:
No search results are returned for the binary file based on its filename.

Expected Behavior:
The search should return the binary file test report.pdf.

Observations:

  • Searching for just the extension (e.g., pdf) does return all PDF files, including test report.pdf.
  • Searching for code files (e.g., README.md) works correctly.
  • Binary file content is, correctly, not indexed; the issue is specific to matching the filename string.

Possible Cause:
This might stem from the interaction between the filenameIndexerAnalyzer (using unicode tokenizer + path filter) and the bleve.NewPrefixQuery used for filename searching in modules/indexer/code/bleve/bleve.go. The tokenization of filenames containing spaces/dots combined with a prefix query seems to prevent finding these files by their complete name.
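
To make the suspected mechanism concrete, here is a minimal, stdlib-only sketch. It is a toy model, not Gitea's or bleve's code: `tokenize` and `prefixMatch` are hypothetical helpers, and the tokenizer simply assumes (as the hypothesis above does) that spaces and dots separate tokens. It leaves out the path filter entirely, so it is not expected to reproduce every observation listed above. It only shows why a prefix string that spans several tokens cannot match any single indexed term.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a toy stand-in for the filename analyzer: it lowercases the
// name and splits it on spaces and dots. This is an assumption made for
// illustration, not the behavior of bleve's unicode tokenizer or of
// Gitea's path token filter.
func tokenize(name string) []string {
	return strings.FieldsFunc(strings.ToLower(name), func(r rune) bool {
		return r == ' ' || r == '.'
	})
}

// prefixMatch mimics a prefix-style query under this model: the query
// string is compared as a whole (it is not tokenized) against each term.
func prefixMatch(terms []string, query string) bool {
	q := strings.ToLower(query)
	for _, t := range terms {
		if strings.HasPrefix(t, q) {
			return true
		}
	}
	return false
}

func main() {
	// Under the toy tokenizer, the indexed terms are: test, report, pdf.
	terms := tokenize("test report.pdf")
	for _, q := range []string{"pdf", "test", "test report", "test report.pdf"} {
		fmt.Printf("%q -> %v\n", q, prefixMatch(terms, q))
	}
}
```

In this model, single-token queries such as pdf match, while test report and test report.pdf do not, because no individual term starts with a string containing a space or dot. If the real analyzer behaves similarly, analyzing the query text before matching might be one direction to investigate.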

Relevant Code:

  • modules/indexer/code/bleve/bleve.go (Filename query logic & analyzer definition)
  • modules/indexer/code/bleve/token/path/path.go (Path token filter logic)

Gitea Version

1.24.0rc0

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

binary from windows

Database

SQLite

@wbste wbste added the type/bug label May 1, 2025