When using code search, searching for the full or partial filename of binary files (e.g., my document.pdf, image.png, my document) yields no results.
Steps to Reproduce:
Add a binary file (e.g., test report.pdf) to a repository.
Wait for indexing.
Search for test report.pdf or test report.
Actual Behavior:
No search results are returned for the binary file based on its filename.
Expected Behavior:
The search should return the binary file test report.pdf.
Observations:
Searching for just the extension (e.g., pdf) does return all PDF files, including test report.pdf.
Searching for code files (e.g., README.md) works correctly.
Binary file content is correctly not indexed; the issue is specific to matching the filename string.
Possible Cause:
This might stem from the interaction between the filenameIndexerAnalyzer (using unicode tokenizer + path filter) and the bleve.NewPrefixQuery used for filename searching in modules/indexer/code/bleve/bleve.go. The tokenization of filenames containing spaces/dots combined with a prefix query seems to prevent finding these files by their complete name.
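To make the suspected mechanism concrete, here is a minimal sketch in plain Go. It does not use bleve and does not reproduce filenameIndexerAnalyzer or the path filter; `tokenize` and `matchesPrefix` are hypothetical helpers, and splitting on whitespace only is an assumption chosen to illustrate the effect, not a claim about what the real analyzer emits.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a hypothetical stand-in for the filename analyzer. It only
// splits on whitespace, which is enough to show the suspected effect.
func tokenize(filename string) []string {
	return strings.Fields(filename)
}

// matchesPrefix models a prefix query: the query is compared, as one
// untokenized string, against each indexed term.
func matchesPrefix(terms []string, query string) bool {
	for _, term := range terms {
		if strings.HasPrefix(term, query) {
			return true
		}
	}
	return false
}

func main() {
	// Under the whitespace assumption this yields "test" and "report.pdf".
	terms := tokenize("test report.pdf")

	for _, q := range []string{"test report.pdf", "test report", "test", "report"} {
		fmt.Printf("%q -> %v\n", q, matchesPrefix(terms, q))
	}
}
```

In this model, any query that contains a space can never match, because no indexed term contains a space. A filename without spaces, such as README.md, stays a single term and is still found by its full name.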
Description
Related to: #33828
Relevant Code:
modules/indexer/code/bleve/bleve.go (Filename query logic & analyzer definition)
modules/indexer/code/bleve/token/path/path.go (Path token filter logic)
Gitea Version
1.24.0rc0
Can you reproduce the bug on the Gitea demo site?
Yes
Log Gist
No response
Screenshots
No response
Git Version
No response
Operating System
No response
How are you running Gitea?
binary from windows
Database
SQLite