Search for issues/pulls #530
I realize this PR is a lot to review, but I don't see any good ways to split it into smaller parts. If anyone has suggestions on how to make it more digestible, let me know.
}
sess = x.Limit(setting.UI.IssuePagingNum, start)
} else {
sess = x.NewSession()
defer sess.Close()
Added
search.SortBy([]string{"-CreatedUnix"})
}

index, err := bleve.Open(setting.IndexPath)
Could one index file be opened twice? Maybe the index service should be a separate goroutine that receives add/remove/search index commands and returns results.
I've refactored it so that the indexer is accessed only once.
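A minimal sketch of the "separate goroutine" idea from the review comment above, assuming a request channel and an in-memory map in place of a real bleve index. The names `indexRequest` and `runIndexService` are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// indexRequest is a hypothetical command for the goroutine that owns the index.
type indexRequest struct {
	op    string       // "add", "remove", or "search"
	id    int64        // issue ID (add/remove)
	text  string       // issue text (add) or keyword (search)
	reply chan []int64 // receives matching IDs (search only)
}

// runIndexService is the only code that touches the index, so the index is
// never opened twice. The map is a stand-in for a real bleve index.
func runIndexService(requests <-chan indexRequest) {
	index := map[int64]string{}
	for req := range requests {
		switch req.op {
		case "add":
			index[req.id] = strings.ToLower(req.text)
		case "remove":
			delete(index, req.id)
		case "search":
			ids := []int64{}
			keyword := strings.ToLower(req.text)
			for id, text := range index {
				if strings.Contains(text, keyword) {
					ids = append(ids, id)
				}
			}
			// Map iteration order is random, so sort for stable results.
			sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
			req.reply <- ids
		}
	}
}

func main() {
	requests := make(chan indexRequest)
	go runIndexService(requests)

	requests <- indexRequest{op: "add", id: 1, text: "Search is slow"}
	requests <- indexRequest{op: "add", id: 2, text: "Crash on login"}

	reply := make(chan []int64)
	requests <- indexRequest{op: "search", text: "search", reply: reply}
	fmt.Println(<-reply) // prints [1]
	close(requests)
}
```

Because the channel is unbuffered and a single goroutine drains it, requests are handled in the order they are sent, so a search issued after an add sees that add.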
}

search.Fields = []string{
"ID",
The index only needs to store the indexed and ordering columns, not all of the data. Could we then load the other data from the database?
I'm a little confused: in #379 (comment) you said "If we use bleve, SQL query will be no need". I thought that meant that all of the issue columns should be in the indexer. @tboerger's comment (#387 (comment)) also says to add all of the columns to the indexer.
I personally prefer the approach of only storing the title and content in the indexer, and filtering by and loading other columns from the database, and would be happy to change my PR to use this approach. I just want to make sure that I'm understanding you before moving forward.
I'm sorry if I confused you. I think we should store only the keyword and ID, plus the columns used for ordering and sorting, and then load the data from the database via `In` or `Or`.
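A rough sketch of the "load by ID" step described above, assuming the indexer has already returned the matching issue IDs. The helper `buildIssueQuery`, the table name `issue`, and the column names are illustrative assumptions, not the code in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// buildIssueQuery turns the IDs returned by the indexer into a parameterized
// SQL statement. Table and column names are assumptions for illustration.
func buildIssueQuery(ids []int64) (string, []interface{}) {
	if len(ids) == 0 {
		return "", nil // nothing matched, so there is nothing to load
	}
	// One "?" placeholder per ID, e.g. "?,?,?".
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(ids)), ",")
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	query := "SELECT * FROM issue WHERE id IN (" + placeholders + ") ORDER BY created_unix DESC"
	return query, args
}

func main() {
	query, args := buildIssueQuery([]int64{3, 7, 9})
	fmt.Println(query)
	fmt.Println(args...)
}
```

With this split, the index stays small (keyword fields and IDs only), and filtering, ordering, and paging remain the database's job.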
I've tested it with several large issues (between 1 and 5 MB of text), and the search feature is very fast.
@Bwko That's good to hear! There are a few minor things that I need to clean up, and then it should be ready.
return err
}
}

return sess.Commit()
return sess.Commit();
;
if err != nil {
	return nil, err
}
result, err := index.Search(search)
`defer index.Close()`?
Similar to what I've said in the other comment, I'd like to check whether `index.Close()` returns an error. I've refactored the code to make it a little clearer that `index.Close()` always gets called.
return err
}
}
indexerUpdateQueue = make(chan *Issue, 10)
Make `QueueSize` a config option in `[indexer]`, with a default of 10.
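One way the configurable queue size could look, as a sketch only. The constant `defaultQueueLength`, the helper `newIndexerUpdateQueue`, and the trimmed-down `Issue` type are hypothetical; how the value is actually read from the `[indexer]` section is not shown here.

```go
package main

import "fmt"

// defaultQueueLength mirrors the default of 10 suggested in the review.
const defaultQueueLength = 10

// Issue is a minimal stand-in for the real model type.
type Issue struct {
	ID int64
}

// newIndexerUpdateQueue builds the update queue, falling back to the default
// when the configured value is missing or not positive.
func newIndexerUpdateQueue(configured int) chan *Issue {
	if configured <= 0 {
		configured = defaultQueueLength
	}
	return make(chan *Issue, configured)
}

func main() {
	fmt.Println(cap(newIndexerUpdateQueue(0)))  // prints 10
	fmt.Println(cap(newIndexerUpdateQueue(50))) // prints 50
}
```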
for {
	select {
	case issue := <-indexerUpdateQueue:
		index, err := bleve.Open(setting.IssueIndexerPath)
Why not always open it at the beginning of the function and defer the `Close`?
Only one thread can open an indexer at a time. If the update thread always held the indexer open, no other threads could ever access the indexer. I'll add a comment.
// PopulateIssuesIndexer write all issues in the database to the issues indexer
func PopulateIssuesIndexer() error {
	index, err := bleve.Open(setting.IssueIndexerPath)
No close?
`index.Close()` will always be called; every return statement is preceded by a call to `index.Close()`. I decided not to use `defer index.Close()` since `index.Close()` returns an error, and I wanted to be able to check if closing the index was successful.
If you want, I can change it to use `defer index.Close()`.
Actually, I think the best solution is to break `PopulateIssuesIndexer` into two functions: an "outer" function `PopulateIssuesIndexer` which manages the opening and closing of the indexer, and an "inner" function `populateIssuesIndexer` which is passed an indexer, and adds issues to it.
This will allow us to check if an error is returned by `index.Close()`, while also easily ensuring that every code path will call `index.Close()`.
return err
}
go func() {
	if err := PopulateIssuesIndexer(); err != nil {
I'd prefer to remove the `go` keyword so that this is called when Gitea starts. If there is an error, it should stop and print the error message on the console.
@lunny I believe I have addressed all of your requests.
search := bleve.NewSearchRequestOptions(indexerQuery, 2147483647, 0, false)
search.Fields = []string{"ID"}

index, err := bleve.Open(setting.Indexer.IssuePath)
It seems we have to keep a single index open for the whole lifetime of the program? It should be created or opened at startup and closed before Gitea shuts down.
@lunny Fixed 😄
Also, bleve's default analysis doesn't support CJK languages; we could support them in step 2.
@lunny Is there an upstream issue for that? Otherwise I suggest making one 😉
Not an upstream problem. We need a custom CJK plugin for bleve to do that.
@lunny an indexer in 2017 that doesn't handle CJK isn't a very good indexer and should be fixed 😆
@bkcsoft, but I think that will be a big PR, so it could be split into two PRs.
LGTM
LGTM
This should be rebased down to a single commit before getting merged in, right?
Yeah it would be nice if you can squash to a single commit (with
let L-G-T-M work.
Improvements to make the search feature actually useful.
// SearchIssuesByKeyword searches for issues by given conditions.
// Returns the matching issue IDs
func SearchIssuesByKeyword(repoID int64, keyword string) ([]int64, error) {
	fields := strings.Fields(strings.ToLower(keyword))
"fields" should actually be "terms"; the API is `NewPhraseQuery(terms []string, field string)`.
The variable name has since been updated to `terms` (#1031).
indexerQuery := bleve.NewConjunctionQuery(
	numericQuery(repoID, "RepoID"),
	bleve.NewDisjunctionQuery(
		bleve.NewPhraseQuery(fields, "Title"),
We should have at least a Match Phrase Query, if not a Match Query or, best of all, a Query String Query, for both the "Title" and "Content" fields.
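To illustrate why the query type matters, here is a toy model of the two matching styles, not bleve's implementation. It assumes a phrase-style match requires the terms to appear consecutively and in order, while a looser term match accepts any document containing at least one term. The helpers `tokenize`, `matchesPhrase`, and `matchesAnyTerm` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize lower-cases the text and splits it on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// matchesPhrase reports whether terms occur consecutively, in order, in doc.
func matchesPhrase(doc, terms []string) bool {
	if len(terms) == 0 {
		return false
	}
	for start := 0; start+len(terms) <= len(doc); start++ {
		found := true
		for i, term := range terms {
			if doc[start+i] != term {
				found = false
				break
			}
		}
		if found {
			return true
		}
	}
	return false
}

// matchesAnyTerm reports whether at least one term occurs anywhere in doc.
func matchesAnyTerm(doc, terms []string) bool {
	for _, term := range terms {
		for _, word := range doc {
			if word == term {
				return true
			}
		}
	}
	return false
}

func main() {
	title := tokenize("Search for issues by keyword")
	query := tokenize("keyword search")

	fmt.Println(matchesPhrase(title, query))  // false: the words are not adjacent in this order
	fmt.Println(matchesAnyTerm(title, query)) // true: both words occur somewhere in the title
}
```

Under these assumptions, a user typing the words of a title in a different order gets no hits from the phrase-style match, which is the usability concern raised in the review.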
bleve.NewDisjunctionQuery(
	bleve.NewPhraseQuery(fields, "Title"),
	bleve.NewPhraseQuery(fields, "Content"),
))
Comments should be indexed, too, and queried here.
textFieldMapping.Analyzer = simple.Name
docMapping.AddFieldMappingsAt("Title", textFieldMapping)
docMapping.AddFieldMappingsAt("Content", textFieldMapping)
Issue comments should also be indexed here.
textFieldMapping.Analyzer = simple.Name
docMapping.AddFieldMappingsAt("Title", textFieldMapping)
docMapping.AddFieldMappingsAt("Content", textFieldMapping)
More token filters should be added (http://www.blevesearch.com/docs/Token-Filters/), to allow partial matches. My suggestion:
- Short n-grams to allow partial matches (an issue tracker won't work without partial matching)
- Strong Unicode normalizing (because of diacritical letters)
- Multi-lingual stemming
@sha-red exactly. Would you have time to send some PRs to help improve it?
@lunny Basically, I'd love to help with actual code, but because I'd have to set up a whole Go development environment and have lots of other work to do, it will take some time to get everything running.
Please also have a look at our minimalistic issue search patch for gogs: gogs/gogs#4015
IMHO whole phrase search is better than nothing, but at least comment indexing should be added before releasing.
We can improve the search in v1.2; comment indexing should be added in that release cycle, and of course the UI should be changed to highlight keywords.
Issue #379. This PR is the same as the now-closed #387; I had to reissue the PR for logistical reasons.
Uses a bleve indexer to search issues/pulls by keyword.