Search for issues/pulls #530

Merged
merged 1 commit into from
Jan 25, 2017

Conversation

ethantkoenig
Copy link
Member

@ethantkoenig ethantkoenig commented Dec 29, 2016

Issue #379. This PR is the same as the now-closed #387; I had to reissue the PR for logistical reasons.

Uses a bleve indexer to search issues/pulls by keyword.

[Screenshot from 2017-01-12 14-12-03]

@lunny lunny added this to the 1.1.0 milestone Dec 29, 2016
@andreynering andreynering added the type/feature and pr/wip labels Dec 29, 2016
@ethantkoenig ethantkoenig force-pushed the search_bar branch 4 times, most recently from ce901cd to a7ea326 on January 5, 2017 18:46
@ethantkoenig ethantkoenig force-pushed the search_bar branch 3 times, most recently from f22128e to 39f0a4a on January 9, 2017 21:30
@ethantkoenig
Member Author

I realize this PR is a lot to review, but I don't see any good ways to split it into smaller parts. If anyone has suggestions on how to make it more digestible, let me know.

@tboerger tboerger added the lgtm/need 2 label Jan 10, 2017
}
sess = x.Limit(setting.UI.IssuePagingNum, start)
} else {
sess = x.NewSession()
Member

defer sess.Close()

Member Author

Added

search.SortBy([]string{"-CreatedUnix"})
}

index, err := bleve.Open(setting.IndexPath)
Member

Could one index file end up being opened twice? Maybe the index service should be a separate goroutine that receives add/remove/search commands and returns the results.

Member Author

I've refactored it so that the indexer is accessed only once.
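For comparison, the reviewer's suggestion of one goroutine that owns the index and receives add/remove/search commands could look roughly like the sketch below. It is not the code that was merged: the `request` type, `runIndexService` and `search` are hypothetical names, and an in-memory map with substring matching stands in for the bleve index.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// request is a hypothetical command sent to the goroutine that owns the index.
type request struct {
	op    string // "add", "remove" or "search"
	id    int64
	text  string       // document text for "add", keyword for "search"
	reply chan []int64 // set only for "search"
}

// runIndexService is the only code that touches the index (an in-memory map
// standing in for a bleve index), so the index is never opened twice.
func runIndexService(reqs <-chan request) {
	docs := make(map[int64]string)
	for req := range reqs {
		switch req.op {
		case "add":
			docs[req.id] = strings.ToLower(req.text)
		case "remove":
			delete(docs, req.id)
		case "search":
			var ids []int64
			for id, text := range docs {
				if strings.Contains(text, strings.ToLower(req.text)) {
					ids = append(ids, id)
				}
			}
			sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
			req.reply <- ids
		}
	}
}

// search sends a search command and waits for the matching IDs.
func search(reqs chan<- request, keyword string) []int64 {
	reply := make(chan []int64)
	reqs <- request{op: "search", text: keyword, reply: reply}
	return <-reply
}

func main() {
	reqs := make(chan request)
	go runIndexService(reqs)
	reqs <- request{op: "add", id: 1, text: "Search bar is broken"}
	reqs <- request{op: "add", id: 2, text: "Login fails"}
	fmt.Println(search(reqs, "search")) // [1]
	reqs <- request{op: "remove", id: 1}
	fmt.Println(search(reqs, "search")) // []
	close(reqs)
}
```

Because the channel is unbuffered, each command is fully handled before the next one is received, so callers never need their own locking.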

}

search.Fields = []string{
"ID",
Member

It only needs to store the indexed and sort-order columns; don't store all of the data. Then we can load the other data from the database?

Member Author

I'm a little confused: in #379 (comment) you said "If we use bleve, SQL query will be no need". I thought that meant that all of the issue columns should be in the indexer. @tboerger's comment (#387 (comment)) also says to add all of the columns to the indexer.

I personally prefer the approach of only storing the title and content in the indexer, and filtering by and loading other columns from the database, and would be happy to change my PR to use this approach. I just want to make sure that I'm understanding you before moving forward.

Member

Sorry if I confused you. I think we should store only the keywords, the ID, and the ordering/sorting columns, and then load the data from the database via IN or OR.
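A sketch of the "IDs from the indexer, rows from the database" approach, assuming the indexer returns only the matching issue IDs. `buildIssueQuery` and the `issue`/`created_unix` table and column names are illustrative; in Gitea itself the ORM (xorm) provides an `In` condition that would produce equivalent SQL.

```go
package main

import (
	"fmt"
	"strings"
)

// buildIssueQuery turns the IDs returned by the indexer into one parameterized
// SELECT. The table and column names are hypothetical.
func buildIssueQuery(ids []int64) (string, []interface{}) {
	if len(ids) == 0 {
		return "", nil // nothing matched: skip the database round-trip
	}
	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	// IN does not preserve the indexer's order, so sort again in SQL.
	query := "SELECT * FROM issue WHERE id IN (" +
		strings.Join(placeholders, ", ") +
		") ORDER BY created_unix DESC"
	return query, args
}

func main() {
	query, args := buildIssueQuery([]int64{3, 7})
	fmt.Println(query) // SELECT * FROM issue WHERE id IN (?, ?) ORDER BY created_unix DESC
	fmt.Println(args)  // [3 7]
}
```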

@Bwko
Member

Bwko commented Jan 12, 2017

I've tested it with several large issues (between 1 and 5 MB of text), and the search feature is very fast.
Is it still a WIP?

@ethantkoenig
Member Author

@Bwko That's good to hear! There are a few minor things that I need to clean up, and then it should be ready.

@ethantkoenig ethantkoenig changed the title [WIP] Search for issues/pulls Search for issues/pulls Jan 12, 2017
return err
}
}

return sess.Commit()
return sess.Commit();
Member

;

if err != nil {
return nil, err
}
result, err := index.Search(search)
Member

defer index.Close()?

Member Author

Similar to what I've said in another comment, I'd like to check whether index.Close() returns an error. I've refactored the code to make it clearer that index.Close() always gets called.

return err
}
}
indexerUpdateQueue = make(chan *Issue, 10)
Member

Make QueueSize configurable in the [indexer] section, with a default of 10.
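A sketch of the requested setting, assuming a hypothetical `QUEUE_SIZE` key in the `[indexer]` section (the key name is not from this PR) and passing the section as a plain map instead of going through Gitea's settings loader:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Issue is a placeholder for models.Issue.
type Issue struct{ ID int64 }

// defaultQueueSize matches the hard-coded buffer size in this PR.
const defaultQueueSize = 10

// queueSize reads a hypothetical QUEUE_SIZE key from the [indexer] section
// (given here as a plain map) and falls back to the default when the key is
// missing, malformed, or not positive.
func queueSize(indexer map[string]string) int {
	raw, ok := indexer["QUEUE_SIZE"]
	if !ok {
		return defaultQueueSize
	}
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil || n <= 0 {
		return defaultQueueSize
	}
	return n
}

func main() {
	// Equivalent to an app.ini containing:
	//   [indexer]
	//   QUEUE_SIZE = 50
	cfg := map[string]string{"QUEUE_SIZE": "50"}
	indexerUpdateQueue := make(chan *Issue, queueSize(cfg))
	fmt.Println(cap(indexerUpdateQueue)) // 50
}
```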

for {
select {
case issue := <-indexerUpdateQueue:
index, err := bleve.Open(setting.IssueIndexerPath)
Member

Why not always open it at the beginning of the function, and defer closing it?

Member Author

Only one thread can open an indexer at a time. If the update thread always held the indexer open, no other thread could ever access it. I'll add a comment.
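A sketch of the update loop described here, in which the index is opened and closed around each single update so other code can open it in between (later in this PR the design moved to a single long-lived index). `issueIndex` mirrors the `Index` and `Close` methods of `bleve.Index`; `updateOne`, `runUpdater` and `memIndex` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// Issue is a placeholder for models.Issue.
type Issue struct {
	ID    int64
	Title string
}

// issueIndex is the subset of bleve.Index used here.
type issueIndex interface {
	Index(id string, data interface{}) error
	Close() error
}

// updateOne opens the index, indexes one issue and closes it again, so the
// index is held only for the duration of a single update.
func updateOne(issue *Issue, open func() (issueIndex, error)) error {
	index, err := open()
	if err != nil {
		return err
	}
	if err := index.Index(strconv.FormatInt(issue.ID, 10), issue); err != nil {
		_ = index.Close() // the indexing error is the more useful one to report
		return err
	}
	return index.Close()
}

// runUpdater drains the queue until it is closed, reporting errors via logErr.
func runUpdater(queue <-chan *Issue, open func() (issueIndex, error), logErr func(error)) {
	for issue := range queue {
		if err := updateOne(issue, open); err != nil {
			logErr(err)
		}
	}
}

// memIndex is an in-memory stand-in that counts opens and closes.
type memIndex struct {
	docs          map[string]interface{}
	opens, closes int
}

func (m *memIndex) Index(id string, data interface{}) error { m.docs[id] = data; return nil }
func (m *memIndex) Close() error                           { m.closes++; return nil }

func main() {
	mem := &memIndex{docs: map[string]interface{}{}}
	open := func() (issueIndex, error) { mem.opens++; return mem, nil }

	queue := make(chan *Issue, 3)
	for i := int64(1); i <= 3; i++ {
		queue <- &Issue{ID: i, Title: "issue " + strconv.FormatInt(i, 10)}
	}
	close(queue)

	runUpdater(queue, open, func(err error) { fmt.Println("index error:", err) })
	fmt.Println(len(mem.docs), mem.opens, mem.closes) // 3 3 3
}
```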


// PopulateIssuesIndexer writes all issues in the database to the issues indexer
func PopulateIssuesIndexer() error {
index, err := bleve.Open(setting.IssueIndexerPath)
Member

No close?

Member Author

index.Close() will always be called; every return statement is preceded by a call to index.Close(). I decided not to use defer index.Close() since index.Close() returns an error, and I wanted to be able to check if closing the index was successful.

If you want, I can change it to use defer index.Close().

Member Author

Actually, I think the best solution is to break PopulateIssuesIndexer into two functions: an "outer" function PopulateIssuesIndexer which manages the opening and closing of the indexer, and an "inner" function populateIssuesIndexer which is passed an indexer, and adds issues to it.

This will allow us to check if an error is returned by index.Close(), while also easily ensuring that every code path will call index.Close().
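A sketch of the proposed outer/inner split. The signatures are simplified assumptions (the real `PopulateIssuesIndexer` takes no arguments and reads issues from the database), `issueIndex` mirrors the `Index` and `Close` methods of `bleve.Index`, and `memIndex` is an in-memory stand-in:

```go
package main

import (
	"fmt"
	"strconv"
)

// issue is a placeholder for the issue rows read from the database.
type issue struct {
	ID    int64
	Title string
}

// issueIndex is the subset of bleve.Index used here.
type issueIndex interface {
	Index(id string, data interface{}) error
	Close() error
}

// populateIssuesIndexer is the "inner" function: it only adds issues to an
// index it was given and never opens or closes it.
func populateIssuesIndexer(index issueIndex, issues []issue) error {
	for _, iss := range issues {
		if err := index.Index(strconv.FormatInt(iss.ID, 10), iss); err != nil {
			return err
		}
	}
	return nil
}

// PopulateIssuesIndexer is the "outer" function: it owns opening and closing,
// so every path closes the index and the Close error can still be returned.
// Taking open and issues as parameters is a simplification for this sketch.
func PopulateIssuesIndexer(open func() (issueIndex, error), issues []issue) error {
	index, err := open()
	if err != nil {
		return err
	}
	if err := populateIssuesIndexer(index, issues); err != nil {
		_ = index.Close() // report the populate error, but still close
		return err
	}
	return index.Close()
}

// memIndex is an in-memory stand-in; failOn simulates an indexing error.
type memIndex struct {
	docs   map[string]interface{}
	failOn string
	closed bool
}

func (m *memIndex) Index(id string, data interface{}) error {
	if id == m.failOn {
		return fmt.Errorf("cannot index issue %s", id)
	}
	m.docs[id] = data
	return nil
}

func (m *memIndex) Close() error { m.closed = true; return nil }

func main() {
	mem := &memIndex{docs: map[string]interface{}{}, failOn: "2"}
	err := PopulateIssuesIndexer(func() (issueIndex, error) { return mem, nil },
		[]issue{{ID: 1, Title: "a"}, {ID: 2, Title: "b"}})
	fmt.Println(err, mem.closed) // cannot index issue 2 true
}
```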

return err
}
go func() {
if err := PopulateIssuesIndexer(); err != nil {
Member

I'd prefer to remove the go keyword so that this is called synchronously when Gitea starts. If there is an error, Gitea should stop and print the error message to the console.

@ethantkoenig
Member Author

@lunny I believe I have addressed all of your requests.

search := bleve.NewSearchRequestOptions(indexerQuery, 2147483647, 0, false)
search.Fields = []string{"ID"}

index, err := bleve.Open(setting.Indexer.IssuePath)
Member

It seems we should keep only one index open for the whole lifetime of the program? It should be created or opened at startup and closed before Gitea shuts down.

Member Author

@lunny Fixed 😄

@lunny
Member

lunny commented Jan 14, 2017

Also, bleve's default analysis doesn't support CJK languages; we could add support for them in a second step.

@lunny lunny mentioned this pull request Jan 17, 2017
@bkcsoft
Member

bkcsoft commented Jan 20, 2017

don't support CJK languages

@lunny is there an upstream issue for that? otherwise I suggest making one 😉

@lunny
Member

lunny commented Jan 20, 2017

It's not an upstream problem. We need a custom CJK plugin for bleve to do that.

@bkcsoft
Member

bkcsoft commented Jan 21, 2017

@lunny an indexer in 2017 that doesn't handle CJK isn't a very good indexer and should be fixed 😆

@lunny
Member

lunny commented Jan 21, 2017

@bkcsoft, but I think that will be a big PR, so it could be split into two PRs.

@lunny
Member

lunny commented Jan 22, 2017

LGTM

@tboerger tboerger added the lgtm/need 1 label and removed the lgtm/need 2 label Jan 22, 2017
@lunny lunny removed the pr/wip label Jan 22, 2017
@lunny lunny mentioned this pull request Jan 24, 2017
@thibaultmeyer
Contributor

thibaultmeyer commented Jan 24, 2017

LGTM

@tboerger tboerger added the lgtm/done label and removed the lgtm/need 1 label Jan 24, 2017
@ethantkoenig
Member Author

This should be rebased down to a single commit before getting merged in, right?

@thibaultmeyer
Contributor

Yeah, it would be nice if you could squash it into a single commit (with git reset HEAD~n, for example).

@lunny
Member

lunny commented Jan 25, 2017

let L-G-T-M work.

@lunny lunny merged commit 833f8b9 into go-gitea:master Jan 25, 2017
@ethantkoenig ethantkoenig deleted the search_bar branch January 25, 2017 03:47
lunny added a commit to lunny/gitea that referenced this pull request Jan 25, 2017
@lunny lunny mentioned this pull request Jan 25, 2017
lunny added a commit that referenced this pull request Jan 25, 2017
@Bwko Bwko mentioned this pull request Jan 26, 2017
@sha-red sha-red left a comment

Improvements to make the search feature actually useful.

// SearchIssuesByKeyword searches for issues by given conditions.
// Returns the matching issue IDs
func SearchIssuesByKeyword(repoID int64, keyword string) ([]int64, error) {
fields := strings.Fields(strings.ToLower(keyword))

"fields" should actually be "terms", API: NewPhraseQuery(terms []string, field string)

Member Author

@ethantkoenig ethantkoenig Feb 24, 2017

The variable name has since been updated to terms (#1031)
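The renamed variable is simply the lowercased keyword split on whitespace. Lowercasing matters because, as far as I understand bleve, phrase-query terms are not run through an analyzer, so they have to match the lowercase tokens that the `simple` analyzer stored. `keywordTerms` is a hypothetical helper mirroring that first line:

```go
package main

import (
	"fmt"
	"strings"
)

// keywordTerms mirrors the first line of SearchIssuesByKeyword: lowercase the
// keyword, then split it on whitespace into the terms passed to NewPhraseQuery.
func keywordTerms(keyword string) []string {
	return strings.Fields(strings.ToLower(keyword))
}

func main() {
	fmt.Printf("%q\n", keywordTerms("  Search BAR\tbroken ")) // ["search" "bar" "broken"]
}
```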

indexerQuery := bleve.NewConjunctionQuery(
numericQuery(repoID, "RepoID"),
bleve.NewDisjunctionQuery(
bleve.NewPhraseQuery(fields, "Title"),

At the very least we should use a Match Phrase Query; better would be a Match Query, and best a Query String Query.

For both fields "Title" and "Content".

bleve.NewDisjunctionQuery(
bleve.NewPhraseQuery(fields, "Title"),
bleve.NewPhraseQuery(fields, "Content"),
))

Comments should be indexed, too, and queried here.

textFieldMapping.Analyzer = simple.Name
docMapping.AddFieldMappingsAt("Title", textFieldMapping)
docMapping.AddFieldMappingsAt("Content", textFieldMapping)

Issue comments should also be indexed here.
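One possible document shape for indexing comments, sketched under assumptions: the `Comments` field of `issueIndexerData`, the `comment` type and `buildIndexerData` are hypothetical. The mapping would then also need `docMapping.AddFieldMappingsAt("Comments", textFieldMapping)`, the query would need a third `bleve.NewPhraseQuery(fields, "Comments")` disjunct, and adding or editing a comment would require re-indexing its issue.

```go
package main

import "fmt"

// comment is a placeholder for an issue comment row.
type comment struct {
	IssueID int64
	Body    string
}

// issueIndexerData is a hypothetical indexed document: Title and Content as in
// this PR, plus a Comments field holding every comment body of the issue.
type issueIndexerData struct {
	ID       int64
	RepoID   int64
	Title    string
	Content  string
	Comments []string
}

// buildIndexerData collects the comments that belong to the given issue.
func buildIndexerData(id, repoID int64, title, content string, comments []comment) issueIndexerData {
	data := issueIndexerData{ID: id, RepoID: repoID, Title: title, Content: content}
	for _, c := range comments {
		if c.IssueID == id {
			data.Comments = append(data.Comments, c.Body)
		}
	}
	return data
}

func main() {
	comments := []comment{{1, "Reproduced on master"}, {2, "Unrelated"}, {1, "Fixed by #530"}}
	data := buildIndexerData(1, 42, "Search bar", "Add issue search", comments)
	fmt.Println(len(data.Comments)) // 2
}
```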

textFieldMapping.Analyzer = simple.Name
docMapping.AddFieldMappingsAt("Title", textFieldMapping)
docMapping.AddFieldMappingsAt("Content", textFieldMapping)

More token filters should be added (http://www.blevesearch.com/docs/Token-Filters/) to allow partial matches. My suggestion:

  • Short n-grams to allow partial matches (an issue tracker won't work without partial matching)
  • Strong Unicode normalizing (because of diacritical letters)
  • Multi-lingual stemming
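To illustrate the first suggestion, here is what edge (prefix) n-grams of a token look like; indexing them lets a query such as `iss` match `issue`. This is a hypothetical stand-alone function, not bleve's filter: per the Token Filters documentation linked above, bleve provides n-gram filters that would be configured on the analyzer instead. Unicode normalization and stemming are not shown.

```go
package main

import "fmt"

// edgeNGrams returns the prefixes of token whose length (in runes, so that
// letters such as "ü" are never split) lies between minLen and maxLen.
func edgeNGrams(token string, minLen, maxLen int) []string {
	runes := []rune(token)
	var grams []string
	for n := minLen; n <= maxLen && n <= len(runes); n++ {
		grams = append(grams, string(runes[:n]))
	}
	return grams
}

func main() {
	fmt.Println(edgeNGrams("issue", 2, 4)) // [is iss issu]
	fmt.Println(edgeNGrams("für", 2, 5))   // [fü für]
}
```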

Member

@sha-red exactly. Would you have time to send some PRs to help improve it?

@lunny I'd love to help with actual code, but since I'd have to set up a whole Go development environment and I have lots of other work to do, it will take some time to get everything running.

Please also have a look at our minimalistic issue search patch for gogs: gogs/gogs#4015

IMHO, whole-phrase search is better than nothing, but comment indexing should at least be added before the release.

Member

We can improve the search in v1.2: comment indexing should be added in that release cycle, and of course the UI should be changed to highlight keywords.

@go-gitea go-gitea locked and limited conversation to collaborators Nov 23, 2020
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/feature Completely new functionality. Can only be merged if feature freeze is not active.

8 participants