[full-ci] - use KQL as default search query language #7212

Merged 14 commits on Sep 7, 2023
2 changes: 1 addition & 1 deletion .drone.env
@@ -1,3 +1,3 @@
 # The test runner source for UI tests
-WEB_COMMITID=de510963a4c9d9eaa05ba69512fabb323a32bd73
+WEB_COMMITID=1322e5b46c827d0e7f7b8f563f302e61269f8515
 WEB_BRANCH=master

This file was deleted.

27 changes: 27 additions & 0 deletions changelog/unreleased/kql-search-query-language.md
@@ -0,0 +1,27 @@
Enhancement: Keyword Query Language (KQL) search syntax

We've introduced support for [KQL](https://learn.microsoft.com/en-us/sharepoint/dev/general-development/keyword-query-language-kql-syntax-reference) as the default oCIS search query language.

Some examples of a valid KQL query are:

* `Tag`: `tag:golden tag:"silver"`
* `Filename`: `name:file.txt name:"file.docx"`
* `Content`: `content:ahab content:"captain aha*"`

Conjunctive normal form queries:

* `Boolean`: `tag:golden AND tag:"silver"`, `tag:golden OR tag:"silver"`, `tag:golden NOT tag:"silver"`
* `Group`: `(tag:book content:ahab*)`, `tag:(book pdf)`

Complex queries:

* `(name:"moby di*" OR tag:bestseller) AND tag:book NOT tag:read`
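
The complex query above can be assembled programmatically. The following is a minimal sketch, not part of oCIS: `term` and `group` are hypothetical helpers, and the rule of quoting only values that contain whitespace is an assumption drawn from the examples in this changelog.

```go
package main

import (
	"fmt"
	"strings"
)

// term renders a single KQL property restriction such as tag:book.
// Values containing whitespace are wrapped in double quotes, as in the
// changelog examples. Hypothetical helper; it does not escape quotes
// that are already part of the value.
func term(key, value string) string {
	if strings.ContainsAny(value, " \t") {
		value = `"` + value + `"`
	}
	return key + ":" + value
}

// group wraps an expression in parentheses so it is evaluated as a unit.
func group(expr string) string {
	return "(" + expr + ")"
}

func main() {
	q := group(term("name", "moby di*")+" OR "+term("tag", "bestseller")) +
		" AND " + term("tag", "book") +
		" NOT " + term("tag", "read")

	// prints: (name:"moby di*" OR tag:bestseller) AND tag:book NOT tag:read
	fmt.Println(q)
}
```

The resulting string is what a client would send as the search query.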

https://github.com/owncloud/ocis/pull/7212
https://github.com/owncloud/ocis/pull/7043
https://github.com/owncloud/web/pull/9653
https://github.com/owncloud/ocis/issues/7042
https://github.com/owncloud/ocis/issues/7179
https://github.com/owncloud/ocis/issues/7114
https://github.com/owncloud/web/issues/9636
https://github.com/owncloud/web/issues/9646
16 changes: 16 additions & 0 deletions services/search/README.md
@@ -25,6 +25,22 @@ Note that as of now, the search service can not be scaled. Consider using a dedi

By default, the search service is shipped with [bleve](https://github.com/blevesearch/bleve) as its primary search engine. The available engines can be extended by implementing the [Engine](pkg/engine/engine.go) interface and making that engine available.

## Query language

By default, [KQL](https://learn.microsoft.com/en-us/sharepoint/dev/general-development/keyword-query-language-kql-syntax-reference) is used as the query language. For an overview of how the syntax works, please read the [Microsoft documentation](https://learn.microsoft.com/en-us/sharepoint/dev/general-development/keyword-query-language-kql-syntax-reference).

Not all parts of the syntax are supported. The following list gives an overview of the parts that are not implemented yet:

* Synonym operators
* Inclusion and exclusion operators
* Dynamic ranking operator
* ONEAR operator
* NEAR operator
* Date intervals

In the following [ADR](https://github.com/owncloud/ocis/blob/docs/ocis/adr/0020-file-search-query-language.md) you can read why we chose KQL.

## Extraction Engines

The search service provides the following extraction engines and their results are used as index for searching:
22 changes: 11 additions & 11 deletions services/search/pkg/engine/bleve.go
@@ -17,7 +17,7 @@ import (
 	"github.com/blevesearch/bleve/v2/analysis/tokenizer/single"
 	"github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
 	"github.com/blevesearch/bleve/v2/mapping"
-	bleveQuery "github.com/blevesearch/bleve/v2/search/query"
+	"github.com/blevesearch/bleve/v2/search/query"
 	storageProvider "github.com/cs3org/go-cs3apis/cs3/storage/provider/v1beta1"
 	"google.golang.org/protobuf/types/known/timestamppb"

@@ -27,13 +27,13 @@ import (
 	searchMessage "github.com/owncloud/ocis/v2/protogen/gen/ocis/messages/search/v0"
 	searchService "github.com/owncloud/ocis/v2/protogen/gen/ocis/services/search/v0"
 	"github.com/owncloud/ocis/v2/services/search/pkg/content"
-	"github.com/owncloud/ocis/v2/services/search/pkg/query"
+	searchQuery "github.com/owncloud/ocis/v2/services/search/pkg/query"
 )

 // Bleve represents a search engine which utilizes bleve to search and store resources.
 type Bleve struct {
-	index bleve.Index
-	query query.Creator[bleveQuery.Query]
+	index bleve.Index
+	queryCreator searchQuery.Creator[query.Query]
 }

// NewBleveIndex returns a new bleve index
@@ -58,10 +58,10 @@ func NewBleveIndex(root string) (bleve.Index, error) {
 }

 // NewBleveEngine creates a new Bleve instance
-func NewBleveEngine(index bleve.Index, qbc query.Creator[bleveQuery.Query]) *Bleve {
+func NewBleveEngine(index bleve.Index, queryCreator searchQuery.Creator[query.Query]) *Bleve {
 	return &Bleve{
-		index: index,
-		query: qbc,
+		index: index,
+		queryCreator: queryCreator,
 	}
 }

@@ -118,15 +118,15 @@ func BuildBleveMapping() (mapping.IndexMapping, error) {

 // Search executes a search request operation within the index.
 // Returns a SearchIndexResponse object or an error.
-func (b *Bleve) Search(_ context.Context, sir *searchService.SearchIndexRequest) (*searchService.SearchIndexResponse, error) {
-	createdQuery, err := b.query.Create(sir.Query)
+func (b *Bleve) Search(ctx context.Context, sir *searchService.SearchIndexRequest) (*searchService.SearchIndexResponse, error) {
+	createdQuery, err := b.queryCreator.Create(sir.Query)
 	if err != nil {
 		return nil, err
 	}

 	q := bleve.NewConjunctionQuery(
 		// Skip documents that have been marked as deleted
-		&bleveQuery.BoolFieldQuery{
+		&query.BoolFieldQuery{
 			Bool: false,
 			FieldVal: "Deleted",
 		},
@@ -136,7 +136,7 @@ func (b *Bleve) Search(_ context.Context, sir *searchService.SearchIndexRequest)
 	if sir.Ref != nil {
 		q.Conjuncts = append(
 			q.Conjuncts,
-			&bleveQuery.TermQuery{
+			&query.TermQuery{
 				FieldVal: "RootID",
 				Term: storagespace.FormatResourceID(
 					storageProvider.ResourceId{
28 changes: 13 additions & 15 deletions services/search/pkg/engine/bleve_test.go
@@ -4,10 +4,9 @@ import (
 	"context"
 	"fmt"

-	"github.com/cs3org/reva/v2/pkg/storagespace"
-
 	bleveSearch "github.com/blevesearch/bleve/v2"
 	sprovider "github.com/cs3org/go-cs3apis/cs3/storage/provider/v1beta1"
+	"github.com/cs3org/reva/v2/pkg/storagespace"
 	. "github.com/onsi/ginkgo/v2"
 	. "github.com/onsi/gomega"

@@ -22,15 +21,14 @@ var _ = Describe("Bleve", func() {
 	var (
 		eng *engine.Bleve
 		idx bleveSearch.Index
-		ctx context.Context

 		doSearch = func(id string, query, path string) (*searchsvc.SearchIndexResponse, error) {
 			rID, err := storagespace.ParseID(id)
 			if err != nil {
 				return nil, err
 			}

-			return eng.Search(ctx, &searchsvc.SearchIndexRequest{
+			return eng.Search(context.Background(), &searchsvc.SearchIndexRequest{
 				Query: query,
 				Ref: &searchmsg.Reference{
 					ResourceId: &searchmsg.ResourceID{
@@ -63,7 +61,7 @@ var _ = Describe("Bleve", func() {
 		idx, err = bleveSearch.NewMemOnly(mapping)
 		Expect(err).ToNot(HaveOccurred())

-		eng = engine.NewBleveEngine(idx, bleve.LegacyCreator)
+		eng = engine.NewBleveEngine(idx, bleve.DefaultCreator)
 		Expect(err).ToNot(HaveOccurred())

 		rootResource = engine.Resource{
@@ -94,7 +92,7 @@ var _ = Describe("Bleve", func() {

 	Describe("New", func() {
 		It("returns a new index instance", func() {
-			b := engine.NewBleveEngine(idx, bleve.LegacyCreator)
+			b := engine.NewBleveEngine(idx, bleve.DefaultCreator)
 			Expect(b).ToNot(BeNil())
 		})
 	})
@@ -134,7 +132,7 @@ var _ = Describe("Bleve", func() {
 				err := eng.Upsert(parentResource.ID, parentResource)
 				Expect(err).ToNot(HaveOccurred())

-				assertDocCount(rootResource.ID, `Name:foo\ o*`, 1)
+				assertDocCount(rootResource.ID, `name:"foo o*"`, 1)
 			})

 			It("finds files by digits in the filename", func() {
@@ -409,14 +407,14 @@ var _ = Describe("Bleve", func() {
 			err = eng.Upsert(childResource.ID, childResource)
 			Expect(err).ToNot(HaveOccurred())

-			assertDocCount(rootResource.ID, parentResource.Document.Name, 1)
-			assertDocCount(rootResource.ID, childResource.Document.Name, 1)
+			assertDocCount(rootResource.ID, `"`+parentResource.Document.Name+`"`, 1)
+			assertDocCount(rootResource.ID, `"`+childResource.Document.Name+`"`, 1)

 			err = eng.Delete(parentResource.ID)
 			Expect(err).ToNot(HaveOccurred())

-			assertDocCount(rootResource.ID, parentResource.Document.Name, 0)
-			assertDocCount(rootResource.ID, childResource.Document.Name, 0)
+			assertDocCount(rootResource.ID, `"`+parentResource.Document.Name+`"`, 0)
+			assertDocCount(rootResource.ID, `"`+childResource.Document.Name+`"`, 0)
 		})
 	})

@@ -431,14 +429,14 @@ var _ = Describe("Bleve", func() {
 			err = eng.Delete(parentResource.ID)
 			Expect(err).ToNot(HaveOccurred())

-			assertDocCount(rootResource.ID, parentResource.Name, 0)
-			assertDocCount(rootResource.ID, childResource.Name, 0)
+			assertDocCount(rootResource.ID, `"`+parentResource.Name+`"`, 0)
+			assertDocCount(rootResource.ID, `"`+childResource.Name+`"`, 0)

 			err = eng.Restore(parentResource.ID)
 			Expect(err).ToNot(HaveOccurred())

-			assertDocCount(rootResource.ID, parentResource.Name, 1)
-			assertDocCount(rootResource.ID, childResource.Name, 1)
+			assertDocCount(rootResource.ID, `"`+parentResource.Name+`"`, 1)
+			assertDocCount(rootResource.ID, `"`+childResource.Name+`"`, 1)
 		})
 	})
12 changes: 12 additions & 0 deletions services/search/pkg/query/ast/ast.go
@@ -1,6 +1,10 @@
 // Package ast provides available ast nodes.
 package ast

+import (
+	"time"
+)
+
 // Node represents abstract syntax tree node
 type Node interface {
 	Location() *Location
@@ -48,6 +52,14 @@ type BooleanNode struct {
 	Value bool
 }

+// DateTimeNode represents a time.Time value
+type DateTimeNode struct {
+	*Base
+	Key string
+	Operator *OperatorNode
+	Value time.Time
+}
+
 // OperatorNode represents an operator value like
 // AND, OR, NOT, =, <= ... and so on
 type OperatorNode struct {
1 change: 1 addition & 0 deletions services/search/pkg/query/ast/test/test.go
@@ -21,6 +21,7 @@ func DiffAst(x, y interface{}, opts ...cmp.Option) string {
 			cmpopts.IgnoreFields(ast.OperatorNode{}, "Base"),
 			cmpopts.IgnoreFields(ast.GroupNode{}, "Base"),
 			cmpopts.IgnoreFields(ast.BooleanNode{}, "Base"),
+			cmpopts.IgnoreFields(ast.DateTimeNode{}, "Base"),
 		)...,
 	)
 }
5 changes: 3 additions & 2 deletions services/search/pkg/query/bleve/bleve.go
@@ -5,6 +5,7 @@ import (
 	bQuery "github.com/blevesearch/bleve/v2/search/query"

 	"github.com/owncloud/ocis/v2/services/search/pkg/query"
+	"github.com/owncloud/ocis/v2/services/search/pkg/query/kql"
 )

 // Creator combines a Builder and a Compiler, which are used to create the query.
@@ -29,5 +30,5 @@ func (c Creator[T]) Create(qs string) (T, error) {
 	return t, nil
 }

-// LegacyCreator exposes an ocis legacy bleve query creator.
-var LegacyCreator = Creator[bQuery.Query]{LegacyBuilder{}, LegacyCompiler{}}
+// DefaultCreator exposes a kql to bleve query creator.
+var DefaultCreator = Creator[bQuery.Query]{kql.Builder{}, Compiler{}}
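
The way `DefaultCreator` chains a builder and a compiler can be sketched in isolation. This is an illustration under assumptions, not the oCIS implementation: the `Ast` type, the `Builder` and `Compiler` interface signatures, and the toy `fieldsBuilder` and `joinCompiler` types are all made up for the example. Only the shape of `Creator[T]` and its `Create(qs string) (T, error)` method are taken from the diff.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Ast is a stand-in for the parsed query tree (assumption for this sketch).
type Ast struct {
	Terms []string
}

// Builder turns a query string into an Ast (assumed signature).
type Builder interface {
	Build(qs string) (*Ast, error)
}

// Compiler turns an Ast into an engine-specific query of type T
// (assumed signature).
type Compiler[T any] interface {
	Compile(a *Ast) (T, error)
}

// Creator chains a Builder and a Compiler.
type Creator[T any] struct {
	builder  Builder
	compiler Compiler[T]
}

// Create builds the Ast first and compiles it afterwards. The zero value
// of T is returned together with the first error encountered.
func (c Creator[T]) Create(qs string) (T, error) {
	var zero T
	a, err := c.builder.Build(qs)
	if err != nil {
		return zero, err
	}
	q, err := c.compiler.Compile(a)
	if err != nil {
		return zero, err
	}
	return q, nil
}

// fieldsBuilder is a toy Builder that splits the query on whitespace.
type fieldsBuilder struct{}

func (fieldsBuilder) Build(qs string) (*Ast, error) {
	terms := strings.Fields(qs)
	if len(terms) == 0 {
		return nil, errors.New("empty query")
	}
	return &Ast{Terms: terms}, nil
}

// joinCompiler is a toy Compiler that targets a plain string.
type joinCompiler struct{}

func (joinCompiler) Compile(a *Ast) (string, error) {
	return strings.Join(a.Terms, " AND "), nil
}

func main() {
	creator := Creator[string]{fieldsBuilder{}, joinCompiler{}}
	q, err := creator.Create("tag:book name:moby")
	fmt.Println(q, err)
}
```

Both toy types use value receivers, so their zero-value literals satisfy the interfaces directly. That is presumably also why this PR changes `Compile` from a pointer receiver to a value receiver, since `Compiler{}` is passed by value in `DefaultCreator`.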
48 changes: 46 additions & 2 deletions services/search/pkg/query/bleve/compiler.go
@@ -22,13 +22,15 @@ var _fields = map[string]string{
 	"type": "Type",
 	"tag": "Tags",
 	"tags": "Tags",
+	"content": "Content",
+	"hidden": "Hidden",
 }

 // Compiler represents a KQL query search string to the bleve query formatter.
 type Compiler struct{}

 // Compile implements the query formatter which converts the KQL query search string to the bleve query.
-func (c *Compiler) Compile(givenAst *ast.Ast) (bleveQuery.Query, error) {
+func (c Compiler) Compile(givenAst *ast.Ast) (bleveQuery.Query, error) {
 	q, err := compile(givenAst)
 	if err != nil {
 		return nil, err
@@ -52,7 +54,49 @@ func walk(offset int, nodes []ast.Node) (bleveQuery.Query, int) {
 	for i := offset; i < len(nodes); i++ {
 		switch n := nodes[i].(type) {
 		case *ast.StringNode:
-			q := bleveQuery.NewQueryStringQuery(getField(n.Key) + ":" + n.Value)
+			k := getField(n.Key)
+			v := strings.ReplaceAll(n.Value, " ", `\ `)
+
+			if k != "Hidden" {
+				v = strings.ToLower(v)
+			}
+
+			q := bleveQuery.NewQueryStringQuery(k + ":" + v)
+			if prev == nil {
+				prev = q
+			} else {
+				next = q
+			}
+		case *ast.DateTimeNode:
+			q := &bleveQuery.DateRangeQuery{
+				Start: bleveQuery.BleveQueryTime{},
+				End: bleveQuery.BleveQueryTime{},
+				InclusiveStart: nil,
+				InclusiveEnd: nil,
+				FieldVal: getField(n.Key),
+			}
+
+			if n.Operator == nil {
+				continue
+			}
+
+			switch n.Operator.Value {
+			case ">":
+				q.Start.Time = n.Value
+				q.InclusiveStart = &[]bool{false}[0]
+			case ">=":
+				q.Start.Time = n.Value
+				q.InclusiveStart = &[]bool{true}[0]
+			case "<":
+				q.End.Time = n.Value
+				q.InclusiveEnd = &[]bool{false}[0]
+			case "<=":
+				q.End.Time = n.Value
+				q.InclusiveEnd = &[]bool{true}[0]
+			default:
+				continue
+			}
+
 			if prev == nil {
 				prev = q
 			} else {