From 87e282bebcd924c76296ae86dfdd0d45c7afa797 Mon Sep 17 00:00:00 2001
From: Konrad Staniszewski
Date: Tue, 19 Nov 2024 00:03:58 -0800
Subject: [PATCH 1/4] Add query documentation

---
 README.md     |   2 +
 docs/query.md | 197 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 199 insertions(+)
 create mode 100644 docs/query.md

diff --git a/README.md b/README.md
index 1163e148..c83bcd34 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,8 @@ https://github.com/user-attachments/assets/98d46192-5469-430f-ad9e-5c042adbb10d
 
 You can try out our public hosted demo [here](https://demo.sourcebot.dev/)!
 
+For information about the query language, take a look at the [query docs](docs/query.md)
+
 # Getting Started
 
 Get started with a single docker command:
diff --git a/docs/query.md b/docs/query.md
new file mode 100644
index 00000000..753c1f0d
--- /dev/null
+++ b/docs/query.md
@@ -0,0 +1,197 @@
+# Zoekt Query Language Guide
+
+This guide explains the Zoekt query language, used for searching text within Git repositories. Zoekt queries allow combining multiple filters and expressions using logical operators, negations, and grouping. Here's how to craft queries effectively.
+
+---
+
+## Syntax Overview
+
+A query is made up of expressions. An **expression** can be:
+- A negation (e.g., `-`),
+- A field (e.g., `repo:`).
+- A grouping (e.g., parentheses `()`),
+
+Logical `OR` operations combine multiple expressions. The **`AND` operator is implicit**, meaning multiple expressions written together will be automatically treated as `AND`.
+
+---
+
+## Query Components
+
+### 1. **Fields**
+
+Fields restrict your query to specific criteria. Here's a list of fields and their usage:
+
+| Field | Aliases | Values | Description | Examples |
+|--------------|---------|------------------------|------------------------------------------------------------|----------------------------------------|
+| `archived:` | `a:` | `yes` or `no` | Filters archived repositories. | `archived:yes` |
+| `case:` | `c:` | `yes`, `no`, or `auto` | Matches case-sensitive or insensitive text. | `case:yes content:"Foo"` |
+| `content:` | `c:` | Text (string or regex) | Searches content of files. | `content:"search term"` |
+| `file:` | `f:` | Text (string or regex) | Searches file names. | `file:"main.go"` |
+| `fork:` | `f:` | `yes` or `no` | Filters forked repositories. | `fork:no` |
+| `lang:` | `l:` | Text | Filters by programming language. | `lang:python` |
+| `public:` | | `yes` or `no` | Filters public repositories. | `public:yes` |
+| `regex:` | | Regex pattern | Matches content using a regular expression. | `regex:/foo.*bar/` |
+| `repo:` | `r:` | Text (string or regex) | Filters repositories by name. | `repo:"github.com/user/project"` |
+| `sym:` | | Text | Searches for symbol names. | `sym:"MyFunction"` |
+| `branch:` | `b:` | Text | Searches within a specific branch. | `branch:main` |
+| `type:` | `t:` | `filematch`, `filename`, `file`, or `repo` | Limits result types. | `type:filematch` |
+
+---
+
+### 2. **Negation**
+
+Negate an expression using the `-` symbol.
+
+#### Examples:
+- Exclude a repository:
+  ```plaintext
+  -repo:"github.com/example/repo"
+  ```
+- Exclude a language:
+  ```plaintext
+  -lang:javascript
+  ```
+
+---
+
+### 3. **Grouping**
+
+Group queries using parentheses `()` to create complex logic.
+
+#### Examples:
+- Match either of two repositories:
+  ```plaintext
+  (repo:repo1 or repo:repo2)
+  ```
+- Find test in either python or javascript files:
+  ```plaintext
+  content:test (lang:python or lang:javascript)
+  ```
+
+---
+
+### 4. **Logical Operators**
+
+Use `or` to combine multiple expressions.
+
+#### Examples:
+- Match files in either of two languages:
+  ```plaintext
+  lang:go or lang:java
+  ```
+
+`and` boolean operator is applied automatically when expressions are separated by a space.
+
+---
+
+## Special Query Values
+
+- **Boolean Values**:
+  Use `yes` or `no` for fields like `archived:` or `fork:`.
+
+- **Text Fields**:
+  Text fields (`content:`, `repo:`, etc.) accept:
+  - Strings: `"my text"`
+  - Regular expressions: `/my.*regex/`
+
+- **Escape Characters**:
+  To include special characters, use backslashes (`\`).
+
+#### Examples:
+- Match the string `foo"bar`:
+  ```plaintext
+  content:"foo\"bar"
+  ```
+- Match the regex `foo.*bar`:
+  ```plaintext
+  content:/foo.*bar/
+  ```
+
+---
+
+## Advanced Examples
+
+1. **Search for content in Python files in public repositories**:
+   ```plaintext
+   lang:python public:yes content:"my_function"
+   ```
+
+2. **Exclude archived repositories and match a regex**:
+   ```plaintext
+   archived:no regex:/error.*handler/
+   ```
+
+3. **Find files named `README.md` in forks**:
+   ```plaintext
+   file:"README.md" fork:yes
+   ```
+
+4. **Search for a specific branch**:
+   ```plaintext
+   branch:main content:"TODO"
+   ```
+
+5. **Combine multiple fields**:
+   ```plaintext
+   (repo:"github.com/example" or repo:"github.com/test") and lang:go
+   ```
+
+---
+
+## Tips
+
+1. **Combine Filters**: You can combine as many fields as needed. For instance:
+   ```plaintext
+   repo:"github.com/example" lang:go content:"init"
+   ```
+
+2. **Use Regular Expressions**: Make complex content searches more powerful:
+   ```plaintext
+   content:/func\s+\w+\s*\(/
+   ```
+
+3. **Case Sensitivity**: Use `case:yes` for exact matches:
+   ```plaintext
+   case:yes content:"ExactMatch"
+   ```
+
+4. **Match Specific File Types**:
+   ```plaintext
+   file:".*\.go" content:"package main"
+   ```
+
+### EBNF Summary
+
+```ebnf
+query = expression , { "or" , expression } ;
+
+expression = negation
+           | grouping
+           | field ;
+
+negation = "-" , expression ;
+
+grouping = "(" , query , ")" ;
+
+field = ( ( "archived:" | "a:" ) , boolean )
+      | ( ( "case:" | "c:" ) , ("yes" | "no" | "auto") )
+      | ( ( "content:" | "c:" ) , text )
+      | ( ( "file:" | "f:" ) , text )
+      | ( ( "fork:" | "f:" ) , boolean )
+      | ( ( "lang:" | "l:" ) , text )
+      | ( ( "public:" ) , boolean )
+      | ( ( "regex:" ) , text )
+      | ( ( "repo:" | "r:" ) , text )
+      | ( ( "sym:" ) , text )
+      | ( ( "branch:" | "b:" ) , text )
+      | ( ( "type:" | "t:" ) , type );
+
+boolean = "yes" | "no" ;
+text = string | regex ;
+string = '"' , { character | escape } , '"' ;
+regex = '/' , { character | escape } , '/' ;
+
+type = "filematch" | "filename" | "file" | "repo" ;
+```
+
+

From 005f4dcbc037f324b1f69b77cafab14c669dff05 Mon Sep 17 00:00:00 2001
From: Konrad Staniszewski
Date: Tue, 19 Nov 2024 00:27:26 -0800
Subject: [PATCH 2/4] Add some more examples

---
 docs/query.md | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/docs/query.md b/docs/query.md
index 753c1f0d..eb757da8 100644
--- a/docs/query.md
+++ b/docs/query.md
@@ -194,4 +194,72 @@ regex = '/' , { character | escape } , '/' ;
 type = "filematch" | "filename" | "file" | "repo" ;
 ```
 
+---
+
+### **Complex Query Examples**
+
+1. **Search for functions in Go files with TODO comments**
+   ```plaintext
+   lang:go /func .* \/\/ TODO/
+   ```
+   Matches Go files where functions are annotated with TODO comments.
+
+2. **Find Python test files containing the word "assert"**
+   ```plaintext
+   lang:python file:".*test.*\\.py" content:"assert"
+   ```
+   Looks for test files in Python containing assertions.
+
+3. **Search for all README files mentioning "installation"**
+   ```plaintext
+   file:"README.*" content:"installation"
+   ```
+   Matches README files across repositories containing the word "installation."
+4. **Find public repositories containing "openapi" in YAML files**
+   ```plaintext
+   file:".*\\.yaml$" content:"openapi"
+   ```
+   Matches YAML files mentioning "openapi."
+
+5. **Search Java repositories for method signatures matching `public static`**
+   ```plaintext
+   lang:java /public static .*\\(/
+   ```
+   Finds Java methods declared as public static.
+
+6. **Find JavaScript files importing React**
+   ```plaintext
+   lang:javascript content:"import React from 'react';"
+   ```
+   Matches JavaScript files importing React.
+
+7. **Find all Markdown files mentioning "license" or "agreement"**
+   ```plaintext
+   file:".*\\.md" (content:"license" or content:"agreement")
+   ```
+   Targets Markdown files containing either "license" or "agreement."
+
+8. **Find log statements in Go files**
+   ```plaintext
+   lang:go /"log\\.(Print|Printf|Fatal|Panic).*\\(.*\\)"/
+   ```
+   Matches Go log statements.
+
+9. **Look for Python repositories containing Flask imports in their `app.py` file**
+   ```plaintext
+   lang:python file:"app\\.py" content:"from flask import .*"
+   ```
+   Matches Flask applications.
+
+10. **Search for JSON files containing an array of objects**
+    ```plaintext
+    file:".*\\.json" /\\[\\s*{.*/
+    ```
+    Finds JSON files with object arrays.
+
+11. **Search for Kubernetes YAML files containing `kind: Deployment`**
+    ```plaintext
+    file:".*\\.yaml" content:"kind: Deployment"
+    ```
+    Matches Kubernetes deployment files.
 
From 93ba3f06c41c9a6b154cec14914891f5577e3c1f Mon Sep 17 00:00:00 2001
From: Konrad Staniszewski
Date: Fri, 6 Dec 2024 00:53:55 -0800
Subject: [PATCH 3/4] Update docs

---
 docs/query.md | 66 +++++++++++++++++----------------------------------
 1 file changed, 22 insertions(+), 44 deletions(-)

diff --git a/docs/query.md b/docs/query.md
index eb757da8..24d5390e 100644
--- a/docs/query.md
+++ b/docs/query.md
@@ -1,7 +1,6 @@
-# Zoekt Query Language Guide
-
-This guide explains the Zoekt query language, used for searching text within Git repositories. Zoekt queries allow combining multiple filters and expressions using logical operators, negations, and grouping. Here's how to craft queries effectively.
+# Search Query Language Guide
+This guide explains the search query language used by Sourcebot, which is a derivative of the Zoekt query language with some minor differences. Search queries allow for combining multiple filters and expressions using logical operators, negations, and grouping. Here's how to craft queries effectively.
 
 ---
 
 ## Syntax Overview
@@ -13,6 +12,8 @@ A query is made up of expressions. An **expression** can be:
 
 Logical `OR` operations combine multiple expressions. The **`AND` operator is implicit**, meaning multiple expressions written together will be automatically treated as `AND`.
 
+All expressions are evaluated as regular expressions unless wrapped with "".
+
 ---
 
 ## Query Components
@@ -33,8 +34,7 @@ Fields restrict your query to specific criteria. Here's a list of fields and the
 | `regex:` | | Regex pattern | Matches content using a regular expression. | `regex:/foo.*bar/` |
 | `repo:` | `r:` | Text (string or regex) | Filters repositories by name. | `repo:"github.com/user/project"` |
 | `sym:` | | Text | Searches for symbol names. | `sym:"MyFunction"` |
-| `branch:` | `b:` | Text | Searches within a specific branch. | `branch:main` |
-| `type:` | `t:` | `filematch`, `filename`, `file`, or `repo` | Limits result types. | `type:filematch` |
+| `revision:` | `rev:` | Text | Searches within a specific branch or tag. | `revision:main` |
 
 ---
 
@@ -84,6 +84,18 @@ Use `or` to combine multiple expressions.
 
 ---
 
+### 5. **Exact Matching**
+
+Quotes "" works to match exactly what you are looking for, instead of using regular expressions.
+
+#### Examples:
+- Find test.* exactly:
+  ```plaintext
+  content:"test.*"
+  ```
+
+---
+
 ## Special Query Values
 
 - **Boolean Values**:
@@ -92,7 +104,7 @@ Use `or` to combine multiple expressions.
 - **Text Fields**:
   Text fields (`content:`, `repo:`, etc.) accept:
   - Strings: `"my text"`
-  - Regular expressions: `/my.*regex/`
+  - Regular expressions: `my.*regex`
 
 - **Escape Characters**:
   To include special characters, use backslashes (`\`).
@@ -104,7 +116,7 @@ Use `or` to combine multiple expressions.
   ```
 - Match the regex `foo.*bar`:
   ```plaintext
-  content:/foo.*bar/
+  content:foo.*bar
   ```
 
 ---
@@ -118,7 +130,7 @@ Use `or` to combine multiple expressions.
 
 2. **Exclude archived repositories and match a regex**:
    ```plaintext
-   archived:no regex:/error.*handler/
+   archived:no error.*handler
    ```
 
 3. **Find files named `README.md` in forks**:
@@ -133,7 +145,7 @@ Use `or` to combine multiple expressions.
 
 5. **Combine multiple fields**:
    ```plaintext
-   (repo:"github.com/example" or repo:"github.com/test") and lang:go
+   (repo:"github.com/example" or repo:"github.com/test") lang:go
    ```
 
 ---
@@ -147,7 +159,7 @@ Use `or` to combine multiple expressions.
 
 2. **Use Regular Expressions**: Make complex content searches more powerful:
    ```plaintext
-   content:/func\s+\w+\s*\(/
+   content:func\s+\w+\s*\(
    ```
 
 3. **Case Sensitivity**: Use `case:yes` for exact matches:
@@ -160,40 +172,6 @@ Use `or` to combine multiple expressions.
    file:".*\.go" content:"package main"
    ```
 
-### EBNF Summary
-
-```ebnf
-query = expression , { "or" , expression } ;
-
-expression = negation
-           | grouping
-           | field ;
-
-negation = "-" , expression ;
-
-grouping = "(" , query , ")" ;
-
-field = ( ( "archived:" | "a:" ) , boolean )
-      | ( ( "case:" | "c:" ) , ("yes" | "no" | "auto") )
-      | ( ( "content:" | "c:" ) , text )
-      | ( ( "file:" | "f:" ) , text )
-      | ( ( "fork:" | "f:" ) , boolean )
-      | ( ( "lang:" | "l:" ) , text )
-      | ( ( "public:" ) , boolean )
-      | ( ( "regex:" ) , text )
-      | ( ( "repo:" | "r:" ) , text )
-      | ( ( "sym:" ) , text )
-      | ( ( "branch:" | "b:" ) , text )
-      | ( ( "type:" | "t:" ) , type );
-
-boolean = "yes" | "no" ;
-text = string | regex ;
-string = '"' , { character | escape } , '"' ;
-regex = '/' , { character | escape } , '/' ;
-
-type = "filematch" | "filename" | "file" | "repo" ;
-```
-
 ---
 
 ### **Complex Query Examples**

From 3be533aca40a2ea9e0f5be8a41d575265b459b77 Mon Sep 17 00:00:00 2001
From: Konrad Staniszewski
Date: Fri, 6 Dec 2024 00:53:55 -0800
Subject: [PATCH 4/4] Update docs

---
 docs/query.md | 66 +++++++++++++++++----------------------------------
 1 file changed, 22 insertions(+), 44 deletions(-)

diff --git a/docs/query.md b/docs/query.md
index eb757da8..24d5390e 100644
--- a/docs/query.md
+++ b/docs/query.md
@@ -1,7 +1,6 @@
-# Zoekt Query Language Guide
-
-This guide explains the Zoekt query language, used for searching text within Git repositories. Zoekt queries allow combining multiple filters and expressions using logical operators, negations, and grouping. Here's how to craft queries effectively.
+# Search Query Language Guide
+This guide explains the search query language used by Sourcebot, which is a derivative of the Zoekt query language with some minor differences. Search queries allow for combining multiple filters and expressions using logical operators, negations, and grouping. Here's how to craft queries effectively.
 
 ---
 
 ## Syntax Overview
@@ -13,6 +12,8 @@ A query is made up of expressions. An **expression** can be:
 
 Logical `OR` operations combine multiple expressions. The **`AND` operator is implicit**, meaning multiple expressions written together will be automatically treated as `AND`.
 
+All expressions are evaluated as regular expressions unless wrapped with "".
+
 ---
 
 ## Query Components
@@ -33,8 +34,7 @@ Fields restrict your query to specific criteria. Here's a list of fields and the
 | `regex:` | | Regex pattern | Matches content using a regular expression. | `regex:/foo.*bar/` |
 | `repo:` | `r:` | Text (string or regex) | Filters repositories by name. | `repo:"github.com/user/project"` |
 | `sym:` | | Text | Searches for symbol names. | `sym:"MyFunction"` |
-| `branch:` | `b:` | Text | Searches within a specific branch. | `branch:main` |
-| `type:` | `t:` | `filematch`, `filename`, `file`, or `repo` | Limits result types. | `type:filematch` |
+| `revision:` | `rev:` | Text | Searches within a specific branch or tag. | `revision:main` |
 
 ---
 
@@ -84,6 +84,18 @@ Use `or` to combine multiple expressions.
 
 ---
 
+### 5. **Exact Matching**
+
+Quotes "" works to match exactly what you are looking for, instead of using regular expressions.
+
+#### Examples:
+- Find test.* exactly:
+  ```plaintext
+  content:"test.*"
+  ```
+
+---
+
 ## Special Query Values
 
 - **Boolean Values**:
@@ -92,7 +104,7 @@ Use `or` to combine multiple expressions.
 - **Text Fields**:
   Text fields (`content:`, `repo:`, etc.) accept:
   - Strings: `"my text"`
-  - Regular expressions: `/my.*regex/`
+  - Regular expressions: `my.*regex`
 
 - **Escape Characters**:
   To include special characters, use backslashes (`\`).
@@ -104,7 +116,7 @@ Use `or` to combine multiple expressions.
   ```
 - Match the regex `foo.*bar`:
   ```plaintext
-  content:/foo.*bar/
+  content:foo.*bar
   ```
 
 ---
@@ -118,7 +130,7 @@ Use `or` to combine multiple expressions.
 
 2. **Exclude archived repositories and match a regex**:
    ```plaintext
-   archived:no regex:/error.*handler/
+   archived:no error.*handler
    ```
 
 3. **Find files named `README.md` in forks**:
@@ -133,7 +145,7 @@ Use `or` to combine multiple expressions.
 
 5. **Combine multiple fields**:
    ```plaintext
-   (repo:"github.com/example" or repo:"github.com/test") and lang:go
+   (repo:"github.com/example" or repo:"github.com/test") lang:go
    ```
 
 ---
@@ -147,7 +159,7 @@ Use `or` to combine multiple expressions.
 
 2. **Use Regular Expressions**: Make complex content searches more powerful:
    ```plaintext
-   content:/func\s+\w+\s*\(/
+   content:func\s+\w+\s*\(
    ```
 
 3. **Case Sensitivity**: Use `case:yes` for exact matches:
@@ -160,40 +172,6 @@ Use `or` to combine multiple expressions.
    file:".*\.go" content:"package main"
    ```
 
-### EBNF Summary
-
-```ebnf
-query = expression , { "or" , expression } ;
-
-expression = negation
-           | grouping
-           | field ;
-
-negation = "-" , expression ;
-
-grouping = "(" , query , ")" ;
-
-field = ( ( "archived:" | "a:" ) , boolean )
-      | ( ( "case:" | "c:" ) , ("yes" | "no" | "auto") )
-      | ( ( "content:" | "c:" ) , text )
-      | ( ( "file:" | "f:" ) , text )
-      | ( ( "fork:" | "f:" ) , boolean )
-      | ( ( "lang:" | "l:" ) , text )
-      | ( ( "public:" ) , boolean )
-      | ( ( "regex:" ) , text )
-      | ( ( "repo:" | "r:" ) , text )
-      | ( ( "sym:" ) , text )
-      | ( ( "branch:" | "b:" ) , text )
-      | ( ( "type:" | "t:" ) , type );
-
-boolean = "yes" | "no" ;
-text = string | regex ;
-string = '"' , { character | escape } , '"' ;
-regex = '/' , { character | escape } , '/' ;
-
-type = "filematch" | "filename" | "file" | "repo" ;
-```
-
 ---
 
 ### **Complex Query Examples**
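
Note on composing queries programmatically: the rules the updated guide describes (implicit `AND` via spaces, `or` inside parentheses, `-` for negation, and double quotes for exact rather than regex matching) can be sketched as a few string helpers. This is a minimal illustration only. The helper names `quote`, `field`, `negate`, `any_of`, and `all_of` are hypothetical and not part of Sourcebot or Zoekt, and escaping a literal backslash as `\\` inside quotes is an assumption, since the guide only shows `\"`.

```python
def quote(text: str) -> str:
    """Wrap text in double quotes so it is matched exactly, not as a regex."""
    # Assumption: inside quotes a backslash escapes both '"' and '\'.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def field(name: str, value: str, exact: bool = False) -> str:
    """Build `name:value`; quote the value when an exact match is wanted."""
    return f"{name}:{quote(value) if exact else value}"


def negate(expr: str) -> str:
    """Prefix an expression with `-` to exclude its matches."""
    return f"-{expr}"


def any_of(*exprs: str) -> str:
    """Join expressions with `or`, grouped in parentheses."""
    return "(" + " or ".join(exprs) + ")"


def all_of(*exprs: str) -> str:
    """Join expressions with spaces; AND is implicit in the query language."""
    return " ".join(exprs)


query = all_of(
    any_of(
        field("repo", "github.com/example", exact=True),
        field("repo", "github.com/test", exact=True),
    ),
    field("content", "TODO", exact=True),
    negate(field("lang", "javascript")),
)
print(query)
# (repo:"github.com/example" or repo:"github.com/test") content:"TODO" -lang:javascript
```

Keeping the value unquoted (`exact=False`) leaves it to be interpreted as a regular expression, which matches the behaviour the patch describes for unquoted expressions.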