Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support context of lines with lower indentation #897

Closed
tokarenko opened this issue Jan 20, 2025 · 15 comments
Closed

Support context of lines with lower indentation #897

tokarenko opened this issue Jan 20, 2025 · 15 comments

Comments

@tokarenko
Copy link

In addition to context of “N lines before/after match”, I suggest to add option of “lines with lower indentation”. Such context will be very useful for indentation-sensitive languages, like Python. This is described by Matt Brubeck in a blog post. There are implementations for Rust, Python and Haskell. I am not a Go programmer. Consider this as a feature request from me.

As a side question, please advise if Zoekt supports indexing of arbitrary indented text? If not, then how to implement it?

Copy link

linear bot commented Jan 20, 2025

@keegancsmith
Copy link
Member

As a side question, please advise if Zoekt supports indexing of arbitrary indented text? If not, then how to implement it?

I'm unsure of what you mean by indexed. Do you mean serializing some sort of datastructure to quickly jump to maybe map indentation level changes to byte offsets in the file? If so no. However, the hard work is likely finding the matches in the first place. The way you would probably implement this is taking candidate matches and then post-processing it with the content of the file.

Zoekt does not directly support this and to be honest I don't think it should directly. I would encourage an API use with setting Whole to true in the SearchOptions then doing this post processing in the client. If you would like to experiment with code here I'd recommend doing that in a client (such as zoekt-webserver's or the neogrok project), or look at the places where SearchOptions.Whole is read and figure out a way to make zoekt's backend do it more efficiently.

Note: I doubt we would accept this patch since it can accomplish what you want just using the existing APIs with post-processing.

@tokarenko
Copy link
Author

Thank you @keegancsmith for your quick reply and suggestions. My request is to provide context for a search result not only as fixed number of lines before/after a match, but also up to first unindented level, e.g.:

 102: component("net") {
 385:   if (!is_nacl) {
 386:     sources += [
 409:       "base/arena.cc",

Do you still consider such context not feasible for the Zoekt search backend?

@keegancsmith
Copy link
Member

It's feasible to implement, but I don't think the feature meets the bar of being part of core zoekt. To expand on my thoughts a bit, this is a tradeoff on implementation complexity and product features. Given someone could implement this feature on top of the Zoekt API and I don't think this feature would be widely used (although I stand to be corrected) I don't think we should add direct support for it in core zoekt.

@MrEasy
Copy link

MrEasy commented Jan 23, 2025

Would consider this very helpful.
There even was a draft commit in google/zoekt#127 (comment)

@ecki
Copy link

ecki commented Jan 23, 2025

It's feasible to implement, but I don't think the feature meets the bar of being part of core zoekt.

Is there a different project for the Web UI? I would expect most users care about some context (smart or flat). And with missing context (and nested context) in the Web UI it feels really incomplete. Thats like arguing "grep does not need the -C switch you can just get the line number and extract the content yourself".

@tokarenko
Copy link
Author

I suggest to reconsider this feature request and put it on vote maybe. This feature request is backed up by many of my fellow engineers. We plan to try out Zoekt on our codebase of indentation-sensitive programming languages and other indentation sensitive formats, like YAML and other configuration files. Searching indentation-sensitive format without indentation-aware context is hard. Post-processing full text of matches in many large files will severely degrade search performance. If obtaining such a context is a heavy load for Zoekt then it could be at least limited to current context window, e.g. of N lines of context above display all lines with lower indentation. I think that demand for this feature may be assesed against VSCode sticky scroll feature. It is among the Top-Ranking Issues (All Time) with 324 👍

@ecki
Copy link

ecki commented Jan 29, 2025

Dimitrii @tokarenko can I ask a related question, what frontend do you use for Zoekt and does it already show normal (I.e. ±2 lines) context in search results?

@tokarenko
Copy link
Author

Bernd @ecki, I think that Neogrok should work according to their demo page. We have not tried Neogrok as we are trying to integrate Zoekt with our custom frontend.

@MrEasy
Copy link

MrEasy commented Jan 30, 2025

Bernd @ecki, I think that Neogrok should work according to their demo page. We have not tried Neogrok as we are trying to integrate Zoekt with our custom frontend.

@tokarenko yes Neogrok looks good regarding this - thanks!

Example with 4 context lines:

Image

@jtibshirani
Copy link
Member

jtibshirani commented Jan 30, 2025

If I'm following the conversation right, there are actually two separate requests.

First, Zoekt can already return context around a match to help understand where the line sits in the file, through the SearchOptions.NumContextLines argument. If you set this, I'd also recommend trying SearchOptions.ChunkMatches = true. The ChunkMatches setting combines nearby matching lines into single chunks, which ensures your search results don't overlap with each other. Sourcegraph uses both these settings as the default (see www.sourcegraph.com/search).

@tokarenko is asking for something a bit different, which is to provide a full file outline before the matched line based on indentation (inspired by https://limpet.net/mbrubeck/2010/01/12/outline-grep.html). As @keegancsmith said, we don't plan to support this natively in Zoekt, because we don't feel it's the right trade-off. However it should be possible to implement this client-side using SearchOptions.Whole, which returns the whole file, then parsing the result.

If you find the latency of loading whole files to be unacceptable, please file a issue and we are happy to look into optimizing it. We are also open to revisiting the decision if we hear from more users that something like "outline grep" is useful -- we'll revisit the complexity/ utility trade-off.

@ecki
Copy link

ecki commented Jan 31, 2025

First, Zoekt can already return context around

Yes right apologies for bringing that up here, can’t be selected in the standard web frontend. But nevermind we switched to neogrok for that now.

@tokarenko
Copy link
Author

@ecki , @jtibshirani , we tried the following code in between of lines L296-L298 of contentprovider.go. Now we get context of indented lines without noticing any performance issues. I suggest to reevaluate my feature request.

		curr_str := string(data[lineStart:nextLineStart])
		curr_len := len(curr_str)
		curr_trim := len(strings.Trim(curr_str, " "))
		curr_ident := curr_len - curr_trim
		if curr_ident > 0 {
			finalMatch.Context = make(map[string]string)
			lines := strings.Split(string(data[:lineStart]), "\n")
			for i := len(lines) - 2; i >= 0; i-- {
				in_str := lines[i]
				curr_len := len(in_str)
				curr_trim := len(strings.Trim(in_str, " "))
				if curr_ident > curr_len-curr_trim {
					finalMatch.Context[fmt.Sprintf("%v", i+1)] = in_str
					curr_ident = curr_len - curr_trim
				}
				if curr_ident == 0 {
					break
				}
			}
		}

@jtibshirani
Copy link
Member

I still don't feel it's the right trade-off to add this natively to Zoekt. Have you tried implementing it on top of Zoekt using SearchOptions.Whole as I suggested? If the performance of that is okay, then that's the best way to go for now.

@tokarenko
Copy link
Author

We decided to not even try to transfer whole files for all the matches to parse just a few lines of context. It seems inefficient a priori. To get a single whole file to work with we got PR approved.
After the implementation I find no tradeoffs here. Regarding the product feature, I still hold the opinion that a code search engine should provide an adequate context for search results. Indentation-sensitive programming languages, like Python, require an indentation-aware context. The implementation complexity seems to be minimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants