Support context of lines with lower indentation #897

tokarenko · 2025-01-20T19:14:47Z

In addition to the “N lines before/after match” context, I suggest adding an option for a “lines with lower indentation” context. Such context would be very useful for indentation-sensitive languages like Python. The idea is described by Matt Brubeck in a blog post. There are implementations for Rust, Python and Haskell. I am not a Go programmer, so consider this a feature request from me.

As a side question, please advise if Zoekt supports indexing of arbitrary indented text? If not, then how to implement it?

linear · 2025-01-20T19:14:51Z

SPLF-821 Support context of lines with lower indentation

keegancsmith · 2025-01-20T20:08:52Z

As a side question, please advise if Zoekt supports indexing of arbitrary indented text? If not, then how to implement it?

I'm unsure what you mean by indexed. Do you mean serializing some sort of data structure that maps indentation-level changes to byte offsets in the file, so you can jump to them quickly? If so, no. However, the hard work is likely finding the matches in the first place. The way you would probably implement this is by taking candidate matches and then post-processing them with the content of the file.

Zoekt does not directly support this and, to be honest, I don't think it should. I would encourage using the API with Whole set to true in SearchOptions, then doing this post-processing in the client. If you would like to experiment with code here, I'd recommend doing that in a client (such as zoekt-webserver or the neogrok project), or looking at the places where SearchOptions.Whole is read and figuring out a way to make zoekt's backend do it more efficiently.

Note: I doubt we would accept this patch, since you can accomplish what you want using the existing APIs with post-processing.

tokarenko · 2025-01-20T20:55:57Z

Thank you @keegancsmith for your quick reply and suggestions. My request is to provide context for a search result not only as a fixed number of lines before/after a match, but also up to the first unindented level, e.g.:

 102: component("net") {
 385:   if (!is_nacl) {
 386:     sources += [
 409:       "base/arena.cc",

Do you still consider such context not feasible for the Zoekt search backend?

keegancsmith · 2025-01-21T07:03:55Z

It's feasible to implement, but I don't think the feature meets the bar of being part of core zoekt. To expand on my thoughts a bit, this is a trade-off between implementation complexity and product features. Given that someone could implement this feature on top of the Zoekt API, and that I don't think this feature would be widely used (although I stand to be corrected), I don't think we should add direct support for it in core zoekt.
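
For illustration, here is a minimal sketch in Go of what such client-side post-processing could look like once the client has the whole file content. The names `Outline`, `OutlineLine` and `indentWidth` are hypothetical and not part of the Zoekt API, the sample file content is made up, and the sketch assumes each leading space or tab counts as one column of indentation.

```go
package main

import (
	"fmt"
	"strings"
)

// OutlineLine is one line of indentation context (hypothetical type).
type OutlineLine struct {
	Number int    // 1-based line number in the file
	Text   string // the line without its trailing newline
}

// indentWidth counts leading spaces and tabs, one column each (an assumption).
func indentWidth(line string) int {
	return len(line) - len(strings.TrimLeft(line, " \t"))
}

// Outline walks upward from matchLine (1-based) and keeps every non-blank
// line that is indented less than all lines kept so far. The result is
// returned in file order.
func Outline(content string, matchLine int) []OutlineLine {
	lines := strings.Split(content, "\n")
	if matchLine < 1 || matchLine > len(lines) {
		return nil
	}
	cur := indentWidth(lines[matchLine-1])
	var out []OutlineLine
	for i := matchLine - 2; i >= 0 && cur > 0; i-- {
		if strings.TrimSpace(lines[i]) == "" {
			continue // blank lines say nothing about nesting
		}
		if w := indentWidth(lines[i]); w < cur {
			out = append(out, OutlineLine{Number: i + 1, Text: lines[i]})
			cur = w
		}
	}
	// Lines were collected bottom-up; reverse them into file order.
	for l, r := 0, len(out)-1; l < r; l, r = l+1, r-1 {
		out[l], out[r] = out[r], out[l]
	}
	return out
}

func main() {
	// Made-up sample content; in practice this would be the whole file
	// returned by the search backend.
	content := strings.Join([]string{
		`component("net") {`,
		`  if (!is_nacl) {`,
		`    sources += [`,
		`      "base/address.cc",`,
		``,
		`      "base/arena.cc",`,
	}, "\n")
	for _, l := range Outline(content, 6) {
		fmt.Printf("%d: %s\n", l.Number, l.Text)
	}
}
```

Run on the sample, this prints lines 1, 2 and 3, i.e. the enclosing blocks of the match on line 6. The cost is one pass over the lines above each match, so the open question is the transfer of whole files rather than the parsing itself.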

MrEasy · 2025-01-23T14:17:42Z

Would consider this very helpful.
There even was a draft commit in google/zoekt#127 (comment)

ecki · 2025-01-23T14:46:39Z

It's feasible to implement, but I don't think the feature meets the bar of being part of core zoekt.

Is there a different project for the Web UI? I would expect most users care about some context (smart or flat). And with missing context (and nested context) in the Web UI it feels really incomplete. That's like arguing "grep does not need the -C switch, you can just get the line number and extract the content yourself".

tokarenko · 2025-01-29T13:41:35Z

I suggest reconsidering this feature request and perhaps putting it to a vote. The request is backed by many of my fellow engineers. We plan to try out Zoekt on our codebase of indentation-sensitive programming languages and other indentation-sensitive formats, like YAML and other configuration files. Searching an indentation-sensitive format without indentation-aware context is hard. Post-processing the full text of matches in many large files will severely degrade search performance. If obtaining such context is a heavy load for Zoekt, then it could at least be limited to the current context window, e.g. of the N lines of context above the match, display all lines with lower indentation. I think that demand for this feature may be assessed against the VSCode sticky scroll feature. It is among the Top-Ranking Issues (All Time) with 324 👍
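
To make the windowed variant concrete, here is a rough Go sketch of how it might work: only the N lines above the match are scanned, and a line is kept when it is indented less than every line kept so far. The function name `windowOutline` and the YAML sample are hypothetical, and counting each leading space or tab as one column is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// windowOutline scans at most window lines above matchLine (1-based) and
// returns the numbers of the non-blank lines whose indentation is lower
// than that of every line kept so far, in ascending order.
func windowOutline(lines []string, matchLine, window int) []int {
	if matchLine < 1 || matchLine > len(lines) {
		return nil
	}
	indent := func(s string) int {
		return len(s) - len(strings.TrimLeft(s, " \t"))
	}
	cur := indent(lines[matchLine-1])
	first := matchLine - 1 - window // index of the topmost line in the window
	if first < 0 {
		first = 0
	}
	var kept []int
	for i := matchLine - 2; i >= first && cur > 0; i-- {
		if strings.TrimSpace(lines[i]) == "" {
			continue // skip blank lines
		}
		if w := indent(lines[i]); w < cur {
			kept = append(kept, i+1)
			cur = w
		}
	}
	sort.Ints(kept) // collected bottom-up, so sort into file order
	return kept
}

func main() {
	// Made-up YAML sample.
	lines := []string{
		"services:",
		"  web:",
		"    image: nginx",
		"    ports:",
		`      - "80:80"`,
	}
	fmt.Println(windowOutline(lines, 5, 2))  // window covers lines 3-4 only
	fmt.Println(windowOutline(lines, 5, 10)) // window covers the whole file
}
```

With a window of 2 only line 4 (`ports:`) is kept, while a window of 10 yields lines 1, 2 and 4. The small window bounds the work per match but can cut off outer levels of nesting.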

ecki · 2025-01-29T17:18:28Z

Dimitrii @tokarenko, can I ask a related question: what frontend do you use for Zoekt, and does it already show normal (i.e. ±2 lines) context in search results?

tokarenko · 2025-01-29T19:00:11Z

Bernd @ecki, I think that Neogrok should work according to their demo page. We have not tried Neogrok as we are trying to integrate Zoekt with our custom frontend.

MrEasy · 2025-01-30T08:11:02Z

Bernd @ecki, I think that Neogrok should work according to their demo page. We have not tried Neogrok as we are trying to integrate Zoekt with our custom frontend.

@tokarenko yes Neogrok looks good regarding this - thanks!

Example with 4 context lines:

jtibshirani · 2025-01-30T17:07:42Z

If I'm following the conversation right, there are actually two separate requests.

First, Zoekt can already return context around a match to help understand where the line sits in the file, through the SearchOptions.NumContextLines argument. If you set this, I'd also recommend trying SearchOptions.ChunkMatches = true. The ChunkMatches setting combines nearby matching lines into single chunks, which ensures your search results don't overlap with each other. Sourcegraph uses both these settings as the default (see www.sourcegraph.com/search).
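
As a rough illustration of why combining nearby matches avoids overlapping results, here is a small Go sketch. It is not Zoekt's actual implementation: `lineRange` and `mergeContext` are hypothetical names, and the sketch simply merges the context windows of matches whose windows overlap or touch.

```go
package main

import (
	"fmt"
	"sort"
)

// lineRange is an inclusive range of 1-based line numbers (hypothetical type).
type lineRange struct {
	Start, End int
}

// mergeContext expands each match line by numContext lines on both sides,
// clamps the window to the file, and merges windows that overlap or are
// directly adjacent, so that no line appears in two results.
func mergeContext(matchLines []int, numContext, totalLines int) []lineRange {
	sorted := append([]int(nil), matchLines...)
	sort.Ints(sorted)

	var out []lineRange
	for _, m := range sorted {
		r := lineRange{Start: m - numContext, End: m + numContext}
		if r.Start < 1 {
			r.Start = 1
		}
		if r.End > totalLines {
			r.End = totalLines
		}
		if n := len(out); n > 0 && r.Start <= out[n-1].End+1 {
			// Overlaps or touches the previous chunk: extend it.
			if r.End > out[n-1].End {
				out[n-1].End = r.End
			}
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	// Matches on lines 10, 12 and 30 with 2 lines of context each.
	fmt.Println(mergeContext([]int{10, 12, 30}, 2, 100))
}
```

Here the windows 8-12 and 10-14 collapse into a single chunk 8-14, while the match on line 30 stays in its own chunk 28-32.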

@tokarenko is asking for something a bit different, which is to provide a full file outline before the matched line based on indentation (inspired by https://limpet.net/mbrubeck/2010/01/12/outline-grep.html). As @keegancsmith said, we don't plan to support this natively in Zoekt, because we don't feel it's the right trade-off. However it should be possible to implement this client-side using SearchOptions.Whole, which returns the whole file, then parsing the result.

If you find the latency of loading whole files to be unacceptable, please file an issue and we are happy to look into optimizing it. We are also open to revisiting the decision if we hear from more users that something like "outline grep" is useful -- we'll revisit the complexity/utility trade-off.

ecki · 2025-01-31T01:36:02Z

First, Zoekt can already return context around

Yes, right, apologies for bringing that up here; it can’t be selected in the standard web frontend. But never mind, we switched to neogrok for that now.

tokarenko · 2025-02-14T07:18:19Z

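@ecki, @jtibshirani, we tried the following code between lines L296-L298 of contentprovider.go. We now get the context of indented lines without noticing any performance issues. I suggest reevaluating my feature request.

		// Indentation of the matched line: its length minus its length
		// with leading spaces and tabs removed.
		currStr := string(data[lineStart:nextLineStart])
		currIndent := len(currStr) - len(strings.TrimLeft(currStr, " \t"))
		if currIndent > 0 {
			// Context maps a 1-based line number to the text of that line
			// (assumed here to be a map field added to the match struct by this patch).
			finalMatch.Context = make(map[string]string)
			lines := strings.Split(string(data[:lineStart]), "\n")
			// The last element is the empty string after the final "\n",
			// so start from the line just above the match and walk upward
			// until an unindented line has been recorded.
			for i := len(lines) - 2; i >= 0 && currIndent > 0; i-- {
				line := lines[i]
				if strings.TrimSpace(line) == "" {
					continue // blank lines carry no indentation information
				}
				indent := len(line) - len(strings.TrimLeft(line, " \t"))
				if indent < currIndent {
					finalMatch.Context[fmt.Sprintf("%d", i+1)] = line
					currIndent = indent
				}
			}
		}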

jtibshirani · 2025-02-14T17:48:45Z

I still don't feel it's the right trade-off to add this natively to Zoekt. Have you tried implementing it on top of Zoekt using SearchOptions.Whole as I suggested? If the performance of that is okay, then that's the best way to go for now.

tokarenko · 2025-02-15T14:49:40Z

We decided not to even try transferring whole files for all the matches just to parse a few lines of context. It seems inefficient a priori. To get a single whole file to work with, we got a PR approved.
After the implementation I see no trade-offs here. Regarding the product feature, I still hold the opinion that a code search engine should provide adequate context for search results. Indentation-sensitive programming languages, like Python, require an indentation-aware context. The implementation complexity seems to be minimal.

keegancsmith closed this as completed Jan 20, 2025
