Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search provider API feedback #50788

Closed
chrmarti opened this issue May 30, 2018 · 14 comments
Closed

Search provider API feedback #50788

chrmarti opened this issue May 30, 2018 · 14 comments
Assignees
Labels
search Search widget and operation issues under-discussion Issue is under discussion for relevance, priority, approach
Milestone

Comments

@chrmarti
Copy link
Contributor

Testing #50497

  1. provideFileSearchResults comes without query argument, that will prevent implementations from optimizing the amount of data they need to provide. E.g., a large remote filesystem might prefer doing the filtering based on the user's input on the remote side. Or, e.g., a local search engine might prefer to run in a separate process (for performance or runtime reason) and benefit from doing the filtering there. Also include a maxResults: number option to further help limit the amount of data being passed around (we currently only show 500 entries in QuickOpen, IIRC).

  2. previewOptions is on SearchOptions, should it be on TextSearchOptions?

  3. There was some concern around using strings instead of URIs. Maybe the performance impact needs to be better understood. Although with large filesystems and a long workspace folder path, repeating the workspace folder path can have a significant memory overhead. /cc @jrieken

  4. I introduced an exists option at one point to optimize the workspaceContains trigger, but that could also be passed as maxResults: 1 (See 1.).

  5. includes and excludes are a specific glob mini-language. Maybe we should have type Glob = string; with the documentation that is currently on GlobPattern?

  6. Should provideFileSearchResults be mandatory? Listing files seems almost like a subset of search for text in the same files.

Documentation will go a long way in making this API understood. We should take a fresh look when that is in place.

@jrieken
Copy link
Member

jrieken commented May 30, 2018

There was some concern around using strings instead of URIs.

URIs must be used because that's our way to knowing what's a file path. I understand the concern but I measure (without easy optimisations we have discussed) that toJSON-fying 10000 uri takes ~40ms. Hardly a blocker, but you might have measure something else.

@roblourens roblourens added search Search widget and operation issues api-proposal labels May 30, 2018
@roblourens roblourens added this to the June 2018 milestone May 30, 2018
@roblourens
Copy link
Member

  1. I think it makes sense to omit 'query' so filtering will be consistent across different providers. We could have one provider doing a fuzzy search, another just doing a substring match, another doing something else, and it would make quickopen seem broken. And this lets us just invoke the provider once, so we can own and optimize the cache filtering logic to make quickopen fast. I agree about maxResults, wrote notes about my plans here: Search Provider extension API #47058 (comment)

  2. Yes...

  3. I'll revisit this. Besides the toString issue, there was also slowness around the unnecessary string operations from constructing, deconstructing, then reconstructing URIs for every result. File search builds a tree of relative path fragments to do its searching and matching, and it was simpler to just work in relative paths.

  4. Yes

  5. Good idea. I could also reuse GlobPattern although the 'base' part is redundant.

  6. You could have separate providers for text/file search. I don't want to limit anyone.

roblourens added a commit that referenced this issue May 30, 2018
@chrmarti
Copy link
Contributor Author

  1. I agree. Still, requiring the provider to send a complete listing of files will severely limit the performance for remote filesystems. Can we foresee a second method for file search that would take a query parameter? We can then add that later in case there is real demand for that.

@roblourens
Copy link
Member

  1. I'm open to it, we will get a chance to validate this with the live share extension.

@hansonw
Copy link

hansonw commented May 31, 2018

+1 to #1. We've been testing the remote filesystem API at Facebook with filesystems that have more than 1 million files, so returning the full list of files is a non-starter unfortunately :( We'd appreciate going back to the older API that includes the query string. cc @siegebell @bolinfest

sbatten pushed a commit to sbatten/vscode that referenced this issue Jun 4, 2018
* Merge NPM Scripts: Added configuration option to change default click action microsoft#49282

* fix microsoft#50560

* Fix getmac test (microsoft#48804)

* Cleaning up some typos in vscode.d.ts and vscode.proposed.d.ts (microsoft#50533)

* yarn.lock changes

* grid - log serialized state if restoring failed

* Fixes microsoft#50382: Align .mtku with .detected-link

* debug issues back to isidorn

* Hide interactive playground commands from command palette (fixes microsoft#49706)

* grid - drop outline as focus indication and rely on dimmed state

* grid - allow to close empty group when closing all editors of a group

* grid - get preferredSize from grid

* Fixes microsoft#43003: Cannot specify a global problemMatcher in tasks.json

* cleanup integrated terminal support for debugging

* grid - action wording tweaks

* microsoft#45663 Use core translations asset

* grid - allow to close dirty diff editor if master is still opened in same group

* Support to run npm install from a package node

* grid - log previous UI state upon error

* grid - stronger border top color

otherwise when a tab is showing in a vertical split, the border collides with the sash border

* remove todo

* do not append logLevel

microsoft#47774

* Implement Go To Next/Previous Breakpoint editor actions

* Include enabledOnly in debug model interface

* remove from schema

microsoft#47774

* microsoft#45663 fix compilations

* change openBreakpointSource to use IBreakpoint

* return passed context if resourceOrContext is not useful

* fix off by one issue

* Simplify implementation

* return context if it has a groupId

* polish

* Fixes microsoft/monaco-editor#891: Focus editor when returning from context menu

* go to next / previosu breakpoint minor polish

* Add Endgame for 1.24

* fix index computing for editor close case

* Fix microsoft#49145

- Enable/Disable viewlets by views when context
- Move viewlet enablement/disablement to views service
- Fix - adopt if a viewlet extension is disabled/removed

* Avoid command palette entry (microsoft#49340)

* [handlebars] code-folding algorithm is less useful than Indentation-based one. Fixes microsoft#48457

* use upper lang id

* Fixes microsoft#49378

* microsoft#49145 Store placeholder states for all viewlets

* Add `logicalLine` argument to `mutlicursor.insertBelow` and `mutlicursor.insertAbove`

* Removed duplicate loop

* eliminate code duplication; use resolveWorkspaceFolder()

* microsoft#45663 Get all asset types

* [css] update service

* [html] update services

* grid - fix compile

* add User-Id to marketplace build requests

* Improve rendererType description

Fixes microsoft#50617

* fix microsoft#50672

* Fixes microsoft#49173

* fix microsoft#50321

* fix microsoft#50670

* Install option not needed when we have Install and Restart Fixes microsoft#50669

* Fixes microsoft#50616, Showing/hiding details in Issue Reporter resets scroll position

* Fix microsoft#50642, fix microsoft#50673

* Issue reporter and process explorer windows should close when the parent window closes, fixes microsoft#50716

* Issue Reporter should clear the error state when selecting an extension, fixes microsoft#50702

* Extracting methods

* Make sure we update the webview's state on setState

* Avoid prompting for same file type in the same session microsoft#50537

* Use language name instead of locale in the prompt microsoft#50738

* Use correct failed error code for webview resource load error

* Mark unused parameter

* Restore restart

* Try serializing webviews even if they have not called setState

Fixes microsoft#50752

* Change parameter name to match interface name

Fixes microsoft#50745

* Fix non-serializable webviews being revived if they called setState

* Include recommended extensions in telemetry events to determine success

* Fix microsoft#49777 - Emmet balance In after balance out should go back to initial selection and not first child (microsoft#49996)

* Update Emmet - balance.ts

* Cover all cases

* Update prompt msg Fixes microsoft#50738

* Fix microsoft#50632

* search-rg has no icon

* EH search - replace search.enableSearchProviders with a search-rg specific setting

* fix microsoft#50742

* fix microsoft#50691

* part of microsoft#50760

* smoke: uncomment debug suite

* smoketest: use yarn

* grid - allow to close empty groups via middle click (microsoft#50733)

* careful with path comparisons on windows

related to microsoft#50760

* smoke: run yarn verbose

* fix microsoft#50710

* Improve description (microsoft#43208)

* fix microsoft#50722

* Fix microsoft#50663

* Fix microsoft#47586

* fixes microsoft#50609

* fixes microsoft#50026

* 💄

* fix  microsoft#50773

* improve perf of findId and getElementById, microsoft#50575

* don't show outline when there are more then 7500 symbols, microsoft#50575

* fix typos

* polish for microsoft#50667

* fix microsoft#50700

* grid keep sizing after branch node demotion

fixes microsoft#50564

* grid - for now remove editorGroup.activeEmptyBackground

* fix microsoft#50726

* grid: resize views when branch is demoted

fixes microsoft#50675

* fix microsoft#50727

* fix microsoft#50734

* remove preview rant from search.location

* splitview: dont propagate white-sapce property

fixes microsoft#50720

* fix microsoft#50774

* Fixes microsoft#50296

* fix  microsoft#50207

* Fixes microsoft#24670: Add API to determine if the task service supports parallel task execution.

* fix microsoft#50741

* Fixes  microsoft#49400: Cannot read property 'then' of undefined

* Fix microsoft#50717

* Fix microsoft#48901

* polish debug.toolBarLocation

fixes microsoft#50786

* Fix microsoft#50717

* missing compilation

* Fix microsoft#48225

* fix microsoft#50711

* Fix errors installing extension from Welcome page (fixes microsoft#50753)

* fixes microsoft#50794

* fixes microsoft#50787

* continuous build: use vsts npm feed

* Fix for microsoft#50792: empty pre post scripts not rendered properly

* Fix microsoft#50728

* continuous build: use vsts feed

* Fix for microsoft#50792 npm exclude doesn´t work for folder names

* Move previewOptions, microsoft#50788

* build: remove npm auth

* better fix for microsoft#50710

* Revert "Merge pull request microsoft#49790 from usernamehw/raw_settings"

This reverts commit f92a162, reversing
changes made to fc710af.

* Remove unused variable

* Fix regression in C# TextMate grammar related to lambda parameters

This tweak fixes a regression recently introduced in the C# TextMate grammar around
implicitly-typed lambda parameters.

* dotnet/csharp-tmLanguage#119
* dotnet/csharp-tmLanguage#120

* Fix parameter not being escaped

Fixes microsoft#49057

* wip: grid.getNeighborViews

* Make sure we use normalized path for compare

Fixes microsoft#50760

(the path should already have been normalized at this point so I don't think there is any user impact)

* Clean up importModuleSpecifier wording

Fixes microsoft#50817

* Remove duplicate strings and small cleanup

* Spelling fix Ouoting -> Quoting

* Restore default settings editor keybinding

* Dont trigger emmet for # in selectors Fixes microsoft#49269

* Hide settings editor feedback button in stable

* Use better colors for markdown code blocks

* Moving webview api back to proposed for more polishing

microsoft#49022

* Shorten showUnused setting name

Fixes microsoft#50648

* fix microsoft#50678

* Don't include imports in js/ts document symbol results

Fixes microsoft#50829

* smoketest: remove verbose from yarn

* smoketest: clear state from failing search tests

* minor rename

* Fix markdown preview not setting context properly on first creation

Fixes microsoft#50558

* smoketest: update to package-lock.json

fixes microsoft#50857

* smoketest: bring back search tests

* microsoft#48901 Expand by default only in custom viewlets

* fixes microsoft#50812

* open editors: clear focus when group without editors is active

fixes microsoft#50563

* Fix MaxNumber incrementFileName bug

* Fixes microsoft#48758: Do not import standalone editor modules

* fixes microsoft#50864

* Use `MAX_SAFE_SMALL_INTEGER`

* Fixes microsoft#50858: Task api is still marked as proposed

* fixes microsoft#50737

* fix microsoft#50867

* Fix: showQuickPick keeps showing infinite progress (fixes microsoft#50863)

* Clarify for translation (fixes microsoft#50634)

* fixes microsoft#50870

* fix microsoft#50479

* grid - todo@grid => todo@ben

* grid - adopt new getNeighborViews() method

* grid - add workbench.action.closeEditorsAndGroup

* open editors: close group action

fixes microsoft#50625

* open editors: use actionRunner for contexts

fixes microsoft#50621

* [css][html] update service

* grid: getNeighborViews

* grid - adopt wrapping

* Some menu action clients (ActionBar) handle alternative actions on their own so the alternative actions should be ignored

fixes microsoft#50581

* fillInActionBarActions and fillInContextMenuActions for clarity

* Remove inconsistent use of active in terminal API

Related microsoft#48434

* Try using fixAllDescription for js/ts quick fixes

Fixes microsoft#42549

* Fixes microsoft#50828, code-insiders --status triggered exception in main process

* Pick up ts 2.9.1 final

* Fix for microsoft#50868 and microsoft#50884

* Still gray out vars even if user has disabled suggestions

Fixes microsoft#50890

* grid: fix neighbor npe issue

* grid: more getNeighborViews tests

* grid: use boundaries for neighbor finding

* grid: getNeighborViews perf

* grid: add todo

* 2018-05-31. Merged in translations from Transifex.

* Update GDPR annotation

* update message

* Add specific `refactor.move` scope for js/ts move to new file action

* Use localized language name when available

* Add proper contribution schema for ts server pluguins

microsoft#50682

* Fix extra word

* Fix microsoft#50909 - add link to new settings editor from old one

* Left-align settings editor preview prompt

* grid - fix issues with touchbar updating

* grid - use viewColumnToEditorGroup in more places

* grid - close all groups should preserve first group

* Reapply fix for microsoft#50790

* grid - assign some default keybindings for focus group

* Fixes microsoft#50763: All running Visual Studio code instances shutdown all at once

* grid - 💄

* fixes microsoft#50926

* show alternative actions when alt is pressed, not only on mouse over

* open editors: click on groups should activate them

* update description (for microsoft#50677)

* Fix microsoft#50835

* grid - tweak some groups access order

* one more time, fix microsoft#50321

* beware of keybinding command null arguments

fixes microsoft#50821

* grid - prevent multiple confirmation dialogs for the same editor to show up (for  microsoft#50539)

* smoketest: go-to-definition is flaky

fixes microsoft#50944
fixes microsoft#49107

* Fixes microsoft#50943: TSC version 2.9.x in watch mode prints different end compile message

* grid - fix bad === for editors

* 💄 grid, cleanup getViews()

* Update distro commit and third party notices

* Use localizedLanguageName instead of languageNameLocalized which doesnt exist

* Remove now unneeded any

* Make sure unused diags are still updated when suggestions are disabled

Fixes microsoft#50890

* Fix unused var

* Update version

* Update issue templates, microsoft#49380

* Fix microsoft#50893 - escape regex chars when copy search string from editor

* Fix microsoft#50985 - don't take editor text when toggling search details
@roblourens
Copy link
Member

I prototyped this today in https://github.com/Microsoft/vscode/tree/roblou/cachedSearchProvider

provideFileSearchResults also gets a query, typed like

export interface FileSearchQuery {
	pattern: string;
	cacheKey?: string;
}

where cacheKey is different between quickopen instances. The provider could ignore this, but this lets the provider know whether it can be filtering down the same set of results, or must start a fresh search.

To get the caching, fuzzy matching, and sorting code into the extension, I had to copy several hundred lines of code from other places in vscode into the extension. This code should stay in sync with changes on the vscode side. Since we don't have another way to share code between vscode and builtin extensions, we should consider copying that shared code into another repo and publishing it as an npm module.

Then since the implementation is complicated, we should publish this code as a library that search providers can use if they want to use the simpler API (just list out all files in the workspace) and be guaranteed to fuzzy filter in a consistent way. It will also be useful for what live share needs to do - they can use this library on the host side, hook it up to findFiles... and that's it.

Implementing sibling clauses gets complicated here. We decided that they should be hidden from the SearchProvider API and implemented by vscode. Currently we get files from the provider once and run them through the list of include/exclude patterns, including sibling clause patterns. This is slow but only happens once. With this new model, we will have to run all results through glob.ts on every keypress in quickopen which will be slightly slower.

@roblourens
Copy link
Member

@hansonw, we are still discussing what the API should look like and I would like to get a better feel for the needs of implementors. Can you tell me more about how you would implement search against a remote filesystem? Is there a preexisting API that you would like to invoke on every keypress, and is it able to handle fuzzy matching and include/excludes by glob pattern?

Feel free to email or DM me if you'd prefer to discuss offline.

@hansonw
Copy link

hansonw commented Jun 13, 2018

@roblourens Yup, that's correct! You can take a look at what we've done with Atom + Nuclide: here is the "queryFuzzyFile" API that we've implemented right now (which works with remote filesystems):

https://github.com/facebook/nuclide/blob/master/pkg/nuclide-fuzzy-file-search-rpc/lib/FuzzyFileSearchService.js#L71

We've built a custom native library (https://github.com/facebook-atom/nuclide-prebuilt-libs/tree/master/fuzzy-native) to store the 1M+ files in memory and do fuzzy matching with a similar API to the above.

@roblourens
Copy link
Member

Thanks, I see that you have some sort of cache for results in FileSearchProcess.js, what causes that cache to be invalidated? We will probably include a cacheKey in the call to a search provider which is the same while the user is typing in the same quickopen instance.

Also I see that you have a list of ignoredFiles, are those just filenames or can they be glob patterns or regexes? Our API has negative patterns (excludes) and positive patterns (includes), would you support both of those? We can either leave checking these patterns entirely up to the provider, or we can "double-check" all the results that come back against all the patterns, which will ensure that they're applied consistently.

@eamodio
Copy link
Contributor

eamodio commented Jun 18, 2018

Fyi here are a couple of other search provider implementations:
https://github.com/eamodio/vscode-remotehub/blob/feature/search/src/gitHubSearchProvider.ts

https://github.com/eamodio/vscode-remotehub/blob/feature/search/src/sourcegraphSearchProvider.ts

One thing, I'm currently not dealing with in either of those implementations, is paging -- since these api, won't always return all the results. It is expected that paging of results would be something that would have to be inside the provider and all results would have to wait for the looping through pages to get them all?

@roblourens
Copy link
Member

I'll close this in favor of the master issue #47058. I've implemented almost everything brought up here - file search now gets the query, both APIs are URI-based, added some docs. There's still a TODO with maxResults and hitLimit but there's a note in the other issue about that.

@roblourens roblourens added under-discussion Issue is under discussion for relevance, priority, approach and removed api-proposal labels Jul 4, 2018
@roblourens
Copy link
Member

@hansonw I'm trying to make sure I understand one more thing about your needs for search. It looks like you create a cache of file paths in initFileSearchForDirectory which you then search inside. When exactly is this cache created/invalidated?

I ask because internally vscode has used a similar cache that is created when quickopen is opened, then we search in the cache as the user types, and is destroyed when quickopen is closed. I'm trying to understand whether this needs to be reflected in the API. In other words, do you need to know about this search "session" to manage a cache, or would it fit your needs to have a stateless API that has no "cache key" or state, and is invoked on each keypress during quickopen?

@hansonw
Copy link

hansonw commented Jul 16, 2018

Hi @roblourens, apologies for the delayed response! That's actually not really a cache, per se - we start up a process for each project root, which holds the list of files in memory (in a native Node modules, which allows very fast multi-threaded searching over all files.) It wouldn't really make sense to tear down & restart this process for every unique session, so it manages caching internally in its own way (for example, we re-filter the results of the previous query if it was a prefix of the current query).

Only if the user changes the include/exclude flags do we restart the process. We expect these config changes to be very infrequent so it's easiest to reflect these flags by having a different file index, rather than paying the cost of applying the filter at query time.

To answer your question more directly: I don't think we really need to know about the session & cache key. We'd be fine with a stateless API that is just invoked on each keypress.

@roblourens
Copy link
Member

Makes sense, thanks for the explanation.

@vscodebot vscodebot bot locked and limited conversation to collaborators Aug 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
search Search widget and operation issues under-discussion Issue is under discussion for relevance, priority, approach
Projects
None yet
Development

No branches or pull requests

5 participants