Language detection #119325

isidorn · 2021-03-19T13:24:29Z

Creating this PR just as a sketch. After some discussions with teammates from Zurich, here are some thoughts:

It defeats the purpose to have this functionality in an extension, since nobody will discover the extension and this functionality should help new users.

Here are the things I think we need to do in order to have this in the core:

We need to convert the model to a TensorFlow.js model with no Node dependency, so that it can run in the browser. Without this, even VS Code desktop would have to run the classification in the shared process, since we are moving towards Node-free renderers, and putting additional work in the shared process is not good.
We should look into reducing the size of the model so we do not increase the install size too much. @TylerLeonhardt I think you already started looking into this.
I believe this work is best suited for a new service in VS Code, for example ILanguageDetectionService. I created a skeleton for this and can look further into integrating it on the vscode side. @isidorn
Figure out how often to call the service (after n characters) and what confidence threshold we should use.
We need to measure how long model loading and classification take.
We need to measure the memory footprint of model loading.
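As a rough illustration of the "after n characters" and confidence items above, here is a minimal sketch of a gate placed in front of a classifier. It is not part of this PR: `DetectionResult`, `Classifier`, `shouldClassify`, `acceptResult` and `createGatedDetector` are hypothetical names, and the `confidence` field is an assumption, since the sketched `detectLanguage` only returns a language id.

```typescript
// Hypothetical result shape; the confidence field (assumed to be in [0, 1])
// is an assumption for this example and not something the PR defines.
interface DetectionResult {
	languageId: string;
	confidence: number;
}

// Stand-in for whatever ends up running the model.
type Classifier = (content: string) => Promise<DetectionResult | undefined>;

// True when at least `minNewCharacters` were added or removed since the last run.
function shouldClassify(lastLength: number, currentLength: number, minNewCharacters: number): boolean {
	return Math.abs(currentLength - lastLength) >= minNewCharacters;
}

// Returns the language id only when the classifier is confident enough.
function acceptResult(result: DetectionResult | undefined, minConfidence: number): string | undefined {
	if (result === undefined || result.confidence < minConfidence) {
		return undefined;
	}
	return result.languageId;
}

// Wraps a classifier so that it runs only after enough text has changed.
function createGatedDetector(classify: Classifier, minNewCharacters: number, minConfidence: number) {
	let lastLength = 0;
	return async function detectLanguage(content: string): Promise<string | undefined> {
		if (!shouldClassify(lastLength, content.length, minNewCharacters)) {
			return undefined; // too little has changed; skip the model call
		}
		lastLength = content.length;
		return acceptResult(await classify(content), minConfidence);
	};
}
```

Both thresholds are left as parameters because the right values are exactly what the items above say still needs to be figured out.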

If we get the above working in a good way then we need to:

Look into supporting JSON, XML and other currently unsupported languages
Try to improve corner-case classification, like JS/TS confusion

If all of this proves to be too much overhead we can look into some simpler heuristic to detect a language and not use machine learning.
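To make the non-ML fallback concrete, one possible shape for such a heuristic is a small table of regular expressions scored per language. This is only a sketch under assumptions: the rules, the language ids and the `detectByHeuristic` name are hypothetical and far too coarse for real use.

```typescript
// Hypothetical rule table: each language id maps to patterns that hint at it.
const rules: { languageId: string; patterns: RegExp[] }[] = [
	{ languageId: 'json', patterns: [/^\s*[{\[]/, /"[^"]*"\s*:/] },
	{ languageId: 'xml', patterns: [/^\s*<\?xml\b/, /<\/[A-Za-z][\w:.-]*>/] },
	{ languageId: 'python', patterns: [/^\s*def\s+\w+\s*\(.*\)\s*:/m, /^\s*import\s+\w+/m] },
	{ languageId: 'typescript', patterns: [/\binterface\s+\w+\s*\{/, /:\s*(string|number|boolean)\b/] },
];

// Picks the language whose patterns match most often; returns undefined
// when no language reaches `minMatches` matching patterns.
function detectByHeuristic(content: string, minMatches = 2): string | undefined {
	let best: string | undefined;
	let bestScore = 0;
	for (const rule of rules) {
		const score = rule.patterns.filter(pattern => pattern.test(content)).length;
		if (score > bestScore) {
			best = rule.languageId;
			bestScore = score;
		}
	}
	return bestScore >= minMatches ? best : undefined;
}
```

Requiring at least two matching patterns is an arbitrary choice here; it plays the same role as the confidence threshold for the model-based approach.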

@TylerLeonhardt let me know what you think. Feel free to add your name to the items you are interested in, and to add any items I might have forgotten.

fixes #118455

jogo- · 2021-05-12T12:38:13Z

src/vs/workbench/services/languageDetection/common/languageDetection.ts

+export interface ILanguageDetectionService {
+	readonly _serviceBrand: undefined;
+
+	detectLanguage(contet: string): Promise<string | undefined>;


Typo: contet -> content

TylerLeonhardt · 2021-07-19T22:31:44Z

Closing this in favor of #128708

add service sketch

5357202

isidorn added the under-discussion label Mar 19, 2021

isidorn added this to the April 2021 milestone Mar 19, 2021

isidorn assigned TylerLeonhardt and isidorn Mar 19, 2021

isidorn marked this pull request as draft March 19, 2021 13:31

isidorn mentioned this pull request Mar 19, 2021

Automatic language classification for Untitled files #118455

Closed

isidorn modified the milestones: April 2021, May 2021 Mar 29, 2021

jogo- reviewed May 12, 2021


TylerLeonhardt modified the milestones: May 2021, June 2021 May 28, 2021

TylerLeonhardt modified the milestones: June 2021, July 2021 Jun 16, 2021

TylerLeonhardt closed this Jul 19, 2021

github-actions bot locked and limited conversation to collaborators Sep 7, 2021

isidorn deleted the isidorn/languageDetection branch November 2, 2023 14:17
