Automatic language classification for Untitled files #118455
Some related issues:
languages-guessing
|
I would actually also use this for the CLI support for reading from stdin. Currently we create a tmp txt file that contains the contents that are piped into VS Code, but it would be awesome if we could change the language mode based on contents 👍 |
@bpasero cool, glad there are more use cases. |
@TylerLeonhardt did a great experiment. We plan to continue on this next milestone, thus assigning to April |
Yeah unfortunately opening an untitled file from piping is currently not an option because we have no good way of talking from the one process directly into an untitled file. But you are probably right that this would be the right solution for that issue. However, should we also consider supporting language detection for plain text files? I wonder if the detection should also run when you open a txt file without having a language mode set. If we think that makes sense, it would solve the case for piping too. |
In my eyes a .txt file is a plaintext file, so I don't really think that scenario needs to be addressed, but if we were to run the detection, we'd need to:
|
I am not sure the file extension needs to change: we can change the language today in any file without the need to change the file extension. This is also persisted between restarts for as long as the editor is not closed. |
I'm not sure all language runtimes can handle non-expected file extensions (PowerShell can't, for example) so if you try to F5 debug a file, and that file is a I'd want the language mode to change and then F5 debugging/running in their terminal ( |
Yeah I think that is fine, I actually think #41614 is doable with some file watching tricks so I guess it is another good reason to look into it eventually. |
Assigning also @TylerLeonhardt since I know he is interested, and I will be on vacation for half of June and most of July. |
After discussion with @joaomoreno here are some of the things we should do in order to successfully ship this as part of VS Code:
@TylerLeonhardt feel free to update this list and let me know what you think. FYI @yoeo
A couple of pre-reqs:
Also if you'd like to play with the experience today (not fine tuned, but works...) the code is published as an extension here: |
Moving this to July, as there are a few things above that we are blocked on and I don't think we'll figure them out in the next week or so. |
Is it possible to increase the accuracy using language heuristics data from github/linguist? |
@4086606 Good idea. However, that library is a Ruby library, so we would have a dependency on Ruby, which is a no-go for us since we want this working in the browser. Unless there is a JS alternative? |
Should be able to compile the entire gem to WASM, but I do wonder about size |
If you succeed do let us know how it went and the size. Though I see this as step 2, only in the case that the initial model classification does not prove super accurate. |
It seems like we will need to take the YAML files from that library and reimplement the heuristics logic. You're right that it should be a step 2, seeing as it's rather blind to content and the accuracy is limited. |
@isidorn didn't work out - github/linguist can only disambiguate file extensions. Guesslang could use the linguist samples to support every single GitHub language |
I was able to address the pre-reqs and published a new version of my extension that doesn't make a single network request: Next order of business is to understand how to bring this into Core. Also I used @yoeo's updated model with extra language support and JSON + YAML is sooooo nice. |
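For illustration only, here is a rough sketch of what "reimplementing the heuristics logic" could look like in TypeScript. The `HeuristicRule` and `Disambiguation` shapes, the `disambiguate` function, and the regex patterns are all hypothetical and made up for this example; they are not copied from github/linguist's YAML files. Note that this kind of rule is keyed on a file extension, so it assumes an extension is known at all.

```typescript
// Hypothetical rule shape: each entry lists candidate languages for a set of
// ambiguous extensions. The patterns below are illustrative, not linguist's.
interface HeuristicRule {
  language: string;
  pattern?: RegExp; // a rule without a pattern acts as the fallback
}

interface Disambiguation {
  extensions: string[];
  rules: HeuristicRule[];
}

const disambiguations: Disambiguation[] = [
  {
    extensions: [".h"],
    rules: [
      { language: "Objective-C", pattern: /^\s*@(interface|protocol|end)\b/m },
      { language: "C++", pattern: /^\s*(template\s*<|class\s+\w+|namespace\s+\w+)/m },
      { language: "C" }, // fallback when nothing more specific matches
    ],
  },
];

// Returns the first language whose rule matches the content, or undefined
// when the extension has no disambiguation entry.
function disambiguate(extension: string, content: string): string | undefined {
  const entry = disambiguations.find((d) => d.extensions.includes(extension));
  if (entry === undefined) {
    return undefined;
  }
  for (const rule of entry.rules) {
    if (rule.pattern === undefined || rule.pattern.test(content)) {
      return rule.language;
    }
  }
  return undefined;
}
```

Rules are tried in order, so the more specific patterns go first and the pattern-less rule last.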
Closing in favor of #129004 |
We could do automatic language classification for Untitled files so users do not have to explicitly choose what language mode to use.
There are already modules for this in Python: https://github.com/yoeo/guesslang
However it would be cool to have something in JavaScript. One idea is to use TensorFlow.js and try to reuse the guesslang model.
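As a minimal sketch of the step that would come after classification, assuming the model yields a confidence score per language: the code below picks a language mode only when the top score clears a threshold, and otherwise leaves the file alone. The `LanguageScore` type, the `pickLanguage` function, and the `0.5` default threshold are assumptions made for this example. Loading and running the model with TensorFlow.js is not shown.

```typescript
// Hypothetical result shape: one confidence value per candidate language.
interface LanguageScore {
  languageId: string;
  confidence: number; // assumed to be in the range 0..1
}

// Returns the most confident language if it reaches minConfidence,
// otherwise undefined (meaning: keep the current language mode).
function pickLanguage(
  scores: LanguageScore[],
  minConfidence = 0.5 // assumed default, would need tuning in practice
): string | undefined {
  let best: LanguageScore | undefined;
  for (const score of scores) {
    if (best === undefined || score.confidence > best.confidence) {
      best = score;
    }
  }
  if (best !== undefined && best.confidence >= minConfidence) {
    return best.languageId;
  }
  return undefined;
}
```

Returning `undefined` for low-confidence results keeps an Untitled file in its current mode rather than switching to a poor guess.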
Nice demo by Tyler:
screen_recording_2021-03-17_at_9.27.13_pm.mov