Question: is pure content-based detection possible? #186
I don't know all the details of the original linguist library that Enry emulates, but I think Enry uses more or less the same tagging strategy and requires a filename to seed the set of possible languages. It uses file contents to narrow the selection, but I don't believe linguist supports using the content alone. In general, one can get reasonable guesses from content alone: many languages have enough shibboleths to at least narrow the field. That's partly how file(1) works, for example. But that isn't how linguist is constructed, so it'd be a fairly substantial change. Cc @bzz in case I missed something.
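To make the shibboleth idea concrete, here is a minimal, hypothetical sketch in Go. It is not part of enry or linguist: the rule table and the name `guessByShibboleths` are invented for illustration. Each language gets a few regular expressions for constructs that rarely appear elsewhere, and a snippet is narrowed to the languages whose patterns match, ranked by how many matched. Real detectors such as file(1) rely on far larger rule sets, and even then this only narrows the field rather than giving a definitive answer.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// shibboleths maps a language to patterns rarely seen outside it.
// These rules are hypothetical and deliberately tiny.
var shibboleths = map[string][]*regexp.Regexp{
	"Go": {
		regexp.MustCompile(`(?m)^package\s+\w+\s*$`), // package clause
		regexp.MustCompile(`:=`),                     // short variable declaration
		regexp.MustCompile(`\bfunc\s+\w*\s*\(`),      // function declaration
	},
	"Python": {
		regexp.MustCompile(`(?m)^\s*def\s+\w+\s*\(.*\)\s*:\s*$`), // def f(...):
		regexp.MustCompile(`(?m)^\s*import\s+\w+\s*$`),           // import os
	},
	"C": {
		regexp.MustCompile(`(?m)^\s*#include\s*[<"]`), // preprocessor include
	},
}

// guessByShibboleths returns every language with at least one matching
// pattern, ordered by number of matches (most first), ties broken by name.
// An empty result means "no idea", not "not source code".
func guessByShibboleths(content string) []string {
	scores := map[string]int{}
	for lang, patterns := range shibboleths {
		for _, p := range patterns {
			if p.MatchString(content) {
				scores[lang]++
			}
		}
	}
	langs := make([]string, 0, len(scores))
	for lang := range scores {
		langs = append(langs, lang)
	}
	sort.Slice(langs, func(i, j int) bool {
		if scores[langs[i]] != scores[langs[j]] {
			return scores[langs[i]] > scores[langs[j]]
		}
		return langs[i] < langs[j]
	})
	return langs
}

func main() {
	snippet := "package main\n\nfunc main() {\n\tx := 1\n\t_ = x\n}\n"
	fmt.Println(guessByShibboleths(snippet))
}
```

Returning a ranked slice rather than a single answer mirrors the point above: content alone usually narrows the candidates rather than settling the question.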
@quasilyte TL;DR: there is one option you can try, but overall, the main use case for Enry is file-level language detection, so its high-level API always assumes the file name is present. I'm curious whether you could tell us a little about your use case here, where file names are not available? More specifically, as @creachadair noted above, enry follows the design of GitHub linguist, so it consists of a number of strategies for detecting the language of a given file. Typically, if there is no single 100% match, all strategies are applied sequentially to narrow down the possible options. They are:
There is one strategy, though. The good news is that it's exposed as a low-level public API, and you can try using it directly. The bad news is that, since it's a "last resort" strategy, its API is designed to disambiguate between a given set of languages (typically, those guessed by previous strategies). So you have to pass in a slice of potential language name aliases. If you pass in aliases for all possible languages (…

Hope this helps! And I'm looking forward to learning more about your use case.
For example, say you have extracted various snippets from a book, a README, or any other source. Instead of rolling something of my own, I tried enry.
It does help. Thank you!
Closing as there is no further discussion and the question seems to be answered. |
If a code snippet is extracted from somewhere and you no longer have the original filename, it seems impossible to detect its language with enry.
For example, a user might have source code like:
It could be inferred to be Go.
I guess this falls outside of enry's features and responsibilities?