Content-based filetype detection #726

Open
scop opened this issue May 21, 2023 · 2 comments
Labels
question Uncertainty is involved

Comments

@scop
Contributor

scop commented May 21, 2023

It seems file type detection is based on only the file (base)name and its extension.

It would be nice to improve on that by peeking into the file content and taking into account, for example:

  • Shebang, if any
  • File magic
  • Emacs style -*- something -*- embedded configs
  • Vi and friends configs like `ex: filetype=sh` (ditto for `vi:` at least)
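As a rough illustration of the last two items, here is a minimal Rust sketch that pulls a mode name out of an Emacs `-*- ... -*-` line or a Vi-style `ex:`/`vi:`/`vim:` line. The function names `emacs_mode` and `vi_filetype` are hypothetical and not part of this project, and the matching is deliberately loose: the real editors have stricter rules about where these markers may appear, so treat this as a starting point only.

```rust
/// Emacs file variables: `-*- mode: sh -*-` or the short form `-*- sh -*-`.
fn emacs_mode(line: &str) -> Option<&str> {
    let (_, rest) = line.split_once("-*-")?;
    let (vars, _) = rest.split_once("-*-")?;
    let vars = vars.trim();
    if !vars.contains(':') {
        // Short form: the whole body is the mode name.
        return (!vars.is_empty()).then_some(vars);
    }
    // Long form: `key: value` pairs separated by `;`.
    vars.split(';').find_map(|var| {
        let (key, value) = var.split_once(':')?;
        key.trim().eq_ignore_ascii_case("mode").then(|| value.trim())
    })
}

/// Vi-style modelines: `vi: filetype=sh`, `ex: ft=sh`, `vim: set ft=sh :`.
/// Simplification: the marker is accepted anywhere in the line.
fn vi_filetype(line: &str) -> Option<&str> {
    let (_, options) = ["vim:", "vi:", "ex:"]
        .iter()
        .find_map(|marker| line.split_once(*marker))?;
    options
        .split(|c: char| c.is_whitespace() || c == ':')
        .find_map(|opt| {
            opt.strip_prefix("filetype=")
                .or_else(|| opt.strip_prefix("ft="))
        })
        .filter(|ft| !ft.is_empty())
}
```

The returned names are whatever the file author wrote (`sh`, `c++`, `rust`, ...), so they would still need to be mapped onto the file type names used internally.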

I imagine something like https://github.com/ebassi/xdg-mime-rs could be useful at least for the two first items in the above list.

@epage
Collaborator

epage commented May 22, 2023

Concerns for each of these:

  • Translating what we find to our standard term for a file type
  • Deciding precedence
  • Whether to allow the user to extend it and how
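To make the first two concerns concrete, here is a small Rust sketch of one possible shape: each detector reports a raw name tagged with where it came from, an alias table translates it to a canonical type, and a fixed ordering decides which signal wins. Everything here is an assumption for discussion: the `Source` enum, the precedence order (modeline, then shebang, then extension), and the alias table are all hypothetical and do not reflect what the project does today.

```rust
/// Where a candidate name came from. Variant order encodes an assumed
/// precedence: earlier variants win over later ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Source {
    Modeline,
    Shebang,
    Extension,
}

/// Hypothetical alias table from detected names to canonical type names.
fn canonical(name: &str) -> Option<&'static str> {
    match name.to_ascii_lowercase().as_str() {
        "sh" | "bash" | "dash" | "shell-script" => Some("sh"),
        "python" | "python3" | "py" => Some("py"),
        "rust" | "rs" | "cargo" => Some("rust"),
        _ => None,
    }
}

/// Picks the highest-precedence candidate that maps to a known type.
fn resolve(candidates: &[(Source, &str)]) -> Option<&'static str> {
    let mut sorted = candidates.to_vec();
    sorted.sort_by_key(|(source, _)| *source);
    sorted.into_iter().find_map(|(_, name)| canonical(name))
}
```

A candidate that does not map to a known type is skipped rather than treated as a conflict, so an unrecognized modeline falls through to the shebang or extension. User extensibility could then amount to letting configuration add entries to the alias table, though that is only one option.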

Shebang, if any

This will be important for cases like rust-lang/rfcs#3424

Specifically for shebangs, we'd need to decide how to parse the various cases (whether /usr/bin/env is used, flags passed to it, whether it's in a different path, etc.).
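A minimal Rust sketch of how those cases might be handled, assuming we only care about the interpreter's base name: take the first word after `#!`, strip its directory, and if it is `env`, skip flags and `VAR=value` assignments until the first real program name. The function names are hypothetical, and this does not handle `env` flags that take a separate argument (such as `-u NAME`) or a program glued directly onto `-S`.

```rust
/// Returns the interpreter name from a first line such as
/// `#!/bin/sh -e` or `#!/usr/bin/env -S python3 -u`, if there is one.
fn shebang_interpreter(first_line: &str) -> Option<&str> {
    let rest = first_line.strip_prefix("#!")?;
    let mut words = rest.split_whitespace();
    // Ignore the directory so `env` is recognized in any location.
    let program = basename(words.next()?);
    if program != "env" {
        return Some(program);
    }
    // Skip env flags (`-S`, `-i`, ...) and `VAR=value` assignments.
    words
        .find(|w| !w.starts_with('-') && !w.contains('='))
        .map(basename)
}

/// Final path component of a `/`-separated path.
fn basename(path: &str) -> &str {
    path.rsplit('/').next().unwrap_or(path)
}
```

Comparing only the base name means `/usr/bin/env`, `/bin/env`, and `/usr/local/bin/env` are all treated alike, which sidesteps the "different path" question at the cost of ignoring the path entirely.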

@epage
Collaborator

epage commented May 22, 2023

Forgot to add, xdg-mime-rs seems like it would be platform specific (loading a mime DB that won't be on all platforms) while we need to be cross-platform.

We also need to weigh the costs of this, since peeking into file content can lead to extra IO.
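One way to keep that cost bounded, sketched in Rust using only the standard library: read a fixed-size prefix of the file and run every content check against that buffer. The `SNIFF_LIMIT` value and the function names are assumptions for illustration, not something the project defines.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Upper bound on bytes inspected per file (an arbitrary, assumed value).
const SNIFF_LIMIT: u64 = 1024;

/// Reads at most `SNIFF_LIMIT` bytes from the start of `path`.
fn read_prefix(path: &Path) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    File::open(path)?.take(SNIFF_LIMIT).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Returns the first line of the prefix, if that line is valid UTF-8.
fn first_line(prefix: &[u8]) -> Option<&str> {
    let end = prefix
        .iter()
        .position(|&b| b == b'\n')
        .unwrap_or(prefix.len());
    std::str::from_utf8(&prefix[..end])
        .ok()
        .map(|line| line.trim_end_matches('\r'))
}
```

If the file is going to be read in full afterwards anyway, the same checks could run on the start of that buffer instead, so the extra IO would mostly matter for files that detection causes to be skipped.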

@epage epage added the question Uncertainty is involved label Dec 13, 2023
@epage epage changed the title RFE: better file type detection RFE: content-based filetype detection Dec 13, 2023
@epage epage changed the title RFE: content-based filetype detection Content-based filetype detection Dec 13, 2023