Explore other ways to identify file types #175
Comments
The additional technique could be inserted into this block: Lines 648 to 654 in 3d269d4
It might be better to search known mimetypes first, since identification by file content is the more expensive operation. Associate each identified file type with an ad hoc, unique mimetype. |
I support this. I intentionally kept it simple to start, looking at file extension only, but I agree it's time to enable more sophisticated techniques. I propose to add a configuration setting:
# config.yml
...
mimetype_detection_hook: my_custom_module:my_sniffer
which would enable you and anyone to experiment with this outside the tiled package like this:
# my_custom_module.py
def my_sniffer(filepath):
    ...
    return "..."
The function may inspect the filename and, if it needs to, open the file and read as many bytes as it wants to. The return value should be a MIME type, either a registered one like
This would override the code you excerpted above, so it would be in total control over how types were determined. It could decide whether to copy the mimetype search approach as a first pass or to overrule it.
If people developed "sniffers" that prove to be generally useful, we can always move them into tiled proper at some later point. Either way, I think it will be important to enable people who deploy tiled to customize the sniffer behavior like this on their own. What do you think? |
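To make the proposal concrete, here is a minimal sketch of what such a sniffer might look like, using the one-argument signature proposed above. It tries the cheap extension lookup first and only opens the file when that comes up empty. The MIME type string is an ad hoc, made-up value, and the rule that a SPEC data file begins with "#F " is an assumption for illustration, not a full format check.

```python
# my_custom_module.py (illustrative sketch, not part of tiled)
import mimetypes

# Ad hoc, unregistered MIME type invented for this example.
SPEC_MIMETYPE = "text/x-spec-data"


def my_sniffer(filepath):
    """Return a MIME type string for filepath, or None if unrecognized."""
    # Cheap first pass: look only at the file extension.
    mimetype, _encoding = mimetypes.guess_type(str(filepath))
    if mimetype is not None:
        return mimetype
    # More expensive second pass: read the first few bytes of the file.
    with open(filepath, "rb") as f:
        head = f.read(3)
    # Assumption: a SPEC data file starts with a "#F " header line.
    if head == b"#F ":
        return SPEC_MIMETYPE
    return None
```

A sniffer that distrusts extensions such as .dat or .txt could reverse the order and inspect the content before consulting the extension.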
That seems very general. I like it. |
@prjemian This is now implemented in v0.1.0a67 and documented at https://blueskyproject.io/tiled/how-to/read-custom-formats.html. Let me know if you get a chance to try it out on SPEC or NeXus. |
Starting to look at this now. Case 2 is the most likely scenario since our data files may have extensions. Yet that extension cannot be trusted to be informative when the extension content is overloaded for various data formats (such as
The interface is called for each file:
# custom.py
def detect_mimetype(filepath, mimetype):
    if mimetype is None:
        # If we are here, detection based on file extension came up empty.
        ...
        mimetype = "text/csv"
    return mimetype
While this could become time-consuming when repeated over a directory structure with many similar files (a typical pattern), it could be optimized. One optimization (in the custom handler) could be to recognize that files in a directory likely follow a pattern, such as any combination of these rules:
Even if that handling is better suited to a class, the optimizing class would be called from the |
Another optimization:
|
This aligns with two optimizations I have been working on:
|
The local mapping may provide more flexibility. Our directories tend to have mixed content such that an Unless you have some specifics in mind, let's work up some custom handlers and compare. |
Sounds good, let’s! |
When serving a directory of files, there may exist valid data files that lack a feature in the file name (such as a file extension) to identify the type of file. For example, there is no common file extension for SPEC data files and some users are accustomed to omitting a file extension. As shown in #174, the file extension may be too complicated to examine or not one of the recognized values. The .dat and .txt extensions are also used for various types of data files, including CSV.
Need some programmatic technique to identify the type of file, similar to the UNIX file command. Python examples include is_spec_file(filename) and isNeXusFile(filename).
Such routines could be called with unrecognized files.
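As a rough illustration of the approach the UNIX file command takes, the sketch below compares the leading bytes of a file against a small table of known signatures. The function name sniff_magic and the table are hypothetical, and "application/x-hdf5" is an unregistered, ad hoc type. The byte signatures themselves are the standard ones for HDF5, PNG, and gzip. This is not how is_spec_file or isNeXusFile work; it only shows the general idea.

```python
# Illustrative signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "application/x-hdf5"),  # ad hoc MIME type
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\x1f\x8b", "application/gzip"),
]


def sniff_magic(filepath):
    """Return a MIME type if the file starts with a known signature."""
    longest = max(len(signature) for signature, _ in SIGNATURES)
    with open(filepath, "rb") as f:
        head = f.read(longest)
    for signature, mimetype in SIGNATURES:
        if head.startswith(signature):
            return mimetype
    return None
```

Note that this only checks offset 0. An HDF5 file with a user block places its signature at a later offset (512, 1024, and so on), so a more thorough check would need to look there too.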