Add mimetype_detection_hook. #259

danielballan · 2022-07-21T13:54:08Z

@J-avery32 Would you check out this branch and give it a try?

pip install git+https://github.com/danielballan/tiled@mimetype-detection-hook

You need to place a Python script next to your config.yml file. It can be named anything:

# custom.py
def detect_mimetype(path):
    ...

That function has access to the file path and it can do anything it wants to decide what the mimetype is, including opening the file.
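As a rough sketch of what such a hook might look like with the single-argument signature proposed here: the regex and the mimetype string below are made up for illustration and are not part of Tiled.

```python
# custom.py
import re
from pathlib import Path

# Hypothetical naming pattern: files whose "extension" is all digits, e.g. scan.001
NUMERIC_EXT = re.compile(r"\.\d+$")


def detect_mimetype(path):
    """Return a mimetype for files we recognize, or None to defer to the default."""
    if NUMERIC_EXT.search(Path(path).name):
        # Placeholder mimetype, chosen only for this example.
        return "application/x-example-scan"
    # Anything else is left to the usual file-extension detection.
    return None
```

The function could just as well open the file and inspect its header; matching on the name is only the simplest case.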

To tell DirectoryAdapter to use it, add this to the config.yml:

...
- path: /
  tree: files
  args:
    directory: ...
    mimetype_detection_hook: custom:detect_mimetype

If that function returns None, then the DirectoryAdapter will fall back to its usual method of looking at file extensions.

By default, Tiled will strip everything after the first . from the name. That is, thing.csv will appear as just thing. We do this to avoid "leaking" to the user details about how data happens to be stored, so that it can change over time without breaking our contract with the user.

In your case, dropping everything after the . may not be what you want. You can add:

...
- path: /
  tree: files
  args:
    directory: ...
    mimetype_detection_hook: custom:detect_mimetype
    key_from_filename: tiled.adapters.files:identity

Or write your own function in custom.py for transforming the filename into whatever you want to name this node in Tiled:

...
- path: /
  tree: files
  args:
    directory: ...
    mimetype_detection_hook: custom:detect_mimetype
    key_from_filename: custom:key_from_filename
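A minimal sketch of such a function, assuming it receives the file name as a string and returns the key to use for the node (that contract is an assumption here, and the naming scheme is purely illustrative):

```python
# custom.py
from pathlib import Path


def key_from_filename(filename):
    """Map a file name to a node key (hypothetical naming scheme)."""
    path = Path(filename)
    digits = path.suffix[1:]
    if digits.isdigit():
        # Keep numeric "extensions" as part of the key: scan.001 -> scan_001
        return f"{path.stem}_{digits}"
    # Otherwise mimic the default described above: drop everything after the first "."
    return path.name.split(".", 1)[0]
```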

If this works for you we'll add documentation and merge.

Closes #255
Closes #175

J-avery32 · 2022-07-21T18:55:49Z

Yes I'll try it right now

J-avery32 · 2022-07-21T19:13:49Z

Not sure if this is asking too much, but is there a way to set precedence? For example, making it so that this function will only be used after tiled has exhausted its default mimetypes and the mimetypes_by_file_ext option?

J-avery32 · 2022-07-21T19:14:55Z

Ideally that order could go in any direction.

J-avery32 · 2022-07-21T19:21:16Z

This way I don't have to return None for every other file extension I might encounter in my folder.

danielballan · 2022-07-21T20:39:00Z

Definitely not asking too much. :-)

I did give precedence some thought, but I am not 100% sure I got it right. My thinking was that the function should go first so it can get in before the file extension detector might misidentify something.

I had in mind that it would look for a certain naming pattern, perhaps using regex in your case, and then just do nothing if it sees nothing it recognizes, implicitly returning None. The example in my unit test in this PR works that way: files that don't match what it is looking for are effectively ignored, and no extra code is required to handle that.

But if for any reason you need to run the file extension detection first, you can copy that part of the tiled code directly inside your custom function, at the top.

J-avery32 · 2022-07-21T21:09:02Z

Perhaps instead of copying the code would it be possible to wrap the other checker in a function that I can import?

J-avery32 · 2022-07-21T21:14:24Z

Though this is picky on my part, and I can do it any of the other ways.

J-avery32 · 2022-07-21T21:19:51Z

How would I access the variable mimetypes_by_file_ext in my function? Is there a way to import this?

danielballan · 2022-07-21T21:24:59Z

You'd have to duplicate it. Alternatively, we could provide it as an argument to the function, i.e.

def detect_mimetype(path, mimetypes_by_file_ext):
    ...

That occurred to me but seemed a little...overcooked. I think it depends on how frequently we need to do the mimetype check first. What does your function look like without the mimetype check?

J-avery32 · 2022-07-21T21:27:43Z

Without a mimetype check I would probably use a regex. But I'm not sure how guaranteed it is that all the files will have the numbers as their extension. What does duplicating it look like? Do you mean parsing the YAML within the detect_mimetype function?

J-avery32 · 2022-07-21T22:09:17Z

Also I am getting this error:

Traceback (most recent call last):

  File "/home/j/programming/work/tiled_als/venv/bin/tiled", line 8, in <module>
    sys.exit(main())

  File "/home/j/programming/work/tiled_als/venv/lib/python3.8/site-packages/tiled/commandline/main.py", line 613, in serve_config
    kwargs = construct_build_app_kwargs(parsed_config, source_filepath=config_path)

  File "/home/j/programming/work/tiled_als/venv/lib/python3.8/site-packages/tiled/config.py", line 122, in construct_build_app_kwargs
    tree = obj(**item["args"])

  File "/home/j/programming/work/tiled_als/venv/lib/python3.8/site-packages/tiled/adapters/files.py", line 330, in from_directory
    reader_factory = _reader_factory_for_file(

  File "/home/j/programming/work/tiled_als/venv/lib/python3.8/site-packages/tiled/adapters/files.py", line 677, in _reader_factory_for_file
    mimetype = mimetype_detection_hook(path)

TypeError: 'str' object is not callable

It seems that the mimetype_detection_hook is a string and not the actual function.
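The traceback is consistent with the config value custom:detect_mimetype being passed through as a plain string. For reference, a generic sketch of turning a "module:object" string into the object it names, using only the standard library; this is an illustration, not Tiled's actual implementation, and resolve_object is a made-up name:

```python
import importlib


def resolve_object(ref):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, sep, attr_path = ref.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"Expected 'module:object', got {ref!r}")
    obj = importlib.import_module(module_name)
    # Allow dotted attribute paths such as 'module:Class.method'.
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


# Demonstrated with a standard-library target so the example is self-contained.
basename = resolve_object("os.path:basename")
```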

danielballan · 2022-07-21T23:17:31Z

Oops, yes I missed something important there. Please stand by.

danielballan · 2022-07-21T23:41:46Z

Fix for str issue pushed above.

I meant hard-coding the dictionary of custom mimetypes in the definition of the function, duplicating whatever is in the config. Not ideal, but possibly better than the alternative of overly-magical complexity in the precedence rules.
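Roughly like this, with the single-argument hook as it stands at this point in the PR. The dictionary contents, the regex, and the placeholder mimetype are invented for illustration; the real entries would mirror whatever is under mimetypes_by_file_ext in the config:

```python
# custom.py
import re
from pathlib import Path

# Hand-maintained copy of the custom mapping from config.yml (example values only).
MIMETYPES_BY_FILE_EXT = {
    ".csv": "text/csv",
}

# Hypothetical pattern for the unusually named files: an all-digit suffix.
NUMERIC_SUFFIX = re.compile(r"^\.\d+$")


def detect_mimetype(path):
    suffix = Path(path).suffix
    # Run the extension lookup first, so known extensions take precedence.
    if suffix in MIMETYPES_BY_FILE_EXT:
        return MIMETYPES_BY_FILE_EXT[suffix]
    # Only then apply the custom rule.
    if NUMERIC_SUFFIX.match(suffix):
        return "application/x-example-scan"  # placeholder mimetype
    return None
```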

Concrete use cases will be very helpful in landing on a good design. Does your directory mix standard files like TIFF or CSV with these unusually-named ones?

J-avery32 · 2022-07-21T23:49:31Z

Yes, I have a csv file mixed in with them.

J-avery32 · 2022-07-21T23:51:20Z

It would not be too hard to parse the config file and then duplicate the mimetypes in that way.

J-avery32 · 2022-07-22T00:28:54Z

Thanks, it seems to be working from the Python client. However, for some reason the browser is now only returning 404s for the browse UI. Not sure if it's just on my end though.

danielballan · 2022-07-22T07:08:51Z

The distributions on PyPI include the pre-built React app. When you install from GitHub you need to build the React app yourself: see web-frontend/README.md.

J-avery32 · 2022-07-22T20:46:43Z

Everything seems to work!

danielballan · 2022-07-22T21:34:57Z

I thought of a new possibility that seems a better balance of flexibility and simplicity. My goals are:

avoid duplicated code
avoid duplicated effort
make it possible to override extension-based detection sometimes
avoid having multiple hooks or anything too hard to explain

What if the file extension detection runs first and then calls your hook with two parameters: the path and the mimetype? If the extension doesn’t match, the mimetype will be None. Therefore, if you only care about files that do not match based on the extension, you can do

def detect_mimetype(path, mimetype):
    if mimetype is None:
        ...  # extension-based detection found nothing; inspect path and set mimetype
    return mimetype

But, in other situations, it is still possible to override the mimetype detected based on ext if it was wrong.
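A sketch covering both cases under this proposed two-parameter signature. The file-naming rules and the placeholder mimetypes are hypothetical, chosen only to show the control flow:

```python
# custom.py
import re
from pathlib import Path

NUMERIC_EXT = re.compile(r"\.\d+$")  # hypothetical naming pattern


def detect_mimetype(path, mimetype):
    name = Path(path).name
    if mimetype is None:
        # Extension-based detection came up empty; apply the custom rule.
        if NUMERIC_EXT.search(name):
            return "application/x-example-scan"  # placeholder mimetype
        return None
    # Override case: the extension suggested one type, but we know better.
    if mimetype == "text/plain" and name.endswith(".dat.txt"):
        return "application/x-example-table"  # placeholder mimetype
    # Otherwise accept what the extension-based detection decided.
    return mimetype
```

Files with ordinary extensions pass straight through, so the hook only needs code for the cases it actually cares about.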

J-avery32 · 2022-07-22T21:41:20Z

That looks perfect. Trying it out now.

J-avery32 · 2022-07-22T21:45:28Z

Oh wait you haven't implemented it yet lol.

danielballan · 2022-07-23T12:58:23Z

Haha, yeah. Implemented and pushed now. If it works for you, I'll update the documentation and merge.

Wrote it out while it's fresh in my brain...now on to the weekend! 🌴

J-avery32 · 2022-07-25T18:24:44Z

Works for me! Not sure why the tests are failing though.

danielballan · 2022-07-26T22:05:23Z

Looks like an unrelated issue started failing the unit tests. I've been meaning to adjust that anyway; fix pushed.

danielballan · 2022-07-27T18:16:07Z

@J-avery32 The documentation you used is sort of a "case study" that covers a specific situation, and it gets into some details that might not be immediately needed by all users.

I tried writing more entry-level documentation here: https://github.com/bluesky/tiled/pull/259/files#diff-2a446760a1636bbd9aae3291ce4a430b145cda1ec4fe01f33e1a5e829a8f0b2a

Do you think it is understandable and useful?

danielballan · 2022-07-27T18:16:46Z

GitHub will display a more readable preview of that new docs page at https://github.com/bluesky/tiled/blob/fa42c52bc5b27c9a62fc521615999f95b3399e03/docs/source/how-to/read-custom-formats.md

J-avery32 · 2022-07-27T21:42:27Z

Yes it is readable to me.

Add mimetype_detection_hook.

97c041c

Resolve object from config.

8ccfc57

Change precedence and signature.

1dfd338

danielballan added 3 commits July 27, 2022 08:38

WIP: Document custom file types.

ed15aa2

Add new docs page to index.

1086e92

Bump max overflow. We misunderstood what this is for.

c980bed

danielballan force-pushed the mimetype-detection-hook branch from 89bfca0 to c980bed Compare July 27, 2022 12:38

Cover end-to-end story for custom formats.

fa42c52

danielballan added 2 commits July 27, 2022 14:17

Consistent headers

315b4db

more polish

7aa5fbe

danielballan mentioned this pull request Jul 27, 2022

Enhancements to the directory tree #48

Closed

3 tasks

danielballan merged commit 34ede8a into bluesky:main Jul 27, 2022

danielballan deleted the mimetype-detection-hook branch July 27, 2022 23:25

danielballan mentioned this pull request Jul 29, 2022

Enable matching wildcards in file_ext. #256

Closed
