404 returned if only compressed file exists #8182

steverep · 2024-02-23T19:16:27Z

Is your feature request related to a problem?

Currently, the StaticResource handler immediately returns a 404 status if the requested file is not on disk: elif filepath.is_file():. This is done before the response class checks for compressed versions of the file, making it impossible to store only compressed versions of static resources in order to conserve disk space.

Furthermore, this is an extra unnecessary call for a file system operation that should be eliminated to gain performance. The file response class already gets a stat of the file in the executor to get things like the content length.

Describe the solution you'd like

Return the compressed file when permitted by the Accept-Encoding header, regardless of uncompressed existence.
Return a 406 if the compression is not supported by the client, or permit dynamic decompression.
Eliminate the extra is_file() check.

Describe alternatives you've considered

Only solution is to make sure uncompressed files always exist.

Related component

Server

Additional context

Technically, I think serving a 404 when a representation with a supported content coding exists is a violation of RFC 9110.

Code of Conduct

I agree to follow the aio-libs Code of Conduct

The text was updated successfully, but these errors were encountered:

steverep · 2024-07-09T03:55:04Z

@Dreamsorcerer and @bdraco I'm looking at how to fix this but could use some guidance. I basically see two possible routes:

Move all the os.stat() calls now in FileResponse.prepare() to StaticResource._handler() (which is currently making is_dir and is_file calls to raise any 404). This would basically keep all exception handling the same but would need to alter FileResponse.__init__ to accept the stat result.
Remove the call to is_file from StaticResource._handler(), and instead raise any 404 exception from FileResponse.prepare(), which AFAICT would then need to be handled in the RequestHandler class where prepare is called.

Both approaches would make the assumption that the request passed to prepare is the same as that passed to _handler (to use the Accept-Encoding header). Is that at least correct?

After hours of reading code, my gut says the 1st is the right way, but everything going on in the middle of those methods is still very much a black box to me. Thoughts? Are there downstream, user, performance, etc. effects here I'm missing?

bdraco · 2024-07-09T16:46:14Z

Move all the os.stat() calls now in FileResponse.prepare() to StaticResource._handler() (which is currently making is_dir and is_file calls to raise any 404). This would basically keep all exception handling the same but would need to alter FileResponse.__init__ to accept the stat result.

From your comment, its not clear if passing the stat result to FileResponse would be required, and since FileResponse is a documented API it would need to be optional.

The is_dir and is_file calls in the StaticResource._handle do blocking I/O in the event loop. Thats definitely a problem. Home Assistant avoids that for cached resources (but still uses StaticResource for uncached ones) since we don't expect the asset type to change on disk when using CachingStaticResource https://github.com/home-assistant/core/blob/898803abe9eae9ed24f8024d7226e5a12968a103/homeassistant/components/http/static.py#L52 Thats not an assumption that would be safe for the base StaticResource handler though.

steverep · 2024-07-09T18:11:35Z

From your comment, its not clear if passing the stat result to FileResponse would be required, and since FileResponse is a documented API it would need to be optional.

I guess strictly speaking it would not be "required", but without it you'd have to repeat the blocking stat calls. Basically, both classes would be running some equivalent of FileResponse._get_file_path_stat_encoding, which is obviously not desirable. I guess it could be optional, passed by StaticResource, and then skipped by FileResponse.prepare if already passed during instantiation?

Although given #8013, I wonder if the 2nd is the better approach so that the existence check and opening are in the same method?

The is_dir and is_file calls in the StaticResource._handle do blocking I/O in the event loop. Thats definitely a problem. Home Assistant avoids that for cached resources (but still uses StaticResource for uncached ones) since we don't expect the asset type to change on disk when using CachingStaticResource https://github.com/home-assistant/core/blob/898803abe9eae9ed24f8024d7226e5a12968a103/homeassistant/components/http/static.py#L52 Thats not an assumption that would be safe for the base StaticResource handler though.

Yeah I noticed that override. No argument on running the file system calls in an executor as I'd certainly fix that either way. Is that caching really giving much benefit though? Really you'd want to cache the stat result so that repeated requests can get a 304 response without repeating those, but that only saves one of the trips. Also, in practice, properly configured Cache-Control headers or a running service worker should mean browser repeated calls are absolutely minimized or even eliminated.

bdraco · 2024-07-09T18:19:01Z

Is that caching really giving much benefit though?

Keep in mind that many users will have multiple clients hitting the same files so the caching prevents a thundering herd of executor jobs when HA restarts

bdraco · 2024-07-09T18:24:17Z

Will need to think some more on case 2 when everything isn’t so chaotic

steverep · 2024-07-09T18:44:11Z

Keep in mind that many users will have multiple clients hitting the same files so the caching prevents a thundering herd of executor jobs when HA restarts

😕 But wouldn't that only occur if all those clients simultaneously deleted their cache and pressed F5 like a home grown DOS attack?

Will need to think some more on case 2 when everything isn’t so chaotic

👍🏻 I think it scares me because I presume there is a good reason for checking permission and existence outside of FileResponse, but my head is spinning trying to find it.

bdraco · 2024-07-09T18:47:59Z

Keep in mind that many users will have multiple clients hitting the same files so the caching prevents a thundering herd of executor jobs when HA restarts

😕 But wouldn't that only occur if all those clients simultaneously deleted their cache and pressed F5 like a home grown DOS attack?

Will need to think some more on case 2 when everything isn’t so chaotic

👍🏻 I think it scares me because I presume there is a good reason for checking permission and existence outside of FileResponse, but my head is spinning trying to find it.

Keep in mind that many users will have multiple clients hitting the same files so the caching prevents a thundering herd of executor jobs when HA restarts

😕 But wouldn't that only occur if all those clients simultaneously deleted their cache and pressed F5 like a home grown DOS attack?

What you are saying makes sense and I’m probably misremembering the cause but I do remember the caching solved a thundering herd issue.

Will need to think some more on case 2 when everything isn’t so chaotic

👍🏻 I think it scares me because I presume there is a good reason for checking permission and existence outside of FileResponse, but my head is spinning trying to find it.

steverep · 2024-07-09T20:09:28Z

Okay I think I figured this out, so anyone correct me if I'm off...

The request _handle and response prepare methods are basically called one after the other in the RequestHandler class. So there should be no performance difference between case 1 and 2. 😌

The difference is that HTTPException is only caught and logged properly when it's raised within the request handler. Since prepare gets called within finish_response, raising any not found or permission error there would get caught as an unhandled generic Exception.

steverep · 2024-07-10T04:28:27Z

From your comment, its not clear if passing the stat result to FileResponse would be required, and since FileResponse is a documented API it would need to be optional.

Actually, it is not documented on the website at all. Looks like docs were added to the master branch in #3991 that still exist, but those never made it to the 3.9 branch so are not live.

Dreamsorcerer · 2024-07-10T11:26:25Z

Ah, that longs predates my backport label checks, so appears to have been missed.
Feel free to sort out the backport: #3991 (comment)

Dreamsorcerer · 2024-07-13T11:44:14Z

I'm going to leave this one with bdraco to decide what should be done. As far as I understand, StaticResource is only related to the static() setup, which we tell users not to use in production anyway, so I don't really care about performance or anything here personally.

steverep · 2024-07-13T18:05:09Z

@Dreamsorcerer FileResponse may only be used internally as part of add_static, but since it's part of the public API, I'm assuming you'd have some thoughts on the appropriate higher level design?

Right now, FileResponse makes the assumption that the path it's instantiated with exists, i.e. it will not respond with 404 and instead just let the OSError propagate if it doesn't. If that's the intended design, then it doesn't make sense at all for FileResponse to be calling stat() on the file and performing content negotiation for encoding since the handler would have already done that.

Dreamsorcerer · 2024-07-13T19:15:53Z

Right, I guess if it has an explicit path, then it suggests to me a developer mistake that should be fixed. In which case, maybe it's not worth doing a 404 and dropping the stat calls?

steverep · 2024-07-13T20:07:48Z

I think there's a reasonable argument to be made for dropping them, but it would be a breaking change as it would then need to be passed by the user. It's used heavily in prepare() for ETag checks, range requests, etc. And playing devil's advocate, those processes both have short circuits to return 304 or 416, so why not 404?

I don't know what the right approach is for v4.0, but putting my semver hat on for 3.10, what about something like:

If user passes the Content-Encoding header, then content negotiation loop is skipped, i.e. aiohttp just adds the appropriate extension and gets the stat().
Add a new stat parameter on initialization. If provided, skip the call to stat(). This would presume the user also set Content-Encoding if necessary.

steverep · 2024-07-15T04:28:11Z

Addendum to this issue: the entire exception handling structure in StaticResource._handle() seems to be based on the behavior of Path.resolve() prior to python 3.6, i.e. strict=True. It tries to catch FileNotFound and there is a comment on permission errors, but those will never be raised since it's called with the default strict=False.

Strict isn't really needed and not desirable in order to fix this issue, so I'm working on cleaning that up and running the IO stuff in an executor.

steverep added the enhancement label Feb 23, 2024

steverep mentioned this issue Jul 9, 2024

Fix response headers for compressed file requests #8485

Merged

5 tasks

steverep mentioned this issue Jul 16, 2024

Remove blocking IO for static resources and refactor exception handling #8507

Merged

5 tasks

steverep mentioned this issue Jul 25, 2024

Handle 403 and 404 issues in FileResponse class #8538

Merged

5 tasks

bdraco closed this as completed in #8538 Jul 26, 2024

steverep mentioned this issue Aug 8, 2024

Fix response to circular symlinks with Python v3.13 #8642

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

404 returned if only compressed file exists #8182

404 returned if only compressed file exists #8182

steverep commented Feb 23, 2024

steverep commented Jul 9, 2024

bdraco commented Jul 9, 2024

steverep commented Jul 9, 2024

bdraco commented Jul 9, 2024

bdraco commented Jul 9, 2024

steverep commented Jul 9, 2024

bdraco commented Jul 9, 2024

steverep commented Jul 9, 2024

steverep commented Jul 10, 2024

Dreamsorcerer commented Jul 10, 2024

Dreamsorcerer commented Jul 13, 2024

steverep commented Jul 13, 2024

Dreamsorcerer commented Jul 13, 2024

steverep commented Jul 13, 2024

steverep commented Jul 15, 2024

404 returned if only compressed file exists #8182

404 returned if only compressed file exists #8182

Comments

steverep commented Feb 23, 2024

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Related component

Additional context

Code of Conduct

steverep commented Jul 9, 2024

bdraco commented Jul 9, 2024

steverep commented Jul 9, 2024

bdraco commented Jul 9, 2024

bdraco commented Jul 9, 2024

steverep commented Jul 9, 2024

bdraco commented Jul 9, 2024

steverep commented Jul 9, 2024

steverep commented Jul 10, 2024

Dreamsorcerer commented Jul 10, 2024

Dreamsorcerer commented Jul 13, 2024

steverep commented Jul 13, 2024

Dreamsorcerer commented Jul 13, 2024

steverep commented Jul 13, 2024

steverep commented Jul 15, 2024