Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strong and weak etags for StaticFiles #2298

Closed
Kludex opened this issue Oct 7, 2023 Discussed in #2297 · 2 comments · Fixed by #2334
Closed

Strong and weak etags for StaticFiles #2298

Kludex opened this issue Oct 7, 2023 Discussed in #2297 · 2 comments · Fixed by #2334
Labels
need confirmation This issue needs confirmation.

Comments

@Kludex
Copy link
Member

Kludex commented Oct 7, 2023

Discussed in #2297

Originally posted by markus1978 October 6, 2023
tl;tr: It seems that starlette.staticfiles.StaticFiles does not use "proper" HTTP etags which causes issues with reverse proxies and browsers.

The starlette.staticfiles.StaticFiles class supports etags, e.g. it creates a Etag header in all responses and considers If-None-Match headers in its is_not_modified method. However, it only puts the plain hash values in the response Etag header and also only interprets If-None-Match values as plain hash values . HTTP Etags are either weak or strong and the values are supposed to be wrapped in W/"..." or "..." respectively (https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.1).

This causes two problems for me. First, nginx as a reverse proxy is considering all starlette etag headers as weak etags and the gzip module removes. Secondly, when the browser sends a strong or weak etag in the If-None-Match header, it never matches because '...some-tag...' != '"...some-tag..."'.

I currently use this work arround to solve my issues:

class StaticFiles(StarletteStaticFiles):

    etag_re = r'^(W/)?"?([^"]*)"?$'

    def is_not_modified(
        self, response_headers: Headers, request_headers: Headers
    ) -> bool:
        try:
            if_none_match = request_headers["if-none-match"]
            # I remove all weak/strong etag syntax, basically doing a weak match 
            match = re.match(StaticFiles.etag_re, if_none_match)
            if_none_match = match.group(2)
            etag = response_headers["etag"]
            if if_none_match == etag:
                return True
        except KeyError:
            pass

        return super().is_not_modified(response_headers, request_headers)

    def file_response(
        self, full_path: PathLike, stat_result: os.stat_result, scope: Scope, 
        status_code: int = 200,
    ) -> Response:
        response = super().file_response(full_path, stat_result, scope, status_code)
        # I add strong etag syntax, basically wrapping the etag value in "..."
        if 'etag' in response.headers:
            response.headers['etag'] = f'"{response.headers.get("etag")}"'
        return response

I am not an expert in etags, just trying to make it work in our app. I am glad for any hints, if i am getting it wrong or missing anything here.

For reference, the relevant part from the current starlette.
From staticfiles.py::StaticFiles

    def is_not_modified(
        self, response_headers: Headers, request_headers: Headers
    ) -> bool:
        """
        Given the request and response headers, return `True` if an HTTP
        "Not Modified" response could be returned instead.
        """
        try:
            if_none_match = request_headers["if-none-match"]
            etag = response_headers["etag"]
            if if_none_match == etag:
                return True
        except KeyError:
            pass

        try:
            if_modified_since = parsedate(request_headers["if-modified-since"])
            last_modified = parsedate(response_headers["last-modified"])
            if (
                if_modified_since is not None
                and last_modified is not None
                and if_modified_since >= last_modified
            ):
                return True
        except KeyError:
            pass

        return False

From responses.py::FileResponse

    def set_stat_headers(self, stat_result: os.stat_result) -> None:
        content_length = str(stat_result.st_size)
        last_modified = formatdate(stat_result.st_mtime, usegmt=True)
        etag_base = str(stat_result.st_mtime) + "-" + str(stat_result.st_size)
        etag = md5_hexdigest(etag_base.encode(), usedforsecurity=False)

        self.headers.setdefault("content-length", content_length)
        self.headers.setdefault("last-modified", last_modified)
        self.headers.setdefault("etag", etag)
```</div>

<!-- POLAR PLEDGE BADGE START -->
> [!IMPORTANT]
> - We're using [Polar.sh](https://polar.sh/encode) so you can upvote and help fund this issue.
> - We receive the funding once the issue is completed & confirmed by you.
> - Thank you in advance for helping prioritize & fund our backlog.

<a href="https://polar.sh/encode/starlette/issues/2298">
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://polar.sh/api/github/encode/starlette/issues/2298/pledge.svg?darkmode=1">
  <img alt="Fund with Polar" src="https://polar.sh/api/github/encode/starlette/issues/2298/pledge.svg">
</picture>
</a>
<!-- POLAR PLEDGE BADGE END -->
@Kludex Kludex added the need confirmation This issue needs confirmation. label Oct 7, 2023
@karpetrosyan
Copy link
Member

It seems the ETag header was incorrectly generated.
It should be a double-quoted string.
That's why I guess some implementations in the middle of the client and the Starlette application can entirely remove ETag.

See https://www.rfc-editor.org/rfc/rfc9110.html#field.etag and https://www.rfc-editor.org/rfc/rfc7232#section-2.3:

Also, when the If-None-Match header is used, the server should use weak comparison, so W/"1" should match "1", but Starlette does not respect that rule, so removing the W/ prefix is also necessary.

Invalid etag demonstration:

curl -I https://github.com/encode/starlette
...
etag: W/"36d33433b68fec65661ba26d49f312cb"
...

Starlette app

curl -I http://127.0.0.1:8000/static/index.html
...
etag: e38f027e042ece8271ca51e16756746c
...

App:

from starlette.applications import Starlette
from starlette.routing import Mount
from starlette.staticfiles import StaticFiles

routes = [
    Mount('/static', app=StaticFiles(directory='static'), name="static"),
]

app = Starlette(routes=routes)

@neolooong
Copy link
Contributor

Also, If-None-Match may contain multiple etag like this

If-None-Match: W/"67ab43", "54ed21", "7892dd"

Any one of those etag matched, should return 304 not modified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need confirmation This issue needs confirmation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants