
Fix: Caching of request body for passing down to middlewares #2387

Conversation

ayushbisht2001

Summary

Hey all, I was trying to figure out a way to get the request body inside a middleware, which I need for writing audit logs. But it seems we don't have much flexibility for getting multipart/form-data #495 .

I tried to make it work by caching the data in the state itself. I don't know whether this is the best way to fix it. I'm quite new to this community, so please help me with this issue.

Thanks

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@adriangb
Member

This has been discussed extensively. Putting the body in state breaks things in multiple ways. Consider, for example, a middleware that modifies the request body, e.g. decompresses it. If a middleware upstream put the body in state, the request handler won't see the decompressed body; it will see the original body.
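To make the ordering problem concrete, here is a hypothetical body-modifying ASGI middleware (a sketch, not part of this PR or of Starlette); any raw body an upstream middleware cached in state would be stale by the time the handler sees it, because the handler would read the compressed bytes instead of what this middleware yields:

```python
import gzip

class DecompressRequestMiddleware:
    """Hypothetical sketch: gunzip the request body by wrapping receive().

    If an upstream middleware had stashed the *raw* body in scope["state"],
    the handler reading from state would see compressed bytes, not the
    decompressed body this middleware passes downstream.
    """

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # buffer the full body (a sketch; a real middleware might stream)
        body = b""
        more_body = True
        while more_body:
            message = await receive()
            body += message.get("body", b"")
            more_body = message.get("more_body", False)

        body = gzip.decompress(body)

        async def wrapped_receive():
            # replay the modified body as a single message
            return {"type": "http.request", "body": body, "more_body": False}

        await self.app(scope, wrapped_receive, send)
```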

That said, I'm not sure what your actual use case is, but there are safe ways to access the request body in a middleware and in the downstream app. Take a look at:

class _CachedRequest(Request):
    """
    If the user calls Request.body() from their dispatch function
    we cache the entire request body in memory and pass that to downstream middlewares,
    but if they call Request.stream() then all we do is send an
    empty body so that downstream things don't hang forever.
    """

    def __init__(self, scope: Scope, receive: Receive):
        super().__init__(scope, receive)
        self._wrapped_rcv_disconnected = False
        self._wrapped_rcv_consumed = False
        self._wrapped_rc_stream = self.stream()

    async def wrapped_receive(self) -> Message:
        # wrapped_rcv state 1: disconnected
        if self._wrapped_rcv_disconnected:
            # we've already sent a disconnect to the downstream app
            # we don't need to wait to get another one
            # (although most ASGI servers will just keep sending it)
            return {"type": "http.disconnect"}

        # wrapped_rcv state 2: consumed but not yet disconnected
        if self._wrapped_rcv_consumed:
            # since the downstream app has consumed us all that is left
            # is to send it a disconnect
            if self._is_disconnected:
                # the middleware has already seen the disconnect
                # since we know the client is disconnected no need to wait
                # for the message
                self._wrapped_rcv_disconnected = True
                return {"type": "http.disconnect"}
            # we don't know yet if the client is disconnected or not
            # so we'll wait until we get that message
            msg = await self.receive()
            if msg["type"] != "http.disconnect":  # pragma: no cover
                # at this point a disconnect is all that we should be receiving
                # if we get something else, things went wrong somewhere
                raise RuntimeError(f"Unexpected message received: {msg['type']}")
            return msg

        # wrapped_rcv state 3: not yet consumed
        if getattr(self, "_body", None) is not None:
            # body() was called, we return it even if the client disconnected
            self._wrapped_rcv_consumed = True
            return {
                "type": "http.request",
                "body": self._body,
                "more_body": False,
            }
        elif self._stream_consumed:
            # stream() was called to completion
            # return an empty body so that downstream apps don't hang
            # waiting for a disconnect
            self._wrapped_rcv_consumed = True
            return {
                "type": "http.request",
                "body": b"",
                "more_body": False,
            }
        else:
            # body() was never called and stream() wasn't consumed
            try:
                stream = self.stream()
                chunk = await stream.__anext__()
                self._wrapped_rcv_consumed = self._stream_consumed
                return {
                    "type": "http.request",
                    "body": chunk,
                    "more_body": not self._stream_consumed,
                }
            except ClientDisconnect:
                self._wrapped_rcv_disconnected = True
                return {"type": "http.disconnect"}
Maybe you can do something similar?

If you're writing your own middleware I'd recommend you read this section of the docs. At the very least it will give you a better understanding of what's going on under the hood: https://www.starlette.io/middleware/#pure-asgi-middleware
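Following that docs pattern, a minimal pure-ASGI middleware for the audit-logging use case might look like this. It is a sketch with hypothetical names (AuditBodyMiddleware, on_body are not Starlette APIs): it captures the body as it streams past while forwarding every message to the downstream app unchanged, so nothing is cached in state and downstream consumers still see the full body:

```python
class AuditBodyMiddleware:
    """Hypothetical sketch: capture the request body for audit logging
    while forwarding every receive() message downstream unchanged."""

    def __init__(self, app, on_body):
        self.app = app
        self.on_body = on_body  # called once with the complete body bytes

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        captured = bytearray()

        async def receive_logging():
            # pass each message through, copying body chunks as they go by
            message = await receive()
            if message["type"] == "http.request":
                captured.extend(message.get("body", b""))
                if not message.get("more_body", False):
                    # body fully read: hand it to the audit logger
                    self.on_body(bytes(captured))
            return message

        await self.app(scope, receive_logging, send)
```

Because the downstream app drives the reads, this never buffers more than the app itself consumes, and it works regardless of whether the app calls body(), stream(), or form().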

@ayushbisht2001 force-pushed the feature/caching_request_body branch from 8fdb171 to 1b0ceb4 on December 28, 2023 at 22:53
@ayushbisht2001
Author

ayushbisht2001 commented Dec 28, 2023

That said, I'm not sure what your actual use case is but there are safe ways to access the request body in a middleware and in the downstream app.

I've looked into this file, but my concern is with multipart/form-data: how can I get the form data values? JSON-based requests are fetched properly, while streaming requests show a stream-consumed error.
Right now I'm trying a hacky way of feeding the chunks to a custom MultiPartParser to create FormData. Is there a better way to get the multipart/form-data records?
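For what it's worth, once the body is buffered, Starlette's own Request.form() (backed by the python-multipart package) is the idiomatic way to get a FormData object, rather than a hand-rolled parser. Purely for illustration of the wire format, here is a stdlib-only sketch that parses a buffered multipart/form-data body with the email package (field values may keep a trailing line break; a real parser should use python-multipart):

```python
import email
from email import policy

def parse_multipart(body: bytes, content_type: str) -> dict:
    """Illustration only: parse a buffered multipart/form-data body into
    {field name: value} using the stdlib email parser."""
    # the email parser expects the Content-Type header in front of the body
    raw = b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    message = email.message_from_bytes(raw, policy=policy.default)
    fields = {}
    for part in message.iter_parts():
        # the field name lives in the Content-Disposition "name" parameter
        name = part.get_param("name", header="content-disposition")
        if name is not None:
            fields[name] = part.get_content()
    return fields
```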

@ayushbisht2001
Author

@adriangb Also, about the issue with GZip: I guess it will break if we place the cache middleware before the GZip middleware. Creating a separate middleware would be a better way to do this, I guess, with instructions to always keep it innermost in the middleware stack.
What do you think?

@adriangb
Member

I don't think we should move forward with this or similar changes. Please look at the links in my previous comment and see if you can solve the problem that way or report back why that won't work.

@ayushbisht2001
Author

ayushbisht2001 commented Dec 28, 2023

@adriangb Actually, for my use case I was able to solve this issue. But I feel it would be really helpful if caching of the request body were supported by the library (more specifically for multipart/form-data, which throws a stream-consumed error).
BTW, thanks @adriangb for all your responses.

@Kludex Kludex closed this Dec 29, 2023
@Kludex
Member

Kludex commented Dec 29, 2023

As Adrian said, this was discussed extensively. Also, you solved your use case, so let's close this.

If you need more guidance, please open a discussion.
