
Fix: Caching of request body for passing down to middlewares #2387

Conversation

ayushbisht2001

Summary

Hey all, I was trying to figure out a way to get the request body inside a middleware, which I need for writing audit logs. But it seems we don't have much flexibility for getting multipart/form-data #495 .

I tried to make it work by caching the data in the state itself. I don't know whether this is the best way to fix it. I'm quite new to this community, so please help me with this issue.

Thanks

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@adriangb
Member

This has been discussed extensively. Putting the body in state breaks things in multiple ways. Consider, for example, a middleware that modifies the request body, e.g. decompresses it. If a middleware upstream put the body in state, the request handler won't see the decompressed body; it will see the original body.
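To make the ordering problem concrete, here is a hypothetical body-modifying ASGI middleware (a sketch, not part of this PR or of Starlette); any raw body an upstream middleware cached in state would be stale by the time the handler sees it, because the handler would read the compressed bytes instead of what this middleware yields:

```python
import gzip

class DecompressRequestMiddleware:
    """Hypothetical sketch: gunzip the request body by wrapping receive().

    If an upstream middleware had stashed the *raw* body in scope["state"],
    the handler reading from state would see compressed bytes, not the
    decompressed body this middleware passes downstream.
    """

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # buffer the full body (a sketch; a real middleware might stream)
        body = b""
        more_body = True
        while more_body:
            message = await receive()
            body += message.get("body", b"")
            more_body = message.get("more_body", False)

        body = gzip.decompress(body)

        async def wrapped_receive():
            # replay the modified body as a single message
            return {"type": "http.request", "body": body, "more_body": False}

        await self.app(scope, wrapped_receive, send)
```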

That said, I'm not sure what your actual use case is, but there are safe ways to access the request body in a middleware and in the downstream app. Take a look at:

class _CachedRequest(Request):
    """
    If the user calls Request.body() from their dispatch function
    we cache the entire request body in memory and pass that to downstream middlewares,
    but if they call Request.stream() then all we do is send an
    empty body so that downstream things don't hang forever.
    """

    def __init__(self, scope: Scope, receive: Receive):
        super().__init__(scope, receive)
        self._wrapped_rcv_disconnected = False
        self._wrapped_rcv_consumed = False
        self._wrapped_rc_stream = self.stream()

    async def wrapped_receive(self) -> Message:
        # wrapped_rcv state 1: disconnected
        if self._wrapped_rcv_disconnected:
            # we've already sent a disconnect to the downstream app
            # we don't need to wait to get another one
            # (although most ASGI servers will just keep sending it)
            return {"type": "http.disconnect"}

        # wrapped_rcv state 2: consumed but not yet disconnected
        if self._wrapped_rcv_consumed:
            # since the downstream app has consumed us all that is left
            # is to send it a disconnect
            if self._is_disconnected:
                # the middleware has already seen the disconnect
                # since we know the client is disconnected no need to wait
                # for the message
                self._wrapped_rcv_disconnected = True
                return {"type": "http.disconnect"}
            # we don't know yet if the client is disconnected or not
            # so we'll wait until we get that message
            msg = await self.receive()
            if msg["type"] != "http.disconnect":  # pragma: no cover
                # at this point a disconnect is all that we should be receiving
                # if we get something else, things went wrong somewhere
                raise RuntimeError(f"Unexpected message received: {msg['type']}")
            return msg

        # wrapped_rcv state 3: not yet consumed
        if getattr(self, "_body", None) is not None:
            # body() was called, we return it even if the client disconnected
            self._wrapped_rcv_consumed = True
            return {
                "type": "http.request",
                "body": self._body,
                "more_body": False,
            }
        elif self._stream_consumed:
            # stream() was called to completion
            # return an empty body so that downstream apps don't hang
            # waiting for a disconnect
            self._wrapped_rcv_consumed = True
            return {
                "type": "http.request",
                "body": b"",
                "more_body": False,
            }
        else:
            # body() was never called and stream() wasn't consumed
            try:
                stream = self.stream()
                chunk = await stream.__anext__()
                self._wrapped_rcv_consumed = self._stream_consumed
                return {
                    "type": "http.request",
                    "body": chunk,
                    "more_body": not self._stream_consumed,
                }
            except ClientDisconnect:
                self._wrapped_rcv_disconnected = True
                return {"type": "http.disconnect"}
Maybe you can do something similar?

If you're writing your own middleware I'd recommend you read this section of the docs. At the very least it will give you a better understanding of what's going on under the hood: https://www.starlette.io/middleware/#pure-asgi-middleware
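Following that docs pattern, a minimal pure-ASGI middleware for the audit-logging use case might look like this. It is a sketch with hypothetical names (AuditBodyMiddleware, on_body are not Starlette APIs): it captures the body as it streams past while forwarding every message to the downstream app unchanged, so nothing is cached in state and downstream consumers still see the full body:

```python
class AuditBodyMiddleware:
    """Hypothetical sketch: capture the request body for audit logging
    while forwarding every receive() message downstream unchanged."""

    def __init__(self, app, on_body):
        self.app = app
        self.on_body = on_body  # called once with the complete body bytes

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        captured = bytearray()

        async def receive_logging():
            # pass each message through, copying body chunks as they go by
            message = await receive()
            if message["type"] == "http.request":
                captured.extend(message.get("body", b""))
                if not message.get("more_body", False):
                    # body fully read: hand it to the audit logger
                    self.on_body(bytes(captured))
            return message

        await self.app(scope, receive_logging, send)
```

Because the downstream app drives the reads, this never buffers more than the app itself consumes, and it works regardless of whether the app calls body(), stream(), or form().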

@ayushbisht2001 force-pushed the feature/caching_request_body branch from 8fdb171 to 1b0ceb4 on December 28, 2023 at 22:53
@ayushbisht2001
Author

ayushbisht2001 commented Dec 28, 2023

That said, I'm not sure what your actual use case is but there are safe ways to access the request body in a middleware and in the downstream app.

I've looked into this file, but my concern is with multipart/form-data: how can I get the form data values? JSON-based requests are fetched properly, while streaming requests show a stream-consumed error.
Right now I'm trying a hacky way of feeding the chunks to a custom MultiPartParser to create FormData. Is there a better way to get the multipart/form-data records?
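For what it's worth, once the body is buffered, Starlette's own Request.form() (backed by the python-multipart package) is the idiomatic way to get a FormData object, rather than a hand-rolled parser. Purely for illustration of the wire format, here is a stdlib-only sketch that parses a buffered multipart/form-data body with the email package (field values may keep a trailing line break; a real parser should use python-multipart):

```python
import email
from email import policy

def parse_multipart(body: bytes, content_type: str) -> dict:
    """Illustration only: parse a buffered multipart/form-data body into
    {field name: value} using the stdlib email parser."""
    # the email parser expects the Content-Type header in front of the body
    raw = b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    message = email.message_from_bytes(raw, policy=policy.default)
    fields = {}
    for part in message.iter_parts():
        # the field name lives in the Content-Disposition "name" parameter
        name = part.get_param("name", header="content-disposition")
        if name is not None:
            fields[name] = part.get_content()
    return fields
```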

@ayushbisht2001
Author

@adriangb Also, about the issue with GZip: I guess it will break if we place the cache middleware before the GZip middleware. Creating a separate middleware would be a better way to do this, I guess, with instructions to always keep it innermost in the middleware stack.
What do you think?

@adriangb
Member

I don't think we should move forward with this or similar changes. Please look at the links in my previous comment and see if you can solve the problem that way or report back why that won't work.

@ayushbisht2001
Author

ayushbisht2001 commented Dec 28, 2023

@adriangb Actually, for my use case I was able to solve this issue. But I feel it would be really helpful if caching of the request body were supported by the library (more specifically for multipart/form-data, which throws a stream-consumed error).
BTW, thanks @adriangb for all your responses.

@Kludex Kludex closed this Dec 29, 2023
@Kludex
Member

Kludex commented Dec 29, 2023

As Adrian said, this was discussed extensively. Also, you solved your use case, so let's close this.

If you need more guidance, please open a discussion.
