Looking at the sidecar and prediction service, I noticed that the internal API marshals requests from the InternalPredictionService to the inference server using x-application-url-encoding. Is that really the right idea?
The reason I ask is that for large payloads the URL-encoding scheme will incur noticeable overhead compared with, say, multipart/form-data, which I believe is the scheme the W3C recommends for file uploads and large payloads in general.
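To make that overhead concrete, here's a quick sketch comparing the size of a raw JSON body with the same body sent as a URL-encoded form field. The payload shape (a nested ndarray request) is hypothetical, not necessarily Seldon's exact schema:

```python
import json
import random
import urllib.parse

# Hypothetical prediction payload: a 100x100 ndarray of floats.
# (Illustrative shape only -- not necessarily Seldon's exact schema.)
random.seed(0)
body = json.dumps(
    {"data": {"ndarray": [[random.random() for _ in range(100)] for _ in range(100)]}}
)

# The same body sent as a URL-encoded form field: every quote, brace,
# bracket and comma becomes a three-character %XX escape sequence.
form_encoded = urllib.parse.urlencode({"json": body})

print(f"raw JSON:    {len(body)} bytes")
print(f"url-encoded: {len(form_encoded)} bytes")
```

The structural characters of JSON (quotes, braces, commas) all fall outside the URL-safe set, so the encoded form is strictly larger, and the gap grows with payload size.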
Another aspect to all of this is perhaps compressing requests on the fly. I realize gRPC is not compressed per se, but its protobuf encoding does use variable-length (varint) encoding to reduce payload overhead, and perhaps Seldon Core could offer something similar for the JSON that is passed back and forth between the sidecar and prediction service.
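For the compression idea, a minimal sketch of gzipping a JSON body before it crosses the wire (ordinary `Content-Encoding: gzip` negotiation; nothing Seldon-specific, and the payload is again illustrative):

```python
import gzip
import json

# Hypothetical JSON prediction payload with plenty of redundancy,
# as numeric tensors typically have.
body = json.dumps({"data": {"ndarray": [[0.12345] * 100] * 100}}).encode("utf-8")

# The sender compresses and would set the Content-Encoding: gzip header;
# the receiver decompresses transparently before parsing the JSON.
compressed = gzip.compress(body)

print(f"raw:     {len(body)} bytes")
print(f"gzipped: {len(compressed)} bytes")
```

Since the compression is applied at the transport layer, neither side's JSON handling would need to change.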
An interesting and I think relevant read: https://eng.uber.com/trip-data-squeeze