Feature description
If a source plugin produces records faster than the destination plugin can process them, we need to apply back-pressure and throttle the rate at which records are produced.
Plugins running in built-in mode are not a problem, because records are sent to and received from the plugin through Go channels. We control the buffer size of those channels (they are currently unbuffered), so back-pressure is applied automatically.
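For illustration, here is a minimal, self-contained sketch (not Conduit's actual code; the `Record` type and the output are made up) of how an unbuffered channel throttles a fast producer to the pace of a slow consumer, which is the back-pressure we get for free in built-in mode:

```go
package main

import (
	"fmt"
	"time"
)

// Record is a hypothetical stand-in for a Conduit record.
type Record struct {
	Payload []byte
}

func main() {
	// Unbuffered channel: every send blocks until the destination receives,
	// so a fast source is automatically throttled to the destination's pace.
	records := make(chan Record)

	go func() {
		defer close(records)
		for i := 0; i < 5; i++ {
			records <- Record{Payload: []byte(fmt.Sprintf("record-%d", i))}
			fmt.Println("source: produced record", i)
		}
	}()

	for r := range records {
		time.Sleep(100 * time.Millisecond) // simulate a slow destination
		fmt.Printf("destination: processed %s\n", r.Payload)
	}
}
```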
It's more complicated in standalone mode, where connector plugins communicate with Conduit via gRPC streams. Conduit does not wait for a record to be successfully processed by the destination plugin before sending the next record into the stream; instead, it lets the plugin process records asynchronously and send back an acknowledgment once it is done. By default, gRPC dynamically adjusts the stream buffer size based on bandwidth estimation. Since the plugin runs on the same machine as Conduit, latency is near zero and bandwidth is very high, so we can assume the buffer will grow to its maximum of 4 MB. Because a pipeline contains two plugins (source and destination), it can buffer 8 MB of records before back-pressure starts being applied, which is quite a lot (e.g. at 1 kB per record, that is 8,000 records).
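To make that flow concrete, here is a rough sketch of the send/ack pattern over a gRPC bidirectional stream (the record and ack message types are hypothetical placeholders, not Conduit's actual plugin protocol): records are pushed onto the stream without waiting for acks, so the flow-control window is the only thing that can slow the sender down.

```go
package plugin

import (
	"log"

	"google.golang.org/grpc"
)

// sendRecords illustrates the decoupling described above: records are written
// to the stream as fast as SendMsg accepts them, while acknowledgments are
// read concurrently. SendMsg only blocks once the stream's flow-control
// window is full, so that window is the only back-pressure in standalone mode.
func sendRecords(stream grpc.ClientStream, records []interface{}, newAck func() interface{}) error {
	go func() {
		for {
			ack := newAck()
			if err := stream.RecvMsg(ack); err != nil {
				return // stream closed or errored
			}
			log.Printf("ack received: %v", ack)
		}
	}()

	for _, r := range records {
		if err := stream.SendMsg(r); err != nil {
			return err
		}
	}
	return stream.CloseSend()
}
```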
A solution would be to set the initial window size of the gRPC streams to a fixed value. The question is what size makes sense. We should investigate this further and possibly set a fixed window size once we have benchmarks set up (#198) and can measure how it impacts Conduit's performance.
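As a rough sketch of what that could look like (assuming the standalone plugin connection is a plain grpc-go client/server pair; the 64 KiB value and the socket address are only placeholders, not recommendations), grpc-go exposes options to pin the stream and connection window sizes, which, as far as I can tell, also disables the dynamic bandwidth-based growth:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// Placeholder value; the right size is exactly what the benchmarks (#198)
	// should tell us. 64 KiB is the smallest window grpc-go accepts.
	const initialWindowSize = 64 * 1024

	// Client side (Conduit dialing the standalone plugin). Explicitly setting
	// the window sizes pins them instead of letting gRPC grow the buffer via
	// its bandwidth estimation.
	conn, err := grpc.Dial(
		"unix:///tmp/plugin.sock", // placeholder address
		grpc.WithInsecure(),
		grpc.WithInitialWindowSize(initialWindowSize),
		grpc.WithInitialConnWindowSize(initialWindowSize),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Server side (the plugin process) has matching options.
	_ = grpc.NewServer(
		grpc.InitialWindowSize(initialWindowSize),
		grpc.InitialConnWindowSize(initialWindowSize),
	)
}
```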