perf: allow concurrent decompress away from network loop #162
base: master
Conversation
We want to ensure the transcode passes everything that is expected to run asynchronously over to the decode loop. In general, memcached calls us back with gotData, then receivedStatus, then complete. We use gotData to launch the work onto the transcode threadpool and return control to memcached, which then immediately calls receivedStatus.

Previously, receivedStatus and complete were set up to interact and set a value on the underlying future, but only by synchronously blocking on the transcode future. Since that callback fires almost immediately after the gotData callback, we were launching the transcode and nearly always performing a blocking get, which triggers a synchronous decompression on the network IO loop. This is of course very detrimental to EVCache driver performance, since the driver cannot even accept new requests to issue to memcached backends, and must wait until decompression completes.

In this fix, we rearrange things a little to ensure that if async decode is requested, the completion status updates are pushed to happen only after the async decode completes. This is a little ugly because of the current arrangement of the memcached decoder. A future change might overhaul this integration, pulling it out of the memcached transcode framework and using something a bit more friendly.
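A minimal sketch of the rearrangement described above, using plain `java.util.concurrent` types rather than the actual EVCache/memcached callback interfaces (names like `tcPool` and the inline callbacks are illustrative): gotData hands the decode to the transcode pool and returns immediately, and instead of blocking on the decode future from receivedStatus/complete, the completion is chained to fire only after the decode finishes.

```java
import java.util.concurrent.*;

public class DeferredDecodeSketch {
    public static void main(String[] args) throws Exception {
        // stand-in for the transcode threadpool (tcService in the driver)
        ExecutorService tcPool = Executors.newFixedThreadPool(2);

        // stand-in for the operation's underlying future
        CompletableFuture<String> result = new CompletableFuture<>();

        byte[] raw = "compressed-bytes".getBytes();

        // gotData: launch the decode on the transcode pool and return
        // control to the memcached IO loop immediately
        CompletableFuture<String> decode =
            CompletableFuture.supplyAsync(() -> new String(raw), tcPool);

        // receivedStatus/complete: instead of decode.get() (which would
        // block the network IO loop on decompression), push the completion
        // to happen only after the async decode finishes
        decode.whenComplete((value, err) -> {
            if (err != null) result.completeExceptionally(err);
            else result.complete(value);
        });

        // a caller waiting on the operation's future sees the decoded value
        System.out.println(result.get(5, TimeUnit.SECONDS));
        tcPool.shutdown();
    }
}
```

The key point is that the IO-loop thread never calls a blocking get on the decode future; it only registers a continuation and moves on to the next request.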
This change is tricky, actually: since it slightly changes timing, it makes the system more scalable without tuning, but a little less efficient. It needs some consideration. Maybe an alternative is to always decode in-thread and encourage users to tune poolSize much higher.
if (isWrongKeyReturned(key, k)) return;
boolean shouldLog = log.isDebugEnabled() && client.getPool().getEVCacheClientPoolManager().shouldLog(appName);
boolean alwaysSync = false;
should be exposed via a fp to go via async path?
fixed in 2b60f17
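The reviewer's suggestion above is to gate the sync-vs-async decode path behind a fast property. A hedged sketch of what such a toggle could look like (the property name, method names, and the boolean plumbing here are illustrative, not the actual EVCache fast-property API): when the flag is off, decode runs on the caller's thread; when on, it is handed off to the transcode pool.

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

public class DecodePathToggle {
    // Hypothetical toggle; in the driver this would come from a fast
    // property rather than a plain boolean parameter.
    static <T> CompletableFuture<T> decode(boolean asyncDecode,
                                           Supplier<T> decodeFn,
                                           ExecutorService tcPool) {
        if (asyncDecode) {
            // scalable path: decompression happens off the network IO loop
            return CompletableFuture.supplyAsync(decodeFn, tcPool);
        }
        // efficient path: decode in-thread, no handoff or wakeup cost
        return CompletableFuture.completedFuture(decodeFn.get());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(decode(true, () -> "async", pool).get());
        System.out.println(decode(false, () -> "sync", pool).get());
        pool.shutdown();
    }
}
```

This matches the tradeoff discussed earlier: the async path scales better under decompression-heavy load, while the in-thread path is cheaper per operation when the pool is tuned large enough.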
dataSizeDS.record(data.length);
Transcoder<T> transcoder = (tc == null) ? (Transcoder<T>) getTranscoder() : tc;
if (tcService == null) {
    log.error("tcService is null, will not be able to decode");
should we do latch.countDown() here?
fixed in 2b60f17, I restructured it too. Hopefully that reduces the chance of bugs like this.
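The bug class being discussed here is a dropped latch countDown on an error path: if `tcService` is null and the method returns early without counting down, anything awaiting the latch hangs. A minimal sketch of the defensive pattern, assuming a `CountDownLatch` guards completion (the error message mirrors the log line in the diff; the surrounding structure is illustrative):

```java
import java.util.concurrent.CountDownLatch;

public class LatchSafety {
    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        try {
            // decode work that may fail, e.g. a missing transcode service
            throw new IllegalStateException("tcService is null");
        } catch (IllegalStateException e) {
            System.out.println("decode failed: " + e.getMessage());
        } finally {
            // always release waiters, even on early-exit or error paths,
            // so callers blocked on await() cannot hang forever
            latch.countDown();
        }
        latch.await(); // returns immediately: the latch was counted down
        System.out.println("latch released");
    }
}
```

Putting the countDown in a `finally` block (or centralizing completion-state transitions, as the restructure apparently does) makes it much harder to leak a waiter on a path like the null-`tcService` branch.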
…ecode Previously had a concurrency issue and may have dropped some latch countDowns. A refactor to track state differently and tidy up a little helps. Add a property to control sync decode threading behavior.