[BUG] Loader incorrectly creates a new batch before reaching maxBatchSize and batch dispatch #294

Closed
Tracked by #297
leonbe02 opened this issue Feb 23, 2022 · 0 comments · Fixed by #321

Expected Behavior

If I load multiple keys (some of them duplicates) before a batch is dispatched, I would expect my batch function to fire Math.ceil(queueSize / maxBatchSize) times after the intentional delay introduced by batchScheduleFn, where queueSize is the number of unique keys that have not yet been dispatched.
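For example, in the reproduction below there are seven load calls but only three unique keys and maxBatchSize: 3, so queueSize = 3 and the batch function should fire Math.ceil(3 / 3) = 1 time.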

Current Behavior

Instead, the getCurrentBatch function creates a new batch as soon as existingBatch.cacheHits.length reaches maxBatchSize, so the unique keys end up spread across more batches than necessary.
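For reference, the check in getCurrentBatch looks roughly like this (paraphrased from the linked source, not a verbatim quote):

if (
  existingBatch !== null &&
  !existingBatch.hasDispatched &&
  existingBatch.keys.length < maxBatchSize &&
  existingBatch.cacheHits.length < maxBatchSize // every load of an already-cached key adds an entry here, duplicates included
) {
  return existingBatch; // reuse the current batch
}
// otherwise a brand-new batch is created, even if only cache hits filled it up

Because duplicate loads of the same key each push onto cacheHits, the second condition can trip long before maxBatchSize distinct keys have been queued.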

Possible Solution

https://github.com/graphql/dataloader/blob/master/src/index.js#L92
On the line above, the key's resolver is pushed onto the cacheHits array regardless of whether the key is a duplicate of a previous load in the same batch. If this were instead a Map<key, fn>, duplicate loads would not increment the length that gets compared against maxBatchSize, and new batches would not be created when they aren't needed.
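A minimal sketch of that idea, with hypothetical names (the fix that eventually landed in #321 may differ): keep the cache-hit resolvers grouped by cache key, so every pending promise is still resolved at dispatch, but only the number of unique keys is counted against maxBatchSize.

// Hypothetical sketch; cacheHitsByKey is an illustrative name, not dataloader's API.
const cacheHitsByKey = batch.cacheHitsByKey || (batch.cacheHitsByKey = new Map());
return new Promise(resolve => {
  const resolvers = cacheHitsByKey.get(cacheKey) || [];
  resolvers.push(() => resolve(cachedPromise));
  cacheHitsByKey.set(cacheKey, resolvers);
});

// getCurrentBatch would then compare cacheHitsByKey.size (unique keys) against
// maxBatchSize instead of the length of a flat array.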

Steps to Reproduce

const DataLoader = require('dataloader');

const loader = new DataLoader(async (keys) => {
  console.log(keys);
  // DataLoader expects the batch function to return one value per key.
  return keys;
}, {
  maxBatchSize: 3,
  batchScheduleFn: (callback) => setTimeout(callback, 100)
});

const keys = ['a', 'b', 'a', 'a', 'a', 'b', 'c'];
for (const key of keys) {
  loader.load(key);
}

/*
The code above will log 2 separate batches:
[ 'a', 'b' ]
[ 'c' ]
Instead of the expected single batch of:
[ 'a', 'b', 'c' ]
*/

Context

We are trying to batch as many IDs as possible into a single request using DataLoader. However, since we are using Apollo GraphQL, a query will sometimes load the same ID for several different fields in the schema. When that happens enough times, the cache-hit count reaches maxBatchSize and triggers the issue outlined above, where new batches are created when they don't need to be.
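For illustration, a hypothetical pair of Apollo resolvers that produces this pattern (names are made up, not our actual schema):

const resolvers = {
  Post: {
    // Both fields resolve through the same user, so one query calls
    // userLoader.load(post.authorId) more than once before the batch dispatches.
    author: (post, _args, { userLoader }) => userLoader.load(post.authorId),
    authorName: async (post, _args, { userLoader }) => {
      const user = await userLoader.load(post.authorId);
      return user.name;
    },
  },
};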
