Batch neighbour retrieval in traversals #21809

Draft: wants to merge 3 commits into base: devel

@jvolmer jvolmer commented Jun 18, 2025

We found that if a traversal has a limit but hits a supernode, we first retrieve all neighbours of this supernode and then discard most of them again because of the limit. This unnecessary retrieval can be very expensive.
Therefore we decided to retrieve the neighbours of each node in batches of 1000. For the OneSidedEnumerator, neighbourhood retrieval happens in computeNeighbourhoodOfNextVertex(). On a single server, the provider's expand method is what loads all neighbours into memory. In a cluster, fetching the neighbour edges (via the provider) can already be expensive and should be batched as well.
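To illustrate why batching helps with a limit, here is a minimal, self-contained sketch. It is not the code in this PR: `Neighbours`, `ExpandResult` and `expandBatched` are hypothetical names, and a plain vector stands in for the edge index. It only shows that with a batch size of 1000 and a small limit, a supernode costs one batch instead of its full neighbourhood.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the stored neighbours of one vertex.
using Neighbours = std::vector<int>;

struct ExpandResult {
  std::vector<int> produced;  // neighbours handed on to the traversal
  std::size_t fetched = 0;    // neighbours actually read from storage
};

// Reads neighbours in batches of `batchSize` and stops as soon as
// `limit` results have been produced. Assumption: a whole batch is
// always read, even if only part of it is needed.
ExpandResult expandBatched(Neighbours const& all, std::size_t limit,
                           std::size_t batchSize) {
  assert(batchSize > 0);
  ExpandResult result;
  std::size_t pos = 0;
  while (pos < all.size() && result.produced.size() < limit) {
    std::size_t const end = std::min(pos + batchSize, all.size());
    result.fetched += end - pos;
    for (std::size_t i = pos; i < end && result.produced.size() < limit; ++i) {
      result.produced.push_back(all[i]);
    }
    pos = end;
  }
  return result;
}
```

With a supernode of 100000 neighbours, a limit of 10 and a batch size of 1000, this sketch reads 1000 neighbours rather than 100000.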
The current changes add a SingleServerNeighbourProvider, an iterator over the neighbours of one vertex. The vertex is set via rearm, which also resets the internal cursors. The next method is supposed to return the next 1000 neighbours; currently it still returns all of them, so this needs more work. The SingleServerProvider behavior has not changed so far either: it iterates over all batches. This has to change so that it executes the callback only on the next batch. (Another thing that has to be investigated: the TraversalStats have to be passed to the SingleServerNeighbourProvider, possibly as a shared_ptr (?); using just a bare reference to the traversal stats of the SingleServerProvider crashes arangod.)
Also, a similar provider has to be implemented for the cluster case, where we fetch edges.
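The intended rearm/next behaviour of the neighbour iterator could look roughly like the sketch below. All names (`TraversalStatsSketch`, `NeighbourCursorSketch`, `scanned`) are hypothetical and do not mirror the real ArangoDB classes. As a simplification, `rearm` takes the neighbour list directly instead of a vertex and an index cursor. Holding the stats through a `shared_ptr` is shown only as one possible way to avoid a dangling reference; whether that is the right fix for the crash is still open.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the traversal statistics.
struct TraversalStatsSketch {
  std::size_t scanned = 0;
};

class NeighbourCursorSketch {
 public:
  NeighbourCursorSketch(std::shared_ptr<TraversalStatsSketch> stats,
                        std::size_t batchSize)
      : _stats(std::move(stats)), _batchSize(batchSize) {
    assert(_stats != nullptr);
    assert(_batchSize > 0);
  }

  // Points the cursor at a new vertex's neighbours and resets the position.
  void rearm(std::vector<int> neighbours) {
    _neighbours = std::move(neighbours);
    _pos = 0;
  }

  bool hasMore() const { return _pos < _neighbours.size(); }

  // Returns at most `_batchSize` neighbours and advances the position.
  std::vector<int> next() {
    std::size_t const end = std::min(_pos + _batchSize, _neighbours.size());
    auto const first = _neighbours.begin() + static_cast<std::ptrdiff_t>(_pos);
    auto const last = _neighbours.begin() + static_cast<std::ptrdiff_t>(end);
    std::vector<int> batch(first, last);
    _stats->scanned += batch.size();  // shared ownership keeps stats alive
    _pos = end;
    return batch;
  }

 private:
  std::shared_ptr<TraversalStatsSketch> _stats;
  std::size_t _batchSize;
  std::vector<int> _neighbours;
  std::size_t _pos = 0;
};
```

A caller would loop on `hasMore()` and call `next()` only while it still needs results, so the callback runs on one batch at a time.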

@jvolmer jvolmer self-assigned this Jun 18, 2025
@cla-bot cla-bot bot added the cla-signed label Jun 18, 2025