[wip] sql: add metadata router #50637
Conversation
This is still a WIP: tests remain to be written and there is general polish to be done. I just wanted to get some early feedback on the approach and confirm that things look sane before I polish this up further. Please let me know if you have any questions, and of course any feedback is greatly appreciated! I chatted with @yuzefovich about the motivation behind this, and I attempted to summarize it in the (admittedly verbose; I plan on cleaning it up) commit message.
IIUC, the intention here is to be able to plan second-stage processors on the leaseholder. I'm not convinced that metadata is the correct way to do this, because it would mix planning with execution. This might be very hand-wavy since I don't know much about the code in question, but this is how I would expect restore to work: plan and execute one flow that splits and scatters ranges, then plan a second flow that places the processors on the leaseholders for the respective ranges, similar to what we do with scans. Would something like that work?
Planning the second flow after the splitting and scattering is done was considered, and this is how restore used to work. However, the splitting and scattering itself takes a non-trivial amount of time, especially with large backups, so waiting for the first flow to finish before planning the second flow is expected to significantly affect performance. There's also another question I had: if this approach ends up making sense, does it make sense to stream this information as a row or as metadata? Since the information being passed between processors is about the data that the processors are going to import (rather than the data rows themselves), I thought it made sense to pass it as metadata.
I see. I guess I'm uncomfortable with the idea of having metadata change anything about execution. I think we have a precedent for streaming this information as a row, and I think the infra already exists for that (take a look at the …)
Hm, I see. I looked at the … I wonder if it would make sense to have a router that's very similar to the …
I don't think there would be a problem dynamically updating the range router's mapping. The first processor would reference it and update it before returning spans. I would prefer that to creating a new router that is the same minus this detail, but I can be convinced otherwise.
I've been persuaded by Alfonso that the necessary information should flow as "rows". Paul, sorry for encouraging you to go down a different path. |
Summarizing an offline discussion with @asubiotto: I'm going to attempt to see if the processor can leverage the range router and update its mapping dynamically, per processor, during the execution of the processor.
This commit adds a DistSQL router that routes metadata to a given output
stream based on the metadata's StreamIdx. This is used by flows which
schedule DistSQL processors in order to coordinate work around the
cluster.
The motivation for this change is a refactoring to Restore which
attempts to distribute the work of performing the restore across the
cluster. RESTORE works by creating a pipeline of work with 2 stages. The
first stage splits and scatters the ranges we are going to import. This
shuffling means that a range could end up on a random node in the
cluster. The second stage of the pipeline is to download the data from
the backup file and restore the data, which is accomplished through an
AddSSTable request. It is beneficial (as well as the motivation for this
refactor) for the node which issues this request to also be the
leaseholder of the range it is trying to import. This is to prevent a
situation where many nodes are all waiting on one node that had the
misfortune of being the recipient of many scattered ranges in a row.
This router would allow restore to be implemented with 2 separate
processors: one that splits and scatters the data, and one that imports
the data. Using this router, the split and scatter processor could
determine, on the fly, which processor is suitable to import the data.
Release note: None