
[wip] sql: add metadata router #50637

Closed
wants to merge 1 commit

Conversation

pbardea
Contributor

@pbardea pbardea commented Jun 25, 2020

This commit adds a DistSQL router that routes metadata to a given output
stream based on the metadata's StreamIdx. This is used by flows which
schedule DistSQL processors in order to coordinate work around the
cluster.

The motivation for this change is a refactoring of RESTORE that
attempts to distribute the work of performing the restore across the
cluster. RESTORE works by creating a pipeline of work with 2 stages.
The first stage splits and scatters the ranges we are going to import;
this shuffling means that a range could end up on a random node in the
cluster. The second stage of the pipeline downloads the data from the
backup file and restores it, which is accomplished through an
AddSSTable request. It is beneficial (and is the motivation for this
refactor) for the node that issues this request to also be the
leaseholder of the range it is trying to import. This prevents a
situation where many nodes are all waiting on one node that had the
misfortune of receiving many scattered ranges in a row.

This router would allow restore to be implemented with 2 separate
processors: one that splits and scatters the data, and one that imports
the data. Using this router, the split and scatter processor could
determine, on the fly, which processor is suitable to import the data.

Release note: None
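As an illustration of the idea, here is a minimal sketch in Go. The `Metadata` and `MetadataRouter` types are hypothetical stand-ins, not the actual execinfra API: metadata carries a `StreamIdx` naming its output stream, and the router dispatches on that index.

```go
package main

import "fmt"

// Metadata is a hypothetical stand-in for DistSQL producer metadata;
// StreamIdx names the output stream that should receive it.
type Metadata struct {
	StreamIdx int
	Payload   string
}

// MetadataRouter fans metadata out to a fixed set of output streams.
type MetadataRouter struct {
	outputs []chan Metadata
}

func NewMetadataRouter(numStreams int) *MetadataRouter {
	outs := make([]chan Metadata, numStreams)
	for i := range outs {
		outs[i] = make(chan Metadata, 16)
	}
	return &MetadataRouter{outputs: outs}
}

// Push routes meta to the stream named by its StreamIdx, returning an
// error for out-of-range indices instead of silently dropping metadata.
func (r *MetadataRouter) Push(meta Metadata) error {
	if meta.StreamIdx < 0 || meta.StreamIdx >= len(r.outputs) {
		return fmt.Errorf("no output stream %d", meta.StreamIdx)
	}
	r.outputs[meta.StreamIdx] <- meta
	return nil
}

func main() {
	r := NewMetadataRouter(3)
	_ = r.Push(Metadata{StreamIdx: 2, Payload: "import span [a,b)"})
	m := <-r.outputs[2]
	fmt.Println(m.Payload) // prints "import span [a,b)"
}
```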

@cockroach-teamcity
Member

This change is Reviewable

@pbardea pbardea changed the title sql: add metadata router [wip] sql: add metadata router Jun 25, 2020
@pbardea
Contributor Author

pbardea commented Jun 25, 2020

This is still a WIP: there are tests still to be written and general polish to be done. I just wanted to get some early feedback on the approach and a sanity check before polishing this up a bit more. Please let me know if you have any questions, and of course any feedback is greatly appreciated!

I chatted with @yuzefovich about the motivation behind this change, and attempted to summarize it in the verbose (sorry, planning on cleaning this up) commit message.

@asubiotto
Contributor

IIUC the intention here is to be able to plan second stage processors on the lease holder. I'm not convinced that metadata is the correct way to do this, because it'd be mixing planning with execution.

This might be very hand-wavy since I don't know much about the code in question, but this is how I would expect the restore to work: plan and execute one flow that splits and scatters the ranges, then plan a second flow that places the processors on the leaseholders of the respective ranges, similar to what we do with scans. Would something like that work?

@pbardea
Contributor Author

pbardea commented Jun 25, 2020

Planning the second flow after the splitting and scattering is done was considered, and this is how restore used to work. However, the splitting and scattering itself takes a non-trivial amount of time, especially with large backups, so waiting for the first flow to finish before planning the second flow is expected to hurt performance significantly.

There's also another question I had, if this approach ends up making sense: should this information be streamed as rows or as metadata? Since the information being passed between processors describes the data the processors are going to import (rather than the data rows themselves), I thought it made sense to pass it as metadata.

@asubiotto
Contributor

I see. I guess I'm uncomfortable with the idea of having metadata change anything about execution.

I think we have a precedent for streaming this information as a row, and I think the infra already exists for that (take a look at the rangeRouter). This makes more sense to me in the context of a processor (i.e. a processor takes a row and does something with it). You could unconditionally plan all of these restore processors that would run AddSSTable requests with spans it receives. The first stage would then use a rangeRouter to route these spans to whichever node is a leaseholder for that range.
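Very roughly, the row-based approach described above could be sketched as follows (hypothetical types, not CockroachDB's actual rangeRouter): each output stream owns a set of key spans, and a row is routed by looking up which span its key falls in.

```go
package main

import (
	"fmt"
	"sort"
)

// spanEntry maps the half-open key span [start, end) to an output stream.
type spanEntry struct {
	start, end string
	stream     int
}

// rangeRouter is an illustrative analogue of a range-based router:
// entries are kept sorted by start key so lookup can binary-search.
type rangeRouter struct {
	entries []spanEntry
}

// streamFor returns the output stream whose span contains key.
func (r *rangeRouter) streamFor(key string) (int, error) {
	// Find the first entry whose end key is past the lookup key.
	i := sort.Search(len(r.entries), func(i int) bool {
		return r.entries[i].end > key
	})
	if i < len(r.entries) && r.entries[i].start <= key {
		return r.entries[i].stream, nil
	}
	return 0, fmt.Errorf("no stream for key %q", key)
}

func main() {
	r := &rangeRouter{entries: []spanEntry{
		{"a", "m", 0}, // keys in [a, m) go to stream 0
		{"m", "z", 1}, // keys in [m, z) go to stream 1
	}}
	s, _ := r.streamFor("q")
	fmt.Println(s) // prints 1
}
```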

@pbardea
Contributor Author

pbardea commented Jun 25, 2020

Hm, I see. I looked at the rangeRouter, but the problem I ran into is that we don't know the mapping between spans and the stream index we want to send the row on until we do the work in the first processor.

I wonder if it would make sense to have a router that's very similar to the rangeRouter, but instead of a key column that the router maps to a stream index, have one of the columns be an int that stores the stream index directly? This seems pretty similar to the rangeRouter, but would allow the flow of rows to be adjusted dynamically after planning time.

@asubiotto
Contributor

I don't think there would be a problem with dynamically updating the range router's mapping. The first processor would hold a reference to it and update it before returning spans. I would prefer that to creating a new router that is the same minus this detail, but I can be convinced otherwise.
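The dynamic update described here might look roughly like this (a hypothetical sketch, not the actual router implementation): the span-to-stream mapping is guarded by a lock so the first-stage processor can install entries while the flow is running, before it emits the corresponding rows.

```go
package main

import (
	"fmt"
	"sync"
)

// dynamicRouter is an illustrative router whose span-to-stream mapping
// can be updated during execution rather than fixed at planning time.
type dynamicRouter struct {
	mu      sync.RWMutex
	mapping map[string]int // span start key -> output stream
}

func newDynamicRouter() *dynamicRouter {
	return &dynamicRouter{mapping: map[string]int{}}
}

// update is called by the split-and-scatter processor once it learns
// which node (stream) holds the lease for a span.
func (r *dynamicRouter) update(spanKey string, stream int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mapping[spanKey] = stream
}

// route looks up the stream for a row's span key.
func (r *dynamicRouter) route(spanKey string) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.mapping[spanKey]
	return s, ok
}

func main() {
	r := newDynamicRouter()
	// Scatter placed span "b" on the node behind stream 2; record it,
	// then route the row carrying that span.
	r.update("b", 2)
	if s, ok := r.route("b"); ok {
		fmt.Println(s) // prints 2
	}
}
```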

@yuzefovich
Member

I've been persuaded by Alfonso that the necessary information should flow as "rows". Paul, sorry for encouraging you to go down a different path.

@pbardea
Contributor Author

pbardea commented Jun 25, 2020

Summarizing an offline discussion with @asubiotto: I'm going to see if the processor can leverage the range router and update its mapping dynamically, per processor, during the execution of the processor.

@pbardea pbardea closed this Jun 25, 2020
4 participants