
[wip] sql: add metadata router #50637

Closed
wants to merge 1 commit

Conversation

pbardea
Contributor

@pbardea pbardea commented Jun 25, 2020

This commit adds a DistSQL router that routes metadata to a given output
stream based on the metadata's StreamIdx. This is used by flows which
schedule DistSQL processors in order to coordinate work around the
cluster.

The motivation for this change is a refactoring of RESTORE that
attempts to distribute the work of performing the restore across the
cluster. RESTORE works by creating a pipeline of work with 2 stages.
The first stage splits and scatters the ranges we are going to import;
this shuffling means that a range could end up on a random node in the
cluster. The second stage of the pipeline downloads the data from the
backup file and restores it, which is accomplished through an
AddSSTable request. It is beneficial (and is the motivation for this
refactor) for the node that issues this request to also be the
leaseholder of the range it is trying to import. This prevents a
situation where many nodes are all waiting on one node that had the
misfortune of receiving many scattered ranges in a row.

This router would allow restore to be implemented with 2 separate
processors: one that splits and scatters the data, and one that imports
the data. Using this router, the split and scatter processor could
determine, on the fly, which processor is suitable to import the data.

Release note: None
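As an illustration of the idea, here is a minimal sketch in Go. The `Metadata` and `MetadataRouter` types are hypothetical stand-ins, not the actual execinfra API: metadata carries a `StreamIdx` naming its output stream, and the router dispatches on that index.

```go
package main

import "fmt"

// Metadata is a hypothetical stand-in for DistSQL producer metadata;
// StreamIdx names the output stream that should receive it.
type Metadata struct {
	StreamIdx int
	Payload   string
}

// MetadataRouter fans metadata out to a fixed set of output streams.
type MetadataRouter struct {
	outputs []chan Metadata
}

func NewMetadataRouter(numStreams int) *MetadataRouter {
	outs := make([]chan Metadata, numStreams)
	for i := range outs {
		outs[i] = make(chan Metadata, 16)
	}
	return &MetadataRouter{outputs: outs}
}

// Push routes meta to the stream named by its StreamIdx, returning an
// error for out-of-range indices instead of silently dropping metadata.
func (r *MetadataRouter) Push(meta Metadata) error {
	if meta.StreamIdx < 0 || meta.StreamIdx >= len(r.outputs) {
		return fmt.Errorf("no output stream %d", meta.StreamIdx)
	}
	r.outputs[meta.StreamIdx] <- meta
	return nil
}

func main() {
	r := NewMetadataRouter(3)
	_ = r.Push(Metadata{StreamIdx: 2, Payload: "import span [a,b)"})
	m := <-r.outputs[2]
	fmt.Println(m.Payload) // prints "import span [a,b)"
}
```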

@cockroach-teamcity
Member

This change is Reviewable

@pbardea pbardea changed the title sql: add metadata router [wip] sql: add metadata router Jun 25, 2020
@pbardea
Contributor Author

pbardea commented Jun 25, 2020

This is still a WIP: there are tests still to be written and general polish to be done. I just wanted to get some early feedback on the approach and a sanity check before polishing this up a bit more. Please let me know if you have any questions, and of course any feedback is greatly appreciated!

I chatted with @yuzefovich about the motivation behind this change, and attempted to summarize it in the verbose (sorry, planning on cleaning this up) commit message.

@asubiotto
Contributor

IIUC the intention here is to be able to plan second stage processors on the lease holder. I'm not convinced that metadata is the correct way to do this, because it'd be mixing planning with execution.

This might be very hand-wavy since I don't know much about the code in question, but this is how I would expect the restore to work: plan and execute one flow that splits and scatters the ranges, then plan a second flow that places the processors on the leaseholders of the respective ranges, similar to what we do with scans. Would something like that work?

@pbardea
Contributor Author

pbardea commented Jun 25, 2020

Planning the second flow after the splitting and scattering is done was considered, and this is how restore used to work. However, the splitting and scattering itself takes a non-trivial amount of time, especially with large backups, so waiting for the first flow to finish before planning the second flow is expected to hurt performance significantly.

There's also another question I had, if this approach ends up making sense: should this information be streamed as rows or as metadata? Since the information being passed between processors describes the data the processors are going to import (rather than the data rows themselves), I thought it made sense to pass it as metadata.

@asubiotto
Contributor

I see. I guess I'm uncomfortable with the idea of having metadata change anything about execution.

I think we have a precedent for streaming this information as a row, and I think the infra already exists for that (take a look at the rangeRouter). This makes more sense to me in the context of a processor (i.e. a processor takes a row and does something with it). You could unconditionally plan all of these restore processors that would run AddSSTable requests with spans it receives. The first stage would then use a rangeRouter to route these spans to whichever node is a leaseholder for that range.
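Very roughly, the row-based approach described above could be sketched as follows (hypothetical types, not CockroachDB's actual rangeRouter): each output stream owns a set of key spans, and a row is routed by looking up which span its key falls in.

```go
package main

import (
	"fmt"
	"sort"
)

// spanEntry maps the half-open key span [start, end) to an output stream.
type spanEntry struct {
	start, end string
	stream     int
}

// rangeRouter is an illustrative analogue of a range-based router:
// entries are kept sorted by start key so lookup can binary-search.
type rangeRouter struct {
	entries []spanEntry
}

// streamFor returns the output stream whose span contains key.
func (r *rangeRouter) streamFor(key string) (int, error) {
	// Find the first entry whose end key is past the lookup key.
	i := sort.Search(len(r.entries), func(i int) bool {
		return r.entries[i].end > key
	})
	if i < len(r.entries) && r.entries[i].start <= key {
		return r.entries[i].stream, nil
	}
	return 0, fmt.Errorf("no stream for key %q", key)
}

func main() {
	r := &rangeRouter{entries: []spanEntry{
		{"a", "m", 0}, // keys in [a, m) go to stream 0
		{"m", "z", 1}, // keys in [m, z) go to stream 1
	}}
	s, _ := r.streamFor("q")
	fmt.Println(s) // prints 1
}
```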

@pbardea
Contributor Author

pbardea commented Jun 25, 2020

Hm, I see. I looked at the rangeRouter, but the problem I ran into is that we don't know the mapping between spans and the stream index we want to send the row on until we do the work in the first processor.

I wonder if it would make sense to have a router that's very similar to the rangeRouter, but instead of a key column that the router maps to a stream index, have one of the columns be an int that stores the stream index directly? This seems pretty similar to the rangeRouter, but would allow the flow of rows to be adjusted dynamically after planning time.

@asubiotto
Contributor

I don't think there would be a problem with dynamically updating the range router's mapping. The first processor would hold a reference to it and update it before returning spans. I would prefer that to creating a new router that is the same minus this detail, but I can be convinced otherwise.
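The dynamic update described here might look roughly like this (a hypothetical sketch, not the actual router implementation): the span-to-stream mapping is guarded by a lock so the first-stage processor can install entries while the flow is running, before it emits the corresponding rows.

```go
package main

import (
	"fmt"
	"sync"
)

// dynamicRouter is an illustrative router whose span-to-stream mapping
// can be updated during execution rather than fixed at planning time.
type dynamicRouter struct {
	mu      sync.RWMutex
	mapping map[string]int // span start key -> output stream
}

func newDynamicRouter() *dynamicRouter {
	return &dynamicRouter{mapping: map[string]int{}}
}

// update is called by the split-and-scatter processor once it learns
// which node (stream) holds the lease for a span.
func (r *dynamicRouter) update(spanKey string, stream int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mapping[spanKey] = stream
}

// route looks up the stream for a row's span key.
func (r *dynamicRouter) route(spanKey string) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.mapping[spanKey]
	return s, ok
}

func main() {
	r := newDynamicRouter()
	// Scatter placed span "b" on the node behind stream 2; record it,
	// then route the row carrying that span.
	r.update("b", 2)
	if s, ok := r.route("b"); ok {
		fmt.Println(s) // prints 2
	}
}
```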

@yuzefovich
Member

I've been persuaded by Alfonso that the necessary information should flow as "rows". Paul, sorry for encouraging you to go down a different path.

@pbardea
Contributor Author

pbardea commented Jun 25, 2020

Summarizing an offline discussion with @asubiotto: I'm going to see if the processor can leverage the range router and update its mapping dynamically, per processor, during the execution of the processor.

@pbardea pbardea closed this Jun 25, 2020
4 participants