Subsharding vindex #9428

Merged
merged 17 commits into from
Jan 5, 2022

@harshit-gangal harshit-gangal commented Dec 21, 2021

Description

This PR adds a new generic MultiColumn Vindex that can be used only as a primary vindex.

This vindex takes three parameters:

  1. column_count - the number of columns the vindex is applied to.
  2. column_vindex - the hashing function (vindex) each column uses to produce that column's hash value.
  3. column_bytes - the number of bytes taken from each column's hash value to build the keyspace ID.

Usage in VSchema:

"vindexes": {
    "multicol_vdx": {
        "type": "multicol",
        "params": {
            "column_count": "3",
            "column_bytes": "1,3,4",
            "column_vindex": "hash,binary,unicode_loose_xxhash"
        }
    }
},
"tables": {
    "multicol_tbl": {
        "column_vindexes": [
            {
                "columns": ["cola", "colb", "colc"],
                "name": "multicol_vdx"
            }
        ]
    }
}

column_count is the only mandatory parameter.
At most 8 columns can be used in this vindex, i.e. column_count <= 8.

column_vindex is a comma-separated list of vindex names. The number of entries must be less than or equal to column_count.
The default is the hash vindex: any column without an entry in the list uses it.
Every vindex named in column_vindex must implement the interface below; otherwise initialization fails.

// Hashing defines the interface for the vindexes that export the Hash function to be used by the multi-column vindex.
type Hashing interface {
	Hash(id sqltypes.Value) ([]byte, error)
}
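
The column_vindex rules above (at most column_count entries, hash as the default) can be sketched roughly as follows. This is an illustration only, not the code from this PR: resolveColumnVindexes is a hypothetical helper, and treating an empty entry in the middle of the list as "use the default" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveColumnVindexes is a hypothetical helper (not part of Vitess).
// It returns one vindex name per column, falling back to "hash" for any
// column that has no entry in the column_vindex parameter.
func resolveColumnVindexes(columnCount int, columnVindex string) ([]string, error) {
	if columnCount < 1 || columnCount > 8 {
		return nil, errors.New("column_count must be between 1 and 8")
	}
	names := make([]string, columnCount)
	for i := range names {
		names[i] = "hash" // default vindex
	}
	if strings.TrimSpace(columnVindex) == "" {
		return names, nil
	}
	parts := strings.Split(columnVindex, ",")
	if len(parts) > columnCount {
		return nil, errors.New("column_vindex has more entries than column_count")
	}
	for i, p := range parts {
		// Assumption: an empty entry keeps the default for that column.
		if p = strings.TrimSpace(p); p != "" {
			names[i] = p
		}
	}
	return names, nil
}

func main() {
	names, err := resolveColumnVindexes(3, "hash,binary")
	fmt.Println(names, err) // the third column falls back to "hash"
}
```

In the real vindex each resolved name would additionally have to refer to a vindex that implements Hashing; that check is left out here to keep the sketch free of Vitess imports.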

column_bytes is a comma-separated list of byte counts, one entry per column. The counts must add up to 8 bytes.
If the count for a column is omitted, it is calculated by spreading the remaining bytes as evenly as possible over the unassigned columns; in the example below the extra byte goes to the earliest unassigned column.

Eg:

Given:
column_count = 5
column_bytes = 1, , 3

col 1 -> 1
col 2 -> not-provided
col 3 -> 3
col 4 -> not-provided
col 5 -> not-provided

Calculated:
remaining bytes = 8 - 1 - 3 -> 4
remaining columns = 5 - 2 -> 3
col 2 -> 2
col 4 -> 1
col 5 -> 1
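
The calculation above can be reproduced with a short Go sketch. distributeBytes is a hypothetical helper written for this example, not the function used in the PR. It assumes every column must receive at least 1 byte and that leftover bytes go to the earliest unassigned columns, which is what the worked example shows.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const keyspaceIDBytes = 8

// distributeBytes is a hypothetical helper (not the PR's implementation).
// It parses column_bytes and fills in the columns that have no entry.
func distributeBytes(columnCount int, columnBytes string) ([]int, error) {
	if columnCount < 1 || columnCount > keyspaceIDBytes {
		return nil, errors.New("column_count must be between 1 and 8")
	}
	sizes := make([]int, columnCount)
	assigned := make([]bool, columnCount)
	remaining, unassigned := keyspaceIDBytes, columnCount

	var parts []string
	if strings.TrimSpace(columnBytes) != "" {
		parts = strings.Split(columnBytes, ",")
	}
	if len(parts) > columnCount {
		return nil, errors.New("column_bytes has more entries than column_count")
	}
	for i, p := range parts {
		p = strings.TrimSpace(p)
		if p == "" {
			continue // not provided; calculated below
		}
		n, err := strconv.Atoi(p)
		if err != nil || n < 1 {
			return nil, fmt.Errorf("invalid column_bytes entry %q", p)
		}
		sizes[i], assigned[i] = n, true
		remaining -= n
		unassigned--
	}
	// Assumption: every unassigned column needs at least 1 byte.
	if remaining < unassigned || (unassigned == 0 && remaining != 0) {
		return nil, errors.New("column_bytes must add up to 8 bytes")
	}
	for i := range sizes {
		if assigned[i] {
			continue
		}
		// Ceiling division hands any extra byte to the earlier columns.
		n := (remaining + unassigned - 1) / unassigned
		sizes[i] = n
		remaining -= n
		unassigned--
	}
	return sizes, nil
}

func main() {
	sizes, err := distributeBytes(5, "1, , 3")
	fmt.Println(sizes, err) // prints "[1 2 3 1 1] <nil>"
}
```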

Related Issue(s)

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
@harshit-gangal harshit-gangal marked this pull request as ready for review December 30, 2021 16:55
@harshit-gangal
Member Author

@csquared take a look at the PR description.

frouioui commented Jan 4, 2022

@harshit-gangal, thank you for such a good description, we should reuse it for the documentation.

I have a question regarding column_bytes. Let's assume column_count equals 2, and column_bytes equals 4,4. The 8 bytes used for the key will be created using 4 bytes from both columns (as instructed by column_bytes). Now, which 4 bytes will be taken from each column? The left-most or right-most bytes, or else?

@frouioui frouioui left a comment

Looks great to me!

We need to document this change

go/vt/vtgate/vindexes/multicol.go (two resolved review threads)

harshit-gangal commented Jan 4, 2022

> @harshit-gangal, thank you for such a good description, we should reuse it for the documentation.
>
> I have a question regarding column_bytes. Let's assume column_count equals 2, and column_bytes equals 4,4. The 8 bytes used for the key will be created using 4 bytes from both columns (as instructed by column_bytes). Now, which 4 bytes will be taken from each column? The left-most or right-most bytes, or else?

|_c1|_c2|_c2|_c2|_c3|_c3|_c3|_c3|
|_0_|_1_|_2_|_3_|_4_|_5_|_6_|_7_|

c1 - 1 byte
c2 - 3 bytes
c3 - 4 bytes

For each column Cn, the first (left-most) bytes of its hash are used: Cn.Hash(Cn_value)[0 : n_bytes_allocated_for_column]

@harshit-gangal harshit-gangal merged commit 0522df8 into vitessio:main Jan 5, 2022
@harshit-gangal harshit-gangal deleted the subsharding-vdx branch January 5, 2022 05:41
Successfully merging this pull request may close these issues.

Feature Request: VSchema for “Subsharding” VIndex