Collation and Character set support #8606

Closed
GuptaManan100 opened this issue Aug 9, 2021 · 9 comments
Labels
Component: Query Serving LFX Type: Enhancement Logical improvement (somewhere between a bug and feature)


@GuptaManan100
Member

Feature Description

Vitess does not yet support collations and character sets, so to compare varchar strings it currently has to rely on the WEIGHT_STRING function. According to the MySQL documentation, WEIGHT_STRING is a debugging function meant only for internal use. We can move away from this approach once Vitess has collation and character set support.

Use Case(s)

  • With the ability to compare strings using collations and character sets, we will be able to better implement ORDER BY, GROUP BY, and JOIN.
  • It will also allow us to leverage more advanced join techniques than what we currently implement.
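
To illustrate the first use case, here is a minimal sketch of how a collation-aware comparison changes the result of an ORDER BY. It is not Vitess code. `compareBinary`, `compareCI`, and `orderBy` are hypothetical names, and `compareCI` is only a toy stand-in for a case-insensitive collation (it folds case with `strings.ToLower`, which is far simpler than what a real MySQL collation does).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// compareBinary orders strings by their raw bytes, as a binary collation would.
func compareBinary(a, b string) int {
	return strings.Compare(a, b)
}

// compareCI is a hypothetical toy stand-in for a case-insensitive collation:
// it folds case with strings.ToLower before comparing.
func compareCI(a, b string) int {
	return strings.Compare(strings.ToLower(a), strings.ToLower(b))
}

// orderBy returns a sorted copy of rows using the given comparison function.
func orderBy(rows []string, cmp func(a, b string) int) []string {
	out := append([]string(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool { return cmp(out[i], out[j]) < 0 })
	return out
}

func main() {
	rows := []string{"banana", "Cherry", "apple"}
	fmt.Println(orderBy(rows, compareBinary)) // [Cherry apple banana]
	fmt.Println(orderBy(rows, compareCI))     // [apple banana Cherry]
}
```

The two calls return different orders for the same rows, which is why the component doing the final sort or merge has to know which collation applies to each column.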

GuptaManan100 added the Type: Enhancement, Component: Query Serving, and LFX labels Aug 9, 2021

@king-11
Contributor

king-11 commented Aug 11, 2021

Hey @GuptaManan100, I would like to apply for this project under LFX. I am not very familiar with Vitess yet. Can you point me to the current implementation so that I can design the changes that need to be made? Thanks 😄

@GuptaManan100
Member Author

Hello, @king-11 https://github.com/vitessio/vitess/tree/main/go/vt/vtgate/planbuilder is the package which has the code for building the plans of how queries are to be executed in Vitess. You can look at this https://github.com/vitessio/vitess/blob/main/go/vt/vtgate/planbuilder/testdata/aggr_cases.txt file which contains the test data for aggregation queries. You will find weight_string function added to a lot of those. You can run the tests locally from https://github.com/vitessio/vitess/blob/main/go/vt/vtgate/planbuilder/plan_test.go file.

@GuptaManan100
Member Author

@rgrupesh There is no other explicit requirement. The mentor for this project is going to be @vmg.

@rgrupesh

> @rgrupesh There is no other explicit requirement. The mentor for this project is going to be @vmg.

Thanks again @GuptaManan100 !

@vmg
Collaborator

vmg commented Aug 23, 2021

For anybody else who is interested in applying for this task: the actual integration into Vitess is not that important. I will assist you with it once you've implemented the collation code.

The most important thing for your application is understanding what Character Sets and Collations are in MySQL, how they work, and how they're implemented (see https://dev.mysql.com/doc/refman/8.0/en/charset-charsets.html), then looking at the x/text/collate package from the Go maintainers (https://pkg.go.dev/golang.org/x/text/collate) and figuring out how many of the existing MySQL collations are implemented in that package, how many behave exactly the same as in MySQL, and how many will need to be implemented from scratch.

A strong application cover letter will give rough answers to all those questions (obviously they don't have to be fully correct/definitive -- I'm sure we'll find many nuances and bugs once we start integrating the collations into Vitess).

Thanks everyone and good luck with your applications!

@king-11
Contributor

king-11 commented Aug 23, 2021

Can we still submit the cover letter? I thought the deadline was the 22nd and that the questions that needed to be answered were only the ones mentioned 😅

@vmg
Collaborator

vmg commented Aug 24, 2021

@king-11: ah, fair enough, the deadline went by while I was on holidays. The point still stands about what's important for the project, but don't worry if you've already submitted. :)

@king-11
Contributor

king-11 commented Sep 5, 2021

@vmg I have read about what character sets and collations are, how they work, and how they are implemented, so I am moving on to the Go collate package. How can I compare its implementation against MySQL's for each collation? Can you help me out with that?

@vmg
Collaborator

vmg commented Sep 7, 2021

So, there are many ways to accomplish this. The first thing I'd try is implementing a WEIGHT_STRING function in Go on top of the collate package. It should take a string and a collation, and return a binary string like MySQL's implementation does. With this function in place, it should be trivial to start writing unit tests that compare the output of our collation package with the results that MySQL returns. Let me know if you run into any roadblocks attempting this.
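
As a rough illustration of the shape this could take, here is a standard-library-only sketch. Everything in it is an assumption made for illustration: the names `WeightString`, `matchesHex`, and the `toy_binary` / `toy_ascii_ci` collations are hypothetical, and the toy weighers stand in for what a real version would compute with the x/text/collate package (for example via its `Collator.Key` method). In a real test, the expected hex strings would be captured from `SELECT HEX(WEIGHT_STRING(...))` on a MySQL server; the value used below was worked out by hand for the toy collation only.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// weigher turns a string into a binary sort key.
type weigher func(s string) []byte

// asciiUpper maps ASCII lowercase letters to uppercase and leaves all
// other bytes untouched. It is a toy rule, not a MySQL collation.
func asciiUpper(s string) []byte {
	out := []byte(s)
	for i, c := range out {
		if c >= 'a' && c <= 'z' {
			out[i] = c - 'a' + 'A'
		}
	}
	return out
}

// collations is a hypothetical registry; the names are illustrative only.
var collations = map[string]weigher{
	"toy_binary":   func(s string) []byte { return []byte(s) },
	"toy_ascii_ci": asciiUpper,
}

// WeightString takes a string and a collation name and returns a binary
// string whose byte order reflects the collation's sort order.
func WeightString(s, collation string) ([]byte, error) {
	w, ok := collations[collation]
	if !ok {
		return nil, fmt.Errorf("unknown collation %q", collation)
	}
	return w(s), nil
}

// matchesHex reports whether our weight string equals a hex-encoded
// reference value, such as one recorded from a database server.
func matchesHex(s, collation, wantHex string) (bool, error) {
	got, err := WeightString(s, collation)
	if err != nil {
		return false, err
	}
	want, err := hex.DecodeString(wantHex)
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, want), nil
}

func main() {
	ok, err := matchesHex("ab", "toy_ascii_ci", "4142")
	fmt.Println(ok, err) // true <nil>
}
```

With a helper like `matchesHex`, a table-driven unit test can loop over (input, collation, reference hex) rows and report every collation whose output diverges from the reference.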
