perf: keyword lookups in the tokenizer #7606
Conversation
Signed-off-by: Vicent Marti <vmg@strn.cat>
I'm loving this.
go/vt/sqlparser/keywords_test.go
Outdated
if !ok {
    t.Fatalf("keyword %q failed to match", kw.name)
}
if lookup != kw.id {
    t.Fatalf("keyword %q matched to %d (expected %d)", kw.name, lookup, kw.id)
}
nit: use `require` over `t.Fatal`
go/vt/sqlparser/parse_test.go
Outdated
if err != nil {
    t.Errorf(" Error: %v", err)
    t.Fatal(err)
}
nit: use require.NoError(t, err)
if err != nil {
    t.Error(scanner.Text())
    t.Errorf(" Error: %v", err)
    t.Errorf("failed to parse %q: %v", query, err)
}
nit: use require.NoError(t, err)
if err != nil {
    b.Fatal(err)
}
nit: use require.NoError(t, err)
if err != nil {
    b.Fatal(err)
}
same as above.
if err != nil {
    b.Fatal(err)
}
same as above.
I love how entertaining AND descriptive your PRs are @vmg :)
💯
Signed-off-by: Vicent Marti <vmg@strn.cat>
@harshit-gangal: I've added
Ready to merge. 👌
What about quicktest? Does it add overhead?
@deepthi I haven't tested it, I just noticed the
Description
Happy Thursday everyone! This week we're bringing `sqlparser` performance improvements. I had a chance to sit with @frouioui and look at some of the profiles we're now acquiring from his Are We Fast Yet (TM) work. There was nothing glaringly obvious that would provide massive optimization gains (as one would expect at this point, Vitess is quite optimized already), but in the normal request lifecycle the `sqlparser` operations on the AST are always quite hot and, most importantly for our goals, CPU-bound.

Let's start squeezing some blood out of this stone: this particular PR comes from allocation benchmarks. The code in our SQL tokenizer that processes SQL keywords allocates so much memory that it shows up as a hotspot in the CPU profiler, and very clearly as an allocation hotspot in the memory profiler.
Why all these allocations? Right now, the tokenizer copies the current token into a temporary buffer (this is the buffer that gets returned to the caller), and then makes yet another copy of that buffer to lowercase it so it can be looked up in our keywords table (people following along at home will remember that SQL keywords are case-insensitive).
Let's improve this with some very classical compiler-theory techniques. Instead of using an ordinary hash table to look up keywords, use a perfect hash table (a minimal hash table where lookups cannot collide); it is measurably faster than a normal hash table, even the one built into the Go runtime. And since we now have a perfect hash table, we control the hashing algorithm used for lookups, so we can make the hashing case-insensitive. This means we no longer have to create lowercase copies of the keywords at all!
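The idea can be sketched as follows. Everything here is an illustrative stand-in, not the PR's actual implementation: the keyword set and token IDs are made up, the hash is a simple case-folding FNV-1a, and the table is brute-forced at startup where Vitess generates its perfect hash table offline.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative keyword set and token IDs -- not Vitess's real values.
var keywords = map[string]int{
	"select": 1, "from": 2, "where": 3, "insert": 4, "update": 5,
}

type entry struct {
	name string // canonical lowercase spelling
	id   int
}

var table []entry

// hashCI folds ASCII letters to lowercase while hashing (FNV-1a style),
// so "SELECT", "Select" and "select" all land in the same slot without
// ever allocating a lowercased copy.
func hashCI(s string, size uint32) uint32 {
	h := uint32(2166136261)
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		h = (h ^ uint32(c)) * 16777619
	}
	return h % size
}

// buildTable brute-forces a table size at which this keyword set has no
// collisions -- a toy stand-in for an offline-generated perfect hash.
func buildTable() {
	for size := uint32(len(keywords)); ; size++ {
		t := make([]entry, size)
		ok := true
		for name, id := range keywords {
			slot := hashCI(name, size)
			if t[slot].name != "" {
				ok = false
				break
			}
			t[slot] = entry{name, id}
		}
		if ok {
			table = t
			return
		}
	}
}

// lookup resolves tok case-insensitively with zero allocations: one hash,
// one slot probe, then a fold-compare to reject non-keywords that happen
// to hash into an occupied slot.
func lookup(tok string) (int, bool) {
	e := table[hashCI(tok, uint32(len(table)))]
	if e.name != "" && strings.EqualFold(tok, e.name) {
		return e.id, true
	}
	return 0, false
}

func main() {
	buildTable()
	fmt.Println(lookup("SeLeCt")) // 1 true
	fmt.Println(lookup("banana")) // 0 false
}
```

Because keywords can never collide with each other, a lookup is one hash plus one comparison in the worst case; the final fold-compare is still needed so that arbitrary identifiers that hash into an occupied slot are not mistaken for keywords.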
Results are :gucci: in the most realistic parse benchmarks. The pathological benchmarks do not regress.
Related Issue(s)
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: