Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Improve sqlparser performance #1557

Open
alamb opened this issue Nov 26, 2024 · 4 comments
Open

[EPIC] Improve sqlparser performance #1557

alamb opened this issue Nov 26, 2024 · 4 comments

Comments

@alamb
Copy link
Contributor

alamb commented Nov 26, 2024

What problem are you trying to solve?

Normally, in a SQL processing system, parsing SQL is not a major bottleneck compared to actually processing data. That being said, given how many SQL strings are parsed by this crate, I think there is significant benefit to improving the performance of the SQL parser in this crate.

That being said, I also think it is important to minimize the impact on downstream crates as much as possible.

Recently, we started introducing locations into the parser (thanks again @Nyrox!), which we found slows things down a bit (see #1435 (comment)).

Thankfully, I think there is significant room for improvement. As as part of the adding location information, I spent some time profiling and I think there are some obvious ways to improve the performance without impacting downstream crates.

Here is the flamegraph for anyone who is interested (you can download it locally to get zoom / etc):

fixed-flamegraph

fixed-flamegraph

What would you like to see?

The idea would be

  1. Run the benchmarks (instructions in Document micro benchmarks #1555)
  2. Maybe add additional benchmarks so they are more representative
  3. Improve the benchmarks

Ideas to improve performance:

@davisp
Copy link
Member

davisp commented Dec 7, 2024

@alamb How are you generating your flamegraphs?

Locally, cargo flamegraph --bench sqlparser_bench does not appear to be doing the trick as the flamegraph appears to be at most measuring a single iteration of the benchmark.

@alamb
Copy link
Contributor Author

alamb commented Dec 9, 2024

Locally, cargo flamegraph --bench sqlparser_bench does not appear to be doing the trick as the flamegraph appears to be at most measuring a single iteration of the benchmark.

This is basically what I normally do. Sometimes it is tricky to interpret the cargo bench flamegraph as it has both the warmup and the run

Can you post the flamegraph you have?

@davisp
Copy link
Member

davisp commented Dec 10, 2024

So this is kinda weird. It appears the --bench argument that should be passed to the benchmark isn't.

Running cargo flamegraph --root --bench sqlparser_bench:

flamegraph

However, it works fine if I run flamegraph against the binary directly:

flamegraph --root -- target/release/deps/sqlparser_bench-9f7a6e6e193d8f5c --bench

flamegraph

And the easier to remember version works as well:

cargo flamegraph --root --bench sqlparser_bench -- --bench:

flamegraph

So no idea why that --bench is apparently getting dropped but I've managed to get things going at least.

@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2024

So no idea why that --bench is apparently getting dropped but I've managed to get things going at least.

🎉 glad to hear you figured out a way past the current issue. I have also found argument handling with cargo bench a little strange so I don't have a great explanation 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants