Returning large number of rows in SimpleQueryHandler::do_query() is extremely slow #37

Open
osawyerr opened this issue Jan 15, 2023 · 6 comments

Comments

@osawyerr

Hi there. I've been experimenting with pgwire. When returning a large number of rows (even with a single i32 column) from SimpleQueryHandler::do_query(), psql receives the results almost 10x slower than it does from Postgres.

  • For example, returning 7.5M rows, each containing a single i32 column, from pgwire using TextDataRowEncoder takes approx. 6.5 secs in psql.
  • The same result from Postgres comes back in 600 ms.

Version: 0.7.0
OS: macOS (M1)

@osawyerr changed the title from "Returning large number of rows in SimpleQueryHandler::do_query() is slow" to "Returning large number of rows in SimpleQueryHandler::do_query() is extremely slow" on Jan 15, 2023
@sunng87
Owner

sunng87 commented Jan 15, 2023

Thank you for reporting. I haven't had a chance to work on the performance of pgwire yet. I will do some profiling to find the bottleneck. Contributions are welcome if you are interested in this part.

@sunng87
Owner

sunng87 commented May 2, 2023

In #94 I'm enabling tcp_nodelay on the client socket by default. According to my pgbench test, performance is now on par with PostgreSQL.
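For readers unfamiliar with the option, here is a minimal sketch of enabling TCP_NODELAY on an accepted connection. It uses only the Rust standard library and is not the code from #94, which is not shown in this thread. The helper name `accept_with_nodelay` is hypothetical. Setting the option disables Nagle's algorithm, so small writes are sent immediately rather than being held back to coalesce with later data.

```rust
use std::net::{TcpListener, TcpStream};

/// Hypothetical helper: accept one connection and disable Nagle's algorithm on it,
/// so small writes are not delayed while waiting to be coalesced.
fn accept_with_nodelay(listener: &TcpListener) -> std::io::Result<TcpStream> {
    let (socket, _peer) = listener.accept()?;
    socket.set_nodelay(true)?;
    Ok(socket)
}

fn main() -> std::io::Result<()> {
    // Bind to an ephemeral loopback port so the sketch runs anywhere.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Connect from a second thread to play the role of the client (e.g. psql).
    let client = std::thread::spawn(move || TcpStream::connect(addr));

    let socket = accept_with_nodelay(&listener)?;
    let _client_socket = client.join().expect("client thread panicked")?;

    println!("nodelay enabled: {}", socket.nodelay()?);
    Ok(())
}
```

An async server would do the same on its own socket type; the option itself is a property of the TCP connection, not of the runtime.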

@osawyerr
Author

osawyerr commented May 2, 2023

Oh, that's really cool.

@osawyerr
Author

osawyerr commented Jul 8, 2023

I was looking at this again. It still seems pretty slow. When executing a query with a lot of results (millions) on Postgres directly, it seems that some rows are returned to the client before the query has finished executing (so it gives the impression that it's quicker). However, when using pgwire, all results are returned before any results are shown. Is this the case?

The implementation of streaming to the client I'm using is identical to the datafusion example.

Also, at Greptime, have you done performance testing with large result sets?

@sunng87
Owner

sunng87 commented Jul 17, 2023

Sorry for the late response. I have been super busy these days. At Greptime we haven't covered this part of the Postgres interface.

I'm going to check the code again, but IIRC we are using a stream-based API to return results to the client. It seems my datafusion example could be improved: for a RecordBatch, we don't need to collect all results into a vector first. This might be the reason it's blocking for results. I will find time to update the example.
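To illustrate the difference between collecting into a vector and producing rows on demand, here is a rough sketch using a plain `Iterator` as a synchronous stand-in for an async stream. This is not the datafusion example or pgwire's API. The functions `encode_row`, `rows_eager`, and `rows_lazy` are hypothetical, and a slice of i32 stands in for a record batch. The counter records how many rows have been encoded by the time the first row is available.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for encoding one row; bumps a counter so we can
/// observe how much work was done before the first row became available.
fn encode_row(value: i32, encoded: &Cell<usize>) -> String {
    encoded.set(encoded.get() + 1);
    value.to_string()
}

/// Eager: every row in the batch is encoded and buffered before any is returned.
fn rows_eager(batch: &[i32], encoded: &Cell<usize>) -> Vec<String> {
    batch.iter().map(|v| encode_row(*v, encoded)).collect()
}

/// Lazy: each row is encoded only when the consumer pulls it.
fn rows_lazy<'a>(
    batch: &'a [i32],
    encoded: &'a Cell<usize>,
) -> impl Iterator<Item = String> + 'a {
    batch.iter().map(move |v| encode_row(*v, encoded))
}

fn main() {
    let batch: Vec<i32> = (0..1_000).collect();

    let eager_count = Cell::new(0);
    let eager = rows_eager(&batch, &eager_count);
    println!(
        "eager: first row {:?} available after encoding {} rows",
        eager.first(),
        eager_count.get()
    );

    let lazy_count = Cell::new(0);
    let mut lazy = rows_lazy(&batch, &lazy_count);
    let first = lazy.next();
    println!(
        "lazy: first row {:?} available after encoding {} rows",
        first,
        lazy_count.get()
    );
}
```

With the eager version the client cannot see anything until the whole batch has been processed, which would match the blocking behaviour described above. Whether this is the actual cause in the example is only the suspicion stated in the comment.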

@sunng87
Owner

sunng87 commented Mar 17, 2024

I just improved the performance of DataRowEncoder in #165, and it should have twice the throughput in some cases.


2 participants