[ZEPPELIN-1470] limiting results from jdbc #2428
f574f84 to
d3d6eba
Compare
|
The user will not know that the table has more rows and that the result is incomplete. Maybe set fetchSize instead? |
d3d6eba to
1061a66
Compare
|
@tinkoff-dwh you're right, picked the wrong method - using setFetchSize now |
|
I'm also not exactly sure how to write a test for this, as the JDBCInterpreterTest doesn't use anything along the lines of mocks (and the outside behavior - limiting the number of returned rows - is already guaranteed by some filtering code). Any ideas, @tinkoff-dwh? |
|
getMaxResult() + 1 would be better |
|
I think it's impossible to test. |
|
Why +1? |
|
Yep, getMaxResult() is correct. I thought the condition was <= |
|
It seems apache/zeppelin master (which is what this branch is based on) is failing the build - should I rebase it onto 0.7 instead, or wait for master to be fixed? Please let me know what I should do next here (this is my first contribution to the project, pardon the ignorance) |
|
https://travis-ci.org/herval/zeppelin/builds/245889847 |
|
@tinkoff-dwh build #3 is failing in both my master & zeppelin master: https://travis-ci.org/apache/zeppelin - restarting didn't make it pass: https://travis-ci.org/apache/zeppelin/jobs/246113686 |
|
The failing test is one that loads dependencies; it may have taken too long or hit network problems. Try restarting it again.
1061a66 to
f751be8
Compare
for (int i = 0; i < sqlArray.size(); i++) {
  String sqlToExecute = sqlArray.get(i);
  statement = connection.createStatement();
  statement.setFetchSize(getMaxResult());
If getMaxResult() is used in statement directly, then it might not be necessary to use it in method getResults
@zjffdu
? It is the number of rows per fetch, not a limit
Sorry, my misunderstanding of this API. Please ignore it.
|
LGTM |
|
Build's passing: https://travis-ci.org/herval/zeppelin/builds/247193502 Let me know if this is mergeable :-) @tinkoff-dwh |
|
@herval |
khalidhuseynov
left a comment
Thanks for the improvement. LGTM and CI is passing; will merge into master if there is no more discussion
felixcheung
left a comment
should there be something to indicate the result is truncated?
|
hmm.. It actually seems that setFetchSize does not limit the query execution (so it takes forever & hangs Zeppelin) Figuring out if there's a way to show results are truncated + use setMaxRows |
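The difference between the two JDBC calls discussed here can be sketched roughly as follows. `setFetchSize` is only a hint to the driver about how many rows to pull per round trip (and how a driver honors it varies), while `setMaxRows` caps the total number of rows the statement will return. The class and method names below (`JdbcLimitSketch`, `applyLimits`) and the `maxResult` parameter are hypothetical stand-ins for illustration, not code from this PR.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Zeppelin: shows the two knobs side by side.
final class JdbcLimitSketch {

  static void applyLimits(Statement statement, int maxResult) throws SQLException {
    // Hint only: suggests how many rows the driver fetches per round trip.
    // It does not bound the total size of the result set.
    statement.setFetchSize(maxResult);

    // Hard cap: rows beyond this count are silently dropped by the driver,
    // so a huge table cannot flood the client.
    statement.setMaxRows(maxResult);
  }
}
```

With only the first call, a `select *` on a very large table can still stream every row to the client; the second call is what bounds the result at the source.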
|
We already have a check that limits the results; setMaxRows would duplicate it |
|
@tinkoff-dwh it's not duplicate. The bug here is that if you don't do TLDR - the current limit check is only applied after you get results. If you query a big enough table, you'll never get the results (and will kill Zeppelin in the process) |
|
got it. I agree it's generally better to restrict the job at the source |
|
I am using the original setting (getMaxResult). I had to set it in two calls because setFetchSize will determine that there is a “next” page, but if I only call setMaxRows, it will not display the truncation warning (as it translates to a “limit”)
From: Felix Cheung <notifications@github.com>
Sent: Thursday, June 29, 2017 9:30:44 PM
To: apache/zeppelin
Cc: Herval Freire; Mention
Subject: Re: [apache/zeppelin] [ZEPPELIN-1470] limiting results from jdbc (#2428)
got it. I agree it's generally better to restrict the job at the source
Perhaps you can use the existing setting but apply it at the JDBC level instead? It would be easier to switch existing users over to the new behavior this way - actually, let's see what others think about this approach.
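The reason for asking the database for one row more than the display limit can be illustrated with a small sketch. It assumes the display side keeps at most `maxResult` rows and treats any leftover row as the sign that the result was truncated. The names `TruncationSketch`, `Page`, and `readPage` are hypothetical, and a plain `Iterator<String>` stands in for a JDBC `ResultSet`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration, not Zeppelin code.
final class TruncationSketch {

  /** Rows kept for display, plus a flag saying whether more rows were available. */
  static final class Page {
    final List<String> rows;
    final boolean truncated;

    Page(List<String> rows, boolean truncated) {
      this.rows = rows;
      this.truncated = truncated;
    }
  }

  static Page readPage(Iterator<String> source, int maxResult) {
    List<String> rows = new ArrayList<>();
    // Keep at most maxResult rows for display.
    while (rows.size() < maxResult && source.hasNext()) {
      rows.add(source.next());
    }
    // A leftover row can only exist if the source was allowed to return
    // more than maxResult rows, for example maxResult + 1.
    boolean truncated = source.hasNext();
    return new Page(rows, truncated);
  }
}
```

If the source is capped at exactly `maxResult` rows, `truncated` is always false, even when the table is larger; capping it at `maxResult + 1` leaves one extra row to trigger the warning.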
|
|
any additional thoughts on this? Is it mergeable? @khalidhuseynov @tinkoff-dwh @felixcheung |
4beb6d8 to
4f66469
Compare
|
Rebased to latest master - please advise |
Summary: As per the PR discussion on apache#2428 Differential Revision: https://phabricator.twitter.biz/D65563
|
ping - please advise |
|
LGTM~ |
|
|
// fetch n+1 rows in order to indicate there's more rows available (for large selects)
statement.setFetchSize(getMaxResult());
statement.setMaxRows(getMaxResult() + 1);
so hopefully getMaxResult() won't return Integer.MAX_VALUE? because +1 will overflow?
Unless someone specifies a fetch size of two billion rows, this should be fine (the UI would break and Zeppelin would run OOM and the world would melt w/ that amount of rows anyway, so I wouldn't particularly worry about that scenario? :-))
haha, yes
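For reference, the overflow raised in this thread is real in Java: `Integer.MAX_VALUE + 1` wraps around to a negative number, and `Statement.setMaxRows` rejects negative values. If one wanted to guard against it anyway, a clamp along these lines would do. `MaxRowsSketch` and `maxRowsWithSentinel` are hypothetical names, and this guard is not part of the PR.

```java
// Hypothetical guard, not part of the PR: adds the "+1" sentinel row
// without letting the int wrap around.
final class MaxRowsSketch {

  static int maxRowsWithSentinel(int maxResult) {
    // Integer.MAX_VALUE + 1 would wrap to Integer.MIN_VALUE, so clamp instead.
    if (maxResult == Integer.MAX_VALUE) {
      return Integer.MAX_VALUE;
    }
    return maxResult + 1;
  }
}
```

At the clamped value the truncation marker row is lost, but as noted above, a limit that large is not a realistic configuration.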
|
merging if no more comment |
### What is this PR for?
One thing we tracked down is that if you issue a large query on a very large table, it will simply try to load all results (and then cap them on Zeppelin's side), which seems suboptimal (and will freeze the server). Setting this on the JDBC level seems to solve the problem.

### What type of PR is it?
Bug Fix

### Todos
* [x] Tests

### How should this be tested?
- Create or use a table with a very large number of rows. In our tests, I simply created a:
```
createdb zeppelin_test
psql zeppelin_test
create table too_many_rows(n int)
```
And added 5m rows to it.

Making a paragraph like this will hang without setting a limit:
```
%zeppelin_test
select * from too_many_rows
```

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Herval Freire <hfreire@twitter.com>

Closes apache#2428 from herval/hfreire/limit-row-count and squashes the following commits:

4f66469 [Herval Freire] display truncation message
b538c44 [Herval Freire] limiting results from jdbc
