@viirya
Member

@viirya viirya commented Apr 13, 2015

@SparkQA

SparkQA commented Apr 13, 2015

Test build #30157 has finished for PR 5488 at commit 3e7db15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@marmbrus
Contributor

This is not a bug, this is the intended behavior. I think the right fix here is to update the programming guide and other documentation to clearly state this. If users want to limit the scope of the data, they should add predicates which will be pushed down to the database.

@viirya
Member Author

viirya commented Apr 15, 2015

OK. If I understand correctly, lowerBound and upperBound are used only to decide the partition stride, not to filter rows, so all rows of the table are partitioned and returned.

@marmbrus
Contributor

That is correct.
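
To illustrate the behavior confirmed above, here is a minimal Python sketch of how per-partition WHERE clauses can be derived from the bounds. It is a simplified illustration, not Spark's actual code; the function name `partition_where_clauses` and the column name `id` are hypothetical. The point is that the first partition has no lower limit and the last has no upper limit, so rows outside `[lowerBound, upperBound)` are still read.

```python
from typing import List, Optional


def partition_where_clauses(column: str, lower_bound: int, upper_bound: int,
                            num_partitions: int) -> List[Optional[str]]:
    """Return one WHERE clause per partition (None means "no filter").

    Hypothetical helper: the bounds only set the stride between partitions.
    """
    if num_partitions <= 1:
        # A single partition reads the whole table without any filter.
        return [None]

    stride = (upper_bound - lower_bound) // num_partitions
    clauses: List[Optional[str]] = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition is open below, so values < lower_bound are kept.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        # The last partition is open above, so values >= upper_bound are kept.
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        clauses.append(" AND ".join(c for c in (lower, upper) if c))
    return clauses


if __name__ == "__main__":
    for clause in partition_where_clauses("id", 0, 100, 4):
        print(clause)
```

With bounds 0 and 100 and four partitions, the stride is 25; the first clause is `id < 25` and the last is `id >= 75`, which is why poorly chosen bounds skew the partitions but do not drop rows.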

@viirya viirya changed the title from "[SPARK-6800][SQL] Fix wrong logic to generate WHERE clause for JDBC" to "[SPARK-6800][SQL] Update doc for JDBCRelation's columnPartition" Apr 15, 2015
@SparkQA

SparkQA commented Apr 15, 2015

Test build #30296 has finished for PR 5488 at commit 3eb74d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Contributor

The parameters minValue and maxValue are advisory in that incorrect values may cause the partitioning to be poor, but no data will fail to be represented.

The sentence above already explains that the filters are only used for partitioning and that all data will always be returned. I think the best place to update would be in the SQL programming guide, in the table under the section "JDBC To Other Databases".

Member Author

Updated.

@SparkQA

SparkQA commented Apr 15, 2015

Test build #30307 timed out for PR 5488 at commit 1dcc929 after a configured wait of 120m.

@micaelcapitao

Hi.
The doc for the SQLContext.jdbc() method says:

@param columnName the name of a column of integral type that will be used for partitioning.
@param lowerBound the minimum value of columnName to retrieve
@param upperBound the maximum value of columnName to retrieve
@param numPartitions the number of partitions. the range minValue-maxValue will be split
evenly into this many partitions

This doc should be updated too.

How can one add predicates to limit the scope of the data pulled from the DB using the SQLContext API? Will a SELECT that limits that scope prevent the entire table from being transferred?

@viirya
Member Author

viirya commented Apr 15, 2015

@micaelcapitao Thanks. I updated the doc too.

I think you can use the JDBC data source API to create a temporary table and then add predicates with a WHERE clause.

@marmbrus
Contributor

That's correct. WHERE clause predicates and data frame filter operations will be pushed down to the database.
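
As a rough illustration of what pushdown means for the queries sent to the database, the Python sketch below assumes that each partition's query simply ANDs the user's predicate with that partition's stride clause. This is an assumption made for illustration, not Spark's actual implementation; `build_partition_query`, the table name `people`, and the predicate `age > 30` are all hypothetical.

```python
from typing import Optional


def build_partition_query(table: str,
                          partition_clause: Optional[str],
                          pushed_filter: Optional[str]) -> str:
    """Build the SQL one partition would send to the database.

    Hypothetical helper: the pushed-down filter limits which rows are
    transferred; the partition clause only decides which partition reads them.
    """
    conditions = [f"({c})" for c in (pushed_filter, partition_clause) if c]
    where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    return f"SELECT * FROM {table}{where}"


if __name__ == "__main__":
    print(build_partition_query("people", "id >= 25 AND id < 50", "age > 30"))
```

Under this assumption, only the user's predicate narrows the result set; the partition clauses together still span every value of the partition column.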

@SparkQA

SparkQA commented Apr 15, 2015

Test build #30354 has finished for PR 5488 at commit 51386c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

asfgit pushed a commit that referenced this pull request Apr 15, 2015
JIRA https://issues.apache.org/jira/browse/SPARK-6800

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #5488 from viirya/fix_jdbc_where and squashes the following commits:

51386c8 [Liang-Chi Hsieh] Update code comment.
1dcc929 [Liang-Chi Hsieh] Update document.
3eb74d6 [Liang-Chi Hsieh] Revert and modify doc.
df11783 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_jdbc_where
3e7db15 [Liang-Chi Hsieh] Fix wrong logic to generate WHERE clause for JDBC.

(cherry picked from commit e3e4e9a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@marmbrus
Contributor

Thanks! Merged to master and branch-1.3

@asfgit asfgit closed this in e3e4e9a Apr 15, 2015
@viirya viirya deleted the fix_jdbc_where branch December 27, 2023 18:31