Add support for DataDrivenDBInputFormat & Add a JDBCScheme builder. #44

Closed
wants to merge 2 commits into from

granthenke

This is a rough port from Hadoop with the minimum changes needed. I also added a builder to JDBCScheme to help manage its many constructor variants. This can definitely be done more cleanly once it is moved to https://github.com/Cascading/cascading-jdbc.

See user group discussion: https://groups.google.com/forum/#!topic/cascading-user/HbJdrLKOLH8
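
To illustrate why a builder helps when a class has many constructor forms, here is a minimal sketch. It is not the builder added in this pull request: `SchemeSettings`, `Builder`, and the fields `columns`, `orderBy`, `conditions`, and `limit` are hypothetical names chosen for illustration, and the `-1` "no limit" default is an assumption.

```java
import java.util.Arrays;

public class SchemeSettingsSketch {

    /** Hypothetical immutable settings holder; not a class from this pull request. */
    public static final class SchemeSettings {
        public final String[] columns;
        public final String[] orderBy;
        public final String conditions;
        public final long limit;

        private SchemeSettings(Builder b) {
            this.columns = b.columns;
            this.orderBy = b.orderBy;
            this.conditions = b.conditions;
            this.limit = b.limit;
        }

        @Override
        public String toString() {
            return "columns=" + Arrays.toString(columns)
                + ", orderBy=" + Arrays.toString(orderBy)
                + ", conditions=" + conditions
                + ", limit=" + limit;
        }
    }

    /** Each optional setting gets one named method instead of one more constructor overload. */
    public static final class Builder {
        private String[] columns = new String[0];
        private String[] orderBy = new String[0];
        private String conditions = null;
        private long limit = -1; // assumption: -1 means "no limit"

        public Builder columns(String... columns) { this.columns = columns.clone(); return this; }
        public Builder orderBy(String... orderBy) { this.orderBy = orderBy.clone(); return this; }
        public Builder conditions(String conditions) { this.conditions = conditions; return this; }
        public Builder limit(long limit) { this.limit = limit; return this; }

        public SchemeSettings build() {
            // Validate once here rather than in every constructor variant.
            if (columns.length == 0) {
                throw new IllegalStateException("at least one column is required");
            }
            return new SchemeSettings(this);
        }
    }

    public static void main(String[] args) {
        SchemeSettings s = new Builder()
            .columns("id", "name")
            .conditions("id > 100")
            .build();
        System.out.println(s);
    }
}
```

Callers set only the options they need and leave the rest at their defaults, which avoids passing `null` placeholders through long positional argument lists.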

@granthenke
Author

This may also fix issue #18. I have run various tests, and performance was acceptable. In one test, 1.5 trillion unique records were retrieved from a Teradata DB with 30 concurrent reads (and 30 reducers) in just over 4 hours. (Note: the number of concurrent reads and reducers may not be optimal.)

Explicit tests against Postgres may still be needed to be sure.
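
For readers unfamiliar with the data-driven approach: the general idea is to find the minimum and maximum of a split column, divide that range into slices, and let each concurrent reader query one slice. The sketch below shows only that idea for an integer column. It is not the code in this pull request or Hadoop's implementation; the class and method names and the `id` column are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {

    /**
     * Builds one WHERE condition per split over the inclusive range [min, max].
     * May return fewer than numSplits conditions when the range is small.
     * Sketch only: does not guard against overflow for extreme min/max values.
     */
    public static List<String> splitConditions(String column, long min, long max, int numSplits) {
        if (numSplits < 1 || max < min) {
            throw new IllegalArgumentException("need numSplits >= 1 and max >= min");
        }
        List<String> conditions = new ArrayList<>();
        long span = max - min + 1;
        long step = Math.max(1, span / numSplits);
        long lo = min;
        for (int i = 0; i < numSplits; i++) {
            boolean last = (i == numSplits - 1) || (lo + step > max);
            if (last) {
                // The final slice is closed on both ends so that max itself is read.
                conditions.add(column + " >= " + lo + " AND " + column + " <= " + max);
                break;
            }
            long hi = lo + step;
            // Half-open slices keep neighbouring readers from overlapping.
            conditions.add(column + " >= " + lo + " AND " + column + " < " + hi);
            lo = hi;
        }
        return conditions;
    }

    public static void main(String[] args) {
        for (String condition : splitConditions("id", 1, 100, 4)) {
            System.out.println(condition);
        }
    }
}
```

With 30 concurrent reads, as in the test above, each reader would get roughly one thirtieth of the key range. How evenly the rows are spread across that range determines how balanced the work actually is.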

@johnynek
Contributor

johnynek commented Feb 7, 2014

We're moving this code to twitter/scalding#779, but without the JDBC support. We recommend using cascading-jdbc now.

@johnynek closed this Feb 7, 2014