Reduce loading size by using .select() on CassandraRDDs #554

Open
nstrelow opened this issue Jul 26, 2017 · 0 comments

Comments

@nstrelow
Collaborator

Selecting only the needed columns on a CassandraRDD with .select() can greatly reduce the amount of data loaded.
E.g. subject_dbpedia with all columns: 700 MB; with only the name column: 10 MB.

This should be used wherever only certain columns are needed.

It can be applied to blocking and duplicate detection, and possibly elsewhere.

An extension of the subject class is probably needed to match the columns that were selected.
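
A minimal sketch of what this could look like with the Spark Cassandra Connector, not a finished implementation. The keyspace `my_keyspace`, the table `subjects`, the `id` column and its UUID type, the connection host, and the slim class `SubjectName` are all hypothetical names chosen for illustration. Running it requires Spark, the connector on the classpath, and a reachable Cassandra instance.

```scala
import java.util.UUID

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical slim class that holds only the selected columns.
// The field names are assumed to match the column names in the table.
case class SubjectName(id: UUID, name: Option[String])

object SelectColumnsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SelectColumnsSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val sc = new SparkContext(conf)

    // Request only the id and name columns instead of every column.
    val names = sc
      .cassandraTable[SubjectName]("my_keyspace", "subjects")
      .select("id", "name")

    // Print a few rows to check the projection.
    names.take(10).foreach(println)

    sc.stop()
  }
}
```

Using a separate slim class keeps the full subject class unchanged. Jobs such as blocking could take the slim type as input if they only need those fields.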
