Use colnames() rather than names() in gather()? #138

Closed
jarodmeng opened this issue Nov 12, 2015 · 2 comments
@jarodmeng

We're trying to implement gather() and spread() for SQL databases using S3 methods. However, gather() cannot be extended this way because it uses names() to get column names. For SQL database backends, names() would return the names of the list elements rather than the column names.

Is it possible to use colnames() instead, so that it works for both data frames and SQL backends?
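To make the difference concrete, here is a minimal base R sketch. The `fake_sql_tbl` class and its `src` and `vars` fields are hypothetical stand-ins for a SQL-backed table, not dplyr's actual internals; the point is only that `names()` reports list elements, while `colnames()` goes through `dimnames()`, which a backend can supply via an S3 method.

```r
# A plain data frame: names() and colnames() agree.
df <- data.frame(id = 1:2, x = c(0.1, 0.2), y = c(3, 4))
df_names_match <- identical(names(df), colnames(df))

# Hypothetical stand-in for a SQL-backed table: a list holding
# connection details and column metadata, not the data itself.
fake_tbl <- structure(
  list(src = "some_connection", vars = c("id", "x", "y")),
  class = "fake_sql_tbl"
)

# names() sees the list elements ("src", "vars"), not the columns.
list_names <- names(fake_tbl)

# For non-data-frame objects colnames() consults dimnames(), so an
# S3 dimnames method (assumed here) lets the backend report its columns.
dimnames.fake_sql_tbl <- function(x) list(NULL, x$vars)
col_names <- colnames(fake_tbl)
```

With a method like this in place, generic code that calls `colnames()` would behave the same for data frames and for the list-based object.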

@hadley
Member

hadley commented Nov 12, 2015

It would be better to use dplyr's tbl_vars(). But what SQL are you going to generate? I always assumed gather/spread in SQL would be prohibitively difficult.

@jarodmeng
Author

Since names() is used in the generic gather function, switching to colnames() would preserve the behavior for data frames while allowing it to be extended to SQL backends.

I implemented gather for SQL backends by building a series of lazy dots: mutate dots that create flag columns indicating whether a row matches each key_col value, summarize dots that aggregate the product of those flags and value_col, and finally select dots that keep only the key_col value columns. It works fairly quickly and reliably.
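The three steps can be sketched on an in-memory data frame in base R. This is only an illustration of the flag, aggregate, select idea as described, not the actual lazy-dots implementation; the column names `id`, `key`, and `value` are assumptions, and the result is one column per key value. The sum is only meaningful if each `id` has at most one row per key value.

```r
# Hypothetical key/value input; in the real case this would be a SQL table.
long <- data.frame(
  id    = c(1, 1, 2, 2),
  key   = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)
key_values <- unique(long$key)

# Step 1 (mutate): for each key value, a flag (0/1) times the value column.
for (k in key_values) {
  long[[k]] <- as.integer(long$key == k) * long$value
}

# Step 2 (summarize): aggregate the flag-times-value columns per id.
wide <- aggregate(long[key_values], by = list(id = long$id), FUN = sum)

# Step 3 (select): keep only the id and the per-key-value columns.
wide <- wide[c("id", key_values)]
```

Each step maps onto an expression a SQL backend can translate (a CASE-style flag, a grouped SUM, and a column selection), which may be why the approach performs reasonably.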

@hadley hadley closed this as completed in 3819854 Dec 30, 2015