Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split groupeddataframe/grouping.jl into several files #2050

Merged
merged 1 commit into from
Dec 24, 2019

Conversation

jlumpe
Copy link
Contributor

@jlumpe jlumpe commented Dec 9, 2019

This file is pretty massive at 1500 lines and it was getting difficult to navigate through. I've split it logically into several files, similar to the organization of src/abstractdataframe/:

  • groupeddataframe.jl - Type definition and basic methods like length, names, and ==.
  • groupkeys.jl - Definitions for GroupKey, GroupKeys, and their methods.
  • getindex.jl - All methods of getindex(::GroupedDataFrame, ...) and other functions related to indices/keys.
  • splitapplycombine.jl - groupby, map, combine, by, and aggregate.

@bkamins
Copy link
Member

bkamins commented Dec 10, 2019

@nalimilan is maintaining split-apply-combine part of the codebase, so probably it is best if he comments if this is feasible (I guess he has some projects changing this part of code in paralel and such a change is seriously breaking).

In the worst case we can leave this proposal as a post 1.0 release change (as it is planned in general that we would refactor the codebase after 1.0 release to make it more consistent and manageable).

@jlumpe
Copy link
Contributor Author

jlumpe commented Dec 10, 2019

splitapplycombine.jl actually still contains over 1000 lines and should probably be broken up further, but I'm not familiar enough with that part of the code. The upside is that git treats it a renaming of the original file so hopefully any changes to it should be able to be merged automatically.

@nalimilan
Copy link
Member

Why not do this in theory, but I'd rather wait until #2047 is merged as rebasing over moving code is really a nightmare (I don't think git is that smart). Regarding the details, I'd say getindex.jl is too short to deserve its own file: better put its contents in groupeddataframe.jl.

Also as @bkamins noted at #2047 (review) we should also move some code out of dataframerow/utils.jl, which really isn't the best place to have it.

@jlumpe
Copy link
Contributor Author

jlumpe commented Dec 22, 2019

Saw #2047 got merged, rebased on top of it. @nalimilan I've kept all the indexing stuff in the main file for now, but #2046 is going to add a significant amount to it. It also depends on the definitions of GroupKey and GroupKeys so those can't be put into a different file if the indexing stuff isn't.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let's go with this then.

@bkamins bkamins merged commit 0ee43f5 into JuliaData:master Dec 24, 2019
@bkamins
Copy link
Member

bkamins commented Dec 24, 2019

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants