Add docs about indexing and iterating GroupedDataFrames
jlumpe committed Aug 12, 2019
1 parent 384b53d commit 7e92206
Showing 2 changed files with 25 additions and 1 deletion.
12 changes: 12 additions & 0 deletions docs/src/lib/indexing.md
@@ -189,3 +189,15 @@ Note that `sdf[!, col] .= v` and `sdf[!, cols] .= v` syntaxes are not allowed as

If column indexing using `Symbol` names in `cols` is performed, the order of columns in the operation is specified
by the order of names.


## Indexing `GroupedDataFrame`s

[`GroupedDataFrame`](@ref) implements the dictionary interface and so supports additional methods for indexing:

* `gd[i::Int]` -> Get the `i`th group.
* `gd[key::NamedTuple]` -> Get the group corresponding to the dictionary key `key` (see [`keys(::GroupedDataFrame)`](@ref)).
The fields of the `NamedTuple` must be in the same order as the columns passed to [`groupby`](@ref).
The keys are ordered, so `gd[keys(gd)[i]]` refers to the same group as `gd[i]`.
* `gd[key::Tuple]` -> Same as the previous method, but with the field names omitted from `key`.
* `get(gd, key, default)` -> Get the group for `key`, returning `default` if no such group exists.
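
The methods listed above can be sketched as follows. This is only an illustration under stated assumptions: the data frame `df` and its columns `a`, `b` and `x` are hypothetical and do not come from the original documentation.

```julia
using DataFrames

# Hypothetical toy data; the column names `a`, `b` and `x` are made up for this sketch.
df = DataFrame(a = [1, 1, 2, 2], b = ["p", "q", "p", "q"], x = 1:4)

# Group by two columns, so each key has two fields, in the order (a, b).
gd = groupby(df, [:a, :b])

first_group = gd[1]                  # index by position
by_named    = gd[(a = 1, b = "p")]   # index by NamedTuple key, fields in groupby order
by_tuple    = gd[(1, "p")]           # same lookup with the field names omitted
by_key      = gd[keys(gd)[1]]        # the i-th key refers to the i-th group

# `get` returns the supplied default instead of throwing when no group matches.
absent = get(gd, (a = 3, b = "p"), nothing)
```

Indexing with a key that has no matching group throws an error, so `get` is the safer choice when the key may be absent.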
14 changes: 13 additions & 1 deletion docs/src/man/split_apply_combine.md
@@ -128,7 +128,7 @@ julia> aggregate(iris, :Species, [sum, mean])
│ 3 │ virginica │ 329.4 │ 148.7 │ 277.6 │ 101.3 │ 6.588 │ 2.974 │
```

If you only want to split the data set into subsets, use the `groupby` function:
If you only want to split the data set into subsets, use the [`groupby`](@ref) function:

```jldoctest sac
julia> for subdf in groupby(iris, :Species)
@@ -138,3 +138,15 @@ julia> for subdf in groupby(iris, :Species)
50
50
```

To get the values of the grouping columns along with each group, use the
`pairs` function:

```jldoctest sac
julia> for (key, subdf) in pairs(groupby(iris, :Species))
           println("Number of data points for $(key.Species): $(size(subdf, 1))")
       end
Number of data points for setosa: 50
Number of data points for versicolor: 50
Number of data points for virginica: 50
```
