Add docs about indexing and iterating GroupedDataFrames

JuliaData · Aug 12, 2019 · 7e92206
1 parent 384b53d

Showing 2 changed files with 25 additions and 1 deletion.
diff --git a/docs/src/lib/indexing.md b/docs/src/lib/indexing.md
@@ -189,3 +189,15 @@ Note that `sdf[!, col] .= v` and `sdf[!, cols] .= v` syntaxes are not allowed as
 
 If column indexing using `Symbol` names in `cols` is performed, the order of columns in the operation is specified
 by the order of names.
+
+
+## Indexing `GroupedDataFrame`s
+
+[`GroupedDataFrame`](@ref) implements the dictionary interface and so supports additional methods for indexing:
+
+* `gd[i::Int]` -> Get the `i`th group.
+* `gd[key::NamedTuple]` -> Get the group corresponding to the dictionary key `key` (see [`keys(::GroupedDataFrame)`](@ref)).
+  The fields of the `NamedTuple` must correspond to the grouping columns, in the order in which those columns were passed to [`groupby`](@ref).
+  The keys are ordered, so that `gd[keys(gd)[i]]` refers to the same group as `gd[i]`.
+* `gd[key::Tuple]` -> Same as the previous method, but with the names omitted from `key`.
+* `get(gd, key, default)` -> Get the group corresponding to `key`, returning `default` if no such group exists.
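The indexing methods listed above can be tried out with a short sketch. It assumes a current DataFrames.jl 1.x release rather than the version this commit targets, so details may differ from the behavior documented here, and it uses a small hypothetical `df` in place of the `iris` data set.

```julia
using DataFrames

# Hypothetical toy data standing in for a real data set such as `iris`.
df = DataFrame(species = ["setosa", "versicolor", "setosa", "virginica"],
               petal_length = [1.4, 4.7, 1.3, 6.0])

# sort = true makes the group order deterministic (alphabetical by species).
gd = groupby(df, :species; sort = true)

by_position = gd[1]                                   # integer index: first group (setosa)
by_name     = gd[(species = "setosa",)]               # NamedTuple key: field names match the grouping column
by_tuple    = gd[("setosa",)]                         # plain Tuple key: same group, names omitted
absent      = get(gd, (species = "other",), nothing)  # no such group, so the default is returned

# keys(gd) is ordered like the groups themselves, so both lookups agree.
for i in 1:length(gd)
    @assert gd[keys(gd)[i]] == gd[i]
end

println(nrow(by_position), " ", by_name == by_tuple, " ", absent === nothing)
```

Passing `sort = true` keeps `gd[1]` predictable; without it, the group order is not guaranteed to be alphabetical.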
diff --git a/docs/src/man/split_apply_combine.md b/docs/src/man/split_apply_combine.md
@@ -128,7 +128,7 @@ julia> aggregate(iris, :Species, [sum, mean])
 │ 3   │ virginica     │ 329.4           │ 148.7          │ 277.6           │ 101.3          │ 6.588            │ 2.974           │
 ```
 
-If you only want to split the data set into subsets, use the `groupby` function:
+If you only want to split the data set into subsets, use the [`groupby`](@ref) function:
 
 ```jldoctest sac
 julia> for subdf in groupby(iris, :Species)
@@ -138,3 +138,15 @@ julia> for subdf in groupby(iris, :Species)
 50
 50
 ```
+
+To also get the values of the grouping columns along with each group, use the
+`pairs` function:
+
+```jldoctest sac
+julia> for (key, subdf) in pairs(groupby(iris, :Species))
+           println("Number of data points for $(key.Species): $(size(subdf, 1))")
+       end
+Number of data points for setosa: 50
+Number of data points for versicolor: 50
+Number of data points for virginica: 50
+```