-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update documentation #1252
Update documentation #1252
Conversation
1 similar comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! I think there are already enough changes to justify one PR. Better finish this one, and open another one for the remaining checkboxes.
README.md
Outdated
@@ -2,7 +2,7 @@ DataFrames.jl | |||
============= | |||
|
|||
[![0.6](http://pkg.julialang.org/badges/DataFrames_0.6.svg)](http://pkg.julialang.org/?pkg=DataFrames) | |||
[![0.7](http://pkg.julialang.org/badges/DataFrames_0.7.svg)](http://pkg.julialang.org/?pkg=DataFrames) | |||
<!-- [![0.7](http://pkg.julialang.org/badges/DataFrames_0.7.svg)](http://pkg.julialang.org/?pkg=DataFrames) --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't render a badge on the README. I think because v0.7 doesn't have tests on pkg.julialang.org link. I'd also like to change the CI tests to allow failures on nightly until a release candidate comes out for v0.7. 👍 or 👎 ? These can all be moved to another PR too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK. Just remove it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm going to remove the change here in favor of a badge-specific PR in #1253
docs/src/index.md
Outdated
to know to get up and running with tabular data manipulation using the DataFrames.jl package | ||
and the Julia language. If there is something you expect DataFrames to be capable of, but | ||
cannot figure out how to do, please reach out with questions in Domains/Data on | ||
[Discourse](https://discourse.julialang.org/new-topic?title=[DataFrames%20Question]:%20&body=%23%20Question:%0A%0A%23%20Dataset%20(if%20applicable):%0A%0A%23%20MWE%20(if%20applicable):%0A&category=Domains/Data&tags=question). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"MWE" could be spelled in full.
docs/src/index.md
Outdated
and the Julia language. If there is something you expect DataFrames to be capable of, but | ||
cannot figure out how to do, please reach out with questions in Domains/Data on | ||
[Discourse](https://discourse.julialang.org/new-topic?title=[DataFrames%20Question]:%20&body=%23%20Question:%0A%0A%23%20Dataset%20(if%20applicable):%0A%0A%23%20MWE%20(if%20applicable):%0A&category=Domains/Data&tags=question). | ||
You can also give feedback and suggest features or improvements by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"give feedback" is too general, better say "report bugs". Also "suggest features" is likely to attract many little useful posts which we will have to close, creating frustration on both sides. Better leave people discuss their problems on Discourse.
CurrentModule = DataFrames | ||
``` | ||
|
||
# Functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't everything listed here a function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, everything on that page is a function, I'm not sure this H1 header adds anything. We could remove it and only show the H2 headers of the function-groups (and bump them up to being H1 headers). But before I act on any of your other comments on where/how functions should be grouped, what are your thoughts on just listing all of the functions alphabetically in a single section? It might make navigation difficult since the functions wouldn't be grouped by concept, but the conceptual groupings are pretty arbitrary and no matter how we group them I'm sure we won't be able to make the groups intuitive to everyone. It will probably be easier to keep these docstrings up to date if we just list the functions in the same way they are exported in the DataFrames.jl
source file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, due to the presence of the isolated meltdf
, I had misread this as a H2 heading. I'm OK with keeping it as H1, meltdf
should simply be moved.
I'd say grouping is a better approach, but we don't need a lot of groups. Maybe one for basics, and one for grouping/joining & co.?
docs/src/lib/functions.md
Outdated
meltdf | ||
``` | ||
|
||
## Data Manipulation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Split-Apply-Combine"? Though colwise
is different.
|
||
All other columns are assumed to be measured variables (they are stacked). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove these two sentences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced the example they describe because it didn't work
julia> iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv")); | ||
|
||
julia> d = stackdf(iris); | ||
ERROR: StackOverflowError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-)
│ 5 │ SepalLength │ 5.0 │ setosa │ | ||
│ 6 │ SepalLength │ 5.4 │ setosa │ | ||
|
||
julia> x = by(d, [:variable, :Species], df -> DataFrame(vsum = mean(parse.(Float64, df[:value])))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a line break. To work around the CSV.jl parsing bug, probably better go back to latest version for now.
src/other/utils.jl
Outdated
""" | ||
gennames(n::Integer) | ||
|
||
Generate standardized names for columns of a DataFrame. The first name will be :x1, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`:x1`
src/other/utils.jl
Outdated
""" | ||
countnull(a::AbstractArray) | ||
|
||
Count the number of null values in an array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`null`
I agree that the other goals can be changed in different PRs, I'll go with that approach rather than turning this into a monolithic behemoth of code churn |
a7136b9
to
d0b1612
Compare
Ok I think that's most of it. Right now the docstrings aren't rendering for the |
@@ -49,7 +49,6 @@ export AbstractDataFrame, | |||
nrow, | |||
nullable!, | |||
order, | |||
printtable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will stop exporting the printtable
function as suggested. If we go with that then we need to fix the tests to call DataFrames.printtable()
https://travis-ci.org/JuliaData/DataFrames.jl/jobs/287109648#L606
"Data manipulation" => "lib/manipulation.md", | ||
], | ||
"About" => Any[ | ||
"Release Notes" => "NEWS.md", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should point to the releases GitHub page somewhere on the homepage?
docs/src/man/categorical.md
Outdated
```julia | ||
cv = categorical(v) | ||
```jldoctest categorical | ||
julia> cv = categorical(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That (existing) example doesn't make a lot of sense as it just repeats the previous ones. Better show how to use categorical!(df, :col)
, and mention that this is in-place.
docs/src/man/getting_started.md
Outdated
|
||
```julia | ||
mean(Nulls.skip(df[1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was useful to show how to skip nulls again. Maybe put a null in one of the columns, and show both?
docs/src/man/querying_frameworks.md
Outdated
@where i.age > 40 | ||
@select {number_of_children=i.children, i.name} | ||
end | ||
Query.EnumerableSelect{NamedTuples._NT_number__of__children_name{Int64,String},Query.EnumerableWhere{NamedTuples._NT_name_age_children{String,Float64,Int64},Query.EnumerableIterable{NamedTuples._NT_name_age_children{String,Float64,Int64},IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}},##5#7},##6#8}(Query.EnumerableWhere{NamedTuples._NT_name_age_children{String,Float64,Int64},Query.EnumerableIterable{NamedTuples._NT_name_age_children{String,Float64,Int64},IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}},##5#7}(Query.EnumerableIterable{NamedTuples._NT_name_age_children{String,Float64,Int64},IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}}(IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}(3×3 DataFrames.DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Looks like Query should provide a simplified printing for that type! :-)
Would [...]
work here to avoid copying the full output?
3: Array{Float64}((150,)) [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5 … 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1] | ||
4: Array{Float64}((150,)) [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1 … 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8] Species: DataFrames.RepeatedVector{CategoricalArrays.CategoricalValue{String,UInt32}} | ||
parent: CategoricalArrays.CategoricalArray{String,1,UInt32,String,Union{}} | ||
refs: Array{UInt32}((150,)) UInt32[0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001 … 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe replace everything related to CategoricalArray
with [...]
if that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't pass the doctests, but if we just want to truncate the output then that's fine. To be honest I'm not sure I see the value of showing dump
in the documentation (unless we add an advanced section to the documentation, since this is very useful but not something I'd show to a newcomer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's just remove this.
docs/src/man/split_apply_combine.md
Outdated
│ 1 │ setosa │ 1.462 │ 0.0301592 │ | ||
│ 2 │ versicolor │ 4.26 │ 0.220816 │ | ||
│ 3 │ virginica │ 5.552 │ 0.304588 │ | ||
|
||
``` | ||
|
||
A second approach to the Split-Apply-Combine strategy is implemented in the `aggregate` function, which also takes three arguments: (1) a DataFrame, (2) one or more columns to split the DataFrame on, and (3) one or more functions that are used to compute a summary of each subset of the DataFrame. Each function is applied to each column, that was not used to split the DataFrame, creating new columns of the form `$name_$function` e.g. `SepalLength_mean`. Anonymous functions and expressions that do not have a name will be called `λ1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently "will be called λ1
" is no longer true.
docs/src/man/split_apply_combine.md
Outdated
│ 2 │ versicolor │ 50 │ 50 │ 50 │ 50 │ | ||
│ 3 │ virginica │ 50 │ 50 │ 50 │ 50 │ | ||
|
||
julia> aggregate(iris, :Species, [sum, x->mean(x)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without Nulls.skip
, it's confusing that you use x -> mean(x)
rather than just mean
. Maybe keep Nulls.skip
even if that's not needed, given that it's a common pattern that's useful to show.
docs/src/man/categorical.md
Outdated
julia> df = DataFrame(A = [1, 1, 1, 2, 2, 2], | ||
B = ["X", "X", "X", "Y", "Y", "Y"]) | ||
julia> df = DataFrame(A = ["A", "B", "C", "D", "D", "A"], | ||
B = ["X", "X", "X", "Y", "Y", "Y"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug?
docs/src/man/categorical.md
Outdated
│ 5 │ D │ Y │ | ||
│ 6 │ A │ Y │ | ||
|
||
julia> allcols = deepcopy(df); bothcols = deepcopy(df); onecol = deepcopy(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's confusing to do all that setup. Maybe just show first that you can specify a single column, and then call the function without an argument and show that the other column was also converted? Then people can read the docstring to find out more.
Anything left to do here? |
de766cb
to
202c42c
Compare
Not anymore! |
Doctests are failing on CI (this is hidden by default, need to unroll the section). I think you'll have to install CSV in the same line that installs Documenter. |
|
Yeah, we need to tag releases in order to be able to run our own examples... |
Goals: