Update documentation #1252

cjprybol · 2017-10-11T05:26:40Z

Goals:

- Add missing exported functions and types to online documentation
- Move documentation examples to doctests
  - Make sure the broken examples get fixed before merging

coveralls · 2017-10-11T05:42:00Z

Coverage remained the same at 72.552% when pulling a7136b9 on cjp/doctoberfest into 8fd0851 on master.

nalimilan

Thanks for doing this! I think there are already enough changes to justify one PR. Better finish this one, and open another one for the remaining checkboxes.

nalimilan · 2017-10-11T12:10:24Z

README.md

@@ -2,7 +2,7 @@ DataFrames.jl
 =============

 [![0.6](http://pkg.julialang.org/badges/DataFrames_0.6.svg)](http://pkg.julialang.org/?pkg=DataFrames)
-[![0.7](http://pkg.julialang.org/badges/DataFrames_0.7.svg)](http://pkg.julialang.org/?pkg=DataFrames)
+<!-- [![0.7](http://pkg.julialang.org/badges/DataFrames_0.7.svg)](http://pkg.julialang.org/?pkg=DataFrames) -->


Why change this?

It doesn't render a badge on the README, I think because v0.7 doesn't have test results on pkg.julialang.org yet. I'd also like to change the CI tests to allow failures on nightly until a release candidate comes out for v0.7. 👍 or 👎? These changes could all be moved to another PR too.

Ah, OK. Just remove it then.

Ok, I'm going to remove the change here in favor of a badge-specific PR in #1253

nalimilan · 2017-10-11T12:11:39Z

docs/src/index.md

+to know to get up and running with tabular data manipulation using the DataFrames.jl package
+and the Julia language. If there is something you expect DataFrames to be capable of, but
+cannot figure out how to do, please reach out with questions in Domains/Data on
+[Discourse](https://discourse.julialang.org/new-topic?title=[DataFrames%20Question]:%20&body=%23%20Question:%0A%0A%23%20Dataset%20(if%20applicable):%0A%0A%23%20MWE%20(if%20applicable):%0A&category=Domains/Data&tags=question).


"MWE" could be spelled in full.

nalimilan · 2017-10-11T12:13:04Z

docs/src/index.md

+and the Julia language. If there is something you expect DataFrames to be capable of, but
+cannot figure out how to do, please reach out with questions in Domains/Data on
+[Discourse](https://discourse.julialang.org/new-topic?title=[DataFrames%20Question]:%20&body=%23%20Question:%0A%0A%23%20Dataset%20(if%20applicable):%0A%0A%23%20MWE%20(if%20applicable):%0A&category=Domains/Data&tags=question).
+You can also give feedback and suggest features or improvements by


"give feedback" is too general; better to say "report bugs". Also, "suggest features" is likely to attract many posts of little use which we will have to close, creating frustration on both sides. Better to let people discuss their problems on Discourse.

nalimilan · 2017-10-11T12:15:00Z

docs/src/lib/functions.md

+CurrentModule = DataFrames
+```
+
+# Functions


Isn't everything listed here a function?

Yes, everything on that page is a function, I'm not sure this H1 header adds anything. We could remove it and only show the H2 headers of the function-groups (and bump them up to being H1 headers). But before I act on any of your other comments on where/how functions should be grouped, what are your thoughts on just listing all of the functions alphabetically in a single section? It might make navigation difficult since the functions wouldn't be grouped by concept, but the conceptual groupings are pretty arbitrary and no matter how we group them I'm sure we won't be able to make the groups intuitive to everyone. It will probably be easier to keep these docstrings up to date if we just list the functions in the same way they are exported in the DataFrames.jl source file.

Actually, due to the presence of the isolated meltdf, I had misread this as an H2 heading. I'm OK with keeping it as H1; meltdf should simply be moved.

I'd say grouping is a better approach, but we don't need a lot of groups. Maybe one for basics, and one for grouping/joining & co.?

nalimilan · 2017-10-11T12:16:46Z

docs/src/lib/functions.md

+meltdf
+```
+
+## Data Manipulation


Maybe "Split-Apply-Combine"? Though colwise is different.

nalimilan · 2017-10-11T12:38:33Z

docs/src/man/reshaping_and_pivoting.md


-All other columns are assumed to be measured variables (they are stacked).


Why remove these two sentences?

I replaced the example they describe because it didn't work

nalimilan · 2017-10-11T12:42:17Z

docs/src/man/reshaping_and_pivoting.md

+julia> iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv"));
+
+julia> d = stackdf(iris);
+ERROR: StackOverflowError:


nalimilan · 2017-10-11T12:43:35Z

docs/src/man/reshaping_and_pivoting.md

+│ 5   │ SepalLength │ 5.0   │ setosa  │
+│ 6   │ SepalLength │ 5.4   │ setosa  │
+
+julia> x = by(d, [:variable, :Species], df -> DataFrame(vsum = mean(parse.(Float64, df[:value]))));


Add a line break. To work around the CSV.jl parsing bug, it's probably better to go back to the latest version for now.

nalimilan · 2017-10-11T12:49:55Z

src/other/utils.jl

+"""
+    gennames(n::Integer)
+
+Generate standardized names for columns of a DataFrame. The first name will be :x1, the


nalimilan · 2017-10-11T12:50:10Z

src/other/utils.jl

+"""
+    countnull(a::AbstractArray)
+
+Count the number of null values in an array.


cjprybol · 2017-10-11T17:25:17Z

I agree that the other goals can be handled in separate PRs. I'll go with that approach rather than turning this into a monolithic behemoth of code churn.

cjprybol · 2017-10-11T23:14:15Z

Ok, I think that's most of it. Right now the docstrings aren't rendering for the types and functions API man pages, but I think that's because not every type/function in those sections has a docstring. That'll be a follow-up PR.

cjprybol · 2017-10-12T16:53:26Z

src/DataFrames.jl

@@ -49,7 +49,6 @@ export AbstractDataFrame,
       nrow,
       nullable!,
       order,
-       printtable,


This will stop exporting the printtable function as suggested. If we go with that, we need to fix the tests to call DataFrames.printtable(); see https://travis-ci.org/JuliaData/DataFrames.jl/jobs/287109648#L606

nalimilan · 2017-10-12T15:17:45Z

docs/make.jl

-            "Data manipulation" => "lib/manipulation.md",
-        ],
-        "About" => Any[
-            "Release Notes" => "NEWS.md",


Maybe we should point to the releases GitHub page somewhere on the homepage?

nalimilan · 2017-10-12T15:23:01Z

docs/src/man/categorical.md

-```julia
-cv = categorical(v)
+```jldoctest categorical
+julia> cv = categorical(v)


That (existing) example doesn't make a lot of sense as it just repeats the previous ones. Better show how to use categorical!(df, :col), and mention that this is in-place.

nalimilan · 2017-10-12T15:27:50Z

docs/src/man/getting_started.md


-```julia
-mean(Nulls.skip(df[1]))


It was useful to show how to skip nulls again. Maybe put a null in one of the columns, and show both?

nalimilan · 2017-10-12T15:30:53Z

docs/src/man/querying_frameworks.md

+                   @where i.age > 40
+                   @select {number_of_children=i.children, i.name}
+              end
+Query.EnumerableSelect{NamedTuples._NT_number__of__children_name{Int64,String},Query.EnumerableWhere{NamedTuples._NT_name_age_children{String,Float64,Int64},Query.EnumerableIterable{NamedTuples._NT_name_age_children{String,Float64,Int64},IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}},##5#7},##6#8}(Query.EnumerableWhere{NamedTuples._NT_name_age_children{String,Float64,Int64},Query.EnumerableIterable{NamedTuples._NT_name_age_children{String,Float64,Int64},IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}},##5#7}(Query.EnumerableIterable{NamedTuples._NT_name_age_children{String,Float64,Int64},IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}}(IterableTables.DataFrameIterator{NamedTuples._NT_name_age_children{String,Float64,Int64},Tuple{Array{String,1},Array{Float64,1},Array{Int64,1}}}(3×3 DataFrames.DataFrame


Indeed. Looks like Query should provide a simplified printing for that type! :-)

Would [...] work here to avoid copying the full output?

nalimilan · 2017-10-12T15:36:28Z

docs/src/man/reshaping_and_pivoting.md

+      3: Array{Float64}((150,)) [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5  …  5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1]
+      4: Array{Float64}((150,)) [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1  …  2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8]  Species: DataFrames.RepeatedVector{CategoricalArrays.CategoricalValue{String,UInt32}}
+    parent: CategoricalArrays.CategoricalArray{String,1,UInt32,String,Union{}}
+      refs: Array{UInt32}((150,)) UInt32[0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001  …  0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003, 0x00000003]


Maybe replace everything related to CategoricalArray with [...] if that works.

It won't pass the doctests, but if we just want to truncate the output then that's fine. To be honest I'm not sure I see the value of showing dump in the documentation (unless we add an advanced section to the documentation, since this is very useful but not something I'd show to a newcomer)

Yeah, let's just remove this.

nalimilan · 2017-10-12T15:38:52Z

docs/src/man/split_apply_combine.md

+│ 1   │ setosa     │ 1.462 │ 0.0301592 │
+│ 2   │ versicolor │ 4.26  │ 0.220816  │
+│ 3   │ virginica  │ 5.552 │ 0.304588  │
+
 ```

 A second approach to the Split-Apply-Combine strategy is implemented in the `aggregate` function, which also takes three arguments: (1) a DataFrame, (2) one or more columns to split the DataFrame on, and (3) one or more functions that are used to compute a summary of each subset of the DataFrame. Each function is applied to each column, that was not used to split the DataFrame, creating new columns of the form `$name_$function` e.g. `SepalLength_mean`. Anonymous functions and expressions that do not have a name will be called `λ1`.


Apparently "will be called λ1" is no longer true.

nalimilan · 2017-10-12T15:40:04Z

docs/src/man/split_apply_combine.md

+│ 2   │ versicolor │ 50                 │ 50                │ 50                 │ 50                │
+│ 3   │ virginica  │ 50                 │ 50                │ 50                 │ 50                │
+
+julia> aggregate(iris, :Species, [sum, x->mean(x)])


Without Nulls.skip, it's confusing that you use x -> mean(x) rather than just mean. Maybe keep Nulls.skip even if that's not needed, given that it's a common pattern that's useful to show.
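
A minimal sketch of the pattern this comment asks for, reusing the iris load and the aggregate call from the diffs in this thread. It assumes the 2017-era API under review (DataFrames 0.11 with Nulls.jl, where Nulls.skip drops nulls before a reduction); it is an illustration, not the text that was merged.

```julia
using DataFrames, CSV, Nulls   # assumption: the package versions this PR targets

# Load the iris test data the same way the PR's doctests do.
iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv"))

# A named function can be passed directly; no wrapper is needed.
aggregate(iris, :Species, mean)

# The anonymous function is justified once it does extra work,
# here skipping nulls before taking the mean.
aggregate(iris, :Species, [sum, x -> mean(Nulls.skip(x))])
```

The iris data contains no nulls, so both forms should produce the same means; the second call only shows the shape of the common pattern.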

coveralls · 2017-10-18T01:22:05Z

Coverage increased (+0.2%) to 72.722% when pulling a916059 on cjp/doctoberfest into 8fd0851 on master.

nalimilan · 2017-10-18T08:44:18Z

docs/src/man/categorical.md

-julia> df = DataFrame(A = [1, 1, 1, 2, 2, 2],
-                      B = ["X", "X", "X", "Y", "Y", "Y"])
+julia> df = DataFrame(A = ["A", "B", "C", "D", "D", "A"],
+                             B = ["X", "X", "X", "Y", "Y", "Y"])


nalimilan · 2017-10-18T08:46:34Z

docs/src/man/categorical.md

+│ 5   │ D │ Y │
+│ 6   │ A │ Y │
+
+julia> allcols = deepcopy(df); bothcols = deepcopy(df); onecol = deepcopy(df)


It's confusing to do all that setup. Maybe just show first that you can specify a single column, and then call the function without an argument and show that the other column was also converted? Then people can read the docstring to find out more.
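
The simplified flow suggested here might look like the sketch below. It assumes the DataFrames 0.11-era categorical! and eltypes functions and reuses the example DataFrame from the diff above; printed output is left out because it differs between Julia versions.

```julia
using DataFrames   # assumption: the 0.11-era API under review

df = DataFrame(A = ["A", "B", "C", "D", "D", "A"],
               B = ["X", "X", "X", "Y", "Y", "Y"])

# Convert a single column in place; column B stays a plain string vector.
categorical!(df, :A)
eltypes(df)

# Called without a column, categorical! converts the remaining string columns.
categorical!(df)
eltypes(df)
```

Checking eltypes(df) after each call replaces the three deepcopy variables with one DataFrame that is converted step by step.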

coveralls · 2017-10-18T09:41:35Z

Coverage increased (+0.02%) to 72.568% when pulling de766cb on cjp/doctoberfest into 8fd0851 on master.

nalimilan · 2017-11-14T18:55:43Z

Anything left to do here?

cjprybol · 2017-11-14T19:50:03Z

> Anything left to do here?

Not anymore!

nalimilan · 2017-11-14T21:24:14Z

Doctests are failing on CI (the output is hidden by default; you need to expand the section). I think you'll have to install CSV in the same line that installs Documenter.

cjprybol · 2017-11-14T22:05:48Z

Unsatisfiable requirements detected for package WeakRefStrings 🙁. Not sure if we can do anything about that until we tag the 0.11.0 release here, since CSV, WeakRefStrings, and DataStreams are upstream. Thoughts on merging and cleaning up any remaining issues after the version tag? Everything looks good locally. The only issues I think we might still have are the variability in the number of columns omitted during DataFrame printing, and the ordering of union printing (Union{Null, T} vs. Union{T, Null}) differing between 0.6 and 0.7.

coveralls · 2017-11-14T22:59:43Z

Coverage remained the same at 72.956% when pulling 84e7da2 on cjp/doctoberfest into 9d0e065 on master.

coveralls · 2017-11-15T04:32:55Z

Coverage remained the same at 72.956% when pulling 95ad4ca on cjp/doctoberfest into 9d0e065 on master.

nalimilan · 2017-11-15T08:55:38Z

Yeah, we need to tag releases in order to be able to run our own examples...

nalimilan reviewed Oct 11, 2017

cjprybol force-pushed the cjp/doctoberfest branch from a7136b9 to d0b1612 on October 11, 2017 23:11

cjprybol commented Oct 12, 2017

nalimilan reviewed Oct 12, 2017

nalimilan reviewed Oct 18, 2017

cjprybol added this to the 0.11 milestone Oct 26, 2017

cjprybol mentioned this pull request Nov 14, 2017

column names of aggregated DataFrame with anonymous functions #1276

Closed

Use doctests throughout documentation, and add missing functions to docs

202c42c

cjprybol force-pushed the cjp/doctoberfest branch from de766cb to 202c42c on November 14, 2017 19:42

Merge branch 'master' into cjp/doctoberfest

84e7da2

Add Pkg.add("CSV") to .travis.yml for doctests

95ad4ca

nalimilan approved these changes Nov 15, 2017

nalimilan changed the title ~~WIP: Update documentation~~ Update documentation Nov 15, 2017

nalimilan merged commit 4315cdd into master Nov 15, 2017

nalimilan deleted the cjp/doctoberfest branch November 15, 2017 08:56

