WIP: clean up getindex: part 2 #1534

bkamins · 2018-09-24T21:39:49Z

This PR was hard for me. Therefore I split it into implementation (pushing now) and tests (which I will add when we decide this is the functionality we want).

It implements the discussion in #1190, #142 and #1533, plus some minor clean-ups.

A particular difficulty was deciding what to deprecate and what to change. @nalimilan - an initial review is appreciated (I will work on it further if we decide this is the right direction).

nalimilan

Thanks. Yes, this sounds like a step in the right direction. My main question is about the nature of SubDataFrame (see comment inline).

nalimilan · 2018-09-25T12:27:41Z

src/dataframe/dataframe.jl

    new_columns = Any[dv[row_inds] for dv in columns(df)]
    return DataFrame(new_columns, copy(index(df)))
 end

 # df[:, :] => DataFrame
 function Base.getindex(df::DataFrame, ::Colon, ::Colon)
-    Base.depwarn("indexing with colon as row will create a copy of rows in the future", :getindex)
+    Base.depwarn("indexing with colon as row will create a copy in the future" *


I guess you mean a copy of column vectors?

nalimilan · 2018-09-25T12:28:11Z

src/subdataframe/subdataframe.jl

 function SubDataFrame(parent::DataFrame, row::Integer)
+    Base.depwarn("Selecting a single row from a `DataFrame` will return `DataFrameRow` in future.", :getindex)


SubDataFrame

nalimilan · 2018-09-25T12:28:42Z

src/subdataframe/subdataframe.jl

    return SubDataFrame(parent, [Int(row)])
 end

 function SubDataFrame(parent::DataFrame, rows::AbstractVector{<:Integer})
+    if any(isa.(rows, Bool))


any(x -> x isa Bool, rows) would avoid an allocation.

Below and elsewhere, better use backticks consistently around types.
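
To make the allocation point concrete, here is a minimal sketch in plain Julia (no DataFrames.jl needed). The helper names `has_bool_broadcast` and `has_bool_predicate` are made up for illustration and are not part of the PR; both return the same answer, but the broadcast form has to build a temporary array of Bools before `any` can look at it.

```julia
# Hypothetical helper names, used only for this illustration.
# Broadcast form: `isa.(rows, Bool)` materializes a temporary Bool array first.
has_bool_broadcast(rows) = any(isa.(rows, Bool))

# Predicate form: `any` tests the elements one by one and can stop early.
has_bool_predicate(rows) = any(x -> x isa Bool, rows)

rows_mixed = Integer[1, true, 3]   # abstract eltype, so a Bool can hide inside
rows_int   = [1, 2, 3]             # concrete Int eltype, no Bool possible

same_answer = has_bool_broadcast(rows_mixed) == has_bool_predicate(rows_mixed) &&
              has_bool_broadcast(rows_int) == has_bool_predicate(rows_int)
```
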

nalimilan · 2018-09-25T12:31:19Z

src/subdataframe/subdataframe.jl

    return SubDataFrame(parent, convert(Vector{Int}, rows))
 end

 function SubDataFrame(parent::DataFrame, rows::AbstractVector{Bool})
+    if length(rows) != nrow(parent)
+        throw(ArgumentError("invalid length of Vector{Bool} row index"))


It would be nice to print "(got $m, expected $n)". Also, the object is not necessarily a Vector{Bool}.
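
A rough sketch of what such a message could look like, written against plain vectors rather than the actual `SubDataFrame` constructor. The function name `bool_rows_to_indices` and its `n` argument (standing in for `nrow(parent)`) are assumptions made for this example only.

```julia
# Hypothetical stand-alone check; `n` plays the role of `nrow(parent)`.
function bool_rows_to_indices(rows::AbstractVector{Bool}, n::Integer)
    m = length(rows)
    if m != n
        # Name the abstract type, since `rows` need not be a `Vector{Bool}`,
        # and report both lengths so the user can see what went wrong.
        throw(ArgumentError("invalid length of `AbstractVector{Bool}` row index " *
                            "(got $m, expected $n)"))
    end
    return findall(rows)   # positions of the `true` entries
end

selected = bool_rows_to_indices([true, false, true], 3)
```
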

nalimilan · 2018-09-25T12:37:33Z

src/subdataframe/subdataframe.jl

-    any(ismissing, rowinds) && throw(MissingException("missing values are not allowed in indices"))
-    return SubDataFrame(adf, convert(Vector{Missings.T(T)}, rowinds))
+function Base.view(adf::AbstractDataFrame, rowinds)
+    Base.depwarn("view(adf, x) will create a `SubDataFrame` containing columns `x` and all rows in the future.", :view)


Better give the replacement syntax. Same for other deprecations.

nalimilan · 2018-09-25T12:46:28Z

src/subdataframe/subdataframe.jl

-function Base.view(adf::AbstractDataFrame, rowinds::Any)
-    return SubDataFrame(adf, rowinds)
+function Base.view(adf::AbstractDataFrame, rowinds, colinds)
+    return SubDataFrame(adf[colinds], rowinds)


Something I realize looking at this is that SubDataFrame is a view on rows of a data frame, but not on columns. So while view(adf, 1, :) will reflect changes due to e.g. rename!(adf, col => newcol), view(adf, 1, cols) won't (since adf[colinds] creates a new parent data frame). The difference is subtle: we get a view of column vectors, but not a view of the parent data frame.

I wonder whether it would be more consistent to add a cols field to SubDataFrame so that it's a real view on both rows and columns. That also depends on whether it would have a significant performance hit or not.

I was thinking about it when I asked the same about DataFrameRow earlier. I came to the conclusion that it is impossible to consistently handle columns. The reason is that we allow dual column indexing: by their numbers and by their names. Later we can rename columns or move them breaking one type of indexing or the other and if we break number-name mapping in the parent it is not clear what we should do in the view.

Finally, SubDataFrame is in general supposed to be used only when the original DataFrame is not mutated in terms of rows/columns (e.g. removing rows in the source DataFrame can break a SubDataFrame). This is something we should clearly indicate in the documentation.

> I was thinking about it when I asked the same about DataFrameRow earlier. I came to the conclusion that it is impossible to consistently handle columns. The reason is that we allow dual column indexing: by their numbers and by their names. Later we can rename columns or move them breaking one type of indexing or the other and if we break number-name mapping in the parent it is not clear what we should do in the view.

I'm not sure that's really an issue. Anyway as you note one isn't supposed to change the structure of the parent. What bothers me more is the inconsistency between passing : for columns and passing a vector. Though in practice it shouldn't really matter.

To sum up: we should leave not creating a copy when passing : for performance reasons. I will add a note in the docs on when the parent is copied and when it is not.

In fact - this is consistent with the thinking that DataFrame is a collection of rows.

> To sum up: we should leave not creating a copy when passing : for performance reasons.

Yes, I agree. What I meant is that if SubDataFrame also stored the indices of a subset of columns, we wouldn't need to make a copy when not passing : either. I'm not saying we should definitely do this, even less that it should be done in this PR, but maybe something to keep in mind.

nalimilan · 2018-09-25T12:48:30Z

src/subdataframe/subdataframe.jl

    return SubDataFrame(adf[[colind]], rowinds)
 end

+function Base.view(adf::AbstractDataFrame, rowinds, colind::Bool)
+    throw(ArgumentError("invalid column index of type Bool"))


Also print the value?

nalimilan · 2018-09-25T12:49:32Z

src/subdataframe/subdataframe.jl

@@ -132,11 +148,13 @@ nrow(sdf::SubDataFrame) = ncol(sdf) > 0 ? length(rows(sdf))::Int : 0
 ncol(sdf::SubDataFrame) = length(index(sdf))

 function Base.getindex(sdf::SubDataFrame, colinds)
-    return parent(sdf)[rows(sdf), colinds]
+    return view(parent(sdf), rows(sdf), colinds)


Doesn't this need a deprecation too?

bkamins · 2018-09-29T16:55:41Z

Based on #1533, and trying to be consistent with Base behavior, I have added documentation of the target getindex/view behavior to this PR before moving forward with the implementation.

This is complex and subtly breaking in several places (I would recommend that during a review each proposal is verified against current behavior - at least this is what I had to do, and I was sometimes surprised by the result).

nalimilan

Thanks, that's really an impressive list!

It should probably go to a separate page as it's really distinct from docstrings. It would also be great if you could summarize in a few sentences the main rules that we follow (in particular when copying happens).

nalimilan · 2018-09-30T16:24:21Z

docs/src/lib/functions.md

+* `df[rows, cols]` -> a `DataFrame` containing columns `cols` and `df[rows, col]` as a vector in each `col` in `cols`.
+* `@view df[col]` -> an alias of a vector contained in column `col`;
+* `@view df[cols]` -> a `SubDataFrame` with parent `df` if `cols` is a colon and `df[cols]` otherwise;


Maybe the special treatment of colon could be problematic, and we should also use df[:] as the parent in that case? Same in similar cases.

Retaining df will be faster and I guess it will be important for groupby when you have a lot of small groups.

This case was not on the table until now, since @view df[x] currently treats x as rows, but that is going to change to ensure consistency.

As long as df is not mutated this should not matter in practice.

"As long as df is not mutated" is what bothers me. :-)

Why do you expect view(df, :) to be used in groupby? Presumably if the overhead of calling df[cols] is OK when cols != :, it's also OK when cols == :? Or is the latter more common than the former?

nalimilan · 2018-09-30T16:28:29Z

docs/src/lib/functions.md

+`SubDataFrame`:
+* `sdf[col]` -> a view of vector contained in column `col` of `parent(sdf)` with `DataFrames.rows(sdf)` as selector;
+* `sdf[cols]` -> a freshly allocated `DataFrame` containing copies of vectors contained in columns `cols` of `sdf`;


This seems inconsistent with sdf[col] and df[cols], which don't make copies. Better return a SubDataFrame.

I was not sure what is best here myself.

In Base such an operation destroys a view. Also some code in DataFrames.jl relies on the fact that sdf[cols] returns a DataFrame if I remember correctly.

I will leave as is in the next iteration, but let us discuss it further.

But in Base x[inds] always makes a copy, so it's consistent that it's also the case for views. For DataFrame, df[cols] doesn't copy the column vectors, so it would be consistent to do that for SubDataFrame. The goal is that for consistency if you mutate the result, the original AbstractDataFrame is changed in both cases.

OK, I have rewritten it to:

a SubDataFrame, with parent parent(sdf) if cols is a colon and parent(sdf)[cols] otherwise
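
The Base behavior referred to in this thread can be checked on an ordinary vector. This is only a sketch of the `Array` semantics being used as the reference point, not of the `SubDataFrame` code itself: `x[inds]` allocates a new array, while `view(x, inds)` shares memory with `x`.

```julia
x = [10, 20, 30, 40]

# `getindex` with a range copies: mutating the result leaves `x` alone.
c = x[2:3]
c[1] = 0
x_after_copy = copy(x)    # snapshot: still [10, 20, 30, 40]

# `view` aliases the parent: mutating the view writes through to `x`.
v = view(x, 2:3)
v[1] = 0
x_after_view = copy(x)    # snapshot: now [10, 0, 30, 40]

# Indexing into a view with `getindex` copies again, i.e. it "destroys" the view.
w = v[1:2]
w[2] = -1                 # does not touch `x`
```
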

nalimilan · 2018-09-30T16:35:42Z

docs/src/lib/functions.md

+* `@view df[cols]` -> a `SubDataFrame` with parent `df` if `cols` is a colon and `df[cols]` otherwise;
+* `@view df[row, col]` -> translates to `view(df[col], row)` (a `0`-dimensional view into `df[col]`);
+* `@view df[row, cols]` -> a `DataFrameRow` with parent `df` if `cols` is a colon and `df[cols]` otherwise;


So this is going to require creating a new DataFrame for each row? That sounds really bad. Maybe we could change DataFrameRow to also take a subset of columns? That's really a corner case though, so it could be fixed later.

We have the same situation with SubDataFrame, as the current SubDataFrame and DataFrameRow were designed with the assumption that only row-indexing is performed.

I would leave it for a separate PR (we will have to handle colon and subset of columns separately and support a mapping from column numbers of a view to column numbers of a parent in the latter). In the worst case, the user should first create a parent that has the correct columns and then run a view on all of them. I will open an issue for this to keep track of it.
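
For reference, the "`0`-dimensional view" wording used for `@view df[row, col]` in the documented rules above mirrors what Base does for a plain vector. A small sketch with an ordinary `Vector` standing in for the column `df[col]` (an assumption made to keep the example free of DataFrames.jl):

```julia
col = [10, 20, 30]      # stands in for the column vector `df[col]`

cell = view(col, 2)     # scalar index => 0-dimensional view of one element
dims = ndims(cell)      # 0
before = cell[]         # read the single element with empty brackets

cell[] = 99             # writing through the view mutates the parent vector
```
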

bkamins · 2018-10-06T19:07:09Z

I have reviewed the comments. The only problematic thing is what sdf[cols] should produce.

nalimilan · 2018-10-07T09:51:49Z

docs/src/lib/indexing.md

+* `df[cols]` -> a freshly allocated `DataFrame` containing the vectors contained in columns `cols`;
+* `df[row, col]` -> the value contained in row `row` of column `col`, the same as `df[col][row]`;
+* `df[row, cols]` -> a `NamedTuple` containing data from row `row` in columns `cols`;
+* `df[rows, col]` -> a copy of the vector `df[col]` with only the entries corresponding to `rows` selected, the same as df[col][rows];


Missing backticks.

nalimilan · 2018-10-07T09:56:41Z

docs/src/lib/functions.md

+* `df[rows, cols]` -> a `DataFrame` containing columns `cols` and `df[rows, col]` as a vector in each `col` in `cols`.
+* `@view df[col]` -> an alias of a vector contained in column `col`;
+* `@view df[cols]` -> a `SubDataFrame` with parent `df` if `cols` is a colon and `df[cols]` otherwise;


"As long as df is not mutated" is what bothers me. :-)

Why do you expect view(df, :) to be used in groupby? Presumably if the overhead of calling df[cols] is OK when cols != :, it's also OK when cols == :? Or is the latter more common than the former?

bkamins · 2018-10-07T13:31:43Z

> Why do you expect view(df, :) to be used in groupby? Presumably if the overhead of calling df[cols] is OK when cols != :, it's also OK when cols == :? Or is the latter more common than the former?

Currently we have:

Base.getindex(gd::GroupedDataFrame, idx::Int) =
    view(gd.parent, gd.idx[gd.starts[idx]:gd.ends[idx]])

which essentially will have to be rewritten as view(gd.parent, gd.idx[gd.starts[idx]:gd.ends[idx]], :) given this PR (after the deprecation period).

So, in short, because we rewrite view(df, i) to mean view(df, i, :) after this PR, the form with a colon will be the most common one in user code. In the current code view(df, i) is a very popular pattern (I would say the only one encountered in practice given the current design).

Also note that currently view(df, r, c) always makes a copy of df while view(df, r) never makes a copy.

nalimilan · 2018-10-07T14:42:14Z

OK, I see. The best fix is really to make SubDataFrame store column indices (#1557), or to make it really cheap to construct a new DataFrame. Until then, better keep view(df, rows, :) efficient I guess.

nalimilan

Hm, I had forgotten about my other comments regarding the implementation.

bkamins · 2018-10-07T14:51:54Z

The good thing is that we will go through a deprecation period when this would not matter much 😄, and in the meantime I will try to make #1557 efficient.

bkamins · 2018-10-07T14:54:49Z

Actually there is a lot to improve in the implementation, apart from the earlier comments, once we have decided on the functionality.

Anyway this PR will be mostly deprecations given what we have discussed.

nalimilan · 2018-10-08T12:28:26Z

src/dataframe/dataframe.jl

-# df[SingleColumnIndex] => AbstractDataVector
+const ColumnIndex = Union{Integer, Symbol}
+
+# df[SingleColumnIndex] => AbstractVector, alias


Could be even more precise than "alias": the vector itself is returned, not a view of it.

nalimilan · 2018-10-08T19:21:47Z

src/dataframe/dataframe.jl

+
+# df[SingleRowIndex, :] => DataFrame
+function Base.getindex(df::DataFrame, row_ind::Integer, ::Colon)
+    Base.depwarn("Selecting a single row from a `DataFrame` will return a `NamedTupe` in future.", :getindex)


"Tuple". Also "in the future".

nalimilan · 2018-10-08T19:26:28Z

src/dataframe/dataframe.jl

    return DataFrame(new_columns, Index(_names(df)[selected_columns]))
 end

-# df[MultiRowIndex, SingleColumnIndex] => AbstractVector
+# df[SingleRowIndex, :] => DataFrame
+function Base.getindex(df::DataFrame, row_ind::Bool, ::Colon)


For readability, wouldn't it be better to do row_ind isa Bool inside the method below? Same below in a similar case.

nalimilan · 2018-10-08T19:33:58Z

src/dataframe/dataframe.jl

+# df[SingleRowIndex, :] => DataFrame
+function Base.getindex(df::DataFrame, row_ind::Integer, ::Colon)
+    Base.depwarn("Selecting a single row from a `DataFrame` will return a `NamedTupe` in future.", :getindex)
+    new_columns = AbstractVector[[dv[[row_ind]]] for dv in columns(df)]


Aren't there too many brackets here?

src/dataframerow/dataframerow.jl

nalimilan · 2018-10-08T20:01:51Z

src/other/index.jl

-    if length(idxs) != length(idx)
-        throw(ArgumentError("missing values are not allowed for column indexing"))
-    end
+    idxs = disallowmissing(idx)


Is this call really useful now?

nalimilan · 2018-10-08T20:04:51Z

src/subdataframe/subdataframe.jl

-function SubDataFrame(sdf::SubDataFrame, rowinds::Colon)
-    return sdf
+SubDataFrame(parent::DataFrame, rows) =
+    SubDataFrame(parent, (1:nrow(parent))[rows])


What's this trick?

The idea is that:

* we are sure that we get Vector{Int} as a result;
* (1:nrow(parent))[rows] will use machinery from Base to catch any problematic values in rows; OTOH anything that is valid in Base will be valid here.

If not for performance and error clarity reasons, this could be the only definition we use. I leave it because it is future proof - if anything beyond what the other methods handle gets allowed in Base for indexing, we will allow it too.

But I can drop it as it is not strictly needed for now.

We don't use that approach elsewhere, do we? I'd rather be consistent. Also the error thrown will probably be unclear for users.
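
A small sketch of the trick under discussion, using a literal `n` in place of `nrow(parent)` (an assumption, so that the example runs without DataFrames.jl). Indexing the range `1:n` with `rows` lets Base normalize and validate the selector; the flip side is that a bad selector surfaces as a generic `BoundsError` rather than a message tailored to data frames.

```julia
n = 5   # stands in for `nrow(parent)`

# Boolean mask and integer vector selectors are both turned into row numbers.
from_mask = (1:n)[[true, false, true, false, true]]
from_ints = (1:n)[[2, 4]]

# A mask of the wrong length is rejected by Base's own bounds checking.
err = try
    (1:n)[[true, false]]
catch e
    e
end
```
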

nalimilan · 2018-10-08T20:10:54Z

src/subdataframe/subdataframe.jl


-function Base.view(adf::AbstractDataFrame, rowinds::Any)
+function Base.view(adf::AbstractDataFrame, rowinds)
+    Base.depwarn("`view(adf, x)` will be translated to `view(adf, x, :)` in the future.", :view)


Better tell people that it's the new syntax they should use, that's clearer.

nalimilan · 2018-10-08T20:15:28Z

src/subdataframe/subdataframe.jl

+end
+
+function Base.view(adf::AbstractDataFrame, rowind::Integer, colinds)
+    Base.depwarn("`view(adf, x)` will create a `DataFrameRow` in the future." *


Deprecation doesn't match the called method.

src/subdataframe/subdataframe.jl

bkamins · 2018-10-09T11:53:28Z

OK. I tried to clean up everything (thank you for the patience - this PR is complex unfortunately).

We are left with the NamedTuple debate. But as you have said - we will see how it goes. A user can always get a DataFrame by selecting [row] instead of row. The upside is that after this it gets type stable. The downside is immutability.

But I guess in #1533 we investigated all the options and this one had the fewest downsides. How well this goes also hinges on the decision in JuliaLang/julia#29417, but I guess it did not get a lot of attention there.

bkamins · 2018-10-09T12:11:40Z

Just to add (as it is better to be sure what we are doing) - I would also be OK if df[row, columns] produced a DataFrame, not a NamedTuple, but I guess the majority thinking is that we should drop a dimension if we select a single row only (it has the upside that we have a convenient way to create a NamedTuple from a row in a DataFrame, which could be convenient for users).

bkamins · 2018-10-10T22:16:52Z

OK - I dug through getting this PR green on CI.
It is probably worth having another look at it.

Especially as I came to the conclusion that:

* it is very cheap to create a DataFrameRow;
* I am not sure I know how to create a NamedTuple from a row of a DataFrame cheaply - @nalimilan have you experimented with that?

bkamins · 2018-10-12T19:48:19Z

Hopefully I have cleaned up all depwarn messages.

ararslan · 2018-10-12T21:52:11Z

src/dataframe/dataframe.jl

    return DataFrame(new_columns, Index(_names(df)[selected_columns]))
 end

 # df[:, SingleColumnIndex] => AbstractVector
 # df[:, MultiColumnIndex] => DataFrame
 function Base.getindex(df::DataFrame, row_ind::Colon, col_inds)
-    Base.depwarn("indexing with colon as row will create a copy in the future" *
-                 " use df[col_inds] to get the columns without copying", :getindex)
+    Base.depwarn("Sndexing with colon as row will create a copy in the future. " *


I love Sndexing into data frames

This is a new operation I am working on. A combination of "Selecting" and "Indexing" in one go with a bit more of the latter than the former.

nalimilan · 2018-10-13T10:02:24Z

I would say it's quite simple, just iterate over columns and extract the requested entry. It should be as fast as we can hope -- which unfortunately should be relatively slow, but not slower than iterating over all entries in a DataFrameRow. The main question is whether the compiler will be able to avoid actually accessing all columns when only some of them are used. I guess we should do some benchmarking before switching.

I've played a bit with creating tuples for each row and it appears that even when passing a tuple of N columns to a function, the compiler isn't able to avoid creating an N-tuple for each row when we use only a single row (see this gist). Of course the same applies to named tuples. And inside a function taking a data frame, the type instability makes this even harder for the compiler.

In the end DataFrameRow is much faster than Tuple (I didn't even test NamedTuple) when naively iterating over rows: see this gist. So maybe we should make an exception and have getindex return a DataFrameRow. Even if that's inconsistent with AbstractArray, in practice it shouldn't matter much.

@quinnj I guess you've considered these issues a lot when designing Tables.jl?
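
To illustrate the "just iterate over columns and extract the requested entry" idea, here is a minimal sketch. It assumes the columns are held in a `NamedTuple` of vectors rather than in a `DataFrame`, and the function name `row_namedtuple` is made up for this example; it says nothing about how fast the equivalent operation would be on a real data frame.

```julia
# Columns stored as a NamedTuple of vectors (an assumption made for this
# sketch; a DataFrame stores its columns differently).
cols = (a = [1, 2, 3], b = ["x", "y", "z"])

# `map` over a NamedTuple keeps the names, so extracting entry `i` from
# every column yields a NamedTuple representing that row.
row_namedtuple(cols::NamedTuple, i::Integer) = map(c -> c[i], cols)

second_row = row_namedtuple(cols, 2)
```

Note that every field is extracted eagerly here, which is exactly the cost discussed above when only one of the N values is actually used.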

bkamins · 2018-10-13T10:11:03Z

That is exactly what my experience was (I used something similar to #1564 to test it against NamedTuples). Also, as I note in #1564, there is an issue: the tuples you produce in your gist are covariant (a better situation), while named tuples as planned are invariant, which makes things even worse.

nalimilan · 2018-10-13T11:02:09Z

Why worse? Can it actually be worse than that anyway? :-)

bkamins · 2018-10-25T09:04:14Z

@nalimilan Is there a conclusion on the NamedTuple case and how does it affect this PR? (should we keep it or change NamedTuple to DataFrameRow?)

nalimilan · 2018-10-26T16:39:36Z

I'd say use DataFrameRow for now instead of NamedTuple.

bkamins · 2018-10-26T18:03:13Z

Good. One last thing before I clean it up.

As DataFrameRow is a kind of view, are we sure that we want df[1, cols] to return a DataFrameRow, or should it be a DataFrame (this is the current behavior)?

I am OK with both (given the discussion we had before both alternatives have pros and cons) - but prefer to double check before moving forward.

nalimilan · 2018-10-26T18:07:07Z

I'd go with DataFrameRow, as returning a DataFrame is really slow (probably even slower than NamedTuple).

pdeffebach · 2018-10-27T00:06:53Z

Is it too late to just make our own two types <: AbstractVector? That way we can make sure that DataFrameRow (attached to a dataframe) behaves exactly like DataFrameRow (not attached to a dataframe).

bkamins · 2018-10-27T05:46:09Z

@pdeffebach Can you please explain in a bit more detail, because I am not sure what you mean? (In general I guess it is not too late for changes - we are trying to clean up DataFrames.jl as much as possible before the 1.0 release.)

pdeffebach · 2018-10-27T13:14:35Z

Something like AttachedDataFrameRow and DetachedDataFrameRow (without the awkward names)

struct AttachedDataFrameRow <: AbstractVector # or similar
    parent::DataFrame
    row::Int
end

Base.getindex(... work with integer using parent Index)
Base.getindex(... work with symbol using parent Index)
function row(dfrow::AttachedDataFrameRow) ...
# Define array interface

Then there is DetachedDataFrameRow that tries to keep the exact same API

struct DetachedDataFrameRow <: AbstractVector # or similar
    data::Vector
    colindex::Index
    row::Int
end

function DetachedDataFrameRow(df::DataFrame, row::Int)
    data = [df[row, name] for name in names(df)]
    DetachedDataFrameRow(data, getfield(df, :colindex), row)
end

Base.copy(dfrow::AttachedDataFrameRow) = DetachedDataFrameRow(parent(dfrow), row(dfrow))

Base.getindex(... work with integer)
Base.getindex(... work with symbol)
function row(dfrow::DetachedDataFrameRow) ...
# Define array interface

I like this plan because it means we don't have to worry about losing behavior when a DataFrameRow-like object is attached versus a copy, even down to both using an Index object. We also don't have to make compromises: we can index with multiple symbols dfrow[:x1, :x2, :x3] (benefit of DataFrame), yet still have mean(dfrow) work (benefit of NamedTuple).

On the other hand, I don't like this plan because it's one more type we have to export and maintain.

nalimilan · 2018-10-27T14:06:23Z

The problem is that creating a new Vector to hold the data is slow, so DetachedDataFrameRow wouldn't really be usable as a replacement for DataFrameRow.

pdeffebach · 2018-10-27T14:26:20Z

Thanks for the feedback.

If that's the case, then a relatively large change in API (DataFrame on copy vs DataFrameRow attached) is preferred, so people are disincentivized from copying it too much.

I must admit I don't really know the use-case of copying it. Seems like it will be pretty rare. However, DataFrame is suboptimal to me because it might make the switch to mean(dfrow) more difficult in the future. It makes sense with the current DataFrameRow behavior because both that and a DataFrame are quite Dict-like.

EDIT: I think I misinterpreted the above conversation. Returning NamedTuple on copy seems great.

bkamins · 2018-10-27T16:16:01Z

I have improved the documentation to explain design considerations.

nalimilan · 2018-10-28T09:56:33Z

Thanks. Is this good to go then?

bkamins · 2018-10-28T15:26:41Z

I would say it is good to go. Also, I think that #1571 should then be rebased against it and try to be consistent with the target functionality.

nalimilan · 2018-10-29T09:51:59Z

Great work!

nalimilan · 2018-11-01T20:02:13Z

There are lots of deprecation warnings when running tests. Some seem to be potentially problematic for users. For example:

┌ Warning: `sdf[colind]` will create a view of `parent(sdf)[colind]` in the future. Use sdf[:, [colind]]` to get a `DataFrame`.
│   caller = isequal(::SubDataFrame{Array{Int64,1}}, ::DataFrame) at abstractdataframe.jl:283
└ @ DataFrames ~/.julia/dev/DataFrames/src/abstractdataframe/abstractdataframe.jl:283

This seems to happen when calling isequal with a SubDataFrame. Maybe we should provide a temporary isequal method in deprecated.jl until we remove the deprecation? I guess this applies to other functions.

bkamins · 2018-11-01T21:56:28Z

That was the issue I was afraid of when asking about the process for cleaning up deprecations.
I will go through the tests of DataFrames.jl, try to clean up and submit a PR today.

nalimilan reviewed Sep 25, 2018

bkamins mentioned this pull request Sep 25, 2018

Behavior of df[1, cols] and DataFrameRow #1533

Closed

bkamins force-pushed the getindex_round_2 branch from 685a4ba to fc5b9f3 Compare September 29, 2018 16:52

nalimilan reviewed Sep 30, 2018

bkamins mentioned this pull request Oct 7, 2018

Support for column selection in views #1557

Closed

nalimilan reviewed Oct 7, 2018

nalimilan approved these changes Oct 7, 2018

nalimilan requested changes Oct 7, 2018

bkamins force-pushed the getindex_round_2 branch 2 times, most recently from dbcab70 to 7f25eb9 Compare October 8, 2018 13:09

nalimilan reviewed Oct 8, 2018

bkamins force-pushed the getindex_round_2 branch 2 times, most recently from fb0d604 to 17a6543 Compare October 10, 2018 19:28

bkamins added 6 commits October 10, 2018 23:43

revised getindex

0bd7484

another round of reviews

0f34cdd

minor fixes

6eda861

some more minor fixes

2f8fb98

fix some more failing tests

aa9e250

use correct view semantics in codebase

8ff49d4

bkamins force-pushed the getindex_round_2 branch from 83dafd9 to 8ff49d4 Compare October 10, 2018 21:45

bkamins mentioned this pull request Oct 11, 2018

Add dimensions to HTML output #1563

Closed

improved deprecation warnings

bdbd357

bkamins mentioned this pull request Oct 12, 2018

Improve copy of SubDataFrame and DataFrameRow #1564

Merged

ararslan reviewed Oct 12, 2018

fix Sndexing

6a3ba6f

drop returning a NamedTuple from getindex

25ab1a0

improve docs

353c59a

bkamins force-pushed the getindex_round_2 branch from 196f322 to 353c59a Compare October 27, 2018 16:17

nalimilan mentioned this pull request Oct 28, 2018

Extend setindex api #1571

Closed

nalimilan approved these changes Oct 29, 2018

nalimilan merged commit ca84253 into JuliaData:master Oct 29, 2018

bkamins deleted the getindex_round_2 branch October 29, 2018 09:53
