Behavior of `df[1, cols]` and `DataFrameRow` #1533
Comments
That definitely makes sense, especially if we decide that iterating over a data frame gives rows (and therefore |
Before I move on, could you clarify the following design decisions for me (they were made before I started contributing):
Probably you see what it is getting at: remove We could even retain |
AFAIK it doesn't support the full interface. For example
I guess that's just an oversight (we currently don't have other
I think it makes sense to differentiate a row from a subset of rows, if only because of the iteration difference I noted above. Having a different type can also allow for more efficient code, just like in |
Thanks. This is what I see in the code. That is why I have said:
i.e. we internally use What is more |
Yes, but it would be weird to have iteration behave differently depending just on a type parameter, wouldn't it? |
OK - let us leave it as is (but we have to be clear, at some point, in the documentation why we differentiate between
or:
which one would you prefer? |
I think I'd prefer version 1, but as I said, if it turns out it's inconvenient we could change our strategy. |
I like option 1 because #1449 gets us on the road to doing
Returning a DataFrame loses that progress. I really like this |
Another option similar to 1 would be to have Something to also take into account is that it would be easier to start with |
It would be nice for
and
interchangeably. |
@nalimilan I do not feel However, I do not feel that Here is a summary of what I think we should do:
When we settle on this I will update #1534. |
I agree with all of these except (as expected) the two ones you highlighted:
Returning a vector doesn't sound appropriate, as we would completely lose the column names which are an essential property defining a row. Either a I agree that
AFAICT this is not possible unless we find a solution for iteration, right? |
I will lay out what I think below so that we can decide deliberately; this is a delicate matter, and I guess we all want the DataFrames 1.0 release to be stable in this area.
Case of

| Return type | Mutable | Copying | 1D | Breaking change |
|---|---|---|---|---|
| `DataFrame` | yes | yes | no | no (current behavior) |
| `DataFrameRow` | yes | no | yes | mildly |
| `NamedTuple` | no | yes | yes | yes (code that currently works might break, because it is immutable) |
| `NamedArray` | yes | yes | yes | yes (introduces a new dependency) |
Every option has some disadvantage:
- Based on this I lean towards `DataFrame` because it is non-breaking (this is what we currently have, although there is a duplication of functionality between `df[[1], :]` and `df[1, :]`).
- I am OK with `DataFrameRow` as a second option, as it is only mildly breaking and users are already aware of this type in the DataFrames.jl ecosystem (although you would also be able to get it via `view(df, 1, cols)`, so we would have a duplication of functionality). Actually, we could make `DataFrameRow` a wrapper around a one-row `DataFrame`, so that `df[1, :]` would be `view(df[[1], :], 1, :)` and would not affect `df` on mutation, but I am not sure this would give any advantage over simply returning a `DataFrame`.
- Returning `NamedTuple` can break existing code in unexpected places and introduces a new type for the user to think about (the user already has to handle `DataFrame`, `SubDataFrame` and `DataFrameRow`, which is a lot).
- Finally, `NamedArray` creates a significant dependency.
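To make the "Mutable" and "Copying" columns of the table concrete, here is a minimal sketch in plain Julia that does not use DataFrames.jl at all. The table is modeled as a `NamedTuple` of column vectors, and `rowcopy` and `RowView` are hypothetical names invented for this illustration; they are not part of any package API and only approximate the trade-off being discussed.

```julia
# Hypothetical stand-in for a data frame: a NamedTuple of column vectors.
table = (a = [1, 2, 3], b = ["x", "y", "z"])

# NamedTuple-style row: copies the values of row i and keeps the column names.
rowcopy(t::NamedTuple, i::Integer) = map(col -> col[i], t)

# DataFrameRow-style row (hypothetical type): stores no values, only a
# reference to the parent columns and a row number.
struct RowView{T<:NamedTuple}
    parent::T
    row::Int
end

Base.getindex(r::RowView, name::Symbol) = r.parent[name][r.row]

function Base.setindex!(r::RowView, value, name::Symbol)
    r.parent[name][r.row] = value   # writes through to the parent column
    return r
end

nt = rowcopy(table, 1)   # independent, immutable copy of row 1
rv = RowView(table, 1)   # view of row 1, shares storage with `table`

rv[:a] = 100             # mutates table.a[1] through the view

println(table.a[1])      # prints 100: the parent sees the change
println(nt.a)            # prints 1: the earlier copy is unaffected
```

Assigning to a field of `nt` would raise an error, since a `NamedTuple` is immutable; that is the kind of breakage the last column of the table refers to.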
Case `@view df[row, cols]`
I understand you support `DataFrameRow` here, right? I am OK with this choice (my only objection is that it is one more type for the user to learn, which is why I put it on the table).
Good summary. Though note that returning a
Right. I'm also open to investigating merging |
Regarding merging |
I'd return either a |
OK - you have convinced me for
Let us wait if we get any comments and then I will update the PR accordingly. |
OK - moving one step further. If |
Oh man. The difference might come down to the fact that a |
@nalimilan I assume you know why I am asking 😄 and |
note that you can't copy a
|
See #1536. I think it's fine for |
OK - so I will start updating #1534 following what was discussed here. |
One issue with Though I'm not sure the use case for
|
Can I request, if it is relevant here, that DataFrameRow support "haskey", as it currently doesn't? |
@LeoK987 - can you open a separate issue for this so that we do not lose track of this request and can discuss it? |
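For readers unfamiliar with the request, the following is a rough sketch of what `haskey` on a row-like object could mean. `NamedRow` is a hypothetical type made up for this example, not the actual `DataFrameRow`, and the sketch assumes that a key is simply a column name given as a `Symbol`.

```julia
# Hypothetical row type: column names plus the values of a single row.
struct NamedRow
    names::Vector{Symbol}
    values::Vector{Any}
end

# Look a value up by column name.
Base.getindex(r::NamedRow, name::Symbol) = r.values[findfirst(==(name), r.names)]

# A row "has" a key when a column with that name exists.
Base.haskey(r::NamedRow, name::Symbol) = name in r.names

row = NamedRow([:a, :b], Any[1, "x"])

# Guarded access: fall back to `missing` when the column does not exist.
value_c = haskey(row, :c) ? row[:c] : missing
```

With such a method, code written against dictionaries or named tuples could test for a column before indexing instead of catching an error.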
@pdeffebach I think it is worth a PR to Julia Base, as indexing by a vector works for (I will wait a bit with #1534 as it seems we still get some feedback) |
Closing as |
@nalimilan This is speculative, but I want to discuss it before we finish the `getindex` cleanup.
- `DataFrameRow` structure (there was some discussion of returning `NamedTuple` from `eachrow`, but it would not allow for mutation);
- `getindex[1, :]` returns a `DataFrame`, but maybe it should return a `DataFrameRow`? Also, then maybe `DataFrameRow` could accept only a selection of columns to make `getindex[1, columns]` work consistently and return `DataFrameRow` (this would be breaking and I am not 100% convinced we want it)?
- `DataFrame` as `DataFrameRow` directly (something like `getrow(df, 1)`)?