Support adding columns to views #2794

Merged
merged 32 commits into from
Sep 1, 2021
Changes from 4 commits
289fadf
add setindex! rules
bkamins Jun 11, 2021
a253236
implement setindex! and broadcasting assignment
bkamins Jun 20, 2021
e33c605
implement insertcols!
bkamins Jun 21, 2021
7f1814a
add NEWS.md entry
bkamins Jun 21, 2021
4a22a1d
Apply suggestions from code review
bkamins Jun 27, 2021
b721a46
changes after code review part 2
bkamins Jun 27, 2021
412b89c
docs update
bkamins Jun 27, 2021
b95f07a
setindex! for ! and setproperty
bkamins Jun 27, 2021
af54c29
Merge branch 'main' into bk/view_add_column
bkamins Aug 6, 2021
dc0d241
fix NEWS.md
bkamins Aug 6, 2021
9f02571
another NEWS.md fix
bkamins Aug 6, 2021
121bb54
another small NEWS.md change
bkamins Aug 6, 2021
1a83b61
finished tests for df[!, col] assignment and broadcasted assignment
bkamins Aug 7, 2021
7d5a65b
some more tests
bkamins Aug 7, 2021
f614e58
done tests of ! assignment and broadcasting assignment
bkamins Aug 7, 2021
e886dc1
finished assignment, broadcasted assignment and insertcols!
bkamins Aug 7, 2021
59afd61
fix tests
bkamins Aug 8, 2021
700e65d
fix tests on Julia 1.7
bkamins Aug 8, 2021
50d9f8b
one more test fix
bkamins Aug 8, 2021
6045034
finalize all required changes
bkamins Aug 8, 2021
1ae0534
fix 1.7 broadcasting
bkamins Aug 8, 2021
971c282
Apply suggestions from code review
bkamins Aug 25, 2021
c4cb1ae
apply suggestions after code review
bkamins Aug 25, 2021
cadc128
fix fast path is select!/transform!
bkamins Aug 25, 2021
bdbf09a
Merge branch 'main' into bk/view_add_column
bkamins Aug 28, 2021
4ad940c
Apply suggestions from code review
bkamins Aug 29, 2021
8f134ef
changes after code review and promote_type decision
bkamins Aug 29, 2021
1f4aa78
fix tests
bkamins Aug 29, 2021
55a6d75
Apply suggestions from code review
bkamins Aug 29, 2021
93de064
apply changes after code review
bkamins Aug 29, 2021
3e5d8d8
Merge branch 'main' into bk/view_add_column
bkamins Sep 1, 2021
f25d333
Update NEWS.md
bkamins Sep 1, 2021
6 changes: 6 additions & 0 deletions NEWS.md
@@ -11,6 +11,12 @@
* correctly handle selectors of the form `:col => AsTable` and `:col => cols`
by expanding a single column into multiple columns
([#2780](https://github.com/JuliaData/DataFrames.jl/pull/2780))
* if `sdf` is a `SubDataFrame` created with `:` as a column selector then
`insertcols!`, `sdf[:, col] = v`, and `sdf[:, col] .= v`, where `col` is
a column not present in `sdf`, are allowed and create a new column in
`parent(sdf)` with `missing` values stored in rows that are filtered out
in `sdf`
([XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX))
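
A minimal sketch of the behavior this entry describes, assuming a DataFrames.jl release that includes this change. The data frame and its column names are made up for illustration.

```julia
using DataFrames

df = DataFrame(a=1:4)
sdf = view(df, 2:3, :)   # `:` as column selector, so adding columns is allowed

# Assign a column that does not exist yet: it is created in the parent `df`.
sdf[:, :b] = [10, 20]

# insertcols! on the view also adds the column to the parent, here at position 1.
insertcols!(sdf, 1, :c => ["x", "y"])

# Rows 1 and 4 are filtered out by `sdf`, so they hold `missing` in the parent.
df.b        # Union{Missing, Int64} vector: missing, 10, 20, missing
names(df)   # ["c", "a", "b"]
```

A view created with a narrower column selector, for example `view(df, 2:3, [:a])`, does not support these operations and throws an error instead.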

## Bug fixes

8 changes: 7 additions & 1 deletion docs/src/lib/indexing.md
@@ -143,6 +143,9 @@ so it is unsafe to use it afterwards (the column length correctness will be pres
* `sdf[CartesianIndex(row, col)] = v` -> the same as `sdf[row, col] = v`;
* `sdf[row, cols] = v` -> the same as `dfr = df[row, cols]; dfr[:] = v` in-place;
* `sdf[rows, col] = v` -> set rows `rows` of column `col`, in-place; `v` must be an abstract vector;
if `sdf` was created with `:` as a column selector, `rows` is `:`, and `col` is a `Symbol` or `AbstractString`
that is not present in `df`, then a new column is created in `df` that holds `v` in the rows selected in `sdf`
and `missing` in all rows present in `parent(sdf)` but not present in `sdf`;
* `sdf[rows, cols] = v` -> set rows `rows` of columns `cols` in-place;
`v` can be an `AbstractMatrix` or `v` can be `AbstractDataFrame` when column names must match;

@@ -171,7 +174,6 @@ The following broadcasting rules apply to `AbstractDataFrame` objects:
Note that if broadcasting assignment operation throws an error the target data frame may be partially changed
so it is unsafe to use it afterwards (the column length correctness will be preserved).


Broadcasting `DataFrameRow` is currently not allowed (which is consistent with `NamedTuple`).

It is possible to assign a value to `AbstractDataFrame` and `DataFrameRow` objects using the `.=` operator.
@@ -190,6 +192,10 @@ Additional rules:
`df` is performed in-place; if `rows` is `:` and `col` is `Symbol` or `AbstractString`
and it is missing from `df` then a new column is allocated and added;
the length of the column is always the value of `nrow(df)` before the assignment takes place;
* in the `sdf[:, col] .= v` syntax, if `sdf` was created with `:` as a column selector
and `col` is a `Symbol` or `AbstractString` that is not present in `df`, then a new column is created in `df`
that holds the contents of `v` broadcasted onto the rows selected in `sdf`
and `missing` in all rows present in `parent(sdf)` but not present in `sdf`;
* in the `df[!, col] .= v` syntax column `col` is replaced by a freshly allocated vector;
if `col` is `Symbol` or `AbstractString` and it is missing from `df` then a new column is allocated and added;
the length of the column is always the value of `nrow(df)` before the assignment takes place;
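
The broadcasting rules above can be illustrated with a short sketch, assuming a DataFrames.jl release that includes this change. The column names and values are hypothetical.

```julia
using DataFrames

df = DataFrame(a=1:4)

# `:b` is missing from `df`, so a new column is allocated and added.
df[:, :b] .= 0

# `!` replaces column `:a` with a freshly allocated vector, so the element
# type may change (here from Int to Float64).
df[!, :a] .= 1.5

# On a view created with `:` as column selector, a new column is created in
# the parent; rows outside the view are filled with `missing`.
sdf = view(df, [1, 3], :)
sdf[:, :c] .= "x"

df.c   # "x", missing, "x", missing
```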
206 changes: 206 additions & 0 deletions src/abstractdataframe/abstractdataframe.jl
@@ -2173,3 +2173,209 @@ Base.getindex(::AbstractDataFrame, ::Union{Symbol, Integer, AbstractString}) =

Base.setindex!(::AbstractDataFrame, ::Any, ::Union{Symbol, Integer, AbstractString}) =
throw(ArgumentError("syntax df[column] is not supported use df[!, column] instead"))

# insertcols!

"""
insertcols!(df::AbstractDataFrame[, col], (name=>val)::Pair...;
makeunique::Bool=false, copycols::Bool=true)

Insert a column into a data frame in place. Return the updated data frame.
If `col` is omitted it is set to `ncol(df)+1`
(the column is inserted as the last column).

# Arguments
- `df` : the data frame to which we want to add columns
- `col` : a position at which we want to insert a column, passed as an integer
or a column name (a string or a `Symbol`); the column selected with `col`
and columns following it are shifted to the right in `df` after the operation
- `name` : the name of the new column
- `val` : an `AbstractVector` giving the contents of the new column or a value of any
type other than `AbstractArray` which will be repeated to fill a new vector;
As a particular rule, values stored in a `Ref` or a `0`-dimensional `AbstractArray`
are unwrapped and treated in the same way.
- `makeunique` : Defines what to do if `name` already exists in `df`;
if it is `false` an error will be thrown; if it is `true` a new unique name will
be generated by adding a suffix
- `copycols` : whether vectors passed as columns should be copied

If `val` is an `AbstractRange` then the result of `collect(val)` is inserted.

If `df` is a `SubDataFrame` then it must be created with `:` as column selector
(otherwise an error is thrown). In this case the `copycols` keyword argument
is ignored (the added column is always copied) and the parent data frame is
filled with `missing` in rows that are filtered out by `df`.

If `df` is a `DataFrame` that has no columns and only values
other than `AbstractVector` are passed then they are used to create one-element
columns.
If `df` is a `DataFrame` that has no columns and at least one `AbstractVector` is
passed then its length is used to determine the number of elements in all
created columns.
In all other cases the number of rows in all created columns must match
`nrow(df)`.

# Examples
```jldoctest
julia> df = DataFrame(a=1:3)
3×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> insertcols!(df, 1, :b => 'a':'c')
3×2 DataFrame
 Row │ b     a
     │ Char  Int64
─────┼─────────────
   1 │ a         1
   2 │ b         2
   3 │ c         3

julia> insertcols!(df, 2, :c => 2:4, :c => 3:5, makeunique=true)
3×4 DataFrame
 Row │ b     c      c_1    a
     │ Char  Int64  Int64  Int64
─────┼───────────────────────────
   1 │ a         2      3      1
   2 │ b         3      4      2
   3 │ c         4      5      3
```
"""
function insertcols!(df::AbstractDataFrame, col::ColumnIndex, name_cols::Pair{Symbol, <:Any}...;
                     makeunique::Bool=false, copycols::Bool=true)
    # only a DataFrame or a view holding all columns of its parent is supported
    if !(df isa DataFrame || (df isa SubDataFrame && getfield(df, :colindex) isa Index))
        throw(ArgumentError("insertcols! is only supported for DataFrame or " *
                            "SubDataFrame created with `:` as column selector"))
    end
    col_ind = Int(col isa SymbolOrString ? columnindex(df, col) : col)
    if !(0 < col_ind <= ncol(df) + 1)
        throw(ArgumentError("attempt to insert a column to a data frame with " *
                            "$(ncol(df)) columns at index $col_ind"))
    end

    if !makeunique
        if !allunique(first.(name_cols))
            throw(ArgumentError("Names of columns to be inserted into a data frame " *
                                "must be unique when `makeunique=false`"))
        end
        for (n, _) in name_cols
            if hasproperty(df, n)
                throw(ArgumentError("Column $n is already present in the data frame " *
                                    "which is not allowed when `makeunique=false`"))
            end
        end
    end

    # -1 means that the number of rows is not known yet
    # (a DataFrame without columns takes it from the first vector passed)
    if ncol(df) == 0 && df isa DataFrame
        target_row_count = -1
    else
        target_row_count = nrow(df)
    end

    for (n, v) in name_cols
        if v isa AbstractVector
            if target_row_count == -1
                target_row_count = length(v)
            elseif length(v) != target_row_count
                if target_row_count == nrow(df)
                    throw(DimensionMismatch("length of new column $n which is " *
                                            "$(length(v)) must match the number " *
                                            "of rows in data frame ($(nrow(df)))"))
                else
                    throw(DimensionMismatch("all vectors passed to be inserted into " *
                                            "a data frame must have the same length"))
                end
            end
        elseif v isa AbstractArray && ndims(v) > 1
            throw(ArgumentError("adding AbstractArray other than AbstractVector as " *
                                "a column of a data frame is not allowed"))
        end
    end
    # only non-vector values were passed to a DataFrame without columns
    if target_row_count == -1
        target_row_count = 1
    end

    for (name, item) in name_cols
        if !(item isa AbstractVector)
            if item isa Union{AbstractArray{<:Any, 0}, Ref}
                # unwrap Ref and 0-dimensional arrays before repeating the value
                x = item[]
                item_new = fill!(Tables.allocatecolumn(typeof(x), target_row_count), x)
            else
                @assert !(item isa AbstractArray)
                item_new = fill!(Tables.allocatecolumn(typeof(item), target_row_count), item)
            end
        elseif item isa AbstractRange
            item_new = collect(item)
        elseif copycols
            item_new = copy(item)
        else
            item_new = item
        end

        if df isa DataFrame
            dfp = df
        else
            # SubDataFrame: allocate a column of the parent's length, put the new
            # values in the rows selected by the view and `missing` elsewhere
            dfp = parent(df)
            T = eltype(item_new)
            newcol = Tables.allocatecolumn(Union{T, Missing}, nrow(dfp))
            fill!(newcol, missing)
            newcol[rows(df)] = item_new
            item_new = newcol
        end

        firstindex(item_new) != 1 && _onebased_check_error()

        if ncol(dfp) == 0
            dfp[!, name] = item_new
        else
            if hasproperty(dfp, name)
                @assert makeunique
                # find the first free name of the form name_k
                k = 1
                while true
                    nn = Symbol("$(name)_$k")
                    if !hasproperty(dfp, nn)
                        name = nn
                        break
                    end
                    k += 1
                end
            end
            insert!(index(dfp), col_ind, name)
            insert!(_columns(dfp), col_ind, item_new)
        end
        col_ind += 1
    end
    return df
end

insertcols!(df::AbstractDataFrame, col::ColumnIndex, name_cols::Pair{<:AbstractString, <:Any}...;
            makeunique::Bool=false, copycols::Bool=true) =
    insertcols!(df, col, (Symbol(n) => v for (n, v) in name_cols)...,
                makeunique=makeunique, copycols=copycols)

insertcols!(df::AbstractDataFrame, name_cols::Pair{Symbol, <:Any}...;
            makeunique::Bool=false, copycols::Bool=true) =
    insertcols!(df, ncol(df)+1, name_cols..., makeunique=makeunique, copycols=copycols)

insertcols!(df::AbstractDataFrame, name_cols::Pair{<:AbstractString, <:Any}...;
            makeunique::Bool=false, copycols::Bool=true) =
    insertcols!(df, (Symbol(n) => v for (n, v) in name_cols)...,
                makeunique=makeunique, copycols=copycols)

function insertcols!(df::AbstractDataFrame, col::Int=ncol(df)+1; makeunique::Bool=false, name_cols...)
    if !(0 < col <= ncol(df) + 1)
        throw(ArgumentError("attempt to insert a column to a data frame with " *
                            "$(ncol(df)) columns at index $col"))
    end
    if !isempty(name_cols)
        # an explicit error is thrown as keyword argument was supported in the past
        throw(ArgumentError("inserting columns using a keyword argument is not supported, " *
                            "pass a Pair as a positional argument instead"))
    end
    return df
end
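
Two steps of the main `insertcols!` method can be tried in isolation: filling the parent-length column with `missing` outside the view, and generating a unique name when `makeunique=true`. The sketch below uses only Base Julia. The helper names `fill_outside_view` and `unique_name` are hypothetical and not part of DataFrames.jl, and a plain `Vector` is used in place of `Tables.allocatecolumn`.

```julia
# Hypothetical helper: build a column of the parent's length that holds `vals`
# in `selected_rows` and `missing` everywhere else.
function fill_outside_view(vals::AbstractVector, selected_rows, parent_nrow::Integer)
    newcol = Vector{Union{eltype(vals), Missing}}(missing, parent_nrow)
    newcol[selected_rows] = vals
    return newcol
end

# Hypothetical helper: return `name` if it is free, otherwise the first free
# name of the form name_1, name_2, ...
function unique_name(name::Symbol, existing)
    name in existing || return name
    k = 1
    while true
        candidate = Symbol("$(name)_$k")
        candidate in existing || return candidate
        k += 1
    end
end

fill_outside_view([10, 20], 2:3, 4)   # missing, 10, 20, missing
unique_name(:c, [:b, :c, :c_1])       # :c_2
```

Widening the element type to `Union{T, Missing}` is what lets the parent column hold `missing` in the filtered-out rows even when the inserted values contain no missing data.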