use of map in ByRow #2957

bkamins · 2021-12-05T19:30:34Z

The problem is:

julia> using DataFrames, SparseArrays

julia> df = DataFrame(x=spzeros(10))
10×1 DataFrame
 Row │ x       
     │ Float64 
─────┼─────────
   1 │     0.0
   2 │     0.0
   3 │     0.0
   4 │     0.0
   5 │     0.0
   6 │     0.0
   7 │     0.0
   8 │     0.0
   9 │     0.0
  10 │     0.0

julia> transform(df, :x => ByRow(x -> rand()))
10×2 DataFrame
 Row │ x        x_function 
     │ Float64  Float64    
─────┼─────────────────────
   1 │     0.0   0.0797826 
   2 │     0.0   0.0797826 
   3 │     0.0   0.0797826 
   4 │     0.0   0.0797826 
   5 │     0.0   0.0797826 
   6 │     0.0   0.0797826 
   7 │     0.0   0.0797826 
   8 │     0.0   0.0797826 
   9 │     0.0   0.0797826 
  10 │     0.0   0.0797826

which is incorrect: rand() should be called once per row, but every row holds the same value.

The reason is that ByRow still uses map in

DataFrames.jl/src/abstractdataframe/selection.jl

Line 302 in 12c586c

(f::ByRow)(cols::AbstractVector...) = map(f.fun, cols...)

since it was efficient. I could change it, but maybe we should instead consider the behavior of map for SparseVector incorrect and keep using map?

@nalimilan, and I am not sure whom to ping from the SparseArrays community?

nalimilan · 2021-12-05T19:43:36Z

So basically this boils down to:

julia> using SparseArrays

julia> x = spzeros(10)
10-element SparseVector{Float64, Int64} with 0 stored entries

julia> map(_ -> rand(), x)
10-element SparseVector{Float64, Int64} with 10 stored entries:
  [1 ]  =  0.809673
  [2 ]  =  0.809673
  [3 ]  =  0.809673
  [4 ]  =  0.809673
  [5 ]  =  0.809673
  [6 ]  =  0.809673
  [7 ]  =  0.809673
  [8 ]  =  0.809673
  [9 ]  =  0.809673
  [10]  =  0.809673

I don't think DataFrames should take care of this. That's a problem in SparseArrays, and as long as map behaves like that there, it's OK for us to do the same with ByRow. But I agree it's weird, and we changed PooledArrays precisely to avoid that. Maybe we could find a common solution, like pure=false in PooledArrays.

I'm not sure who we could ping; it's probably best to file an issue in SparseArrays?
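
For illustration, one way to get one call per element is to avoid the SparseVector method of map altogether. The sketch below is a minimal example under that assumption. call_per_row is a hypothetical helper name introduced here; it is not the DataFrames.jl implementation and not necessarily what a fix would look like.

```julia
using SparseArrays

x = spzeros(10)

# A comprehension iterates over every element, stored or not, and
# returns a dense Vector, so the function runs once per element.
y = [rand() for _ in x]

# Hypothetical helper (an assumption for illustration only): apply `f`
# row by row over one or more columns of equal length.
call_per_row(f, cols::AbstractVector...) = [f(vals...) for vals in zip(cols...)]

z = call_per_row(_ -> rand(), x)
```

The result is a dense vector, so any sparsity of the input is lost. That is the price of calling the function separately for every row.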

bkamins added the bug label Dec 5, 2021

bkamins added this to the patch milestone Dec 5, 2021

bkamins mentioned this issue Dec 5, 2021

Decision on the behavior of map JuliaSparse/SparseArrays.jl#4

Open

bkamins mentioned this issue Jan 6, 2022

make sure ByRow invokes generic map #2982

Merged

bkamins closed this as completed in #2982 Jan 19, 2022
