Ideas from gurobipy-pandas #3212

odow · 2023-02-09T04:28:51Z

I'm currently listening to @simonbowly talk about his cool work integrating gurobipy and pandas: https://github.com/Gurobi/gurobipy-pandas

I don't know if we can make it easy for JuMP to work with DataFrames without an extension package (although things might just work and it just requires some documentation), but it'd be nice to improve this:

JuMP.jl/docs/src/tutorials/linear/diet.jl

Lines 102 to 106 in 451740a

    
           @objective( 
        
               model, 
        
               Min, 
        
               sum(food["cost"] * x[food["name"]] for food in eachrow(foods)), 
        
           );

by adding x as a column to the data frame instead of a separate variable.

At minimum, it'd allow

 @objective( 
     model, 
     Min, 
     sum(food["cost"] * food["x"] for food in eachrow(foods)), 
 )

but potentially some variant of

@objective(model, Min, food["cost"]' * food["x"])
@objective(model, Min, sum(food.cost .* food.x))

You could also imagine building constraints/expressions via split-apply-combine. I'd imagine there are lots of models where the variables correspond to a row in a dataframe.

The text was updated successfully, but these errors were encountered:

jd-foster · 2023-02-09T04:54:23Z

So the type of foods[:x] would be Vector{VariableRef}?

odow · 2023-02-09T05:01:04Z

Yes. Here's something I've just been playing with:

model = Model(HiGHS.Optimizer)

@variable(model, x[foods.name] >= 0)
foods.x = x.data

@objective(model, Min, foods.cost' * foods.x,);

# A few ways of writing the same thing

for row in eachrow(limits)
    @constraint(model, row.min <= sum(foods[!, row.name] .* foods.x) <= row.max)
end

for df in DataFrames.groupby(limits, :name)
    @constraint(model, df.min[1] <= sum(foods[!, df.name[1]] .* foods.x) <= df.max[1])
end

for df in DataFrames.groupby(
    DataFrames.leftjoin(
        DataFrames.stack(foods, [:calories, :protein, :fat, :sodium]),
        limits;
        on = [:variable => :name],
    ),
    :variable,
)
    @constraint(model, df.min[1] <= sum(df.value .* df.x) <= df.max[1])
end

The groupby stuff makes it harder (for me) to read. I think we have a lot of mileage out of the existing {Dense,Sparse}AxisArray stuff, and a lot of functionality that comes for free in Julia.

But the practice of adding a column of a variables to a DataFrame is a useful tip.

I wonder if there are other models with more complicated data frames.

jd-foster · 2023-02-09T05:11:07Z

The groupby stuff makes it harder (for me) to read.

I agree; the first one (for row in eachrow(limits)...) is the easiest and quite a nice technique.

I wonder if there are other models with more complicated data frames.

As far as I can tell, AnyMOD takes this idea to the limit:

To increase performance, AnyMOD stores variables within DataFrames instead of using JuMPs native containers.

jd-foster · 2023-02-09T05:17:29Z

FYI, I always enjoy re-reading this paper by Robert 4er
Database structures for mathematical programming models
describing the correspondence between variables and tabular data.

(Also, love the screenshots from his 1990's Mac user interface.)

odow · 2023-02-09T07:53:44Z

As far as I can tell, AnyMOD

cc @leonardgoeke: any thoughts on ways JuMP could improve re dataframes?

leonardgoeke · 2023-02-13T12:45:54Z

Overall, I’m very happy with the options JuMP and DataFrames are providing already.

To make their combination more accessible, you could facilitate the conversion of containers. In the example code below creating and storing a variable in a DataFrame is significantly more complex than using the JuMP containers. This example is still simplified, because it is not sparse, and I would rather map the content of a and b to integers and use those in the DataFrame to improve the performance of join or groupby.

using JuMP, DataFrames
a = ["high","low"]
b = ["red","blue"]
model = Model()
@variable(model, x[a, b])
var = JuMP.build_variable(error,  VariableInfo(false, NaN, false, NaN, false, NaN, false, NaN, false, false))
df = DataFrame([(a = z[1], b = z[2], var = JuMP.add_variable(model, var,"x[" * z[1] * "," * z[2] * "]"))  for z in vcat(Iterators.product(a,b)...)])

Also, I miss an in-place multiplication. In the process of combining variables to expressions and ultimately creating constraints, I can use add_to_expression! to sum efficiently but I’m not aware of a similar option for multiplication.

df = DataFrame(a = map(x -> 1.0* JuMP.add_variable(model, var,x),["a1","a2"]), b = map(x -> JuMP.add_variable(model, var,x),["b1","b2"]))
@time add_to_expression!.(df[!,:a], df[!,:b])
@time df[!,:a] = df[!,:a] .+ df[!,:b]
@time df[!,:b] = df[!,:b] * 2.0

odow · 2023-02-13T20:24:35Z

I'd write your first example differently:

julia> using JuMP, DataFrames

julia> a = ["high","low"]
2-element Vector{String}:
 "high"
 "low"

julia> b = ["red","blue"]
2-element Vector{String}:
 "red"
 "blue"

julia> # Option 1
       model = Model();

julia> df = DataFrame(
           vec([
               (a = i, b = j, var = @variable(model, base_name = "x[$i,$j]")) for 
               (i, j) in Iterators.product(a,b)
           ])
       )
4×3 DataFrame
 Row │ a       b       var          
     │ String  String  Variable…    
─────┼──────────────────────────────
   1 │ high    red     x[high,red]
   2 │ low     red     x[low,red]
   3 │ high    blue    x[high,blue]
   4 │ low     blue    x[low,blue]

julia> # Option 2
       model = Model();

julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
    Dimension 1, ["high", "low"]
    Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
 x[high,red]  x[high,blue]
 x[low,red]   x[low,blue]

julia> df = DataFrame(
           vec([(a = i, b = j, var = x[i, j]) for (i, j) in Iterators.product(a,b)])
       )
4×3 DataFrame
 Row │ a       b       var          
     │ String  String  Variable…    
─────┼──────────────────────────────
   1 │ high    red     x[high,red]
   2 │ low     red     x[low,red]
   3 │ high    blue    x[high,blue]
   4 │ low     blue    x[low,blue]

julia> # Option 3
       model = Model();

julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
    Dimension 1, ["high", "low"]
    Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
 x[high,red]  x[high,blue]
 x[low,red]   x[low,blue]

julia> df = DataFrame(
           vec([
             (i = i, j = j, ai = ai, bj = bj, var = x[ai, bj]) for
             ((i, ai), (j, bj)) in Iterators.product(enumerate(a),enumerate(b))
           ])
       )
4×5 DataFrame
 Row │ i      j      ai      bj      var          
     │ Int64  Int64  String  String  Variable…    
─────┼────────────────────────────────────────────
   1 │     1      1  high    red     x[high,red]
   2 │     2      1  low     red     x[low,red]
   3 │     1      2  high    blue    x[high,blue]
   4 │     2      2  low     blue    x[low,blue]

For the doubling, you can use:

julia> model = Model();

julia> @variable(model, x)
x

julia> aff_expr = 2 * x + 1
2 x + 1

julia> map_coefficients_inplace!(x -> 2x, aff_expr)
4 x + 2

odow · 2023-02-14T23:05:24Z

Is there anything actionable here? I think the conclusion is that JuMP already provides the necessary functionality to work with DataFrames.jl.

odow · 2023-02-16T01:40:32Z

I'm going to close this. If anyone has any further suggestions, please comment below and I will re-open the issue.

slwu89 · 2023-12-05T04:53:18Z

Hello, this is not so much a suggestion as a question, so if not appropriate in this closed issue please redirect me, thanks!

@jd-foster I am wondering if you know of any code implementations of the method described in the Fourer paper, especially related to generating synthetic data for the "hierarchical" or "relational" database schemas (much more interested in the relational schema, at least for now). I'd like to test some ideas I have regarding in memory DBs and JuMP but I'm not familiar with steel production so I really do not know how to come up with some synthetic data with parameters that resembles what he mentioned in the article.

odow · 2023-12-05T05:01:08Z

Please ask these types of questions on https://discourse.julialang.org/c/domain/opt/13.

For SQLite examples, see:

https://jump.dev/JuMP.jl/stable/tutorials/linear/multi/
https://jump.dev/JuMP.jl/stable/tutorials/linear/multi_commodity_network/

odow added Category: Modeling Type: Documentation labels Feb 9, 2023

odow mentioned this issue Feb 9, 2023

[docs] tweak diet.jl tutorial to use more DataFrame logic #3213

Merged

odow closed this as completed Feb 16, 2023

odow mentioned this issue Sep 18, 2023

Improve support for relational algebra #3438

Open

slwu89 mentioned this issue Dec 2, 2023

relational algebra in JuMP slwu89/Orcas.jl#3

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ideas from gurobipy-pandas #3212

Ideas from gurobipy-pandas #3212

odow commented Feb 9, 2023

jd-foster commented Feb 9, 2023 •

edited

Loading

odow commented Feb 9, 2023

jd-foster commented Feb 9, 2023 •

edited

Loading

jd-foster commented Feb 9, 2023 •

edited

Loading

odow commented Feb 9, 2023

leonardgoeke commented Feb 13, 2023

odow commented Feb 13, 2023

odow commented Feb 14, 2023

odow commented Feb 16, 2023

slwu89 commented Dec 5, 2023 •

edited

Loading

odow commented Dec 5, 2023

Ideas from gurobipy-pandas #3212

Ideas from gurobipy-pandas #3212

Comments

odow commented Feb 9, 2023

jd-foster commented Feb 9, 2023 • edited Loading

odow commented Feb 9, 2023

jd-foster commented Feb 9, 2023 • edited Loading

jd-foster commented Feb 9, 2023 • edited Loading

odow commented Feb 9, 2023

leonardgoeke commented Feb 13, 2023

odow commented Feb 13, 2023

odow commented Feb 14, 2023

odow commented Feb 16, 2023

slwu89 commented Dec 5, 2023 • edited Loading

odow commented Dec 5, 2023

jd-foster commented Feb 9, 2023 •

edited

Loading

jd-foster commented Feb 9, 2023 •

edited

Loading

jd-foster commented Feb 9, 2023 •

edited

Loading

slwu89 commented Dec 5, 2023 •

edited

Loading