Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas from gurobipy-pandas #3212

Closed
odow opened this issue Feb 9, 2023 · 11 comments
Closed

Ideas from gurobipy-pandas #3212

odow opened this issue Feb 9, 2023 · 11 comments

Comments

@odow
Copy link
Member

odow commented Feb 9, 2023

I'm currently listening to @simonbowly talk about his cool work integrating gurobipy and pandas: https://github.com/Gurobi/gurobipy-pandas

I don't know if we can make it easy for JuMP to work with DataFrames without an extension package (although things might just work and it just requires some documentation), but it'd be nice to improve this:

@objective(
model,
Min,
sum(food["cost"] * x[food["name"]] for food in eachrow(foods)),
);

by adding x as a column to the data frame instead of a separate variable.

At minimum, it'd allow

 @objective( 
     model, 
     Min, 
     sum(food["cost"] * food["x"] for food in eachrow(foods)), 
 )

but potentially some variant of

@objective(model, Min, food["cost"]' * food["x"])
@objective(model, Min, sum(food.cost .* food.x)) 

You could also imagine building constraints/expressions via split-apply-combine. I'd imagine there are lots of models where the variables correspond to a row in a dataframe.

@jd-foster
Copy link
Collaborator

jd-foster commented Feb 9, 2023

So the type of foods[:x] would be Vector{VariableRef}?

@odow
Copy link
Member Author

odow commented Feb 9, 2023

Yes. Here's something I've just been playing with:

model = Model(HiGHS.Optimizer)

@variable(model, x[foods.name] >= 0)
foods.x = x.data

@objective(model, Min, foods.cost' * foods.x,);

# A few ways of writing the same thing

for row in eachrow(limits)
    @constraint(model, row.min <= sum(foods[!, row.name] .* foods.x) <= row.max)
end

for df in DataFrames.groupby(limits, :name)
    @constraint(model, df.min[1] <= sum(foods[!, df.name[1]] .* foods.x) <= df.max[1])
end

for df in DataFrames.groupby(
    DataFrames.leftjoin(
        DataFrames.stack(foods, [:calories, :protein, :fat, :sodium]),
        limits;
        on = [:variable => :name],
    ),
    :variable,
)
    @constraint(model, df.min[1] <= sum(df.value .* df.x) <= df.max[1])
end

The groupby stuff makes it harder (for me) to read. I think we have a lot of mileage out of the existing {Dense,Sparse}AxisArray stuff, and a lot of functionality that comes for free in Julia.

But the practice of adding a column of a variables to a DataFrame is a useful tip.

I wonder if there are other models with more complicated data frames.

@jd-foster
Copy link
Collaborator

jd-foster commented Feb 9, 2023

The groupby stuff makes it harder (for me) to read.

I agree; the first one (for row in eachrow(limits)...) is the easiest and quite a nice technique.

I wonder if there are other models with more complicated data frames.

As far as I can tell, AnyMOD takes this idea to the limit:

To increase performance, AnyMOD stores variables within DataFrames instead of using JuMPs native containers.

@jd-foster
Copy link
Collaborator

jd-foster commented Feb 9, 2023

FYI, I always enjoy re-reading this paper by Robert 4er
Database structures for mathematical programming models
describing the correspondence between variables and tabular data.

(Also, love the screenshots from his 1990's Mac user interface.)

@odow
Copy link
Member Author

odow commented Feb 9, 2023

As far as I can tell, AnyMOD

cc @leonardgoeke: any thoughts on ways JuMP could improve re dataframes?

@leonardgoeke
Copy link

Overall, I’m very happy with the options JuMP and DataFrames are providing already.

To make their combination more accessible, you could facilitate the conversion of containers. In the example code below creating and storing a variable in a DataFrame is significantly more complex than using the JuMP containers. This example is still simplified, because it is not sparse, and I would rather map the content of a and b to integers and use those in the DataFrame to improve the performance of join or groupby.

using JuMP, DataFrames
a = ["high","low"]
b = ["red","blue"]
model = Model()
@variable(model, x[a, b])
var = JuMP.build_variable(error,  VariableInfo(false, NaN, false, NaN, false, NaN, false, NaN, false, false))
df = DataFrame([(a = z[1], b = z[2], var = JuMP.add_variable(model, var,"x[" * z[1] * "," * z[2] * "]"))  for z in vcat(Iterators.product(a,b)...)])

Also, I miss an in-place multiplication. In the process of combining variables to expressions and ultimately creating constraints, I can use add_to_expression! to sum efficiently but I’m not aware of a similar option for multiplication.

df = DataFrame(a = map(x -> 1.0* JuMP.add_variable(model, var,x),["a1","a2"]), b = map(x -> JuMP.add_variable(model, var,x),["b1","b2"]))
@time add_to_expression!.(df[!,:a], df[!,:b])
@time df[!,:a] = df[!,:a] .+ df[!,:b]
@time df[!,:b] = df[!,:b] * 2.0

@odow
Copy link
Member Author

odow commented Feb 13, 2023

I'd write your first example differently:

julia> using JuMP, DataFrames

julia> a = ["high","low"]
2-element Vector{String}:
 "high"
 "low"

julia> b = ["red","blue"]
2-element Vector{String}:
 "red"
 "blue"

julia> # Option 1
       model = Model();

julia> df = DataFrame(
           vec([
               (a = i, b = j, var = @variable(model, base_name = "x[$i,$j]")) for 
               (i, j) in Iterators.product(a,b)
           ])
       )
4×3 DataFrame
 Row │ a       b       var          
     │ String  String  Variable    
─────┼──────────────────────────────
   1 │ high    red     x[high,red]
   2 │ low     red     x[low,red]
   3 │ high    blue    x[high,blue]
   4 │ low     blue    x[low,blue]

julia> # Option 2
       model = Model();

julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
    Dimension 1, ["high", "low"]
    Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
 x[high,red]  x[high,blue]
 x[low,red]   x[low,blue]

julia> df = DataFrame(
           vec([(a = i, b = j, var = x[i, j]) for (i, j) in Iterators.product(a,b)])
       )
4×3 DataFrame
 Row │ a       b       var          
     │ String  String  Variable    
─────┼──────────────────────────────
   1 │ high    red     x[high,red]
   2 │ low     red     x[low,red]
   3 │ high    blue    x[high,blue]
   4 │ low     blue    x[low,blue]

julia> # Option 3
       model = Model();

julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
    Dimension 1, ["high", "low"]
    Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
 x[high,red]  x[high,blue]
 x[low,red]   x[low,blue]

julia> df = DataFrame(
           vec([
             (i = i, j = j, ai = ai, bj = bj, var = x[ai, bj]) for
             ((i, ai), (j, bj)) in Iterators.product(enumerate(a),enumerate(b))
           ])
       )
4×5 DataFrame
 Row │ i      j      ai      bj      var          
     │ Int64  Int64  String  String  Variable    
─────┼────────────────────────────────────────────
   11      1  high    red     x[high,red]
   22      1  low     red     x[low,red]
   31      2  high    blue    x[high,blue]
   42      2  low     blue    x[low,blue]

For the doubling, you can use:

julia> model = Model();

julia> @variable(model, x)
x

julia> aff_expr = 2 * x + 1
2 x + 1

julia> map_coefficients_inplace!(x -> 2x, aff_expr)
4 x + 2

@odow
Copy link
Member Author

odow commented Feb 14, 2023

Is there anything actionable here? I think the conclusion is that JuMP already provides the necessary functionality to work with DataFrames.jl.

@odow
Copy link
Member Author

odow commented Feb 16, 2023

I'm going to close this. If anyone has any further suggestions, please comment below and I will re-open the issue.

@slwu89
Copy link

slwu89 commented Dec 5, 2023

Hello, this is not so much a suggestion as a question, so if not appropriate in this closed issue please redirect me, thanks!

@jd-foster I am wondering if you know of any code implementations of the method described in the Fourer paper, especially related to generating synthetic data for the "hierarchical" or "relational" database schemas (much more interested in the relational schema, at least for now). I'd like to test some ideas I have regarding in memory DBs and JuMP but I'm not familiar with steel production so I really do not know how to come up with some synthetic data with parameters that resembles what he mentioned in the article.

@odow
Copy link
Member Author

odow commented Dec 5, 2023

Please ask these types of questions on https://discourse.julialang.org/c/domain/opt/13.

For SQLite examples, see:

https://jump.dev/JuMP.jl/stable/tutorials/linear/multi/
https://jump.dev/JuMP.jl/stable/tutorials/linear/multi_commodity_network/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants