Support for log-transforms (and other arithmetic) in formula? #455

Closed
martiningram opened this issue Feb 20, 2022 · 5 comments

martiningram commented Feb 20, 2022

Hi all,

I'm a big fan of bambi, thanks for all your great work!

I have a question about more complicated formulas. I'm currently trying to fit models taken from a textbook. One of the formula strings used there is:

log(weight) ~ log(I(diam1 * diam2 * canopy_height)) + log(I(diam1 * diam2)) + log(I(diam1 / diam2)) + log(total_height) + group

In other words, it includes log transformations, and also some other arithmetic operations. What I'm curious about is whether bambi supports this in some way, or whether all such transformations should be done beforehand, e.g. by creating a variable log_total_height = np.log(total_height)?

Thanks for your help!

Collaborator

tomicapretto commented Feb 21, 2022

Hi @martiningram

Bambi has support for those.

  • You can pass Python functions as your model terms.
  • You can wrap mathematical operations between terms within I() or using the shorthand {}.

For example, the formula you're sharing would be

import numpy as np
formula = "np.log(weight) ~ np.log(diam1 * diam2 * canopy_height) + np.log(diam1 * diam2) + np.log(diam1 / diam2) + np.log(total_height) + group"

Note that I'm not using I() because the operations are carried out inside the arguments of a function call, so they are interpreted as regular Python code and not as model formula code.
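For comparison, the alternative raised in the question (creating the transformed columns beforehand) might look like the sketch below. Only the source column names and log_total_height come from this thread; the data values and the other new column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up measurements; only the column names come from the formula above.
data = pd.DataFrame(
    {
        "weight": [120.0, 340.0, 95.0],
        "diam1": [1.8, 2.5, 1.2],
        "diam2": [1.5, 2.0, 1.0],
        "canopy_height": [1.1, 1.6, 0.9],
        "total_height": [1.3, 1.9, 1.0],
        "group": ["a", "b", "a"],
    }
)

# Precompute each transformed term as its own column.
# Apart from log_total_height, these column names are hypothetical.
data["log_weight"] = np.log(data["weight"])
data["log_volume"] = np.log(data["diam1"] * data["diam2"] * data["canopy_height"])
data["log_area"] = np.log(data["diam1"] * data["diam2"])
data["log_shape"] = np.log(data["diam1"] / data["diam2"])
data["log_total_height"] = np.log(data["total_height"])

# The formula then only refers to plain column names.
formula = "log_weight ~ log_volume + log_area + log_shape + log_total_height + group"
```

This gives the same predictors as the function-call version, at the cost of extra columns in the data frame.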

If you want a term that is the result of a math operation between variables, you do need I() or {}.

For example:

formula = "y ~ I(x * z)"
formula = "y ~ I(x / z)" # 1
formula = "y ~ {x / z}"  # 2
formula = "y ~ {10 * x}"

where the lines marked 1 and 2 are equivalent ways to write the ratio between x and z.

Update

You can also use custom functions, not only functions imported from NumPy or another module. For example, this is also valid:

def f(x):
    return (x - np.median(x)) / 10
formula = "y ~ f(x)"
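To see what a term like f(x) would contribute, the function can be applied directly to a column outside any formula. This is a minimal sketch with made-up sample values; it only uses NumPy and does not touch Bambi.

```python
import numpy as np

def f(x):
    # Center on the median, then divide by 10 (same definition as above).
    return (x - np.median(x)) / 10

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # made-up values, median is 3.0
transformed = f(x)
print(transformed)
```

The median is computed over the whole array that the function receives, so the result depends on all the values passed in, not on each value in isolation.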

Author

martiningram commented Feb 21, 2022

Wow, this is working a treat, thanks Tomas! I was even able to use the formula as-is by defining

def log(x):
    return np.log(x)

as you mentioned in the Update.

Am I right in assuming that keeping the I is OK, even though it's actually not required as you say?

@tomicapretto
Collaborator

@martiningram that's right. I() is literally an identity function. It returns whatever you pass in.

@martiningram
Author

Terrific, thanks Tomas! This is working very well, so I am closing this issue!

@aflaxman

You can perhaps also save the annoyance of defining your own log by importing it explicitly instead:

from numpy import log

Very cool stuff!
