Allow formatting the categorical encoded variables #158
Comments
I think it would be possible to easily add a
But presently the "variable" argument would be the entire Can you suggest a syntax that would make sense for you so we can further evaluate this? |
Good point. I was coming more from the perspective of having easier-to-handle variable names, e.g.:
import formulaic
import statsmodels.api as sm
design = formulaic.model_matrix(["C(BinGrp, contr.treatment)"], all_phenotypes)
model = sm.OLS([1,2,3,1,2,3], design).fit()
model.summary()
model.t_test("C(BinGrp, contr.treatment)[T.1] - C(BinGrp, contr.treatment)[T.0]")  # impressively works
But it is a little cumbersome to do. Similarly, if you had multiple encodings, e.g.,
Then I think you would specify a format for each one? Using your suggested syntax:
If format is not provided it falls back to the default? |
Hmmmm... adding After reflecting more on this, I think sensible (non-mutually exclusive) ways forward might include:
I think I am leaning toward (1) and (3). I would consider implementing within formula aliasing if there were enough demand for it... but remain unconvinced at present. |
I wasn't aware of 3. I tried it and it almost works, but the value var is still formatted differently.
But I was digging into the code a little and realized there may be a simple enough way to get what is desired (although perhaps not stable across versions, since it is not a "blessed" API).
import pandas
from formulaic import model_matrix
import formulaic
data = pandas.DataFrame({"X": ['a', 'b', 'c']})
formulaic.transforms.contrasts.TreatmentContrasts.FACTOR_FORMAT = '{name}.{field}'
model_matrix("~C(X)", data)
   Intercept  C(X).b  C(X).c
0        1.0       0       0
1        1.0       1       0
2        1.0       0       1
This is almost the desired output. The C(X) wrapper is still being stored in the name. But perhaps there is a similar hack for this as well, if I can track down where the name is being set. I could do
and that gets me exactly what is needed, but it requires knowing the contrast variables in the formula, which involves parsing the formula. By chance, is there a similar format constant I can play with to get the formatting needed without official format support? It already works when I don't explicitly ask for a contrast coding, by converting the values to strings.
from formulaic.transforms import C
data.X = data.X.astype(str)
formulaic.transforms.contrasts.TreatmentContrasts.FACTOR_FORMAT = '{name}.{field}'
model_matrix("~ X", data)
   Intercept  X.b  X.c
0        1.0    0    0
1        1.0    1    0
2        1.0    0    1
|
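Since overriding FACTOR_FORMAT changes a class attribute for the whole process, one way to limit the blast radius of this hack might be to scope it with a context manager. The sketch below is only an illustration: `temporary_attr` is a hypothetical helper, and `FakeTreatmentContrasts` is a stand-in class (with an assumed default format) so the example runs without formulaic installed. In real use the owner would presumably be `formulaic.transforms.contrasts.TreatmentContrasts`, which remains an unofficial API.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attr(owner, name, value):
    """Hypothetical helper: set owner.<name> inside the with-block, then restore it."""
    original = getattr(owner, name)
    setattr(owner, name, value)
    try:
        yield owner
    finally:
        # Restore the previous value even if the block raised.
        setattr(owner, name, original)


# Stand-in for formulaic's TreatmentContrasts (assumption, not the real class).
class FakeTreatmentContrasts:
    FACTOR_FORMAT = "{name}[T.{field}]"


with temporary_attr(FakeTreatmentContrasts, "FACTOR_FORMAT", "{name}.{field}"):
    inside = FakeTreatmentContrasts.FACTOR_FORMAT.format(name="X", field="b")

after = FakeTreatmentContrasts.FACTOR_FORMAT.format(name="X", field="b")
```

With the real class, the `model_matrix(...)` call would go inside the `with` block so that only that call sees the simplified format.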
Currently they get formatted as
C({parameter})[T.{value}]
or
{parameter}[T.{value}]
if it's already a string. E.g.,
It would be nice if we could pass in a format string to get simpler names, e.g. BinGrp0, BinGrp1 if we pass in a format string like "{parameter}{value}".
Moved from #46 (comment)
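Until something like this is supported, a rough workaround could be to rename the columns after the model matrix is built. The sketch below is not part of formulaic: `simplify_name` and its regular expression are hypothetical, and they assume the column names follow the two patterns described above (with or without the `C(...)` wrapper, treatment coding only). It works on plain strings, so it could be applied to `design.columns`.

```python
import re

# Matches names such as "C(BinGrp, contr.treatment)[T.1]" or "BinGrp[T.1]".
_ENCODED = re.compile(
    r"^(?:C\(\s*(?P<wrapped>[^,()]+?)\s*(?:,[^()]*)?\)|(?P<bare>[^\[\]()]+))"
    r"\[T\.(?P<value>[^\]]+)\]$"
)


def simplify_name(name, fmt="{parameter}{value}"):
    """Hypothetical helper: reformat one encoded column name, or return it unchanged."""
    match = _ENCODED.match(name)
    if match is None:
        return name  # e.g. "Intercept" or a plain numeric column
    parameter = match.group("wrapped") or match.group("bare")
    return fmt.format(parameter=parameter, value=match.group("value"))


columns = ["Intercept", "C(BinGrp, contr.treatment)[T.1]", "BinGrp[T.0]"]
simplified = [simplify_name(c) for c in columns]
```

Names that do not match either pattern are passed through untouched, so the intercept and numeric columns keep their original labels.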