FunctionTerm is dead, long live FunctionTerm (#183)
* broadcastable for abstract term

* move sorting and "cleanup" stage to runtime/terms

* use tests/Project.toml instead of [extras]

* drop interaction with constant term special in parsing

* move test deps to test/Project.toml

* expand star expression at run time not parse time

* modelcols(::Term, d) pulls out the column and returns it

* remove dead parsing code

* update test for changes to modelcols(::Term, data)

* WIP: alternative FunctionTerm

* clean up formula.jl, parse-time protection

* fleshing out FunctionTerm2 API methods, add exorig

* protection

* exports, missing methods, lead/lag update

* dead code, tests

* tests pass

* delete rest of FunctionTerm code

* FunctionTerm is dead, long live FunctionTerm

* splatted methods of arithmetic ops not needed

* include Compat.only

* use Compat for only

* test on 1.5, not 1.4 or 1.3

* Revert "use tests/Project.toml instead of [extras]"

This reverts commit 36f451a.

* restore compat bounds for test dependencies

* WIP document protection

* basic tests for [un]protect, fix protect in Protected ctx bug

* more docstrings, include protection in API page, fix doctests

* avoid method ambiguities

* just eval un-protected defs for +, &, and *

* remove unnecessary (and un-used) methods for unprotect

* additional tests for term operators and uniqueify & output

* fix test

* Apply suggestions from code review

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* drop _parsed from lead/lag term

* example

* unprotect, doc

* protection section docs summary

* bye bye "usual special"

* move specials to top, eval unprotect

* Apply suggestions from code review

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* `@unprotect` -> `@support_unprotect`

* actually defining `@support_unprotect` would be a good idea

* test throws `2 & x`

* update docs, doctest(; fix=true)

* manually specify julia version in docs CI

* few more docstring fixes

* these need to be set at the top level now for some reason

* xtremely breaking

* 0.7

* Update src/terms.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* stopgap docs build

* whoops

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
kleinschmidt and nalimilan authored Jan 24, 2023
1 parent 463eb0a commit c4b68cf
Showing 20 changed files with 551 additions and 386 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/docs.yml
@@ -12,6 +12,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
+- uses: julia-actions/setup-julia@v1
+with:
+version: '1.8'
+- run: julia --project=docs -e 'using Pkg; pkg"add GLM#dfk/statsmodels-7"'
- uses: julia-actions/julia-buildpkg@latest
- uses: julia-actions/julia-docdeploy@latest
env:
3 changes: 2 additions & 1 deletion Project.toml
@@ -1,8 +1,9 @@
name = "StatsModels"
uuid = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
-version = "0.6.33"
+version = "0.7.0"

[deps]
+Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
DataAPI = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
2 changes: 1 addition & 1 deletion docs/Project.toml
@@ -8,4 +8,4 @@ StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
-Documenter = "0.25"
+Documenter = "0.27"
4 changes: 2 additions & 2 deletions docs/make.jl
@@ -2,7 +2,6 @@ using Documenter, StatsModels

DocMeta.setdocmeta!(StatsModels, :DocTestSetup, :(using StatsModels, StatsBase); recursive=true)


using Pkg
Pkg.precompile()

@@ -16,7 +15,8 @@ makedocs(
"Temporal variables and Time Series Terms" => "temporal_terms.md",
"API documentation" => "api.md"
],
-modules = [StatsModels]
+modules = [StatsModels],
+doctestfilters = [r"([a-z]*) => \1", r"getfield\(.*##[0-9]+#[0-9]+"]
)

deploydocs(
14 changes: 13 additions & 1 deletion docs/src/api.md
@@ -4,7 +4,6 @@ DocTestSetup = quote
using StatsModels, Random, StatsBase
Random.seed!(2001)
end
-DocTestFilters = [r"([a-z]*) => \1", r"getfield\(.*##[0-9]+#[0-9]+"]
```

# StatsModels.jl API
@@ -48,6 +47,19 @@ collect_matrix_terms
is_matrix_term
```

+### Protection
+
+For more fine-grained control over whether function calls are treated as normal
+Julia calls ("protected" and captured as `FunctionTerm`s) or as `@formula`
+syntax ("unprotected").
+
+```@docs
+protect
+unprotect
+@support_unprotect
+Protected
+```

## Schema

```@docs
99 changes: 74 additions & 25 deletions docs/src/formula.md
@@ -82,7 +82,7 @@ Predictors:
julia> resp, pred = modelcols(f, df);
julia> pred
-9×7 Array{Float64,2}:
+9×7 Matrix{Float64}:
1.0 1.0 0.236782 0.0 0.0 0.0 0.0
1.0 2.0 0.943741 1.0 0.0 0.943741 0.0
1.0 3.0 0.445671 0.0 1.0 0.0 0.445671
@@ -173,7 +173,7 @@ package) are treated like normal Julia code, and evaluated elementwise:

```jldoctest 1
julia> modelmatrix(@formula(y ~ 1 + a + log(1+a)), df)
-9×3 Array{Float64,2}:
+9×3 Matrix{Float64}:
1.0 1.0 0.693147
1.0 2.0 1.09861
1.0 3.0 1.38629
@@ -197,7 +197,7 @@ julia> gt_e(s) = any(c > 'e' for c in s)
gt_e (generic function with 1 method)
julia> modelmatrix(@formula(y ~ 1 + gt_e(c)), df)
-9×2 Array{Float64,2}:
+9×2 Matrix{Float64}:
1.0 0.0
1.0 0.0
1.0 1.0
@@ -218,8 +218,9 @@ For instance, to fit a linear regression to a log-transformed response:
```jldoctest 1
julia> using GLM
julia> lm(@formula(log(y) ~ 1 + a + b), df)
-StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
:(log(y)) ~ 1 + a + b
@@ -235,7 +236,7 @@ b -1.63199 1.12678 -1.45 0.1977 -4.38911 1.12513
julia> df.log_y = log.(df.y);
julia> lm(@formula(log_y ~ 1 + a + b), df) # equivalent
-StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
log_y ~ 1 + a + b
@@ -250,12 +251,12 @@ b -1.63199 1.12678 -1.45 0.1977 -4.38911 1.12513
```

-The no-op function `identity` can be used to block the normal formula-specific
+The `protect` function can be used to block the normal formula-specific
interpretation of `+`, `*`, and `&`:

```jldoctest 1
-julia> modelmatrix(@formula(y ~ 1 + b + identity(1+b)), df)
-9×3 Array{Float64,2}:
+julia> modelmatrix(@formula(y ~ 1 + b + protect(1+b)), df)
+9×3 Matrix{Float64}:
1.0 0.236782 1.23678
1.0 0.943741 1.94374
1.0 0.445671 1.44567
@@ -270,10 +271,19 @@ julia> modelmatrix(@formula(y ~ 1 + b + identity(1+b)), df)
## Constructing a formula programmatically

A formula can be constructed at runtime by creating `Term`s and combining them
-with the formula operators `+`, `&`, and `~`:
+with the formula operators `+`, `&`, `*`, and `~`:

```jldoctest 1
-julia> Term(:y) ~ ConstantTerm(1) + Term(:a) + Term(:b) + Term(:a) & Term(:b)
+julia> Term(:y) ~ ConstantTerm(1) + Term(:a) + Term(:a) & Term(:b)
+FormulaTerm
+Response:
+y(unknown)
+Predictors:
+1
+a(unknown)
+a(unknown) & b(unknown)
+julia> Term(:y) ~ ConstantTerm(1) + Term(:a) * Term(:b)
FormulaTerm
Response:
y(unknown)
@@ -284,20 +294,6 @@ Predictors:
a(unknown) & b(unknown)
```

-!!! warning
-
-Even though the `@formula` macro supports arbitrary julia functions,
-runtime (programmatic) formula construction does not. This is because to
-resolve a symbol giving a function's _name_ into the actual _function_
-itself, it's necessary to `eval`. In practice this is not often an issue,
-_except_ in cases where a package provides special syntax by overloading a
-function (like `|` for
-[MixedModels.jl](https://github.com/dmbates/MixedModels.jl), or `absorb`
-for [Econometrics.jl](https://github.com/Nosferican/Econometrics.jl)). In
-these cases, you should use the corresponding constructors for the actual
-terms themselves (e.g., `RanefTerm` and `FixedEffectsTerm` respectively), as
-long as the packages have [implemented support for them](@ref extend-runtime).

The [`term`](@ref) function constructs a term of the appropriate type from
symbols or strings (`Term`) and numbers (`ConstantTerm`), which makes it easy to
work with collections of mixed type:
@@ -335,6 +331,59 @@ true
```

+### Constructing a `FunctionTerm` programmatically
+
+It is also possible to create a `FunctionTerm` programmatically, matching the
+behavior of what happens when a call to a function like `log` is encountered
+inside the `@formula` macro, although it takes a bit of care to get right. In
+the future we may add more convenience methods to "lift" functions into the
+"term domain" but for now they must be constructed manually, like so:
+
+```jldoctest 1
+julia> log_term(t::AbstractTerm) = FunctionTerm(log, [t], :(log($(t))))
+log_term (generic function with 1 method)
+julia> log_term(term(:y))
+(y)->log(y)
+julia> f = log_term(term(:y)) ~ sum(ts)
+FormulaTerm
+Response:
+(y)->log(y)
+Predictors:
+1
+a(unknown)
+b(unknown)
+julia> response(f, df)
+9-element Vector{Float64}:
+-0.5358107653592508
+-2.5595706990153952
+-0.3331980664948834
+-1.1383191195688154
+-0.4260357285735626
+-1.4412188661761132
+-0.34293563140185523
+-0.5837776723176953
+-2.980055366491228
+julia> lm(f, df)
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
+:(log(y)) ~ 1 + a + b
+Coefficients:
+──────────────────────────────────────────────────────────────────────────
+Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
+──────────────────────────────────────────────────────────────────────────
+(Intercept) 0.0698025 0.928295 0.08 0.9425 -2.20165 2.34126
+a -0.105669 0.128107 -0.82 0.4410 -0.419136 0.207797
+b -1.63199 1.12678 -1.45 0.1977 -4.38911 1.12513
+──────────────────────────────────────────────────────────────────────────
+```
+
+Compared with the example above, the result is the same.
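
The manual `log_term` construction in this hunk could plausibly be generalized to other functions. The following is only a minimal sketch, not something this commit adds: `lift_term` is a hypothetical helper name, the `data` table is made-up toy input, and the sketch assumes the three-argument `FunctionTerm(f, args, exorig)` form used by `log_term` and that `modelcols` accepts a `NamedTuple` of columns.

```julia
using StatsModels

# Hypothetical helper (NOT part of StatsModels): wrap a call to `f` on the
# given terms in a FunctionTerm, mirroring `log_term` from the diff above
# (function, vector of argument terms, original call expression).
function lift_term(f, args::AbstractTerm...)
    # Rebuild the call expression with the terms spliced in,
    # analogous to :(log($(t))) in `log_term`.
    exorig = Expr(:call, nameof(f), args...)
    return FunctionTerm(f, collect(args), exorig)
end

# Assumed toy data: a NamedTuple of equal-length vectors (a column table).
data = (y = [1.0, 2.0, 4.0], a = [1, 2, 3])

t = lift_term(log, term(:y))

# Evaluating the term should apply `log` elementwise to the `y` column.
cols = modelcols(t, data)
```

Building `exorig` from `nameof(f)` keeps the displayed expression readable for named functions; for anonymous functions the generated name would be less informative, which may be one reason the documentation leaves this construction manual for now.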

## Fitting a model from a formula

The main use of `@formula` is to streamline specifying and fitting statistical
@@ -363,7 +412,7 @@ julia> ϵ = randn(rng, 100)*0.1;
julia> data.y = X*β_true .+ ϵ;
julia> mod = fit(LinearModel, @formula(y ~ 1 + a*b), data)
-StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
y ~ 1 + a + b + a & b