
Derived variables from data on the LHS of tilde #519

yebai opened this issue Aug 18, 2023 · 11 comments


yebai commented Aug 18, 2023

The following model ignores the fact that v is a variable derived from data and treats v as a random variable. This is a known limitation of DynamicPPL. I am opening this issue to track it and to discuss ideas on how to fix or disallow it.

@model function demo(y, sigma)
    m ~ Normal(0, sigma)
    v = y
    v ~ Normal(m, sigma)
end

yebai commented Aug 18, 2023

cc @sunxd3 @devmotion @torfjelde


yebai commented Aug 18, 2023

Maybe we can perform some activity analysis, as in autodiff algorithms. Such an analysis would let us determine whether a variable like v depends on the model arguments, and throw an error if it does.


yebai commented May 7, 2024

@sunxd3 do you think JuliaBUGS's transformed-data pass could be applicable here?

For example, can we run the model once to compute all transformed data, then call condition to explicitly fix these variables as observations?

EDIT: another option is to make certain mild assumptions about LHS variables, e.g. that their eltype can only be a concrete real type or Missing. Variables derived from arguments would then inherit the eltype of those arguments, giving tilde enough information to decide between parameter and observation.
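The first option above, run the model once to compute derived data and then condition explicitly, might look like the following sketch using the existing condition API (the `|` sugar); the model and data names are purely illustrative:

```julia
using DynamicPPL, Distributions

# Sketch: the derived quantity is computed outside the model and then
# explicitly fixed as an observation via `condition`/`|`, so the tilde
# statement never has to guess whether v is a parameter or data.
@model function demo(sigma)
    m ~ Normal(0, sigma)
    v ~ Normal(m, sigma)
end

y = 1.5
v = y                                # derived quantity, computed up front
conditioned = demo(2.0) | (v = v,)   # v is now an observation, not a parameter
```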


sunxd3 commented May 8, 2024

There are some concerns.

I think checks like TuringLang/Turing.jl#2218 are good, as they eliminate the mutability issue: two stochastic variables shouldn't have the same name. That means saving a mapping like @varname(x) => observed in the context/varinfo won't cause confusion, in the sense that x can't be observed in one statement and need to be sampled in another.

That said, a glaring problem, in my opinion, is the unreliability of evaluating the RHS of =. Some RHS expressions can be too expensive; others may take different values depending on the value of some stochastic variable. E.g.

@model function m(y)
     x ~ Normal()
     a = x == 0 ? y : x
     a ~ Normal()
end

which makes "running once and deciding whether a variable is an observation" unreliable.
This suggests that an analysis like JuliaBUGS's transformed-data pass can probably never be exact on the general Julia programs Turing allows. In JuliaBUGS it is okay, because I essentially say that "JuliaBUGS only allows the original BUGS functions, otherwise I won't eval it". But users may expect more from Turing.

Using type information to determine whether a variable is data could be a good idea. There are still some issues: one is that some functions will error on missing values. I am also not sure whether we really want to limit the data type of the LHS, given that we may want to support something like NamedTupleVariate (JuliaStats/Distributions.jl#1803).
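The eltype heuristic discussed above could, in principle, look something like this plain-Julia sketch; `is_data` is a hypothetical helper, not existing DynamicPPL behaviour, and the final comment flags exactly the kind of edge case that makes the heuristic fragile:

```julia
# Plain data keeps a concrete real (or Missing-augmented) eltype, which a
# derived variable inherits:
y = [1.0, missing, 2.0]
v = 2 .* y          # derived from data; eltype is Union{Missing, Float64}

# Hypothetical check: treat anything whose eltype is a plain real value or
# missing as data/observation.
is_data(x) = eltype(x) <: Union{Missing, Real}

# Caveat: AD tracing types such as ForwardDiff.Dual are also <: Real, so a
# real implementation would need a more careful type test than this.
```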

I think we can, and maybe should, incorporate more static analysis into Turing, but we would also need to communicate the results of the analysis well to users. Starting to add more functionality to DebugUtils and check_model could be a good way forward.


yebai commented May 8, 2024

Starting to add more functionality to DebugUtils and check_model could be a good way forward.

Feel free to explore this further and see whether you can create a check to warn the user at least.

yebai pinned this issue May 8, 2024

yebai commented May 8, 2024

On further thought: suppose we run the model once, collect all LHS variable names, then run the model again and check whether those LHS variables are also assigned elsewhere (except by constructors like Vector()). We could then either:

  • force these LHS variables to be conditioned variables, or
  • produce an error so the user can rewrite their model.

Would that work?
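A rough sketch of that two-pass check, with entirely hypothetical helper names (nothing here is existing DynamicPPL API; the helpers stand in for whatever the @model macro would need to record during the two runs):

```julia
# Hypothetical two-pass check for variables that are both assigned via `=`
# and used on the LHS of `~`.
function check_derived_lhs(model)
    lhs_names = collect_tilde_lhs(model)      # pass 1: names on the LHS of ~
    assigned  = collect_assignments(model)    # pass 2: names assigned via =
    offenders = intersect(lhs_names, assigned)
    isempty(offenders) ||
        error("These variables are assigned via `=` and also appear on the " *
              "LHS of `~`: $(offenders). Rewrite the model or condition on " *
              "them explicitly.")
    return nothing
end
```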


sunxd3 commented May 9, 2024

Conceptually yes, but it may require us to modify the @model macro so that we can have an assignment function (similar to tilde_) to do some analysis; and I am not sure how to handle functions with mutating effects.

@torfjelde

AFAIK there's no way we can get around this without some form of "abstract interpretation" or the like.

IMO this is not a super-important feature to pursue as you can always just write

@model function demo(y, sigma; v = f(y))
    ...
end
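A filled-in version of that keyword-argument workaround might look like the following sketch (the derived quantity `2 .* y` and the model body are purely illustrative):

```julia
using Turing

# The derived quantity is computed in a keyword-argument default, so it is
# already fixed data by the time the model body runs; `v[i] ~ ...` is then
# an ordinary observation rather than a sampled variable.
@model function demo(y, sigma; v = 2 .* y)
    m ~ Normal(0, sigma)
    for i in eachindex(v)
        v[i] ~ Normal(m, sigma)
    end
end
```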

What I do agree with is that it's an issue that users think this is possible, and it might be worth doing something to warn the user. But as @sunxd3 said, this is not possible without altering the @model macro (and we would need to alter it in a way that performs checks in all the = expressions, which immediately strikes me as something that will produce a lot of false positives).


yebai commented May 10, 2024

I agree that we don't need a new transformed-data feature, but we do want a way to inform the user when a variable is incorrectly treated as a model parameter. Something like model_check sounds good, but it is not yet clear to me how we can do it.

@torfjelde

Something like model_check sounds good, but it is not yet clear to me how we can do it.

As I mentioned above, we can't do that without altering every = in the model to also include a check that is executed at runtime.

AFAIK the only way to do this without something like that is some form of static analysis or symbolic tracing.


yebai commented Sep 9, 2024

I am now leaning towards deprecating implicit conditioning via the model constructor. For clarity:

  • implicit conditioning refers to conditioning on variables through the model constructor
  • explicit conditioning refers to conditioning on variables through the condition model API

For example, demo(y, sigma) would still allow passing hyperparameters or constants to the model, but input arguments could no longer appear on the LHS of ~ statements.

We can always check that the arguments of explicit conditioning are LHS variables of ~ statements and throw an error if they are not. That way, users won't be able to condition on intermediate variables, so the issue described here won't apply anymore.
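Under this proposal, the example from the top of the issue would be written with explicit conditioning only. A sketch, assuming the current condition/`|` API:

```julia
using DynamicPPL, Distributions

@model function demo(sigma)        # sigma: a constant/hyperparameter only
    m ~ Normal(0, sigma)
    v ~ Normal(m, sigma)           # v declared as a random variable
end

y = 1.5
model = demo(2.0) | (v = y,)       # explicit: v is conditioned on data
# Conditioning on a name that never appears on the LHS of ~ would throw.
```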
