Column vector breaks observed #4652

Closed
michaelosthege opened this issue Apr 20, 2021 · 14 comments · Fixed by #4700
Labels
bug, v4, winOS (windows OS related)

Comments

@michaelosthege
Member

michaelosthege commented Apr 20, 2021

While working on #4625 I ran into a failure that is demoed by the following example:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    pm.Normal("x1", mu=0, sd=1, observed=np.random.normal(size=(3, 4)))
    model.logp()
    pm.Normal("x2", mu=0, sd=1, observed=np.random.normal(size=(3, 1)))
    model.logp()

Essentially passing a column vector breaks the model, whereas a matrix is fine.

Traceback

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
----> 1 test_observed_with_column_vector()

<ipython-input> in test_observed_with_column_vector()
4 model.logp()
5 pm.Normal("x2", mu=0, sd=1, observed=np.random.normal(size=(3, 1)))
----> 6 model.logp()

e:\source\repos\pymc3\pymc3\model.py in logp(self)
264 def logp(self):
265 """Compiled log probability density function"""
--> 266 return self.model.fn(self.logpt)
267
268 @property

e:\source\repos\pymc3\pymc3\model.py in logpt(self)
712 with self:
713 factors = [logpt_sum(var, self.rvs_to_values.get(var, None)) for var in self.free_RVs]
--> 714 factors += [logpt_sum(obs, obs.tag.observations) for obs in self.observed_RVs]
715
716 # Convert random variables into their log-likelihood inputs and

e:\source\repos\pymc3\pymc3\model.py in <listcomp>(.0)
712 with self:
713 factors = [logpt_sum(var, self.rvs_to_values.get(var, None)) for var in self.free_RVs]
--> 714 factors += [logpt_sum(obs, obs.tag.observations) for obs in self.observed_RVs]
715
716 # Convert random variables into their log-likelihood inputs and

e:\source\repos\pymc3\pymc3\distributions\logp.py in logpt_sum(*args, **kwargs)
366 if only the sum of the logp values is needed.
367 """
--> 368 return logpt(*args, sum=True, **kwargs)

e:\source\repos\pymc3\pymc3\distributions\logp.py in logpt(var, rv_values, jacobian, scaling, transformed, cdf, sum, **kwargs)
216 replacements.update({rv_var: rv_value, rv_value_var: rv_value})
217
--> 218 (logp_var,), _ = rvs_to_value_vars(
219 (logp_var,),
220 apply_transforms=transformed and not cdf,

e:\source\repos\pymc3\pymc3\aesaraf.py in rvs_to_value_vars(graphs, apply_transforms, initial_replacements, **kwargs)
352 return [trans_rv_value]
353
--> 354 return replace_rvs_in_graphs(graphs, transform_replacements, initial_replacements, **kwargs)
355
356

e:\source\repos\pymc3\pymc3\aesaraf.py in replace_rvs_in_graphs(graphs, replacement_fn, initial_replacements, **kwargs)
300 )
301
--> 302 fg.replace_all(replacements.items(), import_missing=True)
303
304 graphs = list(fg.outputs)

~\miniconda3\envs\pm3-dev\lib\site-packages\aesara\graph\fg.py in replace_all(self, pairs, **kwargs)
555 """Replace variables in the FunctionGraph according to (var, new_var) pairs in a list."""
556 for var, new_var in pairs:
--> 557 self.replace(var, new_var, **kwargs)
558
559 def attach_feature(self, feature: Feature) -> NoReturn:

~\miniconda3\envs\pm3-dev\lib\site-packages\aesara\graph\fg.py in replace(self, var, new_var, reason, verbose, import_missing)
513 print(reason, var, new_var)
514
--> 515 new_var = var.type.filter_variable(new_var, allow_convert=True)
516
517 if var not in self.variables:

~\miniconda3\envs\pm3-dev\lib\site-packages\aesara\tensor\type.py in filter_variable(self, other, allow_convert)
256 return other2
257
--> 258 raise TypeError(
259 f"Cannot convert Type {other.type} "
260 f"(of Variable {other}) into Type {self}. "

TypeError: Cannot convert Type TensorType(float64, matrix) (of Variable Rebroadcast{?,0}.0) into Type TensorType(float64, col). You can try to manually convert Rebroadcast{?,0}.0 into a TensorType(float64, col).

Versions (✔️ = the example runs without error)

  • ✔️ v3.11.1
  • ✔️ v3.11.2
  • ✔️ master (d7172c0)
  • v4 (45cb4eb)
@brandonwillard
Contributor

I'm unable to reproduce this error on 45cb4eb and Aesara 2.0.7.

@michaelosthege
Member Author

FWIW I just reproduced it again with 45cb4eb and Aesara 2.0.6 and 2.0.7, on two separate computers.

@brandonwillard
Contributor

I've added your example code (verbatim) as a test in #4654, where it appears to have passed (i.e. not thrown an exception).

Perhaps this is somehow related to Windows; are both of the machines you tried Windows machines?

@brandonwillard added and then removed the needs info (Additional information required) label Apr 20, 2021
@brandonwillard
Contributor

brandonwillard commented Apr 20, 2021

It does appear to be a Windows-related issue: see here.

@brandonwillard added the winOS (windows OS related) label Apr 20, 2021
@michaelosthege
Member Author

Yes, both my machines are on Windows.
Yet another reason for #4517

@michaelosthege
Member Author

michaelosthege commented Apr 20, 2021

Wrapping np.random.normal(size=(3, 1)) in pm.Data or aesara.shared does not cause the problem. (v4)
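
A minimal sketch of that workaround (same setup as the original example; aesara.shared is one of the two options mentioned):

import aesara
import numpy as np
import pymc3 as pm

yobs = np.random.normal(size=(3, 1))
with pm.Model() as model:
    # Passing the column vector as a shared variable (or via pm.Data)
    # avoids the failure.
    pm.Normal("x2", mu=0, sd=1, observed=aesara.shared(yobs))
    model.logp()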

@michaelosthege added this to the vNext (4.0.0) milestone Apr 21, 2021
michaelosthege added a commit to michaelosthege/pymc that referenced this issue Apr 24, 2021
michaelosthege added a commit to michaelosthege/pymc that referenced this issue Apr 25, 2021
@michaelosthege
Member Author

With the following test case:

import aesara.printing
import aesara.tensor as at
import numpy as np
import pymc3 as pm
with pm.Model() as model:
    rv = pm.Normal("x1", mu=0, sd=1, observed=np.random.normal(size=(3, 4)))
    print("\n\nobserved = (3, 4) ndarray\n")
    aesara.printing.debugprint([rv])
    model.logp()

    yobs = np.random.normal(size=(3, 1))
    rv = pm.Normal("x2", mu=0, sd=1, observed=at.as_tensor_variable(yobs))
    print("\n\nobserved = as_tensor_variable( (3, 1) ndarray )\n")
    aesara.printing.debugprint([rv])
    model.logp()

    rv = pm.Normal("x3", mu=0, sd=1, observed=pm.aesaraf.pandas_to_array(yobs))
    print("\n\nobserved = pandas_to_array( (3, 1) ndarray )\n")
    aesara.printing.debugprint([rv])
    model.logp()

I get this output:

observed = (3, 4) ndarray

normal_rv.1 [id A] 'x1'
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x2A5F9E82E18>) [id B]
 |Elemwise{Cast{int64}} [id C] ''
 | |MakeVector{dtype='int32'} [id D] ''
 |   |Subtensor{int64} [id E] ''
 |   | |TensorConstant{[3 4]} [id F]
 |   | |ScalarConstant{0} [id G]
 |   |Subtensor{int64} [id H] ''
 |     |TensorConstant{[3 4]} [id F]
 |     |ScalarConstant{1} [id I]
 |TensorConstant{11} [id J]
 |TensorConstant{0} [id K]
 |TensorConstant{1.0} [id L]


observed = as_tensor_variable( (3, 1) ndarray )

normal_rv.1 [id A] 'x2'
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x2A5F9E82E18>) [id B]
 |MakeVector{dtype='int64'} [id C] ''
 | |Subtensor{int64} [id D] ''
 | | |MakeVector{dtype='int64'} [id E] ''
 | | | |Subtensor{int64} [id F] ''
 | | | | |Shape [id G] ''
 | | | | | |TensorConstant{[[-0.22122..18085739]]} [id H]
 | | | | |ScalarConstant{0} [id I]
 | | | |Subtensor{int64} [id J] ''
 | | |   |Shape [id K] ''
 | | |   | |TensorConstant{[[-0.22122..18085739]]} [id H]
 | | |   |ScalarConstant{1} [id L]
 | | |ScalarConstant{0} [id M]
 | |Subtensor{int64} [id N] ''
 |   |MakeVector{dtype='int64'} [id E] ''
 |   |ScalarConstant{1} [id O]
 |TensorConstant{11} [id P]
 |TensorConstant{0} [id Q]
 |TensorConstant{1.0} [id R]


observed = pandas_to_array( (3, 1) ndarray )

normal_rv.1 [id A] 'x3'
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x2A5F9E82E18>) [id B]
 |Elemwise{Cast{int64}} [id C] ''
 | |MakeVector{dtype='int32'} [id D] ''
 |   |Subtensor{int64} [id E] ''
 |   | |TensorConstant{[3 1]} [id F]
 |   | |ScalarConstant{0} [id G]
 |   |Subtensor{int64} [id H] ''
 |     |TensorConstant{[3 1]} [id F]
 |     |ScalarConstant{1} [id I]
 |TensorConstant{11} [id J]
 |TensorConstant{0} [id K]
 |TensorConstant{1.0} [id L]

So (at least on my Windows machine) pandas_to_array and as_tensor_variable applied to the same np.ndarray column vector produce different graphs. And for some reason the graph structure of the output from pandas_to_array works for the matrix, but not the column vector?!

In #4667 the same error also caused test failures in the Linux containers of the CI pipeline: https://github.com/pymc-devs/pymc3/pull/4667/checks?check_run_id=2431198145#step:7:3718

So the problem can occur on Linux if the graph is produced in a certain way.
Can someone check the above code and post the output produced on other OSes?
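
For a quicker check than the full debugprint output, comparing the broadcast patterns of the two conversion paths should suffice (a minimal sketch):

import numpy as np
import aesara.tensor as at
import pymc3 as pm

yobs = np.random.normal(size=(3, 1))
# Expected on a working setup: (False, True) for both,
# i.e. the length-1 dimension is recognized as broadcastable.
print(at.as_tensor_variable(yobs).broadcastable)
print(pm.aesaraf.pandas_to_array(yobs).broadcastable)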

@canyon289
Member

I'll run this within the next ~8 hours when I'm back at my Ubuntu desktop, if no one beats me to it.

@canyon289
Member

Here are my versions and what I get:

(pymc) canyon@canyon-MS-7A59:~/repos/pymc3$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.1 LTS
Release:	20.04
Codename:	focal

(pymc) canyon@canyon-MS-7A59:~/repos/pymc3$ git log -n 1
commit 45cb4ebf36500e502481bdced6980dd9e630acca (HEAD, origin/v4)
Author: Ricardo <ricardo.vieira1994@gmail.com>
Date:   Fri Apr 16 13:00:40 2021 +0200

(pymc) canyon@canyon-MS-7A59:~/repos/pymc3$ conda list | grep aesara
aesara                    2.0.7                    pypi_0    pypi

Aesara/PyMC Output

(pymc) canyon@canyon-MS-7A59:~/repos/pymc3$ python test.py 


observed = (3, 4) ndarray

normal_rv.1 [id A] 'x1'   
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x7F4D941C2840>) [id B]
 |TensorConstant{[3 4]} [id C]
 |TensorConstant{11} [id D]
 |TensorConstant{0} [id E]
 |TensorConstant{1.0} [id F]


observed = as_tensor_variable( (3, 1) ndarray )

normal_rv.1 [id A] 'x2'   
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x7F4D941C2840>) [id B]
 |Shape [id C] ''   
 | |TensorConstant{[[ 0.51816..50539887]]} [id D]
 |TensorConstant{11} [id E]
 |TensorConstant{0} [id F]
 |TensorConstant{1.0} [id G]


observed = pandas_to_array( (3, 1) ndarray )

normal_rv.1 [id A] 'x3'   
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x7F4D941C2840>) [id B]
 |TensorConstant{[3 1]} [id C]
 |TensorConstant{11} [id D]
 |TensorConstant{0} [id E]
 |TensorConstant{1.0} [id F]
(pymc) canyon@canyon-MS-7A59:~/repos/pymc3$ 

@michaelosthege
Member Author

Thanks @canyon289. Since you ran it on the v4 branch, here's the output I get on v4.
(Above I was on the PR #4667 branch.)

observed = (3, 4) ndarray

normal_rv.1 [id A] 'x1'
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x22C703EFAE8>) [id B]
 |Elemwise{Cast{int64}} [id C] ''
 | |TensorConstant{[3 4]} [id D]
 |TensorConstant{11} [id E]
 |TensorConstant{0} [id F]
 |TensorConstant{1.0} [id G]


observed = as_tensor_variable( (3, 1) ndarray )

normal_rv.1 [id A] 'x2'
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x22C703EFAE8>) [id B]
 |Shape [id C] ''
 | |TensorConstant{[[0.583103..51397476]]} [id D]
 |TensorConstant{11} [id E]
 |TensorConstant{0} [id F]
 |TensorConstant{1.0} [id G]


observed = pandas_to_array( (3, 1) ndarray )

normal_rv.1 [id A] 'x3'
 |RandomStateSharedVariable(<RandomState(MT19937) at 0x22C703EFAE8>) [id B]
 |Elemwise{Cast{int64}} [id C] ''
 | |TensorConstant{[3 1]} [id D]
 |TensorConstant{11} [id E]
 |TensorConstant{0} [id F]
 |TensorConstant{1.0} [id G]

Traceback (most recent call last):
  File "test.py", line 21, in <module>
    model.logp()
  File "E:\Repos\pymc3\pymc3\model.py", line 266, in logp
    return self.model.fn(self.logpt)
  File "E:\Repos\pymc3\pymc3\model.py", line 714, in logpt
    factors += [logpt_sum(obs, obs.tag.observations) for obs in self.observed_RVs]
  File "E:\Repos\pymc3\pymc3\model.py", line 714, in <listcomp>
    factors += [logpt_sum(obs, obs.tag.observations) for obs in self.observed_RVs]
  File "E:\Repos\pymc3\pymc3\distributions\logp.py", line 368, in logpt_sum
    return logpt(*args, sum=True, **kwargs)
  File "E:\Repos\pymc3\pymc3\distributions\logp.py", line 221, in logpt
    initial_replacements=replacements,
  File "E:\Repos\pymc3\pymc3\aesaraf.py", line 354, in rvs_to_value_vars
    return replace_rvs_in_graphs(graphs, transform_replacements, initial_replacements, **kwargs)
  File "E:\Repos\pymc3\pymc3\aesaraf.py", line 302, in replace_rvs_in_graphs
    fg.replace_all(replacements.items(), import_missing=True)
  File "e:\repos\aesara\aesara\graph\fg.py", line 557, in replace_all
    self.replace(var, new_var, **kwargs)
  File "e:\repos\aesara\aesara\graph\fg.py", line 515, in replace
    new_var = var.type.filter_variable(new_var, allow_convert=True)
  File "e:\repos\aesara\aesara\tensor\type.py", line 259, in filter_variable
    f"Cannot convert Type {other.type} "
TypeError: Cannot convert Type TensorType(float64, matrix) (of Variable Rebroadcast{?,0}.0) into Type TensorType(float64, col). You can try to manually convert Rebroadcast{?,0}.0 into a TensorType(float64, col).

@brandonwillard
Contributor

brandonwillard commented Apr 29, 2021

This problem is caused by a difference in the default int precision on Windows.

Apparently Windows defaults to int32, which produces a RandomVariable with a size that includes a Cast to int64, and the Cast prevents RandomVariable.compute_bcast from determining the broadcastable dimension in [3, 1]. The resulting output has a broadcast pattern of [False, False], while a size without a Cast would've produced an output with the broadcast pattern [False, True].

This result can be reproduced on Linux as follows:

import aesara.tensor as at
import aesara.tensor.random.basic as ar


size_at = at.cast([3.0, 1.0], "int64")
# or
# size_at = at.cast([at.scalar(), 1.0], "int64")

norm_at = ar.normal(0, 1, size=size_at)
>>> norm_at.type
TensorType(float64, matrix)
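
For contrast, a constant size whose 1 is not hidden behind a Cast yields the pattern described above (a sketch continuing the snippet; output shown as expected per the explanation):

norm_col = ar.normal(0, 1, size=(3, 1))
>>> norm_col.type
TensorType(float64, col)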

RandomVariable.compute_bcast can be updated so that it properly infers the broadcast pattern in this case, but this scenario could still arise—even if only "theoretically" in PyMC3—when size is non-trivial in other ways. See pymc-devs/pytensor#390.

Regardless, computing the exact broadcast pattern is nice, but it's not necessary. To fix this issue, we simply need to add a step in PyMC3 that explicitly changes the broadcast pattern of the observations so that it matches its corresponding RandomVariable output (e.g. using aesara.tensor.patternbroadcast).
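
A minimal sketch of such a step (align_observed is a hypothetical helper, not the actual fix; rv is the RandomVariable output and data the observations):

import aesara.tensor as at

def align_observed(rv, data):
    # Sketch: rebroadcast the observations so their pattern matches the
    # RV output's pattern (the length-1 dims are checked at runtime).
    obs = at.as_tensor_variable(data)
    if obs.broadcastable != rv.broadcastable:
        obs = at.patternbroadcast(obs, rv.broadcastable)
    return obs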

twiecki pushed a commit that referenced this issue May 14, 2021
michaelosthege added a commit that referenced this issue May 14, 2021
(The default value for integers leads to a Cast operation that destroys broadcast information.)
@michaelosthege
Member Author

I think this issue might actually be the same as pymc-devs/pytensor#390, because it only arises in situations where a size is taken from a Variable in a way that the cast occurs (either because of OS defaults, or because of how the other variable is cast).

Instead of a workaround that "patches" the broadcast pattern, I think we can fix this issue at the root. See my other comment.

michaelosthege added a commit that referenced this issue May 14, 2021
The new test case triggers the issue independent of the default intX.
@brandonwillard
Contributor

> I think this issue might actually be the same as pymc-devs/aesara#390

See my comment above; it explicitly states the connection between the two.

> Instead of a workaround that "patches" the broadcast pattern, I think we can fix this issue at the root. See my other comment.

Likewise, my comment above mentions that determining the broadcastable dimensions exactly is not necessary for a robust solution to this problem. This is especially important given that it may not be reasonable to compute the exact broadcastable dimensions in every case.

In other words, fixing pymc-devs/pytensor#390 is the actual "patch"—no matter how it's done.

Instead, if we want to prevent type mismatches like these in general, we need to make sure that the observed values and their corresponding RandomVariables have the same TensorTypes. That's what's causing this issue, and it can be solved somewhere between the Distribution and Model steps.

brandonwillard added commits to brandonwillard/pymc that referenced this issue May 15, 2021
@brandonwillard linked a pull request May 15, 2021 that will close this issue
michaelosthege pushed a commit that referenced this issue May 15, 2021
@twiecki
Member

twiecki commented May 15, 2021

Closed via #4700.
