414 long running time of sample posterior predictive and eventual death by oom #436

AlexanderFengler · 2024-05-20T02:56:13Z

Posterior predictive sampling now has a safe_mode that chunks the computation.
The n_samples argument was renamed to draws, and one can pass None | int | list | np.ndarray.
When running posterior predictive with kind='mean', the posterior naming is cleaned up: rt,response_mean --> v.
Prior predictives now get assigned to .traces, and their naming is cleaned up as well.
Our sample_prior_predictive() now also includes the parent parameter, via an internal call to .predict().


AlexanderFengler · 2024-05-20T02:57:33Z

I will try to add a few more tests to this before merging.

src/hssm/hssm.py

digicosmos86

Looks good! Two higher-level comments:

Since the only call to simulator is done here:

HSSM/src/hssm/distribution_utils/dist.py

Lines 309 to 315 in 19f786d

           sim_out = simulator(
               theta=theta,
               model=model_name,
               n_samples=n_samples,
               random_state=seed,
               **kwargs,
           )

maybe we can use a for-loop here over n_samples to make the sampling safe, instead of patching the higher-level functions themselves? This way we can avoid running a lot of intermediate-level code multiple times.

An InferenceData object does not come with attributes like posterior or posterior_predictive by default, so the type checker complains. Square-bracket notation is preferred. If that is too annoying, we can disable this check (attr-defined) globally in the mypy section of pyproject.toml, but that can be a bit risky.
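A minimal sketch of the first suggestion above (looping at the simulator call instead of patching the higher-level functions). Everything here is an assumption for illustration: `toy_simulator` is a hypothetical stand-in for the real simulator, and `simulate_in_chunks` and `chunk_size` are made-up names, not part of HSSM.

```python
import random


def toy_simulator(theta, n_samples, random_state=None):
    """Hypothetical stand-in for the real simulator: n_samples Gaussian draws."""
    rng = random.Random(random_state)
    return [rng.gauss(theta, 1.0) for _ in range(n_samples)]


def simulate_in_chunks(theta, n_samples, chunk_size=10, seed=None):
    """Request n_samples in several small simulator calls instead of one big one."""
    seed_rng = random.Random(seed)  # derives one seed per chunk
    out = []
    remaining = n_samples
    while remaining > 0:
        n = min(chunk_size, remaining)
        # Each call only ever builds n <= chunk_size samples at once.
        out.extend(toy_simulator(theta, n, random_state=seed_rng.randrange(2**32)))
        remaining -= n
    return out


samples = simulate_in_chunks(theta=0.5, n_samples=25, chunk_size=10, seed=0)
```

Whether chunking at this level actually avoids the memory problem depends on where the memory is consumed, which the thread leaves open.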

src/hssm/hssm.py

digicosmos86 · 2024-05-20T13:31:12Z

src/hssm/hssm.py

+
+        if "posterior_predictive" in idata.groups():
+            del idata.posterior_predictive
+            print("pre-existing posterior_predictive group deleted from idata. \n")


This should be a warning
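One way the comment above could be addressed, sketched with the standard-library `warnings` module. The `drop_group` helper and the plain dict standing in for the InferenceData object are hypothetical; a `logger.warning(...)` call would be an equally reasonable choice.

```python
import warnings


def drop_group(groups: dict, name: str = "posterior_predictive") -> None:
    """Delete a pre-existing group and report it as a warning, not via print()."""
    if name in groups:
        del groups[name]
        warnings.warn(
            f"pre-existing {name} group deleted from idata.",
            UserWarning,
            stacklevel=2,  # point the warning at the caller
        )
```

Unlike print(), a warning can be filtered, captured in tests, or turned into an error by the caller.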

src/hssm/hssm.py

src/hssm/utils.py

src/hssm/hssm.py

digicosmos86

Looks awesome! Just some style suggestions at this point. Feel free to merge after the fixes :)

digicosmos86 · 2024-05-21T13:54:02Z

src/hssm/hssm.py

+        if "posterior_predictive" in idata.groups():
+            if idata is not None:


Should the order be the other way around?

@digicosmos86 changed. This was useless to begin with, just an artifact appeasing mypy...
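A short illustration of why the order matters in the check discussed above: calling .groups() on None would fail before the None test is ever reached. `FakeIData` and `has_posterior_predictive` are hypothetical names used only for this sketch.

```python
class FakeIData:
    """Hypothetical stand-in exposing a groups() method."""

    def groups(self):
        return ["posterior", "posterior_predictive"]


def has_posterior_predictive(idata) -> bool:
    # Test for None first; `and` short-circuits, so .groups() is
    # never called on None.
    return idata is not None and "posterior_predictive" in idata.groups()
```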

digicosmos86 · 2024-05-22T17:57:15Z

src/hssm/hssm.py

@@ -10,7 +10,7 @@
 from copy import deepcopy
 from inspect import isclass
 from os import PathLike
-from typing import Any, Callable, Literal
+from typing import Any, Callable, Literal, Union


We don't use Union any more. Now that we have Python 3.10, we use the | operator instead

digicosmos86 · 2024-05-22T17:58:24Z

src/hssm/hssm.py

@@ -404,6 +404,7 @@ def __init__(
            self.model, self._parent_param, self.response_c, self.response_str
        )
        self.set_alias(self._aliases)
+        # _logger.info(self.pymc_model.initial_point())


Should we remove debug comments?

Stylistically, eventually yes, but right now I think it can still help future PRs that interact with this code. I literally have the next PR that I need to work on in mind here. So in general I agree, but let's skip it here :)

digicosmos86 · 2024-05-22T17:58:33Z

src/hssm/hssm.py


        # The parent was previously not part of deterministics --> compute it via
        # posterior_predictive (works because it acts as the 'mu' parameter
        # in the GLM as far as bambi is concerned)
        if self._inference_obj is not None:
            if self._parent not in self._inference_obj.posterior.data_vars.keys():
-                self.model.predict(self._inference_obj, kind="mean", inplace=True)
+                # self.model.predict(self._inference_obj, kind="mean", inplace=True)


Should we remove debug comments?

digicosmos86 · 2024-05-22T18:01:15Z

src/hssm/hssm.py

+                self._parent in self._inference_obj.posterior.data_vars.keys()
+                and "rt,response_mean" in self._inference_obj.posterior.data_vars.keys()


data_vars is dict-like, so the Python 3 style is to test membership directly, without calling keys()
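The point above, shown on a plain dict that stands in for data_vars (an assumption made to keep the example dependency-free): membership tests work on the mapping itself, so .keys() adds nothing.

```python
# Plain dict used as a stand-in for posterior.data_vars
data_vars = {"v": [0.1, 0.2], "rt,response_mean": [0.3, 0.4]}
parent = "v"

# Older style: explicit .keys()
with_keys = parent in data_vars.keys() and "rt,response_mean" in data_vars.keys()

# Preferred style: test the mapping directly
without_keys = parent in data_vars and "rt,response_mean" in data_vars
```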

digicosmos86 · 2024-05-22T18:05:14Z

src/hssm/hssm.py

+            and not np.allclose(draws, idata["posterior"].draw.values)
+        ):
+            # Reassign posterior to sub-sampled version
+            setattr(idata_copy, "posterior", idata["posterior"].isel(draw=draws))


Are there any differences between setattr() and idata.add_groups()?

To be honest, I don't know... Let me look into that independently to understand it properly.

Actually, at least as used here, the distinction is semantic: add_groups is for adding new groups, setattr is for reassigning a pre-existing group.

digicosmos86 · 2024-05-22T18:07:58Z

src/hssm/hssm.py

+            if safe_mode:
+                # safe mode splits the draws into chunks of 10 to avoid
+                # memory issues (TODO: Figure out the source of memory issues)
+                split_draws = _split_array(
+                    idata_copy["posterior"].draw.values, divisor=10
+                )
+
+                posterior_predictive_list = []
+                for samples_tmp in split_draws:
+                    tmp_posterior = idata["posterior"].sel(draw=samples_tmp)
+                    setattr(idata_copy, "posterior", tmp_posterior)
+                    self.model.predict(
+                        idata_copy, kind, data, True, include_group_specific
+                    )
+                    posterior_predictive_list.append(idata_copy["posterior_predictive"])
+
+                if inplace:
+                    idata.add_groups(
+                        posterior_predictive=xr.concat(
+                            posterior_predictive_list, dim="draw"
+                        )
+                    )
+                    # for inplace, we don't return anything
+                    return None
+                else:
+                    # Reassign original posterior to idata_copy
+                    setattr(idata_copy, "posterior", idata["posterior"])
+                    # Add new posterior predictive group to idata_copy
+                    del idata_copy["posterior_predictive"]
+                    idata_copy.add_groups(
+                        posterior_predictive=xr.concat(
+                            posterior_predictive_list, dim="draw"
+                        )
+                    )
+                    return idata_copy
+            elif inplace:
+                # If not safe-mode
+                # We call .predict() directly without any
+                # chunking of data.
+
+                # .predict() is called on the copy of idata
+                # since we still subsampled (or assigned) the draws
                self.model.predict(idata_copy, kind, data, True, include_group_specific)
+
+                # posterior predictive group added to idata
                idata.add_groups(
                    posterior_predictive=idata_copy["posterior_predictive"]
                )
-
+                # don't return anything if inplace
                return None
-
+            else:
+                # Not safe mode and not inplace
+                # Function acts as very thin wrapper around
+                # .predict(). It just operates on the
+                # idata_copy object
+                return self.model.predict(
+                    idata_copy, kind, data, inplace, include_group_specific
+                )


This if block looks slightly confusing. I think I understand what you mean, but would

    if safe_mode:
        if inplace:
            ...
        else:
            ...
    else:
        if inplace:
            ...
        else:
            ...

be more readable?
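A self-contained sketch of the nested layout suggested above, combined with the chunking idea from the diff. It is not the HSSM implementation: `split_into_chunks`, `toy_predict`, and the dict standing in for the InferenceData object are all hypothetical, and the chunk size of 10 simply follows the code comment in the diff.

```python
def split_into_chunks(values, chunk_size=10):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i : i + chunk_size] for i in range(0, len(values), chunk_size)]


def toy_predict(draw_ids):
    """Hypothetical stand-in for model.predict(): one output per draw."""
    return [d * 2 for d in draw_ids]


def sample_posterior_predictive(store, draw_ids, safe_mode=True, inplace=True):
    if safe_mode:
        # Predict chunk by chunk, then concatenate the partial results.
        predictions = []
        for chunk in split_into_chunks(draw_ids, chunk_size=10):
            predictions.extend(toy_predict(chunk))
        if inplace:
            store["posterior_predictive"] = predictions
            return None
        else:
            return {**store, "posterior_predictive": predictions}
    else:
        # Single call over all draws, no chunking.
        predictions = toy_predict(draw_ids)
        if inplace:
            store["posterior_predictive"] = predictions
            return None
        else:
            return {**store, "posterior_predictive": predictions}
```

With the four cases laid out symmetrically, it is easy to see that safe_mode only changes how the predictions are computed, while inplace only changes where they end up.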

digicosmos86 · 2024-05-22T18:08:45Z

src/hssm/hssm.py

            return self.model.predict(
-                idata_copy, kind, data, False, include_group_specific
+                idata, kind, data, inplace, include_group_specific
            )



Add an else clause here to throw an error whenever other values are specified?
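A sketch of the kind of guard proposed above. The thread does not show which argument the branches dispatch on; this example assumes it is the kind argument and that "pps" and "mean" are the accepted values, both of which are assumptions. `dispatch_kind` is a hypothetical name.

```python
def dispatch_kind(kind: str) -> str:
    """Dispatch on kind and fail loudly for anything unexpected."""
    if kind == "pps":
        return "sample from the posterior predictive"
    elif kind == "mean":
        return "compute the posterior mean"
    else:
        # Without this branch an unsupported value would silently return None.
        raise ValueError(f"`kind` must be 'pps' or 'mean', got {kind!r}.")
```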

digicosmos86 · 2024-05-22T18:15:01Z

src/hssm/hssm.py

@@ -1353,6 +1477,35 @@ def _get_deterministic_var_names(self, idata) -> list[str]:

        return var_names

+    def _drop_parent_str_from_idata(
+        self, idata: Union[az.InferenceData, None]


Suggested change

-        self, idata: Union[az.InferenceData, None]
+        self, idata: az.InferenceData | None

AlexanderFengler added 3 commits May 19, 2024 17:36

wip

70c57f2

Merge branch 'main' into 388-change-slice-sampler-parameters-to-match…

927c3b3

…-hddm Merging main.

prior predictive extends idata now and parent parameters gets assigne…

aee9d2e

…d to _mean prediction consistently

AlexanderFengler requested a review from digicosmos86 May 20, 2024 02:56

AlexanderFengler linked an issue May 20, 2024 that may be closed by this pull request

Long running time of sample_posterior_predictive() and eventual death by OOM #414

Closed

jainraj reviewed May 20, 2024

digicosmos86 requested changes May 20, 2024

AlexanderFengler added 2 commits May 20, 2024 22:10

add tests and address final comments

cef7726

fix return type split_array

c290409

AlexanderFengler requested a review from digicosmos86 May 21, 2024 02:14

AlexanderFengler added 3 commits May 21, 2024 22:57

fix tests

2a06b14

drop logging initial point

85321ee

tim

07d1140

digicosmos86 approved these changes May 22, 2024

AlexanderFengler added 2 commits May 22, 2024 17:28

one more round of comments

3094391

add few clarifying comments

c965923

AlexanderFengler merged commit 80c5248 into main May 23, 2024
2 checks passed

digicosmos86 deleted the 414-long-running-time-of-sample_posterior_predictive-and-eventual-death-by-oom branch November 28, 2024 17:51
