Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured covariates for non-linear functions #1488

Closed
wds15 opened this issue Apr 27, 2023 · 4 comments
Closed

Structured covariates for non-linear functions #1488

wds15 opened this issue Apr 27, 2023 · 4 comments
Labels
Milestone

Comments

@wds15
Copy link
Contributor

wds15 commented Apr 27, 2023

I am working with Pharmacometrics models which require non-linear functions. So the dosing history of each patient is needed to calculate with a non-linear function the drug concentration in blood at any time-point whenever I have an observation of a patient. A solution is a wide data format, but this is inconvenient as the number of dosing time-points which I need to take into account can get large. Below is an example of what I have in mind. So I would like to use the covariates dose_time and dose_amt as argument to a non-linear function.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

set.seed(346356)

Information on drug dosing is most naturally specified in a long
format. Here is an example for two regimens where we administer 8
units of drug to a patient at two occasions. The time-points are 24
hours apart, but we use a different amount at each time-point.

dosing_long <- tribble(~regimen, ~dose_id, ~dose_time, ~dose_amt,
                       1L, 1L, 0,  4,
                       1L, 2L, 24, 4,
                       2L, 1L, 0,  2,
                       2L, 2L, 24, 6)

To use such information in brms the dosing history must be cast
into a wide format.

dosing_wide <- pivot_wider(dosing_long,
                           id_cols="regimen",
                           names_from="dose_id",
                           values_from=c("dose_time", "dose_amt"))

dosing_wide
#> # A tibble: 2 × 5
#>   regimen dose_time_1 dose_time_2 dose_amt_1 dose_amt_2
#>     <int>       <dbl>       <dbl>      <dbl>      <dbl>
#> 1       1           0          24          4          4
#> 2       2           0          24          2          6

Only in wide format one can merge the dosing history with
observations from a patient. Here is a pseudo data set as example:

fake_wide <- expand_grid(subject=1:4, time=24*c(2, 3, 4)) %>%
    left_join(data.frame(subject=1:4, regimen=sample(1:2, 4, TRUE))) %>%
    left_join(dosing_wide) %>%
    mutate(y=rnorm(n())) %>%
    relocate(subject, y)
#> Joining, by = "subject"
#> Joining, by = "regimen"

modeling this data-set works in brms as we have all dosing
information available in a given row, which is needed to model the
measured data y…

knitr::kable(fake_wide)
subject y time regimen dose_time_1 dose_time_2 dose_amt_1 dose_amt_2
1 -1.4335139 48 2 0 24 2 6
1 -0.2447368 72 2 0 24 2 6
1 0.5337662 96 2 0 24 2 6
2 0.8153970 48 1 0 24 4 4
2 -0.3382850 72 1 0 24 4 4
2 0.0922915 96 1 0 24 4 4
3 0.8734750 48 1 0 24 4 4
3 -2.4113110 72 1 0 24 4 4
3 2.4293361 96 1 0 24 4 4
4 0.6915627 48 1 0 24 4 4
4 -1.6689784 72 1 0 24 4 4
4 1.1158514 96 1 0 24 4 4

… but it is super in-convenient, since we need to deal with an
arbitrary wide format here! More doses mean more columns, etc.
Thus a nested format would be much easier to deal with.

So let’s cast data into a nested format such that the dosing
columns themselve contain data vetors in each entry:

dosing_nested <- dosing_long %>%
    group_by(regimen) %>%
    nest(data=c("dose_id", "dose_time", "dose_amt")) %>%
    unnest_wider("data")

tibble

dosing_nested
#> # A tibble: 2 × 4
#> # Groups:   regimen [2]
#>   regimen     dose_id   dose_time    dose_amt
#>     <int> <list<int>> <list<dbl>> <list<dbl>>
#> 1       1         [2]         [2]         [2]
#> 2       2         [2]         [2]         [2]

having nested data

knitr::kable(dosing_nested)
regimen dose_id dose_time dose_amt
1 1, 2 0, 24 4, 4
2 1, 2 0, 24 2, 6

now the model data set becomes a lot easier

fake_nested <- select(fake_wide, 1:4) %>%
    left_join(dosing_nested)
#> Joining, by = "regimen"

tibble

fake_nested
#> # A tibble: 12 × 7
#>    subject       y  time regimen     dose_id   dose_time    dose_amt
#>      <int>   <dbl> <dbl>   <int> <list<int>> <list<dbl>> <list<dbl>>
#>  1       1 -1.43      48       2         [2]         [2]         [2]
#>  2       1 -0.245     72       2         [2]         [2]         [2]
#>  3       1  0.534     96       2         [2]         [2]         [2]
#>  4       2  0.815     48       1         [2]         [2]         [2]
#>  5       2 -0.338     72       1         [2]         [2]         [2]
#>  6       2  0.0923    96       1         [2]         [2]         [2]
#>  7       3  0.873     48       1         [2]         [2]         [2]
#>  8       3 -2.41      72       1         [2]         [2]         [2]
#>  9       3  2.43      96       1         [2]         [2]         [2]
#> 10       4  0.692     48       1         [2]         [2]         [2]
#> 11       4 -1.67      72       1         [2]         [2]         [2]
#> 12       4  1.12      96       1         [2]         [2]         [2]

having nested data

knitr::kable(fake_nested)
subject y time regimen dose_id dose_time dose_amt
1 -1.4335139 48 2 1, 2 0, 24 2, 6
1 -0.2447368 72 2 1, 2 0, 24 2, 6
1 0.5337662 96 2 1, 2 0, 24 2, 6
2 0.8153970 48 1 1, 2 0, 24 4, 4
2 -0.3382850 72 1 1, 2 0, 24 4, 4
2 0.0922915 96 1 1, 2 0, 24 4, 4
3 0.8734750 48 1 1, 2 0, 24 4, 4
3 -2.4113110 72 1 1, 2 0, 24 4, 4
3 2.4293361 96 1 1, 2 0, 24 4, 4
4 0.6915627 48 1 1, 2 0, 24 4, 4
4 -1.6689784 72 1 1, 2 0, 24 4, 4
4 1.1158514 96 1 1, 2 0, 24 4, 4

What would now be needed in brms is to allow for strucutred columns to be passed as covariates to
non-linear functions in brms.

Created on 2023-04-27 with reprex v2.0.2

... and as a fancy add-on... I think it would actually be nice to also allow the response y in a brms model to come from such a nested thing.... but that's not directly related...

@paul-buerkner paul-buerkner added this to the 2.20.0 milestone Apr 27, 2023
paul-buerkner added a commit that referenced this issue May 17, 2023
@paul-buerkner
Copy link
Owner

This feature is now implemented. Below you can find an example:

set.seed(2134)
N <- 100
dat <- data.frame(y=rnorm(N))
dat$X <- matrix(rnorm(N*2), N, 2)

nlfun_stan <- "
  real nlfun(real a, real b, real c, row_vector X) {
     return a + b * X[1] + c * X[2];
  }
"
nlstanvar <- stanvar(scode = nlfun_stan, block = "functions")

# version for R post processing
nlfun <- function(a, b, c, X) {
  a + b * X[, , 1] + c * X[, , 2]
}

# fit the model
bform <- bf(y~nlfun(a, b, c, X), a~1, b~1, c~1, nl = TRUE)
fit <- brm(bform, dat, stanvars = nlstanvar)
summary(fit)

# fit benchmark model that should yield the same results up to MCMC error
fit2 <- brm(y~X, dat)
summary(fit2)

# post processing works too
str(posterior_epred(fit))
loo(fit, fit2)

@wds15
Copy link
Contributor Author

wds15 commented May 17, 2023

WOW... it works with integers too:

set.seed(2134)
N <- 100
dat <- data.frame(y=rnorm(N))
dat$X <- matrix(as.integer(rnorm(N*2, 0, 10)), N, 2)

nlfun_stan <- "
  real nlfun(real a, real b, real c, int[] X) {
     return a + b * X[1] + c * X[2];
  }
"
nlstanvar <- stanvar(scode = nlfun_stan, block = "functions")

# version for R post processing
nlfun <- function(a, b, c, X) {
  a + b * X[, , 1] + c * X[, , 2]
}

# fit the model
bform <- bf(y~nlfun(a, b, c, X), a~1, b~1, c~1, nl = TRUE)
fit <- brm(bform, dat, stanvars = nlstanvar)
summary(fit)


stancode(fit)

Awesome!

@wds15
Copy link
Contributor Author

wds15 commented May 26, 2023

FYI... the expose_functions(model, vectorize=TRUE) does not work any longer with these structured covariates...new issue?

@paul-buerkner
Copy link
Owner

paul-buerkner commented May 26, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants