Add information about the inputs of the models #192

mandel · 2020-09-17T15:41:33Z

Thank you for your nice project!

For some posteriors, the data contains more entries than what is expected for the model. For example in kidiq-kidscore_momiq:

the model (kidscore_momiq) expects N, kid_score, and mom_iq
but the data (kidiq) also provides mom_hs, mom_hs_new, and mom_iq_new.

In order to pass automatically the correct inputs for the inference, I find useful to have information about the expected inputs of the model.

In this pull request, I have added for each Stan model in the model.info.json file an additional field with the information about the expected data. For example, on the kidscore_momiq model, now there is the following field:

  "inputs": { "N": { "type": "int", "dimensions": 0},
              "kid_score": { "type": "real", "dimensions": 1},
              "mom_iq": { "type": "real", "dimensions": 1} },

I also need the expected type to convert the data into the right format. So, I have associated this information to each input.

Thank you for considering this PR. I am happy to change the format if you want.

Best

MansMeg · 2020-09-17T15:50:40Z

I think this is excellent! I have been thinking about doing a similar thing to the documentation but also enable som descriptions of each input (in addition to the stuff you propose).

It seems like you have a script to generate these things from the stan code? Is it possible to extract this from the Stan code?

The two problems I see is that we, if we add this, need to check that the documentation is consistent with the stan code, that why I ask if there is an automated script behind to extract this.

@paul-buerkner and @avehtari do you have any thoughts on this?

mandel · 2020-09-17T16:54:40Z

Thanks @MansMeg.

Yes, this information was extracted automatically. I did a small hack in the Stan compiler to add this field to the json. I can share the code (but for now, it is really a hack).

MansMeg · 2020-09-17T17:06:27Z

Great!

I’m wondering how to include this then.

I guess the description of the variables should probably be on the dataset side. But probably with the same structure. So I guess this is probably a good way to do it.

The name input make sense. Then we can add that as a slot with default value NULL and see if we can find a way to update this automatically further down the line.

But lets see if @vehtari or @paul-buerkner has any thoughts befor I merge.

mandel · 2020-09-17T17:12:11Z

The code which updates the *.info.json files is in the info branch of the fork of the Stan compiler which is on my github account (https://github.com/mandel/stanc3/tree/info). It makes the following assumptions:

For a model file m.stan is going to update the info file ../m.info.json.
It adds the inputs field before the model_implementations field.

It has been developed as a one time thing. I could cleanup the code if you think that is can be useful. I can also compute other information on the model if you want.

MansMeg · 2020-09-17T18:46:54Z

I think both those assumptions are reasonable and should work well.

I think in the long run we would like to be able to reuse that script. The question is just where to put it.

MansMeg · 2020-09-21T13:59:52Z

Hi again!

So now we have discussed and since this information is already available in the data block, we think it is better to have a script to access it rather than the actual results. In this way, we would not store any information that could fall out of sync (due to changes in models).

Hence it would be great if you could contribute the script, rather than the JSON information - if that would be ok for you?

mandel · 2020-09-22T13:26:51Z

HI,

I understand your point of view.

I am happy to cleanup the code such that it emits the information. But I don't know how to contribute this code (it is a fork of the Stan compiler). Do you have anything in mind?

MansMeg · 2020-09-28T12:21:53Z

Hi. I close this PR and continue this discussion as an issue instead.

I don't have any quick solution. Maybe the simplest is that we just refer to your repo if you have a README specifying how this can be done in a simple way?

Add information about the inputs of the models

5313a21

MansMeg closed this Sep 28, 2020

MansMeg mentioned this pull request Sep 28, 2020

Extract model inputs directly from the compiler #198

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add information about the inputs of the models #192

Add information about the inputs of the models #192

mandel commented Sep 17, 2020

MansMeg commented Sep 17, 2020

mandel commented Sep 17, 2020

MansMeg commented Sep 17, 2020

mandel commented Sep 17, 2020

MansMeg commented Sep 17, 2020

MansMeg commented Sep 21, 2020

mandel commented Sep 22, 2020

MansMeg commented Sep 28, 2020

Add information about the inputs of the models #192

Add information about the inputs of the models #192

Conversation

mandel commented Sep 17, 2020

MansMeg commented Sep 17, 2020

mandel commented Sep 17, 2020

MansMeg commented Sep 17, 2020

mandel commented Sep 17, 2020

MansMeg commented Sep 17, 2020

MansMeg commented Sep 21, 2020

mandel commented Sep 22, 2020

MansMeg commented Sep 28, 2020