Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add information about the inputs of the models #192

Closed
wants to merge 1 commit into from

Conversation

mandel
Copy link

@mandel mandel commented Sep 17, 2020

Thank you for your nice project!

For some posteriors, the data contains more entries than what is expected for the model. For example in kidiq-kidscore_momiq:

  • the model (kidscore_momiq) expects N, kid_score, and mom_iq
  • but the data (kidiq) also provides mom_hs, mom_hs_new, and mom_iq_new.

In order to pass automatically the correct inputs for the inference, I find useful to have information about the expected inputs of the model.

In this pull request, I have added for each Stan model in the model.info.json file an additional field with the information about the expected data. For example, on the kidscore_momiq model, now there is the following field:

  "inputs": { "N": { "type": "int", "dimensions": 0},
              "kid_score": { "type": "real", "dimensions": 1},
              "mom_iq": { "type": "real", "dimensions": 1} },

I also need the expected type to convert the data into the right format. So, I have associated this information to each input.

Thank you for considering this PR. I am happy to change the format if you want.

Best

@MansMeg
Copy link
Collaborator

MansMeg commented Sep 17, 2020

I think this is excellent! I have been thinking about doing a similar thing to the documentation but also enable som descriptions of each input (in addition to the stuff you propose).

It seems like you have a script to generate these things from the stan code? Is it possible to extract this from the Stan code?

The two problems I see is that we, if we add this, need to check that the documentation is consistent with the stan code, that why I ask if there is an automated script behind to extract this.

@paul-buerkner and @avehtari do you have any thoughts on this?

@mandel
Copy link
Author

mandel commented Sep 17, 2020

Thanks @MansMeg.

Yes, this information was extracted automatically. I did a small hack in the Stan compiler to add this field to the json. I can share the code (but for now, it is really a hack).

@MansMeg
Copy link
Collaborator

MansMeg commented Sep 17, 2020

Great!

I’m wondering how to include this then.

I guess the description of the variables should probably be on the dataset side. But probably with the same structure. So I guess this is probably a good way to do it.

The name input make sense. Then we can add that as a slot with default value NULL and see if we can find a way to update this automatically further down the line.

But lets see if @vehtari or @paul-buerkner has any thoughts befor I merge.

@mandel
Copy link
Author

mandel commented Sep 17, 2020

The code which updates the *.info.json files is in the info branch of the fork of the Stan compiler which is on my github account (https://github.com/mandel/stanc3/tree/info). It makes the following assumptions:

  • For a model file m.stan is going to update the info file ../m.info.json.
  • It adds the inputs field before the model_implementations field.

It has been developed as a one time thing. I could cleanup the code if you think that is can be useful. I can also compute other information on the model if you want.

@MansMeg
Copy link
Collaborator

MansMeg commented Sep 17, 2020

I think both those assumptions are reasonable and should work well.

I think in the long run we would like to be able to reuse that script. The question is just where to put it.

@MansMeg
Copy link
Collaborator

MansMeg commented Sep 21, 2020

Hi again!

So now we have discussed and since this information is already available in the data block, we think it is better to have a script to access it rather than the actual results. In this way, we would not store any information that could fall out of sync (due to changes in models).

Hence it would be great if you could contribute the script, rather than the JSON information - if that would be ok for you?

@mandel
Copy link
Author

mandel commented Sep 22, 2020

HI,

I understand your point of view.

I am happy to cleanup the code such that it emits the information. But I don't know how to contribute this code (it is a fork of the Stan compiler). Do you have anything in mind?

@MansMeg
Copy link
Collaborator

MansMeg commented Sep 28, 2020

Hi. I close this PR and continue this discussion as an issue instead.

I don't have any quick solution. Maybe the simplest is that we just refer to your repo if you have a README specifying how this can be done in a simple way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants