Description for the p-values #76

Closed
Jenny060399 opened this issue Oct 3, 2023 · 1 comment

@Jenny060399

First of all, many thanks for the R package "oolong"! I have a brief question about the description of the p-values in the Overview vignette. It says that for the word intrusion test the null hypothesis is "H0: MP is not better than 1/n_top_terms". However, if I understand correctly, random guessing would correspond to 1/(number of candidates) = 1/(n_top_terms + 1), because of the intruder. Am I misunderstanding something here?

In addition, I have another question about the number of topics to evaluate in the word intrusion test. I read in another issue that it is critical not to evaluate all topics of the model in the word intrusion test, but in my specific use case this was necessary, because I was only interested in the seeded topics of keyATM and seededLDA. I changed the code locally for this purpose. However, I am unsure whether I also need to change the statistical tests, i.e. do they use the number of questions or the number of topics in the model as the number of trials "n"? If you could help me here, I would be very grateful!

@chainsawriot
Collaborator

@Jenny060399

You are right: the chance level under the null hypothesis is determined by the number of choices, i.e. 1/(n_top_terms + 1).
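Written out, the corrected null reads as follows. This assumes (it is not stated in this thread) that the p-value comes from a one-sided exact binomial test, where x is the number of correctly identified intruders and n is the number of questions answered:

$$
H_0:\ \mathrm{MP} \le p_0 = \frac{1}{n_{\text{top\_terms}} + 1},
\qquad
p\text{-value} = \Pr(X \ge x) = \sum_{k=x}^{n} \binom{n}{k}\, p_0^{k} (1 - p_0)^{n-k}
$$

With the default of five top terms, for example, each question has six candidates, so p_0 = 1/6 rather than 1/5.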

https://github.com/chainsawriot/oolong/blob/55bb0173b005288e635c937586d26bfa9b7e2bad/R/oolong_summary_tm.R#L43-L59

I will update the overview accordingly.

Statistically, reducing K (for whatever reason) should not affect the p-value.
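A minimal Python sketch of how such a p-value could be computed. It assumes, without confirmation from this thread, a one-sided exact binomial test in which n is the number of word intrusion questions answered (one per evaluated topic) and the chance level is 1/(n_top_terms + 1). The functions `binom_p_greater` and `word_intrusion_p` are hypothetical and not part of oolong, which is an R package; the linked `oolong_summary_tm.R` shows what it actually computes.

```python
from math import comb


def binom_p_greater(correct: int, n_questions: int, p0: float) -> float:
    """One-sided exact binomial p-value, P(X >= correct), with X ~ Binomial(n_questions, p0)."""
    if not 0 <= correct <= n_questions:
        raise ValueError("correct must lie between 0 and n_questions")
    # Sum the binomial probability mass over all outcomes at least as extreme as observed.
    return sum(
        comb(n_questions, k) * p0**k * (1 - p0) ** (n_questions - k)
        for k in range(correct, n_questions + 1)
    )


def word_intrusion_p(correct: int, n_questions: int, n_top_terms: int = 5) -> float:
    """Hypothetical helper: p-value for a word intrusion test.

    Assumption: each question shows n_top_terms top terms plus one intruder,
    so random guessing succeeds with probability 1 / (n_top_terms + 1).
    n_questions is the number of questions actually answered, e.g. only
    the seeded topics, not the total number of topics in the model.
    """
    p0 = 1 / (n_top_terms + 1)
    return binom_p_greater(correct, n_questions, p0)
```

For example, `word_intrusion_p(6, 10)` would test 6 correct answers out of 10 seeded-topic questions. Under this assumption, evaluating fewer topics only changes n, so the test itself needs no modification; fewer questions do, however, give the test less statistical power.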

chainsawriot added a commit that referenced this issue Oct 3, 2023