First of all, many thanks for the R package "oolong"! I have a brief question about the description of the p-values in the Overview vignette. It states that the Word Intrusion Test assumes the null hypothesis "H0: MP is not better than 1/n_top_terms". However, if I understand it correctly, random guessing would correspond to 1/(number of candidates) = 1/(n_top_terms + 1), because each question shows the top terms plus one intruder. Am I misunderstanding something here?
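To make the difference concrete, here is a minimal sketch of the arithmetic in plain Python. It is not oolong's actual code, and I am assuming a simple exact one-sided binomial test only for illustration. The helper `binom_tail` and the values for `n_top_terms`, `n_questions`, and `n_correct` are hypothetical.

```python
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided exact p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical setup: 5 top terms plus 1 intruder per question.
n_top_terms = 5
n_candidates = n_top_terms + 1

p_vignette = 1 / n_top_terms   # chance rate as worded in the vignette
p_chance = 1 / n_candidates    # chance rate when guessing among all candidates

# Hypothetical result: 8 correct answers out of 20 questions.
n_questions = 20
n_correct = 8

p_value_vignette = binom_tail(n_correct, n_questions, p_vignette)
p_value_chance = binom_tail(n_correct, n_questions, p_chance)

print(f"H0 rate 1/n_top_terms       = {p_vignette:.4f}, p-value = {p_value_vignette:.4f}")
print(f"H0 rate 1/(n_top_terms + 1) = {p_chance:.4f}, p-value = {p_value_chance:.4f}")
```

Under these assumptions, the null rate 1/n_top_terms is the larger of the two, so testing against it gives a larger (more conservative) p-value than testing against 1/(n_top_terms + 1).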
I also have a question about the number of topics evaluated in the Word Intrusion Test. I read in another issue that it is critical not to evaluate all topics of the model in the Word Intrusion Test. In my specific use case, however, evaluating all of them was necessary because I was only interested in the seeded topics of keyATM and seededLDA, so I changed the code locally for this purpose. I am unsure whether I also need to change the statistical tests: do they use the number of questions or the number of topics of the model as the number of trials "n"? If you could help me here, I would be very grateful!
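The reason I ask is that the choice of "n" changes the p-value even when the observed precision stays the same. The sketch below is plain Python, not oolong's actual code. It assumes an exact one-sided binomial test, and the function `exact_p_value`, the null rate, and the counts are all hypothetical.

```python
from math import comb


def exact_p_value(n_correct, n_trials, null_rate):
    """One-sided exact binomial p-value: P(X >= n_correct) under the null rate."""
    return sum(
        comb(n_trials, i) * null_rate**i * (1 - null_rate) ** (n_trials - i)
        for i in range(n_correct, n_trials + 1)
    )


# Hypothetical null rate: guessing among 5 top terms plus 1 intruder.
null_rate = 1 / 6

# Same observed precision (40 %) with two different trial counts,
# e.g. 10 questions answered versus 20 topics in the model.
p_small_n = exact_p_value(n_correct=4, n_trials=10, null_rate=null_rate)
p_large_n = exact_p_value(n_correct=8, n_trials=20, null_rate=null_rate)

print(f"n = 10: p-value = {p_small_n:.4f}")
print(f"n = 20: p-value = {p_large_n:.4f}")
```

With these hypothetical numbers, the larger "n" gives the smaller p-value, so the result could differ depending on which count the tests use.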