
Can I use document retriever component only? #4

Open
serenayj opened this issue Jun 23, 2022 · 3 comments

@serenayj

Hi,

Congrats on finishing such nice work! I would like to test my encoder (document reader) and want to use only the IR document retriever component. Could you tell me where to find this part of the code and how to use it? Thank you in advance!

@jind11
Owner

jind11 commented Jun 29, 2022

I am sorry for the late reply, and thanks for reaching out! This code base provides the Elasticsearch-based IR baseline; you can follow the README to set it up. For text (sentence or paragraph) retrieval specifically, refer to this file: https://github.com/jind11/MedQA/blob/master/IR/aristomini/solvers/textsearch.py
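If you only want to feed retrieved passages to your own reader and cannot run an Elasticsearch server, a rough stand-in is an in-memory BM25 ranker, since BM25 is Elasticsearch's default similarity. The sketch below is a swapped-in technique, not the repository's textsearch.py. The class name `BM25Retriever` is hypothetical; `k1=1.2` and `b=0.75` follow Lucene's defaults, and the regex tokenizer is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Retriever:
    """Hypothetical in-memory BM25 ranker; a stand-in for the Elasticsearch baseline."""

    def __init__(self, passages: list[str], k1: float = 1.2, b: float = 0.75):
        self.passages = passages
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(p) for p in passages]
        self.term_counts = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / max(len(self.docs), 1)
        # Document frequency: how many passages contain each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Lucene-style IDF; the "1 +" keeps it non-negative for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_terms: list[str], idx: int) -> float:
        """BM25 score of passage `idx` for the given query terms."""
        tf = self.term_counts[idx]
        dl = len(self.docs[idx])
        total = 0.0
        for t in query_terms:
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[str]:
        """Return the k highest-scoring passages, best first (ties by position)."""
        q = tokenize(query)
        ranked = sorted(range(len(self.docs)), key=lambda i: (-self.score(q, i), i))
        return [self.passages[i] for i in ranked[:k]]
```

Usage: build the index over textbook sentences or paragraphs, then call `top_k(question_text, k)` and pass the results to your reader.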

@serenayj
Author

Hi,

Thanks for answering my question!

A follow-up question: in the paper, where you describe fine-tuning the pre-trained BERT models, you mention:
"Specifically, we construct the input sequence by concatenating [CLS], tokens in c, [SEP], tokens in qai, [SEP], where [CLS] and [SEP] are the classifier token and sentence separator in a pre-trained language model, respectively."
My understanding is that the context c is a concatenation of all the textbooks. Wouldn't concatenating the question, the answer, and c then exceed BERT's token limit?

@jind11
Owner

jind11 commented Jul 12, 2022

Here c is the top-K sentences/paragraphs retrieved from the textbooks, so there is no need to concatenate all the textbooks.
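To make the length budget concrete, here is a minimal sketch of assembling "[CLS] c [SEP] q a_i [SEP]" from top-K retrieved passages and truncating only the context so the sequence fits the model's limit. The function name `build_input`, the 512-token default, and whitespace splitting in place of a real subword tokenizer are all assumptions for illustration, not the repository's code.

```python
def build_input(passages: list[str], question: str, option: str,
                max_len: int = 512) -> list[str]:
    """Hypothetical helper: return [CLS] c [SEP] q a_i [SEP] as a token list.

    Whitespace splitting stands in for a real subword tokenizer (assumption).
    Only the context c is truncated; the question and option are kept whole.
    """
    qa_tokens = question.split() + option.split()
    # Room left for c after the three special tokens and the q + a_i tokens.
    budget = max_len - 3 - len(qa_tokens)
    if budget < 0:
        raise ValueError("question and option alone exceed max_len")
    context_tokens: list[str] = []
    for passage in passages:  # assumed ordered best-first, as returned by the retriever
        context_tokens.extend(passage.split())
    context_tokens = context_tokens[:budget]
    return ["[CLS]", *context_tokens, "[SEP]", *qa_tokens, "[SEP]"]
```

For a multiple-choice question this would be built once per option a_i, with the same retrieved c each time; the lowest-ranked passages are the ones cut when the budget runs out.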
