
2 issues: the range of documents when computing cross-document attention, and the size of the SentenceTransformer embedding u_k/v_k and sentential encoding e #6

Open
xxr5566833 opened this issue May 23, 2021 · 0 comments


xxr5566833 commented May 23, 2021

  1. preprocess.py

When using the SentenceTransformer pretrained model to encode a document (title + abstract), the document collection is determined by the "files_path" variable in preprocess.py.

Why did you comment out "data/keyphrase/json/kp20k/kp20k_train.json" (i.e. add # at the beginning of that line)?

I think the documents in kp20k_train.json should be included when computing the cross-document attention, as your paper describes.
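For reference, this is a minimal sketch of how I understand the files_path list in preprocess.py; the second entry is a placeholder I made up, only the kp20k_train.json path comes from the actual file:

```python
# Sketch of the files_path list in preprocess.py as I understand it; only the
# kp20k_train.json path is from the repository, the other entry is a placeholder.
files_path = [
    # "data/keyphrase/json/kp20k/kp20k_train.json",  # this line is commented out in the repo
    "data/keyphrase/json/<other_split>.json",         # placeholder for the remaining entries
]

# Including kp20k_train.json again would mean uncommenting that line, so its
# documents are also encoded and available for cross-document attention.
```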

  2. the size of e and u_k/v_k

I changed the SentenceTransformer model, so u_k/v_k now have a different size. Should word_vec_size be set to the SentenceTransformer model's embedding size?
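If they do need to match, this is the kind of check I have in mind (a minimal sketch using the standard sentence-transformers API; the model name is only an example, not necessarily the one used in this repository):

```python
from sentence_transformers import SentenceTransformer

# Query the embedding size of whichever SentenceTransformer model is plugged in,
# so word_vec_size (and therefore the size of u_k/v_k and e) could be set to match.
model = SentenceTransformer("paraphrase-distilroberta-base-v1")  # example model only
embedding_dim = model.get_sentence_embedding_dimension()         # e.g. 768 for this model
print(f"word_vec_size would need to be {embedding_dim} to match u_k/v_k and e")
```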
