added initial evals #8
Conversation
😍

    def forward(self, task, model, input_ids, attention_mask):

        if task == "retrieval":
TaskType
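Presumably this points at replacing the raw string check with an enum. A minimal sketch of that idea; the enum members and the non-retrieval task name are assumptions, not taken from the PR:

    from enum import Enum

    class TaskType(str, Enum):
        RETRIEVAL = "retrieval"
        GENERATION = "generation"   # assumed second task, for illustration only

    def forward(self, task: TaskType, model, input_ids, attention_mask):
        if task == TaskType.RETRIEVAL:
            ...

Backing the enum with str keeps existing call sites that pass "retrieval" working, since TaskType.RETRIEVAL compares equal to the plain string and TaskType("retrieval") parses it.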
    # the number of bi-directional links created for every new element during
    # construction. Reasonable range for M is 2-100. Higher M work better on
    # datasets with high intrinsic dimensionality and/or high recall,
    # while low M work better for datasets with low intrinsic dimensionality and/or
how do you measure "intrinsic dimensionality"? I'm curious both qualitatively and quantitatively
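For reference, a rough sketch (not part of the PR) of two common ways to put a number on intrinsic dimensionality for a set of embeddings: a PCA explained-variance count and the Two-NN estimator. The embeddings variable and helper names are hypothetical.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    def pca_intrinsic_dim(embeddings, variance=0.90):
        # number of principal components needed to explain `variance` of the data
        ratios = PCA().fit(embeddings).explained_variance_ratio_
        return int(np.searchsorted(np.cumsum(ratios), variance)) + 1

    def twonn_intrinsic_dim(embeddings):
        # Two-NN estimator (Facco et al., 2017): MLE fit on the ratio of each
        # point's 2nd to 1st nearest-neighbour distance
        dist, _ = NearestNeighbors(n_neighbors=3).fit(embeddings).kneighbors(embeddings)
        mask = dist[:, 1] > 0                  # drop exact duplicates; dist[:, 0] is the point itself
        mu = dist[mask, 2] / dist[mask, 1]
        return len(mu) / np.sum(np.log(mu))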
    # low recalls

    # Initializing index - the maximum number of elements should be known beforehand
    search_index.init_index(max_elements=num_elements, ef_construction=200, M=100)
Let's figure out how to make these values dynamic for a given dataset. Even something as simple as building a heuristic around tf-idf
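One possible shape for such a heuristic, sketched as an assumption rather than a concrete proposal: scale M with embedding dimensionality and ef_construction with corpus size, clamped to hnswlib's recommended ranges. The constants and the `embeddings` name are illustrative only.

    def pick_hnsw_params(num_elements, dim):
        # illustrative heuristic only; the constants are guesses, not tuned values
        M = int(min(100, max(16, dim // 8)))
        ef_construction = int(min(800, max(200, 2 * M, num_elements // 100)))
        return M, ef_construction

    # `embeddings` stands in for whatever matrix is being indexed
    M, ef_construction = pick_hnsw_params(num_elements, dim=embeddings.shape[1])
    search_index.init_index(max_elements=num_elements, ef_construction=ef_construction, M=M)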
@@ -0,0 +1,120 @@
    import numpy as np
    import hnswlib
any particular reason we use this over annoy, faiss, or even lancedb?
Is this index out-of-core or is it stored in-memory?
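For context (not an answer from the PR itself): hnswlib holds the whole graph in RAM; the index can be serialized to disk and re-loaded, but queries always run against the in-memory structure. A small sketch with illustrative sizes and paths:

    import numpy as np
    import hnswlib

    dim, n = 768, 10_000
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=n, ef_construction=200, M=16)
    index.add_items(np.random.rand(n, dim).astype(np.float32))

    index.save_index("passages.bin")                       # snapshot on disk

    restored = hnswlib.Index(space="cosine", dim=dim)
    restored.load_index("passages.bin", max_elements=n)    # loaded fully back into RAM
    labels, distances = restored.knn_query(np.random.rand(1, dim).astype(np.float32), k=5)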
        k, search_index, query_embeddings, ids_to_cat_dict, threshold=0.7
    ):
        # Controlling the recall by setting ef:
        search_index.set_ef(100)  # ef should always be > k
let's make k and ef global variables then, so we can ensure that ef >> k
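A minimal sketch of that suggestion, with assumed constant names: define k and ef once at module level and assert the invariant up front.

    TOP_K = 10         # assumed value
    EF_SEARCH = 100    # ef should comfortably exceed k for good recall

    assert EF_SEARCH > TOP_K, "hnswlib needs ef > k; keep a healthy margin"

    # ... then inside the search helper:
    search_index.set_ef(EF_SEARCH)
    labels, distances = search_index.knn_query(query_embeddings, k=TOP_K)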
        ).detach().float().cpu()

        start_index = step * args.test_batch_size
        end_index = start_index + args.test_batch_size if (start_index + args.test_batch_size) < num_passages else num_passages
Suggested change:
-       end_index = start_index + args.test_batch_size if (start_index + args.test_batch_size) < num_passages else num_passages
+       end_index = min(start_index + args.test_batch_size, num_passages)
        )
        search_results = get_nearest_neighbours(args.top_k, passage_search_index, query_embeddings, passage_to_id_dict, threshold=0.0)

        retrieved_cats = [item[0] for item in search_results]
a little easier to understand:
Suggested change:
-       retrieved_cats = [item[0] for item in search_results]
+       retrieved_cats = [cat for cat, _ in search_results]
    query:text , query:embeddings  # you need to take both of these things into account
    passage:text , query:embeddings
what are these lines doing? What is this syntax?
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

    # use regex and see whether the answer is the
How do we eval the generation? gpt-4? Check for hallucinations? Bleu/rouge against original answer?
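Sketch of one of the options mentioned (ROUGE against the original answer), using the rouge-score package; the generated_answers / reference_answers lists are assumed names, not variables from the PR:

    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

    rouge_l = []
    for generated, reference in zip(generated_answers, reference_answers):
        scores = scorer.score(reference, generated)   # signature is score(target, prediction)
        rouge_l.append(scores["rougeL"].fmeasure)

    print(f"mean ROUGE-L F1: {sum(rouge_l) / len(rouge_l):.3f}")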
**not completed**