
added initial evals #8

Merged
merged 5 commits into main, Sep 12, 2023

Conversation

@shamanez (Member) commented Sep 10, 2023

**not completed**

@Jacobsolawetz (Contributor) commented:

😍


def forward(self, task, model, input_ids, attention_mask):

if task == "retrieval":
Contributor:

TaskType
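The one-word comment above seems to suggest replacing the raw `"retrieval"` string comparison with an enum. A minimal sketch of what that could look like (the `TaskType` name and its members are assumptions, not taken from the PR; `dispatch` is a hypothetical stand-in for the `forward` branch):

```python
from enum import Enum


class TaskType(str, Enum):
    # Subclassing str keeps equality with plain strings working,
    # so existing call sites that still pass "retrieval" are unaffected.
    RETRIEVAL = "retrieval"
    GENERATION = "generation"


def dispatch(task):
    # Hypothetical dispatcher standing in for the forward() branch above.
    if task == TaskType.RETRIEVAL:
        return "retrieval path"
    return "generation path"
```

Because the members are `str` subclasses, `dispatch("retrieval")` and `dispatch(TaskType.RETRIEVAL)` behave identically, which makes the migration incremental.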

# the number of bi-directional links created for every new element during
# construction. Reasonable range for M is 2-100. Higher M work better on
# datasets with high intrinsic dimensionality and/or high recall,
# while low M work better for datasets with low intrinsic dimensionality and/or
Contributor:

how do you measure "intrinsic dimensionality"? I'm curious both qualitatively and quantitatively

# low recalls
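One common quantitative answer to the question above is the Two-NN estimator (Facco et al., 2017): for each point take the ratio mu = r2/r1 of its second- to first-nearest-neighbour distance; the maximum-likelihood estimate of the intrinsic dimension is N / sum(ln mu). A brute-force, pure-Python sketch for illustration (not part of the PR):

```python
import math
import random


def two_nn_dimension(points):
    """Two-NN intrinsic-dimension estimate: for each point, take the
    ratio mu = r2/r1 of its 2nd- to 1st-nearest-neighbour distance;
    the MLE of the dimension is N / sum(ln mu)."""
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        r1, r2 = dists[0], dists[1]
        if r1 > 0:
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


random.seed(0)
# A 1-D manifold embedded in 3-D: the estimate should come out near 1,
# even though the ambient dimension is 3.
line = [(t, 2.0 * t, -t) for t in (random.random() for _ in range(200))]
d_est = two_nn_dimension(line)
```

Qualitatively, a high estimate relative to a fixed M suggests HNSW will need more links per node to keep recall up, which matches the guidance in the hnswlib comment above.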

# Initializing index - the maximum number of elements should be known beforehand
search_index.init_index(max_elements=num_elements, ef_construction=200, M=100)
Contributor:

Let's figure out how to make these values dynamic for a given dataset. Even something as simple as building a heuristic around tf-idf
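As one simple option in the direction of this suggestion, the parameters could be scaled from dataset size and embedding dimensionality rather than hard-coded. This heuristic and all its constants are assumptions for illustration, not a tuned recipe:

```python
import math


def hnsw_params(num_elements, dim):
    """Hypothetical heuristic: grow M with corpus size and embedding
    dimensionality, clamped to hnswlib's documented reasonable range
    of 2-100; set ef_construction to a small multiple of M."""
    m = int(8 * math.log10(max(num_elements, 10)) + dim // 64)
    m = max(2, min(m, 100))
    ef_construction = max(100, 4 * m)
    return m, ef_construction
```

For example, a small corpus gets a modest M while a million-passage corpus with 768-d embeddings gets a larger one; proper values would still need a recall/latency sweep per dataset.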

@@ -0,0 +1,120 @@
import numpy as np
import hnswlib
Contributor:

any particular reason we use this over annoy, faiss, or even lancedb?

Is this index out-of-core or is it stored in-memory?

k, search_index, query_embeddings, ids_to_cat_dict, threshold=0.7
):
# Controlling the recall by setting ef:
search_index.set_ef(100) # ef should always be > k
Contributor:

let's make k and ef global variables then so we can ensure that ef >> k
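A minimal sketch of that suggestion: module-level constants with a guard so the ef > k invariant is checked once at import time rather than re-asserted at every call site (names and values here are illustrative assumptions):

```python
# Module-level search constants (values are illustrative).
TOP_K = 10
EF_SEARCH = 100  # hnswlib requires ef >= k; keep it well above TOP_K

assert EF_SEARCH > TOP_K, "ef must exceed k for hnswlib queries"


def query(index, embeddings, k=TOP_K, ef=EF_SEARCH):
    # Hypothetical wrapper: set ef once, then run the k-NN query.
    index.set_ef(ef)
    return index.knn_query(embeddings, k=k)
```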

).detach().float().cpu()

start_index = step * args.test_batch_size
end_index = start_index + args.test_batch_size if (start_index + args.test_batch_size) < num_passages else num_passages
Contributor:

Suggested change
end_index = start_index + args.test_batch_size if (start_index + args.test_batch_size) < num_passages else num_passages
end_index = min(start_index + args.test_batch_size, num_passages)

)
search_results = get_nearest_neighbours(args.top_k, passage_search_index, query_embeddings, passage_to_id_dict, threshold=0.0)

retrieved_cats = [item[0] for item in search_results]
Contributor:

A little easier to understand:

Suggested change
retrieved_cats = [item[0] for item in search_results]
retrieved_cats = [cat for cat, _ in search_results]

Comment on lines +238 to +239
query:text , query:embeddings # you need to take both of things in to the account
passage:text , query:embeddings
Contributor:

what are these lines doing? What is this syntax?

for seq in sequences:
print(f"Result: {seq['generated_text']}")

# use regex and see whether the answer is the
@Ben-Epstein (Contributor) commented Sep 11, 2023:

How do we eval the generation? gpt-4? Check for hallucinations? Bleu/rouge against original answer?
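Among the options raised, the cheapest reference-based baseline is a SQuAD-style token-overlap F1 against the original answer (it is also roughly what ROUGE-1 F1 computes). A hand-rolled sketch, not the repo's eval:

```python
import re
from collections import Counter


def token_f1(prediction, reference):
    """SQuAD-style token-overlap F1 between a generated answer and the
    reference answer: harmonic mean of token precision and recall."""
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", reference.lower())
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

This catches missing or spurious content but not hallucinated facts phrased with overlapping tokens, so an LLM-as-judge pass would still be a useful complement.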

@shamanez shamanez merged commit bd591e4 into main Sep 12, 2023
@shamanez shamanez deleted the revisit-evals branch September 12, 2023 10:14