add locality calibration and text generation #120
base: main
Conversation
I think it makes sense to have a variety of components available in the core library, so I'm generally fine with adding locality-based calibration and context generation here, but as written I think this crosses the line where it makes more sense for this code to live in a fork of the repo. In particular, it's fine if you want to use openai in your experiment, but I'm very hesitant to add it as a general dependency of this library that everyone will need to install in order to build recommenders. I'm also a bit concerned about making calls out to an external service—it might work fine but that makes the recommender vulnerable to upstream issues with the OpenAI API that are outside our control.
I can see two potential paths forward, depending on how important OpenAI is for what you want to do:
- Use a local text generation model from Hugging Face (since we're already depending on transformers)
- Move this code to a fork of the repo and invest some time in making the platform more resilient to timeouts and failures
Thoughts on which seems more workable?
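For reference, a minimal sketch of the first option using the transformers pipeline API; the model choice and prompt handling here are placeholders, not a prescription:

from transformers import pipeline

# Placeholder local model; any instruction-tuned HF model could be swapped in
generator = pipeline("text-generation", model="distilgpt2")

def local_generate(system_prompt, input_prompt, max_new_tokens=128):
    # Combine the prompts the same way the OpenAI call does today
    prompt = f"{system_prompt}\n\n{input_prompt}"
    output = generator(prompt, max_new_tokens=max_new_tokens, num_return_sequences=1)
    # The pipeline echoes the prompt, so strip it from the generated text
    return output[0]["generated_text"][len(prompt):].strip()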
self.add_article_to_localities(rec_localities_with_candidate, article)
return normalized_category_count(rec_localities_with_candidate)

def calibration(
Overriding this method to calibrate across the two specific dimensions you care about for this study is an expedient way to get where you want to go and probably makes sense here. Outside the scope of this PR and not a requested change to this code, but we should also think about how to make the default implementation in the base class flexible enough to handle an arbitrary number of dimensions.
A rough idea: in general, the calibration calculation in this function should be the same regardless of the number of dimensions (each dimension just brings its own preference distribution and theta). So, would it be possible to make the preference and theta inputs one-to-one matched lists (e.g., [topic preference, locality preference] and [topic theta, locality theta])? Then we could enumerate the calibration terms over the list 🤔
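A very rough sketch of that idea, where calibration() stands in for the existing single-dimension calculation and all names are placeholders:

def multi_dimension_calibration(preferences, thetas, rec_distributions):
    # preferences:       e.g. [topic_preference, locality_preference]
    # thetas:            e.g. [topic_theta, locality_theta], matched one-to-one with preferences
    # rec_distributions: the recommendation-side distribution for each dimension
    score = 0.0
    for preference, theta, rec_dist in zip(preferences, thetas, rec_distributions):
        # calibration() stands in for the existing per-dimension calibration term
        score += theta * calibration(preference, rec_dist)
    return score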
def add_article_to_localities(self, rec_localities, article):
    localities = extract_locality(article)
    for local in localities:
        rec_localities[local] = rec_localities.get(local, 0) + 1
These methods that modify their parameters should work fine but are a little awkward compared to the methods below that return a value. Maybe the two methods above could also copy the parameter, make modifications, and return the modified copy?
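For example, roughly (reusing the existing extract_locality helper):

def add_article_to_localities(self, rec_localities, article):
    # Copy the input so the caller's dict isn't mutated, then return the updated copy
    updated = dict(rec_localities)
    for local in extract_locality(article):
        updated[local] = updated.get(local, 0) + 1
    return updated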
return gpt_generate(system_prompt, input_prompt)


model = SentenceTransformer("all-MiniLM-L6-v2")
Makes sense that you'd need to embed the subheads separately here since the article embedder only does the headlines right now. I'd probably use the same language model for this that the NRMS model is based on, which I think is distilbert-base-uncased
Also worth thinking about whether the article embedder should be configurable to specify which article text fields (headline, subhead, body text) to embed. You can do it here but this probably isn't the last time we're going to run into this, so might make sense to move it upstream and make it re-usable.
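A rough sketch of what making it configurable could look like; the field names and the embedder interface here are assumptions rather than the current API:

class ConfigurableArticleEmbedder:
    def __init__(self, model, text_fields=("headline",)):
        # text_fields controls which article text fields get concatenated before embedding
        self.model = model
        self.text_fields = text_fields

    def embed(self, article):
        text = " ".join(getattr(article, field, "") or "" for field in self.text_fields)
        return self.model.encode(text)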
I’m thinking, for handling headlines, should we use the original distilbert transformer, or can we easily extract embeddings from our fine-tuned distilbert on the MIND data? I agree that making this part more flexible for selecting configurations and text fields would be beneficial.
It seems like it should be possible to use the fine-tuned one by loading the fine-tuned weights into a plain distilbert model
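Something along these lines should work, assuming the fine-tuned checkpoint's encoder weights map onto the base architecture (the checkpoint path is a placeholder):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

# Load the fine-tuned NRMS weights; strict=False skips any NRMS-specific heads
state_dict = torch.load("path/to/fine_tuned_nrms_checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=False)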
recommended_list = text_generation(candidate_articles.articles, article_indices, interest_profile)
return ArticleSet(articles=recommended_list)
This seems like a whole separate component to me, since the text generation gets applied to every selected article and isn't otherwise coupled to the calibration code. I'd probably use the article_indices and candidate_articles to build and return an ArticleSet here:
Suggested change:
- recommended_list = text_generation(candidate_articles.articles, article_indices, interest_profile)
- return ArticleSet(articles=recommended_list)
+ return ArticleSet(articles=[candidate_articles.articles[idx] for idx in article_indices])
and then wire that output into a new component (called something like ContextGenerator?) that does the text generation part. More on this in the text_generation code below.
This was a challenge I was referring to a few weeks ago over Slack. Text generation shouldn't be applied to every article, so we'd need to set a flag of sorts if we want to pull out this functionality to another component.
Yeah, you can set whatever properties you want on ArticleSet, so you could add a list or tensor with boolean flags to indicate whether you want it applied or not, and that'll get sent along to subsequent components. I might be missing something but I don't see where in the current code the equivalent selection is happening—it appears to be applying LLM-generated text to all recommended articles?
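For example, something like this; should_generate_context() is a placeholder for whatever selection logic you have in mind:

selected = ArticleSet(articles=selected_articles)
# Parallel boolean list; downstream components check it before generating any text
selected.needs_generated_context = [
    should_generate_context(article) for article in selected_articles
]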
Have moved text_generation out of the __call__ method.
return top_k_indices


def text_generation(candidate_articles, article_indices, interest_profile):
It looks like candidate_articles and article_indices are used here to reconstruct the equivalent of an ArticleSet of the selected articles. If this turns into a separate component, then the __call__() method could receive the same information using an ArticleSet and an InterestProfile as inputs.
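In other words, the separate component sketched above could look roughly like this; the ContextGenerator name, the import location of InterestProfile, and generate_context() are all placeholders:

from poprox_concepts import ArticleSet, InterestProfile

class ContextGenerator:
    def __call__(self, recommended: ArticleSet, interest_profile: InterestProfile) -> ArticleSet:
        for article in recommended.articles:
            # generate_context() stands in for the LLM-backed text generation;
            # writing to subhead is just illustrative of where the generated text might go
            article.subhead = generate_context(article, interest_profile)
        return recommended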
@@ -3,8 +3,9 @@

from poprox_concepts import ArticleSet
from poprox_concepts.api.recommendations import RecommendationRequest, RecommendationResponse
from poprox_recommender.components.diversifiers.locality_calibration import user_interest_generate
The handler shouldn't know anything about individual recommenders, so importing code from this recommender-specific component is a clue that the code below is probably in the wrong place. Could these changes live somewhere in the locality_calibration module?
Moved to the locality_calibration module.
Just to confirm, could we modify the code in handler.py in our forked repository?
The tests are failing due to the
I agree with @karlhigley's comments on dependencies — let's keep everything in the main recommenders repo self-contained (no external services).
Looks good to me overall! I think we can determine the number of clicked articles (k) after doing some internal testing.
For the cold start implementation, I don't see a current need for it (assuming we can exclude inactive users in the pre-experiment observation window).
As for the clicked-article sorting, I think it'll be useful for accounting for user memory decay (i.e., a clicked article from last week would be more salient to users than one from two months ago). Maybe we can fuse the chronological factor into the embedding-similarity computation, so candidate article selection considers another layer beyond content embeddings?
Final note: regarding Karl and Michael's comments, I think it might be better to migrate this PR to our forked repo?
similarities = cosine_similarity([target_embedding], candidate_embeddings)[0]
top_k_indices = np.argsort(similarities)[-k:][::-1]
Here's what I'm thinking about how we could fuse chronological order into candidate selection:
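Something like this rough sketch; the exponential decay and how it combines with similarity are placeholders to discuss:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def select_top_k(target_embedding, candidate_embeddings, ages_in_days, k, decay_rate=0.05):
    # Content similarity between the target and each clicked article
    similarities = cosine_similarity([target_embedding], candidate_embeddings)[0]
    # Exponential memory decay: recently clicked articles keep more of their weight
    recency_weights = np.exp(-decay_rate * np.asarray(ages_in_days))
    fused = similarities * recency_weights
    return np.argsort(fused)[-k:][::-1]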
Description:
This PR introduces the basic framework for integrating locality calibration and text generation into the recommender system for the locality experiment. The main components are as follows:
- Locality Calibration: Calibrates recommendations based on today’s local news distribution to align with relevant geographical content.
- Text Generation: Generates personalized narratives for recommended articles by referencing similar articles previously read by the user.
To-Do: