add locality calibration and text generation #120

Open · wants to merge 1 commit into base: main
Conversation

imrecommender (Contributor):

Description:

This PR introduces the basic framework for integrating locality calibration and text generation into the recommender system for the locality experiment. The main components are as follows:

  • Locality Calibration: Calibrates recommendations based on today’s local news distribution to align with relevant geographical content.
  • Text Generation: Generates personalized narratives for recommended articles by referencing similar articles previously read by the user.

To-Do:

  1. Determine the optimal number of clicked articles to use when generating each narrative.
  2. Decide whether to implement a cold-start handling mechanism for narrative generation.
  3. Confirm whether clicked articles should be sorted in chronological or reverse-chronological order for narrative accuracy.

@karlhigley (Collaborator) left a comment:

I think it makes sense to have a variety of components available in the core library, so I'm generally fine with adding locality-based calibration and context generation here, but as written I think this crosses the line where it makes more sense for this code to live in a fork of the repo. In particular, it's fine if you want to use openai in your experiment, but I'm very hesitant to add it as a general dependency of this library that everyone will need to install in order to build recommenders. I'm also a bit concerned about making calls out to an external service—it might work fine but that makes the recommender vulnerable to upstream issues with the OpenAI API that are outside our control.

I can see two potential paths forward, depending on how important OpenAI is for what you want to do:

  • Use a local text generation model from Hugging Face (since we're already depending on transformers)
  • Move this code to a fork of the repo and invest some time in making the platform more resilient to timeouts and failures

Thoughts on which seems more workable?
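
For reference, here's a minimal sketch of the first option: swapping the gpt_generate() call for a local Hugging Face pipeline. The model name and the helper name below are illustrative assumptions, not decisions.

from transformers import pipeline

# Small instruction-tuned model chosen only for illustration
_generator = pipeline("text2text-generation", model="google/flan-t5-small")

def local_generate(system_prompt: str, input_prompt: str, max_new_tokens: int = 128) -> str:
    # Seq2seq instruction models take a single input string, so concatenate the prompts
    prompt = f"{system_prompt}\n\n{input_prompt}"
    outputs = _generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]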

self.add_article_to_localities(rec_localities_with_candidate, article)
return normalized_category_count(rec_localities_with_candidate)

def calibration(

Collaborator:

Overriding this method to calibrate across the two specific dimensions you care about for this study is an expedient way to get where you want to go and probably makes sense here. Outside the scope of this PR and not a requested change to this code, but we should also think about how to make the default implementation in the base class flexible enough to handle an arbitrary number of dimensions.

Contributor Author:

A rough idea: in general, the calibration calculation in this function should be the same regardless of the number of dimensions (each dimension has a preference distribution and a defined theta). So, could we make the preference and theta inputs one-to-one matched lists (e.g. [topic preference, locality preference] and [topic theta, locality theta])? Then we could iterate over the list and sum the calibration terms 🤔
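
To make that idea concrete, here's a rough sketch under the assumption that each dimension contributes a divergence-style calibration term; all names are hypothetical.

def calibrated_score(relevance, preferences, slate_dists, thetas, divergence):
    # preferences, slate_dists, and thetas are parallel lists, one entry per
    # dimension (e.g. [topic, locality]); divergence(p, q) measures how far the
    # slate distribution q drifts from the preference distribution p
    score = relevance
    for pref, slate, theta in zip(preferences, slate_dists, thetas):
        score -= theta * divergence(pref, slate)
    return score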

def add_article_to_localities(self, rec_localities, article):
    localities = extract_locality(article)
    for local in localities:
        rec_localities[local] = rec_localities.get(local, 0) + 1

Collaborator:

These methods that modify their parameters should work fine but are a little awkward compared to the methods below that return a value. Maybe the two methods above could also copy the parameter, make modifications, and return the modified copy?
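
Something like this, perhaps (a sketch of the suggestion, reusing the existing extract_locality helper, not a requested implementation):

def add_article_to_localities(self, rec_localities, article):
    # Copy the input mapping, update the copy, and return it so the
    # caller's dict is left untouched
    updated = dict(rec_localities)
    for local in extract_locality(article):
        updated[local] = updated.get(local, 0) + 1
    return updated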

return gpt_generate(system_prompt, input_prompt)


model = SentenceTransformer("all-MiniLM-L6-v2")

Collaborator:

Makes sense that you'd need to embed the subheads separately here since the article embedder only does the headlines right now. I'd probably use the same language model for this that the NRMS model is based on, which I think is distilbert-base-uncased

Also worth thinking about whether the article embedder should be configurable to specify which article text fields (headline, subhead, body text) to embed. You can do it here but this probably isn't the last time we're going to run into this, so might make sense to move it upstream and make it re-usable.
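
As a point of reference, a mean-pooled distilbert-base-uncased embedding of the subheads might look roughly like this; the helper name is a placeholder and this isn't tied to the existing article embedder.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

def embed_texts(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)
    # Mean-pool over non-padding tokens to get one vector per text
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)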

Contributor Author:

For handling headlines, I'm wondering: should we use the original distilbert transformer, or can we easily extract embeddings from our distilbert fine-tuned on the MIND data? I agree that making this part more flexible for selecting configurations and text fields would be beneficial.

Collaborator:

It seems like it should be possible to use the fine-tuned one by loading the fine-tuned weights into a plain distilbert model
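
Roughly like this; the checkpoint path and the state-dict key prefix below are assumptions about how the fine-tuned model was saved.

import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("distilbert-base-uncased")
state = torch.load("path/to/nrms_finetuned.pt", map_location="cpu")
# Keep only the encoder weights and strip whatever prefix the training wrapper added
encoder_state = {
    key.removeprefix("news_encoder.plm."): value
    for key, value in state.items()
    if key.startswith("news_encoder.plm.")
}
encoder.load_state_dict(encoder_state, strict=False)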

Comment on lines +57 to +58
recommended_list = text_generation(candidate_articles.articles, article_indices, interest_profile)
return ArticleSet(articles=recommended_list)

Collaborator:

This seems like a whole separate component to me, since the text generation gets applied to every selected article and isn't otherwise coupled to the calibration code. I'd probably use the article_indices and candidate_articles to build and return an ArticleSet here:

Suggested change
recommended_list = text_generation(candidate_articles.articles, article_indices, interest_profile)
return ArticleSet(articles=recommended_list)
return ArticleSet(articles=[candidate_articles.articles[idx] for idx in article_indices])

and then wire that output into a new component (called something like ContextGenerator?) that does the text generation part. More on this in the text_generation code below.
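
A rough skeleton of what that separate component might look like; the class name, the generate_narrative helper, and where the generated text ends up on each article are placeholders, and the InterestProfile import is assumed to sit alongside ArticleSet in poprox_concepts.

from poprox_concepts import ArticleSet, InterestProfile

class ContextGenerator:
    def __call__(self, recommended: ArticleSet, interest_profile: InterestProfile) -> ArticleSet:
        for article in recommended.articles:
            # generate_narrative stands in for however the text gets produced
            # (local model vs. external API); storing it on the subhead is one option
            article.subhead = generate_narrative(article, interest_profile)
        return recommended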

Reply:

This was a challenge I was referring to a few weeks ago over Slack. Text generation shouldn't be applied to every article, so we'd need to set a flag of sorts if we want to pull out this functionality to another component.

@karlhigley (Collaborator) commented on Nov 1, 2024:

Yeah, you can set whatever properties you want on ArticleSet, so you could add a list or tensor with boolean flags to indicate whether you want it applied or not and that'll get sent along to subsequent components. I might be missing something but I don't see where in the current code the equivalent selection is happening—it appears to be applying LLM generated text to all recommended articles?
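
For example, something along these lines, assuming ArticleSet accepts extra properties as described above; needs_context is a hypothetical predicate standing in for however the selection gets decided.

selected = ArticleSet(articles=[candidate_articles.articles[idx] for idx in article_indices])
# Extra properties set on the ArticleSet travel along to subsequent components
selected.generate_context = [needs_context(article) for article in selected.articles]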

Contributor Author:

I've moved text_generation out of the __call__ method.

return top_k_indices


def text_generation(candidate_articles, article_indices, interest_profile):

Collaborator:

It looks like candidate_articles and article_indices are used here to reconstruct the equivalent of an ArticleSet of the selected articles. If this turns into a separate component, then the __call__() method could receive the same information using an ArticleSet and an InterestProfile as inputs.

@@ -3,8 +3,9 @@

from poprox_concepts import ArticleSet
from poprox_concepts.api.recommendations import RecommendationRequest, RecommendationResponse
from poprox_recommender.components.diversifiers.locality_calibration import user_interest_generate

Collaborator:

The handler shouldn't know anything about individual recommenders, so importing code from this recommender-specific component is a clue that the code below is probably in the wrong place. Could these changes live somewhere in the locality_calibration module?

Contributor Author:

Moved to the locality_calibration module.

Contributor Author:

Just to confirm: could we modify the code in handler.py in our forked repository?

@karlhigley (Collaborator):

The tests are failing due to the openai import, since openai hasn't been added to the dependencies yet

@mdekstrand (Contributor) commented Oct 29, 2024:

I agree with @karlhigley's comments on dependencies — let's keep everything in the main recommenders repo self-contained (no external services).

@sophiasun0515 (Contributor) left a comment:

Looks good to me overall! I think we can determine the number of clicked articles (k) after doing some internal testing.

For cold-start handling, I don't see a current need for that (assuming we can exclude inactive users in the pre-experiment observation window).

As for the clicked-article sorting, I think it'll be useful for accounting for user memory decay (i.e. an article clicked last week would be more salient to users than one from 2 months ago). Maybe we can fuse the chronological factor into the embedding similarity computation, so there's another layer to consider for candidate article selection besides content embeddings?

Final note -- regarding Karl and Michael's comments, I think it might be a better idea to migrate this PR to our forked repo.

Comment on lines +237 to +238
similarities = cosine_similarity([target_embedding], candidate_embeddings)[0]
top_k_indices = np.argsort(similarities)[-k:][::-1]

Contributor:

Here's what I'm thinking about how we could fuse chronological order into candidate selection.
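
A minimal sketch of that idea, discounting cosine similarity by an exponential memory-decay factor; the half-life value, the age inputs, and the function name are all illustrative.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def top_k_with_recency(target_embedding, candidate_embeddings, ages_in_days, k, half_life_days=30.0):
    similarities = cosine_similarity([target_embedding], candidate_embeddings)[0]
    # Weight of 1.0 for an article clicked today, 0.5 at the half-life, decaying toward 0
    decay = np.exp(-np.log(2) * np.asarray(ages_in_days) / half_life_days)
    scores = similarities * decay
    return np.argsort(scores)[-k:][::-1]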
