Improve personalization for users of large sites #8

Open
wants to merge 5 commits into base: main

dmarti
Contributor

@dmarti dmarti commented Jan 25, 2022

Allow callers to specify a section name that the classifier can
use to develop a topics list, to improve personalization for users
of large, multi-topic sites.

If the topic list is per-hostname, a user of a large general-interest
site may receive inadequate personalization compared to a user of
multiple niche sites with only a few topics per site.

For example, a classifier might map a large news site's hostname to
the topics "real estate", "political news", and "crossword puzzles",
resulting in suboptimal personalization for users who primarily
visit the site for its fashion coverage, electronics reviews,
or parenting advice.

Without section information, a general-interest video hosting or
social site would also provide very little helpful personalization
info to its users, even though the site itself receives helpful
personalization info as a result of the same people's visits to
small sites that cover only a few topics.

A section can be any subdivision of a site, including a "channel",
"group", or "space".
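
The difference between per-hostname and per-section topics can be sketched as follows. This is a minimal illustration only: the `SITE_TOPICS` table, the hostname `news.example`, the section names, and both helper functions are hypothetical stand-ins for whatever the classifier would actually produce, not part of the proposal.

```python
from collections import Counter

# Hypothetical classifier output, keyed by (hostname, section).
# A section of None stands for the hostname-wide classification.
SITE_TOPICS = {
    ("news.example", None): ["real estate", "political news", "crossword puzzles"],
    ("news.example", "style"): ["fashion"],
    ("news.example", "tech"): ["electronics reviews"],
}


def topics_for_visit(hostname, section=None):
    """Return topics for one visit, preferring section-level topics when known."""
    if section is not None and (hostname, section) in SITE_TOPICS:
        return SITE_TOPICS[(hostname, section)]
    # Fall back to the hostname-wide topics (or nothing if the site is unknown).
    return SITE_TOPICS.get((hostname, None), [])


def top_topics(visits, n=3):
    """Count topics across all visits and return the n most frequent."""
    counts = Counter()
    for hostname, section in visits:
        counts.update(topics_for_visit(hostname, section))
    return [topic for topic, _ in counts.most_common(n)]


# A user who mostly reads the fashion coverage of a large news site.
visits = [("news.example", "style")] * 3 + [("news.example", "tech")]

without_sections = top_topics([(host, None) for host, _ in visits])
with_sections = top_topics(visits)
```

In this toy setup, `without_sections` contains only the three hostname-wide topics, while `with_sections` reflects the sections the user actually visited.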

Fixes #17

dmarti and others added 3 commits January 25, 2022 06:03
@AramZS

AramZS commented Jan 26, 2022

I really like this idea. Being able to mark pages as belonging to a vertical topic would be super helpful for addressing the problems of mapping one set of topics to one hostname. Reasonable restrictions could be built out to reduce the risk of a fraudulent publisher doing something misleading, like generating too many sections or putting too few content pieces in a section.
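
One way such restrictions might look, as a rough sketch: cap the number of sections a site can declare and require a minimum number of pages before a section is classified on its own. The function name, the default thresholds, and the choice to ignore all sections when the cap is exceeded are assumptions made for illustration; none of them come from the proposal.

```python
def usable_sections(pages_by_section, max_sections=50, min_pages=20):
    """Return the section names eligible for section-level classification.

    pages_by_section maps a section name to the list of page URLs in it.
    Both thresholds are hypothetical defaults, not values from the proposal.
    """
    # A site declaring an excessive number of sections is treated as having
    # none, so its pages fall back to hostname-level classification.
    if len(pages_by_section) > max_sections:
        return set()
    # Sections with too little content are dropped individually.
    return {
        name
        for name, pages in pages_by_section.items()
        if len(pages) >= min_pages
    }


site = {
    "style": [f"/style/{i}" for i in range(30)],
    "tech": [f"/tech/{i}" for i in range(25)],
    "one-off": ["/one-off/1"],
}
eligible = usable_sections(site)
```

Here `eligible` keeps `style` and `tech` and drops `one-off`, whose single page would be classified at the hostname level instead.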

@jkarlin
Collaborator

jkarlin commented Jan 26, 2022

Thanks for the PR, Dan. I'm generally supportive of the notion of sites or callers being able to add additional topics, or replace the existing topics. But it comes with concerns that I think need to be addressed first:

  1. It allows for topics to have coded meanings. For example, the caller might add a topic (section) such as cats, but what the caller really means by cats is that the user has a particular health problem.
    1. I don't know how much of a problem this is, since all callers receive the same topics, and thus the encoding might not work well. But it's theoretically possible that large players in the industry come to an agreement on a coded meaning, and that's a problem.
  2. It can potentially pollute the data. If lots of sites add arbitrary topics just to mess with the API, then the utility of the topics for advertising drops.
  3. It allows for sites to game the system, by adding the most valuable topics to their site.

So I think we'd want to either a) decide that the concerns are unwarranted, or b) propose practical mitigations to them alongside this.

Can you create an issue for this and link the PR to it? I'd rather we have the discussion there.

@dmarti
Contributor Author

dmarti commented Jan 26, 2022

Hi Josh, thank you for the comment. I agree that sites should not (at least for now) be able to specify individual topics, for the reasons you provide and possibly others. The publisher-provided "section" is just an identifier applied to a subset of pages on that site, and the actual topics for pages in that section would still have to be determined by the classifier.

(For example, at Linux Journal, we had a "Driving me Nuts" section that was not about "nuts" or "driving"; it was about Linux device drivers. The classifier would not use the section name as the topic, and would probably classify that section with the topic "C Programming" based only on the text of pages in that section.)
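
A minimal sketch of that separation, assuming a toy keyword-based classifier: the section name is used only as a grouping key, and topics come from the page text alone. The `KEYWORD_TOPICS` table, the sample page text, and `classify_sections` are all hypothetical; a real classifier would be far more sophisticated.

```python
from collections import Counter

# Toy keyword-to-topic table standing in for a real classifier.
# "driving" and "nuts" are included to show that the section name
# would mislead if it were ever fed into classification.
KEYWORD_TOPICS = {
    "kernel": "C Programming",
    "driver": "C Programming",
    "pointer": "C Programming",
    "driving": "Autos",
    "nuts": "Food",
}


def classify_sections(pages_by_section, n=1):
    """Map each section name to its top n topics, using page text only."""
    result = {}
    for section_name, page_texts in pages_by_section.items():
        counts = Counter()
        for text in page_texts:
            for word in text.lower().split():
                topic = KEYWORD_TOPICS.get(word.strip(".,"))
                if topic is not None:
                    counts[topic] += 1
        # section_name is only the dictionary key; it is never classified.
        result[section_name] = [t for t, _ in counts.most_common(n)]
    return result


topics = classify_sections({
    "Driving me Nuts": [
        "Writing a kernel driver in C.",
        "Pointer pitfalls in driver code.",
    ],
})
```

In this sketch the "Driving me Nuts" section ends up with the topic "C Programming", and neither "Autos" nor "Food" appears, because the section name never reaches the classifier.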

I'll update the PR to clarify this.

Sections are some kind of sub-division of a site that can split out
groups of topics. Section names are not used as topics.
Development

Successfully merging this pull request may close these issues.

Enable a site to set an optional section name
4 participants