Improve personalization for users of large sites #8
base: main
Allow callers to specify a `section` name that the classifier can use to develop a topics list, to improve personalization for users of large, multi-topic sites. If the topic list is per-hostname, a user of a large general-interest site may receive inadequate personalization compared to a user of multiple niche sites with only a few topics per site. For example, a classifier might map a large news site's hostname to the topics "real estate", "political news", and "crossword puzzles", resulting in sub-optimal personalization for users who primarily visit the site for its fashion coverage, electronics reviews, or parenting advice. Without section information, a general-interest video hosting or social site would also provide very little helpful personalization info to its users, even though it receives helpful personalization info from the same people's visits to small sites that cover only a few topics. A section can be any subdivision of a site, including a "channel," "group," or "space."
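To make the difference between per-hostname and per-section topic lists concrete, here is a minimal sketch of a classifier's lookup table. It assumes the classifier keys its topic lists by hostname plus an optional section identifier and falls back to the hostname-wide list when no section is supplied or the section is unknown. `TopicTable`, `Visit`, and the `hostname#section` key format are all hypothetical illustrations, not part of any existing Topics API.

```typescript
// Hypothetical sketch: topic lists keyed by hostname alone (current
// behavior) or by hostname plus an optional publisher-supplied section
// identifier (this proposal). All names here are illustrative.

type Topic = string;

interface Visit {
  hostname: string;
  section?: string; // opaque identifier, e.g. "style" or "channel/1234"
}

class TopicTable {
  private readonly byKey = new Map<string, Topic[]>();

  // Hostnames cannot contain '#', so the first '#' always separates
  // the hostname from the section and keys cannot collide.
  private static key(hostname: string, section?: string): string {
    return section === undefined ? hostname : `${hostname}#${section}`;
  }

  set(hostname: string, topics: Topic[], section?: string): void {
    this.byKey.set(TopicTable.key(hostname, section), topics);
  }

  // Prefer the section-specific list; otherwise use the hostname-wide list.
  topicsFor(visit: Visit): Topic[] {
    const sectionTopics =
      visit.section === undefined
        ? undefined
        : this.byKey.get(TopicTable.key(visit.hostname, visit.section));
    return sectionTopics ?? this.byKey.get(visit.hostname) ?? [];
  }
}

// Usage: a large news site with a hostname-wide list plus one section.
const table = new TopicTable();
table.set("news.example", ["real estate", "political news", "crossword puzzles"]);
table.set("news.example", ["fashion"], "style");

const styleReader = table.topicsFor({ hostname: "news.example", section: "style" });
// returns ["fashion"]
const generalReader = table.topicsFor({ hostname: "news.example" });
// returns ["real estate", "political news", "crossword puzzles"]
```

The fallback keeps sites that never supply a section behaving exactly as they do under per-hostname classification.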
I really like this idea; being able to mark pages as belonging to a vertical topic would be super helpful for the problems caused by mapping topics one-to-one onto hostnames. Reasonable restrictions could be built out in order to avoid the risk of a fraudulent publisher doing something misleading like generating too many
Thanks for the PR, Dan. I'm generally supportive of the notion of sites or callers being able to add topics, or replace the existing topics. But it comes with concerns that I think need to be addressed first:
So I think we'd want to either a) decide that the concerns are unwarranted, or b) propose practical mitigations for them alongside this. Can you create an issue for this and link the PR to it? I'd rather we have the discussion there.
Hi Josh, thank you for the comment. I agree that sites should not (at least for now) be able to specify individual topics, for the reasons you provide and possibly others. The publisher-provided "section" is just an identifier applied to a subset of pages on that site, and the actual topics for pages in that section would still have to be determined by the classifier. (For example, at Linux Journal, we had a "Driving me Nuts" section that was not about "nuts" or "driving"; it was about Linux device drivers. The classifier would not use the section name as the topic, and would probably classify that section with the topic "C Programming" based only on the text of pages in that section.) I'll update the PR to clarify this.
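The point that a section is only a grouping key can be illustrated with a toy classifier. This sketch assumes a simple keyword table standing in for a real content classifier; `classifySections`, `KEYWORDS`, and the topic labels "Autos & Vehicles" and "Food & Drink" are hypothetical. Pages are grouped by their section identifier, but only the page text is inspected, so the "Driving me Nuts" label contributes nothing.

```typescript
// Toy illustration: the section name is used only to group pages.
// Topics are derived from page text, never from the section label.
// The keyword table is a hypothetical stand-in for a real classifier.

interface Page {
  section: string; // opaque publisher-supplied identifier
  text: string;
}

const KEYWORDS: ReadonlyMap<string, string> = new Map([
  ["kernel", "C Programming"],
  ["driver", "C Programming"],
  ["driving", "Autos & Vehicles"], // would match the label, if it were read
  ["nuts", "Food & Drink"],        // would match the label, if it were read
]);

function classifySections(pages: Page[]): Map<string, Set<string>> {
  const topicsBySection = new Map<string, Set<string>>();
  for (const page of pages) {
    const topics = topicsBySection.get(page.section) ?? new Set<string>();
    // Only the page body is tokenized; page.section is never inspected.
    for (const word of page.text.toLowerCase().split(/\W+/)) {
      const topic = KEYWORDS.get(word);
      if (topic !== undefined) topics.add(topic);
    }
    topicsBySection.set(page.section, topics);
  }
  return topicsBySection;
}

const result = classifySections([
  { section: "Driving me Nuts", text: "Writing a kernel driver for a USB device" },
]);
// result.get("Driving me Nuts") holds only "C Programming"; the words
// "driving" and "nuts" in the section label play no part.
```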
A section is any subdivision of a site that can split out its own group of topics. Section names are not used as topics.
Fixes #17