Index presence of ads, trackers #34

Open
mlinksva opened this issue Mar 13, 2016 · 5 comments

Comments

@mlinksva
Contributor

https://filterlists.com/ could help determine.

Allow users to filter based on index and/or boost results lacking presence.

Looking at https://about.commonsearch.org/values it seems such filters would be mainstream (more so than license filters) and possibly aligned with privacy, though as stated the value is only about what Common Search does with user data. But Common Search's independence could allow it to take stronger (or at least different) measures to protect searchers than Google does.

I'd love to be able to search the web sans ad-laden sites. Not to avoid the ads (for that I use an ad blocker) but to avoid the junk content. Searching for info on many consumer products on Google, one has to wade through ad/affiliate-driven reviews and stores to find neutral information or even information provided by the manufacturer. Filtering out stores would be harder, so I didn't put it in the title of this issue.
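As a rough illustration of how ad/tracker presence might be indexed from a filter list, here is a minimal sketch. It assumes Adblock Plus-style lists and only understands the simplest rule form, `||domain^`; every other rule type (options, paths, wildcards, element hiding) is skipped. The function names `blocked_domains` and `matched_domains` are hypothetical and not part of Common Search or FilterLists.

```python
from urllib.parse import urlparse


def blocked_domains(filter_text):
    """Collect domains from plain `||domain^` rules; skip everything else."""
    domains = set()
    for line in filter_text.splitlines():
        rule = line.strip()
        # Skip blanks, comments (!), list headers ([...]) and exceptions (@@).
        if not rule or rule.startswith(("!", "[", "@@")):
            continue
        if rule.startswith("||") and rule.endswith("^"):
            domain = rule[2:-1].lower()
            # Keep only bare host names (no paths, wildcards or options).
            if domain and all(c.isalnum() or c in "-." for c in domain):
                domains.add(domain)
    return domains


def matched_domains(resource_urls, domains):
    """Return the blocked domains hit by any of a page's resource URLs."""
    hits = set()
    for url in resource_urls:
        host = (urlparse(url).hostname or "").lower()
        parts = host.split(".")
        # Check the host itself and each of its parent domains.
        for i in range(len(parts)):
            candidate = ".".join(parts[i:])
            if candidate in domains:
                hits.add(candidate)
    return hits
```

The set returned by `matched_domains` could be stored per page at index time, so that filtering or boosting at query time is a cheap lookup rather than a re-parse.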

@sylvinus
Contributor

Wow I didn't know about filterlists, looks very useful, thanks!

We could use some of those lists for better parsing, for instance to better remove the cookie notices that usually pollute the top of pages. => #35 :-)

I definitely agree that junk content should be a negative ranking signal for websites. The question is where to draw the line (or which weights to give to each category). I'm pretty sure we want to outright drop websites containing malware, but what about the rest?
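To make the "weights per category" question concrete, here is a minimal sketch of one possible shape for such a signal. The category names, the weight values and the function `adjusted_score` are all placeholder assumptions for illustration; nothing here reflects a decision made in this thread.

```python
# Hypothetical per-category penalties; the values are untuned placeholders.
PENALTY_WEIGHTS = {"ads": 0.2, "trackers": 0.1}
# Categories that remove a site from the index entirely.
DROP_CATEGORIES = {"malware"}


def adjusted_score(base_score, categories):
    """Return a penalised score, or None if the site should be dropped."""
    found = set(categories)
    if DROP_CATEGORIES & found:
        return None
    penalty = sum(PENALTY_WEIGHTS.get(c, 0.0) for c in found)
    # Clamp so that stacked penalties never produce a negative score.
    return base_score * max(0.0, 1.0 - penalty)
```

Keeping the weights in one table makes "where to draw the line" a configuration question that can be revisited without touching the ranking code.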

Are there lists that differentiate between common trackers like Google Analytics and "less acceptable" ones?

There is also a greater discussion to have on the number of options we want to provide users with in a future "advanced search" feature. There is a balance to find between the additional stress these searches could cause on the infrastructure (because they wouldn't be part of the "mainstream" caches) and the number of users/powerusers they could interest.

@indolering

Copying information over from (dupe) #59: FilterLists is working on a 2.0 version and I've requested that they include a machine readable format we could parse.

@indolering

> I'm pretty sure we want to outright drop websites containing malware, but what about the rest?

I think we should just warn users; these issues are typically transitory. All things being equal, a result that doesn't have tracking should be promoted above one that does.
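The "all things being equal" rule can be read as a tie-breaker rather than a penalty. A minimal sketch, assuming each result is a dict with hypothetical `score` and `has_trackers` keys:

```python
def rank(results):
    """Sort by score descending; among equal scores, tracker-free first."""
    # False sorts before True, so tracker-free results win ties.
    return sorted(results, key=lambda r: (-r["score"], r["has_trackers"]))
```

Exact score ties are probably rare in practice, so a real ranker might bucket or round scores before applying a tie-breaker like this.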

@collinbarrett

hey, maintainer of FilterLists here. just discovered commonsearch via @indolering . looks like a great project! no promises on timely completion of a machine-readable format (non-monetized side-project), but it is on my radar to work on. will check back here with updates.

@collinbarrett

collinbarrett commented Jan 29, 2017

I just launched v2 of FilterLists, and the data is now in json format on GitHub over here. Feel free to use.
