Index presence of ads, trackers #34

mlinksva · 2016-03-13T18:55:28Z

https://filterlists.com/ could help determine which sites include ads and trackers.

Allow users to filter results based on this index, and/or boost results that lack ads and trackers.
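A minimal sketch of how a crawler might record this presence, assuming pages are checked by matching the hosts of their script/iframe/image URLs against Adblock Plus-style `||domain^` rules from a filter list. The function names are hypothetical, and real filter lists use a much richer syntax (options, paths, wildcards) that this sketch deliberately ignores.

```python
from urllib.parse import urlparse

def parse_domain_rules(lines):
    """Collect domains from bare "||domain^" blocking rules.

    Hypothetical simplification: comments ("!"), rules with options
    ("$third-party", ...) and rules with paths are all skipped.
    """
    domains = set()
    for line in lines:
        line = line.strip()
        if line.startswith("||") and line.endswith("^"):
            domain = line[2:-1].lower()
            if "/" not in domain:
                domains.add(domain)
    return domains

def host_matches(host, domains):
    # "||example.com^" also covers subdomains such as ads.example.com
    parts = host.lower().split(".")
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))

def tracker_hosts(resource_urls, domains):
    """Return the hosts of a page's resources that appear in the list."""
    hits = set()
    for url in resource_urls:
        host = urlparse(url).hostname
        if host and host_matches(host, domains):
            hits.add(host)
    return hits

rules = ["! comment line", "||doubleclick.net^", "||tracker.example^"]
domains = parse_domain_rules(rules)
page = ["https://ad.doubleclick.net/x.js", "https://cdn.site.org/app.js"]
print(tracker_hosts(page, domains))  # {'ad.doubleclick.net'}
```

Storing the resulting host set (or just its size) alongside each document is what would let a search-time filter exclude, or a ranking function demote, pages with trackers.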

Looking at https://about.commonsearch.org/values, such filters seem like they would be mainstream (more so than license filters) and possibly aligned with the privacy value, though as stated that value covers only what Common Search does with user data. But Common Search's independence could allow it to take stronger (or at least different) measures than Google does to protect searchers.

I'd love to be able to search the web sans ad-laden sites. Not to avoid the ads (for that I use an ad blocker) but to avoid the junk content. When searching Google for info on many consumer products, one has to wade through ad- and affiliate-driven reviews and stores to find neutral information, or even the information provided by the manufacturer. Filtering out stores would be harder, so I didn't put that in the title of this issue.

sylvinus · 2016-03-13T22:34:12Z

Wow I didn't know about filterlists, looks very useful, thanks!

We could also use some of those lists for better parsing, for instance to remove the cookie notices that often clutter the top of pages. => #35 :-)
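For the cookie-notice case, filter lists also contain element-hiding rules of the form `##selector` (applies everywhere) and `domain##selector` (applies to that domain and its subdomains). A minimal sketch, assuming only these two forms: it works out which CSS selectors apply to a given host, so a parser could then drop the matching elements before extracting text. Exception rules (`#@#`) and rules with negated domains (`~domain`) are skipped rather than honored, and the function names are hypothetical.

```python
from collections import defaultdict

def parse_hiding_rules(lines):
    """Split "##" element-hiding rules into generic and per-domain selectors.

    Simplification: comments, exception rules ("#@#") and rules with
    negated domains ("~domain") are skipped entirely.
    """
    generic = []
    by_domain = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line.startswith("!") or "#@#" in line or "##" not in line:
            continue
        domains, selector = line.split("##", 1)
        if not domains:
            generic.append(selector)
        elif "~" not in domains:
            for d in domains.split(","):
                by_domain[d.strip().lower()].append(selector)
    return generic, by_domain

def selectors_for(host, generic, by_domain):
    """Selectors to strip on this host: generic ones plus any for its parent domains."""
    parts = host.lower().split(".")
    selectors = list(generic)
    for i in range(len(parts)):
        selectors.extend(by_domain.get(".".join(parts[i:]), []))
    return selectors

rules = ["##.cookie-notice", "###cookie-banner", "example.com##.eu-consent"]
generic, by_domain = parse_hiding_rules(rules)
print(selectors_for("www.example.com", generic, by_domain))
# ['.cookie-notice', '#cookie-banner', '.eu-consent']
```

Actually removing the matched elements would need an HTML parser with CSS-selector support, which is left out here.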

I definitely agree that junk content should be a negative ranking signal for websites. The question is where to draw the line (or what weight to give each category). I'm pretty sure we want to outright drop websites containing malware, but what about the rest?

Are there lists that differentiate between common trackers like Google Analytics and "less acceptable" ones?

There is also a broader discussion to have on how many options we want to offer users in a future "advanced search" feature. There is a balance to strike between the additional load these searches would put on the infrastructure (because they wouldn't be served from the "mainstream" caches) and the number of users and power users they would interest.

indolering · 2016-09-01T23:02:58Z

Copying information over from (dupe) #59: FilterLists is working on a 2.0 version and I've requested that they include a machine readable format we could parse.

indolering · 2016-09-01T23:39:59Z

> I'm pretty sure we want to outright drop websites containing malware, but what about the rest?

I think we should just warn users; these issues are typically transitory. All things being equal, a result without tracking should be ranked above one with tracking.
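The two positions above can be sketched as a re-ranking step, assuming each result already carries a base relevance score, a count of listed tracker/ad hosts, and a malware flag from crawl time. `TRACKER_PENALTY` is a hypothetical weight (the thread leaves the actual weighting open), and `drop_malware` switches between dropping malware sites outright and keeping them for the UI to show a warning.

```python
from dataclasses import dataclass

# Hypothetical weight: each listed tracker multiplies the score by 0.9.
TRACKER_PENALTY = 0.1

@dataclass
class Result:
    url: str
    relevance: float      # base score from the main ranking
    trackers: int = 0     # listed tracker/ad hosts found at crawl time
    malware: bool = False # the UI could show a warning when True

def rank(results, drop_malware=False, exclude_trackers=False):
    kept = [r for r in results if not (drop_malware and r.malware)]
    if exclude_trackers:  # optional user-facing filter
        kept = [r for r in kept if r.trackers == 0]
    # All things being equal, fewer trackers means a higher score.
    return sorted(
        kept,
        key=lambda r: r.relevance * (1 - TRACKER_PENALTY) ** r.trackers,
        reverse=True,
    )

results = [Result("a", 1.0, trackers=2), Result("b", 1.0), Result("c", 2.0, malware=True)]
print([r.url for r in rank(results)])  # ['c', 'b', 'a']
```

A multiplicative penalty keeps tracker presence a tie-breaker among similarly relevant results rather than something that overrides relevance entirely.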

collinbarrett · 2016-09-02T13:07:17Z

Hey, maintainer of FilterLists here. Just discovered Common Search via @indolering. Looks like a great project! No promises on timely completion of a machine-readable format (it's a non-monetized side project), but it is on my radar. Will check back here with updates.

collinbarrett · 2017-01-29T22:23:46Z

I just launched v2 of FilterLists, and the data is now available in JSON format on GitHub over here. Feel free to use it.

sylvinus added the needs discussion label Mar 13, 2016

sylvinus mentioned this issue Mar 13, 2016: Add first document-level quality signals #28 (Open)

sylvinus mentioned this issue Aug 29, 2016: Advertising Lists #59 (Closed)
