Filter Tutorial
The filter strips specific elements from the article HTML that are known to cause false-positive main-content matches by Readability.
If a Feedability feed shows the wrong article text, for instance a comment, part of the navigation, or some other section of the website, you've found a Readability false positive. This can happen, for example, when the article text is not the largest coherent section of the page.
One very easy way to fix this is to remove the specific elements on the page that mislead Readability.
Feedability uses jQuery selectors to pick the elements to be removed. jQuery selectors are a very powerful tool: they are based on CSS selectors (plus some jQuery-specific extensions) and serve a similar purpose to XPath. You can find documentation about selectors in the jQuery API documentation or at w3schools.
To find the right elements, I use Firebug to inspect the structure of the page where I encountered the false positive.
- First, manually visit the article page and use the original Readability bookmarklet to verify the problem.
- Return to the original page by opening it again.
- Now select some of the text that was mistakenly detected as the main content, open the context menu, and choose Inspect Element. Firebug should open up.
- Look around: highlight elements and navigate the HTML tree and CSS classes. There are many ways to specify the comments section that is causing the problem, and there is no general recipe for identifying it. In this case, the most promising approach seems to be selecting the CSS class named "commentlist", which contains all article comments on the page.
- Specifying a CSS class as a jQuery selector is easy as pie: `.commentlist` (side note: you can select element IDs using `#id`).
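To make the idea concrete, here is a toy model of what the filter does: walk a tiny element tree and drop every element matched by a simple `.class` or `#id` selector. This is a hypothetical sketch, not how Feedability or jQuery is implemented. It supports only those two selector forms, and the element shape `{ tag, id, classes, children }` is made up for illustration. In jQuery itself, the equivalent is roughly `$('.commentlist').remove()`.

```javascript
// Toy matcher: supports only ".class" and "#id" selectors.
// Hypothetical element shape: { tag, id, classes: [], children: [] }
function matches(el, selector) {
  if (selector.startsWith('.')) return (el.classes || []).includes(selector.slice(1));
  if (selector.startsWith('#')) return el.id === selector.slice(1);
  throw new Error('toy matcher only supports .class and #id: ' + selector);
}

// Return a copy of the tree with every descendant element that matches
// any of the selectors removed (together with its own children).
function stripElements(el, selectors) {
  const children = (el.children || [])
    .filter(child => !selectors.some(sel => matches(child, sel)))
    .map(child => stripElements(child, selectors));
  return { ...el, children };
}

// A miniature page: the article plus a comment list that fools Readability.
const page = {
  tag: 'body', classes: [], children: [
    { tag: 'div', id: 'article', classes: ['post'], children: [] },
    { tag: 'ol', classes: ['commentlist'], children: [
      { tag: 'li', classes: ['comment'], children: [] },
    ] },
  ],
};

const cleaned = stripElements(page, ['.commentlist']);
console.log(cleaned.children.map(c => c.tag)); // [ 'div' ]
```

Only the article `div` survives, so a content detector run on the cleaned tree no longer sees the comments.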
To put it all together, create a `user_settings.json` file containing your own filtering rules:
```json
{
  "filter": {
    "jquery_filters": {
      "sixserv.org": [".commentlist"]
    }
  }
}
```
Now the filter is applied only to article sites that match the regular expression `sixserv.org`¹.
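The exact lookup Feedability performs is not spelled out here. The sketch below assumes each key under `jquery_filters` is compiled as a JavaScript regular expression and tested against the article URL, and that the selectors of every matching key are collected. The helper name `filtersFor` and the example URLs are hypothetical; this is not Feedability's actual code.

```javascript
// Mirrors the user_settings.json example from this page.
const settings = {
  filter: {
    jquery_filters: {
      'sixserv.org': ['.commentlist'],
    },
  },
};

// Hypothetical helper: collect the selectors of every rule whose key,
// treated as a regular expression, matches the given article URL.
function filtersFor(url, settings) {
  const rules = (settings.filter && settings.filter.jquery_filters) || {};
  const selectors = [];
  for (const [pattern, sels] of Object.entries(rules)) {
    if (new RegExp(pattern).test(url)) selectors.push(...sels);
  }
  return selectors;
}

console.log(filtersFor('http://sixserv.org/some-article/', settings)); // [ '.commentlist' ]
console.log(filtersFor('http://example.com/article', settings)); // []
```

Collecting selectors from every matching key (rather than stopping at the first) lets a broad rule and a site-specific rule apply to the same page.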
Please note that if you change the filter rules, you need to remove the `*.rdby` cache files to apply the new filters to already fetched articles.
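If you would rather script that cleanup, here is a minimal Node.js sketch. It assumes the `.rdby` files sit directly inside a single cache directory. Where that directory lives depends on your installation, so `cacheDir` is a hypothetical parameter you must fill in yourself, and `clearReadabilityCache` is an illustrative name, not part of Feedability.

```javascript
const fs = require('fs');
const path = require('path');

// Delete every regular file ending in ".rdby" directly inside cacheDir
// (not recursive). Returns the names of the removed files.
function clearReadabilityCache(cacheDir) {
  const removed = [];
  for (const entry of fs.readdirSync(cacheDir, { withFileTypes: true })) {
    if (entry.isFile() && entry.name.endsWith('.rdby')) {
      fs.unlinkSync(path.join(cacheDir, entry.name));
      removed.push(entry.name);
    }
  }
  return removed;
}

// Example (hypothetical path):
// clearReadabilityCache('/path/to/feedability/cache');
```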
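¹ I know I didn't escape the dot: an unescaped `.` matches any character, so strictly the key should be `"sixserv\\.org"` in the JSON file.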
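The footnote's point is easy to check: unescaped, the `.` in `sixserv.org` matches any character, so the rule would also fire on lookalike URLs. The `escapeRegExp` helper below is a common JavaScript idiom for matching text literally, not something Feedability provides.

```javascript
// Escape regular-expression metacharacters so text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const loose = new RegExp('sixserv.org');                 // '.' matches any character
const strict = new RegExp(escapeRegExp('sixserv.org'));  // equivalent to /sixserv\.org/

console.log(loose.test('http://sixservXorg.example/'));  // true
console.log(strict.test('http://sixservXorg.example/')); // false
console.log(strict.test('http://sixserv.org/post'));     // true
console.log(escapeRegExp('sixserv.org'));                // sixserv\.org
```

In `user_settings.json` the backslash itself must be escaped, which is why the key is written as `"sixserv\\.org"` there.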