Filters
Filters are applied before and after readability tries to extract the main content, and can be used to improve or correct the detection on specific sites. In general the extraction algorithm of readability works well, but sometimes pre- or post-processing is unavoidable to fix false positives and other problems.
Feedability supports four types of rules per URL pattern (a regular expression matched against the article URL). Each rule is applied to specific HTML elements, which are specified using jQuery selectors. Documentation on selectors is available in the jQuery API documentation or at w3schools. The types are:
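To illustrate how a URL pattern selects rules, here is a minimal sketch, not Feedability's actual code. The function name `rulesForUrl`, the `example\.com/blog/` pattern, the `prepend` and `append` key names, and the choice to merge the rules of every matching pattern are all assumptions made for the example.

```javascript
// Hypothetical rule table keyed by URL pattern, mirroring the "rules" object format.
const rules = {
  "sixserv.org": { remove: ["#sidebar", ".commentlist"], exclusive: ["#content"] },
  "example\\.com/blog/": { remove: [".ads"] }, // made-up pattern for illustration
};

// Collect the selectors of every rule set whose pattern matches the article URL.
// Assumption: rules from all matching patterns are merged per rule type.
function rulesForUrl(rules, url) {
  const merged = { remove: [], exclusive: [], prepend: [], append: [] };
  for (const [pattern, ruleSet] of Object.entries(rules)) {
    // Each key is treated as a regular expression tested against the URL.
    if (!new RegExp(pattern).test(url)) continue;
    for (const type of Object.keys(merged)) {
      merged[type].push(...(ruleSet[type] || []));
    }
  }
  return merged;
}
```

Because the pattern is a regular expression, an unescaped `.` matches any character; escape it (`\\.` inside a JSON or JavaScript string) when a literal dot is required.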
Remove: these rules strip specific elements from the article HTML that are known to cause readability to falsely detect them as main content.
Exclusive: elements selected by these rules replace the body of the document. (Currently it only makes sense to specify one element, but this may change in the future.)
Prepend: HTML selected by these rules is prepended to the final extracted text.
Append: HTML selected by these rules is appended to the final extracted text.
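The order in which the four rule types could take effect can be sketched as follows. This is an illustration under stated assumptions, not Feedability's implementation: `doc` is a hypothetical wrapper around the parsed page offering `remove(selector)`, `outerHtml(selector)` and `setBody(html)`, `extract` stands in for the readability step, and capturing the prepend and append fragments before the page is modified is a guess.

```javascript
// Sketch only: `doc` and `extract` are hypothetical stand-ins, not Feedability APIs.
function filterAndExtract(doc, ruleSet, extract) {
  const { remove = [], exclusive = [], prepend = [], append = [] } = ruleSet;

  // Assumption: fragments are captured before the page is modified,
  // so that a later remove or exclusive rule cannot discard them.
  const before = prepend.map((sel) => doc.outerHtml(sel)).join("");
  const after = append.map((sel) => doc.outerHtml(sel)).join("");

  // Pre-processing: strip unwanted elements, then narrow the body if requested.
  remove.forEach((sel) => doc.remove(sel));
  if (exclusive.length > 0) {
    doc.setBody(exclusive.map((sel) => doc.outerHtml(sel)).join(""));
  }

  // Extraction, followed by post-processing of the extracted text.
  return before + extract(doc) + after;
}
```

Passing the document wrapper and the extractor in as parameters keeps the sketch independent of any particular DOM library.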
If you change the filter rules, you need to remove the *.rdby cache files so that the new filters are applied to already fetched articles.
Example:
"rules": {
  "sixserv.org": {
    "remove": ["#sidebar", ".commentlist"],
    "exclusive": ["#content"]
  }
}
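Since changed rules only take effect once the *.rdby cache files are gone, a small Node.js helper can do the cleanup. This is a minimal sketch: the function name `clearReadabilityCache` is made up, and the cache directory is an assumption, so pass in whatever directory your installation actually uses.

```javascript
const fs = require("fs");
const path = require("path");

// Delete every *.rdby file directly inside cacheDir and return the removed names.
// cacheDir is an assumption: use the directory where your installation stores its cache.
function clearReadabilityCache(cacheDir) {
  const removed = [];
  for (const name of fs.readdirSync(cacheDir)) {
    if (name.endsWith(".rdby")) {
      fs.unlinkSync(path.join(cacheDir, name));
      removed.push(name);
    }
  }
  return removed;
}
```

The helper only looks at the top level of the given directory and leaves files with other extensions untouched.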
For a short tutorial on how to create remove filter rules, read Filter-Tutorial.