Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: Add a custom extractor for www.engadget.com. (#552)
* feat:Add a custom extractor for ma.ttias.be. When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and ordered lists that were part of the content. This resolves that as follows: * Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight. * Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3". * Add class="entry-content-asset" to "ul" elements to avoid them being removed. * removed redundant comment. * feat: Add a custom extractor for engadget.com. Co-authored-by: John Holdun <john@johnholdun.com>
- Loading branch information