feat: Add a custom extractor for www.engadget.com. (#552)
* feat: Add a custom extractor for ma.ttias.be.

When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and ordered lists that were part of the content. This resolves that as follows:

* Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight.
* Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3".
* Add class="entry-content-asset" to "ul" elements to avoid them being removed.
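The three changes above could be expressed as content transforms in a custom extractor. The following is a minimal sketch, not the file added in this commit: the object name `MaTtiasBeExtractorSketch` and the `.content` selector are assumptions made for illustration, and the layout (`domain`, `content.selectors`, `content.transforms`) follows the custom extractor convention as I understand it, in which a transform receives a cheerio-wrapped node and may return a tag name to convert the node to.

```javascript
// Hypothetical sketch of a custom extractor; not the committed file.
const MaTtiasBeExtractorSketch = {
  domain: 'ma.ttias.be',

  content: {
    // Assumed selector for the article body; the real page may differ.
    selectors: [['.content']],

    // Each transform receives a cheerio-wrapped node. Returning a string
    // asks the parser to convert the node to that tag.
    transforms: {
      // Drop the id so the heading is not given a low weight.
      h1: $node => {
        $node.removeAttr('id');
      },

      // The parser demotes h1 to h2, so push h2 down to h3 to keep
      // the heading hierarchy intact.
      h2: $node => {
        $node.removeAttr('id');
        return 'h3';
      },

      // Mark lists with the class that keeps them from being removed.
      ul: $node => {
        $node.addClass('entry-content-asset');
      },
    },
  },
};
```

Keeping the `h1` transform free of a return value leaves the tag conversion to the parser's default demotion, while the `h2` transform returns `'h3'` explicitly.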

* Removed a redundant comment.

* feat: Add a custom extractor for engadget.com.

Co-authored-by: John Holdun <john@johnholdun.com>
jbrayton and johnholdun authored Aug 10, 2022
1 parent 13dfe72 commit 3c5c0bd
Showing 5 changed files with 1,741 additions and 1 deletion.
2 changes: 1 addition & 1 deletion cli.js
@@ -26,7 +26,7 @@ const {
   extendedListTypes,
   headers,
   addExtractor,
-  version,
+  version
 ) => {
   if (version) {
     console.log(package_info.version);