Merge pull request #473 from BanoMarvey/master
Added Daily Mail as a publisher
Showing 6 changed files with 124 additions and 2 deletions.
    @@ -0,0 +1,45 @@
    import datetime
    from typing import List, Optional

    from lxml.cssselect import CSSSelector

    from fundus.parser import ArticleBody, BaseParser, ParserProxy, attribute
    from fundus.parser.utility import (
        extract_article_body_with_selector,
        generic_author_parsing,
        generic_date_parsing,
        generic_topic_parsing,
    )


    class DailyMailParser(ParserProxy):
        class V1(BaseParser):
            _paragraph_selector = CSSSelector("div[itemprop='articleBody'] > p")
            _summary_selector = CSSSelector("#js-article-text > h1")

            @attribute
            def body(self) -> ArticleBody:
                return extract_article_body_with_selector(
                    self.precomputed.doc,
                    paragraph_selector=self._paragraph_selector,
                )

            @attribute
            def publishing_date(self) -> Optional[datetime.datetime]:
                return generic_date_parsing(self.precomputed.ld.bf_search("datePublished"))

            @attribute
            def authors(self) -> List[str]:
                return generic_author_parsing(self.precomputed.ld.bf_search("author"))

            @attribute
            def title(self) -> Optional[str]:
                return self.precomputed.meta.get("og:title")

            @attribute
            def topics(self) -> List[str]:
                filtered_topics = []
                for topic in generic_topic_parsing(self.precomputed.meta.get("keywords")):
                    if topic.casefold() != topic:
                        filtered_topics.append(topic)
                return filtered_topics
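The `topics` attribute keeps only keywords that differ from their case-folded form, that is, keywords containing at least one uppercase character. A minimal sketch of that filter in isolation follows; the function name `filter_capitalized_topics` and the sample keyword list are hypothetical and not part of the commit.

```python
from typing import List


def filter_capitalized_topics(topics: List[str]) -> List[str]:
    """Keep topics that change under casefold(), i.e. that are not all lowercase."""
    return [topic for topic in topics if topic.casefold() != topic]


# Hypothetical keyword list standing in for the parsed "keywords" meta tag.
keywords = ["dailymail", "news", "Florida", "Federal Reserve", "New York"]
filtered = filter_capitalized_topics(keywords)
print(filtered)  # ['Florida', 'Federal Reserve', 'New York']
```

All-lowercase entries are dropped, presumably because they are site-level tags rather than named topics; the commit itself does not state the reason.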
    @@ -0,0 +1,42 @@
    {
      "V1": {
        "authors": [
          "Dolores Chang"
        ],
        "body": {
          "summary": [],
          "sections": [
            {
              "headline": [],
              "paragraphs": [
                "Think twice before you spend any $1 bill in your wallet, as it could fetch thousands of dollars for you.",
                "Currency collectors nationwide are on hunt for some rare dollar bills, willing to pay up to $150,000 for those with a specific printing error.",
                "According to the personal finance blog Wealthynickel, two batches of $1 bills printed in 2014 and 2016 contain this particular error from the US Bureau of Engraving and Printing.",
                "'It's very rare that the Federal Reserve would mess up an order, and then it reaches circulation,' Chad Hawk, vice president of PMG, a paper money grading company in Florida told Fox.",
                "Scroll down to see how to identify the rare bucks worth thousands",
                "Typically, every bill in circulation needs a unique serial number to identify it, but the US Bureau of Engraving and Printing had a miscommunication with federal banks.",
                "This resulted in 6.4 million pairs of $1 bills with matching serial numbers being circulated before the mistake was noticed by the Federal Reserve.",
                "While the first batch was issued in New York and the second was issued in Washington, D.C., these bills could now be anywhere in the world.",
                "'In the last two or three years, people started to discover the error. The community, through social media, has been able to connect,' Hawk said.",
                "'And people have been able to pair up their notes in a lot of ways. The last pairing I think I saw sold for about $6,000,' he added.",
                "Only nine of these pairs have been matched, leaving millions of rare $1 bills out there.",
                "According to Wealthynickel, currency collecting companies are willing to pay between $20,000 and $150,000 for a pair from the two batches.",
                "Here's what to look for:",
                "If you're fortunate enough to have one of these $1 bills, the next step is to find the other bill with a matching serial number.",
                "According to Hawk, the best approach is to utilize social media.",
                "'The best thing to do is look online, go on social media — and there are actually websites dedicated to this,' he said.",
                "'You can find outlets where people are collecting the data, so you can see if notes are out there already.",
                "'If someone's already reported this number, you might be able to pair up with someone looking for this number. They may be willing to pay a big premium for that,' he said."
              ]
            }
          ]
        },
        "publishing_date": "2024-04-27 15:56:35+01:00",
        "title": "Your $1 bill could be worth up THOUSANDS - here's how to check",
        "topics": [
          "Florida",
          "Federal Reserve",
          "New York"
        ]
      }
    }
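The expected `publishing_date` is serialized with a UTC offset (`+01:00`). As a quick sanity check, the string can be read back with the standard library. This is only a sketch of inspecting the stored value and makes no claim about how fundus itself loads or compares dates.

```python
import datetime

# Value copied from the expected test output.
raw = "2024-04-27 15:56:35+01:00"

# fromisoformat accepts a space separator and a +HH:MM offset.
parsed = datetime.datetime.fromisoformat(raw)

offset = parsed.utcoffset()
as_utc = parsed.astimezone(datetime.timezone.utc)

print(parsed.isoformat())  # 2024-04-27T15:56:35+01:00
print(as_utc.isoformat())  # 2024-04-27T14:56:35+00:00
```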
Binary file not shown.