Update crypto newsletter substack to RSS. fixes #4496 #4497
The only odd thing I see is the large line count in the test. Because the fetched feed XML is large, the recorded cassette adds many lines. Do we typically see this many lines added in test cassettes?
@@ -18,8 +18,8 @@
@log_start_end(log=logger)
def scrape_substack(url: str) -> List[List[str]]:
    """Helper method to scrape newsletters from substack.
def scrape_substack_rss(url: str, limit: int = 10) -> List[List[str]]:
Renamed the function to scrape_substack_rss so it is clear that it reads the Substack RSS feed.
)
rss = soup.find("rss")
if rss:
    posts = rss.find_all("item")[:limit]
RSS XML feeds wrap each post in an "item" element.
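As an illustration of reading "item" elements from an RSS document, here is a minimal sketch. It deliberately uses the standard library's xml.etree.ElementTree instead of the BeautifulSoup calls shown in the diff, so that it runs without extra dependencies. The function name `parse_rss_items`, the sample feed, and the example.com URLs are hypothetical and not part of this PR.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample feed, shaped like a typical RSS 2.0 document.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Newsletter</title>
    <item>
      <title>First post</title>
      <link>https://example.com/p/first</link>
      <pubDate>Tue, 21 Mar 2023 14:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/p/second</link>
      <pubDate>Mon, 20 Mar 2023 09:30:00 GMT</pubDate>
    </item>
    <item>
      <title>Third post</title>
      <link>https://example.com/p/third</link>
      <pubDate>Sun, 19 Mar 2023 08:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str, limit: int = 10) -> List[List[str]]:
    """Return [title, link, pubDate] for the first `limit` RSS items."""
    root = ET.fromstring(xml_text)
    # Only handle documents whose root element is <rss>.
    if root.tag != "rss":
        return []
    rows: List[List[str]] = []
    for item in root.iter("item"):
        if len(rows) >= limit:
            break
        rows.append(
            [
                item.findtext("title", default=""),
                item.findtext("link", default=""),
                item.findtext("pubDate", default=""),
            ]
        )
    return rows


if __name__ == "__main__":
    for row in parse_rss_items(SAMPLE_FEED, limit=2):
        print(row)
```

The `limit` argument plays the same role as the `[:limit]` slice in the diff: it caps how many posts are returned per feed.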
time: str = post.find("time").get("datetime")
title: str = post.title.text
post_url: str = post.link.text
time_str = post.pubDate.text.split(" (")[0]
RSS items carry the date in "pubDate". The value is also split on " (" because Substack returns extra text in the pubDate that has to be stripped out.
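The split on " (" can be sketched in isolation as follows. The sample date strings and the two date formats tried here are assumptions made for illustration; the PR does not show the exact text Substack returns. The helper names `clean_pub_date` and `format_pub_date` are hypothetical.

```python
from datetime import datetime

# Assumed candidate formats; neither is confirmed by the PR itself.
_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",   # e.g. "Tue, 21 Mar 2023 14:00:00 GMT"
    "%a %b %d %Y %H:%M:%S GMT%z",  # e.g. "Tue Mar 21 2023 14:00:00 GMT+0000"
)


def clean_pub_date(raw: str) -> str:
    """Drop any trailing parenthesised text, as the diff does with split(" (")."""
    return raw.split(" (")[0].strip()


def format_pub_date(raw: str) -> str:
    """Return the date as 'YYYY-MM-DD HH:MM:SS', or the cleaned text if unparsed."""
    cleaned = clean_pub_date(raw)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
    # Unknown format: fall back to the cleaned string rather than failing.
    return cleaned


if __name__ == "__main__":
    print(format_pub_date("Tue Mar 21 2023 14:00:00 GMT+0000 (Coordinated Universal Time)"))
```

Falling back to the cleaned string keeps a single unexpected date format from dropping the whole post.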
"https://thedefiant.io/api/feed",
"https://thedailygwei.substack.com/feed",
"https://todayindefi.substack.com/feed",
"https://defislate.substack.com/feed",
These are the same Substacks as before, now pointing at their RSS feeds. Bankless was also removed.
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## develop #4497 +/- ##
==========================================
Coverage ? 55.37%
==========================================
Files ? 585
Lines ? 53216
Branches ? 0
==========================================
Hits ? 29470
Misses ? 23746
Partials ? 0
Description
Substack must have updated their HTML and UI, so this PR migrates to RSS feeds to fetch newsletter updates.
This PR:
Issue
Fixes #4496
Screenshots
Before
After
How has this been tested?
Checklist:
feature/feature-name or hotfix/hotfix-name.
Update our documentation following these guidelines. Update any user guides that are affected by the changes.
If a feature was added make sure to add it to the corresponding integration test script.
Others
I have commented my code, particularly in hard-to-understand areas.