README.md
+12-12
@@ -1,30 +1,30 @@
# feedxtract
FeedXtract takes your bookmarks manager export file, searches the root domain of all your bookmarks, and extracts the RSS/Atom Feeds from them, providing a .opml file for use in RSS feed readers like Newsboat.

-Detailed Explanation
+**Detailed Explanation**
This simple Python script extracts URLs from an HTML file, looks for RSS/Atom feeds on each URL's root domain, and then creates an OPML (Outline Processor Markup Language) file listing the feeds it finds. So far, I have only tested this on KDE Neon.

-Use Case:
+**Use Case:**
I use Raindrop.io as my bookmarks manager and wanted an easy way to find the RSS feeds for all my bookmarks and feed them into the Newsboat CLI RSS reader. This script combs through the HTML export from Raindrop (or any other bookmarks manager, so long as the export is an HTML file), searches the root domain of each URL it finds for available RSS/Atom feeds, and drops any it finds into an OPML file ready for Newsboat to import. It was mainly meant to get my RSS reader started easily, rather than having to find each feed from my bookmarks individually. That makes it a little easier to populate my reader with feeds and then curate afterwards.

-Extract URLs from HTML:
+**Extract URLs from HTML:**
extract_urls_from_html(html_content): This function uses BeautifulSoup to parse HTML content and extract all URLs from <a> tags.
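
As a rough illustration of this step, here is a minimal sketch of link extraction. It is not the script's actual code: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages, and the `LinkCollector` class and the choice to keep only `http`/`https` links are assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # from a bookmarks export is matched as well.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: only absolute web links are useful for feed discovery.
        if href and href.startswith(("http://", "https://")):
            self.urls.append(href)


def extract_urls_from_html(html_content):
    """Return the list of http(s) URLs found in <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_content)
    return collector.urls
```

Anything in the export that is not an `<a>` tag with a web URL is simply skipped.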

-Find RSS Feeds:
+**Find RSS Feeds:**
find_rss_feeds(url): This function takes a URL, sends a GET request to fetch its HTML content, and then uses BeautifulSoup to find RSS or Atom feed links within the <link> tags.
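
The feed-discovery part of this step can be sketched as follows. This is an assumption-laden illustration rather than the script's own code: it covers only the parsing half (the GET request is left out so the example runs offline), uses the standard library's `html.parser` in place of BeautifulSoup, and the names `find_feed_links`, `FeedLinkCollector`, and `FEED_TYPES` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used for feed autodiscovery <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkCollector(HTMLParser):
    """Collects <link rel="alternate"> tags that advertise RSS or Atom feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower().strip()
        href = attr.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.feeds.append({
                # Fall back to the page URL when the tag has no title.
                "title": attr.get("title") or self.base_url,
                # Resolve relative hrefs such as "/feed.xml" against the page.
                "url": urljoin(self.base_url, href),
            })


def find_feed_links(html_content, base_url):
    """Return a list of {"title", "url"} dicts for feeds advertised in the HTML."""
    collector = FeedLinkCollector(base_url)
    collector.feed(html_content)
    return collector.feeds
```

In the real script the HTML would come from the response to the GET request; here it is passed in as a string.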

-Create OPML:
+**Create OPML:**
create_opml(feeds): This function generates an OPML file from a list of feed dictionaries, each containing a title and url.
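
One possible shape for this step, sketched with the standard library's `xml.etree.ElementTree`. The actual script may build the file differently; the `title` parameter, the OPML version, and the choice to return the XML as a string instead of writing feeds.opml directly are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def create_opml(feeds, title="FeedXtract feeds"):
    """Build an OPML document from a list of {"title", "url"} dicts.

    Returns the XML as a string; writing it to a file is left to the caller.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed in feeds:
        # One <outline> per feed; xmlUrl is the attribute readers import from.
        ET.SubElement(
            body,
            "outline",
            type="rss",
            text=feed["title"],
            title=feed["title"],
            xmlUrl=feed["url"],
        )
    # ElementTree escapes characters such as & in attribute values.
    return ET.tostring(opml, encoding="unicode")
```

The returned string could then be written out with something like `open("feeds.opml", "w", encoding="utf-8").write(...)`.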
Main Function:
main(): This function reads HTML content from a file named input.html, extracts URLs using extract_urls_from_html, finds RSS feeds for each URL using find_rss_feeds, and finally creates an OPML file using create_opml.
Dependencies

-FeedXtract requires the following dependencies:
+**FeedXtract requires the following dependencies:**
requests: For making HTTP requests to fetch web pages.
beautifulsoup4: For parsing HTML content and extracting URLs and RSS feed links.
Note: lxml is optional but should speed up parsing for BeautifulSoup4

-Usage Guide
+**Usage Guide**
Prepare the HTML File:
Option 1: Create New
Create an HTML file named input.html in the same directory as FeedXtract. This file should contain the HTML content with the URLs you want to extract.
Option 2: Import
Export an HTML file from your chosen bookmarks manager, rename it input.html, and place it in the same directory as FeedXtract.

-Run the Script:
+**Run the Script:**
Execute the script by running the following command in your terminal:
python feedxtract.py

-Check the Output:
+**Check the Output:**
After running the script, an OPML file named feeds.opml will be created in the same directory. This file will contain the list of identified RSS/Atom feeds.

-Notes
+**Notes**
- Ensure the input.html file is correctly formatted and contains valid URLs. You only need to name the file input.html; anything in the file that isn't a URL will be ignored.
- The script assumes that the root domain of each URL might contain RSS/Atom feeds. This may not always be accurate, so adjust the logic if needed for more specific use cases.
- This script will take a while to run, depending on how big input.html is and how many bookmarks you have.
@@ -61,7 +61,7 @@ Notes
- After importing feeds.opml into Newsboat for the first time, I noticed that no items were actually loaded in the feeds. I hit Shift-R to refresh all, and voila! Everything updated and items became available.
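
To make the root-domain assumption from the notes concrete, here is a small sketch of how a bookmark URL might be reduced to the address that gets searched. The README does not say exactly how the script does this, so the functions `root_url` and `unique_roots` are hypothetical, and treating "root domain" as scheme plus host (subdomains kept) is an assumption. Removing duplicate roots before any requests are made is one way the long run time could be reduced.

```python
from urllib.parse import urlsplit


def root_url(url):
    """Reduce a URL to scheme://host/, or return None for non-web URLs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}/"


def unique_roots(urls):
    """Return each distinct root once, in first-seen order."""
    roots = (root_url(url) for url in urls)
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(root for root in roots if root))
```

With this approach, many bookmarks from the same site lead to a single lookup for that site.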