
Commit 639b8f8 (parent b97ac16)

Update README.md

Adding flare

1 file changed: README.md (+12 -12 lines)

# feedxtract
FeedXtract takes the HTML export file from your bookmark manager, checks the root domain of each bookmark for RSS/Atom feeds, and writes the feeds it finds to an .opml file for use in RSS feed readers such as Newsboat.

**Detailed Explanation**
This simple Python script extracts URLs from an HTML file, checks the root domain of each URL for RSS/Atom feeds, and then creates an OPML (Outline Processor Markup Language) file listing the feeds it found. So far, I have only tested it on KDE Neon.

**Use Case:**
I use Raindrop.io as my bookmark manager and wanted an easy way to find the RSS feeds for all my bookmarks and load them into the Newsboat CLI RSS reader. This script combs through the HTML export from Raindrop (or from any other bookmark manager, as long as the export is an HTML file), checks the root domain of each URL it finds for RSS/Atom feeds, and writes any feeds it finds to an OPML file ready for Newsboat to import. It is mainly meant to get an RSS reader started without having to track down each feed from my bookmarks by hand: populate the reader with feeds first, then curate afterwards.

**Extract URLs from HTML:**
`extract_urls_from_html(html_content)`: This function uses BeautifulSoup to parse the HTML content and extract all URLs from `<a>` tags.
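
A minimal sketch of what `extract_urls_from_html` might look like, assuming BeautifulSoup's built-in `html.parser` backend (pass `"lxml"` instead if it is installed). Keeping only absolute `http(s)` links is an assumption added here; the README does not say whether the script filters links.

```python
from bs4 import BeautifulSoup


def extract_urls_from_html(html_content: str) -> list[str]:
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html_content, "html.parser")  # or "lxml" if installed
    urls = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"].strip()
        # Assumption: keep only absolute web links; skip mailto:, javascript:, #fragments.
        if href.startswith(("http://", "https://")):
            urls.append(href)
    return urls
```

Bookmark exports in the common Netscape format use upper-case `<A HREF=...>` tags; `html.parser` lower-cases tag and attribute names, so `find_all("a", href=True)` still matches them.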

**Find RSS Feeds:**
`find_rss_feeds(url)`: This function takes a URL, sends a GET request to fetch its HTML content, and then uses BeautifulSoup to find RSS or Atom feed links in the page's `<link>` tags.
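
One way `find_rss_feeds` could work, sketched under a few assumptions: feeds are recognized by the `type` attribute of `<link>` tags (`application/rss+xml` or `application/atom+xml`, the usual autodiscovery markers), relative `href`s are resolved against the page URL, and a failed request yields an empty list. The separate `parse_feed_links` helper and the 10-second timeout are hypothetical choices that let the parsing be tested without network access.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# MIME types that conventionally mark feed autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def parse_feed_links(html: str, base_url: str) -> list[dict]:
    """Find <link> tags whose type marks an RSS or Atom feed."""
    soup = BeautifulSoup(html, "html.parser")
    feeds = []
    for link in soup.find_all("link", href=True):
        feed_type = (link.get("type") or "").strip().lower()
        if feed_type in FEED_TYPES:
            feeds.append({
                # Fall back to the page URL when the tag has no title.
                "title": link.get("title") or base_url,
                # Relative hrefs such as "/feed.xml" are resolved against the page.
                "url": urljoin(base_url, link["href"]),
            })
    return feeds


def find_rss_feeds(url: str, timeout: float = 10.0) -> list[dict]:
    """Fetch a page and return the feeds it advertises (empty list on failure)."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException:
        return []
    # response.url is the final address after any redirects.
    return parse_feed_links(response.text, response.url)
```

Each result is a dictionary with a `title` and a `url`, the same shape the README describes for the input to `create_opml`.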

**Create OPML:**
`create_opml(feeds)`: This function generates an OPML file from a list of feed dictionaries, each containing a `title` and a `url`.

**Main Function:**
`main()`: This function reads the HTML content from a file named `input.html`, extracts the URLs with `extract_urls_from_html`, looks up the RSS feeds for each URL with `find_rss_feeds`, and finally creates the OPML file with `create_opml`.

**Dependencies**

**FeedXtract requires the following dependencies:**
- `requests`: For making HTTP requests to fetch web pages.
- `beautifulsoup4`: For parsing HTML content and extracting URLs and RSS feed links.

**Installation:**
Clone the repository using the following command:

`git clone https://github.com/wickedjackal/feedxtract.git`

…

`pip install requests beautifulsoup4 lxml`

Note: `lxml` is optional, but it should speed up parsing in BeautifulSoup4.

**Usage Guide**

**Prepare the HTML File:**

Option 1: Create New
Create an HTML file named `input.html` in the same directory as FeedXtract. This file should contain the HTML content with the URLs you want to extract.

Option 2: Import
Export an HTML file from your chosen bookmark manager, rename it to `input.html`, and place it in the same directory as FeedXtract.

**Run the Script:**
Execute the script by running the following command in your terminal:

`python feedxtract.py`

**Check the Output:**
After the script finishes, an OPML file named `feeds.opml` will be created in the same directory. This file contains the list of RSS/Atom feeds the script found.

**Notes**
- Ensure the input file is named `input.html` and contains valid URLs. It needs no other formatting: anything in the file that isn't a URL will be ignored.
- The script assumes that the root domain of each URL might offer RSS/Atom feeds. This won't always be true, so adjust the logic if you need something more specific.
- The script can take a while to run, depending on how big `input.html` is and how many bookmarks it contains.
…
- After importing `feeds.opml` into Newsboat for the first time, I noticed that no items had actually loaded in the feeds. I pressed Shift-R to refresh all, and voila! Everything updated and the items became available.
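
The root-domain assumption in these notes can be sketched as follows. Here "root" is taken to mean the scheme plus host (so `blog.example.com` stays distinct from `example.com`); that is an assumption, since the README does not spell out the script's exact rule, and `root_urls` is a hypothetical helper. Invalid or non-web URLs are skipped, and collapsing bookmarks to unique roots means a site shared by many bookmarks only has to be fetched once, which can shorten long runs.

```python
from urllib.parse import urlsplit


def root_urls(urls: list[str]) -> list[str]:
    """Reduce bookmark URLs to unique scheme://host/ roots, keeping first-seen order."""
    roots = []
    seen = set()
    for url in urls:
        parts = urlsplit(url.strip())
        # Skip anything that is not an absolute http(s) URL with a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        root = f"{parts.scheme}://{parts.netloc.lower()}/"
        if root not in seen:
            seen.add(root)
            roots.append(root)
    return roots
```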

**Patch Notes:**

V0.4.1 - Revert
Explanation:
