doc: Update README.md

Adding a few more details and a link to the article.
philippe2803 · Apr 29, 2024 · 62ac810
1 parent b5dd0f3
commit 62ac810
Showing 1 changed file with 15 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -4,6 +4,10 @@ A way to share content from a specific domain using SQLite as an alternative to
 RSS feeds. The purpose of this library is to simply create a dataset for all the
 content on your website, using the XML sitemap as a starting point.
 
+You can also easily include vector similarity search features in the dataset.
+
+An article explaining the rationale behind this type of dataset is available [here](https://philippeoger.com/pages/can-we-rag-the-whole-web/).
+
 
 ## Installation
 
@@ -15,15 +19,21 @@ pip install contentmap
 
 ## Quickstart
 
-To build your contentmap.db that will contain all your content using your XML 
-sitemap as a starting point, you only need to write the following: 
+To build a contentmap.db with vector search capabilities that contains all 
+your content, using your XML sitemap as a starting point, you only need to write the
+following: 
 
 ```python
 from contentmap.sitemap import SitemapToContentDatabase
 
-database = SitemapToContentDatabase("https://yourblog.com/sitemap.xml")
-database.load()
+database = SitemapToContentDatabase(
+    sitemap_url="https://yourblog.com/sitemap.xml",
+    concurrency=10,
+    include_vss=True
+)
+database.build()
 
 ```
 
-You can control how many urls can be crawled concurrently and also set some timeout.
+This automatically creates the SQLite database file with vector search 
+capabilities (piggybacking on the sqlite-vss integration in LangChain).
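
Not part of the commit: since the README states that the build produces a SQLite file, a quick way to check the result might be to list the tables it contains. This is a minimal sketch using only Python's standard `sqlite3` module. The helper name `list_tables` is hypothetical, and the table names inside `contentmap.db` are not documented in this diff, so the sketch makes no assumption about them.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the sorted names of all tables in the SQLite file at db_path.

    Hypothetical helper for illustration; it is not part of contentmap.
    Note: sqlite3.connect() creates an empty file if db_path does not exist.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables("contentmap.db")` after `database.build()` should show which tables were created, assuming the file is written to the current working directory.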