diff --git a/docs/user_guide/05_hash_vs_json.ipynb b/docs/user_guide/05_hash_vs_json.ipynb index 071cff5c..217ab63f 100644 --- a/docs/user_guide/05_hash_vs_json.ipynb +++ b/docs/user_guide/05_hash_vs_json.ipynb @@ -27,7 +27,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -311,19 +311,6 @@ "metadata": {}, "source": [ "### Working with JSON\n", - "Redis also supports native **JSON** objects. These can be multi-level (nested) objects, with full JSONPath support for updating/retrieving sub elements:\n", - "\n", - "```python\n", - "{\n", - " \"name\": \"bike\",\n", - " \"metadata\": {\n", - " \"model\": \"Deimos\",\n", - " \"brand\": \"Ergonom\",\n", - " \"type\": \"Enduro bikes\",\n", - " \"price\": 4972,\n", - " }\n", - "}\n", - "```\n", "\n", "JSON is best suited for use cases with the following characteristics:\n", "- Ease of use and data model flexibility are top concerns\n", @@ -331,16 +318,6 @@ "- Replacing another document storage/db solution" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Full JSON Path support\n", - "Because Redis enables full JSON path support, when creating an index schema, elements need to be indexed and selected by their path with the desired `name` AND `path` that points to where the data is located within the objects.\n", - "\n", - "> By default, RedisVL will assume the path as `$.{name}` if not provided in JSON fields schema." - ] - }, { "cell_type": "code", "execution_count": 11, @@ -505,11 +482,230 @@ "source": [ "jindex.delete()" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Working with nested data in JSON\n", + "\n", + "Redis also supports native **JSON** objects. 
These can be multi-level (nested) objects, with full JSONPath support for updating/retrieving sub-elements:\n", + "\n", + "```json\n", + "{\n", + " \"name\": \"Specialized Stumpjumper\",\n", + " \"metadata\": {\n", + " \"model\": \"Stumpjumper\",\n", + " \"brand\": \"Specialized\",\n", + " \"type\": \"Enduro bikes\",\n", + " \"price\": 3000\n", + " }\n", + "}\n", + "```\n", + "\n", + "#### Full JSONPath support\n", + "Because Redis enables full JSONPath support, when creating an index schema, each field must be indexed and selected by its path, using the desired `name` AND the `path` that points to where the data is located within the object.\n", + "\n", + "> By default, RedisVL assumes the path is `$.{name}` if none is provided in the JSON fields schema. For nested fields, provide the path as `$.object.attribute`.\n", + "\n", + "### As an example:" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/robert.shelton/.pyenv/versions/3.11.9/lib/python3.11/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "from redisvl.utils.vectorize import HFTextVectorizer\n", + "\n", + "emb_model = HFTextVectorizer()\n", + "\n", + "bike_data = [\n", + " {\n", + " \"name\": \"Specialized Stumpjumper\",\n", + " \"metadata\": {\n", + " \"model\": \"Stumpjumper\",\n", + " \"brand\": \"Specialized\",\n", + " \"type\": \"Enduro bikes\",\n", + " \"price\": 3000\n", + " },\n", + " \"description\": \"The Specialized Stumpjumper is a versatile enduro bike that dominates both climbs and descents. Features a FACT 11m carbon fiber frame, FOX FLOAT suspension with 160mm travel, and SRAM X01 Eagle drivetrain.
The asymmetric frame design and internal storage compartment make it a practical choice for all-day adventures.\"\n", + " },\n", + " {\n", + " \"name\": \"bike_2\",\n", + " \"metadata\": {\n", + " \"model\": \"Slash\",\n", + " \"brand\": \"Trek\",\n", + " \"type\": \"Enduro bikes\",\n", + " \"price\": 5000\n", + " },\n", + " \"description\": \"Trek's Slash is built for aggressive enduro riding and racing. Featuring Trek's Alpha Aluminum frame with RE:aktiv suspension technology, 160mm travel, and Knock Block frame protection. Equipped with Bontrager components and a Shimano XT drivetrain, this bike excels on technical trails and enduro race courses.\"\n", + " }\n", + "]\n", + "\n", + "bike_data = [{**d, \"bike_embedding\": emb_model.embed(d[\"description\"])} for d in bike_data]\n", + "\n", + "bike_schema = {\n", + " \"index\": {\n", + " \"name\": \"bike-json\",\n", + " \"prefix\": \"bike-json\",\n", + " \"storage_type\": \"json\", # JSON storage type\n", + " },\n", + " \"fields\": [\n", + " {\n", + " \"name\": \"model\",\n", + " \"type\": \"tag\",\n", + " \"path\": \"$.metadata.model\" # note the '$'\n", + " },\n", + " {\n", + " \"name\": \"brand\",\n", + " \"type\": \"tag\",\n", + " \"path\": \"$.metadata.brand\"\n", + " },\n", + " {\n", + " \"name\": \"price\",\n", + " \"type\": \"numeric\",\n", + " \"path\": \"$.metadata.price\"\n", + " },\n", + " {\n", + " \"name\": \"bike_embedding\",\n", + " \"type\": \"vector\",\n", + " \"attrs\": {\n", + " \"dims\": len(bike_data[0][\"bike_embedding\"]),\n", + " \"distance_metric\": \"cosine\",\n", + " \"algorithm\": \"flat\",\n", + " \"datatype\": \"float32\"\n", + " }\n", + "\n", + " }\n", + " ],\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [], + "source": [ + "# construct a search index from the json schema\n", + "bike_index = SearchIndex.from_dict(bike_schema)\n", + "\n", + "# connect to local redis instance\n", + 
"bike_index.connect(\"redis://localhost:6379\")\n", + "\n", + "# create the index (no data yet)\n", + "bike_index.create(overwrite=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['bike-json:de92cb9955434575b20f4e87a30b03d5',\n", + " 'bike-json:054ab3718b984532b924946fa5ce00c6']" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bike_index.load(bike_data)" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [], + "source": [ + "from redisvl.query import VectorQuery\n", + "\n", + "vec = emb_model.embed(\"I'd like a bike for aggressive riding\")\n", + "\n", + "v = VectorQuery(vector=vec,\n", + " vector_field_name=\"bike_embedding\",\n", + " return_fields=[\n", + " \"brand\",\n", + " \"name\",\n", + " \"$.metadata.type\"\n", + " ]\n", + " )\n", + "\n", + "\n", + "results = bike_index.query(v)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note:** As shown in the example, to retrieve a field from a JSON object that was not indexed, you must supply its full path, as with `$.metadata.type`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'id': 'bike-json:054ab3718b984532b924946fa5ce00c6',\n", + " 'vector_distance': '0.519989073277',\n", + " 'brand': 'Trek',\n", + " '$.metadata.type': 'Enduro bikes'},\n", + " {'id': 'bike-json:de92cb9955434575b20f4e87a30b03d5',\n", + " 'vector_distance': '0.657624483109',\n", + " 'brand': 'Specialized',\n", + " '$.metadata.type': 'Enduro bikes'}]" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Cleanup" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [], + "source": [ + "bike_index.delete()" + ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3.8.13 ('redisvl2')", + "display_name": "Python 3", "language": "python", "name": "python3" }, @@ -523,14 +719,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.12" + "version": "3.11.9" }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "9b1e6e9c2967143209c2f955cb869d1d3234f92dc4787f49f155f3abbdfb1316" - } - } + "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2