From 8c04e295df4a911c727698cb3281b0924c58ec4c Mon Sep 17 00:00:00 2001 From: Rudyard Richter Date: Tue, 7 Aug 2018 17:27:38 -0500 Subject: [PATCH 1/2] docs(ipynb): add example notebook --- examples/indexd_demo.ipynb | 759 +++++++++++++++++++++++++++++++++++++ 1 file changed, 759 insertions(+) create mode 100644 examples/indexd_demo.ipynb diff --git a/examples/indexd_demo.ipynb b/examples/indexd_demo.ipynb new file mode 100644 index 00000000..9970a9f3 --- /dev/null +++ b/examples/indexd_demo.ipynb @@ -0,0 +1,759 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# indexd Demo" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## About indexd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Indexd, in a nutshell, is a microservice which maintains URLs as pointers to stored data files. Indexd adds a layer of abstraction over stored data files: the data can move between or live in multiple locations, while the unique identifier for each file, kept in indexd, allows us to obtain the URLs (and some miscellaneous metadata) for the same stored data. Additionally, indexd tracks revisions of the same data file." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Python Demo" + ] + }, + { + "cell_type": "code", + "execution_count": 223, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from urllib.parse import urljoin\n", + "\n", + "import requests" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To start, run indexd on `localhost:8080`. Probably the easiest way is with a docker container:\n", + "```bash\n", + "# Start from indexd directory\n", + "# Build the docker image if you don't have it yet\n", + "docker build -t indexd .\n", + "# Now run the image, and set it to forward to port 8080.\n", + "docker run -d --name indexd -p 8080:80 indexd\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In order to use endpoints requiring admin authorization, set up a username and password in the indexd docker image:\n", + "```bash\n", + "docker exec indexd python /indexd/bin/index_admin.py create --username test --password test\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(Here we set up a bit of code just to make the API calls more concise and readable.)" + ] + }, + { + "cell_type": "code", + "execution_count": 224, + "metadata": {}, + "outputs": [], + "source": [ + "base = 'http://localhost:8080'\n", + "\n", + "# NOTE\n", + "# Fill in the auth with whatever username/password you set before.\n", + "request_auth = requests.auth.HTTPBasicAuth('test', 'test')\n", + "\n", + "indexd = lambda path: urljoin(base, path)\n", + "\n", + "def print_response(response):\n", + " print(response)\n", + " try:\n", + " print(json.dumps(response.json(), indent=4))\n", + " except ValueError:\n", + " print(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Just for the purposes of re-using this demo with the same indexd instance, we'll clear out all the records from indexd. 
(For the sake of the tutorial, this shouldn't make sense yet—so ignore this, and move along!)" + ] + }, + { + "cell_type": "code", + "execution_count": 225, + "metadata": {}, + "outputs": [], + "source": [ + "def wipe_indexd():\n", + " \"\"\"\n", + " Delete all records from indexd.\n", + " \"\"\"\n", + " records = requests.get(indexd('/index/')).json()['records']\n", + " for record in records:\n", + " path = indexd('/index/{}'.format(record['did']))\n", + " params = {'rev': record['rev']}\n", + " response = requests.delete(path, auth=request_auth, params=params)" + ] + }, + { + "cell_type": "code", + "execution_count": 226, + "metadata": {}, + "outputs": [], + "source": [ + "# WARNING: don't do this if you want to keep your existing records!\n", + "wipe_indexd()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check that indexd is alive, using the status endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": 227, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Healthy\n" + ] + } + ], + "source": [ + "print_response(requests.get(indexd('/_status')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So far so good. Let's get the list of records stored in indexd right now, by sending a `GET` to `/index/`." + ] + }, + { + "cell_type": "code", + "execution_count": 228, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"version\": null,\n", + " \"metadata\": {},\n", + " \"urls\": [],\n", + " \"start\": null,\n", + " \"size\": null,\n", + " \"limit\": 100,\n", + " \"records\": [],\n", + " \"ids\": null,\n", + " \"acl\": [],\n", + " \"hashes\": null,\n", + " \"file_name\": null\n", + "}\n" + ] + } + ], + "source": [ + "print_response(requests.get(indexd('/index/')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There's no records registered yet...let's create one." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Creating a Record" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Just below is some example data for a record. We `POST` this to the `/index/` endpoint on indexd to register the record.\n", + "\n", + "The minimum information necessary to supply to indexd is the file size, the hash (in any of several common formats), a list of URLs pointing to where the data file is stored (which can be left empty),\n", + "and the form TODO. For this example we'll also give our imaginary file a name, and add `'*'` in the ACL list." 
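+    "\n",
+    "In practice you would compute the size and hash from the data file itself rather than typing them in by hand. A minimal sketch (the local path here is hypothetical):\n",
+    "```python\n",
+    "import hashlib\n",
+    "import os\n",
+    "\n",
+    "path = '/tmp/example_file'  # hypothetical local file\n",
+    "size = os.path.getsize(path)\n",
+    "with open(path, 'rb') as f:\n",
+    "    md5 = hashlib.md5(f.read()).hexdigest()\n",
+    "hashes = {'md5': md5}\n",
+    "```\n",
+    "For this demo we'll just make up those values, as in the next cell."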
+ ] + }, + { + "cell_type": "code", + "execution_count": 229, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"rev\": \"be8c395f\",\n", + " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\"\n", + "}\n" + ] + } + ], + "source": [ + "data = {\n", + " 'size': 8,\n", + " 'hashes': {'md5': 'e561f9248d7563d15dd93457b02ebbb6'},\n", + " 'urls': [],\n", + " 'form': 'object',\n", + " 'file_name': 'example_file',\n", + " 'acl': ['*'],\n", + "}\n", + "response = requests.post(indexd('/index/'), json=data, auth=request_auth)\n", + "print_response(response)\n", + "# Save this stuff, we'll need to use it later.\n", + "v_0_did = response.json()['did']\n", + "v_0_baseid = response.json()['baseid']\n", + "v_0_rev = response.json()['rev']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Success!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieving Records" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now the list of records returned from indexd should have our new entry—let's check, again using a `GET` to the `/index/` endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": 230, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"version\": null,\n", + " \"metadata\": {},\n", + " \"urls\": [],\n", + " \"start\": null,\n", + " \"size\": null,\n", + " \"limit\": 100,\n", + " \"records\": [\n", + " {\n", + " \"version\": null,\n", + " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", + " \"urls_metadata\": {},\n", + " \"urls\": [],\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", + " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", + " \"size\": 8,\n", + " \"acl\": [\n", + " \"*\"\n", + " ],\n", + " \"metadata\": {},\n", + " \"hashes\": {\n", + " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", + " },\n", + " \"rev\": \"be8c395f\",\n", + " \"form\": \"object\",\n", + " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", + " \"file_name\": \"example_file\"\n", + " }\n", + " ],\n", + " \"ids\": null,\n", + " \"acl\": [],\n", + " \"hashes\": null,\n", + " \"file_name\": null\n", + "}\n" + ] + } + ], + "source": [ + "print_response(requests.get(indexd('/index/')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also look up this specific record using `GET` `/index/{UUID}`, where the UUID is the DID that indexd returned before when we created this record." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 231, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"version\": null,\n", + " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", + " \"urls_metadata\": {},\n", + " \"urls\": [],\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", + " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", + " \"size\": 8,\n", + " \"acl\": [\n", + " \"*\"\n", + " ],\n", + " \"metadata\": {},\n", + " \"hashes\": {\n", + " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", + " },\n", + " \"rev\": \"be8c395f\",\n", + " \"form\": \"object\",\n", + " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", + " \"file_name\": \"example_file\"\n", + "}\n" + ] + } + ], + "source": [ + "path = indexd('/index/{}'.format(v_0_did))\n", + "print_response(requests.get(path))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Great, so we made a new record!...but what does any of that stuff mean?? Let's break this down." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### About Records in indexd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A single record in indexd contains several fields; let's go through each field and explain what these are for.\n", + "\n", + "#### `did` (\"digital identifier\")\n", + "\n", + "A unique identifier (UUID4) for the file; indexd will make these for new records automatically. Notice that the one that indexd generated for us looks like this:\n", + "```\n", + ":\n", + "```\n", + "TODO\n", + "\n", + "#### `baseid`\n", + "\n", + "The `baseid` is a common identifier for all versions of one file, across revisions.\n", + "\n", + "#### `rev`\n", + "\n", + "The `rev` field identifies a particular version of a file with multiple versions.\n", + "\n", + "#### `form`\n", + "\n", + "#### `size`\n", + "\n", + "This is just the filesize that we gave indexd originally for this file.\n", + "\n", + "#### `file_name`\n", + "\n", + "Optional field recording the filename of the indexed file.\n", + "\n", + "#### `metadata`\n", + "\n", + "#### `urls_metadata`\n", + "\n", + "#### `version`\n", + "\n", + "#### `urls`\n", + "\n", + "Like we mentioned above, this is the list of URLs which point to the real location of the stored data.\n", + "\n", + "#### `acl`\n", + "\n", + "#### `hashes`\n", + "\n", + "`hashes` is an object storing one or more hashes for the file itself. These can be any of:\n", + "- MD5\n", + "- SHA\n", + "- SHA256\n", + "- SHA512\n", + "- CRC\n", + "- ETag" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Record Versions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we've created a record, let's look at the process of updating this record with a new version. We're going to change the contents—and thus the size and the hash—of our imaginary file. Let's update indexd with the new information. To add a new version, we `POST` to `/index/{UUID}`, where the UUID is the DID of the existing file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 232, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"rev\": \"2d09fa8d\",\n", + " \"did\": \"88bca605-42f9-40b1-a0e3-41e632276125\",\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\"\n", + "}\n" + ] + } + ], + "source": [ + "# Here's the new data for the \"file\".\n", + "data['size'] = 10\n", + "data['hashes'] = {'md5': 'f7952a9483fae0af6d41370d9333020b'}\n", + "\n", + "# We saved the DID for this file before.\n", + "path = indexd('/index/{}'.format(v_0_did))\n", + "response = requests.post(path, json=data, auth=request_auth)\n", + "v_1_baseid = response.json()['baseid']\n", + "v_1_did = response.json()['did']\n", + "v_1_rev = response.json()['rev']\n", + "print_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, if we compare this `baseid` to the `baseid` that indexd returned when we created the record for the original file, we see that this `baseid` remains the same." + ] + }, + { + "cell_type": "code", + "execution_count": 233, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n" + ] + } + ], + "source": [ + "print(v_0_baseid == v_1_baseid)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "However, this record has a different `rev` and a different `did` than the original." + ] + }, + { + "cell_type": "code", + "execution_count": 234, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "False\n", + "False\n" + ] + } + ], + "source": [ + "print(v_0_did == v_1_did)\n", + "print(v_0_rev == v_1_rev)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Having created the new version for this file, let's again make a request `GET` `/index/{UUID}`, using the shared `baseid`." + ] + }, + { + "cell_type": "code", + "execution_count": 235, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"version\": \"2.0\",\n", + " \"did\": \"88bca605-42f9-40b1-a0e3-41e632276125\",\n", + " \"urls_metadata\": {},\n", + " \"urls\": [],\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", + " \"created_date\": \"2018-08-07T22:19:03.147919\",\n", + " \"size\": 10,\n", + " \"acl\": [\n", + " \"*\"\n", + " ],\n", + " \"metadata\": {},\n", + " \"hashes\": {\n", + " \"md5\": \"f7952a9483fae0af6d41370d9333020b\"\n", + " },\n", + " \"rev\": \"2d09fa8d\",\n", + " \"form\": \"object\",\n", + " \"updated_date\": \"2018-08-07T22:19:03.147927\",\n", + " \"file_name\": \"example_file\"\n", + "}\n" + ] + } + ], + "source": [ + "path = indexd('/index/{}'.format(v_0_baseid))\n", + "response = requests.get(path)\n", + "print_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The information for this record reflects the new changes to the file." + ] + }, + { + "cell_type": "code", + "execution_count": 236, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n" + ] + } + ], + "source": [ + "print(response.json()['did'] == v_1_did)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "However, the original information still exists. We can make a request again using the DID of the original file, and see that this revision hasn't changed." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 237, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"version\": null,\n", + " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", + " \"urls_metadata\": {},\n", + " \"urls\": [],\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", + " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", + " \"size\": 8,\n", + " \"acl\": [\n", + " \"*\"\n", + " ],\n", + " \"metadata\": {},\n", + " \"hashes\": {\n", + " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", + " },\n", + " \"rev\": \"be8c395f\",\n", + " \"form\": \"object\",\n", + " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", + " \"file_name\": \"example_file\"\n", + "}\n" + ] + } + ], + "source": [ + "path = indexd('/index/{}'.format(v_0_did))\n", + "print_response(requests.get(path))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we can look at the whole list of versions for a single file, with `GET` `/index/{UUID}/versions`. The object in the response will contain the records for every version of this file as key-value pairs, where the keys are just numeric indexes (in string form) and the values are the records." + ] + }, + { + "cell_type": "code", + "execution_count": 266, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"1\": {\n", + " \"version\": \"2.0\",\n", + " \"did\": \"88bca605-42f9-40b1-a0e3-41e632276125\",\n", + " \"urls_metadata\": {},\n", + " \"urls\": [],\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", + " \"created_date\": \"2018-08-07T22:19:03.147919\",\n", + " \"size\": 10,\n", + " \"acl\": [\n", + " \"*\"\n", + " ],\n", + " \"metadata\": {},\n", + " \"hashes\": {\n", + " \"md5\": \"f7952a9483fae0af6d41370d9333020b\"\n", + " },\n", + " \"rev\": \"2d09fa8d\",\n", + " \"form\": \"object\",\n", + " \"updated_date\": \"2018-08-07T22:19:03.147927\",\n", + " \"file_name\": \"example_file\"\n", + " },\n", + " \"0\": {\n", + " \"version\": null,\n", + " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", + " \"urls_metadata\": {},\n", + " \"urls\": [],\n", + " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", + " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", + " \"size\": 8,\n", + " \"acl\": [\n", + " \"*\"\n", + " ],\n", + " \"metadata\": {},\n", + " \"hashes\": {\n", + " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", + " },\n", + " \"rev\": \"be8c395f\",\n", + " \"form\": \"object\",\n", + " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", + " \"file_name\": \"example_file\"\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "path = indexd('/index/{}/versions'.format(v_0_baseid))\n", + "print_response(requests.get(path))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Record Aliases" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 1a03bdfca066956167b08d7d1958d6eb1885f186 Mon Sep 17 00:00:00 2001 From: Rudyard Richter Date: Wed, 8 Aug 2018 19:27:46 -0500 Subject: [PATCH 2/2] docs(ipynb): work on demo 
notebook --- examples/indexd_demo.ipynb | 947 ++++++++++++++++++++++++++++--------- 1 file changed, 733 insertions(+), 214 deletions(-) diff --git a/examples/indexd_demo.ipynb b/examples/indexd_demo.ipynb index 9970a9f3..5fa29732 100644 --- a/examples/indexd_demo.ipynb +++ b/examples/indexd_demo.ipynb @@ -11,14 +11,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## About indexd" + "## What is indexd?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Indexd, in a nutshell, is a microservice which maintains URLs as pointers to stored data files. Indexd adds a layer of abstraction over stored data files: the data can move between or live in multiple locations, while the unique identifier for each file, kept in indexd, allows us to obtain the URLs (and some miscellaneous metadata) for the same stored data. Additionally, indexd tracks revisions of the same data file." + "The name \"indexd\" signifies (in the typical convention) \"index daemon\". While the name might not be accurate in the technical sense of a daemon, this summarizes its basic purpose. Indexd, in a nutshell, is a microservice which maintains URLs as pointers to stored data files. Indexd adds a layer of abstraction over stored data files: the data can move between or live in multiple locations, while the unique identifier for each file, kept in indexd, allows us to obtain the URLs (and some miscellaneous metadata) for the same stored data. Additionally, indexd tracks revisions of the same data file." ] }, { @@ -29,15 +29,10 @@ ] }, { - "cell_type": "code", - "execution_count": 223, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "import json\n", - "from urllib.parse import urljoin\n", - "\n", - "import requests" + "Throughout this demo we're going to use direct API calls to indexd, just to get a sense for the API and what's going on \"under the hood\". For actually interfacing with indexd in our code we use another library called \"indexclient\" (can you guess what this does?). As we work through the demo we'll show the code both for making calls directly to indexd and for using indexclient." ] }, { @@ -51,7 +46,31 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To start, run indexd on `localhost:8080`. Probably the easiest way is with a docker container:\n", + "For this demo make sure the `indexclient` package is installed such that it can be used here in jupyter. I used this to install it in this notebook:\n", + "```\n", + "import sys\n", + "!cd ~/cdis/indexclient; {sys.executable} setup.py develop --user\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 430, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from urllib.parse import urljoin\n", + "\n", + "from indexclient.client import IndexClient\n", + "import requests" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To start, we'll run indexd on `localhost:8080`. 
Probably the easiest way is with a docker container:\n", "```bash\n", "# Start from indexd directory\n", "# Build the docker image if you don't have it yet\n", @@ -75,12 +94,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "(Here we set up a bit of code just to make the API calls more concise and readable.)" + "(Here we set up a bit of code just to make printing out the API calls more concise and readable.)" ] }, { "cell_type": "code", - "execution_count": 224, + "execution_count": 431, "metadata": {}, "outputs": [], "source": [ @@ -109,7 +128,7 @@ }, { "cell_type": "code", - "execution_count": 225, + "execution_count": 432, "metadata": {}, "outputs": [], "source": [ @@ -126,14 +145,29 @@ }, { "cell_type": "code", - "execution_count": 226, + "execution_count": 433, "metadata": {}, "outputs": [], "source": [ - "# WARNING: don't do this if you want to keep your existing records!\n", "wipe_indexd()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll set up an `IndexClient` as well, which is what our other code actually uses to interface with indexd." + ] + }, + { + "cell_type": "code", + "execution_count": 434, + "metadata": {}, + "outputs": [], + "source": [ + "client = IndexClient(baseurl=base, auth=request_auth)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -143,22 +177,49 @@ }, { "cell_type": "code", - "execution_count": 227, + "execution_count": 435, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ + "GET http://localhost:8080/_status\n", + "\n", "\n", "Healthy\n" ] } ], "source": [ + "print('GET {}'.format(indexd('/_status')))\n", + "print()\n", "print_response(requests.get(indexd('/_status')))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also check the status through the client." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 436, + "metadata": {}, + "outputs": [], + "source": [ + "client.check_status()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(It doesn't return anything if indexd is working.)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -168,7 +229,7 @@ }, { "cell_type": "code", - "execution_count": 228, + "execution_count": 437, "metadata": { "scrolled": false }, @@ -177,27 +238,58 @@ "name": "stdout", "output_type": "stream", "text": [ + "GET http://localhost:8080/index/\n", + "\n", "\n", "{\n", " \"version\": null,\n", - " \"metadata\": {},\n", - " \"urls\": [],\n", - " \"start\": null,\n", " \"size\": null,\n", - " \"limit\": 100,\n", - " \"records\": [],\n", - " \"ids\": null,\n", + " \"file_name\": null,\n", " \"acl\": [],\n", + " \"ids\": null,\n", + " \"start\": null,\n", + " \"metadata\": {},\n", + " \"limit\": 100,\n", " \"hashes\": null,\n", - " \"file_name\": null\n", + " \"urls\": [],\n", + " \"records\": []\n", "}\n" ] } ], "source": [ + "print('GET {}'.format(indexd('/index/')))\n", + "print()\n", "print_response(requests.get(indexd('/index/')))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Listing records with the client (the return value will have just the records, and not the extra information returned from the endpoint):" + ] + }, + { + "cell_type": "code", + "execution_count": 438, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 438, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(client.list())" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -224,7 +316,7 @@ }, { "cell_type": "code", - "execution_count": 229, + "execution_count": 439, "metadata": { "scrolled": false }, @@ -233,158 +325,156 @@ "name": "stdout", "output_type": "stream", "text": [ + "POST http://localhost:8080/index/\n", + "\n", "\n", "{\n", - " \"rev\": \"be8c395f\",\n", - " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\"\n", + " \"baseid\": \"7b044aa0-1c65-4874-831b-9d69a602d6f4\",\n", + " \"rev\": \"55633f08\",\n", + " \"did\": \"testprefix:8511e34c-655c-4025-8d21-a4bf3bf2e5d3\"\n", "}\n" ] } ], "source": [ - "data = {\n", + "hashes = {'md5': 'e561f9248d7563d15dd93457b02ebbb6'}\n", + "size = 8\n", + "data_v_0 = {\n", + " 'hashes': hashes,\n", " 'size': 8,\n", - " 'hashes': {'md5': 'e561f9248d7563d15dd93457b02ebbb6'},\n", - " 'urls': [],\n", + " 'urls': [\"storage://file/path/example_file\"],\n", " 'form': 'object',\n", " 'file_name': 'example_file',\n", " 'acl': ['*'],\n", "}\n", - "response = requests.post(indexd('/index/'), json=data, auth=request_auth)\n", - "print_response(response)\n", - "# Save this stuff, we'll need to use it later.\n", - "v_0_did = response.json()['did']\n", - "v_0_baseid = response.json()['baseid']\n", - "v_0_rev = response.json()['rev']" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Success!" + "\n", + "print('POST {}'.format(indexd('/index/')))\n", + "print()\n", + "response = requests.post(indexd('/index/'), json=data_v_0, auth=request_auth)\n", + "print_response(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Retrieving Records" + "Success! We see in the response we have these three fields, `rev`, `did`, and `baseid`. 
These uniquely identify certain things about this record.\n", + "\n", + "- `did` is the ID for this record specifically.\n", + "- `baseid` is a common identifier for all versions of the same record; we'll come back to versioning later.\n", + "- `rev` is the identifier for this version." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now the list of records returned from indexd should have our new entry—let's check, again using a `GET` to the `/index/` endpoint." + "Let's repeat that, this time using the client. The `IndexClient`, for returning index records, returns a `Document` object containing all the information for an index record." ] }, { "cell_type": "code", - "execution_count": 230, - "metadata": { - "scrolled": false - }, + "execution_count": 440, + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "\n", - "{\n", - " \"version\": null,\n", - " \"metadata\": {},\n", - " \"urls\": [],\n", - " \"start\": null,\n", - " \"size\": null,\n", - " \"limit\": 100,\n", - " \"records\": [\n", - " {\n", - " \"version\": null,\n", - " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", - " \"urls_metadata\": {},\n", - " \"urls\": [],\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", - " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", - " \"size\": 8,\n", - " \"acl\": [\n", - " \"*\"\n", - " ],\n", - " \"metadata\": {},\n", - " \"hashes\": {\n", - " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", - " },\n", - " \"rev\": \"be8c395f\",\n", - " \"form\": \"object\",\n", - " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", - " \"file_name\": \"example_file\"\n", - " }\n", - " ],\n", - " \"ids\": null,\n", - " \"acl\": [],\n", - " \"hashes\": null,\n", - " \"file_name\": null\n", - "}\n" + "Document attributes and methods:\n", + "[\n", + " \"acl\",\n", + " \"baseid\",\n", + " \"client\",\n", + " \"created_date\",\n", + " \"delete\",\n", + " \"did\",\n", + " \"file_name\",\n", + " \"form\",\n", + " \"hashes\",\n", + " \"metadata\",\n", + " \"patch\",\n", + " \"rev\",\n", + " \"size\",\n", + " \"to_json\",\n", + " \"updated_date\",\n", + " \"urls\",\n", + " \"urls_metadata\",\n", + " \"version\"\n", + "]\n" ] } ], "source": [ - "print_response(requests.get(indexd('/index/')))" + "wipe_indexd()\n", + "client_create_kwargs = dict(data_v_0)\n", + "client_create_kwargs.pop('form')\n", + "\n", + "# Use the IndexClient to create a new record.\n", + "doc = client.create(**client_create_kwargs)\n", + "\n", + "print('Document attributes and methods:')\n", + "print(json.dumps(\n", + " list(attr for attr in dir(doc) if not attr.startswith('_')),\n", + " indent=4,\n", + "))\n", + "\n", + "# Save this stuff, we'll need to use it later.\n", + "v_0 = doc.to_json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can also look up this specific record using `GET` `/index/{UUID}`, where the UUID is the DID that indexd returned before when we created this record." + "We can convert the document into JSON, to get all the properties in the same format as they would be returned from the API." 
] }, { "cell_type": "code", - "execution_count": 231, - "metadata": { - "scrolled": false - }, + "execution_count": 441, + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "\n", "{\n", - " \"version\": null,\n", - " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", - " \"urls_metadata\": {},\n", - " \"urls\": [],\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", - " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", + " \"updated_date\": \"2018-08-09T00:22:20.039646\",\n", + " \"urls_metadata\": {\n", + " \"storage://file/path/example_file\": {}\n", + " },\n", + " \"baseid\": \"91a1a213-b630-4ec8-88d4-e38fc9fae968\",\n", + " \"hashes\": {\n", + " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", + " },\n", + " \"urls\": [\n", + " \"storage://file/path/example_file\"\n", + " ],\n", + " \"form\": \"object\",\n", " \"size\": 8,\n", + " \"file_name\": \"example_file\",\n", + " \"did\": \"testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\",\n", " \"acl\": [\n", " \"*\"\n", " ],\n", " \"metadata\": {},\n", - " \"hashes\": {\n", - " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", - " },\n", - " \"rev\": \"be8c395f\",\n", - " \"form\": \"object\",\n", - " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", - " \"file_name\": \"example_file\"\n", + " \"created_date\": \"2018-08-09T00:22:20.039637\",\n", + " \"rev\": \"861b0dab\",\n", + " \"version\": null\n", "}\n" ] } ], "source": [ - "path = indexd('/index/{}'.format(v_0_did))\n", - "print_response(requests.get(path))" + "print(json.dumps(doc.to_json(), indent=4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Great, so we made a new record!...but what does any of that stuff mean?? Let's break this down." + "Great, so we made a new record with some basic information. Now, let's take a closer look at the fields the go into a record." ] }, { @@ -406,7 +496,7 @@ "```\n", ":\n", "```\n", - "TODO\n", + "We're going to discuss these prefixes in more detail in a later section.\n", "\n", "#### `baseid`\n", "\n", @@ -416,8 +506,6 @@ "\n", "The `rev` field identifies a particular version of a file with multiple versions.\n", "\n", - "#### `form`\n", - "\n", "#### `size`\n", "\n", "This is just the filesize that we gave indexd originally for this file.\n", @@ -426,11 +514,9 @@ "\n", "Optional field recording the filename of the indexed file.\n", "\n", - "#### `metadata`\n", - "\n", - "#### `urls_metadata`\n", + "#### `created_date`\n", "\n", - "#### `version`\n", + "The time that this record was created.\n", "\n", "#### `urls`\n", "\n", @@ -438,6 +524,8 @@ "\n", "#### `acl`\n", "\n", + "\"Access control list\". Fence uses this list to control authorization when generating pre-signed URLs.\n", + "\n", "#### `hashes`\n", "\n", "`hashes` is an object storing one or more hashes for the file itself. These can be any of:\n", @@ -446,7 +534,218 @@ "- SHA256\n", "- SHA512\n", "- CRC\n", - "- ETag" + "- ETag\n", + "\n", + "For this demo we'll skip over a few record fields: `form`, `metadata`, `urls_metadata`, and `version`, all of which are not used extensively or specific to use in the GDC." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we've seen some examples and know what the fields mean, we're going to trim the fields in the next examples to keep things concise." 
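+    "\n",
+    "One aside on `hashes` before we trim things down: since a record can store more than one digest, a file could be registered with several at once. A quick sketch (computing the digests rather than spelling them out):\n",
+    "```python\n",
+    "import hashlib\n",
+    "\n",
+    "content = b'imaginary file contents'\n",
+    "hashes = {\n",
+    "    'md5': hashlib.md5(content).hexdigest(),\n",
+    "    'sha256': hashlib.sha256(content).hexdigest(),\n",
+    "}\n",
+    "```\n",
+    "The helper below prints just the handful of fields we'll care about from here on."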
+ ] + }, + { + "cell_type": "code", + "execution_count": 442, + "metadata": {}, + "outputs": [], + "source": [ + "def print_record(record):\n", + " \"\"\"\n", + " Utility function to print subset of record fields.\n", + " \"\"\"\n", + " print(record['file_name'])\n", + " print('urls: {}'.format(record['urls']))\n", + " print('size: {}'.format(record['size']))\n", + " print('baseid: {}'.format(record['baseid']))\n", + " print('rev: {}'.format(record['rev']))\n", + " print('did: {}'.format(record['did']))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Retrieving Records" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now the list of records returned from indexd should have our new entry—let's check, again using a `GET` to the `/index/` endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": 443, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GET http://localhost:8080/index/\n", + "\n", + "example_file\n", + "urls: ['storage://file/path/example_file']\n", + "size: 8\n", + "baseid: 91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "rev: 861b0dab\n", + "did: testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n" + ] + } + ], + "source": [ + "print('GET {}'.format(indexd('/index/')))\n", + "print()\n", + "response = requests.get(indexd('/index/'))\n", + "print_record(response.json()['records'][0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can look up this specific record using `GET` `/index/{UUID}`, where the UUID is the DID that indexd returned before when we created this record." + ] + }, + { + "cell_type": "code", + "execution_count": 444, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GET http://localhost:8080/index/testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n", + "\n", + "example_file\n", + "urls: ['storage://file/path/example_file']\n", + "size: 8\n", + "baseid: 91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "rev: 861b0dab\n", + "did: testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n" + ] + } + ], + "source": [ + "path = indexd('/index/{}'.format(v_0['did']))\n", + "\n", + "print('GET {}'.format(path))\n", + "print()\n", + "print_record(requests.get(path).json())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's also search for this record through the client." + ] + }, + { + "cell_type": "code", + "execution_count": 445, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "example_file\n", + "urls: ['storage://file/path/example_file']\n", + "size: 8\n", + "baseid: 91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "rev: 861b0dab\n", + "did: testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n" + ] + } + ], + "source": [ + "doc = client.get(v_0['did'])\n", + "print_record(doc.to_json())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also search through all the records, but apply an argument to filter by hash, size, and/or URL.\n", + "\n", + "Let's apply the `hash` argument in the query string, and give it the md5 hash for our file." 
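+    "\n",
+    "A size filter looks much the same: judging by the `size` key echoed back in the listing responses above, it would presumably go in the query string as `size` (a sketch, not run here):\n",
+    "```python\n",
+    "# Presumed size filter; `size` shows up among the echoed query keys above.\n",
+    "requests.get(indexd('/index/'), params={'size': 8})\n",
+    "```\n",
+    "First, though, the `hash` filter:"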
+ ] + }, + { + "cell_type": "code", + "execution_count": 446, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GET http://localhost:8080/index?hash=md5:e561f9248d7563d15dd93457b02ebbb6\n", + "\n", + "Returned 1 records\n", + "\n", + "example_file\n", + "urls: ['storage://file/path/example_file']\n", + "size: 8\n", + "baseid: 91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "rev: 861b0dab\n", + "did: testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n" + ] + } + ], + "source": [ + "path = indexd('/index?hash=md5:{}'.format(v_0['hashes']['md5']))\n", + "records = requests.get(path).json()['records']\n", + "\n", + "print('GET {}'.format(path))\n", + "print()\n", + "print('Returned {} records'.format(len(records)))\n", + "for record in records:\n", + " print()\n", + " print_record(record)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And of course, we can accomplish the same thing using the `IndexClient`." + ] + }, + { + "cell_type": "code", + "execution_count": 447, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "example_file\n", + "urls: ['storage://file/path/example_file']\n", + "size: 8\n", + "baseid: 91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "rev: 861b0dab\n", + "did: testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n" + ] + } + ], + "source": [ + "doc = client.get_with_params(params={'hashes': v_0['hashes']})\n", + "print_record(doc.to_json())" ] }, { @@ -460,38 +759,42 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that we've created a record, let's look at the process of updating this record with a new version. We're going to change the contents—and thus the size and the hash—of our imaginary file. Let's update indexd with the new information. To add a new version, we `POST` to `/index/{UUID}`, where the UUID is the DID of the existing file." + "Now that we've created a record, let's look at the process of updating this record with a new version. We're going to change the contents—and thus the size and the hash—of our imaginary file. Let's update indexd with the new information. To add a new version, we `POST` to `/index/{UUID}`, where the UUID is an identifier for the existing file." 
] }, { "cell_type": "code", - "execution_count": 232, + "execution_count": 448, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ + "POST http://localhost:8080/index/testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n", + "\n", "\n", "{\n", - " \"rev\": \"2d09fa8d\",\n", - " \"did\": \"88bca605-42f9-40b1-a0e3-41e632276125\",\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\"\n", + " \"baseid\": \"91a1a213-b630-4ec8-88d4-e38fc9fae968\",\n", + " \"rev\": \"f792a6b2\",\n", + " \"did\": \"d7b0ad4e-8afe-4480-ae68-9ac6ea60a082\"\n", "}\n" ] } ], "source": [ "# Here's the new data for the \"file\".\n", - "data['size'] = 10\n", - "data['hashes'] = {'md5': 'f7952a9483fae0af6d41370d9333020b'}\n", + "data_v_1 = dict(data_v_0)\n", + "data_v_1['size'] = 10\n", + "data_v_1['hashes'] = {'md5': 'f7952a9483fae0af6d41370d9333020b'}\n", "\n", "# We saved the DID for this file before.\n", - "path = indexd('/index/{}'.format(v_0_did))\n", - "response = requests.post(path, json=data, auth=request_auth)\n", - "v_1_baseid = response.json()['baseid']\n", - "v_1_did = response.json()['did']\n", - "v_1_rev = response.json()['rev']\n", + "path = indexd('/index/{}'.format(v_0['did']))\n", + "print('POST {}'.format(path))\n", + "print()\n", + "response = requests.post(path, json=data_v_1, auth=request_auth)\n", + "# Also stash the return values from this response.\n", + "v_1 = response.json()\n", "print_response(response)" ] }, @@ -504,19 +807,26 @@ }, { "cell_type": "code", - "execution_count": 233, + "execution_count": 449, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "True\n" + "Same `baseid`? True\n" ] } ], "source": [ - "print(v_0_baseid == v_1_baseid)" + "print('Same `baseid`? {}'.format(v_0['baseid'] == v_1['baseid']))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All revisions of the same file will share this `baseid`." ] }, { @@ -528,21 +838,21 @@ }, { "cell_type": "code", - "execution_count": 234, + "execution_count": 450, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "False\n", - "False\n" + "Same `did`? False\n", + "Same `rev`? False\n" ] } ], "source": [ - "print(v_0_did == v_1_did)\n", - "print(v_0_rev == v_1_rev)" + "print('Same `did`? {}'.format(v_0['did'] == v_1['did']))\n", + "print('Same `rev`? 
{}'.format(v_0['rev'] == v_1['rev']))" ] }, { @@ -554,53 +864,42 @@ }, { "cell_type": "code", - "execution_count": 235, + "execution_count": 451, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "\n", - "{\n", - " \"version\": \"2.0\",\n", - " \"did\": \"88bca605-42f9-40b1-a0e3-41e632276125\",\n", - " \"urls_metadata\": {},\n", - " \"urls\": [],\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", - " \"created_date\": \"2018-08-07T22:19:03.147919\",\n", - " \"size\": 10,\n", - " \"acl\": [\n", - " \"*\"\n", - " ],\n", - " \"metadata\": {},\n", - " \"hashes\": {\n", - " \"md5\": \"f7952a9483fae0af6d41370d9333020b\"\n", - " },\n", - " \"rev\": \"2d09fa8d\",\n", - " \"form\": \"object\",\n", - " \"updated_date\": \"2018-08-07T22:19:03.147927\",\n", - " \"file_name\": \"example_file\"\n", - "}\n" + "GET http://localhost:8080/index/91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "\n", + "example_file\n", + "urls: ['storage://file/path/example_file']\n", + "size: 10\n", + "baseid: 91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "rev: f792a6b2\n", + "did: d7b0ad4e-8afe-4480-ae68-9ac6ea60a082\n" ] } ], "source": [ - "path = indexd('/index/{}'.format(v_0_baseid))\n", + "path = indexd('/index/{}'.format(v_0['baseid']))\n", + "print('GET {}'.format(path))\n", + "print()\n", "response = requests.get(path)\n", - "print_response(response)" + "print_record(response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The information for this record reflects the new changes to the file." + "The information for this record reflects the new changes to the file: the size and `did` have changed. The `baseid` is the same." ] }, { "cell_type": "code", - "execution_count": 236, + "execution_count": 452, "metadata": {}, "outputs": [ { @@ -612,7 +911,7 @@ } ], "source": [ - "print(response.json()['did'] == v_1_did)" + "print(response.json()['did'] == v_1['did'])" ] }, { @@ -624,40 +923,29 @@ }, { "cell_type": "code", - "execution_count": 237, + "execution_count": 453, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "\n", - "{\n", - " \"version\": null,\n", - " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", - " \"urls_metadata\": {},\n", - " \"urls\": [],\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", - " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", - " \"size\": 8,\n", - " \"acl\": [\n", - " \"*\"\n", - " ],\n", - " \"metadata\": {},\n", - " \"hashes\": {\n", - " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", - " },\n", - " \"rev\": \"be8c395f\",\n", - " \"form\": \"object\",\n", - " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", - " \"file_name\": \"example_file\"\n", - "}\n" + "GET http://localhost:8080/index/testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n", + "\n", + "example_file\n", + "urls: ['storage://file/path/example_file']\n", + "size: 8\n", + "baseid: 91a1a213-b630-4ec8-88d4-e38fc9fae968\n", + "rev: 861b0dab\n", + "did: testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\n" ] } ], "source": [ - "path = indexd('/index/{}'.format(v_0_did))\n", - "print_response(requests.get(path))" + "path = indexd('/index/{}'.format(v_0['did']))\n", + "print('GET {}'.format(path))\n", + "print()\n", + "print_record(requests.get(path).json())" ] }, { @@ -669,70 +957,301 @@ }, { "cell_type": "code", - "execution_count": 266, + "execution_count": 454, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ + "GET 
http://localhost:8080/index/91a1a213-b630-4ec8-88d4-e38fc9fae968/versions\n", + "\n", "\n", "{\n", " \"1\": {\n", - " \"version\": \"2.0\",\n", - " \"did\": \"88bca605-42f9-40b1-a0e3-41e632276125\",\n", - " \"urls_metadata\": {},\n", - " \"urls\": [],\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", - " \"created_date\": \"2018-08-07T22:19:03.147919\",\n", - " \"size\": 10,\n", - " \"acl\": [\n", - " \"*\"\n", - " ],\n", - " \"metadata\": {},\n", + " \"updated_date\": \"2018-08-09T00:22:20.246230\",\n", + " \"urls_metadata\": {\n", + " \"storage://file/path/example_file\": {}\n", + " },\n", + " \"baseid\": \"91a1a213-b630-4ec8-88d4-e38fc9fae968\",\n", " \"hashes\": {\n", " \"md5\": \"f7952a9483fae0af6d41370d9333020b\"\n", " },\n", - " \"rev\": \"2d09fa8d\",\n", + " \"urls\": [\n", + " \"storage://file/path/example_file\"\n", + " ],\n", " \"form\": \"object\",\n", - " \"updated_date\": \"2018-08-07T22:19:03.147927\",\n", - " \"file_name\": \"example_file\"\n", - " },\n", - " \"0\": {\n", + " \"size\": 10,\n", + " \"file_name\": \"example_file\",\n", " \"version\": null,\n", - " \"did\": \"testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc\",\n", - " \"urls_metadata\": {},\n", - " \"urls\": [],\n", - " \"baseid\": \"cef3e517-a7e9-4381-9687-0ba11fc177b1\",\n", - " \"created_date\": \"2018-08-07T22:19:03.068052\",\n", - " \"size\": 8,\n", " \"acl\": [\n", " \"*\"\n", " ],\n", " \"metadata\": {},\n", + " \"created_date\": \"2018-08-09T00:22:20.246219\",\n", + " \"rev\": \"f792a6b2\",\n", + " \"did\": \"d7b0ad4e-8afe-4480-ae68-9ac6ea60a082\"\n", + " },\n", + " \"0\": {\n", + " \"updated_date\": \"2018-08-09T00:22:20.039646\",\n", + " \"urls_metadata\": {\n", + " \"storage://file/path/example_file\": {}\n", + " },\n", + " \"baseid\": \"91a1a213-b630-4ec8-88d4-e38fc9fae968\",\n", " \"hashes\": {\n", " \"md5\": \"e561f9248d7563d15dd93457b02ebbb6\"\n", " },\n", - " \"rev\": \"be8c395f\",\n", + " \"urls\": [\n", + " \"storage://file/path/example_file\"\n", + " ],\n", " \"form\": \"object\",\n", - " \"updated_date\": \"2018-08-07T22:19:03.068062\",\n", - " \"file_name\": \"example_file\"\n", + " \"size\": 8,\n", + " \"file_name\": \"example_file\",\n", + " \"version\": null,\n", + " \"acl\": [\n", + " \"*\"\n", + " ],\n", + " \"metadata\": {},\n", + " \"created_date\": \"2018-08-09T00:22:20.039637\",\n", + " \"rev\": \"861b0dab\",\n", + " \"did\": \"testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c\"\n", " }\n", "}\n" ] } ], "source": [ - "path = indexd('/index/{}/versions'.format(v_0_baseid))\n", + "path = indexd('/index/{}/versions'.format(v_0['baseid']))\n", + "print('GET {}'.format(path))\n", + "print()\n", "print_response(requests.get(path))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As a final point on the versioning capabilities in indexd, note what happens when we try to update the \"version 0\" of this file. We can do this using a `PUT` to `/index/{did}?{rev}` using the `did` and `rev` values for the first version we created. Let's suppose we're going to try to move this file to a different storage location." 
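+    "\n",
+    "Before we do, here's what an ordinary update to the *latest* version might look like through the client. This is just a sketch; it assumes `Document.patch()` (listed in the attributes earlier) pushes locally modified fields back to indexd using the current `rev`:\n",
+    "```python\n",
+    "# Sketch: fetch the newest version and update its URLs in place.\n",
+    "latest = client.get(v_1['did'])\n",
+    "latest.urls = ['storage://different/file/path']\n",
+    "latest.patch()\n",
+    "```\n",
+    "Now, the same kind of change aimed at the old `did` and `rev` instead:"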
+ ] + }, + { + "cell_type": "code", + "execution_count": 460, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "PUT http://localhost:8080/index/testprefix:a62d7817-f43a-4281-ac5d-98a8d7e5af1c?861b0dab\n", + "\n", + "\n", + "{\n", + " \"error\": \"revision mismatch\"\n", + "}\n" + ] + } + ], + "source": [ + "data_v_1_1 = {\n", + " 'urls': ['storage://different/file/path']\n", + "}\n", + "\n", + "path = indexd('/index/{}?{}'.format(v_0['did'], v_0['rev']))\n", + "\n", + "print('PUT {}'.format(path))\n", + "print()\n", + "response = requests.put(path, json=data_v_1_1, auth=request_auth)\n", + "print_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This operation is not allowed because we tried to modify an older version of this record. This disallows applying conflicting updates to the same record, since they must always operate on the latest version." + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Record Aliases" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Keeping track of records with UUID4s works well for Python code but less so for humans (or even not for human-readability but just semantic significance). To help the humans keep things straight, indexd supports aliases for its records. The endpoints for listing, creating, updating, and removing aliases are at the `/alias` endpoints in indexd.\n", + "\n", + "To start, let's list the existing aliases." + ] + }, + { + "cell_type": "code", + "execution_count": 456, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"size\": null,\n", + " \"limit\": 100,\n", + " \"aliases\": [\n", + " \"foo\"\n", + " ],\n", + " \"hashes\": null,\n", + " \"start\": null\n", + "}\n" + ] + } + ], + "source": [ + "print_response(requests.get(indexd('/alias/')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The response is, unsurpringly, empty, since we haven't made any yet. Let's do that. To make an alias we're going to send a `PUT` to `/alias/{ALIAS_STRING}`, where `ALIAS_STRING` is the more human-readable name that we want to attach to a record. In the body, we send the information about the record we want to use." + ] + }, + { + "cell_type": "code", + "execution_count": 457, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"name\": \"foo\",\n", + " \"rev\": \"8ff8d788\"\n", + "}\n" + ] + } + ], + "source": [ + "data = {\n", + " 'release': 'public',\n", + " 'size': data_v_1['size'],\n", + " 'hashes': data_v_1['hashes'],\n", + "}\n", + "alias = 'foo'\n", + "path = indexd('/alias/{}'.format(alias))\n", + "response = requests.put(path, json=data, auth=request_auth)\n", + "print_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we have an alias for this record." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 458, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "{\n", + " \"size\": null,\n", + " \"limit\": 100,\n", + " \"aliases\": [\n", + " \"foo\"\n", + " ],\n", + " \"hashes\": null,\n", + " \"start\": null\n", + "}\n" + ] + } + ], + "source": [ + "print_response(requests.get(indexd('/alias/')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can get the information for this alias now using `GET` `/alias/{ALIAS_NAME}`." + ] + }, + { + "cell_type": "code", + "execution_count": 459, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GET http://localhost:8080/alias/foo\n", + "\n", + "\n", + "{\n", + " \"host_authorities\": [],\n", + " \"name\": \"foo\",\n", + " \"size\": 10,\n", + " \"start\": 0,\n", + " \"keeper_authority\": null,\n", + " \"release\": \"public\",\n", + " \"metadata\": null,\n", + " \"limit\": 100,\n", + " \"rev\": \"8ff8d788\",\n", + " \"hashes\": {\n", + " \"md5\": \"f7952a9483fae0af6d41370d9333020b\"\n", + " },\n", + " \"urls\": [\n", + " {\n", + " \"metadata\": {},\n", + " \"url\": \"storage://file/path/example_file\"\n", + " }\n", + " ]\n", + "}\n" + ] + } + ], + "source": [ + "path = indexd('/alias/foo')\n", + "print('GET {}'.format(path))\n", + "print()\n", + "print_response(requests.get(path))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## About Prefixes and Data GUIDs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Indexd in other Gen3 Services" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In sheepdog, creating metadata automatically registers an index with indexd; see [`FileUploadEntity._register_index`](https://github.com/uc-cdis/sheepdog/blob/0c2e9eec3d6c79d46cbf35d687958cfbadcb1ce1/sheepdog/transactions/upload/sub_entities.py#L273-L312)." + ] } ], "metadata": {