Commit a055272

Merge pull request #548 from Mats-SX/notebook-updates
Prepare notebook for field testing
2 parents 6480da6 + 419281f commit a055272

2 files changed: +94 / -31 lines


examples/dev/aura-only-features.ipynb

Lines changed: 94 additions & 28 deletions
@@ -27,6 +27,19 @@
  "In this notebook, we will illustrate just briefly what example data we are using."
  ]
  },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Just to begin, let's make sure we have the correct version of the GDS Python Client installed\n",
+ "\n",
+ "from graphdatascience import __version__\n",
+ "\n",
+ "assert __version__ == \"1.9a1\""
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {},
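The new first cell pins the client with `assert __version__ == "1.9a1"`, which fails with a bare `AssertionError`. As a sketch only, a similar check can read the installed distribution metadata through the standard library's `importlib.metadata` and report what is actually installed; the helper name `require_version` is hypothetical and not part of the GDS client.

```python
from importlib.metadata import PackageNotFoundError, version


def require_version(dist_name: str, expected: str) -> str:
    """Return the installed version of dist_name, or raise if it is missing or differs."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError as exc:
        raise RuntimeError(f"{dist_name} is not installed; expected {expected}") from exc
    if installed != expected:
        raise RuntimeError(
            f"{dist_name} {installed} is installed, but this notebook expects {expected}"
        )
    return installed


# Hypothetical usage, pinning the same pre-release as the notebook cell:
# require_version("graphdatascience", "1.9a1")
```

Unlike importing `__version__`, this does not import the package itself, so a missing install produces a readable message instead of an `ImportError`.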
@@ -40,7 +53,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "db_id = \"YOU_DATABASE_ID\"\n",
+ "db_id = \"YOUR_DATABASE_ID\"\n",
  "db_password = \"YOUR_DATABASE_PASSWORD\""
  ]
  },
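This hunk fixes the `YOU_DATABASE_ID` placeholder, but the cell still expects users to edit credentials in place. A minimal sketch of an alternative, assuming hypothetical environment variable names `AURA_DB_ID` and `AURA_DB_PASSWORD` (the notebook does not define these), reads the values from the environment and fails fast if a placeholder was left unchanged.

```python
import os
from typing import Mapping, Tuple

# Placeholder strings used in the notebook cell, plus the empty string.
PLACEHOLDERS = {"YOUR_DATABASE_ID", "YOUR_DATABASE_PASSWORD", ""}


def load_db_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the AuraDB id and password from (hypothetical) environment variables."""
    db_id = env.get("AURA_DB_ID", "")
    db_password = env.get("AURA_DB_PASSWORD", "")
    for name, value in (("AURA_DB_ID", db_id), ("AURA_DB_PASSWORD", db_password)):
        if value in PLACEHOLDERS:
            raise ValueError(f"{name} is unset or still a placeholder")
    return db_id, db_password


# Hypothetical usage in place of the hard-coded cell:
# db_id, db_password = load_db_credentials()
```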
@@ -59,17 +72,17 @@
  "source": [
  "import os\n",
  "\n",
- "# AuraDB data ingestion\n",
  "from neo4j import GraphDatabase\n",
  "from graphdatascience.query_runner.aura_db_arrow_query_runner import AuraDbConnectionInfo\n",
  "\n",
- "# We need to tell the GDS client that we are working with a devenvironment. This does not need to be set in production.\n",
+ "# We need to tell the GDS client that we are working with a devenvironment.\n",
+ "# This does not need to be set in production.\n",
  "os.environ[\"AURA_ENV\"] = \"devstrawberryfield\"\n",
  "\n",
  "db_connection_info = AuraDbConnectionInfo(\n",
  " f\"neo4j+s://{db_id}-{os.environ['AURA_ENV']}.databases.neo4j-dev.io\", (\"neo4j\", db_password)\n",
  ")\n",
- "# start a Driver\n",
+ "# start a standard Neo4j Python Driver to connect to the AuraDB instance\n",
  "driver = GraphDatabase.driver(db_connection_info.uri, auth=db_connection_info.auth)\n",
  "\n",
  "# try out our connection\n",
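The connection cell builds the dev URI as `neo4j+s://{db_id}-{AURA_ENV}.databases.neo4j-dev.io` and notes that `AURA_ENV` does not need to be set in production. The sketch below, built around a hypothetical helper `aura_db_uri`, makes that branch explicit. The production host pattern `{db_id}.databases.neo4j.io` is an assumption based on the usual AuraDB URI format and is not stated in this diff.

```python
import os
from typing import Mapping, Optional


def aura_db_uri(db_id: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build the Bolt URI for an AuraDB instance.

    If AURA_ENV is set (the notebook's dev setup), use the dev host pattern from
    the diff; otherwise fall back to an assumed production host pattern.
    """
    env = os.environ if env is None else env
    aura_env = env.get("AURA_ENV")
    if aura_env:
        return f"neo4j+s://{db_id}-{aura_env}.databases.neo4j-dev.io"
    # Assumption: production AuraDB hosts follow <db_id>.databases.neo4j.io
    return f"neo4j+s://{db_id}.databases.neo4j.io"


# Hypothetical usage with the notebook's AuraDbConnectionInfo:
# db_connection_info = AuraDbConnectionInfo(aura_db_uri(db_id), ("neo4j", db_password))
```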
@@ -198,13 +211,13 @@
  "The GDS session offers all the GDS functionality that we are familiar with from AuraDS.\n",
  "However, since the idea is to offload database work to AuraDB, the GDS session is not to be considered a database instance.\n",
  "\n",
- "That means that all projections will go from AuraDB to GDS session, not from a local database.\n",
- "Similarly, writing back will follow the same path back to AuraDB, and not to a local database.\n",
+ "That means that all projections will go from AuraDB to GDS session, not from a co-located database.\n",
+ "Similarly, writing back will follow the same path back to AuraDB, and not to a co-located database.\n",
  "\n",
  "## Implementation limitation\n",
  "\n",
  "As mentioned in the parenthesis above, we do make use of existing AuraDS infrastructure to host the GDS sessions.\n",
- "Due to that fact, there actually is a local database, but we try to not expose its Bolt URI, in an attempt to prohibit users adding data to that database. "
+ "Due to that fact, there actually is a co-located database, but we try to not expose its Bolt URI, in an attempt to prohibit users adding data to that database. "
  ]
  },
  {
@@ -241,25 +254,34 @@
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "#### Creating a new session\n",
  "\n",
  "A user can create a new session by calling `sessions.create_gds`.\n",
  "A session is identified by a name and needs a password to be set. The password is necessary to reconnect to an existing session.\n",
- "Additionally an instance size can be provided. Possible values are `8GB`, `16GB`, `24GB` (`32GB`, `48GB`, `64GB`, `96GB` are not available in the testing environment)\n",
+ "Additionally an instance size can be provided. Possible values are `8GB`, `16GB`, `24GB` (`32GB`, `48GB`, `64GB`, `96GB` are not available in the testing environment).\n",
  "\n",
- "Creating a new session takes a few minutes to complete. We know that this is not ideal and the problem is even exaggerated in the development environment because we do not keep that many cloud instances running in order to safe on cost. "
- ],
- "metadata": {
- "collapsed": false
- }
+ "Creating a new session takes a few minutes to complete. We know that this is not ideal and the problem is even exaggerated in the development environment because we do not keep that many cloud VMs running in order to keep costs low.\n",
+ "\n",
+ "💵💵💵💵💵💵\n",
+ "\n",
+ "💰💰💰💰💰💰\n",
+ "\n",
+ "💸💸💸💸💸💸\n",
+ "\n",
+ "NOTE: the creation of a session marks the start of billable activity.\n",
+ "Sessions are machines that run in the cloud, and they cost money.\n",
+ "This cost will accumulate for the lifetime of the session, which needs to be manually deleted."
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": [
+ "# let's create a GDS session!\n",
  "gds = sessions.create_gds(\"pagerank-compute\", \"my-password\", \"8GB\")"
  ]
  },
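Because session creation takes minutes and starts billing, a typo in the size argument is costly to discover late. A minimal sketch, using a hypothetical helper `check_session_size` and the size lists copied from the markdown cell above, validates the size locally before calling `sessions.create_gds`.

```python
# Size lists as stated in the notebook text; the testing environment only offers the first three.
ALL_SIZES = ("8GB", "16GB", "24GB", "32GB", "48GB", "64GB", "96GB")
TESTING_SIZES = ("8GB", "16GB", "24GB")


def check_session_size(size: str, testing_env: bool = True) -> str:
    """Reject sizes the notebook says are unavailable, before any billable call."""
    allowed = TESTING_SIZES if testing_env else ALL_SIZES
    if size not in allowed:
        raise ValueError(f"size {size!r} not available; choose one of {', '.join(allowed)}")
    return size


# Hypothetical usage, mirroring the notebook cell:
# gds = sessions.create_gds("pagerank-compute", "my-password", check_session_size("8GB"))
```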
@@ -268,7 +290,7 @@
  "metadata": {},
  "source": [
  "Alternatively it is possible to reconnect to an existing session.\n",
- "This is especiially handy if the session ran a long computation and the client is disconnected."
+ "This is especially handy if the session ran a long computation and the client is disconnected."
  ]
  },
  {
@@ -289,19 +311,22 @@
  "In order to project graphs from an AuraDB instance into the GDS session we created a new projection method: `gds.graph.project.remoteDb`\n",
  "The projection works similar to Cypher projections V2 and is implemented as an Cypher Aggregation function.\n",
  "The Cypher query containing the projection function is executed on the AuraDB instance and the data it produces is transferred to the \n",
- "GDS session instance via Arrow. \n",
+ "GDS session instance via an Arrow connection. \n",
  "\n",
  "There are two key differences between the remote projection and Cypher projections V2:\n",
  "\n",
  "1. In AuraDB, the aggregating function does not take a graph name as a parameter.\n",
- "2. The aggregation function should only be called through the python client `gds.graph.project.remoteDb`\n",
+ "2. The aggregation function should only be called through the GDS Python Client endpoint `gds.graph.project.remoteDb`\n",
  "\n",
  "### Limitations\n",
  "\n",
  "The aggregation function is currently limited to projecting homogeneous graph schemas. \n",
  "That means that all nodes/relationships will have the same property keys regardless of their labels or type. \n",
  "The caller of the aggregation function must ensure to supply all possible properties for each node or relationship. Null values are not supported.\n",
- "\n"
+ "\n",
+ "The example data in this notebook contains only `User` nodes with `age` properties.\n",
+ "If there are also `Product` nodes with `cost` properties then we would need to add placeholder `cost` and `age` properties on the `User` and `Product` nodes, respectively.\n",
+ "This is a limitation we will attempt to address.\n"
  ]
  },
  {
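The limitation above means every node handed to the projection must carry the same property keys, with no nulls. A plain-Python sketch of that padding step, using the notebook's `User`/`age` and `Product`/`cost` example, is shown below. The helper `pad_properties` and the placeholder value `0` are assumptions; in the notebook this padding would have to be expressed in the Cypher query passed to `gds.graph.project.remoteDb`, whose exact form this diff does not show.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def pad_properties(nodes: List[Node], placeholder: Any = 0) -> List[Node]:
    """Give every node the union of all property keys, replacing missing or None values."""
    # Collect every property key used by any node, regardless of label.
    keys = sorted({key for node in nodes for key in node["properties"]})
    padded = []
    for node in nodes:
        props = node["properties"]
        filled = {
            key: placeholder if props.get(key) is None else props[key]
            for key in keys
        }
        padded.append({**node, "properties": filled})
    return padded


nodes = [
    {"label": "User", "properties": {"age": 42}},
    {"label": "Product", "properties": {"cost": 9.5}},
]
padded = pad_properties(nodes)
```

After padding, the `User` node also carries `cost` and the `Product` node also carries `age`, which is the uniform schema the limitation describes. The placeholder must be a real value rather than `None`, since null values are not supported.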
@@ -367,27 +392,53 @@
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "# Writing back to AuraDB\n",
  "\n",
- "The sessions in-memory graph was projected from data in AuraDB.\n",
- "Write back operations should thus also persist the data back to AuraDB.\n",
+ "The session's in-memory graph was projected from data in AuraDB.\n",
+ "Write back operations will thus persist the data back to the same AuraDB.\n",
  "\n",
- "When calling any write operations the python client will automatically use the new remote write back functionality so that no API changes are necessary."
- ],
- "metadata": {
- "collapsed": false
- }
+ "When calling any write operations the python client will automatically use the new remote write back functionality so that no API changes are necessary.\n",
+ "\n",
+ "The AuraDB coordinates are not stored in the GDS session, but in the client.\n",
+ "Thus, it is important to set up the AuraSessions object with the DB credentials that identify the correct database from which the projection came."
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": [
+ "# if this fails once with some error like \"unable to retrieve routing table\"\n",
+ "# then run it again. this is a transient error with a stale server cache.\n",
  "gds.graph.nodeProperties.write(G, \"pagerank\")"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Of course, we can just use `.write` modes as well:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "gds.fastRP.write(\n",
+ " G,\n",
+ " writeProperty=\"fastRP\",\n",
+ " embeddingDimension=64,\n",
+ " featureProperties=[\"pagerank\"],\n",
+ " propertyRatio=0.2,\n",
+ " nodeSelfInfluence=0.2,\n",
+ ")"
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {},
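The new comment on the write-back cell tells the user to rerun it by hand if it fails once with an "unable to retrieve routing table" error. A sketch of automating that advice with a hypothetical helper `retry_on_transient` is shown below. It matches on the error message text because the exact exception type surfaced through the client is not stated in the diff, which is an assumption worth revisiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_transient(
    action: Callable[[], T],
    marker: str = "routing table",
    attempts: int = 2,
    delay_s: float = 1.0,
) -> T:
    """Run action, retrying when the (lower-cased) error message contains marker.

    Errors that do not match, and the last failed attempt, are re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if marker not in str(exc).lower() or attempt == attempts:
                raise
            time.sleep(delay_s)  # give the stale routing cache a moment to refresh
    raise AssertionError("unreachable")


# Hypothetical usage with the notebook's write-back call:
# retry_on_transient(lambda: gds.graph.nodeProperties.write(G, "pagerank"))
```

Retrying only on a matching message keeps genuine failures, such as a wrong property name, from being silently repeated.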
@@ -402,7 +453,14 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "gds.run_cypher(\"MATCH (u:User) RETURN u.pagerank\")"
+ "gds.run_cypher(\n",
+ " \"\"\"\n",
+ " MATCH (u:User) \n",
+ " RETURN u.id, u.age, u.fastRP, u.pagerank AS rank \n",
+ " ORDER BY rank DESC\n",
+ " LIMIT 5\n",
+ " \"\"\"\n",
+ ")"
  ]
  },
  {
@@ -412,10 +470,16 @@
  "# Closing the session\n",
  "\n",
  "Generally we intend for the sessions to only live for the time it takes to run a single workload.\n",
- "If the same workload needs to be re-run, for example to wor with updated data, a new session would be created.\n",
+ "If the same workload needs to be re-run, for example to work with updated data, a new session would be created.\n",
+ "\n",
+ "💵💵💵💵💵💵\n",
+ "\n",
+ "💰💰💰💰💰💰\n",
+ "\n",
+ "💸💸💸💸💸💸\n",
  "\n",
- "The `session.delete_gds` operations will close the session and release all resources associated with it.\n",
- "It is important to note, that until this command was called the customer will be charged for the AuraDS instance that is used to host the session."
+ "The `session.delete_gds` operation will delete the session and release all resources associated with it.\n",
+ "It is important to note, that until this command was called the customer will be charged for the costs associated with hosting the session instance."
  ]
  },
  {
@@ -424,6 +488,8 @@
  "metadata": {},
  "outputs": [],
  "source": [
+ "# this will return True if it did delete something\n",
+ "# it will return False otherwise, but it will not normally fail\n",
  "sessions.delete_gds(\"pagerank-compute\")"
  ]
  }
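The notebook stresses that billing runs for the whole lifetime of a session until `sessions.delete_gds` is called, and that the call returns `True` or `False` rather than failing. As a sketch only, a context manager can tie creation and deletion together so the session is deleted even if the workload raises. It relies solely on `create_gds(name, password, size)` and `delete_gds(name)` as they appear in the diff; the wrapper `gds_session` itself is hypothetical.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def gds_session(sessions: Any, name: str, password: str, size: str) -> Iterator[Any]:
    """Create a GDS session and always delete it on exit, so billing stops even on errors."""
    gds = sessions.create_gds(name, password, size)
    try:
        yield gds
    finally:
        # delete_gds returns True if it deleted something and False otherwise.
        if not sessions.delete_gds(name):
            print(f"warning: session {name!r} was not found when deleting")


# Hypothetical usage with the notebook's objects:
# with gds_session(sessions, "pagerank-compute", "my-password", "8GB") as gds:
#     ...  # project, run algorithms, write back
```

The trade-off is that reconnecting to a long-running session later is no longer possible once the block exits, so this pattern suits the single-workload lifetime the notebook describes.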

requirements/dev/dev.txt

Lines changed: 0 additions & 3 deletions
@@ -5,11 +5,8 @@ isort == 5.12.0
  mypy == 1.5.1
  nbconvert == 7.6.0
  pandas-stubs == 2.0.3.230814
- pytest == 7.4.0
  pytest-annotate == 1.0.5
  tox == 4.11.3
  types-setuptools == 68.1.0.1
  sphinx == 7.2.6
- requests_mock == 1.11.0
- pytest_mock == 3.12.0
  types-requests
