Commit

Merge pull request #28 from shankari/tune_clustering_params
Finally compare dataset characteristics against each other
shankari authored Jun 7, 2022
2 parents 37839c8 + 93590d8 commit b8368d9
Showing 17 changed files with 9,666 additions and 6 deletions.
824 changes: 824 additions & 0 deletions tour_model_eval/Compare user mode mapping effect with outputs.ipynb

Large diffs are not rendered by default.

947 changes: 947 additions & 0 deletions tour_model_eval/Explore multiple datasets.ipynb

Large diffs are not rendered by default.

747 changes: 747 additions & 0 deletions tour_model_eval/Explore trip clustering using DBSCAN unrolled.ipynb

Large diffs are not rendered by default.

44 changes: 42 additions & 2 deletions tour_model_eval/Federating and saving multiple datasets.ipynb
@@ -169,20 +169,60 @@
" len(all_expanded_df[all_expanded_df.program == \"stage\"].user_id.unique()))"
]
},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "brown-poison",
+"metadata": {},
+"outputs": [],
+"source": [
+"all_expanded_df.reset_index(inplace=True)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "fabulous-aruba",
+"metadata": {},
+"outputs": [],
+"source": [
+"all_expanded_df.head()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "limiting-gazette",
+"metadata": {},
+"outputs": [],
+"source": [
+"all_expanded_df.columns"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "impaired-growing",
+"metadata": {},
+"outputs": [],
+"source": [
+"import bson.json_util as bju"
+]
+},
{
"cell_type": "code",
"execution_count": null,
"id": "civil-strike",
"metadata": {},
"outputs": [],
"source": [
-"all_expanded_df.to_csv(\"/tmp/federated_trip_only_dataset.csv\")"
+"all_expanded_df.to_json(\"/tmp/federated_trip_only_dataset.json\", orient=\"records\", default_handler=bju.default)"
]
},
{
"cell_type": "code",
"execution_count": null,
-"id": "critical-shepherd",
+"id": "contained-banner",
"metadata": {},
"outputs": [],
"source": []
10 changes: 10 additions & 0 deletions tour_model_eval/README.md
@@ -0,0 +1,10 @@
This directory contains the IPython notebooks used to tune and evaluate the
first-round clustering algorithm. The algorithm is O(n^2): it iterates over a
set of n trips and clusters them into bins based on the proximity of their
start and end points.

To understand the evolution of this process, including a comparison of this
algorithm with DBSCAN, please see
https://github.com/e-mission/e-mission-eval-private-data/pull/28

That pull request includes explanations and intermediate results.
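
The binning idea described above can be illustrated with a minimal sketch. This is not the code from the notebooks: the function names `haversine_m` and `bin_trips`, the 100 m default radius, the use of great-circle distance, and the rule that a trip must be close to every existing member of a bin are all assumptions made for illustration. Each trip is compared against previously binned trips, which gives the O(n^2) behaviour in the worst case.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def bin_trips(trips, radius_m=100.0):
    """Group trips into bins by proximity of start and end points.

    trips: list of (start, end) pairs, each point a (lat, lon) tuple.
    radius_m: hypothetical similarity threshold (an assumption, not a tuned value).
    Returns a list of bins, each a list of indices into trips.
    """
    bins = []
    for i, (start, end) in enumerate(trips):
        for members in bins:
            # A trip joins a bin only if both its start and its end are
            # within radius_m of the start and end of every trip in the bin.
            if all(haversine_m(start, trips[j][0]) <= radius_m
                   and haversine_m(end, trips[j][1]) <= radius_m
                   for j in members):
                members.append(i)
                break
        else:
            # No existing bin matched, so this trip starts a new bin.
            bins.append([i])
    return bins


example_trips = [
    ((37.0, -122.0), (37.1, -122.1)),          # trip 0
    ((37.0001, -122.0001), (37.1001, -122.1001)),  # trip 1, a few meters from trip 0
    ((38.0, -121.0), (38.1, -121.1)),          # trip 2, far away
    ((37.1, -122.1), (37.0, -122.0)),          # trip 3, reverse of trip 0
]
example_bins = bin_trips(example_trips)
```

Because start and end points are compared separately, the reverse of a trip lands in its own bin rather than joining the forward trip. Unlike DBSCAN, this sketch depends on the order in which trips are visited.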
3 changes: 0 additions & 3 deletions tour_model_eval/Radius selection exploration unrolled.ipynb
@@ -832,7 +832,6 @@
]
},
{
-"attachments": {},
"cell_type": "markdown",
"id": "closed-azerbaijan",
"metadata": {},
@@ -843,7 +842,6 @@
]
},
{
-"attachments": {},
"cell_type": "markdown",
"id": "naval-assignment",
"metadata": {},
@@ -852,7 +850,6 @@
]
},
{
-"attachments": {},
"cell_type": "markdown",
"id": "designed-capture",
"metadata": {},
@@ -39,7 +39,7 @@
"metadata": {},
"outputs": [],
"source": [
-"participant_uuid_obj = list(edb.get_profile_db().find({\"install_group\": \"participant\"}, {\"user_id\": 1, \"_id\": 0}))\n",
+"participant_uuid_obj = list(edb.get_profile_db().find({}, {\"user_id\": 1, \"_id\": 0}))\n",
"all_users = [u[\"user_id\"] for u in participant_uuid_obj]"
]
},