Commit
Evaluate the similarity code with multiple parameters and regimes
@corinne-hcr more examples of what the initial analysis might have looked like.

Build similarity models for:
- 100m, 300m, 500m
- all combinations of filtering (yes/no) and cutoffs (yes/no)

Generate labels for all labeled trips.
Determine ground truth by looking at unique tuples and unique values for each of the user inputs.

Use these models to compute the metrics (homogeneity score and request %) for all combinations, along with a few other metrics such as the number of unique tuples, cluster_trip_pct, etc.

At this point, we are focusing on ground truth from tuples, since the homogeneity score is already fairly high.
What we really need to do is bring down the request %, or determine *why* the user % is so high so that we can fix it (e.g. polygon).

Some results in: #28 (comment)
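
The evaluation grid and the tuple-based ground truth described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the function names and the trip fields `mode`, `purpose` and `replaced_mode` are hypothetical, and the homogeneity score is written out from its standard definition (1 - H(truth | cluster) / H(truth)) rather than taken from the project's own implementation.

```python
import itertools
import math
from collections import Counter

# Grid from the commit message: 3 distances x filtering (yes/no) x cutoffs (yes/no).
RADII_M = [100, 300, 500]
ON_OFF = [False, True]


def parameter_grid():
    """Return one settings dict per (distance, filter, cutoff) combination."""
    return [
        {"radius_m": r, "filter": f, "cutoff": c}
        for r, f, c in itertools.product(RADII_M, ON_OFF, ON_OFF)
    ]


def tuple_ground_truth(trips):
    """Assign an integer class id to each unique tuple of user inputs.

    The field names are assumptions made for this sketch.
    """
    ids = {}
    labels = []
    for trip in trips:
        key = (trip["mode"], trip["purpose"], trip["replaced_mode"])
        # The first occurrence of a tuple gets the next unused id.
        labels.append(ids.setdefault(key, len(ids)))
    return labels


def homogeneity(truth, clusters):
    """Homogeneity = 1 - H(truth | cluster) / H(truth), using natural logs.

    Returns 1.0 when there is only one ground-truth class.
    """
    n = len(truth)
    class_counts = Counter(truth)
    cluster_counts = Counter(clusters)
    joint = Counter(zip(truth, clusters))
    h_truth = -sum((v / n) * math.log(v / n) for v in class_counts.values())
    if h_truth == 0:
        return 1.0
    h_truth_given_cluster = -sum(
        (v / n) * math.log(v / cluster_counts[k]) for (_, k), v in joint.items()
    )
    return 1.0 - h_truth_given_cluster / h_truth
```

`parameter_grid()` yields the 12 combinations to evaluate. For each one, the cluster labels produced by the similarity model would be compared against `tuple_ground_truth(trips)` with `homogeneity`. The request % is not sketched here because its exact definition is not given in this message.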