Clusters topic guide #1883

Merged: 55 commits into master from clusters_topic_guide, Apr 4, 2024

@zslade (Contributor) commented Jan 24, 2024

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

Give a brief description for the solution you have provided

Note to reviewers: This is a work in progress and may change significantly as plots for the metrics are developed and as more thought goes into how the metrics should best be used!

Topic guide for evaluating clusters. Consists of:

  • An overview motivating the use of graphs and graph metrics to evaluate the results of data linking and discussing what good looks like
  • A run-through of the graph metrics currently available in Splink and how they apply to evaluating linked data
  • Instructions on how to compute graph metrics using the new functionality (more details to be added to a future iteration)
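For readers unfamiliar with the idea, here is a minimal sketch of what graph metrics on linked data can look like. It treats records as nodes and accepted links as edges, then computes node degree, cluster size and cluster density. It is plain Python and does not use Splink's API; the function names and the example records are hypothetical, and the choice of metrics is an assumption rather than a list taken from the guide.

```python
from collections import defaultdict


def find_clusters(nodes, edges):
    """Group nodes into connected components (clusters) using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    return list(clusters.values())


def cluster_metrics(nodes, edges):
    """Return node degrees and per-cluster size, edge count and density.

    Assumes edges are unique, undirected and contain no self-links.
    """
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    results = []
    for members in find_clusters(nodes, edges):
        size = len(members)
        # Every edge lies inside one cluster and is counted at both ends.
        n_edges = sum(degree[m] for m in members) // 2
        # Density: actual edges divided by the maximum possible, size*(size-1)/2.
        density = 2 * n_edges / (size * (size - 1)) if size > 1 else None
        results.append(
            {"members": sorted(members), "size": size,
             "edges": n_edges, "density": density}
        )
    return degree, sorted(results, key=lambda r: r["members"])


# Hypothetical linked records: a chain a-b-c-d, a pair e-f, and an unlinked g.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
degree, clusters = cluster_metrics(nodes, edges)
for c in clusters:
    print(c)
```

In this toy example the four-record chain has a density of 0.5, which is the kind of signal that might prompt a closer look: a large, sparsely connected cluster can indicate records held together by only a few links.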

At present, there isn't a focussed discussion of the various levers that can be pulled to change clusters (e.g. threshold vs constraints of comparisons vs blocking rules), which might be useful. That could wait for a future iteration, though.

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Updated CHANGELOG.md (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter

@ADBond (Contributor) left a comment

This looks to be in good shape, thanks for all your work on this! I think you have done a good job of explaining complicated things clearly, and this will be a really good addition to the docs.

Review threads (all resolved, now outdated):

  • docs/topic_guides/evaluation/clusters/overview.md (4 threads)
  • docs/topic_guides/evaluation/clusters/graph_metrics.md
  • docs/comparison_level_composition.md

@RossKen (Contributor) left a comment

Good to go from my side now - @zslade please merge whenever you are happy

@RossKen RossKen merged commit 164be44 into master Apr 4, 2024
5 checks passed
@RossKen RossKen deleted the clusters_topic_guide branch April 4, 2024 08:04

3 participants