Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CU-8694vbw6y k-fold stats Standard Deviation #459

Merged
merged 3 commits into from
Jul 18, 2024
Merged

Conversation

mart-r
Copy link
Collaborator

@mart-r mart-r commented Jul 4, 2024

Optionally add standard deviation to k-fold metrics output.

The PR simplifies the mean calculation to allow simpler standard deviation calculations as well.

PS:
Had some weird (at least to me) issues with mypy that I didn't really understand.

The details

Here's the errors it shows:

medcat/stats/kfold.py:435: error: Incompatible types in assignment (expression has type "int | float", target has type "tuple[int, float]")  [assignment]
medcat/stats/kfold.py:435: error: Incompatible types in assignment (expression has type "int | float", target has type "tuple[float, float]")  [assignment]
medcat/stats/kfold.py:435: error: Argument 1 to "sum" has incompatible type "list[int | float]"; expected "Iterable[bool]"  [arg-type]
medcat/stats/kfold.py:445: error: Incompatible types in assignment (expression has type "tuple[Any, Any]", target has type "int")  [assignment]
medcat/stats/kfold.py:445: error: Incompatible types in assignment (expression has type "tuple[Any, Any]", target has type "float")  [assignment]

It's weird that it assumes the wrong type on lines 435 and 445 since the target is the same. If mypy assumed reversed target types, it should work just fine.
And really not sure about the sum method expecting an iterable of bools either.

@tomolopolis
Copy link
Member

Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - comment around if the std_dev calc is needed


def _update_all_add(joined: List[Dict[str, int]], single: List[Dict[str, int]]) -> None:
def get_std(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed given this: https://docs.python.org/3.11/library/statistics.html#statistics.pstdev
you can specify mu=get_mean().

Also in 3.11, statistics.fmean supports a weigthing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation expects to work for the weighted case (it's just that in some cases the weights are all equal (to 1)).
As far as I can tell, the other options do not (at least currently) support weighted means or standard deviations.

Plus, we'd need something that would support 3.8 through to 3.11.

@mart-r mart-r merged commit 018fd7a into master Jul 18, 2024
8 checks passed
mart-r added a commit that referenced this pull request Jul 18, 2024
* CU-8694vbw6y: Update k-fold metrics to allow including standard deviation in results

* CU-8694vbw6y: Add tests for new parts of k-fold metrics (e.g standard deviation)

* CU-8694vbw6y: Fix typing issues k-fold metrics and standard deviation
@mart-r mart-r deleted the CU-8694vbw6y-k-fold-std branch August 12, 2024 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants