GH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions #102649

Merged
merged 11 commits into from
Mar 14, 2023


@rhettinger commented Mar 13, 2023

  • Use sumprod() which is faster, simpler, and more accurate than rounding each multiplication before summation.
  • For an additional speed-up and simplification, compute the deviations (x_i - xbar) only once instead of multiple times.
  • For Spearman's rank correlation, we can skip the (x_i - xbar) step because the ranks are already centered around zero.
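
The three points above can be illustrated with a minimal sketch. This is not the code from the PR: `correlation_sketch`, `centered_ranks`, and `spearman_sketch` are hypothetical names chosen for illustration, and the fallback for interpreters without `math.sumprod()` (added in Python 3.12) is an assumption that does not reproduce its extended-precision multiplication.

```python
from math import fsum, sqrt

try:
    from math import sumprod  # available in Python 3.12+
except ImportError:
    # Assumed stand-in for older interpreters: each product is rounded
    # before summation, so it is less accurate than the real sumprod().
    def sumprod(p, q):
        return fsum(a * b for a, b in zip(p, q))


def correlation_sketch(x, y):
    """Pearson's r with each input centered only once (hypothetical helper)."""
    n = len(x)
    if len(y) != n or n < 2:
        raise ValueError("need two inputs of equal length, at least 2 points")
    xbar = fsum(x) / n
    ybar = fsum(y) / n
    dx = [xi - xbar for xi in x]  # deviations computed once, reused below
    dy = [yi - ybar for yi in y]
    sxy = sumprod(dx, dy)
    sxx = sumprod(dx, dx)
    syy = sumprod(dy, dy)
    if sxx == 0 or syy == 0:
        raise ValueError("at least one of the inputs is constant")
    return sxy / sqrt(sxx * syy)


def centered_ranks(data):
    """Average ranks shifted so that they sum to zero (hypothetical helper)."""
    n = len(data)
    start = (n - 1) / -2  # rank of the smallest value
    order = sorted(range(n), key=data.__getitem__)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and data[order[j + 1]] == data[order[i]]:
            j += 1
        avg = start + (i + j) / 2  # tied values share the mean of their ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_sketch(x, y):
    """Spearman's rho; no centering step because the ranks have mean zero."""
    rx = centered_ranks(x)
    ry = centered_ranks(y)
    return sumprod(rx, ry) / sqrt(sumprod(rx, rx) * sumprod(ry, ry))
```

Because the ranks start at `(n - 1) / -2`, their mean is zero by construction, so `spearman_sketch` can pass them straight to `sumprod()` without subtracting a mean first.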

Baseline timing

% ./python.exe -m timeit -r11 -s 'from random import expovariate as r' -s 'from statistics import correlation' -s 'n=100' -s 'data = [r() for i in range(n)]' -s 'weights = [r() for i in range(n)]' 'correlation(data, weights)'
10000 loops, best of 11: 21 usec per loop

Timing with PR

% ./python.exe -m timeit -r11 -s 'from random import expovariate as r' -s 'from statistics import correlation' -s 'n=100' -s 'data = [r() for i in range(n)]' -s 'weights = [r() for i in range(n)]' 'correlation(data, weights)'
50000 loops, best of 11: 8.92 usec per loop

@rhettinger added the performance (Performance or resource usage), skip issue, skip news, and 3.12 (bugs and security fixes) labels Mar 13, 2023
@rhettinger changed the title from "Use sumprod() to simplify, speed up, and improve accuracy of statistics functions" to "GH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions" Mar 13, 2023
@rhettinger merged commit 457e4d1 into python:main Mar 14, 2023
carljm added a commit to carljm/cpython that referenced this pull request Mar 14, 2023
* main: (50 commits)
  pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685)
  pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661)
  pythongh-102354: change python3 to python in docs examples (python#102696)
  pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506)
  pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684)
  pythongh-100315: clarification to `__slots__` docs. (python#102621)
  pythonGH-100227: cleanup initialization of global interned dict (python#102682)
  doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677)
  pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014)
  pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649)
  pythongh-102627: Replace address pointing toward malicious web page (python#102630)
  pythongh-98831: Use DECREF_INPUTS() more (python#102409)
  pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659)
  pythongh-101524: Fix the ChannelID tp_name (pythongh-102655)
  pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075)
  pythongh-98169 dataclasses.astuple support DefaultDict (python#98170)
  pythongh-102650: Remove duplicate include directives from multiple source files (python#102651)
  pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)
  pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631)
  ...
@@ -1,4 +1,4 @@
-"""Test suite for statistics module, including helper NumericTestCase and
+x = """Test suite for statistics module, including helper NumericTestCase and

@alexanderGerbik alexanderGerbik Mar 23, 2023

Is this a typo or some kind of black magic?

Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this pull request Mar 27, 2023
warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023