Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-37884: Optimize Fraction() and statistics.mean() #15329

Closed

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Aug 19, 2019

self._numerator, self._denominator = math._as_integer_ratio(numerator)
return self
except TypeError:
raise TypeError("argument should be a string or a number, "
Copy link
Contributor

@aeros aeros Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the name of the encompassing type is "numeric" rather than "number", can we adjust the error message?

Suggested change
raise TypeError("argument should be a string or a number, "
raise TypeError("argument type should be str or numeric, "

Source: https://docs.python.org/3.9/library/stdtypes.html#numeric-types-int-float-complex

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But an instance of a numeric type is a number, is not?

And if use "argument type", it should be "str", not "string".

Copy link
Contributor

@aeros aeros Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But an instance of a numeric type is a number, is not?

It is, but I think that it's a bit more useful to users to specify the actual types in this case since it's a TypeError. Also, when searching the docs for "number", the relevant documentation page ("Built-in Types") does not come up as a suggestion, instead they'll likely encounter the page for the "numbers" module (which would not be relevant to the error). When searching for "numeric", more relevant results are found, including "Built-in Types".

And if use "argument type", it should be "str", not "string".

I'll update the suggestion accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a standard message used in many sites. For example:

>>> float([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: float() argument must be a string or a number, not 'list'

If you want to change it, please open a separate issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to change it, please open a separate issue.

Alright, I'll take a look at what other areas use this error message and consider whether or not it should be addressed. That would definitely go outside of the scope of this PR.


Return integer ratio.

Return a pair of integers, whose ratio is exactly equal to the original
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify that the pair of integers returned are contained within a tuple?

Suggested change
Return a pair of integers, whose ratio is exactly equal to the original
Return a tuple containing a pair of integers, whose ratio is exactly equal to the original

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from the docstring of float.as_integer_ratio. Other as_integer_ratio methods have similar wording.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion was based on the docstring Raymond recently created for Fraction.as_integer_ratio(), I made a similar suggestion that he added to the PR.

If it would be helpful, I could create a separate PR to make a similar change to float.as_integer_ratio().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tuple is the pair, so I find "tuple containing a pair" a bit confusing. That almost sounds like ((numerator, denominator),) or something. Why not be explicit and write "a tuple (numerator, denominator)"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not be explicit and write "a tuple (numerator, denominator)"?

That would be an improvement. I mostly just wanted to specify that the function returned a tuple.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Pair" is a synonym of 2-element tuple. If you think that this term is incorrect, please open a separate issue and analyze all uses of in in the code and in the documentation (there are a lot of occurrences).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Pair" is a synonym of 2-element tuple.

Either is probably fine, I just figured it was worth trying to be more technically descriptive. Thanks.

@jdemeyer
Copy link
Contributor

Am I right that this is the same as #15210 but with a private function instead?

Copy link
Contributor

@aeros aeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I performed the same tests mentioned in the bpo issue to verify the results and compared between the latest commit to master (master:24fe46081b) to the PR branch (math-as_integer_ratio2:e658d19083).

OS: Arch Linux 5.2.8
CPU: Intel i5-4460

./python -m timeit -s "from fractions import Fraction as F" "F(123)"
200000 loops, best of 5
Master: 1.42 usec
PR: 1.55 usec

./python -m timeit -s "from fractions import Fraction as F" "F(1.23)"
100000 loops, best of 5
Master: 2.92 usec per loop
PR: 2.14 usec

 ./python -m timeit -s "from fractions import Fraction as F; f = F(22, 7)" "F(f)"
100000 loops, best of 5
Master: 2.47 usec
PR: 1.93 usec

./python -m timeit -s "from statistics import mean; a = [1]*1000" "mean(a)"
500 loops, best of 5
Master: 930 usec
PR: 640 usec

./python -m timeit -s "from statistics import mean; a = [1.23]*1000" "mean(a)"
200 loops, best of 5
Master: 1.31 msec
PR: 1.34 msec

./python -m timeit -s "from statistics import mean; from fractions import Fraction as F; a = [F(22, 7)]*1000" "mean(a)"
200 loops, best of 5
Master: 1.31 msec
PR: 1.09 msec

./python -m timeit -s "from statistics import mean; from decimal import Decimal as D; a = [D('1.23')]*1000" "mean(a)"
100 loops, best of 5
Master: 2.08 msec
PR: 2 msec

@aeros
Copy link
Contributor

aeros commented Aug 19, 2019

Am I right that this is the same as #15210 but with a private function instead?

It also looks like the Travis doctest and docbuild was failing on the other PR. As far as functionality goes, the changes to the code in mathmodule.c look to be identical, other than the additional underscore in the function name to denote it as private.

Copy link
Contributor

@jdemeyer jdemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting arbitrary objects with an as_integer_ratio method is a non-trivial change and should be documented in a NEWS entry.

@serhiy-storchaka
Copy link
Member Author

Am I right that this is the same as #15210 but with a private function instead?

Yes, it is.

Supporting arbitrary objects with an as_integer_ratio method is a non-trivial change and should be documented in a NEWS entry.

Agreed.

return PyTuple_Pack(2, x, _PyLong_One);
}

if (_PyObject_LookupAttrId(x, &PyId_as_integer_ratio, &as_integer_ratio) < 0) {
Copy link
Contributor

@aeros aeros Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the context, I can roughly tell what this conditional is doing. From my understanding, _PyObject_LookupAttrId() is assessing whether or not the PyObject x contains an as_integer_ratio attribute. If a value less than zero is returned (usually -1), it does not contain that attribute.

However, I'm not certain that I understand where PyId_as_integer_ratio is coming from or how PyId actually works. I was unable to find any documentation on PyId or _Py_IDENTIFIER(), so I'm guessing it's an internal part of the C-API (since it's prefixed with an underscore).

My best guess is that a reference to PyId_as_integer_ratio was created when _Py_IDENTIFIER(as_integer_ratio) was used.

I'm fairly new to the C-API, so I'm trying to learn more about it so that I can be more helpful in PR reviews that involve it. Particularly the internal implementation details that aren't in the documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is described in the header: Include/cpython/object.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for letting me know where to look. The code comments there addressed my question:

PyId_foo is a static variable, either on block level or file level. On first usage, the string "foo" is interned, and the structures are linked.

@serhiy-storchaka serhiy-storchaka deleted the math-as_integer_ratio2 branch August 24, 2019 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants