
No recursion error for cycles in repr #358

Merged: glyph merged 6 commits into master from no-recursion-error on Mar 17, 2018
Conversation

@glyph (Contributor) commented Mar 16, 2018

Fixes #95.

  • Added tests for changed code.

Done.

There are no tests for the multithreaded case because it's literally impossible to test; you just have to think really hard. I could try harder if the reviewer thinks this implementation strategy is risky, but it seems pretty boring: it just uses threading.local to avoid sharing state between threads (a rough sketch of the idea appears at the end of this description).

I don't think I need to do this here because I don't see anything relevant, but I am not a Hypothesis expert, so I might be missing something.

  • Updated documentation for changed code.

It's a bugfix, so I don't think it calls for updated docs.

  • Changes (and possible deprecations) have news fragments in changelog.d.

Done.
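For readers who want to see the thread-local guard idea in miniature, here is a minimal, hypothetical sketch of the approach described above. The class, the _already_repring name, and the output format are illustrative only; this is not the actual attrs implementation.

import threading

# One guard object per thread, so concurrent reprs in different threads
# never see each other's in-progress objects.
_already_repring = threading.local()


class Node(object):
    def __init__(self, name, child=None):
        self.name = name
        self.child = child

    def __repr__(self):
        # Lazily create the per-thread set of object ids currently being
        # repr'd on this thread's call stack.
        working_set = getattr(_already_repring, "working_set", None)
        if working_set is None:
            working_set = set()
            _already_repring.working_set = working_set

        # If this object is already being repr'd further up the stack,
        # we are in a cycle: emit a placeholder instead of recursing forever.
        if id(self) in working_set:
            return "..."

        working_set.add(id(self))
        try:
            return "Node(name={!r}, child={!r})".format(self.name, self.child)
        finally:
            # Always remove ourselves, even if a child repr raises.
            working_set.discard(id(self))


# Usage: a two-node cycle now reprs as a finite string.
a = Node("a")
b = Node("b", child=a)
a.child = b
print(repr(a))  # Node(name='a', child=Node(name='b', child=...))

The same pattern generalizes to generated __repr__ methods: check a per-thread working set, emit "..." on re-entry, and always clean up in a finally block.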

@codecov bot commented Mar 16, 2018

Codecov Report

Merging #358 into master will not change coverage.
The diff coverage is 100%.


@@          Coverage Diff          @@
##           master   #358   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           9      9           
  Lines         806    825   +19     
  Branches      171    173    +2     
=====================================
+ Hits          806    825   +19
Impacted Files Coverage Δ
src/attr/_make.py 100% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 780e1e5...cfc376a.

@glyph changed the title from "No recursion error" to "No recursion error for cycles in repr" on Mar 16, 2018
@Tinche (Member) commented Mar 16, 2018

This is going to significantly slow down repr, isn't it? Usually people don't repr on a hot path, but I log attrs classes all the time.

@hynek (Member) commented Mar 16, 2018

If it gets prohibitively expensive, we can always make it an option.

@asvetlov commented Mar 16, 2018
Is repr speed really the bottleneck? Usually repr is used for logging, error handling, and so on, but it is rarely on the fast path of successful execution.

It's your library, so you set the rules. But, for example, if a comprehensive repr in aiohttp takes a little extra time, I don't care: correctness first.

@glyph (Contributor, Author) commented Mar 16, 2018

I'd like to see some perf numbers before doing anything drastic :) but on the perf front we could also disable this for frozen classes. A frozen object can participate in a cycle, but it can't be the cause of one, so it's the (mutable) cyclic object that should be responsible for emitting the "...".

@glyph (Contributor, Author) commented Mar 16, 2018

Re: logging attrs classes: yes, this is precisely why I really want this to happen :). Sometimes having repr in a hot logging loop be kinda slow is vastly preferable to sometimes entirely losing a log message because an object got into a cycle somehow.

@glyph (Contributor, Author) commented Mar 16, 2018

Anyone I can motivate to give me a review? :)

@asvetlov left a comment

@glyph I don't know whether my voice carries any weight, but your PR looks correct and safe.

@glyph (Contributor, Author) commented Mar 16, 2018

@asvetlov Thanks for your input, your review is appreciated! But per project policy, I do need another project member to sign off :).

@mahmoud (Member) left a comment

I've seen a couple approaches to this in the past, but this matches the better of the two. The other is OrderedDict's, in case you'd like to contrast it. LGTM.

@Tinche (Member) commented Mar 16, 2018

So show us some performance numbers?

@Tinche (Member) commented Mar 16, 2018

BTW, I log attrs classes all the time on the happy path. Not in hot loops, of course, but once per request for sure. A slow repr isn't prohibitive, but maybe this should be opt-in; personally I have never had a cycle, so why pay the price?

@glyph (Contributor, Author) commented Mar 16, 2018

Preliminary testing suggests a 20% slowdown on both CPython and PyPy.

Test:

import attr
import timeit
from textwrap import dedent

# A small attrs class with three attributes, to exercise the generated repr.
@attr.s
class Foo(object):
    bar = attr.ib()
    baz = attr.ib()
    boz = attr.ib()

# Time one million repr() calls on a single instance.
print(timeit.timeit(
    'repr(foo)',
    dedent('''
    from __main__ import Foo
    foo = Foo(1, "hello", ["some", "values"])
    '''),
    number=1000000
))

@glyph (Contributor, Author) commented Mar 16, 2018

@Tinche On the one hand, I like the philosophy of "don't pay for what you don't use". On the other hand, what would you call this if you had to opt in to it? dont_crash_on_repr? I don't like "behave correctly" flags. I'm not sure how to resolve that here; on balance, I feel this is acceptable overhead for starters, and if we wanted to optimize repr, there are a few other things that would be worth investigating as well.

@glyph (Contributor, Author) commented Mar 16, 2018

How do you feel about frozen= being the opt-in?

@glyph (Contributor, Author) commented Mar 16, 2018

Actually I guess since frozen=False is the default that would make it an opt-out not an opt-in.

@glyph (Contributor, Author) commented Mar 17, 2018

@Tinche With the micro-optimization I just committed, this change is about 25% faster on PyPy and about 10% faster on CPython.

It's true that if we removed the code that makes it correct now it could probably be even faster, but presumably if the previous performance was acceptable, faster performance is also acceptable :).

@glyph (Contributor, Author) commented Mar 17, 2018

(25%/10% faster than master, I mean, not than the previous implementation of this fix)

@mahmoud (Member) commented Mar 17, 2018

Restraining the optimization golfer in me, I think eliminating the .format() call and generator expression is a cromulent and uncontroversial way to address repr performance concerns. +1
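To make the discussion concrete, here is a hypothetical before/after showing the kind of change being described: dropping a .format() call over a generator expression in favor of plain string concatenation of precomputed pieces. This is an illustration of the technique only, not the actual attrs diff.

# Before (hypothetical): format string plus a generator expression.
def repr_with_format(obj, attr_names):
    return "{}({})".format(
        obj.__class__.__name__,
        ", ".join("{}={!r}".format(name, getattr(obj, name)) for name in attr_names),
    )


# After (hypothetical): plain concatenation onto a running string.
def repr_with_concat(obj, attr_names):
    result = obj.__class__.__name__ + "("
    for i, name in enumerate(attr_names):
        if i:
            result += ", "
        result += name + "=" + repr(getattr(obj, name))
    return result + ")"


class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

print(repr_with_format(Point(1, 2), ["x", "y"]))  # Point(x=1, y=2)
print(repr_with_concat(Point(1, 2), ["x", "y"]))  # Point(x=1, y=2)

Both versions produce the same string; the difference is purely in how much interpreter machinery (a generator frame, extra method dispatch) each repr call pays for.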

@hynek (Member) commented Mar 17, 2018

Let’s get some numbers.

Hynek’s Molasses

➤ pyperf timeit --rigorous -s 'import attr; C = attr.make_class("C", ["a", "b", "c"]); x = C(1, "2", 3.0)' 'repr(x)'
.........................................
Mean +- std dev: 3.22 us +- 0.09 us

Glyph’s Vroom-Vroom

➤ pyperf timeit --rigorous -s 'import attr; C = attr.make_class("C", ["a", "b", "c"]); x = C(1, "2", 3.0)' 'repr(x)'
.........................................
Mean +- std dev: 3.12 us +- 0.08 us

Hawkie’s Turbo

However, looking at the code reminded me of @hawkowl's blog post on string concatenation (well, she wrote about byte strings, but I figured the lessons are transferable), and I did some more micro-optimizations that even made the code shorter and nicer. And lo and behold:

➤ pyperf timeit --rigorous -s 'import attr; C = attr.make_class("C", ["a", "b", "c"]); x = C(1, "2", 3.0)' 'repr(x)'
.........................................
Mean +- std dev: 2.99 us +- 0.07 us

So I'm not sure what the etiquette here is, but I've just pushed my changes to Glyph's branch. If you (Glyph) are OK with it, we can merge it as soon as CI passes.

@@ -3,6 +3,7 @@
 import hashlib
 import linecache
 import sys
+import threading


@hynek (Member) commented Mar 17, 2018

Although I have one last question: does this work with greenlets too? Or do we have to do something like I did in structlog?

@glyph (Contributor, Author) commented Mar 17, 2018

It should work well enough with greenlets. In principle it's possible to suspend and re-entrantly repr something in a different greenlet on the same thread, but:

  • you'd have to do something truly horrible to make that happen, and
  • if you're doing that re-entrantly in a paused greenlet, you might have the same recursion issues anyway, so this logic isn't exactly wrong.

@glyph (Contributor, Author) commented Mar 17, 2018

If we're happy with the perf numbers, I'm going to go ahead and merge.

@glyph merged commit 5e46afd into master on Mar 17, 2018
@glyph deleted the no-recursion-error branch on Mar 17, 2018 at 08:32
@hawkowl (Member) commented Mar 17, 2018

I helped, without even having to help. Woo!

@hynek (Member) commented Mar 17, 2018

“Hawkie’s Turbo” is a thing now

@njsmith commented Mar 17, 2018

In 3.7 you could use contextvars instead, to avoid saying the Forbidden Word :-).

(I haven't managed to convince Yury to make a backport yet though.)
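For reference, here is a hypothetical sketch of what a contextvars-based guard (Python 3.7+) might look like in place of threading.local. The names, the guarded_repr helper, and the Pair demo are made up for illustration; none of this is code from attrs.

import contextvars

# Set of object ids currently being repr'd in the current context.
_repr_working_set = contextvars.ContextVar("repr_working_set")


def guarded_repr(obj, render):
    # render(obj) produces the normal repr string; guarded_repr short-circuits
    # to "..." if obj is already being repr'd somewhere up the call stack.
    try:
        working_set = _repr_working_set.get()
    except LookupError:
        working_set = set()
        _repr_working_set.set(working_set)

    if id(obj) in working_set:
        return "..."

    working_set.add(id(obj))
    try:
        return render(obj)
    finally:
        working_set.discard(id(obj))


class Pair:
    def __init__(self):
        self.other = None

    def __repr__(self):
        return guarded_repr(self, lambda p: "Pair(other={!r})".format(p.other))


x, y = Pair(), Pair()
x.other, y.other = y, x
print(repr(x))  # Pair(other=Pair(other=...))

Because the working set is created lazily, each asyncio task (and each new thread) typically ends up with its own copy, so the guard no longer mentions OS threads directly.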

@Tinche (Member) commented Mar 17, 2018

Yay, everyone's happy!

Closes #95: Infinite Recursion in repr with bi-directional relationships

7 participants