Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-120437: Fix _CHECK_STACK_SPACE optimization problems introduced in gh-118322 #120712

Merged
merged 2 commits into from
Jun 19, 2024

Conversation

Zheaoli
Copy link
Contributor

@Zheaoli Zheaoli commented Jun 18, 2024

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 18, 2024

@Fidget-Spinner Thanks for the help, sir. I found the root cause and fix it. BTW I might have another question how to generate some opcode unit test to trigger the old bug so that I can confirm this issue has been solved? For now, I can only test the origin issue and confirm it. I think this is unacceptable for us.

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jun 19, 2024

Please dont call me sir I'm still too young for that :). I think this looks right but I need to compare it with the old code prior to the commit. Please give me a moment. Thanks for your help!

@Fidget-Spinner
Copy link
Member

Do you know specifically which macro-op is affected by this? (The hint is to see where which opcode in _PyEval_EvalFrameDefault it segfaults). I don't think we should be adding the check here as it slows down all other opcodes which might not need the check.

I'm also confused by the diff. The diff uses _PyEvalFramePushAndInit not _PyFrame_PushUnchecked. So it should still check for stack space. I'm confused why that doesn't work.

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 19, 2024

The crash task is here

#1  0x00007ffff7d3eeb3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff7ce6a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7cce4c3 in __GI_abort () at abort.c:79
#4  0x00007ffff7cce3df in __assert_fail_base (fmt=0x7ffff7e59b68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55555591a150 "tstate->datastack_top < tstate->datastack_limit", 
    file=file@entry=0x555555901138 "./Include/internal/pycore_frame.h", line=line@entry=284, function=function@entry=0x555555977030 <__PRETTY_FUNCTION__.30> "_PyFrame_PushUnchecked") at assert.c:94
#5  0x00007ffff7cdec67 in __assert_fail (assertion=assertion@entry=0x55555591a150 "tstate->datastack_top < tstate->datastack_limit", file=file@entry=0x555555901138 "./Include/internal/pycore_frame.h", line=line@entry=284, 
    function=function@entry=0x555555977030 <__PRETTY_FUNCTION__.30> "_PyFrame_PushUnchecked") at assert.c:103
#6  0x000055555578ec88 in _PyFrame_PushUnchecked (tstate=tstate@entry=0x555555d9e0c0 <_PyRuntime+293952>, func=<optimized out>, null_locals_from=null_locals_from@entry=3) at ./Include/internal/pycore_frame.h:284
#7  0x00005555557b8c51 in _PyEval_EvalFrameDefault (tstate=0x555555d9e0c0 <_PyRuntime+293952>, frame=0x7ffff7f98e58, throwflag=0) at Python/executor_cases.c.h:3326
#8  0x00005555557bc37e in _PyEval_EvalFrame (tstate=tstate@entry=0x555555d9e0c0 <_PyRuntime+293952>, frame=<optimized out>, throwflag=throwflag@entry=0) at ./Include/internal/pycore_ceval.h:118
#9  0x00005555557bc4a4 in _PyEval_Vector (tstate=0x555555d9e0c0 <_PyRuntime+293952>, func=0x7ffff6fe10d0, locals=locals@entry=0x0, args=0x7fffffff15a0, argcount=2, kwnames=0x0) at Python/ceval.c:1818
#10 0x00005555556728e4 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:413
#11 0x0000555555672c54 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x555555d9e0c0 <_PyRuntime+293952>, callable=callable@entry=<function at remote 0x7ffff6fe10d0>, args=args@entry=0x7fffffff15a0, nargsf=nargsf@entry=2, 
    kwnames=kwnames@entry=0x0) at ./Include/internal/pycore_call.h:168
#12 0x0000555555673b8c in object_vacall (tstate=tstate@entry=0x555555d9e0c0 <_PyRuntime+293952>, base=base@entry=0x0, callable=<function at remote 0x7ffff6fe10d0>, vargs=vargs@entry=0x7fffffff1620) at Objects/call.c:819
#13 0x0000555555673cea in PyObject_CallMethodObjArgs (obj=0x0, name=<optimized out>) at Objects/call.c:880
#14 0x00005555557fb230 in import_find_and_load (tstate=tstate@entry=0x555555d9e0c0 <_PyRuntime+293952>, abs_name=abs_name@entry='_winapi') at Python/import.c:3080
#15 0x00005555557feb3a in PyImport_ImportModuleLevelObject (name=name@entry='_winapi', globals=<optimized out>, 

It's crash in _INIT_CALL_PY_EXACT_ARGS_2

I think maybe there is a pass to specialize CALL_PY_GENERAL in 1ab6356 to _INIT_CALL_PY_EXACT_ARGS? I guess

@Zheaoli Zheaoli force-pushed the manjusaka/fix-gh120437 branch from cc306fe to c79f850 Compare June 19, 2024 09:31
@Fidget-Spinner
Copy link
Member

The bug is not here. It's indeed in the _CHECK_STACK_SPACE pass.

If you comment out this line here https://github.com/python/cpython/blob/main/Python/optimizer_bytecodes.c#L702, the code works.

@Fidget-Spinner
Copy link
Member

Nevermind I can't reproduce the bug anymore on my machine. Can you try my solution and report back if it passes please?

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 19, 2024

Yep, this solution(I mean disable _CHECK_STACK_SPACE pass) the can fix this bug too.

I'm a little confuse about the root cause..

@Fidget-Spinner
Copy link
Member

Ok I recommend taking a look at Slide 17 onwards of my PyCon US 2024 presentation here https://docs.google.com/presentation/d/e/2PACX-1vTzJf_PrBzcSRtgMdbM18xXXMVjsYdJIqzNHk0WATT5ZRhjzIyGLtHCH0MXGONpcpOItjfcw63uGoFX/pub?start=false&loop=false&delayms=60000

It explains how traces work, how the optimization passes work.
For _CHECK_STACK_SPACE op you can view the original issue #116168

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 19, 2024

Ok I recommend taking a look at Slide 17 onwards of my PyCon US 2024 presentation here https://docs.google.com/presentation/d/e/2PACX-1vTzJf_PrBzcSRtgMdbM18xXXMVjsYdJIqzNHk0WATT5ZRhjzIyGLtHCH0MXGONpcpOItjfcw63uGoFX/pub?start=false&loop=false&delayms=60000

It explains how traces work, how the optimization passes work. For _CHECK_STACK_SPACE op you can view the original issue #116168

Thanks for the reference. I have tested the #116168 (the commit is 1c43468)

The bug can't be reproduced in this branch.

The 1ab6356 is the first commit to reproduced the code (after the 1c43468). I think I still missing something important here= =

@Fidget-Spinner
Copy link
Member

What I'm trying to say is that it's likely the _CHECK_STACK_SPACE are being wrongly removed. There is likely nothing wrong with the first broken commit itself 1ab6356. Instead, that commit likely broke the calculations being done by the _CHECK_STACK_SPACE pass.

@Fidget-Spinner
Copy link
Member

Can you try removing the line at 1ab6356#diff-e5bd2b14b0b10f0f47786e26306d689ed1361c3dc3b11dcc3ea52b8a2422ff64R637 and tell me if it fixes the bug?

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 19, 2024

After dive deeper, I think the root cause is here https://github.com/python/cpython/pull/118322/files#diff-e5bd2b14b0b10f0f47786e26306d689ed1361c3dc3b11dcc3ea52b8a2422ff64R637. I belive it will broken the trace pass.

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 19, 2024

Can you try removing the line at 1ab6356#diff-e5bd2b14b0b10f0f47786e26306d689ed1361c3dc3b11dcc3ea52b8a2422ff64R637 and tell me if it fixes the bug?

Yeah! I found the same code here.

@Fidget-Spinner
Copy link
Member

The TLDR of why it breaks can be understood by reading https://github.com/python/cpython/blob/main/Python/optimizer_analysis.c#L471.

Long story short, at the end of the trace optimization pass, it merges _CHECK_STACK_SPACE to _CHECK_STACK_SPACE_OPERAND. The code breaks this by removing the link to the first _CHECK_STACK_SPACE, so what happens is the first _CHECK_STACK_SPACE_OPERAND doesn't actually check for enough space.

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 19, 2024

The TLDR of why it breaks can be understood by reading https://github.com/python/cpython/blob/main/Python/optimizer_analysis.c#L471.

Long story short, at the end of the trace optimization pass, it merges _CHECK_STACK_SPACE to _CHECK_STACK_SPACE_OPERAND. The code breaks this by removing the link to the first _CHECK_STACK_SPACE, so what happens is the first _CHECK_STACK_SPACE_OPERAND doesn't actually check for enough space.

Thanks for the guide. I have noticed the reason by following the issue you reference to me. Help me a lot!

@Zheaoli Zheaoli force-pushed the manjusaka/fix-gh120437 branch from c79f850 to d520f93 Compare June 19, 2024 13:31
@Zheaoli Zheaoli marked this pull request as ready for review June 19, 2024 13:31
@Zheaoli Zheaoli requested a review from Fidget-Spinner as a code owner June 19, 2024 13:31
@Zheaoli Zheaoli force-pushed the manjusaka/fix-gh120437 branch from c9c5064 to dff937b Compare June 19, 2024 13:37
@Zheaoli Zheaoli changed the title gh-120437: drop specialization when the code frame is exceed current stack pythongh-120437: fix _CHECK_STACK_SPACE invalid after pythongh-118322 Jun 19, 2024
@Zheaoli Zheaoli force-pushed the manjusaka/fix-gh120437 branch from 8a98321 to 55c0e85 Compare June 19, 2024 13:51
@Zheaoli Zheaoli changed the title pythongh-120437: fix _CHECK_STACK_SPACE invalid after pythongh-118322 Fix _CHECK_STACK_SPACE optimization problems after introduced in gh-118322 Jun 19, 2024
@Zheaoli Zheaoli changed the title Fix _CHECK_STACK_SPACE optimization problems after introduced in gh-118322 Fix _CHECK_STACK_SPACE optimization problems after introduced in gh-118322 Jun 19, 2024
@Fidget-Spinner Fidget-Spinner changed the title Fix _CHECK_STACK_SPACE optimization problems after introduced in gh-118322 Fix _CHECK_STACK_SPACE optimization problems introduced in gh-118322 Jun 19, 2024
@Fidget-Spinner Fidget-Spinner changed the title Fix _CHECK_STACK_SPACE optimization problems introduced in gh-118322 gh-120437: Fix _CHECK_STACK_SPACE optimization problems introduced in gh-118322 Jun 19, 2024
@Fidget-Spinner
Copy link
Member

In the future, please help me label the title with issue gh-XXXX . Thanks!

Zheaoli and others added 2 commits June 19, 2024 22:27
… introduced in pythongh-118322

Co-Authored-By: Ken Jin <kenjin@python.org>
Signed-off-by: Manjusaka <me@manjusaka.me>
@Zheaoli Zheaoli force-pushed the manjusaka/fix-gh120437 branch from ce4e96e to 05f2d2c Compare June 19, 2024 14:27
@Fidget-Spinner
Copy link
Member

@Zheaoli please don't force-push. I will squash merge when merging so there's no need to do anything on your end :).

@Fidget-Spinner Fidget-Spinner added the needs backport to 3.13 bugs and security fixes label Jun 19, 2024
@Fidget-Spinner
Copy link
Member

@Zheaoli your fix seemingly fixes the 4 consistently failing Aarch64 JIT tests. Wow that's awesome!

@Fidget-Spinner Fidget-Spinner merged commit f385d99 into python:main Jun 19, 2024
57 checks passed
@miss-islington-app
Copy link

Thanks @Zheaoli for the PR, and @Fidget-Spinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @Zheaoli and @Fidget-Spinner, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker f385d99f57773e48285e0bcdbcd66dcbfdc647b3 3.13

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jun 19, 2024

@Zheaoli could you please create a backport PR targetting 3.13 for this?

@Zheaoli
Copy link
Contributor Author

Zheaoli commented Jun 19, 2024

@Zheaoli could you please create a backport PR targetting 3.13 for this?

Sure!

@bedevere-app
Copy link

bedevere-app bot commented Jun 19, 2024

GH-120747 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jun 19, 2024
Fidget-Spinner added a commit that referenced this pull request Jun 19, 2024
…roduced in gh-118322 (GH-120712) (#120747)

[3.13] gh-120437: Fix `_CHECK_STACK_SPACE` optimization problems introduced in gh-118322 (GH-120712)

Signed-off-by: Manjusaka <me@manjusaka.me>
Co-authored-by: Ken Jin <kenjin4096@gmail.com>
mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants