Diagnostic code for fixed bug still present in v4.x's V8 #7574
Indeed, I was missing something. The fix for the original issue was integrated into V8's 4.5.103.24 release with v8/v8@ef23c3a. I'm still wondering why the diagnostic code is still there.
/cc @ofrobots @nodejs/v8 @nodejs/lts: should we be manually backporting misterdjules@0ee8690 to v4.x-staging?
Seems fine to me. It should bump `V8_PATCH_LEVEL` in `deps/v8/include/v8-version.h`, though.
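For reference, a minimal sketch of what that bump would look like in `deps/v8/include/v8-version.h` (the surrounding values are taken from the 4.5.103.24 release discussed here; the bumped patch level is illustrative):

```c
// deps/v8/include/v8-version.h (excerpt; bumped value is illustrative)
#define V8_MAJOR_VERSION 4
#define V8_MINOR_VERSION 5
#define V8_BUILD_NUMBER 103
// Bumped from 24 to signal that a patch was floated on top of the release.
#define V8_PATCH_LEVEL 25
```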
For posterity, to check whether a particular V8 commit has been cherry-picked back to release lines, you can use […]

It seems that the fix was merged into V8 4.5 as part of the above cherry-pick. However, the diagnostic code wasn't removed until October 23, by which time V8 4.5 was already a dead branch: […]
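For illustration, one way to run this kind of containment check with plain git, assuming a local clone of the V8 repository with its release tags fetched:

```sh
# List every release tag that contains the fix commit from the comment
# above; if 4.5.* tags show up, the fix made it onto the 4.5 release line.
git tag --contains ef23c3a
```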
@ofrobots Thank you for the info on that. Do you have any opinion on question #2 in my comment above: […]?
The diagnostic code is using unsigned arithmetic, so yes, on x86-64 it is possible for an address to be larger than `1 << 48` unsigned (although I am not sure whether Linux grants those mappings to user space; no idea about other OSes). x86-64 uses a 48-bit sign-extended virtual address space. @misterdjules: What OS and architecture are your core dumps from? If x86-64, I would expect `heap_obj->address()` to have its 2 most significant bytes be 0xffff.
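A tiny standalone illustration of that unsigned comparison (the specific address is a made-up example of a sign-extended "high half" mapping):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
  /* Hypothetical mapping in the sign-extended "high half" of the x86-64
     virtual address space: bits 63..48 are all set, as sign extension of
     bit 47 requires. */
  uint64_t high_half = 0xffff800000000000ULL;
  /* Compared as unsigned, such an address is far larger than 1 << 48, so
     a plain "addr > (1ULL << 48)" diagnostic flags it even though the
     mapping itself can be perfectly legal. */
  printf("%d\n", high_half > (1ULL << 48)); /* prints 1 */
  return 0;
}
```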
Right, my question was more specific to V8's JS heap management than about addresses in any process's native heap. If valid JS objects can have addresses `> 1 << 48` on x86-64 (the diagnostic code is only enabled on that target architecture), why does the diagnostic code crash when it finds one?
If the diagnostic code is asserting, then you have an address larger than `1 << 48`. Legal addresses larger than `1 << 48` would have the top 16 bits set; I would be concerned if that weren't the case. What is the actual value you see?
@ofrobots Can you quote the parts of the comments you're responding to? Otherwise it's difficult to know which part you're responding to, and the discussion becomes confusing to me.
Is it in response to: […]?
If so, I thought that by "I would expect `heap_obj->address()` to have 2 most significant bytes to be 0xffff", you meant that JS heap objects whose addresses have their 2 most significant bytes set to 0xffff are valid. That seems contradictory to the diagnostic code that crashes when such an object is found.
Yes, I agree with this. I am trying to get a better understanding of the situation that you are describing from your core dumps. That is why I was asking about the actual value of the address (or just the top 2 bytes, if you are paranoid) on which you hit the assert. Here are possible scenarios: […]
This is a bit tricky because much of this diagnostic code and its callers are inlined.
These cores were generated on SmartOS. It turns out that on SmartOS (and probably most OSes derived from Solaris) mmapped pages can be mapped at addresses with the two most significant bytes set. That was what I was missing. The output of `pmap` on one of these cores shows this: […]
In the output above, we can see that some mappings have addresses whose two most significant bytes are set.
So it seems the diagnostic code was probably triggering false positives on SmartOS (and probably most OSes derived from Solaris). Thus, I would think it is safe to remove it: crashes in that diagnostic would not indicate that the original bug wasn't fixed, but rather that the diagnostic code was not working properly on Solaris/SmartOS. @ofrobots Does that sound reasonable to you?
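To make the false positive concrete: a check that accepted both canonical halves of the x86-64 address space would not have fired on these SmartOS mappings. A minimal sketch (illustration only; the actual backport in https://codereview.chromium.org/1420253002 removes the diagnostic code instead):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* x86-64 virtual addresses are 48-bit sign-extended: bits 63..48 must all
   equal bit 47, so both 0x0000xxxxxxxxxxxx and 0xffffxxxxxxxxxxxx forms
   are canonical. A bare "addr < (1ULL << 48)" test rejects the second
   form, which is exactly where these SmartOS mappings live. */
static bool is_canonical_x64_address(uint64_t addr) {
  /* Drop the top 16 bits, sign-extend bit 47 back, and compare: canonical
     addresses survive the round trip unchanged. */
  return (uint64_t)((int64_t)(addr << 16) >> 16) == addr;
}

int main(void) {
  uint64_t mapped = 0xfffffd7fffdf0000ULL; /* made-up SmartOS-style address */
  printf("canonical: %d, naive check passes: %d\n",
         is_canonical_x64_address(mapped), mapped < (1ULL << 48));
  return 0;
}
```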
Thanks for posting the pmap output; based on that, I agree that the dumps look like false positives caused by imperfect diagnostic code.
Lately, I came across several occurrences of crashes that look like duplicates of #3898. Here's the callstack of one of these occurrences as displayed by mdb_v8's `::jsstack` command: […]
Unfortunately, I can't reproduce that crash and I can't share the core files that were provided to me.
My understanding was that, as mentioned in #3898, that crash was triggered by some diagnostic code that was added in order to investigate another V8 issue.
What surprised me is that the fix for the original issue was integrated into the v4.x branch as part of the upgrade to V8 4.5.103.24 with c431725, but the diagnostic code is still there.
The commit that upgraded V8 to version 4.5.103.24 in node v4.x doesn't show the fix for the original issue (v8/v8@8606664) in the GitHub web UI, but running `git show c43172578e3e10c2de84fc2ce0d6a7fb02a440d8` in a local checkout of the `v4.x-staging` branch shows that the fix was indeed integrated into the `v4.x` and `v4.x-staging` branches.
Another thing that surprised me is that c431725 represents the upgrade of V8 to version 4.5.103.24, but the fix for the original issue (v8/v8@8606664) doesn't seem to be part of V8's 4.5.103.24 release: […]
It's very likely that I'm missing something about how release branches/tags are organized in V8's repository though.
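One direct way to test whether the fix landed in that release tag, assuming a local clone of V8's repository with its tags fetched (a sketch only, since I may indeed be missing how the release branches/tags are organized):

```sh
# Exits 0 if the fix commit is an ancestor of the 4.5.103.24 release tag.
git merge-base --is-ancestor 8606664 4.5.103.24 \
  && echo "fix is in 4.5.103.24" \
  || echo "fix is NOT in 4.5.103.24"
```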
Now that raises a few questions:

1. […]
2. The diagnostic code crashes the process when it finds a JS object whose address is `> 1 << 48`. Is it possible for a valid JS object to have an address that is `> 1 << 48` in a 64-bit process? If so, does that mean that the diagnostic code was overly conservative and triggered false positives?

Basically, I'm trying to determine whether it makes sense to remove that diagnostic code by backporting https://codereview.chromium.org/1420253002 to v4.x-staging now that the original bug has been fixed.

FWIW, I tested a backport of https://codereview.chromium.org/1420253002 to v4.x-staging with `make check` and `make test-v8`, and it doesn't trigger any regression.

/cc @nodejs/v8.