-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perl 5.32 segfaults on Linux/m68k (regression) #17871
Comments
On 6/18/20 6:10 AM, John Paul Adrian Glaubitz wrote:
Hi!
Perl 5.32 has introduced a regression which results in |miniperl|
segfaulting during build.
The backtrace with |gdb| looks like this:
***@***.***:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args
./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c
'echo >&2 Failed to build miniperl. Please run make mint/Exporter/lib
-MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please
run make mininiperl. Please run make minitminiperl. Please run make
minitest; exit 1' GNU gdb (Debian 9.2-1) 9.2 Copyright (C) 2020 Free
Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html> This is free software: you are free
to change and redistribute it. There is NO WARRANTY, to the extent
permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "m68k-linux-gnu". Type "show configuration"
for configuration details. For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other
documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>. For help, type "help".
Type "apropos word" to search for commands related to "word"... Reading
symbols from ./miniperl... (gdb) r Starting program:
/srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib
-Idist/Exporter/lib -MExporter -e \<\?\> [Thread debugging using
libthread_db enabled] Using host libthread_db library
"/lib/m68k-linux-gnu/libthread_db.so.1". Program received signal
SIGSEGV, Segmentation fault. S_link_freed_op (o=0x803120aa,
slab=<optimized out>, slab=<optimized out>, my_perl=0x8030a190) at
op.c:269 269 if (!slab->opslab_freed) { (gdb) bt #0 S_link_freed_op
(o=0x803120aa, slab=<optimized out>, slab=<optimized out>,
my_perl=0x8030a190) at op.c:269 #1 0x8000901e in Perl_Slab_Free
(op=0x80312136, my_perl=0x8030a190) at op.c:505 #2 Perl_Slab_Free
(my_perl=0x8030a190, op=0x803120aa) at op.c:484 #3 0x8000fae6 in
S_op_destroy (o=<optimized out>, my_perl=<optimized out>) at op.c:6416
#4 Perl_op_append_list (type=<optimized out>, last=<optimized out>,
first=0x80312136, my_perl=<optimized out>) at op.c:6416 #5
Perl_op_append_list (my_perl=<optimized out>, type=<optimized out>,
first=<optimized out>, last=<optimized out>) at op.c:6397 #6 0x800625c8
in Perl_yyparse (my_perl=0x8030a190, gramtype=258) at perly.y:239 #7
0x800321fe in S_parse_body (xsinit=0x801885f8 <xs_init>, env=0x0,
my_perl=0x8030a190) at perl.c:2576 #8 perl_parse (my_perl=0x8030a190,
xsinit=0x801885f8 <xs_init>, argc=7, argv=0xeffffc14, env=0x0) at
perl.c:1871 #9 0x800050ae in main (argc=<optimized out>, argv=<optimized
out>, env=<optimized out>) at miniperlmain.c:132 (gdb) |
A wild guess is that might be an alignment issue as the natural
alignment for m68k is 16 bits, not 32 bits.
CC @AndreasSchwab <https://github.com/AndreasSchwab> @karcherm
<https://github.com/karcherm>
Please supply the way you invoked 'sh ./Configure' to achieve these
results. Example:
#####
sh ./Configure -des -Dusedevel -Duseithreads
#####
Thank you very much.
Jim Keenan
|
Hi Jim! Thanks for your answer. The full build log can be found in: https://buildd.debian.org/status/fetch.php?pkg=perl&arch=m68k&ver=5.32.0%7Erc1-1&stamp=1592359772&raw=0 From the Debian process, it looks like the configure script is invoked as:
I'm currently bisecting the git tree from upstream now to see if I can find the commit in question that broke the build. |
On 6/18/20 6:10 AM, John Paul Adrian Glaubitz wrote:
Hi!
Perl 5.32 has introduced a regression which results in |miniperl|
segfaulting during build.
The backtrace with |gdb| looks like this:
***@***.***:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args
./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c
'echo >&2 Failed to build miniperl. Please run make mint/Exporter/lib
-MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please
run make mininiperl. Please run make minitminiperl. Please run make
minitest; exit 1' GNU gdb (Debian 9.2-1) 9.2 Copyright (C) 2020 Free
Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html> This is free software: you are free
to change and redistribute it. There is NO WARRANTY, to the extent
permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "m68k-linux-gnu". Type "show configuration"
for configuration details. For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other
documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>. For help, type "help".
Type "apropos word" to search for commands related to "word"... Reading
symbols from ./miniperl... (gdb) r Starting program:
/srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib
-Idist/Exporter/lib -MExporter -e \<\?\> [Thread debugging using
libthread_db enabled] Using host libthread_db library
"/lib/m68k-linux-gnu/libthread_db.so.1". Program received signal
SIGSEGV, Segmentation fault. S_link_freed_op (o=0x803120aa,
slab=<optimized out>, slab=<optimized out>, my_perl=0x8030a190) at
op.c:269 269 if (!slab->opslab_freed) { (gdb) bt #0 S_link_freed_op
(o=0x803120aa, slab=<optimized out>, slab=<optimized out>,
my_perl=0x8030a190) at op.c:269 #1 0x8000901e in Perl_Slab_Free
(op=0x80312136, my_perl=0x8030a190) at op.c:505 #2 Perl_Slab_Free
(my_perl=0x8030a190, op=0x803120aa) at op.c:484 #3 0x8000fae6 in
S_op_destroy (o=<optimized out>, my_perl=<optimized out>) at op.c:6416
#4 Perl_op_append_list (type=<optimized out>, last=<optimized out>,
first=0x80312136, my_perl=<optimized out>) at op.c:6416 #5
Perl_op_append_list (my_perl=<optimized out>, type=<optimized out>,
first=<optimized out>, last=<optimized out>) at op.c:6397 #6 0x800625c8
in Perl_yyparse (my_perl=0x8030a190, gramtype=258) at perly.y:239 #7
0x800321fe in S_parse_body (xsinit=0x801885f8 <xs_init>, env=0x0,
my_perl=0x8030a190) at perl.c:2576 #8 perl_parse (my_perl=0x8030a190,
xsinit=0x801885f8 <xs_init>, argc=7, argv=0xeffffc14, env=0x0) at
perl.c:1871 #9 0x800050ae in main (argc=<optimized out>, argv=<optimized
out>, env=<optimized out>) at miniperlmain.c:132 (gdb) |
A wild guess is that might be an alignment issue as the natural
alignment for m68k is 16 bits, not 32 bits.
CC @AndreasSchwab <https://github.com/AndreasSchwab> @karcherm
<https://github.com/karcherm>
Thank you for this report and especially for the backtrace.
When I first read this ticket, I had no idea what "Linux/m68k" was.
Eventually I found this page, http://www.linux-m68k.org/index.html,
which informed me that this was a project to port Linux to the Motorola
68000 series of microprocessors. That page appears to have been last
updated around 2002. Searching our issue queue for
https://github.com/Perl/perl5/issues?q=m68k+in%3Atitle, it appears this
is the first bug report we've had for this platform since 2003. And,
AFAICT, we have never received a smoke-test report for m68k at
http://perl5.test-smoke.org/.
Questions:
* Probably the only way we could identify the commit which broke this
build would be to run Porting/bisect.pl on the machine where you
observed this failure. Is that at all feasible?
* To continue to support perl on this platform going forward, we would
need to have regular smoke-test reports? Would that be feasible?
* Does this architecture have sufficient use in production to warrant an
attempt to maintain perl on it?
Thank you very much.
Jim Keenan
|
We are maintaining it in Debian, see: https://buildd.debian.org/status/architecture.php?a=m68k&suite=sid
It might just have worked well all the time :).
Sure. It's also reproducible in QEMU. You can easily install Debian/m68k on QEMU, see: https://wiki.debian.org/M68k/QemuSystemM68k
Depends on what you mean with regular. People are maintaining Linux (and NetBSD) on classic computers in their free time. So if you are asking for a weekly CI run, it might be a bit too much.
It's used by the community who are maintaining Linux and NetBSD for older architectures. In any case, Perl doesn't have any architecture-specific bits, does it? And it just seems there is a minor regression to me, so I don't think it would take much effort to fix this problem. |
I would guess it to be 7b85c12 (only change in that area at first glance), @iabyn do you have a clue? |
I have bisected the problematic commit to be 0bd6eef. Reverting it, fixes Perl for me on m68k. |
It's the alignment of This fixes the problem for me: diff --git a/op.h b/op.h
index fc21f03cda..480c95245b 100644
--- a/op.h
+++ b/op.h
@@ -698,7 +698,7 @@ struct opslot {
U16 opslot_size; /* size of this slot (in pointers) */
U16 opslot_offset; /* offset from start of slab (in ptr units) */
OP opslot_op; /* the op itself */
-};
+} __attribute__ ((aligned (4)));
struct opslab {
OPSLAB * opslab_next; /* next slab */ The native alignment on m68k is 16 bits. |
Alternatively, one should be able to add a padding variable of type |
On Thu, Jun 18, 2020 at 12:37:29PM -0700, John Paul Adrian Glaubitz wrote:
It's the alignment of ```opslot``` that causes the segmentation fault.
In what way exactly is the compiler getting the alignment wrong?
…--
Fire extinguisher (n) a device for holding open fire doors.
|
On 6/18/20 3:18 PM, John Paul Adrian Glaubitz wrote:
I have bisected the problematic commit to be 0bd6eef
<0bd6eef>.
Reverting it, fixes Perl for me on m68k.
In the core distribution I have created branch
'smoke-me/jkeenan/gh-17871-m68k' to make this patch available for
smoke-testing.
Glaubnitz, if you could provide us (off-list) with an appropriate
email-address, that would facilitate our listing you properly in AUTHORS.
Thank you very much.
Jim Keenan
|
The native alignment on m68k is 16 bits, not 32 bits, i.e. what you would expect from a 32-bits architecture. |
@tonycoz Can you take a look? Thanks. |
One of the structs used will align to 16 bits on m68k, while it will align to 32 bits on any other 32-bit architecture and to 64 bits on 64-bit targets. I tried to use padding bytes, but so far without success. Normally, padding bytes would have the same effect as using the I will continue to figure that out tomorrow, it's getting late here. But if you have found a better solution in the mean time, let me know. So far, many thanks for the friendly discussion. |
Early smoke-test results (e.g., http://perl5.test-smoke.org/report/114741) suggest that this patch will cause perl not to build on Windows. |
@jkeenan Yes, that's not surprising as Microsoft's Visual C/C++ compiler does not understand the directive |
On 6/18/20 5:36 PM, James E Keenan wrote:
Early smoke-test results (e.g.,
http://perl5.test-smoke.org/report/114741) suggest that this patch will
cause perl not to build on Windows.
|
Good morning! After some debugging, I figured out where the padding is needed. Updated patch below which fixes the build for me. From 3585dd865baaa3c631964fbba758491b26ef954a Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 07:42:08 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
proper alignment
On architectures where the native alignment is less than 4 bytes such as
Motorola 68000 (m68k), we need an additional padding of 2 bytes such that
the offset of the slots inside the slabs is correct.
---
op.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/op.h b/op.h
index fc21f03cda..fb9f538e23 100644
--- a/op.h
+++ b/op.h
@@ -714,6 +714,7 @@ struct opslab {
# ifdef PERL_DEBUG_READONLY_OPS
bool opslab_readonly;
# endif
+ U16 opslab_padding; /* padding to ensure proper alignment */
OPSLOT opslab_slots; /* slots begin here */
};
--
2.27.0 |
An alternative description for the commit would be:
|
I think it would be wise to also mention that m68k is big-endian |
Hmm, is that actually relevant? Here's an updated patch with the improvement commit message: From 8cd015a5cc230105bdcb8c33db80ef9f06b1c670 Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 08:33:33 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
proper alignment
Perl crashes with a segmentation fault on m68k due to an alignment issue.
On m68k, the natural alignment is 16 bits which causes the opslot
member of "struct opslab" to be aligned at a 16-bit offset.
On other 32-bit and 64-bit architectures, the natural alignment is at
least 32 bits, so the offset is always guaranteed to be 32-bit aligned.
Fix this by adding an additional 16 bits padding before the opslot member
which causes the offset of oplab_slots to be 32-bit aligned.
On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
op.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/op.h b/op.h
index fc21f03cda..fb9f538e23 100644
--- a/op.h
+++ b/op.h
@@ -714,6 +714,7 @@ struct opslab {
# ifdef PERL_DEBUG_READONLY_OPS
bool opslab_readonly;
# endif
+ U16 opslab_padding; /* padding to ensure proper alignment */
OPSLOT opslab_slots; /* slots begin here */
};
--
2.27.0 |
Btw, don't mind the formatting issues, the diff output just interprets the tabs stops differently. @jkeenan Could you smoke-test the last patch? Thanks. |
Please hold off with the patch, we're not quite happy yet because it's not clear what happens when |
Okay, this should be correct now. Improved the description again: From 65bed710c6b990285d7feae63e8e9b8c6f8ab8ec Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 12:30:57 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
proper alignment
On m68k, the natural alignment is 16 bits which causes the opslab_opslot
member of "struct opslab" to be aligned at a 16-bit offset. Other 32-bit
and 64-bit architectures have a natural alignment of at least 32 bits, so
the offset is always guaranteed to be at least 32-bit-aligned.
Fix this by adding additional padding bytes before the opslab_opslot
member, both for cases when PERL_DEBUG_READONLY_OPS defined and not
defined to ensure the offset of oplab_slots is always 32-bit-aligned.
On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
op.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/op.h b/op.h
index fc21f03cda..b620983d38 100644
--- a/op.h
+++ b/op.h
@@ -713,7 +713,9 @@ struct opslab {
units) */
# ifdef PERL_DEBUG_READONLY_OPS
bool opslab_readonly;
+ U8 opslab_padding1[3]; /* padding to ensure that opslab_slots is always */
# endif
+ U16 opslab_padding2; /* located at an offset with 32-bit alignment */
OPSLOT opslab_slots; /* slots begin here */
};
--
2.27.0 |
I'm having considerable problems applying this patch.
Can I ask that you:
Alternatively, create a pull request. Thank you very much. |
@jkeenan I might need to rebase the patch, I'll do that. In the meantime, we have come up with an even better variant on the Debian/m68k mailing list ;). I'll open a PR later today. Thanks so much for your patience ;). |
PR opened as #17875. |
The error is at:
Line 269 is:
but Also, OPSLOT is:
and OP is (expanding BASEOP):
which contains pointers, which your compiler should be aligning to 32-bits if pointers require alignment, and hence the OP/OPSLAB should be aligned to 32-bits inside of OPSLAB and have a size that's a a multiple of 32-bits (so arrays work). I don't see how you're getting mis-aligned access, unless calloc is returning a mis-aligned pointer, or there's some compiler option that's disabling required padding. |
@tonycoz The natural alignment on m68k is 16 bits, not 32 bits. See: https://lwn.net/Articles/790735/ and http://www.catb.org/esr/structure-packing/#_alignment_requirements |
You keep repeating that but I don't think it explains anything. The compiler will always align all structures properly (that is, in a correct but not necessarily the most optimal way). In the standard C there's no such thing as a misaligned struct. The only thing that can be misaligned is a pointer (memory address). So if that issue really was caused by a misaligned read, my question is: where exactly did that misaligned address come from? Usually that happens because of a cast to an incompatible type, often paired with pointer arithmetic. |
GCC aligns on 16-bit boundaries on m68k. For example, @xenu At some point, the Perl code casts the pointer to @geertu Maybe you can explain it better as you have been doing m68k development much longer than I have. |
Here is one such case of casts: /* the first (head) opslab of the chain in which this op is allocated */
# define OpSLAB(o) \
(((OPSLAB*)( (I32**)OpSLOT(o) - OpSLOT(o)->opslot_offset))->opslab_head) And the C specification here (http://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf) states:
The compiler cannot guarantee alignment if you are casting pointer types. It's also a common problem on SPARC where this causes unaligned access which results in crashes with |
Yes, indeed that code seems to assume that |
PR was merged. Thank you. |
Both sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void *) (which is 4 bytes, like any other pointer on 68k), since both structures contain pointers. If they didn't the odd elements in an array of such structures would be mis-aligned. |
There is no such rule. On the platforms with 16-bit alignment,
|
That would make a pointer to bar misaligned, even without an array being involved. |
While it's usually the case, there's no rule that says that the platform's native alignment must be the same as the pointer size. On 68k reads of all sizes from addresses divisible by 2 are always legal (i.e. properly aligned). On x86 that number is '4', while on amd64 it's '8' (let's ignore SIMD instructions). BTW, if you want to play with a 68k compiler, I've just found an instance of Compiler Explorer with 68k gcc installed: http://brownbot.mooo.com/. As you can see in the assembly output here, the compiler behaves exactly as I said. |
Another example of a platform where the native alignment is not the same as the pointer size is X32 Linux ABI. Despite the fact that |
This is a common misconception: sizeof(struct) is not a multiple of max(sizeof(struct.member)), |
Yeah, I think I misunderstood the problem. I thought we were seeing alignment errors (given the patch description) on access, but the actual problem isn't alignment on access, but that the assumption that This makes the I'll work on a less fragile fix for this that doesn't require manually managing alignment. |
Fixes Perl#17871 The op slab allocator code made the assumption that since OP and hence OPSLOT contain pointers, the base of each of those would be an integral number of sizeof(pointer) (pointer units) from the beginning of OPSLAB. This assumption is non-portable, and broke calculating the location of the slab based on the address of the op slot and the op slot offset on m68k platforms. To avoid that, this change now stores the opslot_offset as the offset in pointer units from the beginning of opslab_slots rather than from the beginning of the slab. If alignment on a pointer boundary for OPs is required, the compiler will align opslab_opslots and since we work in pointer units from there, any allocated op slots will also be aligned. If we assume PADOFFSET is no larger than a pointer and requires no stricter alignment and structures in themselves have no stricter alignment requirements then since we work in pointer units all core OP structures should have sufficient alignment (if this isn't true, then it's not a new problem, and not the problem I'm trying to solve here.) I haven't been able to test this on m68k hardware (the emulator I tried to use can't maintain a network connection.)
Fixes #17871 The op slab allocator code made the assumption that since OP and hence OPSLOT contain pointers, the base of each of those would be an integral number of sizeof(pointer) (pointer units) from the beginning of OPSLAB. This assumption is non-portable, and broke calculating the location of the slab based on the address of the op slot and the op slot offset on m68k platforms. To avoid that, this change now stores the opslot_offset as the offset in pointer units from the beginning of opslab_slots rather than from the beginning of the slab. If alignment on a pointer boundary for OPs is required, the compiler will align opslab_opslots and since we work in pointer units from there, any allocated op slots will also be aligned. If we assume PADOFFSET is no larger than a pointer and requires no stricter alignment and structures in themselves have no stricter alignment requirements then since we work in pointer units all core OP structures should have sufficient alignment (if this isn't true, then it's not a new problem, and not the problem I'm trying to solve here.) I haven't been able to test this on m68k hardware (the emulator I tried to use can't maintain a network connection.)
Hi!
Perl 5.32 has introduced a regression which results in
miniperl
segfaulting during build.The backtrace with
gdb
looks like this:A wild guess is that might be an alignment issue as the natural alignment for m68k is 16 bits, not 32 bits.
CC @AndreasSchwab @karcherm
The text was updated successfully, but these errors were encountered: