Perl 5.32 segfaults on Linux/m68k (regression) #17871

glaubitz · 2020-06-18T10:09:52Z

Hi!

Perl 5.32 has introduced a regression which results in miniperl segfaulting during build.

The backtrace with gdb looks like this:

root@pacman:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args ./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make mint/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make mininiperl.  Please run make minitminiperl.  Please run make minitest; exit 1'


GNU gdb (Debian 9.2-1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "m68k-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./miniperl...
(gdb) r
Starting program: /srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e \<\?\>
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, 
    my_perl=0x8030a190) at op.c:269
269         if (!slab->opslab_freed) {
(gdb) bt
#0  S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, 
    my_perl=0x8030a190) at op.c:269
#1  0x8000901e in Perl_Slab_Free (op=0x80312136, my_perl=0x8030a190)
    at op.c:505
#2  Perl_Slab_Free (my_perl=0x8030a190, op=0x803120aa) at op.c:484
#3  0x8000fae6 in S_op_destroy (o=<optimized out>, my_perl=<optimized out>)
    at op.c:6416
#4  Perl_op_append_list (type=<optimized out>, last=<optimized out>, 
    first=0x80312136, my_perl=<optimized out>) at op.c:6416
#5  Perl_op_append_list (my_perl=<optimized out>, type=<optimized out>, 
    first=<optimized out>, last=<optimized out>) at op.c:6397
#6  0x800625c8 in Perl_yyparse (my_perl=0x8030a190, gramtype=258)
    at perly.y:239
#7  0x800321fe in S_parse_body (xsinit=0x801885f8 <xs_init>, env=0x0, 
    my_perl=0x8030a190) at perl.c:2576
#8  perl_parse (my_perl=0x8030a190, xsinit=0x801885f8 <xs_init>, argc=7, 
    argv=0xeffffc14, env=0x0) at perl.c:1871
#9  0x800050ae in main (argc=<optimized out>, argv=<optimized out>, 
    env=<optimized out>) at miniperlmain.c:132
(gdb)

A wild guess is that might be an alignment issue as the natural alignment for m68k is 16 bits, not 32 bits.

CC @AndreasSchwab @karcherm

The text was updated successfully, but these errors were encountered:

jkeenan · 2020-06-18T12:36:45Z

On 6/18/20 6:10 AM, John Paul Adrian Glaubitz wrote: Hi! Perl 5.32 has introduced a regression which results in |miniperl| segfaulting during build. The backtrace with |gdb| looks like this: ***@***.***:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args ./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mint/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mininiperl. Please run make minitminiperl. Please run make minitest; exit 1' GNU gdb (Debian 9.2-1) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "m68k-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./miniperl... (gdb) r Starting program: /srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e \<\?\> [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, my_perl=0x8030a190) at op.c:269 269 if (!slab->opslab_freed) { (gdb) bt #0 S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, my_perl=0x8030a190) at op.c:269 #1 0x8000901e in Perl_Slab_Free (op=0x80312136, my_perl=0x8030a190) at op.c:505 #2 Perl_Slab_Free (my_perl=0x8030a190, op=0x803120aa) at op.c:484 #3 0x8000fae6 in S_op_destroy (o=<optimized out>, my_perl=<optimized out>) at op.c:6416 #4 Perl_op_append_list (type=<optimized out>, last=<optimized out>, first=0x80312136, my_perl=<optimized out>) at op.c:6416 #5 Perl_op_append_list (my_perl=<optimized out>, type=<optimized out>, first=<optimized out>, last=<optimized out>) at op.c:6397 #6 0x800625c8 in Perl_yyparse (my_perl=0x8030a190, gramtype=258) at perly.y:239 #7 0x800321fe in S_parse_body (xsinit=0x801885f8 <xs_init>, env=0x0, my_perl=0x8030a190) at perl.c:2576 #8 perl_parse (my_perl=0x8030a190, xsinit=0x801885f8 <xs_init>, argc=7, argv=0xeffffc14, env=0x0) at perl.c:1871 #9 0x800050ae in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at miniperlmain.c:132 (gdb) | A wild guess is that might be an alignment issue as the natural alignment for m68k is 16 bits, not 32 bits. CC @AndreasSchwab <https://github.com/AndreasSchwab> @karcherm <https://github.com/karcherm>

Please supply the way you invoked 'sh ./Configure' to achieve these results. Example: ##### sh ./Configure -des -Dusedevel -Duseithreads ##### Thank you very much. Jim Keenan

glaubitz · 2020-06-18T12:48:49Z

Hi Jim! Thanks for your answer.

The full build log can be found in: https://buildd.debian.org/status/fetch.php?pkg=perl&arch=m68k&ver=5.32.0%7Erc1-1&stamp=1592359772&raw=0

From the Debian process, it looks like the configure script is invoked as:

/bin/sh debian/config.debian --static

I'm currently bisecting the git tree from upstream now to see if I can find the commit in question that broke the build.

jkeenan · 2020-06-18T12:58:02Z

On 6/18/20 6:10 AM, John Paul Adrian Glaubitz wrote: Hi! Perl 5.32 has introduced a regression which results in |miniperl| segfaulting during build. The backtrace with |gdb| looks like this: ***@***.***:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args ./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mint/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mininiperl. Please run make minitminiperl. Please run make minitest; exit 1' GNU gdb (Debian 9.2-1) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "m68k-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./miniperl... (gdb) r Starting program: /srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e \<\?\> [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, my_perl=0x8030a190) at op.c:269 269 if (!slab->opslab_freed) { (gdb) bt #0 S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, my_perl=0x8030a190) at op.c:269 #1 0x8000901e in Perl_Slab_Free (op=0x80312136, my_perl=0x8030a190) at op.c:505 #2 Perl_Slab_Free (my_perl=0x8030a190, op=0x803120aa) at op.c:484 #3 0x8000fae6 in S_op_destroy (o=<optimized out>, my_perl=<optimized out>) at op.c:6416 #4 Perl_op_append_list (type=<optimized out>, last=<optimized out>, first=0x80312136, my_perl=<optimized out>) at op.c:6416 #5 Perl_op_append_list (my_perl=<optimized out>, type=<optimized out>, first=<optimized out>, last=<optimized out>) at op.c:6397 #6 0x800625c8 in Perl_yyparse (my_perl=0x8030a190, gramtype=258) at perly.y:239 #7 0x800321fe in S_parse_body (xsinit=0x801885f8 <xs_init>, env=0x0, my_perl=0x8030a190) at perl.c:2576 #8 perl_parse (my_perl=0x8030a190, xsinit=0x801885f8 <xs_init>, argc=7, argv=0xeffffc14, env=0x0) at perl.c:1871 #9 0x800050ae in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at miniperlmain.c:132 (gdb) | A wild guess is that might be an alignment issue as the natural alignment for m68k is 16 bits, not 32 bits. CC @AndreasSchwab <https://github.com/AndreasSchwab> @karcherm <https://github.com/karcherm>

Thank you for this report and especially for the backtrace. When I first read this ticket, I had no idea what "Linux/m68k" was. Eventually I found this page, http://www.linux-m68k.org/index.html, which informed me that this was a project to port Linux to the Motorola 68000 series of microprocessors. That page appears to have been last updated around 2002. Searching our issue queue for https://github.com/Perl/perl5/issues?q=m68k+in%3Atitle, it appears this is the first bug report we've had for this platform since 2003. And, AFAICT, we have never received a smoke-test report for m68k at http://perl5.test-smoke.org/. Questions: * Probably the only way we could identify the commit which broke this build would be to run Porting/bisect.pl on the machine where you observed this failure. Is that at all feasible? * To continue to support perl on this platform going forward, we would need to have regular smoke-test reports? Would that be feasible? * Does this architecture have sufficient use in production to warrant an attempt to maintain perl on it? Thank you very much. Jim Keenan

glaubitz · 2020-06-18T13:04:37Z

When I first read this ticket, I had no idea what "Linux/m68k" was.
Eventually I found this page, http://www.linux-m68k.org/index.html,
which informed me that this was a project to port Linux to the Motorola
68000 series of microprocessors. That page appears to have been last
updated around 2002.

We are maintaining it in Debian, see: https://buildd.debian.org/status/architecture.php?a=m68k&suite=sid

Searching our issue queue for
https://github.com/Perl/perl5/issues?q=m68k+in%3Atitle, it appears this
is the first bug report we've had for this platform since 2003.

It might just have worked well all the time :).

Probably the only way we could identify the commit which broke this
build would be to run Porting/bisect.pl on the machine where you
observed this failure. Is that at all feasible?

Sure. It's also reproducible in QEMU. You can easily install Debian/m68k on QEMU, see: https://wiki.debian.org/M68k/QemuSystemM68k

To continue to support perl on this platform going forward, we would
need to have regular smoke-test reports? Would that be feasible?

Depends on what you mean with regular. People are maintaining Linux (and NetBSD) on classic computers in their free time. So if you are asking for a weekly CI run, it might be a bit too much.

Does this architecture have sufficient use in production to warrant an
attempt to maintain perl on it?

It's used by the community who are maintaining Linux and NetBSD for older architectures.

In any case, Perl doesn't have any architecture-specific bits, does it? And it just seems there is a minor regression to me, so I don't think it would take much effort to fix this problem.

Leont · 2020-06-18T14:59:26Z

From the Debian process, it looks like the configure script is invoked as:

./perl -Ilib -V:config_args should also give the required information if you can get that far. Otherwise you could peek into lib/Config_heavy.pl and look for "config_args".

I'm currently bisecting the git tree from upstream now to see if I can find the commit in question that broke the build.

I would guess it to be 7b85c12 (only change in that area at first glance), @iabyn do you have a clue?

glaubitz · 2020-06-18T19:17:53Z

I have bisected the problematic commit to be 0bd6eef.

Reverting it, fixes Perl for me on m68k.

glaubitz · 2020-06-18T19:37:15Z

It's the alignment of opslot that causes the segmentation fault.

This fixes the problem for me:

diff --git a/op.h b/op.h
index fc21f03cda..480c95245b 100644
--- a/op.h
+++ b/op.h
@@ -698,7 +698,7 @@ struct opslot {
     U16         opslot_size;        /* size of this slot (in pointers) */
     U16         opslot_offset;      /* offset from start of slab (in ptr units) */
     OP         opslot_op;              /* the op itself */
-};
+} __attribute__ ((aligned (4)));
 
 struct opslab {
     OPSLAB *   opslab_next;            /* next slab */

The native alignment on m68k is 16 bits.

glaubitz · 2020-06-18T19:38:25Z

Alternatively, one should be able to add a padding variable of type U16 in the struct.

iabyn · 2020-06-18T21:02:29Z

On Thu, Jun 18, 2020 at 12:37:29PM -0700, John Paul Adrian Glaubitz wrote: It's the alignment of ```opslot``` that causes the segmentation fault.

In what way exactly is the compiler getting the alignment wrong?

…

-- Fire extinguisher (n) a device for holding open fire doors.

jkeenan · 2020-06-18T21:08:45Z

On 6/18/20 3:18 PM, John Paul Adrian Glaubitz wrote: I have bisected the problematic commit to be 0bd6eef <0bd6eef>. Reverting it, fixes Perl for me on m68k.

In the core distribution I have created branch 'smoke-me/jkeenan/gh-17871-m68k' to make this patch available for smoke-testing. Glaubnitz, if you could provide us (off-list) with an appropriate email-address, that would facilitate our listing you properly in AUTHORS. Thank you very much. Jim Keenan

glaubitz · 2020-06-18T21:08:55Z

The native alignment on m68k is 16 bits, not 32 bits, i.e. what you would expect from a 32-bits architecture.

jkeenan · 2020-06-18T21:09:53Z

@tonycoz Can you take a look? Thanks.

glaubitz · 2020-06-18T21:13:28Z

One of the structs used will align to 16 bits on m68k, while it will align to 32 bits on any other 32-bit architecture and to 64 bits on 64-bit targets.

I tried to use padding bytes, but so far without success. Normally, padding bytes would have the same effect as using the __attribute((aligned(4))) built-in, but for that one needs to calculate the proper padding so that the resulting struct has the correct size.

I will continue to figure that out tomorrow, it's getting late here. But if you have found a better solution in the mean time, let me know.

So far, many thanks for the friendly discussion.

jkeenan · 2020-06-18T21:36:10Z

Early smoke-test results (e.g., http://perl5.test-smoke.org/report/114741) suggest that this patch will cause perl not to build on Windows.

glaubitz · 2020-06-18T21:46:08Z

@jkeenan Yes, that's not surprising as Microsoft's Visual C/C++ compiler does not understand the directive __attribute__((aligned(X))). That's why I prefer using padding bytes which work on any platform and compiler. I just haven't found the correct padding yet.

jkeenan · 2020-06-18T22:08:01Z

On 6/18/20 5:36 PM, James E Keenan wrote: Early smoke-test results (e.g., http://perl5.test-smoke.org/report/114741) suggest that this patch will cause perl not to build on Windows.

Also: https://github.com/Perl/perl5/runs/786025468

glaubitz · 2020-06-19T05:46:29Z

Good morning! After some debugging, I figured out where the padding is needed. Updated patch below which fixes the build for me.

From 3585dd865baaa3c631964fbba758491b26ef954a Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 07:42:08 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

On architectures where the native alignment is less than 4 bytes such as
Motorola 68000 (m68k), we need an additional padding of 2 bytes such that
the offset of the slots inside the slabs is correct.
---
 op.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/op.h b/op.h
index fc21f03cda..fb9f538e23 100644
--- a/op.h
+++ b/op.h
@@ -714,6 +714,7 @@ struct opslab {
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
 # endif
+    U16         opslab_padding;                /* padding to ensure proper alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };
 
-- 
2.27.0

glaubitz · 2020-06-19T06:28:45Z

An alternative description for the commit would be:

Perl crashes with a segmentation fault on Linux/m68k due to an alignment
issue. On m68k, the natural alignment is 16 bits which causes the opslot
member of "struct opslab" to be aligned at a 16-bit offset.

On other 32-bit and 64-bit architectures, the alignment is at least
32 bits, so the offset is always guaranteed to be 32-bit aligned. Fix
this by adding an additional 16 bits padding before the opslot member
which causes the offset of oplab_slots to be 32-bit aligned.

On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment or struct size.

Tux · 2020-06-19T06:33:43Z

I think it would be wise to also mention that m68k is big-endian

glaubitz · 2020-06-19T06:34:38Z

Hmm, is that actually relevant?

Here's an updated patch with the improvement commit message:

From 8cd015a5cc230105bdcb8c33db80ef9f06b1c670 Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 08:33:33 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

Perl crashes with a segmentation fault on m68k due to an alignment issue.
On m68k, the natural alignment is 16 bits which causes the opslot
member of "struct opslab" to be aligned at a 16-bit offset.

On other 32-bit and 64-bit architectures, the natural alignment is at
least 32 bits, so the offset is always guaranteed to be 32-bit aligned.
Fix this by adding an additional 16 bits padding before the opslot member
which causes the offset of oplab_slots to be 32-bit aligned.

On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
 op.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/op.h b/op.h
index fc21f03cda..fb9f538e23 100644
--- a/op.h
+++ b/op.h
@@ -714,6 +714,7 @@ struct opslab {
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
 # endif
+    U16         opslab_padding;                /* padding to ensure proper alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };

-- 
2.27.0

glaubitz · 2020-06-19T06:35:55Z

Btw, don't mind the formatting issues, the diff output just interprets the tabs stops differently.

@jkeenan Could you smoke-test the last patch? Thanks.

glaubitz · 2020-06-19T09:17:42Z

Please hold off with the patch, we're not quite happy yet because it's not clear what happens when PERL_DEBUG_READONLY_OPS is set.

glaubitz · 2020-06-19T10:36:01Z

Okay, this should be correct now. Improved the description again:

From 65bed710c6b990285d7feae63e8e9b8c6f8ab8ec Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 12:30:57 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

On m68k, the natural alignment is 16 bits which causes the opslab_opslot
member of "struct opslab" to be aligned at a 16-bit offset. Other 32-bit
and 64-bit architectures have a natural alignment of at least 32 bits, so
the offset is always guaranteed to be at least 32-bit-aligned.

Fix this by adding additional padding bytes before the opslab_opslot
member, both for cases when PERL_DEBUG_READONLY_OPS defined and not
defined to ensure the offset of oplab_slots is always 32-bit-aligned.
On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
 op.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/op.h b/op.h
index fc21f03cda..b620983d38 100644
--- a/op.h
+++ b/op.h
@@ -713,7 +713,9 @@ struct opslab {
                                            units) */
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
+    U8          opslab_padding1[3];    /* padding to ensure that opslab_slots is always */
 # endif
+    U16         opslab_padding2;       /* located at an offset with 32-bit alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };
 
-- 
2.27.0

CC @geertu @vivier

jkeenan · 2020-06-19T11:55:22Z

Okay, this should be correct now. Improved the description again:

From 65bed710c6b990285d7feae63e8e9b8c6f8ab8ec Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 12:30:57 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

On m68k, the natural alignment is 16 bits which causes the opslab_opslot
member of "struct opslab" to be aligned at a 16-bit offset. Other 32-bit
and 64-bit architectures have a natural alignment of at least 32 bits, so
the offset is always guaranteed to be at least 32-bit-aligned.

Fix this by adding additional padding bytes before the opslab_opslot
member, both for cases when PERL_DEBUG_READONLY_OPS defined and not
defined to ensure the offset of oplab_slots is always 32-bit-aligned.
On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
 op.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/op.h b/op.h
index fc21f03cda..b620983d38 100644
--- a/op.h
+++ b/op.h
@@ -713,7 +713,9 @@ struct opslab {
                                            units) */
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
+    U8          opslab_padding1[3];    /* padding to ensure that opslab_slots is always */
 # endif
+    U16         opslab_padding2;       /* located at an offset with 32-bit alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };
 
-- 
2.27.0

CC @geertu @vivier

I'm having considerable problems applying this patch.

$ git checkout blead
Switched to branch 'blead'
Your branch is up to date with 'origin/blead'.
[perl] 565 $ git am < ~/learn/perl/p5p/gh-2nd-17871.diff
Applying: op.h: Add additional padding to struct opslab to ensure proper alignment
error: patch failed: op.h:713
error: op.h: patch does not apply
Patch failed at 0001 op.h: Add additional padding to struct opslab to ensure proper alignment
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Can I ask that you:

Do a clean checkout of the blead branch.
Create a local branch: git checkout -b gh-17181-2nd
Apply your patch to the local branch.
Run Configure, make, make test. (For now, ignore errors in t/porting/*.t pertaining to authorship.)
Create a patch for submission: git format-patch -o /tmp HEAD^ and then paste in ticket.

Alternatively, create a pull request.

Thank you very much.
Jim Keenan

glaubitz · 2020-06-19T12:49:10Z

@jkeenan I might need to rebase the patch, I'll do that. In the meantime, we have come up with an even better variant on the Debian/m68k mailing list ;).

I'll open a PR later today. Thanks so much for your patience ;).

glaubitz · 2020-06-19T12:59:18Z

PR opened as #17875.

tonycoz · 2020-06-20T02:15:03Z

The error is at:

#0  S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, 
    my_perl=0x8030a190) at op.c:269

Line 269 is:

    if (!slab->opslab_freed) {

but opslab_freed is part of the base structure, so I'm confused that your suggested patch can make any difference.

Also, OPSLOT is:

struct opslot {
    U16         opslot_size;        /* size of this slot (in pointers) */
    U16         opslot_offset;      /* offset from start of slab (in ptr units) */
    OP		opslot_op;		/* the op itself */
};

and OP is (expanding BASEOP):

struct op {
    OP*		op_next;		\
    OP*		op_sibparent;		\
    OP*		(*op_ppaddr)(pTHX);	\
    PADOFFSET	op_targ;		\
    PERL_BITFIELD16 op_type:9;		\
    PERL_BITFIELD16 op_opt:1;		\
    PERL_BITFIELD16 op_slabbed:1;	\
    PERL_BITFIELD16 op_savefree:1;	\
    PERL_BITFIELD16 op_static:1;	\
    PERL_BITFIELD16 op_folded:1;	\
    PERL_BITFIELD16 op_moresib:1;       \
    PERL_BITFIELD16 op_spare:1;		\
    U8		op_flags;		\
    U8		op_private;

};

which contains pointers, which your compiler should be aligning to 32-bits if pointers require alignment, and hence the OP/OPSLAB should be aligned to 32-bits inside of OPSLAB and have a size that's a a multiple of 32-bits (so arrays work).

I don't see how you're getting mis-aligned access, unless calloc is returning a mis-aligned pointer, or there's some compiler option that's disabling required padding.

glaubitz · 2020-06-20T05:28:12Z

@tonycoz The natural alignment on m68k is 16 bits, not 32 bits.

See: https://lwn.net/Articles/790735/ and http://www.catb.org/esr/structure-packing/#_alignment_requirements

xenu · 2020-06-20T06:09:54Z

You keep repeating that but I don't think it explains anything. The compiler will always align all structures properly (that is, in a correct but not necessarily the most optimal way). In the standard C there's no such thing as a misaligned struct.

The only thing that can be misaligned is a pointer (memory address). So if that issue really was caused by a misaligned read, my question is: where exactly did that misaligned address come from? Usually that happens because of a cast to an incompatible type, often paired with pointer arithmetic.

glaubitz · 2020-06-20T06:13:53Z

GCC aligns on 16-bit boundaries on m68k. For example, sizeof(struct opslab) returns 46 on m68k, not 48 as on i386. If you want, I can run a small C test program you send me to demonstrate that.

@xenu At some point, the Perl code casts the pointer to (INT32 *) if I remember correctly, let me check.

@geertu Maybe you can explain it better as you have been doing m68k development much longer than I have.

glaubitz · 2020-06-20T06:20:59Z

Here is one such case of casts:

/* the first (head) opslab of the chain in which this op is allocated */
# define OpSLAB(o) \
    (((OPSLAB*)( (I32**)OpSLOT(o) - OpSLOT(o)->opslot_offset))->opslab_head)

And the C specification here (http://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf) states:

All declarations that refer to the same object or function shall have compatible type;otherwise, the behavior is undefined.

The compiler cannot guarantee alignment if you are casting pointer types. It's also a common problem on SPARC where this causes unaligned access which results in crashes with Bus error.

xenu · 2020-06-20T07:13:04Z

Yes, indeed that code seems to assume that sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void*). Now I understand the problem.

xsawyerx · 2020-06-20T13:36:05Z

PR was merged. Thank you.

tonycoz · 2020-06-20T14:02:20Z

Yes, indeed that code seems to assume that sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void*). Now I understand the problem.

Both sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void *) (which is 4 bytes, like any other pointer on 68k), since both structures contain pointers.

If they didn't the odd elements in an array of such structures would be mis-aligned.

xenu · 2020-06-20T19:20:23Z

There is no such rule. On the platforms with 16-bit alignment, sizeof of the following struct will be '6' (assuming sizeof(void*) == 4):

struct {
    uint16_t foo;
    void* bar;
}

tonycoz · 2020-06-20T22:41:22Z

That would make a pointer to bar misaligned, even without an array being involved.

xenu · 2020-06-20T23:13:41Z

While it's usually the case, there's no rule that says that the platform's native alignment must be the same as the pointer size. On 68k reads of all sizes from addresses divisible by 2 are always legal (i.e. properly aligned). On x86 that number is '4', while on amd64 it's '8' (let's ignore SIMD instructions).

BTW, if you want to play with a 68k compiler, I've just found an instance of Compiler Explorer with 68k gcc installed: http://brownbot.mooo.com/. As you can see in the assembly output here, the compiler behaves exactly as I said.

xenu · 2020-06-20T23:35:25Z

Another example of a platform where the native alignment is not the same as the pointer size is X32 Linux ABI. Despite the fact that sizeof(void*) == 4, the native alignment is 8 bytes: https://gcc.godbolt.org/z/ZaiVRo

geertu · 2020-06-21T08:12:53Z

Both sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void *) (which is 4 bytes, like any other pointer on 68k), since both structures contain pointers.

This is a common misconception: sizeof(struct) is not a multiple of max(sizeof(struct.member)),
but rather max(alignof(struct.member)).
alignof(void *), alignof(int), and alignof(long) are 2 bytes on m68k (and cris, a.o.).

tonycoz · 2020-06-22T00:12:48Z

Yeah, I think I misunderstood the problem. I thought we were seeing alignment errors (given the patch description) on access, but the actual problem isn't alignment on access, but that the assumption that offsetof(struct opslab, opslab_slots) % sizeof(I32 *) == 0 is incorrect.

This makes the OpSLAB() macro return an incorrect pointer to the slab and everything breaks from there.

I'll work on a less fragile fix for this that doesn't require manually managing alignment.

Fixes Perl#17871 The op slab allocator code made the assumption that since OP and hence OPSLOT contain pointers, the base of each of those would be an integral number of sizeof(pointer) (pointer units) from the beginning of OPSLAB. This assumption is non-portable, and broke calculating the location of the slab based on the address of the op slot and the op slot offset on m68k platforms. To avoid that, this change now stores the opslot_offset as the offset in pointer units from the beginning of opslab_slots rather than from the beginning of the slab. If alignment on a pointer boundary for OPs is required, the compiler will align opslab_opslots and since we work in pointer units from there, any allocated op slots will also be aligned. If we assume PADOFFSET is no larger than a pointer and requires no stricter alignment and structures in themselves have no stricter alignment requirements then since we work in pointer units all core OP structures should have sufficient alignment (if this isn't true, then it's not a new problem, and not the problem I'm trying to solve here.) I haven't been able to test this on m68k hardware (the emulator I tried to use can't maintain a network connection.)

Fixes #17871 The op slab allocator code made the assumption that since OP and hence OPSLOT contain pointers, the base of each of those would be an integral number of sizeof(pointer) (pointer units) from the beginning of OPSLAB. This assumption is non-portable, and broke calculating the location of the slab based on the address of the op slot and the op slot offset on m68k platforms. To avoid that, this change now stores the opslot_offset as the offset in pointer units from the beginning of opslab_slots rather than from the beginning of the slab. If alignment on a pointer boundary for OPs is required, the compiler will align opslab_opslots and since we work in pointer units from there, any allocated op slots will also be aligned. If we assume PADOFFSET is no larger than a pointer and requires no stricter alignment and structures in themselves have no stricter alignment requirements then since we work in pointer units all core OP structures should have sufficient alignment (if this isn't true, then it's not a new problem, and not the problem I'm trying to solve here.) I haven't been able to test this on m68k hardware (the emulator I tried to use can't maintain a network connection.)

glaubitz added the Needs Triage label Jun 18, 2020

jkeenan added distro-Linux and removed Needs Triage labels Jun 18, 2020

xsawyerx closed this as completed Jun 20, 2020

tonycoz reopened this Jun 22, 2020

tonycoz self-assigned this Jun 22, 2020

tonycoz mentioned this issue Jun 28, 2020

a more robust fix #17911

Merged

khwilliamson closed this as completed in #17911 Jul 30, 2020

Perl 5.32 segfaults on Linux/m68k (regression) #17871

Perl 5.32 segfaults on Linux/m68k (regression) #17871

Comments

glaubitz commented Jun 18, 2020

jkeenan commented Jun 18, 2020 via email

glaubitz commented Jun 18, 2020

jkeenan commented Jun 18, 2020 via email

glaubitz commented Jun 18, 2020

Leont commented Jun 18, 2020 • edited Loading

glaubitz commented Jun 18, 2020

glaubitz commented Jun 18, 2020

glaubitz commented Jun 18, 2020

iabyn commented Jun 18, 2020 via email

jkeenan commented Jun 18, 2020 via email

glaubitz commented Jun 18, 2020

jkeenan commented Jun 18, 2020

glaubitz commented Jun 18, 2020

jkeenan commented Jun 18, 2020

glaubitz commented Jun 18, 2020

jkeenan commented Jun 18, 2020 via email

glaubitz commented Jun 19, 2020

glaubitz commented Jun 19, 2020 • edited Loading

Tux commented Jun 19, 2020

glaubitz commented Jun 19, 2020

glaubitz commented Jun 19, 2020

glaubitz commented Jun 19, 2020

glaubitz commented Jun 19, 2020

jkeenan commented Jun 19, 2020

glaubitz commented Jun 19, 2020

glaubitz commented Jun 19, 2020

tonycoz commented Jun 20, 2020

glaubitz commented Jun 20, 2020

xenu commented Jun 20, 2020 • edited Loading

glaubitz commented Jun 20, 2020 • edited Loading

glaubitz commented Jun 20, 2020 • edited Loading

xenu commented Jun 20, 2020

xsawyerx commented Jun 20, 2020

tonycoz commented Jun 20, 2020

xenu commented Jun 20, 2020

tonycoz commented Jun 20, 2020

xenu commented Jun 20, 2020 • edited Loading

xenu commented Jun 20, 2020 • edited Loading

geertu commented Jun 21, 2020

tonycoz commented Jun 22, 2020

Leont commented Jun 18, 2020 •

edited

Loading

glaubitz commented Jun 19, 2020 •

edited

Loading

xenu commented Jun 20, 2020 •

edited

Loading

glaubitz commented Jun 20, 2020 •

edited

Loading

glaubitz commented Jun 20, 2020 •

edited

Loading

xenu commented Jun 20, 2020 •

edited

Loading

xenu commented Jun 20, 2020 •

edited

Loading