This repository has been archived by the owner on May 8, 2020. It is now read-only.

chromium build issues #10

Open
therealkenc opened this issue Oct 19, 2016 · 23 comments

Comments

@therealkenc

therealkenc commented Oct 19, 2016

Trying out the chromium patches, in the tcmalloc third party library I hit some usual sys/types.h silliness, followed by a mmap() redefinition. I can fix things up easily enough, but I am thinking I have probably dropped a stitch along the way in getting the environment right. I am using musl-1.1.15-r2.ebuild from the overlay. Any guesses on what I might have missed are appreciated.

[edit] It looks like I have something set up wrong for sure, because HAVE_SYS_CDEFS_H is ending up defined somehow, and of course cdefs.h, which doesn't exist on musl, also fails.

In file included from ../../third_party/tcmalloc/chromium/src/malloc_hook_mmap_linux.h:51:0,
                 from ../../third_party/tcmalloc/chromium/src/malloc_hook.cc:698:
../../third_party/tcmalloc/chromium/src/base/linux_syscall_support.h:1932:37: error: ‘__off64_t’ has not been declared

[...]

malloc_hook_mmap_linux.h:194:18: error: redefinition of ‘void* mmap(void*, size_t, int, int, int, off_t)’
 extern "C" void* mmap(void *start, size_t length, int prot, int flags,
                  ^
In file included from ../../third_party/tcmalloc/chromium/src/malloc_hook.cc:40:0:
../../third_party/tcmalloc/chromium/src/malloc_hook_mmap_linux.h:180:18: note: ‘void* mmap(void*, size_t, int, int, int, __off64_t)’ previously defined here
 extern "C" void* mmap64(void *start, size_t length, int prot, int flags,
                  ^
@lluixhi
Owner

lluixhi commented Oct 20, 2016

http://www.openwall.com/lists/musl/2014/08/08/11

As far as I know, musl still doesn't support alternative malloc implementations, so you have to disable tcmalloc.

... which is why i didn't look into a patch to fix it.

@therealkenc
Author

therealkenc commented Oct 21, 2016

Appreciate the link; that explains everything. I naively thought it would build out of the box like firefox. Go ahead and close this issue out if you like.

Great patches by the way. I hope some of this work eventually drifts all the way upstream.

@lluixhi lluixhi closed this as completed Oct 21, 2016
@lluixhi
Owner

lluixhi commented Oct 21, 2016

They're actually mostly the same as what voidlinux and alpinelinux have in their patchsets -- I haven't started opening issues on the chromium side because I can't get chromium to run yet. There's some kind of segmentation fault going on which I think is still related to threading.

@therealkenc
Author

Okay, thanks for the tip. I might install Alpine just to take a look at what they've done. I gather chromium runs (to some reasonable degree) on Alpine, including whatever dependent packages they might have patched differently? Or when you say "open issues on the chromium side" are you implying the thread-related problem is (or could be) more fundamental? Chromium is typically statically linked, so with the possible exception of some externalities like dbus and namespaces in the kernel for the sandbox, it shouldn't much care what musl distro it's running on. Is that about right?

@lluixhi
Owner

lluixhi commented Oct 21, 2016

I'm assuming that chromium runs on Voidlinux and Alpinelinux, but I'm also not 100% sure -- the main reason for that being that I'm using a similar patchset and it doesn't work, and they only recently added a patch (https://github.com/lluixhi/musl-extras/blob/master/www-client/chromium/files/musl/06_all_fix-stack.patch#L1-L12) that's critical in order to keep chromium from crashing after loading any page. (They added it as of chromium 53, but it has been an issue for a much longer period of time -- at least since chromium 45, probably earlier)

The above problem is because musl makes the default thread stack size 80KiB (which is a good default in 99% of cases), while glibc sets the default thread stack size to 1MiB, which appears to be the convention in Solaris, OSX, and other libc implementations. Because chromium (as well as webkit) assumes the 1MiB default, we run out of stack space on new threads. This is also a problem in webkit, where I have a patch which prevents JavaScriptCore from crashing, but neither Alpine nor Void have picked it up yet, so I can only assume that it's broken on their end. (https://github.com/lluixhi/musl-extras/blob/master/dev-qt/qtwebkit/files/qtwebkit-5.5.1-fix-stack-size-musl.patch)
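
To make the stack-size mismatch concrete, here is a minimal sketch of requesting an explicit stack size when creating a thread, rather than relying on the libc default. It is not the content of the linked patches. The helper names `start_thread_with_stack` and `use_deep_stack`, the 256 KiB buffer, and the 1 MiB value used in the usage note are all assumptions chosen for illustration.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the linked patches): create a thread with an
// explicitly requested stack size instead of the libc default.
// Returns 0 on success, otherwise the pthread error code.
int start_thread_with_stack(pthread_t* thread, std::size_t stack_bytes,
                            void* (*entry)(void*), void* arg) {
  assert(thread != nullptr && entry != nullptr);

  pthread_attr_t attr;
  int rc = pthread_attr_init(&attr);
  if (rc != 0) return rc;

  rc = pthread_attr_setstacksize(&attr, stack_bytes);
  if (rc == 0) rc = pthread_create(thread, &attr, entry, arg);

  pthread_attr_destroy(&attr);
  return rc;
}

// Hypothetical thread body: touches both ends of a 256 KiB stack buffer,
// which would not fit in an 80 KiB stack.
void* use_deep_stack(void* arg) {
  volatile char buffer[256 * 1024];
  buffer[0] = 1;
  buffer[sizeof(buffer) - 1] = 2;
  *static_cast<int*>(arg) = buffer[0] + buffer[sizeof(buffer) - 1];
  return nullptr;
}
```

A caller would pass something like `1024 * 1024` as `stack_bytes`, then `pthread_join` the thread. Code that sets the size explicitly this way behaves the same regardless of which libc default it was written against.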

Anyway, I think the issue exists somewhere between chromium 45 and 49, because qtwebengine-5.6.x (based off of chromium 45) works but qtwebengine-5.7.x (based off of chromium 49) does not.
One thing that changed was that the seccomp-bpf syscall sandboxing was enabled or made stricter, and because musl uses different syscalls as compared to glibc, without https://github.com/lluixhi/musl-extras/blob/master/dev-qt/qtwebengine/files/qtwebengine-5.7.0-musl-sandbox.patch
we get segmentation faults because we use incompatible syscalls or syscalls with incompatible options.
There seems to be some other issue regarding threading (last time I checked we still crashed during pthread_clone) but it's kind of difficult to debug because chromium mixes green threads with posix threads and multiple processes (chrome zygote), and it crashes before I can connect gdb.

What i mean by opening issues on the chromium side is that I'm not comfortable submitting patches upstream to the chromium project if they don't work. I also need to build chromium with the patchset on glibc to make sure I don't break anything there.

Then there's the issue that GN, the new build system for chromium that replaces GYP in chromium 54+, needs to be patched because they use a memory allocator hack that's incompatible with musl.

And yes, because it's statically linked, chromium should run about the same on different musl-based linux distros. Alpine will probably be a better test because they also use grsecurity patches in their kernel.

@jirutka

jirutka commented Oct 21, 2016

IIRC @ncopa has great experience with solving Chromium issues on Alpine (musl), let's ask him.

@therealkenc
Author

I've got Alpine up and their binary seems stable enough in a short test. YouTube videos play, anyway. They are on 53.0.2785.143 currently. The thread stack problem has a small patch, and 53 is still on GYP. I'll look out for your sandbox issue and much appreciate the heads up.

@lluixhi lluixhi changed the title from "chromium build issues related to sys/types.h and mmap()" to "chromium build issues" Oct 31, 2016
@lluixhi lluixhi reopened this Oct 31, 2016
@lluixhi
Owner

lluixhi commented Oct 31, 2016

Alright, so the current segmentation fault in qtwebengine is because of a SIGILL ILLOPN. I wonder whether this could be fixed by using an older version of sys-devel/gcc (I'm using gcc-6.2.0 right now, and there might be some kind of codegen bug.)

@xhebox

xhebox commented Nov 13, 2016

@lluixhi If you're still stuck on chromium's segfault, I'm sure you need this to get it working (maybe qtwebengine needs this, too):
https://git.archlinux.org/svntogit/packages.git/tree/trunk/chromium-52.0.2743.116-unset-madv_free.patch?h=packages/chromium

I'm running chromium (53.0.2785.143) fine with your patches plus this patch and the stack_size patch (compiled with gcc 6.2.0).

EDIT: There's a dirty fix patch for ppapi plugins: https://raw.githubusercontent.com/xhebox/noname-linux/master/ports/chromium/musl-ppapi-nosandbox-and-fixdlopen.patch. I first made it because I wanted to use ppapi flash under musl. Then I found that all ppapi plugins (which need dlopen to load) are blocked by the sandbox. The first part of this patch (before 'flash' appears) should solve this; the second part is for closed-source flash, not for this issue.

@lluixhi
Owner

lluixhi commented Nov 16, 2016

@xhebox Thanks!
qtwebengine appears to be working now, though I'm going to do some more testing.

@lluixhi
Owner

lluixhi commented Nov 16, 2016

Alright, seems not everything is quite right, at least in qtwebengine. I'm still getting segmentation faults on some web pages. For instance, when trying to open the chat on GMail/Hangouts, and in some other cases.

I think that this is possibly a JavaScript issue.

@xhebox

xhebox commented Nov 17, 2016

@lluixhi

Did you apply this? I mean this cflag.

  # Work around bug in blink in which GCC 6 optimizes away null pointer checks
  # https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833524
  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68853#c2
  sed -i '/config("compiler")/ a cflags_cc = [ "-fno-delete-null-pointer-checks" ]' \
    build/config/linux/BUILD.gn

@lluixhi
Owner

lluixhi commented Nov 17, 2016

Locally, yes.

That's an issue that's not musl specific, though.

@xhebox

xhebox commented Nov 17, 2016

When using chromium, I get segfaults too, but they don't affect usage.

Received signal 11 SEGV_MAPERR 000000000208
  r8: 0000562574cae5a8  r9: 0000000000004375 r10: 0000562574938898 r11: 0000000000000000
 r12: 00005625780665e0 r13: 0000562578067ce0 r14: 0000000000000000 r15: 0000562578064738
  di: 0000562578072288  si: 0000000000000032  bp: 0000562578068160  bx: 0000562578072288
  dx: 0000000000000001  ax: 0000000000000000  cx: 0000000000000000  sp: 00007ffe0f1984c0
  ip: 00007f19bb00ccef efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
 trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]

Also i get these two:

getrlimit(RLIMIT_NOFILE) failed

[1:14:1117/213721:ERROR:ffmpeg_demuxer.cc(1492)] OnReadFrameDone result=-541478725 IsMaxMemoryUsageReached=0

@lluixhi
Owner

lluixhi commented Nov 22, 2016

@xhebox
It seems to me that qtwebengine-5.7.0 is stable as long as you're using ffmpeg-2.x -- when using ffmpeg-3.x, there are segmentation faults when the page has video -- I think this is another non-musl bug.

@xhebox

xhebox commented Nov 22, 2016

@lluixhi I'll take a look at it, and good news: I got 54.0 compiled and working.

The main changes are:

  1. pthread_setname_np is available in my musl, since I'm using Alpine's patchset, so I removed chromium's stub patch. More here

  2. One RTLD flag which musl does not support, here.

  3. Most important, allocator_shim.patch. It makes all Glibc*** functions invoke the real malloc directly instead of __libc***, so we can compile it. It also removes the malloc overrides, so there is no infinite loop (Glibc*** -> alloc -> Glibc***).

  4. And... I found that it can't find the correct path from the pkg-config output; it always uses -I../../include/glib-2.0 (should be /include/***). I've got a xhebox.patch and use sed to correct it. But I think this is my own issue; maybe you can try 54.0 and find out more about this (whether you get the same result)? Thx.

Detailed build file here

@lluixhi
Owner

lluixhi commented Nov 22, 2016

Thanks! Some comments:

  1. We won't actually need the pthread_setname_np patch because https://github.com/lluixhi/musl-extras/blob/master/www-client/chromium/files/musl/09_all_no-pthread-setname.patch is actually from upstream. I think we can just wait until musl-1.1.16.
  2. I'll probably modify that to instead define RTLD_DEEPBIND to 0, which is what alpine and voidlinux appear to be doing in audacity and docker (it's also more portable).
  3. Yeah, I was attempting to make a similar patch and didn't get around to it. Thanks!
  4. Hmm. I'll look into that.
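
The approach in point 2 above (defining RTLD_DEEPBIND to 0 where the libc lacks it) can be sketched as follows. This is an illustration only, not the actual Alpine or Void patch. The helper name `plugin_dlopen_flags` and the choice of the other flags are assumptions.

```cpp
#include <dlfcn.h>
#include <cassert>

// RTLD_DEEPBIND is a glibc extension that musl's <dlfcn.h> does not define.
// Defining it to 0 makes it a no-op when OR-ed into dlopen() flags, so the
// calling code needs no further #ifdefs.
#ifndef RTLD_DEEPBIND
#define RTLD_DEEPBIND 0
#endif

// Hypothetical helper: the flags a plugin loader might pass to dlopen().
int plugin_dlopen_flags() {
  int flags = RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND;
  assert((flags & RTLD_LAZY) != 0);  // the base mode must survive the OR
  return flags;
}
```

On glibc the existing definition is used unchanged. On a libc without the flag, the same source compiles and the flag simply has no effect, which is why this is more portable than removing the flag from each call site.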

@xhebox

xhebox commented Nov 27, 2016

UPDATE: CONFIRMED

--- chromium-54.0.2840.100/content/common/sandbox_linux/bpf_gpu_policy_linux.cc	2016-11-10 20:02:14.000000000 +0000
+++ chromium-54.0.2840.100/content/common/sandbox_linux/bpf_gpu_policy_linux.cc	2016-11-10 20:02:14.000000000 +0000
@@ -337,6 +337,7 @@
   static const char kNvidiaParamsPath[] = "/proc/driver/nvidia/params";
 
   static const char kDevShm[] = "/dev/shm/";
+  static const char kDevShm2[] = "/run/shm/";
 
   CHECK(broker_process_ == NULL);
 
@@ -349,6 +350,8 @@
     // For shared memory.
     permissions.push_back(
         BrokerFilePermission::ReadWriteCreateUnlinkRecursive(kDevShm));
+    permissions.push_back(
+        BrokerFilePermission::ReadWriteCreateUnlinkRecursive(kDevShm2));
     // For multi-card DRI setups. NOTE: /dev/dri/card0 was already added above.
     for (int i = 1; i <= 9; ++i) {
       permissions.push_back(BrokerFilePermission::ReadWrite(

I've pinned down the segfaults: chromium failed to launch the GPU process. Per the syscall trace below, it tried to open /run/shm, but that path is not in the sandbox whitelist. I've made a patch and am testing it. I want to know if your chromium opens /run/shm, too? This could be a portability patch for systems that make chromium open /run instead of /dev.

14:36:26.504754 memfd_create("xshmfence", MFD_CLOEXEC|MFD_ALLOW_SEALING) = -1 EPERM (Operation not permitted)
14:36:26.504780 open("/run/shm/shmfd-apMfAa", O_RDWR|O_CREAT|O_EXCL, 0600) = 2
14:36:26.504806 --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_errno=ENOENT, si_call_addr=0x7fc9f59711bc, si_syscall=__NR_open, si_arch=AUDIT_ARCH_X86_64} ---
14:36:26.504822 rt_sigreturn({mask=[]}) = -1 EPERM (Operation not permitted)
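
As a rough illustration of why the open() of /run/shm/... in the trace above is rejected until /run/shm/ is added, here is a simplified prefix allowlist check. It is a hypothetical stand-in, not Chromium's BrokerFilePermission implementation. The function name `is_path_allowed` is an assumption, and the sketch ignores details such as ".." components and access modes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a recursive directory permission: a path is
// allowed only if it lies strictly under one of the allowlisted directories.
bool is_path_allowed(const std::string& path,
                     const std::vector<std::string>& allowed_dirs) {
  for (const std::string& dir : allowed_dirs) {
    assert(!dir.empty() && dir.back() == '/');  // entries must end in '/'
    if (path.size() > dir.size() && path.compare(0, dir.size(), dir) == 0) {
      return true;
    }
  }
  return false;
}
```

With only "/dev/shm/" in the list, a path such as /run/shm/shmfd-apMfAa is refused. Adding "/run/shm/" as a second entry, as the patch above does for the real permission list, lets it through.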

Received signal 11 SEGV_MAPERR 000000000208
  r8: 0000563d8c3c9fc8  r9: 00000000000061f9 r10: 0000563d8c209860 r11: 0000000000000005
 r12: 0000563d8c584d00 r13: 0000563d8c5846c0 r14: 0000000000000000 r15: 0000563d8c5812a0
  di: 0000563d8c597028  si: 0000000000000032  bp: 0000563d8c584dc0  bx: 0000563d8c597028
  dx: 0000000000000001  ax: 0000000000000000  cx: 0000000000000000  sp: 00007ffde724e1f0
  ip: 00007efc5f1fb94f efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
 trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]
Received signal 11 SEGV_MAPERR 000000000208
  r8: 00005635db287948  r9: 0000000000006347 r10: 00005635db0d5860 r11: 000000000000000f
 r12: 00005635dbe816a0 r13: 00005635dc29ca00 r14: 0000000000000000 r15: 00005635dc2a17c0
  di: 00005635dc2a2fa8  si: 0000000000000032  bp: 00005635dc1ebec0  bx: 00005635dc2a2fa8
  dx: 0000000000000001  ax: 0000000000000000  cx: 0000000000000000  sp: 00007fff0c1f3a10
  ip: 00007fb9947ce94f efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
 trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]
Received signal 11 SEGV_MAPERR 000000000208
  r8: 000055e1559dd948  r9: 0000000000006430 r10: 000055e15582b860 r11: 000000000000000f
 r12: 000055e1564ea460 r13: 000055e1564e9d80 r14: 0000000000000000 r15: 000055e1564e7b80
  di: 000055e1567d2028  si: 0000000000000032  bp: 000055e1564ea520  bx: 000055e1567d2028
  dx: 0000000000000001  ax: 0000000000000000  cx: 0000000000000000  sp: 00007ffeface4db0
  ip: 00007f66d168094f efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
 trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]
[24954:25042:1127/172614:ERROR:browser_gpu_channel_host_factory.cc(113)] Failed to launch GPU process.
getrlimit(RLIMIT_NOFILE) failed
[24954:25042:1127/172614:ERROR:browser_gpu_channel_host_factory.cc(113)] Failed to launch GPU process.

@xhebox

xhebox commented Nov 27, 2016

Found the ffmpeg error in three chromium bug reports, but it seems it did not attract the developers' attention. Maybe this is an 'expected' error for chromium?

@lluixhi
Owner

lluixhi commented Nov 27, 2016

About the ffmpeg bug:

I fixed it with https://github.com/lluixhi/gentoo/blob/5a2e2775c6ebcab54d8cb88700bc585eed7853f1/dev-qt/qtwebengine/files/qtwebengine-5.7.0-fix-system-ffmpeg.patch
in qtwebengine, and it's fixed by the chromium-system-ffmpeg patches already in Gentoo.

I don't appear to have the /dev/shm /run/shm issue you're mentioning. I'm going to try chromium without a hardened kernel next to see if PaX is disabling something that is not covered by paxmarking.

@xhebox

xhebox commented Nov 28, 2016

Strange, I'm using the same patch for ffmpeg... Maybe it's because of the init bug I solved yesterday. And thx for your feedback about shm; configure.ac says that xorg-server falls back to /run by default when built without a specific configure option.

@xhebox xhebox mentioned this issue Feb 10, 2017
15 tasks
@stefson
Contributor

stefson commented Sep 8, 2017

Can you please give chromium a revbump? I would like to give it another try :)

@xhebox

xhebox commented Sep 9, 2017

@stefson @lluixhi chromium needs a sandbox fix patch on version 60:

--- ./sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc.orig
+++ ./sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc
@@ -373,6 +373,7 @@
 #if defined(__i386__)
     case __NR_waitpid:
 #endif
+    case __NR_set_tid_address:
       return true;
     case __NR_clone:  // Should be parameter-restricted.
     case __NR_setns:  // Privileged.
@@ -385,7 +386,6 @@
 #if defined(__i386__) || defined(__x86_64__) || defined(__mips__)
     case __NR_set_thread_area:
 #endif
-    case __NR_set_tid_address:
     case __NR_unshare:
 #if !defined(__mips__) && !defined(__aarch64__)
     case __NR_vfork:

If anyone needs patches and configuration that pass the build directly: https://github.com/xhebox/noname-linux/blob/master/ports/chromium/Pkgfile

Be careful:

  1. gn_fix_for_noname.patch is for my system only, skip that patch
  2. vaapi is not an essential patch

algitbot pushed a commit to alpinelinux/aports that referenced this issue Sep 15, 2017
to fix tab crashes, use patch from
lluixhi/musl-extras#10 (comment)

use various patches from fedora