Chrome/Chromium does not work (pread64: Input/output error (5)) if home directory is mounted via NFS #326

Closed
wapsi opened this issue Sep 5, 2023 · 5 comments

Comments

@wapsi

wapsi commented Sep 5, 2023

This happens on Zen kernels if the ~/.config/chromium|chrome directory is on an NFS mount:

user@xxx:~$ chromium
[0905/204352.786436:ERROR:process_memory_linux.cc(49)] pread64: Input/output error (5)
Bus error (core dumped)
user@xxx:~$ google-chrome
[0905/204157.440930:ERROR:process_memory_linux.cc(49)] pread64: Input/output error (5)
Bus error (core dumped)

This is somehow related to my /home NFS mount, because Chrome/Chromium starts just fine as a local user. They also start OK as the "NFS home mount" user if I symlink my ~/.config/chromium to a directory that resides on my local SSD.
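To confirm which file system a profile directory actually lives on (for example after symlinking it to local storage), something like the following can help. It is only a minimal sketch: the helper names `parse_mounts`, `fs_type_for` and `is_nfs` are made up for illustration, and the sample text is a hypothetical /proc/mounts excerpt, not output from the affected machine.

```python
import os

def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts escapes spaces in paths as \040
            entries.append((fields[1].replace("\\040", " "), fields[2]))
    return entries

def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fs_type
    return best_type

def is_nfs(fs_type):
    """True for 'nfs' and 'nfs4' entries."""
    return fs_type is not None and fs_type.startswith("nfs")

# Hypothetical excerpt; on a real system read the text from /proc/mounts.
SAMPLE = (
    "/dev/nvme0n1p2 / ext4 rw,relatime 0 0\n"
    "x.y.z:/home /home nfs4 rw,vers=4.2 0 0\n"
)

mounts = parse_mounts(SAMPLE)
for candidate in ("/home/user/.config/chromium", "/var/tmp/chromium-profile"):
    fs = fs_type_for(candidate, mounts)
    print(candidate, fs, "NFS" if is_nfs(fs) else "not NFS")
```

On a live system, pass the result of `os.path.realpath()` for the profile directory so that a symlink to local storage is resolved before the lookup.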

So I guess it's an issue with the Zen/Liquorix kernel, because Chrome/Chromium works just fine on Debian stable/sid kernels, even when the ~/.config/chromium directory resides on an NFS share.

In strace, there is this happening:

--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7fc58e200000} ---
gettid()                                = 10629
prctl(PR_GET_DUMPABLE)                  = 1 (SUID_DUMP_USER)
prctl(PR_SET_PTRACER, 10644)            = 0
rt_sigprocmask(SIG_BLOCK, [CONT], [BUS], 8) = 0
sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\250I\207\224\305\177\0\0\10\3463\0d/\0\0\0\0\0\0\0\0\0\0"..., iov_len=40}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 40
rt_sigtimedwait([CONT], [0905/204405.563561:ERROR:scoped_ptrace_attach.cc(27)] ptrace: Operation not permitted (1)
{si_signo=SIGCONT, si_code=SI_TKILL, si_pid=10644, si_uid=549401103}, {tv_sec=5, tv_nsec=0}, 8) = 18 (SIGCONT)
rt_sigprocmask(SIG_SETMASK, [BUS], NULL, 8) = 0
futex(0x2f640033e628, FUTEX_WAKE_PRIVATE, 2147483647) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fc592c5afd0}, NULL, 8) = 0
getpid()                                = 10629
gettid()                                = 10629
rt_tgsigqueueinfo(10629, 10629, SIGBUS, {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7fc58e200000}) = 0
rt_sigreturn({mask=[]})                 = 140486469746688
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7fc58e200000} ---
+++ killed by SIGBUS (core dumped) +++
Bus error (core dumped)

I also tried to reset the Chromium settings by renaming .config/chromium to .config/chromium.bak and starting Chromium again, but it still crashes.

Kernel version:
Linux 6.4.14-1-liquorix-amd64 #1 ZEN SMP PREEMPT liquorix 6.4-18.1~bookworm (2023-09-02) x86_64 GNU/Linux

I have only the following sysctl parameters applied (I don't believe these cause the issue):

# NFS optimizations
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 32768 262144 16777216
net.ipv4.tcp_wmem = 32768 262144 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 0
@damentz
Member

damentz commented Sep 5, 2023

OK, a few questions for you.

  1. Are there any logs on your NFS server when this issue occurs?
  2. Are there any kernel logs that emit as a result of this issue?
  3. I think the strace log you gave is from after the incident. What was Chrome trying to do before the SIGBUS error?

And last, just an observation of mine: putting the .config/chromium directory on NFS is not a great idea. Both Chrome and Firefox keep a database there, and NFS doesn't cope well with the synchronous activity a database needs. Not to mention, if you had several systems using the same linked folder at the same time, nothing would stop the browsers from causing rampant data corruption as they clobber each other's writes.

@wapsi
Author

wapsi commented Sep 6, 2023

  1. No, nothing at the server side.
  2. Dmesg shows nothing when the issue occurs.
  3. I attached the full strace log of Chrome starting up:
    strace-chrome-crash.log

> And last, just an observation of mine: putting the .config/chromium directory on NFS is not a great idea. Both Chrome and Firefox keep a database there, and NFS doesn't cope well with the synchronous activity a database needs. Not to mention, if you had several systems using the same linked folder at the same time, nothing would stop the browsers from causing rampant data corruption as they clobber each other's writes.

Yes, I'm fully aware of your concern, and that's why I have separate controls in place that prevent parallel graphical logins on different workstations, or on the same one, in my LAN.

I also noticed something weird happening with GNOME Web/Epiphany on the Zen kernel: opening MS Teams leads to an infinite login loop, but on the Debian sid kernel (6.4.13-1) it works just fine.

In the past I have had some weird NFS issues when file locking was not working properly on NFS mounts. I don't know if that's the issue now, but I doubt it.

I compared the Debian 6.4.13-1 and 6.4.14-1-liquorix-amd64 kernel configs and filtered out all NFS-related settings. They're nearly identical; the only difference that stands out is that 6.4.14-1-liquorix-amd64 has CONFIG_NFS_V4_2_READ_PLUS set to "y", while it is not set in the Debian kernel. I looked up what that option is in the kernel's Kconfig:

config NFS_V4_2_READ_PLUS
	bool "NFS: Enable support for the NFSv4.2 READ_PLUS operation"
	depends on NFS_V4_2
	default n
	help
	 This is intended for developers only. The READ_PLUS operation has
	 been shown to have issues under specific conditions and should not
	 be used in production.
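The config comparison described above can be sketched roughly as follows. This is not the exact procedure used in the thread: the function names are hypothetical, and the two sample strings are made-up excerpts that only mirror the one difference reported here. On Debian-based systems the real files are usually found at /boot/config-<version>, which is an assumption about the local setup.

```python
import re

def parse_kernel_config(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' becomes None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.match(r"# (CONFIG_\w+) is not set$", line)
        if unset:
            options[unset.group(1)] = None
            continue
        assigned = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2)
    return options

def diff_configs(a, b, pattern="NFS"):
    """Return {option: (value_in_a, value_in_b)} for differing options
    whose name contains pattern. Missing and 'not set' both count as None."""
    result = {}
    for name in sorted(set(a) | set(b)):
        if pattern in name and a.get(name) != b.get(name):
            result[name] = (a.get(name), b.get(name))
    return result

# Made-up excerpts for illustration, not the full configs from the thread.
DEBIAN_SAMPLE = """\
CONFIG_NFS_V4_2=y
# CONFIG_NFS_V4_2_READ_PLUS is not set
"""
LIQUORIX_SAMPLE = """\
CONFIG_NFS_V4_2=y
CONFIG_NFS_V4_2_READ_PLUS=y
"""

debian = parse_kernel_config(DEBIAN_SAMPLE)
liquorix = parse_kernel_config(LIQUORIX_SAMPLE)
for name, (old, new) in diff_configs(debian, liquorix).items():
    print(f"{name}: debian={old} liquorix={new}")
```

Treating an absent option and an explicit "is not set" line as the same value keeps the diff focused on options that are actually enabled on only one side.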

BTW: My NFS mount settings are:

x.y.z:/home on /home type nfs4 (rw,nosuid,nodev,noatime,nodiratime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx,local_lock=none,addr=x.y.z)

So NFSv4.2 is activated and supported on both sides (client and server), and it's definitely in use, because server-side copy works in my setup, and that requires NFSv4.2.
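For checking the negotiated options programmatically, the option string from the mount line above can be split into a dictionary. A minimal sketch, where `parse_mount_options` is a hypothetical helper name and the input is the option list quoted in this comment:

```python
def parse_mount_options(options):
    """Split 'rw,vers=4.2,soft' into {'rw': True, 'vers': '4.2', 'soft': True}."""
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags without '=' (rw, soft, ...) are stored as True
        parsed[key] = value if sep else True
    return parsed

# Option string copied from the mount line above (addresses are redacted there).
OPTIONS = (
    "rw,nosuid,nodev,noatime,nodiratime,vers=4.2,rsize=1048576,"
    "wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,"
    "retrans=2,sec=sys,clientaddr=xxx,local_lock=none,addr=x.y.z"
)

opts = parse_mount_options(OPTIONS)
print("NFS version:", opts["vers"])
print("soft mount:", opts.get("soft", False))
print("local_lock:", opts["local_lock"])
```

This only reports what the client lists for the mount; it does not show which NFSv4.2 operations, such as READ_PLUS, are actually issued.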

damentz added a commit to damentz/liquorix-package that referenced this issue Sep 6, 2023
 - Enable UDP support
 - Disable v4.2 read plus
 - Disable NFSD flex file layout

Related issues:
 - zen-kernel/zen-kernel#326
damentz added a commit to damentz/liquorix-package that referenced this issue Sep 7, 2023
Another experimental option that is marked as not production ready.  Turn
off since it could be related to github issue [1].

[1] zen-kernel/zen-kernel#326
@wapsi
Author

wapsi commented Sep 7, 2023

Those changes fixed the issue(s), thank you!

@wapsi wapsi closed this as completed Sep 7, 2023
@damentz damentz reopened this Sep 7, 2023
@damentz
Member

damentz commented Sep 7, 2023

I'm reopening temporarily, since the changes I made were just for Liquorix.

@heftig please review the commits I made for NFS to Liquorix as they need to be applied to linux-zen's config (also stock Arch?) where missing:

  1. damentz/liquorix-package@72747f5
  2. damentz/liquorix-package@cb3a2f1

@damentz
Member

damentz commented Sep 9, 2023

Ah, never mind, it appears this issue was only tested on Debian with the stock and Liquorix kernels. Closing out.

@damentz damentz closed this as completed Sep 9, 2023