Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pressing w in lf crashes the terminal on macOS #480

Closed
jeffs opened this issue Oct 7, 2020 · 21 comments
Closed

Pressing w in lf crashes the terminal on macOS #480

jeffs opened this issue Oct 7, 2020 · 21 comments

Comments

@jeffs
Copy link

jeffs commented Oct 7, 2020

Other key bindings work fine, but pressing 'w' kills the parent terminal: Tmux pane, or iTerm2 or Terminal.app window. The effect is the same under Bash or Zsh. I was unable to reproduce the issue on Ubuntu. Tbh, I'm kind of impressed.

The culprit is line app.go:404, with cmd as shown below. The lf process somehow survives the terminal crash, and the subsequent error check reports "running shell: exit status 1"; however, the terminal dies immediately on cmd.Run().

cmd = &exec.Cmd{
	Path:"/bin/sh",
	Args:[]string{"sh", "-c", "$SHELL", "--"},
	Env:[]string(nil),
	Dir:"",
	Stdin:(*os.File)(0xc0000b8000),
	Stdout:(*os.File)(0xc0000b8008),
	Stderr:(*os.File)(0xc0000b8010),
	ExtraFiles:[]*os.File(nil),
	SysProcAttr:(*syscall.SysProcAttr)(nil),
	Process:(*os.Process)(nil),
	ProcessState:(*os.ProcessState)(nil),
	ctx:context.Context(nil),
	lookPathErr:error(nil),
	finished:false,
	childFiles:[]*os.File(nil),
	closeAfterStart:[]io.Closer(nil),
	closeAfterWait:[]io.Closer(nil),
	goroutine:[]func() error(nil),
	errch:(chan error)(nil),
	waitDone:(chan struct {})(nil)
}

err = &exec.ExitError{
	ProcessState:(*os.ProcessState)(0xc0005a00c0),
	Stderr:[]uint8(nil)
}
@gokcehan
Copy link
Owner

gokcehan commented Oct 8, 2020

@jeffs It sounds like a strange issue. Unfortunately I don't have a Mac to reproduce this. A couple of questions that may or may not be useful:

  • Does it happen with an empty lfrc?
  • Does your log file have anything relevant after the crash?
  • What is the value of your $SHELL variable in commands inside lf? (e.g. !echo $SHELL)
  • Do you have /bin/sh executable in your machine?
  • Does the following commands work in your shell without lf?
sh -c $SHELL
/bin/sh -c $SHELL
/bin/sh -c 'echo "example args: $1 $2 $3"' -- one two three

@gokcehan gokcehan added the bug label Oct 8, 2020
@jeffs
Copy link
Author

jeffs commented Oct 9, 2020

  1. Yes, I have no lfrc.

  2. Where can I find the log file?

  3. /bin/zsh

  4. Yes:

     $ /bin/sh --version
     GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)
     Copyright (C) 2007 Free Software Foundation, Inc.
    
  5. Yes:

     $ </dev/null sh -c $SHELL; echo $?
     0
    
     $ </dev/null /bin/sh -c $SHELL; echo $?
     0
    
     $ </dev/null /bin/sh -c 'echo "example args: $1 $2 $3"' -- one two three; echo $?
     example args: one two three
     0
    

The results are identical from /bin/zsh.

@thezeroalpha
Copy link

thezeroalpha commented Oct 9, 2020

I have a related issue, but for me (macOS, iTerm2, zsh), the terminal doesn't crash. Rather, it spawns a nested shell, due to the default mapping map w $$SHELL (I suspect @jeffs that this might be what's happening for you too since you have no lfrc, check the variable $SHLVL after pressing w in lf).

My problem is that I have no way of going back to the original lf window; entering "exit" as a command puts me back in the parent shell with the lf process as a suspended background job, and trying to resume it with 'fg' terminates it. This does not happen when I manually call the shell from lf with !$SHELL. I guess the reason for this is the difference in the semantics of the ! and $ prefixes; perhaps it might be better to have the default mapping set to map w !$SHELL?

@gokcehan
Copy link
Owner

@jeffs Log file should be in /tmp/lf.${USER}.${id}.log but note that the file is removed automatically when lf quits so you can only find this file when lf crashes or it is currently running.

By the way, what version are you running? I have now pushed a new release r17. If you are using r16 before, can you try with the latest version? We have changed our terminal ui library so there is a chance this might be resolved on its own.

@thezeroalpha When you exit from the spawned shell, you should return to the running lf without needing to fg the process, at least that is the behavior I have in my windows and linux machines. I don't have a Mac to test this behavior. Can you also try it with r17 to see if things work differently?

@thezeroalpha
Copy link

thezeroalpha commented Oct 12, 2020

@gokcehan I still have an issue on r17, but now the behavior is different. When I open lf and press w, I'm dropped into the shell, but when I start typing, the shell crashes with the message "zsh: error on TTY read: Input/output error". This closes both the spawned and the parent shell.

The same issue happens with Bash, but there is no error message (the version doesn't make a difference, it's the same with Bash 5). Instead, after doing the same steps, the output reads:

[1]+  Stopped                 lf
bash-3.2$ exit
There are stopped jobs.
bash-3.2$ exit

[Process completed]

(I didn't type any of those exit commands myself). Sometimes, the crash is instant, and sometimes it only crashes after user input.

The terminal emulator doesn't make a difference either, the behavior is the same in iTerm2 and Terminal.app.

I can't find a logfile in /tmp even when lf is running in another window, so unfortunately I can't provide any log output.

I'm using the Homebrew version of lf.

@jeffs
Copy link
Author

jeffs commented Oct 13, 2020

@gokcehan I was using r16 from brew, but reproduced from HEAD (6a28667).  I still see the problem with r17 (228a20f).

The log files show up under /var/folders/bb/fr0f0jts4cd922xv60gr9fr80000gn/T/ on my Mac. From lf.jeff.1005.log:

2020/10/13 14:38:44 hi!
2020/10/13 14:38:44 loading files: []
2020/10/13 14:38:45 shell: ${{ $SHELL }} -- []
2020/10/13 14:38:45 error: running shell: exit status 1
2020/10/13 14:38:45 initializing screen: open /dev/tty: device not configured

@gokcehan
Copy link
Owner

@jeffs Sorry for misleading you with the log file location. I guess that's the path Mac uses for temp files.

@jeffs @thezeroalpha It is unfortunate that I don't have a Mac to investigate this issue. I have marked the issue with help wanted tag to hopefully attract a Mac developer to work on this. In the meantime, you may try to get used to lfcd and quit workflow rather than spawning a shell inside lf.

Your issues sound similar in some ways but also different in some others. It might be the case that you both have different MacOS versions and there might have been tty related changes recently in MacOS. We don't directly manipulate tty in lf so it is highly possible that this is a termbox/tcell related problem. If you have Go installed, I may try to come up with a small toy program for you to run and confirm that this is the case, and then we can report the issue back to tcell repo, where devs are more likely to have a better understanding of the problem.

@jeffs
Copy link
Author

jeffs commented Oct 13, 2020

@gokcehan Thanks for looking into it! What you're saying makes sense. Feel free to send code to try on Mac as needed.

@thezeroalpha
Copy link

Thanks @gokcehan. I managed to find the log file, it looks very similar to @jeffs', with a slightly different message:
Also it might be worth noting that now the crash happens regardless of whether the command is $$SHELL or !$SHELL.

2020/10/14 14:09:49 shell: ${{ $SHELL }} -- []
2020/10/14 14:09:51 error: running shell: signal: hangup
2020/10/14 14:09:51 initializing screen: open /dev/tty: device not configured

If you have anything you'd like tested, I'd be happy to help too.

@gokcehan
Copy link
Owner

@jeffs @thezeroalpha Alright, below there are two small test programs. You can copy these to a file with a .go suffix and then run it with go run (e.g. go run asd.go). For the second example, you may need to download tcell first if you don't have it in your go path (e.g. go get -u github.com/gdamore/tcell).

The following example is to test spawning a shell with Go without tcell:

package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("sh", "-c", "$SHELL", "--")

	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	fmt.Println("spawning a shell... (try a few commands and then 'exit')")

	if err := cmd.Run(); err != nil {
		panic(err)
	}

	fmt.Println("shell is properly finished.")
}

The following is something similar with tcell:

package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"

	"github.com/gdamore/tcell"
)

func emit(screen tcell.Screen, x, y int, msg string) {
	for _, c := range msg {
		screen.SetContent(x, y, c, nil, tcell.StyleDefault)
		x++
	}
}

func display(msg string) {
	var screen tcell.Screen
	var err error

	if screen, err = tcell.NewScreen(); err != nil {
		log.Fatal("creating screen: %s", err)
	} else if err = screen.Init(); err != nil {
		log.Fatal("initializing screen: %s", err)
	}

	screen.Clear()
	emit(screen, 1, 1, msg)
	screen.Show()

loop:
	for {
		switch screen.PollEvent().(type) {
		case *tcell.EventKey:
			break loop
		}
	}

	screen.Fini()
}

func main() {
	cmd := exec.Command("sh", "-c", "$SHELL", "--")

	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	display("press any key to spawn a shell")

	fmt.Println("spawning a shell... (try a few commands and then 'exit')")

	if err := cmd.Run(); err != nil {
		panic(err)
	}

	fmt.Println("shell is properly finished.")

	display("press any key to quit")
}

@thezeroalpha
Copy link

@gokcehan Test results:

The first example works as expected, the spawned shell works perfectly with everything I tried.

The second example crashes in the same way it did before, with the error zsh: error on TTY read: Input/output error. When I first create a subshell, and run the example from there, after the crash I see the running example as a suspended job in the parent shell: [1] + 36061 suspended (tty input) go run ./example2.go (from jobs). When I try to fg, I get the error:

[1]  + continued  go run ./example2.go
panic: exit status 1

goroutine 1 [running]:
main.main()
	~/tests/example2.go:56 +0x28c
exit status 2

@jeffs
Copy link
Author

jeffs commented Oct 15, 2020

@gokcehan @thezeroalpha The first example works for me, too. The second crashes in cmd.Run(); Run doesn't even return, it just brings the whole process down immediately.

Tcell seems to be leaving the terminal in some wacky state. The program below crashes in cmd.Run(); but if I comment out the call to tcell, it works fine. (I'm printing to a file instead of stdout/stderr because the crash kills my terminal. To see the output, watch cat /tmp/lf-408.out in another terminal.)

package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"

	"github.com/gdamore/tcell"
)

func clear() {
	screen, err := tcell.NewScreen()
	if err != nil {
		log.Fatal("creating screen: %s", err)
	}

	if err = screen.Init(); err != nil {
		log.Fatal("initializing screen: %s", err)
	}

	defer screen.Fini()

	screen.Clear()
}

func run(script string) error {
	cmd := exec.Command("sh", "-c", script, "--")
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	f, err := os.Create("/tmp/lf-408.log")
	if err != nil {
		panic(err)
	}

	defer f.Close()

	clear() // Removing this line avoids the crash.

	fmt.Fprintln(f, "BEFORE") // always reached

	if err := run("$SHELL"); err != nil {
		fmt.Fprintf(f, "error: %#v\n", err) // never reached
		panic(err)
	}

	fmt.Fprintln(f, "AFTER") // reached iff clear() is removed
}

@gdamore
Copy link

gdamore commented Oct 16, 2020

So the issue is that Screen.Fini closes it's terminal handles. That function was never intended for suspending things -- it's meant to deal with application shut down.

That closing is the source of your problems.

@MunifTanjim
Copy link
Contributor

Hey @gokcehan, have you tried this? https://github.com/fastai/fastmac

@gokcehan
Copy link
Owner

@MunifTanjim I have seen it before but I think it wasn't clear whether it is against TOS or not. In any case, I think the discussion on tcell side is going in the direction to fix the issue in a different way.

@gokcehan
Copy link
Owner

This issue has been bothering me a lot lately. Unfortunately, there are no good news so far. But I felt like I should give a technical update on the issue.

There has been changes in tcell to use stdin/stdout instead of /dev/tty recently. I was hoping that could solve our issue but apparently not. With this recent change, shell commands are now broken on linux as well. The issue this time seems to be that input is consumed by tcell even after Screen.Fini is called. I have mentioned a small change in gdamore/tcell#394 to quit tcell input loop when the screen is closed. However, the first character is still lost with this change which would be annoying to most users after a while. We might also need further changes to make this work as proper ui pause/resume. gdamore/tcell#394 is now closed and at this point it's not clear there will be any further changes to support such ui pause/resume behavior. There is also gdamore/tcell#194 which is an older and more popular issue and it is also likely related to our issue but there hasn't been any discussion there regarding the recent change yet.

So far I have failed to fix the lost first character issue with the most recent tcell. I have been trying non-blocking IO for stdin in tcell, which is difficult but not impossible in Go. The idea is to first syscall.Dup the stdin, then set syscall.SetNonblock the new file descriptor, and then create a new file handler with the descriptor using os.NewFile. Then it is possible to use os.File.SetDeadline in the new file to do non-blocking IO for stdin. I felt like this should work but it didn't. If someone can make it work somehow, please let me know. Alternatively, if you can fix the issue on MacOS with the old /dev/tty tcell version, also let me know and we can maintain a separate patched fork with the old /dev/tty interface.

As a backup plan, I have also been thinking of adding another UI backend to lf as an option. We could have tried adding termbox back since we haven't diverged much so far but discussion here suggests that this issue also existed with termbox (e.g. r16 and before). I don't know any other pure Go terminal library besides tcell and termbox. Another strong alternative would be to use ncurses with cgo. As far as I know, terminal pause/resume behavior is explicitly supported in ncurses as mentioned here.

I have been trying shell commands with small toy ncurses programs. The following should work with go run foo.go if you have a c compiler and ncurses development files installed:

package main

// #include <ncurses.h>
// #cgo LDFLAGS: -lncurses
// static void nprintw(char* s) { printw(s); }
import "C"

import (
	"fmt"
	"os"
	"os/exec"
)

func shellout() {
	C.endwin()

	cmd := exec.Command("sh", "-c", "$SHELL", "--")
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	fmt.Println("spawning a shell... (try a few commands and then 'exit')")
	if err := cmd.Run(); err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("shell is properly finished.")
	}

	C.initscr()
}

func draw() {
	C.move(1, 1)
	C.nprintw(C.CString("press 'w' to spawn a shell and 'q' to quit"))
}

func main() {
	C.initscr()
	draw()

loop:
	for {
		ch := C.getch()
		switch ch {
		case 'q':
			break loop
		case 'w':
			shellout()
		}
		draw()
	}

	C.endwin()
}

If you don't have ncurses development files installed, the following python script should do something similar:

#!/usr/bin/env python

import curses
import subprocess

stdscr = curses.initscr()
stdscr.addstr(1, 1, "press 'w' to spawn a shell and 'q' to quit")

while True:
    key = stdscr.getkey()
    if key == 'q':
        break
    elif key == 'w':
        curses.endwin()
        subprocess.run(["sh", "-c", "$SHELL", "--"])
        stdscr = curses.initscr()
    stdscr.addstr(1, 1, "press 'w' to spawn a shell and 'q' to quit")

curses.endwin()

Some of the advantages of a possible ncurses backend are as follows:

  1. Curses is part of POSIX standard as found here and it should come installed in most systems. I also expect it to be much more stable than any other terminal ui library out there. Since it is a commonly used library, people would be hesitant to make any changes in the system that may break curses programs.
  2. Curses API might be more suitable for lf. For example with termbox/tcell API, we have to interpret color/attribute escape codes manually ourselves. Curses might allow additional terminal specific escape codes to work without any effort on our part. For example sixel graphics might simply work, which could be a nice alternative for true image previews.

And some disadvantages are as follows:

  1. Curses requires cgo though this might already be inevitable. One of the things Go has done since the beginning was to avoid libc and have system calls implemented in the language. But this has been fading away recently. As far as I understand, on MacOS Go binaries are already expected to implement system calls through a library like libSystem. I have seen news that OpenBSD will also only allow system calls through libc for security reasons from Go1.16 onwards. Other BSDs might follow OpenBSD in the future. Most people expect Linux to stay the way it is. I'm not sure about Windows and others.
  2. Curses might break static binaries. Currently we compile static binaries by disabling cgo. I think there are other methods to do this with build tags for pure Go. I'm not sure if it's possible to statically link ncurses to Go binaries.
  3. Curses might break our prebuilt binaries. One of the best things about Go is to be able to cross-compile for all supported systems from the same machine which we added to our continuous integration system to automatically build prebuilt binaries for each release. Adding cgo might prevent cross-compilation.
  4. Curses may not work on Windows. I remember a windows port for curses but I don't know the situtation nowadays.
  5. Curses backend requires a significant effort to implement. To give an idea, our ui.go is over 1200 lines and we have some further tcell related code in other places such as colors.go. Maybe about 1/3 of this code might be considered ui independent which can be refactored out. Maybe 1/3 might require slight changes to make them ui independent. And the rest should be implemented specifically for curses. Unfortunately I can only work on lf occasionally these days so it may not be possible for me to work on this anytime soon but maybe contributors can help with this instead if we actually decide in favor of this addition.
  6. Maintaining multiple UIs might be difficult. They will probably not have feature parity. They will also likely double our ui related issues.
  7. Curses might have some quirks regarding input handling and unicode support.

Note that I'm actually considering curses as an optional build feature using build tags (e.g. go build +curses) so disadvantages 1, 2, 3, and 4 may not actually be much of an issue. Disadvantages 5, 6, and 7 are more serious issues to be considered. Feel free to leave feedback about curses backend for discussion.

@gdamore
Copy link

gdamore commented Feb 13, 2021 via email

@gokcehan
Copy link
Owner

Good news, suspend/resume api has been added to tcell yesterday. @gdamore Thank you very much for this addition.

We have now changed to tcell v.2.2.0 in our master branch. It does seem to work well on linux and windows. It would be nice if people here can confirm it is working for MacOS and others as well.

I have also made a somewhat related change. "Press any key to continue" for shell-wait commands are now implemented properly with golang.org/x/term package instead of hardcoded stty (unix) and pause (windows) shell commands. Tcell now also uses x/term so hopefully this change should not break anything. With this change, requirement for a POSIX compatible shell for shell option might be dropped and other shells might be usable.

I'm planning to wait for a while to make sure things are stable. Then we can tag a new release.

@gdamore
Copy link

gdamore commented Feb 21, 2021

Glad it helped! The work to address that yesterday fixed a very long standing bug in tcell -- I recall being stumped by the fact that blocking reads from stdin (or /dev/tty) were non-interruptible in macOS years ago. I honestly consider the fact that close() does not cause the read to wake up to be a driver bug.

The trick was to set the file descriptor to non-blocking mode, and to also use termios to set both VMIN and VMAX to zero. We don't want that all the time, but when we're trying to shut down (or suspend) we can do that to break into it. (Note that Windows has a more elegant solution under the hood, with waitformultipleobjects as an api that I use with a cancellation object.)

Anyway, hopefully you can close this bug now. ;-) Other projects have had similar complaints, so hopefully this will make life better for all of them.

@MunifTanjim
Copy link
Contributor

MunifTanjim commented Feb 21, 2021

I can confirm that #480 #554 #555 are fixed on macOS. 🎉 🎉 🎉

Thanks to both of you ❤️ @gokcehan @gdamore

@gokcehan
Copy link
Owner

Nobody has reported anything so I have made a new release r21 with the fix.

There is an EventError fired after cmd.Run which shows up in our log files as:

EventError: 'read /dev/stdin: resource temporarily unavailable'

But it seems harmless. We could disable event error logging to get rid of it but it could be useful elsewhere so I haven't changed anything.

Closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants