PHP: Make proc_open work without requiring an explicit sleep() #951

@adamziel

Problem

Playground supports proc_open() through a custom function called js_open_process() added in #596. Instead of deferring to OS process opening functions, like PHP does by default, it calls a user-defined callback that spawns a new process and returns a ChildProcess object.

js_open_process then writes any stdin data using cp.stdin.write(dataStr); and captures the output using cp.stdout.on('data', function (data) {. The stdout event listener is asynchronous and may receive data after 1ms or 100ms – we don't really know.
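
To make the timing problem concrete, here is a minimal sketch of the write-then-listen pattern described above. It is not Playground's actual js_open_process code: feedAndCollect and onDone are hypothetical names, and cp is assumed to be any object shaped like Node's ChildProcess.

```javascript
// Sketch only: `cp` is assumed to look like a Node.js ChildProcess
// (cp.stdin.write/end, cp.stdout.on). feedAndCollect is a hypothetical name.
function feedAndCollect(cp, dataStr, onDone) {
  const chunks = [];

  // Output arrives in later event-loop turns, in chunks of unknown size.
  cp.stdout.on('data', (data) => chunks.push(String(data)));

  // Only the 'end' event tells us that no more output is coming.
  cp.stdout.on('end', () => onDone(chunks.join('')));

  cp.stdin.write(dataStr);
  cp.stdin.end();
  // Nothing has been captured at this point: a synchronous caller that
  // reads stdout right after this function returns would see no data.
}

// Possible usage with a real process (assumes a `cat` binary is available):
//   const { spawn } = require('child_process');
//   feedAndCollect(spawn('cat'), 'Hello world!\n', (out) => process.stdout.write(out));
```

The point is that the captured output exists only once the callbacks have fired, which is why the synchronous PHP side currently needs an explicit sleep() before reading.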

What does native PHP do?

Let's consider the following script:

<?php
$descriptorspec = array(
    0 => array("pipe", "r"),  // stdin
    1 => array("pipe", "w"),  // stdout
    2 => array("pipe", "w")   // stderr
);
$process = proc_open('less', $descriptorspec, $pipes);

if (is_resource($process)) {
    fwrite($pipes[0], "Hello world!\n");
    fclose($pipes[0]);

    echo stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    proc_close($process);
}

It will output "Hello world!\n". However, there are some nuances.

  • stream_get_contents expects the input pipe to close. If we comment out fclose($pipes[0]); in the script above, it will hang indefinitely. I'm not sure where that behavior comes from, but both _php_stream_fill_read_buffer and php_stream_eof sound like interesting candidates.
  • fread() reads whatever is available and returns. If we replace stream_get_contents() with fread($pipes[1], 1024); and comment out the fclose() call, the script will finish and output Hello world!\n.

If I adjust the proc_open() call to be proc_open('sleep 1; less'), then the fread() call takes more than 1 second, which tells me something in it is blocking after all.

Similarly:

proc_open('sleep 1; echo "Hi"; sleep 1; echo "There";', /* ...args */);

// This call takes around 1s and outputs "Hi":
echo fread($pipes[1], 5);

// This call takes around 1s and outputs "There":
echo fread($pipes[1], 5);

This makes it seem like fread() waits for any output and then returns it, even if it is shorter than the requested length. In this case, we could yield back to the event loop in _php_stream_read() until the process fd has some data available.

If, however, I call stream_set_blocking($pipes[1], 0);, then the fread() call returns instantly. For plain streams, that call is translated to flags |= O_NONBLOCK; fcntl(fd, F_SETFL, flags). Cool! We're getting somewhere!
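
The observations above can be summarized as a toy model. This is a sketch of the behavior as observed from the outside, not PHP's stream implementation. PipeModel and its status names are hypothetical.

```javascript
// Toy model of the read semantics observed above. Not PHP's actual code.
class PipeModel {
  constructor({ blocking = true } = {}) {
    this.blocking = blocking; // false models stream_set_blocking($pipe, 0)
    this.buffer = '';
    this.closed = false;
  }

  // The child process produced some output.
  write(chunk) {
    this.buffer += chunk;
  }

  // The writing side closed the pipe.
  close() {
    this.closed = true;
  }

  // Returns { status, data } where status is one of:
  //   'data'  - something was available; at most `length` characters returned
  //   'eof'   - buffer is empty and the writing side has closed
  //   'wait'  - blocking pipe with nothing to read: the caller must block,
  //             which is where Playground would need to yield to the event loop
  //   'again' - non-blocking pipe with nothing to read: return immediately
  read(length) {
    if (this.buffer.length > 0) {
      const data = this.buffer.slice(0, length);
      this.buffer = this.buffer.slice(length);
      return { status: 'data', data };
    }
    if (this.closed) return { status: 'eof', data: '' };
    return { status: this.blocking ? 'wait' : 'again', data: '' };
  }
}
```

In this model, a blocking read of 5 characters returns the 3 characters of "Hi\n" as soon as they arrive instead of waiting for the buffer to fill, and the only case that needs special handling in an asynchronous runtime is the 'wait' status.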

Here are some more resources:

https://bugs.php.net/bug.php?id=47918

"I have encountered a number of applications that will cause PHP hang on fread() until the process closes (regardless of whether or not the buffer has filled)."

"I have to disappoint here, the anonymous pipes are plain file descriptors"

Possible Solutions

A proper fix would yield back to the event loop in the same place where PHP waits for data.

Idea 1 – Async-compatible libc implementation

Playground currently patches PHP with custom implementations of functions like select(2) in an attempt to make the synchronous C code work in an asynchronous JavaScript runtime. These patches target PHP whereas in reality their goal is to replace blocking syscalls with async-compatible syscalls. Instead of PHP, we should be patching the syscalls library.

To illustrate the issue, here's the wasm_select implementation:

EMSCRIPTEN_KEEPALIVE int wasm_select(int max_fd, fd_set * read_fds, fd_set * write_fds, fd_set * except_fds, struct timeval * timeouttv) {
	emscripten_sleep(0); // always yield to JS event loop
	int timeoutms = php_tvtoto(timeouttv);
	int n = 0;
	for (int i = 0; i < max_fd; i++)
	{
		if (FD_ISSET(i, read_fds)) {
			n += wasm_poll_socket(i, POLLIN | POLLOUT, timeoutms);
		} else if (FD_ISSET(i, write_fds)) {
			n += wasm_poll_socket(i, POLLOUT, timeoutms);
		}
	}
	return n;
}

If the relevant syscalls knew how to wait for asynchronous events, we wouldn't need that at all.

Here's the Emscripten-provided library_syscall.js file that handles syscalls:

https://github.com/emscripten-core/emscripten/blob/29b0eaacfda55c5c034453a880609cca91f39d8e/src/library_syscall.js

Functions like __syscall__newselect or __syscall_poll could return Asyncify.sleep() whenever required, handle the asynchronous data flow, and call wakeUp() whenever a regular OS would.
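
A rough sketch of the bookkeeping such a syscall would need is below. It only models the parking and waking of callers. ReadinessRegistry and its methods are hypothetical names and are not part of Emscripten. In a real build, the wakeUp callback would presumably come from the Asyncify mechanism mentioned above.

```javascript
// Hypothetical sketch: tracks which fds are readable and who is waiting.
// None of these names exist in Emscripten.
class ReadinessRegistry {
  constructor() {
    this.readable = new Set(); // fds that have data available right now
    this.waiters = new Map();  // fd -> array of parked wakeUp callbacks
  }

  // What an async-aware poll/select could do: answer immediately when
  // data is already there, otherwise park the caller until it arrives.
  waitForReadable(fd, wakeUp) {
    if (this.readable.has(fd)) {
      wakeUp(1);
      return;
    }
    if (!this.waiters.has(fd)) this.waiters.set(fd, []);
    this.waiters.get(fd).push(wakeUp);
  }

  // Called when data arrives, e.g. from a cp.stdout 'data' listener.
  markReadable(fd) {
    this.readable.add(fd);
    const parked = this.waiters.get(fd) || [];
    this.waiters.delete(fd);
    for (const wakeUp of parked) wakeUp(1);
  }

  // Called after a read has consumed everything that was buffered.
  markDrained(fd) {
    this.readable.delete(fd);
  }
}
```

With something like this in the syscall layer, the wake-up would be driven by the data event itself rather than by a fixed emscripten_sleep(0) at the top of every select call.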

Idea 2 – Handle EWOULDBLOCK in _fd_read

I patched _fd_read as follows:

function _fd_read(fd, iov, iovcnt, pnum) {
	try {
		var stream = SYSCALLS.getStreamFromFD(fd);
		//  console.log({ stream });
		var num = doReadv(stream, iov, iovcnt);
		HEAPU32[pnum >> 2] = num;
		return 0;
	} catch (e) {
		console.error(e);
		console.trace();
		if (typeof FS == 'undefined' || !(e.name === 'ErrnoError')) throw e;
		return e.errno;
	}
}

With that patch in place, I got an error with errno=6. It seems like in Emscripten that means EWOULDBLOCK, which makes sense. I found this related Emscripten issue:

pipe() doesn't create a blocking pipe

It seems like we could detect that error and return Asyncify.sleep() instead.
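
As a sketch of that idea, the retry logic could look roughly like this. It is written against injected callbacks so it runs on its own: readWhenReady, tryRead, and schedule are hypothetical names, and the value 6 is simply the errno observed above. In the real _fd_read, wakeUp would presumably be supplied by Asyncify rather than passed in by hand.

```javascript
// Hypothetical sketch of "retry instead of failing on EWOULDBLOCK".
const EWOULDBLOCK = 6; // errno value observed in the patched _fd_read above

// tryRead():    returns the number of bytes read, or throws an error
//               object carrying a numeric `errno`.
// wakeUp(res):  called exactly once with { errno, num } when reading is done.
// schedule(fn): defers fn to a later event-loop turn.
function readWhenReady(tryRead, wakeUp, schedule = (fn) => setTimeout(fn, 10)) {
  const attempt = () => {
    let result;
    try {
      result = { errno: 0, num: tryRead() };
    } catch (e) {
      if (e && e.errno === EWOULDBLOCK) {
        // No data yet: give the event loop a chance to deliver it, then retry.
        schedule(attempt);
        return;
      }
      // Errors without an errno are not ours to handle.
      if (!e || typeof e.errno !== 'number') throw e;
      // Any other errno is reported to the caller, as _fd_read does today.
      result = { errno: e.errno, num: 0 };
    }
    wakeUp(result);
  };
  attempt();
}
```

A fixed 10 ms retry is the crudest possible policy. Waking up from the stdout data listener instead would avoid both the latency and the busy polling.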
