Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fflush doesn't work on WASI #3790

Closed
ktock opened this issue Apr 20, 2023 · 17 comments · Fixed by #3820
Closed

fflush doesn't work on WASI #3790

ktock opened this issue Apr 20, 2023 · 17 comments · Fixed by #3820

Comments

@ktock
Copy link

ktock commented Apr 20, 2023

Describe the bug

fflush provided by wasi-sdk doesn't work on wasmer.

wasmer -vV; rustc -vV
wasmer 3.2.0 (8a343d0 2023-04-18)
binary: wasmer-cli
commit-hash: 8a343d0f9a1d3d4b4bfe622ed345253d780a3d76
commit-date: 2023-04-18
host: x86_64-unknown-linux-gnu
compiler: singlepass,cranelift,llvm
sh: 1: rustc: not found

Steps to reproduce

sample C program:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  char buf[5];
  int len = 5, ret;

  printf("input something\n");
  printf("> ");
  fflush(stdout);
  ret = read(0, buf, len);
  if (ret > 0) {
    printf("got %.*s (first 5 chars)\n", ret, buf);
  }
  return 0;
}

I built this using wasi-sdk 19.0. (${WASI_SDK_PATH}/bin/clang --sysroot=${WASI_SDK_PATH}/share/wasi-sysroot -o out.wasm main.c)

On host, prompt (>) is printed by fflush.

$ gcc main.c ; ./a.out
input something
> hi
got hi
 (first 5 chars)

On wasmer, prompt (>) isn't printed (isn't flushed).

$ wasmer /app/out.wasm
input something

Expected behavior

fflush works and buffered strings are printed.

Actual behavior

fflush doesn't work and buffered strings aren't printed.

Additional context

@syrusakbary
Copy link
Member

/bounty $150

@algora-pbc
Copy link

algora-pbc bot commented Apr 23, 2023

💎 $150.00 bounty created by syrusakbary
🙋‍♂️ If you'd like to solve this issue, comment below to get assigned
👉 To claim this bounty, submit a pull request that includes the text /claim #3790 somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to wasmerio/wasmer!

@mucinoab
Copy link
Contributor

I am working on this.

I was able to reproduce the issue in Rust with

// Compile with "cargo build --target wasm32-wasi --release"
use std::io::{self, Write};

fn main() {
    let mut buffer = String::new();
    let mut stdout = io::stdout().lock();
    let stdin = io::stdin();

    stdout.write(b"Input something\n").unwrap();
    stdout.write(b"> ").unwrap();
    stdout.flush().unwrap();

    stdin.read_line(&mut buffer).unwrap();
}

@ibrahim-akrab
Copy link
Contributor

I am trying to look into this but the codebase is quite large I keep jumping from one file to another. Any directions about where the execution of the wasm file happens or where it is passed to the runtime would be much appreciated.

@mucinoab
Copy link
Contributor

mucinoab commented Apr 24, 2023

This issue is basically the same as #363 but it seem that if I run the above reproduction it does not hit the code path that fixed that issue.

Also, it seems that my explicit flush in the example is not triggering any flush, nor sync syscalls. The only time that we flush is when closing the files in here.

For example

use std::io::{self, Write};

fn main() {
    let mut stdout = io::stdout().lock();
    stdout.flush().unwrap();
    
    stdout.write(b"1").unwrap();
    stdout.flush().unwrap();
    
    stdout.write(b"22").unwrap();
    stdout.flush().unwrap();
    
    stdout.write(b"333").unwrap();
    stdout.flush().unwrap();
    
    stdout.flush().unwrap();
    stdout.flush().unwrap();
}

We get the following trace, again, note the lack of flush or sync.

TRACE ThreadId(01) mio::poll: registering event source with poller: token=Token(2147483649), interests=READABLE    
DEBUG ThreadId(01) wasmer_wasix::fs: Initializing WASI filesystem
TRACE ThreadId(01) wasmer_wasix::state::func_env: module::export - memory
TRACE ThreadId(01) wasmer_wasix::state::func_env: module::export - _start
TRACE ThreadId(01) wasmer_wasix::state::func_env: module::export - __main_void
TRACE ThreadId(01) wasmer_wasix::state::func_env: module::import - wasi_snapshot_preview1::fd_write
TRACE ThreadId(01) wasmer_wasix::state::func_env: module::import - wasi_snapshot_preview1::environ_get
TRACE ThreadId(01) wasmer_wasix::state::func_env: module::import - wasi_snapshot_preview1::environ_sizes_get
TRACE ThreadId(01) wasmer_wasix::state::func_env: module::import - wasi_snapshot_preview1::proc_exit
TRACE ThreadId(01) fd_write: wasmer_wasix::syscalls::wasi::fd_write: return=Errno::success fd=1 nwritten=1
TRACE ThreadId(01) fd_write: wasmer_wasix::syscalls::wasi::fd_write: close time.busy=335µs time.idle=40.3µs fd=1 nwritten=1
TRACE ThreadId(01) fd_write: wasmer_wasix::syscalls::wasi::fd_write: return=Errno::success fd=1 nwritten=2
TRACE ThreadId(01) fd_write: wasmer_wasix::syscalls::wasi::fd_write: close time.busy=136µs time.idle=22.7µs fd=1 nwritten=2
TRACE ThreadId(01) fd_write: wasmer_wasix::syscalls::wasi::fd_write: return=Errno::success fd=1 nwritten=3
TRACE ThreadId(01) fd_write: wasmer_wasix::syscalls::wasi::fd_write: close time.busy=157µs time.idle=9.96µs fd=1 nwritten=3
TRACE ThreadId(01) wasmer_wasix::state::func_env: wasi[1:1]::cleanup - destroying local thread variables
TRACE ThreadId(01) wasmer_wasix::state::env: wasi[1]:: cleaning up open file handles
122333 TRACE ThreadId(01) wasmer_wasix::fs: closing file descriptor fd=1 inode=3 ref_cnt=1
TRACE ThreadId(01) wasmer_wasix::fs: closing file descriptor fd=0 inode=2 ref_cnt=1
TRACE ThreadId(01) wasmer_wasix::fs: closing file descriptor fd=3 inode=1 ref_cnt=2
TRACE ThreadId(01) wasmer_wasix::fs: closing file descriptor fd=2 inode=4 ref_cnt=1

Is the compiler not generating the flush call? or is getting optimized away? or are we not picking up the call?

Using compiler-explorer I can see that the call is there. See: https://godbolt.org/z/bxhG66o6T

@syrusakbary what is your intuition about all this or who could may know? Thanks.

@mucinoab
Copy link
Contributor

@ibrahim-akrab Turn on the tracing logs to get some idea.

wasmer -d program.wasm

@syrusakbary
Copy link
Member

syrusakbary commented Apr 24, 2023

The Wasm file is likely properly generated. So it's likely is an issue in the system call layer (wasmer wasix)

@ibrahim-akrab
Copy link
Contributor

Also, it seems that my explicit flush in the example is not triggering any flush, nor sync syscalls. The only time that we flush is when closing the files in here.

The problem kind of lies here. Wasmer waits for sync syscall to flush the output but the sync syscall doesn't get generated even when using native binary. And we shouldn't see any flush call either since all file operations are translated to syscalls and the flush is simply a way to tell libc to execute the write syscall now with the data present in the buffer. Wasmer should behave similarly by flushing on every fd_write syscall.

When tracing the program execution with strace we see the same file syscalls across the native binary and the wasmer executed wasm executed in the same order. The only difference is that when wasmer executes the fd_write syscall, it doesn't print the output due to the previously mentioned missing flush.

I fixed the issue by adding

handle.flush().await.map_err(map_io_error)?;

before the Ok in the match arm:

Kind::File { handle, .. } => {
if let Some(handle) = handle {
let handle = handle.clone();
drop(guard);
let written = wasi_try_ok!(__asyncify_light(
env,
if fd_entry.flags.contains(Fdflags::NONBLOCK) {
Some(Duration::ZERO)
} else {
None
},
async {
let mut handle = handle.write().unwrap();
if !is_stdio {
handle
.seek(std::io::SeekFrom::Start(offset as u64))
.await
.map_err(map_io_err)?;
}
let mut written = 0usize;
for iovs in iovs_arr.iter() {
let buf = WasmPtr::<u8, M>::new(iovs.buf)
.slice(&memory, iovs.buf_len)
.map_err(mem_error_to_wasi)?
.access()
.map_err(mem_error_to_wasi)?;
let local_written = match handle.write(buf.as_ref()).await {
Ok(s) => s,
Err(_) if written > 0 => break,
Err(err) => return Err(map_io_err(err)),
};
written += local_written;
if local_written != buf.len() {
break;
}
}
Ok(written)

In fact, this also fixes #3814 since it is the same problem.

However, I won't be opening a PR for it since @mucinoab did most of the work so it's only fair that he gets the bounty.

@mucinoab
Copy link
Contributor

@ibrahim-akrab Nice work. I will confirm your findings and open the PR later today.

We can share the bounty.

Thanks.

@theduke
Copy link
Contributor

theduke commented Apr 24, 2023

Note that we do not want to just flush on every write, that would penalize performance for the common case.

@ibrahim-akrab
Copy link
Contributor

@theduke The thing is that every write syscall is a flushing write. You can't expect to tell the kernel to write to a file then it keeps that piece of information around until you flush it (somehow).

Writing then flushing is an abstraction layer provided by programming languages in order to have control over when the actual syscall is called.

That's why the wasmer implementation of fd_write should flush on every write in order to behave like native binaries.

@mucinoab
Copy link
Contributor

Writing then flushing is an abstraction layer provided by programming languages in order to have control over when the actual syscall is called.

For example, see this item in Rust:
https://doc.rust-lang.org/std/io/struct.BufWriter.html#

Or how the Python print has a flush argument.
https://docs.python.org/3/library/functions.html?highlight=flush#print

But I think the main problem to solve here is why the fflush / flush is not triggering anything. right?

@ibrahim-akrab
Copy link
Contributor

ibrahim-akrab commented Apr 24, 2023

It is triggering the write syscall at the specific point in program execution. And that's it's sole job, to make sure anything left in the buffer gets passed to a write syscall. So it's working as intended.

What's not working as intended is wasmers fd_write since at the end of a syscall, we should be sure buffer data is all written from memory to the specific file handler. Of course that's not what happens.

The current implementation leaves it to the compiler to decide when to write the data to the file, instead of choosing manually when to write to make it compatible with native binary's execution

@algora-pbc
Copy link

algora-pbc bot commented Apr 25, 2023

💡 mucinoab submitted a pull request that claims the bounty. You can visit your org dashboard to reward.
👉 @mucinoab: To receive payouts, sign up on Algora, link your Github account and connect with Stripe on your dashboard.

@ptitSeb ptitSeb added this to the v3.3 milestone Apr 25, 2023
@ptitSeb
Copy link
Contributor

ptitSeb commented Apr 27, 2023

@mucinoab we are waiting for some testing code. Please add it quickly or we will resolve the ticket ourself and the bounty will not be distributed.

@mucinoab
Copy link
Contributor

@ibrahim-akrab please take over, I am too busy right now to deal with the urgency.

Thanks.

@algora-pbc
Copy link

algora-pbc bot commented May 2, 2023

🎉🎈 @mucinoab has been awarded $150.00! 🎈🎊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants