test_spawn_input fails randomly #1320
testing with kernel 5.11, i've repeated the test 613 times. still not able to reproduce it. [0] https://discuss.circleci.com/t/linux-machine-executor-images-2022-january-q1-update/42831
now that a4e1508 has been merged, i am closing this issue. will reopen it if this failure surfaces again.
because test_spawn_input fails randomly, and we are not able to fix it in a timely manner. `spawn_process()` is still marked experimental. so as an intermediate solution, let's just drop this test. this should help to unblock some PRs. See also scylladb#1320 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
because test_spawn_input fails randomly, and we are not able to fix it in a timely manner. spawn_process() is still marked experimental. so as a temporary measure, let's mark the tests `test_spawn_input` as known failures. so up to 3 failures are allowed when testing it. this should help to unblock some PRs. See also scylladb#1320 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
… test_spawn_input' from Kefu Chai

* testing/entry_point: do not support Boost <= 1.58
* testing: set test name and file in constructor
* testing: do not use global vector for collecting tests
* testing: add boost unit test decorator support
* tests: do not fail spawn_test if up to 3 tests fail

See also #1320

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #1498

* github.com:scylladb/seastar:
  tests: do not fail spawn_test if less or equal to 3 tests fail
  testing: add boost unit test decorator support
  testing: do not use global vector for collecting tests
  testing: set line number for each test
  testing: set test name and file in constructor
  testing/entry_point: do not support Boost <= 1.58
Hi, kefu. I'm also curious about the 'Broken pipe' issue and did some investigation last weekend. Since the error is raised while writing to the child's stdin, I made the test read the child's stderr after the write, to see whether /bin/cat left a message there:

return stdin.write(text).then([&stdin] {
    return stdin.flush();
}).finally([&stderr]() {
    return stderr.read_up_to(1024).then([](temporary_buffer<char> err) {
        seastar_logger.error("read error from /bin/cat: {}", std::string(err.begin(), err.size()));
        return make_ready_future<>();
    });
});

And luckily I reproduced the issue on my machine (thank God!):

% for i in `seq 65535`; do ./build/debug/tests/unit/spawn_test | grep -i 'Broken pipe' && break; done;
WARNING: debug mode. Not for benchmarking or production
INFO 2023-03-13 02:56:52,063 seastar - Reactor backend: linux-aio
WARN 2023-03-13 02:56:52,084 [shard 6] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 0] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 5] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 4] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 2] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 9] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 11] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 10] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 13] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 7] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 1] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 8] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 12] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 3] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 15] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-03-13 02:56:52,084 [shard 14] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO 2023-03-13 02:56:52,118 [shard 0] seastar - Created fair group io-queue-0, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000
INFO 2023-03-13 02:56:52,119 [shard 0] seastar - IO queue uses 0.75ms latency goal for device 0
INFO 2023-03-13 02:56:52,120 [shard 0] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
INFO 2023-03-13 02:56:52,120 [shard 0] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
ERROR 2023-03-13 02:56:52,264 [shard 0] seastar - read error from /bin/cat: /bin/cat: -: Resource temporarily unavailable
/data/Workspace/iSoft/seastar/tests/unit/spawn_test.cc(111): error: in "test_spawn_input": failed to write to stdin: std::system_error (error system:32, Broken pipe)
*** No errors detected
After searching Google, I found several similar issue reports, and this one gave me a hint:

We did make pipe fds non-blocking! Maybe we should set the fds used in the subprocess to blocking by default and leave the right to enable non-blocking mode to the child itself?

int nonblocking = 0;
std::get<pipefd_read_end>(cin_pipe).ioctl(FIONBIO, &nonblocking);
std::get<pipefd_write_end>(cout_pipe).ioctl(FIONBIO, &nonblocking);
std::get<pipefd_write_end>(cerr_pipe).ioctl(FIONBIO, &nonblocking);
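At the POSIX level, the proposal amounts to clearing `O_NONBLOCK` on the pipe end the child inherits while leaving the parent's end alone. The following is a minimal sketch using raw `ioctl(2)` and `fcntl(2)` rather than Seastar's wrapper; the names `is_nonblocking`, `set_nonblocking`, `pipe_modes` and `blocking_read_end_demo` are hypothetical and exist only for this illustration. It relies on the two ends of a pipe being separate open file descriptions, so changing the mode of one end does not change the other.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

// True if O_NONBLOCK is set on the open file description behind fd.
bool is_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags != -1 && (flags & O_NONBLOCK) != 0;
}

// Sets or clears O_NONBLOCK through FIONBIO. The third argument is a pointer
// to an int: zero selects blocking mode, non-zero selects non-blocking mode.
bool set_nonblocking(int fd, bool enable) {
    int value = enable ? 1 : 0;
    return ioctl(fd, FIONBIO, &value) == 0;
}

// Hypothetical result type for the demo below.
struct pipe_modes {
    bool read_end_nonblocking;
    bool write_end_nonblocking;
};

// Creates a pipe, makes both ends non-blocking, then switches only the read
// end (the one a child would inherit as its stdin) back to blocking mode.
pipe_modes blocking_read_end_demo() {
    int fds[2];
    pipe_modes modes{true, true};
    if (pipe(fds) == -1) {
        return modes;  // setup failed; report the pessimistic default
    }
    set_nonblocking(fds[0], true);
    set_nonblocking(fds[1], true);
    set_nonblocking(fds[0], false);  // the child's end becomes blocking
    modes.read_end_nonblocking = is_nonblocking(fds[0]);
    modes.write_end_nonblocking = is_nonblocking(fds[1]);
    close(fds[0]);
    close(fds[1]);
    return modes;
}
```

Calling `blocking_read_end_demo()` should report a blocking read end and a still non-blocking write end, which is the combination the parent wants: its own end keeps working with the reactor, and the child sees an ordinary blocking stdin.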
it turns out tools like "cat" expect fd opened blocking mode. while the pipe fds are always created in non-blocking mode. so in order to appease these tools, let's set the fds passed to the spawned process to blocking mode. Fixes scylladb#1320 Signed-off-by: Jianyong Chen <baluschch@gmail.com> Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
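The "Resource temporarily unavailable" message that /bin/cat printed is the text for `EAGAIN`, which is what `read(2)` reports on an empty non-blocking pipe. A small sketch, independent of Seastar, that reproduces just that condition; the helper name `errno_of_read_on_empty_nonblocking_pipe` is hypothetical:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Returns the errno reported by read(2) on an empty non-blocking pipe whose
// write end is still open, 0 if the read unexpectedly succeeded, or -1 if
// the pipe could not be set up.
int errno_of_read_on_empty_nonblocking_pipe() {
    int fds[2];
    if (pipe(fds) == -1) {
        return -1;
    }
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    char buf[16];
    // Nothing has been written yet, so a blocking read would wait here;
    // a non-blocking read fails immediately instead.
    ssize_t n = read(fds[0], buf, sizeof(buf));
    int err = (n == -1) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A tool that treats any read error as fatal exits at this point, and a parent that writes to the pipe afterwards gets `EPIPE`, which matches the "Broken pipe" seen in the test.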
@balusch hi Jianyong, thank you very much for looking into this issue! both your analysis and fix make sense to me. so i took the liberty of creating a PR based on your proposal -- the only contribution from me is to verify the fix and adapt your reasoning to a commit message. could you help review it?
* pass fds opened in blocking mode to spawned process
* do not tolerate test failures in test_spawn_input

it turns out tools like "cat" expect fd opened blocking mode. while the pipe fds are always created in non-blocking mode. so in order to appease these tools, let's set the fds passed to the spawned process to blocking mode.

Fixes scylladb#1320

Signed-off-by: Jianyong Chen <baluschch@gmail.com>
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
...
WARN 2023-03-15 11:16:33,958 [shard 1] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARNING: unable to mbind shard memory; performance may suffer: Operation not permitted
WARN 2023-03-15 11:16:34,002 [shard 15] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
random-seed=2719424098
ERROR 2023-03-15 11:16:34,408 [shard 0] seastar - read error from /bin/cat: /bin/cat: -: Resource temporarily unavailable
/data/Workspace/iSoft/seastar/tests/unit/spawn_test.cc(108): error: in "test_spawn_input": failed to write to stdin: std::system_error (error system:32, Broken pipe)
/data/Workspace/iSoft/seastar/tests/unit/spawn_test.cc(117): error: in "test_spawn_input": check sstring(echo.get(), echo.size()) == text has failed [ != hello world
]
*** 2 failures are detected in the test module "Master Test Suite"

also reproduced on my machine after an era, but fortunately we have some clues this time -- yes, the same error message as before. So I think the problem is still caused by the non-blocking pipe fds used by the child, and we just didn't fix it correctly. I started to guess that maybe the fds used by the child are still nonblocking, which I cannot check since /bin/cat is a utility without source code, so I wrote my own version:

// mycat.c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>

static void test_nonblocking();

int
main(int argc, char **argv)
{
    test_nonblocking();
    return 0;
}

static void
test_nonblocking()
{
/* Print whether O_NONBLOCK is set on the given descriptor. */
#define TEST(desc, fd)                          \
    do {                                        \
        int status = fcntl(fd, F_GETFL, 0);     \
        assert(status >= 0);                    \
        if (status & O_NONBLOCK) {              \
            printf(#desc " is nonblocking\n");  \
        } else {                                \
            printf(#desc " is blocking\n");     \
        }                                       \
    } while (0)

    TEST(stdin, STDIN_FILENO);
    TEST(stdout, STDOUT_FILENO);
    TEST(stderr, STDERR_FILENO);
#undef TEST
}

It simply prints the blocking mode of stdin/stdout/stderr to stdout, without echoing data back. After replacing /bin/cat in the test with it, the test reported:

/data/Workspace/iSoft/seastar/tests/unit/spawn_test.cc(117): error: in "test_spawn_input": check sstring(echo.get(), echo.size()) == text has failed [stdin is non != hello world

Although the message read from the child's stdout is incomplete, we still know it's nonblocking! Finally I found the reason: we misused the ioctl() wrapper. There are lots of overloads of it:

int ioctl(int request) {
    return ioctl(request, 0);
}
int ioctl(int request, int value) {
    int r = ::ioctl(_fd, request, value);
    throw_system_error_on(r == -1, "ioctl");
    return r;
}
int ioctl(int request, unsigned int value) {
    int r = ::ioctl(_fd, request, value);
    throw_system_error_on(r == -1, "ioctl");
    return r;
}
template <class X>
int ioctl(int request, X& data) {
    int r = ::ioctl(_fd, request, &data);
    throw_system_error_on(r == -1, "ioctl");
    return r;
}
template <class X>
int ioctl(int request, X&& data) {
    int r = ::ioctl(_fd, request, &data);
    throw_system_error_on(r == -1, "ioctl");
    return r;
}

What we need is the last two. So we could replace the previous fix with

std::get<pipefd_read_end>(cin_pipe).template ioctl<int>(FIONBIO, 0);
std::get<pipefd_write_end>(cout_pipe).template ioctl<int>(FIONBIO, 0);
std::get<pipefd_write_end>(cerr_pipe).template ioctl<int>(FIONBIO, 0);

It should work.
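Why the earlier `ioctl(FIONBIO, &nonblocking)` call misfired follows from the two templated overloads quoted in this comment: for an argument that is already a pointer, only the `X&&` overload is viable, with `X` deduced as `int*`, so the wrapper takes the address of a temporary pointer and `FIONBIO` would read part of an address instead of the `int` zero. Below is a self-contained sketch of that deduction. `fake_fd`, `kRequest` and the two helper functions are hypothetical stand-ins that record what would be handed to `::ioctl` instead of making the system call; only the two template signatures are taken from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kRequest = 0;  // hypothetical placeholder for FIONBIO

// Hypothetical stand-in for the fd wrapper: it records the address and the
// int value that the real overloads would pass on to ::ioctl.
struct fake_fd {
    std::uintptr_t passed = 0;  // address that would be given to ::ioctl
    int seen = -1;              // the int the kernel would read from there

    template <class X>
    int ioctl(int /*request*/, X& data) { return record(&data); }

    template <class X>
    int ioctl(int /*request*/, X&& data) { return record(&data); }

private:
    int record(const void* p) {
        passed = reinterpret_cast<std::uintptr_t>(p);
        std::memcpy(&seen, p, sizeof(seen));
        return 0;
    }
};

// Mirrors the first attempt: the argument is already a pointer, so X is
// deduced as int* and &data is the address of a temporary int*, not of
// `nonblocking`.
bool first_attempt_passes_address_of_flag() {
    fake_fd fd;
    int nonblocking = 0;
    fd.ioctl(kRequest, &nonblocking);
    return fd.passed == reinterpret_cast<std::uintptr_t>(&nonblocking);
}

// Mirrors the proposed replacement: X is int, so &data points at an int
// holding 0, which is what FIONBIO expects for blocking mode.
int value_seen_by_second_attempt() {
    fake_fd fd;
    fd.ioctl<int>(kRequest, 0);
    return fd.seen;
}
```

In this sketch the first helper returns `false` (the flag's address never reaches the call) and the second returns `0`. Passing the variable itself, `ioctl(FIONBIO, nonblocking)`, would select the `X&` overload and also hand over the address of an `int`.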
actually, i was reading cat's source code a couple of months ago, but ended up clueless and gave up: https://github.com/coreutils/coreutils/blob/master/src/cat.c
@balusch hi Jianyong, could you prepare a patch? i think it's you who did all the heavy lifting.
Sure, my pleasure.
thank you!
I intended to read it last weekend, but then it occurred to me that cat might leave us some information through stderr. 😃
…rom Jianyong Chen

It turns out tools like "cat" expect fd opened blocking mode, while the pipe fds are always created in non-blocking mode. So in order to appease these tools, let's set the fds passed to the spawned process to blocking mode.

Fixes #1320

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Signed-off-by: Jianyong Chen <baluschch@gmail.com>

Closes #1555

* github.com:scylladb/seastar:
  spawn_test: fix /bin/cat stuck in reading input.
  reactor: pass fd opened in blocking mode to spawned process
not able to reproduce this failure locally with