runtime: os.File 1.2x slower, no longer parallel, in tip compared to go18 (regression) #19586
Comments
This is almost certainly due to the poller. I'd also add that the new poller integration implicitly serializes access to the file descriptor, which has pretty dramatic performance effects on some applications. Unlike sockets, there are legitimate reasons to have multiple reads and/or writes execute concurrently. Here's some output I saw from
For concurrent calls to
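As an illustration of that point (just a sketch - /tmp/data is a hypothetical large file): each ReadAt carries its own offset, so several reads on the same *os.File can usefully be in flight at once, and per-descriptor serialization only makes them run one after another:

package main

import (
    "fmt"
    "os"
    "sync"
)

func main() {
    f, err := os.Open("/tmp/data") // hypothetical large file
    if err != nil {
        panic(err)
    }
    defer f.Close()

    const chunk = 1 << 20 // read 4 x 1MiB chunks concurrently
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            buf := make([]byte, chunk)
            // Each call names its own offset, so no shared cursor needs
            // protecting; serializing the descriptor only costs parallelism.
            n, err := f.ReadAt(buf, int64(i)*chunk)
            fmt.Println("chunk", i, "read", n, "bytes, err =", err)
        }(i)
    }
    wg.Wait()
}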
@philhofer thanks for your feedback. Let me also quote the relevant bits about the new poller locking here, even though it was already described in #19563:
---- 8< ----
---- 8< ----
P.S. Someone please add the "Performance" label to this issue. Thanks.
Related to "no longer parallel" for UDP sockets: #17520
A voice from those affected: it would be a pity to let this regression make it into Go 1.9.
I can speed up
CL https://golang.org/cl/41670 mentions this issue.
Since Pread/Pwrite specify a file offset, using incref is sufficient.
This permits multiple Pread/Pwrite calls in parallel.

Since Pread/Pwrite specify a file offset, it doesn't seem to make sense
to use the poller for them, so don't.

Updates #19586

Change-Id: I676be16bf519b9a45f8e6b1d991c44f10848bc11
Reviewed-on: https://go-review.googlesource.com/41670
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@ianlancetaylor thanks for the feedback, the patch, and the heads-up on #7970. I'm not too familiar with the poller code, but after briefly looking around I have some feedback. I apologize in advance for any mistakes:
Thanks beforehand.

P.S. Updated timings on my side (i7-6600U; Linux 4.9.18-1 (2017-03-30) x86_64) are:
So compared to go18 the slowdown for ReadAt is ~10% (286ns vs 260ns). Note also that, before the patch, tip had become even slower than the original 1.2x regression (ReadAt was 323ns then, 376ns now) - probably something else in the poller slowed down further. For reference, here are go18 and tip profiles for the ReadAt benchmark:
---- 8< ---- (go18)
---- 8< ---- (tip with patch: 34ee8ec)
---- 8< ---- (tip without patch: 8ae60dc)
There is a comment saying that using the poller doesn't make sense for
The existing code works because it acquires the
The
I don't know how much of a real problem that is. Can you open a separate issue for it? Thanks.
I'm not sure there is anything else to be done here. I'll leave this open for a few days for comments.
Thanks for the feedback.
so even though NFS is not there
---- 8< ----
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
    int fd, epollfd;
    off_t off;
    struct epoll_event ev;

    fd = open("/proc/kmsg", O_RDWR);
    if (fd == -1) {
        perror("open");
        abort();
    }

    off = lseek(fd, 1, SEEK_CUR);
    if (off == -1) {
        perror("lseek");
        abort();
    }

    epollfd = epoll_create1(0);
    if (epollfd == -1) {
        perror("epoll_create1");
        abort();
    }

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        perror("epoll_ctl: add");
        abort();
    }
}
---- 8< ----
static const struct file_operations proc_reg_file_ops = {
    .llseek = proc_reg_llseek,
    .read = proc_reg_read,
    .write = proc_reg_write,
    .poll = proc_reg_poll,
    .unlocked_ioctl = proc_reg_unlocked_ioctl,
#ifdef CONFIG_COMPAT
    .compat_ioctl = proc_reg_compat_ioctl,
#endif
    .mmap = proc_reg_mmap,
    .get_unmapped_area = proc_reg_get_unmapped_area,
    .open = proc_reg_open,
    .release = proc_reg_release,
};
So once again this shows that being seekable and being pollable are not, in general, mutually exclusive. From this point of view I would check the return from https://github.com/golang/go/blob/516e6f6d/src/os/file_unix.go#L173 and put that information into a flag, so that later it is known whether the file descriptor was set to non-blocking / registered with epoll.

Thanks again,
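A minimal, Linux-only sketch of that flag idea (using raw epoll syscalls rather than the runtime poller, with /proc/self/mounts only as an example path): try the registration and record whether it succeeded.

package main

import (
    "fmt"
    "syscall"
)

// registerWithPoller tries to add fd to an epoll instance. If EPOLL_CTL_ADD
// fails (e.g. with EPERM for regular files), the descriptor should be used
// with plain blocking syscalls instead of going through the poller.
func registerWithPoller(epfd, fd int) error {
    ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
    return syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev)
}

func main() {
    epfd, err := syscall.EpollCreate1(0)
    if err != nil {
        panic(err)
    }
    fd, err := syscall.Open("/proc/self/mounts", syscall.O_RDONLY, 0)
    if err != nil {
        panic(err)
    }
    pollable := registerWithPoller(epfd, fd) == nil
    fmt.Println("pollable:", pollable) // the flag that could be remembered per descriptor
}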
For reference, here is what happens with Go on /proc/kmsg:

package main

import "os"

func main() {
    f, err := os.Open("/proc/kmsg")
    if err != nil {
        panic(err)
    }

    buf := make([]byte, 128)
    _, err = f.ReadAt(buf, 0)
    if err != nil {
        panic(err)
    }
}
Thanks for the example.
By the way, as far as I can tell
CL https://golang.org/cl/42144 mentions this issue.
@ianlancetaylor thanks for the feedback. I did not notice

#include <sys/stat.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#define PATH "/proc/self/mounts"
off_t xlseek(int fd, off_t off, int whence);
int readall(int fd, char *buf, int size);
void preadall(int fd, char *buf, int size);
int main() {
    int fd, epollfd, n;
    struct epoll_event ev;

    fd = open(PATH, O_RDONLY);
    if (fd == -1) {
        perror("open");
        abort();
    }
    xlseek(fd, 0, SEEK_CUR);
    fprintf(stderr, "%s: seekable\n", PATH);

    epollfd = epoll_create1(0);
    if (epollfd == -1) {
        perror("epoll_create1");
        abort();
    }
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        perror("epoll_ctl: add");
        abort();
    }
    fprintf(stderr, "%s: pollable\n", PATH);

#define N 80    // bytes to read
    char buf1[N], buf2[N];

    n = readall(fd, buf1, N);
    // XXX vvv ignores \0 in buf1
    printf("---- 8< ---- (%d bytes via read(2))\n%.*s\n---- 8< ----\n", n, n, buf1);

    preadall(fd, buf2, n);
    printf("---- 8< ---- (%d bytes via pread(2))\n%.*s\n---- 8< ----\n", n, n, buf2);
}

off_t xlseek(int fd, off_t off, int whence) {
    off = lseek(fd, off, whence);
    if (off == -1) {
        perror("lseek");
        abort();
    }
    return off;
}

// readall reads at most size bytes of fd via read(2) until either all is read or EOF
int readall(int fd, char *buf, int size) {
    int haveread, n;

    xlseek(fd, 0, SEEK_SET);
    haveread = 0;
    while (size > 0) {
        n = read(fd, buf, size);
        if (n == -1) {
            perror("read");
            abort();
        }
        if (n == 0) {
            break;  // eof
        }
        haveread += n;
        size -= n;
        buf += n;
    }
    return haveread;
}

// preadall reads size bytes of fd via pread(2), one byte at a time, backwards
void preadall(int fd, char *buf, int size) {
    off_t off;
    int n;

    for (off = size - 1; off >= 0; off--) {
        n = pread(fd, &buf[off], 1, off);
        if (n == -1) {
            perror("pread");
            abort();
        }
        if (n == 0) {
            fprintf(stderr, "@%ld: unexpected EOF\n", (long)off);
        }
    }
}
x.log:
However, as you say, it "always succeeds" in the sense that EAGAIN is never returned - the polling hook there triggers "readable" only when the mount table changes, e.g. on mount/umount/etc. I've gone through the whole of linux/fs/ to check where/how

I think part of the story here is that the "ready for read" and "ready for write" semantics of poll are tied to streams - the "where" in the questions above is implicitly "at the end". That's probably why networked filesystems such as NFS have no good way to tell callers that a previously-reported-EAGAIN region is now ready to, e.g., read - the operation can return EAGAIN if O_NONBLOCK is set, but with the poll API there is no way to subscribe to that. So this way they actually never return EAGAIN.

I guess the way to handle blocking "regular" IO would be to have some kind of IO request which, after being submitted to the kernel, one could subscribe to for completion - either waiting directly or by passing it somehow to epoll. I'm not sure this functionality currently exists on Linux though...

So about our original topic - EAGAIN being returned to the ReadAt caller - I think:
Sorry for the noise, and for not coming up with the doc patch,
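As a purely user-space illustration of the submit-then-wait-for-completion idea a couple of paragraphs up (a sketch only - not existing kernel or runtime functionality): issue the positioned read in its own goroutine and deliver the completion on a channel, which the caller can wait on or multiplex with select:

package main

import (
    "fmt"
    "os"
)

// result is the hypothetical "completion" of one IO request.
type result struct {
    n   int
    err error
}

// submitReadAt issues f.ReadAt in its own goroutine and returns a channel
// that delivers the completion.
func submitReadAt(f *os.File, buf []byte, off int64) <-chan result {
    done := make(chan result, 1)
    go func() {
        n, err := f.ReadAt(buf, off)
        done <- result{n, err}
    }()
    return done
}

func main() {
    f, err := os.Open("/proc/self/mounts")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    buf := make([]byte, 80)
    done := submitReadAt(f, buf, 0) // "submit the request"
    r := <-done                     // "wait for completion"
    fmt.Printf("read %d bytes, err=%v\n", r.n, r.err)
}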
Ping @ianlancetaylor |
Closing this issue per comment on April 25. My conclusion from the follow-on discussion about /proc/kmsg is that nothing need be done. |
Please answer these questions before submitting your issue. Thanks!
What did you do?
Hello up there. During the discussion in #19563 it was found that in go tip, e.g., os.File.ReadAt became ~1.2x slower compared to go18. Let me quote #19563 (comment) (updated benchmark, diff):

...

The difference between os.File.ReadAt(len(buf)=1) and syscall.Pread(len(buf)=1) was ~10ns from go14 through go18. With go tip it rises to ~70ns, so it is indeed a 60ns regression for os.File.ReadAt. A rough sketch of such a comparison follows below.
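This is a minimal sketch of the kind of comparison being made (not the exact benchmark from #19563; the package name and file path are arbitrary stand-ins):

package readat // hypothetical benchmark package, in a file readat_test.go

import (
    "os"
    "syscall"
    "testing"
)

// Both benchmarks read 1 byte at offset 0; the gap between them is the
// overhead os.File.ReadAt adds on top of the raw pread(2) syscall.

func BenchmarkFileReadAt(b *testing.B) {
    f, err := os.Open("/etc/hosts") // arbitrary readable file
    if err != nil {
        b.Fatal(err)
    }
    defer f.Close()
    buf := make([]byte, 1)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := f.ReadAt(buf, 0); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkSyscallPread(b *testing.B) {
    f, err := os.Open("/etc/hosts") // arbitrary readable file
    if err != nil {
        b.Fatal(err)
    }
    defer f.Close()
    fd := int(f.Fd())
    buf := make([]byte, 1)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := syscall.Pread(fd, buf, 0); err != nil {
            b.Fatal(err)
        }
    }
}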
What did you expect to see?

os.File.ReadAt and friends work at the same speed or faster compared to go18.

What did you see instead?

os.File.ReadAt became slower by 60ns compared to go18.

Does this issue reproduce with the latest release (go1.8)?
No.
This is probably related to recently introduced runtime poller.
Thanks beforehand,
Kirill
/cc @ianlancetaylor
System details