runtime: epoll scalability problem with 192 core machine and 1k+ ready sockets #65064

Open
prattmic opened this issue Jan 11, 2024 · 48 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux Performance Scalability Issues related to runtime/application scalability
Milestone

Comments

@prattmic
Member

prattmic commented Jan 11, 2024

Split from #31908 (comment) and full write-up at https://jazco.dev/2024/01/10/golang-and-epoll/.

tl;dr is that a program on a 192 core machine with >2500 sockets and with >1k becoming ready at once results in huge costs in netpoll -> epoll_wait (~65% of total CPU).

Most interesting is that sharding these connections across 8 processes seems to solve the problem, implying some kind of super-linear scaling.

Given that the profile shows the time spent in epoll_wait itself, this may be a scalability problem in the kernel, but we may still be able to mitigate it.

@ericvolp12, some questions if you don't mind answering:

  • Which version of Go are you using? And which kernel version?
  • Do you happen to have a reproducer for this problem that you could share? (Sounds like no?)
  • On a similar note, do you have a perf profile of this problem that shows where the time in the kernel is spent?
  • The 128 event buffer size is mentioned several times, but it is not obvious to me that increasing this size would actually solve the problem. Did you try increasing the size and see improved results?

cc @golang/runtime

@prattmic prattmic added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 11, 2024
@prattmic prattmic added this to the Backlog milestone Jan 11, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 11, 2024
@prattmic
Member Author

There is a small chance that #56424 is related, though it seems unlikely as that was at a much smaller scale.

@ericvolp12

ericvolp12 commented Jan 11, 2024

  • Which version of Go are you using? And which kernel version?

We're running on the golang:1.21-bullseye docker image base which is currently using: go version go1.21.6 linux/amd64, kernel version 5.15.0-91-generic on Ubuntu

  • Do you happen to have a reproducer for this problem that you could share? (Sounds like no?)

We don't have a reproducer for this problem right now unfortunately, but our suspicion is that it should be easy to replicate by serving or making hundreds of thousands of fast network requests in a go application using TCP.

  • On a similar note, do you have a perf profile of this problem that shows where the time in the kernel is spent?

We don't have a perf profile unfortunately, most of our discovery was done via pprof profiles from the running binary and testing different configurations (4, then 8 containers per host).

  • The 128 event buffer size is mentioned several times, but it is not obvious to me that increasing this size would actually solve the problem. Did you try increasing the size and see improved results?

We did not try increasing the buffer size, it wasn't apparent there was a way to do that without running a custom build of Go and at the time running more than one container was a more accessible solution for us.

Thanks for looking into this, it was definitely an interesting thing to find in the wild!

@whyrusleeping

whyrusleeping commented Jan 11, 2024

For some more context, the EpollWait time in the profile was 2800 seconds on a 30 second profile.

Also I don't necessarily think that the epoll buffer itself is the problem, rather just how epoll works under the hood with thousands of 'ready' sockets and hundreds of threads.

The application under load had around 3500 open sockets, http2 clients making requests to our grpc service on one end and us making requests to scyllaDB on the other.
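
For a sense of scale, the 2800 seconds figure above can be converted into an average core count. This is only a back-of-the-envelope sketch: the helper name `avgBusyCores` is made up for illustration, and it assumes the profile attributes CPU time uniformly over the 30 second window. It works out to roughly 93 cores' worth of time in EpollWait, a little under half of a 192 core machine.

```go
package main

import "fmt"

// avgBusyCores converts the cumulative CPU seconds a profile attributes to a
// function into the average number of cores that function kept busy.
// (Hypothetical helper, not part of pprof.)
func avgBusyCores(cpuSeconds, wallSeconds float64) float64 {
	return cpuSeconds / wallSeconds
}

func main() {
	// Figures from the comment above: 2800s of EpollWait in a 30s profile.
	busy := avgBusyCores(2800, 30)
	fmt.Printf("average cores in EpollWait: %.1f\n", busy)
	fmt.Printf("share of a 192 core machine: %.1f%%\n", 100*busy/192)
}
```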

@prattmic
Member Author

Thanks for the details! I'll try to write a reproducer when I have some free time, not sure when I'll get to it.

it wasn't apparent there was a way to do that without running a custom build of Go

Indeed, you'd need to manually modify the runtime. Note that it is possible to simply edit the runtime source in GOROOT and rebuild your program (no special steps are required for the runtime; it is treated like any other package). But if you build in a Docker container, it is probably a pain to edit the runtime source.

@prattmic
Member Author

Some thoughts from brainstorming for posterity:

My best theory at the moment (though I'd really like to see perf to confirm) is that ~90 threads are calling epoll_wait at once (probably at this non-blocking netpoll: https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=3230;drc=dcbe77246922fe7ef41f07df228f47a37803f360). The kernel has a mutex around the entire copy-out portion of epoll_wait, so there is probably a lot of time waiting for the mutex. If that is the case, some form of rate-limiting on how many threads make the syscall at once may be effective. N.B. that this non-blocking netpoll is not load-bearing for correctness, so occasionally skipping it would be OK.
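
The rate-limiting idea can be sketched in user space as follows. This is an illustration only, not runtime code and not a description of any particular CL: `pollLimiter`, `tryPoll`, and the limit of 1 are hypothetical names and values, and the empty function stands in for a non-blocking epoll_wait. The point it shows is that a caller that finds too many polls already in flight simply skips its own poll, which is acceptable when the poll is optional.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pollLimiter caps how many callers may run an optional poll at the same time.
// (Hypothetical type for illustration.)
type pollLimiter struct {
	inFlight atomic.Int32
	limit    int32
}

// tryPoll runs poll only if fewer than limit polls are currently in flight.
// It reports whether poll ran; a false result means the poll was skipped.
func (l *pollLimiter) tryPoll(poll func()) bool {
	if l.inFlight.Add(1) > l.limit {
		l.inFlight.Add(-1) // over the limit: back out and skip
		return false
	}
	defer l.inFlight.Add(-1)
	poll()
	return true
}

func main() {
	lim := &pollLimiter{limit: 1}
	var ran, skipped atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < 64; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// The empty func is a stand-in for a non-blocking epoll_wait.
			if lim.tryPoll(func() {}) {
				ran.Add(1)
			} else {
				skipped.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("polls run:", ran.Load(), "skipped:", skipped.Load())
}
```

The split between run and skipped polls depends on goroutine scheduling, so it varies from run to run.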

@whyrusleeping

Yeah, it was the netpoll call inside findRunnable (though I didn't have my source mapping set up at the time to confirm the exact line numbers).
I overwrote the profile I took from the degenerate case, unfortunately. If helpful, we can probably reorient things back down to a single process per machine and run some tests with perf.

I've also got a spare test machine with the same CPU i can use to try out a repro test case as well.

@sschepens

Is Go using the same epoll instance across all threads? That might be the underlying problem. Most high-throughput applications (nginx, envoy, netty) create several instances (usually one per thread, together with an event loop), and connections get distributed to all epoll instances some way or another.

@panjf2000
Member

panjf2000 commented Jan 12, 2024

Is Go using the same epoll instance across all threads? That might be the underlying problem. Most high-throughput applications (nginx, envoy, netty) create several instances (usually one per thread, together with an event loop), and connections get distributed to all epoll instances some way or another.

Good point! And to answer your question, yes, Go has been using the single (and global) epoll/kqueue/poll instance internally since the day Go netpoll was introduced. I actually had this concern for a few years, but never got a chance to spot that kind of performance bottleneck emerge. What I had in mind is that we can make a transition from single epoll instance to per-P epoll instances, or just multiple global epoll instances simply, which could also help.

From where I stand, I reckon that refactoring the current epoll from a single instance to multiple instances would require much less work than introducing io_uring. What is more, given the current Go codebase, io_uring is better suited for file I/O than for network I/O. Oh boy, I can already imagine now how many obstacles we'll have to go through before io_uring is implemented for network I/O eventually, and also transparently.

To sum up, multiple epoll instances should be able to gain sufficient credits for the performance boost of network I/O, and in consideration of the complexity from introducing io_uring for network I/O, I think the former is more feasible at this stage.
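
As a rough picture of what "multiple epoll instances" means for registration, the sketch below distributes file descriptors over a fixed number of pollers. Everything in it is an assumption made for illustration: `shardedPoller` is a hypothetical type, the slices stand in for real epoll instances, and choosing the instance by `fd % n` is just one simple policy, not something the Go runtime does.

```go
package main

import "fmt"

// shardedPoller is a hypothetical stand-in for several independent epoll
// instances. shards[i] lists the fds registered with instance i.
type shardedPoller struct {
	shards [][]int
}

func newShardedPoller(n int) *shardedPoller {
	return &shardedPoller{shards: make([][]int, n)}
}

// shardFor picks the instance for a non-negative fd. Modulo is an assumed
// policy; a per-P design would use the registering P instead.
func (p *shardedPoller) shardFor(fd int) int {
	return fd % len(p.shards)
}

// register records fd with its instance and returns the instance index.
// A real implementation would call epoll_ctl(EPOLL_CTL_ADD) here.
func (p *shardedPoller) register(fd int) int {
	i := p.shardFor(fd)
	p.shards[i] = append(p.shards[i], fd)
	return i
}

func main() {
	p := newShardedPoller(8)
	// Roughly the socket count reported earlier in the thread.
	for fd := 3; fd < 3+3500; fd++ {
		p.register(fd)
	}
	for i, s := range p.shards {
		fmt.Printf("instance %d: %d fds\n", i, len(s))
	}
}
```

With 8 instances, each one ends up watching about 437 descriptors instead of all 3500, so any per-instance lock in the kernel is contended by fewer threads.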

@sschepens

sschepens commented Jan 12, 2024

Would using multiple epoll instances mean that connections or fds would now be bound to a single thread? Does this mean that connection imbalances could happen, where some threads handle many long-lived connections while others are mostly idle?

@panjf2000
Member

panjf2000 commented Jan 13, 2024

Would using multiple epoll instances mean that connections or fds would now be bound to a single thread? Does this mean that connection imbalances could happen, where some threads handle many long-lived connections while others are mostly idle?

This is one of the potential issues we may encounter and need to resolve if we decide to introduce multiple epoll instances for Go runtime. But I don't think it's going to be our big concern cuz there are ways for us to mitigate that, for instance, the work-stealing mechanism, or just to put surplus tasks in the global run queue.

I actually drafted a WIP implementation of multiple epoll/kqueue/poll instances a long time ago on my local computer, and I can take on this if we eventually decide to introduce multiple netpollers after the root cause of this issue has been revealed.

@panjf2000 panjf2000 added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 13, 2024
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 13, 2024
@panjf2000 panjf2000 added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 13, 2024
@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 13, 2024
@errantmind

errantmind commented Jan 13, 2024

A casual observation (not go specific): one reason epoll doesn't scale well when a single epoll instance is shared across threads is the file descriptor table, which is typically shared across the process. This is one of the reasons why, say, 8 separate processes usually performs better than a single process with 8 threads. The impact is present both with multiple epoll instances (per thread), or a single epoll instance shared across threads. The way to circumvent this is to unshare (syscall) the file descriptor table across threads upon thread creation, then create an epoll instance per thread. This yields similar performance to a multi process approach (within 1% in my experience). After that you can distribute the work however you want, maybe with SO_REUSEPORT. Also, be careful unsharing the file descriptor table, it is not appropriate for all situations.

Side note: if you are sharing an epoll instance across threads, you should use edge-triggered mode to avoid waking up all threads, most of them unnecessarily.

This is my experience anyway when using a thread per core model, although the principle would apply regardless of the number of threads. I don't know anything about go internals so I'll leave it there.

@bwerthmann

bwerthmann commented Jan 15, 2024

I don't want to derail this issue, let me know if I should move this to a separate bug...

We are seeing a similar issue on a system with 128 cores, where we're only reading from 96 Unix sockets, one per goroutine. Go was spending much of its time in netpoll -> epoll_wait, and perf top reported that much of the time in the kernel was spent in osq_lock.

I'm looking for the profiles from the Go app. In the meantime, I can share that we reproduced this issue with a simple socat invocation:

image

I wrote a workaround that does not invoke netpoll at all, instead it just makes raw syscalls and throughput improved by 5x (which is the next bottleneck in the App, also related to Mutex being slow). My microbenchmark with raw syscalls just reading from unix sockets performed >10x in terms of bandwidth with the 96 producers.

Let me know if there's anything I can do to help.

@bwerthmann

These kernel patches may be of interest:
locking/osq_lock: Fix false sharing of optimistic_spin_node in osq_lock, may not be accepted yet?

https://lore.kernel.org/lkml/20230615120152.20836-1-guohui@uniontech.com/

@panjf2000
Member

I wrote a workaround that does not invoke netpoll at all, instead it just makes raw syscalls and throughput improved by 5x

Just to make sure I don't misread what you said: you achieved that by using raw syscalls such as socket(), bind(), listen(), connect(), read(), write(), etc. instead of the APIs provided by std net, right?
@bwerthmann

@bwerthmann

I wrote a workaround that does not invoke netpoll at all, instead it just makes raw syscalls and throughput improved by 5x

Just to make sure I don't misread what you said: you achieved that by using raw syscalls such as socket(), bind(), listen(), connect(), read(), write(), etc. instead of the APIs provided by std net, right?
@bwerthmann

Correct. I'll ask today if I can share an example.
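
To make the "raw syscalls" approach concrete, here is a minimal sketch, assuming a Unix-like system. It is not the workaround described above, whose code was not shared; `roundTrip` is a hypothetical helper. It creates a Unix socket pair with the `syscall` package and does blocking writes and reads directly on the descriptors, so they are never wrapped in a `net.Conn` or registered with the netpoller.

```go
package main

import (
	"fmt"
	"syscall"
)

// roundTrip writes msg to one end of a Unix socket pair and reads it back
// from the other end using blocking syscalls only. The descriptors stay in
// blocking mode and are never handed to package net, so netpoll (and
// therefore epoll) is not involved.
func roundTrip(msg []byte) ([]byte, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	if _, err := syscall.Write(fds[0], msg); err != nil {
		return nil, err
	}
	buf := make([]byte, len(msg))
	n, err := syscall.Read(fds[1], buf) // blocks the calling OS thread
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

func main() {
	got, err := roundTrip([]byte("ping"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("read %q without touching netpoll\n", got)
}
```

The trade-off is that every blocked read or write occupies an OS thread for its duration, which is tolerable for a fixed, small number of sockets (such as the 96 mentioned above) but does not scale to thousands of connections.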

@valyala
Contributor

valyala commented Jan 17, 2024

I think it would be great if Go runtime could maintain a separate epoll file descriptor (epfd) per each P. Then every P could register file descriptors in its own local epfd and call epoll() on it when its local list of goroutines ready to run becomes empty and it needs to find runnable goroutine. This scheme has the following benefits:

  • Goroutines, which work with network, will tend to stay on the same P, since the file descriptors created by the goroutine are registered in P-local epfd. Even if the goroutine migrates to another P for some reason, it will migrate to the original P after the next network IO. This improves locality of data accessed by the goroutine, so it remains for longer in P-local CPU caches. This should improve the overall performance, since access to local CPU caches is usually faster than access to shared memory.
  • This should improve scalability of epoll() calls, since every P will poll its own epfd, thus removing bottlenecks related to access synchronization to shared epfd in kernel space.

Such a scheme may result in imbalance of goroutines among P workers, if a single goroutine creates many network connections (e.g. server accept loop). Then the Go scheduler will migrate all the goroutines, which make IO on these connections, to the original P where the original goroutine created all these network connections. This can be solved by periodic even re-distribution of the registered network connections among P-local epfds. For example, if P cannot find ready to run goroutines in local queue and in local epfd, then it can steal a few network connections from the busiest P, to de-register them from that P's epfd and then to register them in local epfd. The busiest P can be determined from some rough per-P CPU usage stats.
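
The re-distribution step described above could look roughly like the sketch below. It is a simplified model, not a proposal for the runtime: each slice stands in for one P-local epfd, `busiest` and `steal` are hypothetical helpers, and "busiest" is approximated by the number of registered fds rather than the per-P CPU usage stats suggested in the comment.

```go
package main

import "fmt"

// busiest returns the index of the shard with the most registered fds.
// Using fd count as the load signal is an assumption made for brevity.
func busiest(shards [][]int) int {
	best := 0
	for i, s := range shards {
		if len(s) > len(shards[best]) {
			best = i
		}
	}
	return best
}

// steal moves up to n fds from the busiest shard to the idle shard and
// returns how many were moved. In a real poller each move would be an
// EPOLL_CTL_DEL on the source epfd followed by an EPOLL_CTL_ADD on the
// idle shard's epfd.
func steal(shards [][]int, idle, n int) int {
	from := busiest(shards)
	if from == idle {
		return 0
	}
	if n > len(shards[from]) {
		n = len(shards[from])
	}
	cut := len(shards[from]) - n
	shards[idle] = append(shards[idle], shards[from][cut:]...)
	shards[from] = shards[from][:cut]
	return n
}

func main() {
	// An accept loop running on shard 0 registered every connection locally.
	shards := [][]int{{10, 11, 12, 13, 14, 15}, {}, {}}
	moved := steal(shards, 1, 2)
	fmt.Println("moved:", moved, "shards:", shards)
}
```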

@aclements
Member

I agree that most likely we need multiple epoll FDs, with some sort of affinity.

@bwerthmann , since you're able to get perf profiles, could you get one with perf record -g? I'd love to see where the osq_lock call is coming from to confirm the hypothesis.

It would be really helpful if someone could create a benchmark that reproduces this issue. If it can be done with only 96 UNIX domain sockets, it may not even be especially hard.

@aclements
Member

If we want to go deep here, it might even be possible for the Go scheduler to become RX queue aware using sockopts like SO_INCOMING_CPU or SO_INCOMING_NAPI_ID. I suspect we can do a lot better without bringing in that complexity, but it's an interesting opportunity to consider.

@bwerthmann

bwerthmann commented Jan 23, 2024

@aclements profile as requested. Taken with go1.21.5 on a 128 core machine:
image

@ianlancetaylor
Member

io_uring is #31908.

@harshavardhana
Contributor

Is anybody interested in seeing whether https://go.dev/cl/564197 fixes the problem? To be clear, I'm not going to submit it unless I have some reason to think that it helps. Thanks.

I have some cycles to test this out, as we have a reproducer of sort that generates random latencies.

harshavardhana added a commit to harshavardhana/minio that referenced this issue Jul 29, 2024
epoll contention on TCP causes latency build-up when
we have high volume ingress. This PR is an attempt to
relieve this pressure.

upstream issue golang/go#65064
seems to be a deeper problem, haven't yet tried the fix
provide in this issue but however this change without
changing the compiler helps. Of course this is a workaround.
harshavardhana added a commit to minio/minio that referenced this issue Jul 29, 2024
epoll contention on TCP causes latency build-up when
we have high volume ingress. This PR is an attempt to
relieve this pressure.

upstream issue golang/go#65064
It seems to be a deeper problem; haven't yet tried the fix
provide in this issue, but however this change without
changing the compiler helps. 

Of course, this is a workaround for now, hoping for a
more comprehensive fix from Go runtime.
@prattmic prattmic added the Scalability Issues related to runtime/application scalability label Dec 23, 2024
@amwolff
Contributor

amwolff commented Jan 22, 2025

okay, I found a reliable though somewhat clunky reproducer and tested with EPYC 7443P and EPYC 9754 and go1.23.5 and go1.23.5+https://go.dev/cl/564197.

Reproducer: random-socket-reader

EPYC 9754 (128 cores, 256 threads)

$ uname -srvmpio
Linux 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

go1.23.5

EPYC9754_go1.23.5.png

flamegraph image | profile (pprof.host | download)

go1.23.5+564197

EPYC9754_go1.23.5+564197.png

flamegraph image | profile (pprof.host | download)

EPYC 7443P (24 cores, 48 threads)

$ uname -srvmpio
Linux 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

go1.23.5

EPYC7443P_go1.23.5.png

flamegraph image | profile (pprof.host | download)

go1.23.5+564197

EPYC7443P_go1.23.5+564197.png

flamegraph image | profile (pprof.host | download)

testing "process"

I haven't done this in a while now, so if anyone wants to double-check if I compiled go1.23.5+564197 correctly, here's a list of steps:

$ git clone https://go.googlesource.com/go
$ cd go/
$ git checkout go1.23.5
$ git fetch https://go.googlesource.com/go refs/changes/97/564197/1 && git cherry-pick FETCH_HEAD
$ cd src/
$ ./make.bash
… (go to https://github.com/amwolff/swiss/tree/main/cmd/golang/go/issue65064/random-socket-reader and build the binary using $GOROOT/bin/go build)

It seems like CL 564197 does help in some way, but I'm unsure whether the CPU time spent on locks in findRunnable is now just a proxy for latency and that the program waits for more work. The CPU usage is ~30-40% less while running the updated program on EPYC 9754 and more time spent on the task indicates we don't poll as much, so even though we're slower, we have the capacity to take on more work (?).

$ time ./random-socket-reader_go1.23.5 1000 100 10485760

real    0m24.475s
user    1m37.741s
sys     95m7.423s
$ time ./random-socket-reader_go1.23.5+564197 1000 100 10485760

real    0m33.903s
user    7m28.396s
sys     86m5.832s
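
One way to read the two `time` outputs above is to convert them into average busy cores, (user + sys) / real. The sketch below does only that arithmetic; the `run` type and `avgCores` method are hypothetical names. It gives roughly 237 cores for go1.23.5 and roughly 166 for go1.23.5+564197, about 30% lower, which is in line with the CPU usage drop mentioned above.

```go
package main

import "fmt"

// run holds the three figures reported by time(1), converted to seconds.
type run struct{ wall, user, sys float64 }

// avgCores is total CPU time divided by wall-clock time, i.e. the average
// number of cores the process kept busy.
func (r run) avgCores() float64 { return (r.user + r.sys) / r.wall }

func main() {
	base := run{wall: 24.475, user: 1*60 + 37.741, sys: 95*60 + 7.423}
	patched := run{wall: 33.903, user: 7*60 + 28.396, sys: 86*60 + 5.832}

	fmt.Printf("go1.23.5:        %.1f cores busy on average\n", base.avgCores())
	fmt.Printf("go1.23.5+564197: %.1f cores busy on average\n", patched.avgCores())
	fmt.Printf("reduction:       %.0f%%\n", 100*(1-patched.avgCores()/base.avgCores()))
}
```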

@prattmic @ianlancetaylor if it would be helpful, I'm happy to give you access to the test machines. Also let me know if these pprof profiles are enough or if perf profiles going down to the kernel calls are more helpful here.

@amwolff
Contributor

amwolff commented Jan 23, 2025

I was able to test CL 564197 with my colleagues today on one of our prod services. Here's the result:

Before:

Image

(30s) pprof.host

After:

Image

(30s) pprof.host | (90s) pprof.host

It looks like that CL fully alleviates the epoll issue. I don't know what to think about the runtime still taking up 40% of the CPU time, but there's a good chance this is just an inefficiency of our code and excessive allocations, as it looks like it's mostly GC.

While it seems effective here, I'm still concerned about the potential side effect of slower network I/O. It likely won't matter for most cases, aside from highly intensive I/O workloads like the synthetic test I mentioned earlier, but I'm unsure how to confirm it won't impact Go programs more broadly.

@prattmic
Member Author

@amwolff Thank you for the reproducer and production experiment! If you don't mind, could you share a bit more about the scale of this production workload? How many cores/GOMAXPROCS? How many QPS it is handling (assuming that is a relevant metric)?

@amwolff
Contributor

amwolff commented Jan 23, 2025

@prattmic Yes, of course. To explain the scale of the production workload a little better, a bit more context: one of the services we run is a program that takes the inbound traffic, which is usually one or more large files, processes it, and sends the processed data (expanded by a constant factor) to many smaller servers we call storage nodes. In short: an upload to a machine like the one I took a profile on results in many (100+) connections and uploads to other machines in different locations. I suspect this characteristic is why the scalability problem shows up so well there.

Profiles in my previous comment are from an EPYC 9754 (128 cores, 256 vCPUs=GOMAXPROCS) machine, which we're testing in one of our locations. We typically run a fleet of 7443Ps (24 cores) but would love to see Go scale to higher core count ones for various reasons.

Some data from the time of the experiment:

Network traffic:

Image

Network packets:

Image

QPS:

Image

CPU:

Image

RAM:

Image

Profiles were taken somewhere around the peaks. Let me know if you would like to see anything additional.

For comparison, here's a profile from a 7443P machine (without CL 564197): https://pprof.host/vc40/flamegraph

@prattmic
Member Author

Thanks @amwolff, that's great. The GOMAXPROCS=24 profile is a nice comparison as well.

I don't know what to think about the runtime still taking up 40% of the CPU time, but there's a good chance this is just an inefficiency of our code and excessive allocations, as it looks like it's mostly GC.

Actually, most of this is the scheduler, not the GC. The GC is primarily in runtime.gcBgMarkWorker (see the GC Guide for tips on identifying the GC in profiles), which is only 6.6% of time (vs 4.7% in the GOMAXPROCS=24 profile, so still up a little).

The time under runtime.schedule (~32%) is the Go scheduler. You probably got tripped up by runtime.(*gcControllerState).findRunnableGCWorker, which is related to scheduling the GC workers. This poor scaling is tracked by #68399. Per #68399 (comment), we think we've actually fixed this primary problem with https://go.dev/cl/602477, which is in 1.24. So I recommend trying out 1.24rc2 + https://go.dev/cl/564197, or just patch https://go.dev/cl/602477 into 1.23.

Other parts under runtime.schedule, such as runtime.lock and runtime.resetSpinning look like scaling issues as well. That said, even the GOMAXPROCS=24 profile is spending a lot of time in the scheduler, so I suspect that this application is entering the scheduler very often.

If you are willing to share a short (~1-5s) execution trace from the application that could better show the scheduling behavior. You can email me privately if you don't want to share publicly.

@amwolff
Contributor

amwolff commented Jan 29, 2025

Thanks so much @prattmic. I compiled our application with 1.24rc2+564197 and compared it with 1.23.5+564197 and 1.23.5 compilations in a brief load test. Some observations:

  1. We can immediately serve 2-3x more traffic with 1.24rc2+564197 than with 1.23.5
    1. With the average CPU usage being 60-70% (thoughts on this in the next paragraph)
  2. GC pauses with default GC settings become much more visible
    1. However, this is likely expected and can be tuned and/or optimized

One phenomenon we couldn't explain during the load test is we couldn't get past certain CPU usage and GBit/s of traffic served, but since our testing setup is still a bit ad-hoc, afterward I realized we started to hit some limits (like conntrack's) that would explain all of that. We need to review those, make sure they won't be limiting factors and retest again, but 1.24rc2 + CL 564197 is immediately useful to us. Thank you!

I emailed you with a spreadsheet of traces and profiles. For others, this is the matrix that I sent:

Image

For now, I will limit myself to just posting 1.24rc2+564197 CPU profiles here:

Image

@prattmic
Member Author

Thanks, this is very useful!

Here are two views from ~5ms of the traces.

go1.23.5:

Image

go1.24rc2+564197

Image

In the 1.23.5 trace, all the blank spaces are times when that P is in the scheduler instead of running a goroutine, presumably spending way way too much time in epoll. The trace is actually very interesting because you can see each P run a clump of short-running goroutines between each gap. I think that is the P running all of the things in its local run queue before it needs to go look for more work.

There are lots of goroutines that run for very short periods (1-5us). Those are probably contributing to the epoll problem by entering the scheduler very frequently.

The go1.24rc2+564197 trace looks much better. Densely packed and spending most time running goroutines, even though there are still very short running goroutines around.

From the go1.24rc2+564197 profiles above, I would say that 1.24 + CL 564197 completely solves the scheduling scalability issues for this application. Only 6% of time is in runtime.schedule, which is in range for a normal Go program.

There is definitely still a scalability problem in the GC around managing work buffers. I think that is most related to #21056, so I will post discussion of those symptoms there.

amwolff pushed a commit to amwolff/go that referenced this issue Feb 4, 2025
For golang#65064

Change-Id: Ifecd7e332d2cf251750752743befeda4ed396f33
amwolff pushed a commit to amwolff/go that referenced this issue Feb 6, 2025
For golang#65064

Change-Id: Ifecd7e332d2cf251750752743befeda4ed396f33
@harshavardhana
Contributor

Will CL 564197 be merged anytime soon? Will this be backported to go1.23.x?

@prattmic
Member Author

We will likely merge CL 564197 (or something similar) for Go 1.25.

I don't expect that we will backport to 1.23 or 1.24 [1], as this is primarily a performance improvement.

[1] I know 1.24 isn't released yet, but it is mere days away!

@harshavardhana
Contributor

I don't expect that we will backport to 1.23 or 1.24 [1], as this is primarily a performance improvement.

Okay, we will keep the relevant backports for this locally.

phuslu added a commit to phuslu/go that referenced this issue Feb 13, 2025
harshavardhana added a commit to minio/minio that referenced this issue Mar 17, 2025
epoll contention on TCP causes latency build-up when
we have high volume ingress. This PR is an attempt to
relieve this pressure.

upstream issue golang/go#65064
It seems to be a deeper problem; haven't yet tried the fix
provide in this issue, but however this change without
changing the compiler helps. 

Of course, this is a workaround for now, hoping for a
more comprehensive fix from Go runtime.
amwolff pushed a commit to amwolff/go that referenced this issue Mar 31, 2025
For golang#65064

Change-Id: Ifecd7e332d2cf251750752743befeda4ed396f33
amwolff pushed a commit to amwolff/go that referenced this issue Apr 3, 2025
For golang#65064

Change-Id: Ifecd7e332d2cf251750752743befeda4ed396f33