
Use fasthttp instead of net/http #37

Draft · wants to merge 66 commits into master

Conversation

@kkty commented Sep 9, 2019

@kkty (Author) commented Sep 11, 2019

Performance comparison between the original implementation and the new one

The following graph compares the throughput of the original implementation (with net/http) and the new one (with fasthttp).

The CPU usage limit was set to 100%, 125%, 150%, 175%, and 200% for each implementation, so the graph has 2 × 5 = 10 lines. The other parameters passed to the benchmark application can be seen in the legend.

The x-axis represents how much load was placed (the number of concurrent requests made during a test), and the y-axis represents the throughput (the number of successful responses during a test). A minimal sketch of such a load generator is given after the figure below.

The commit values, which can also be seen in the legend, show the hash of the most recent commit when the benchmark was run (refer to the "master" and "use-fasthttp" branches at https://github.com/kkty/chocon to see the corresponding commits). Specifically, the lines with "commit": "a3b1…" correspond to the new implementation, and the other lines correspond to the original implementation.

In summary, the colors are described by the table below.

CPU limit   Original   New
100%        Blue       Brown
125%        Orange     Pink
150%        Green      Gray
175%        Red        Ocher
200%        Purple     Light Blue

[Figure: throughput vs. number of concurrent requests, one line per implementation and CPU limit]
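For concreteness, here is a minimal sketch of a load generator along these lines: it keeps a fixed number of requests in flight for a fixed duration and counts successful responses. The target URL, concurrency, and duration are hypothetical placeholders, not the actual benchmark parameters shown in the legend.

```go
// Sketch of a load generator: `concurrency` corresponds to the x-axis,
// the count of successful responses to the y-axis. All parameters here
// are illustrative placeholders.
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const (
		target      = "http://chocon-host:3000/" // hypothetical endpoint
		concurrency = 64                         // concurrent requests in flight
		duration    = 30 * time.Second           // length of one test run
	)

	var ok int64 // successful responses
	deadline := time.Now().Add(duration)

	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				resp, err := http.Get(target)
				if err != nil {
					continue
				}
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					atomic.AddInt64(&ok, 1)
				}
			}
		}()
	}
	wg.Wait()
	fmt.Printf("throughput: %d successful responses in %v\n", ok, duration)
}
```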

Results

From the above results, we can see that

  • with a 100% CPU limit, the throughput improved by 20%,
  • with a 200% CPU limit, the throughput improved by 60%, and
  • with low concurrency, the original implementation achieved higher throughput.

As I see it, one reason for this is that fasthttp puts much effort into reducing the number of heap allocations, rather than reducing the number of CPU operations.

With a lot of concurrent heap allocations, performance degrades due to the heap's locking mechanisms. Go's memory allocator, which is loosely based on tcmalloc, performs really well in this kind of situation, but we still see some impact. GC pressure may account for the results, too.
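As a rough illustration of where those allocations come from (a simplified sketch, not chocon's actual handlers): net/http allocates a fresh *http.Request, header maps, and so on for every request, while fasthttp hands the handler a pooled *fasthttp.RequestCtx that is recycled across requests.

```go
// Illustrative comparison of the two handler styles.
package main

import (
	"net/http"

	"github.com/valyala/fasthttp"
)

func main() {
	// net/http: each incoming request allocates a new *http.Request,
	// its header maps, and related per-request state.
	go http.ListenAndServe(":8080", http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("hello"))
		}))

	// fasthttp: the RequestCtx (request, response, buffers) is pooled
	// and reused after the handler returns, so steady-state allocations
	// stay low.
	fasthttp.ListenAndServe(":8081", func(ctx *fasthttp.RequestCtx) {
		ctx.SetBodyString("hello")
	})
}
```

After the handler returns, fasthttp resets the RequestCtx and returns it to a sync.Pool, which is what keeps steady-state allocations down.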

It is also worth mentioning that fasthttp uses a worker pool model, so the first requests, until the number of workers reaches the limit, involve worker creation. That might be why the new implementation is less performant under smaller loads. If some requests are sent before the actual benchmark, the performance improves, which suggests that worker creation does affect performance.
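The pattern looks roughly like the following (a simplification for illustration, not fasthttp's actual workerpool code): a new goroutine is started only when no idle worker is available, so the first requests after startup pay the worker-creation cost until the pool warms up.

```go
// Minimal lazy worker-pool sketch: workers are spawned on demand,
// up to a cap, and reused for subsequent jobs.
package main

import "fmt"

type pool struct {
	work    chan func()
	spawned int
	max     int
}

func (p *pool) serve(job func()) {
	select {
	case p.work <- job: // an idle worker takes it: no new goroutine
	default:
		if p.spawned < p.max {
			p.spawned++
			go func() { // cold path: create a worker on demand
				for j := range p.work {
					j()
				}
			}()
		}
		p.work <- job
	}
}

func main() {
	p := &pool{work: make(chan func()), max: 4}
	done := make(chan struct{}, 8)
	for i := 0; i < 8; i++ {
		i := i
		p.serve(func() {
			fmt.Println("handled job", i)
			done <- struct{}{}
		})
	}
	for i := 0; i < 8; i++ {
		<-done
	}
}
```

fasthttp additionally caps the pool (the server's Concurrency setting) and retires workers that stay idle, which matches the warm-up effect observed above.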

Profile

I also collected profiles of heap memory allocation with pprof and its -alloc_objects option.
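For reference, one way to collect such a profile (this sketch is an assumption about the setup, not necessarily how these profiles were taken): expose the pprof endpoints in the server and point go tool pprof at them.

```go
// Expose pprof endpoints so allocation profiles can be pulled
// from a running server.
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	// Profiling endpoint; the port is an arbitrary example.
	go http.ListenAndServe("localhost:6060", nil)

	// ... run the server under load, then from a shell:
	//   go tool pprof -alloc_objects http://localhost:6060/debug/pprof/heap
	select {} // keep the process alive for this sketch
}
```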

The new implementation

[pprof allocation graph: new implementation]

The original implementation

[pprof allocation graph: original implementation]

With the same number of requests passed through (which can be confirmed by the fact that the number of allocations by time.Format, which is used for logging, is the same), we can see an 80% reduction in heap allocations.

Notes

The spec of the machine I tested on is as follows.

  • Intel(R) Xeon(R) Bronze 3106 CPU @ 1.70GHz * 8
  • 32GB of RAM
