nsqd: cpu usage on ARM #658

Closed
Huangyan9188 opened this issue Sep 24, 2015 · 14 comments

Comments

@Huangyan9188

[screenshot: 2015-09-24 5 22 11]

[screenshot: 2015-09-24 5 22 34]

@ploxiln
Member

ploxiln commented Sep 24, 2015

It does seem like a surprising amount of time is taken to get time and divide just to generate GUIDs.

By the way,

  • which OS is in use?
  • which CPU is in use?
  • what revision of nsqd is in use?

@mreiferson changed the title from "pprof get out the res,because my nsqd's cpu usage up to 97%" to "nsqd: nsqd cpu usage pprof" on Sep 24, 2015

@Huangyan9188
Author

The OS is Ubuntu 14.04.

The CPU is a TK1.

nsqd is version 0.3.2.

I built version 0.3.5 today and will test whether the problem is resolved.

@ploxiln
Member

ploxiln commented Sep 25, 2015

So that's the Tegra K1: ARMv7 (32-bit) Cortex-A15 (or "Project Denver" 64-bit ARMv8 cores, but I'm guessing that's less likely).

I don't think much nsqd benchmarking/tuning has been done on ARM. It makes sense that int64 division overhead could be rather different there. As for time-related functions, I'm really not sure: for C programs on x86/x86-64 they're accelerated with the vDSO (virtual dynamic shared object), but for Go on ARM ... ?
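One way to check the two suspected costs on a given board is a small micro-benchmark. The following is only a sketch: `nowMillis`, `benchNowMillis`, and `benchDivide` are hypothetical names made up for illustration, not nsqd code, and the divisor is read from a variable so the compiler cannot replace the division with cheaper arithmetic.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// sink keeps results live so the compiler cannot discard the measured work.
var sink int64

// divisor is a variable (not a constant) so a real 64-bit division is performed.
var divisor int64 = 1e6

// nowMillis reads the wall clock and converts nanoseconds to milliseconds.
func nowMillis() int64 {
	return time.Now().UnixNano() / divisor
}

// benchNowMillis times the combined "read clock, then divide" step.
func benchNowMillis(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = nowMillis()
	}
}

// benchDivide times the 64-bit division on its own.
func benchDivide(b *testing.B) {
	n := int64(1443052800123456789)
	for i := 0; i < b.N; i++ {
		sink = (n + int64(i)) / divisor
	}
}

func main() {
	// testing.Benchmark can be called from an ordinary program.
	fmt.Println("clock + divide:", testing.Benchmark(benchNowMillis))
	fmt.Println("divide only:   ", testing.Benchmark(benchDivide))
}
```

Running the same binary on the ARM board and on an x86-64 host would show whether the clock read or the division dominates on each.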

@jehiah
Member

jehiah commented Sep 25, 2015

@Huangyan9188 Also, if you can share which version of Go 0.3.2 was compiled with and which version you recompiled 0.3.5 with (ideally Go 1.5.1), that would be helpful.

@mreiferson changed the title from "nsqd: nsqd cpu usage pprof" to "nsqd: cpu usage on ARM" on Sep 25, 2015

@ploxiln
Member

ploxiln commented Sep 29, 2015

@Huangyan9188 might you be able to try my branch in #663?

@ploxiln
Member

ploxiln commented Sep 30, 2015

While looking at this, I noticed something interesting: idPump() is doing a lot of time.Now() and time.Sub(), which it would only do if there were a lot of errors from NewGUID() (or if runtime.Gosched() happened to result in inline calls to those same functions)
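The pattern described here can be sketched as follows. This is an illustration only, not nsqd's actual idPump(): `pump` and the always-failing generator are hypothetical, and the loop is bounded so the example terminates. It shows why clock calls appear only on the error path, and why a generator that keeps failing turns the loop into a busy spin.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"time"
)

var errTimeBackwards = errors.New("time has gone backwards")

// pump calls gen up to attempts times, imitating an ID-producing loop.
// On error it "logs" at most once per second, yields, and retries at once.
// It returns how many calls failed and how many log lines were emitted.
func pump(gen func() (int64, error), attempts int) (failures, logged int) {
	lastError := time.Unix(0, 0)
	for i := 0; i < attempts; i++ {
		if _, err := gen(); err != nil {
			failures++
			now := time.Now() // reached only when gen fails
			if now.Sub(lastError) > time.Second {
				logged++ // stands in for writing one log line
				lastError = now
			}
			runtime.Gosched() // yields the processor but does not sleep
			continue
		}
	}
	return failures, logged
}

func main() {
	alwaysFails := func() (int64, error) { return 0, errTimeBackwards }
	failures, logged := pump(alwaysFails, 100000)
	fmt.Printf("failures=%d logged=%d\n", failures, logged)
}
```

Because the rate limit applies only to logging, every failed attempt still pays for a time.Now() and a Sub(), which would match a profile dominated by those calls.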

@mreiferson
Member

@Huangyan9188 can you paste the nsqd logs?

@mreiferson added the perf label on Oct 3, 2015

@Huangyan9188
Author

Hello guys, we have found the cause.

Test case

  1. Start nsqd
  2. Change the system time forward
  3. Change the system time backward

After that, nsqd's CPU usage goes to the top.

The core of the problem is the NewGUID function, which depends heavily on the time.

Could I fix it by changing the GUID generation method?
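The test case can be reproduced in isolation with a small timestamp-plus-sequence generator. This is a minimal sketch under stated assumptions, not nsqd's NewGUID: `idFactory`, its bit layout, and the injectable `nowMillis` clock are all hypothetical and exist only so the example can move time by hand.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errTimeBackwards   = errors.New("time has gone backwards")
	errSequenceExpired = errors.New("sequence expired")
)

const (
	sequenceBits = 12
	sequenceMask = int64(1)<<sequenceBits - 1
)

// idFactory is a hypothetical generator: millisecond timestamp in the high
// bits, a per-millisecond sequence counter in the low bits.
type idFactory struct {
	nowMillis     func() int64 // injectable clock (assumption, for testing)
	lastTimestamp int64
	sequence      int64
}

// next returns a new ID, or an error if the clock is behind the last
// timestamp used or the sequence wrapped within one millisecond.
func (f *idFactory) next() (int64, error) {
	ts := f.nowMillis()
	if ts < f.lastTimestamp {
		return 0, errTimeBackwards
	}
	if ts == f.lastTimestamp {
		f.sequence = (f.sequence + 1) & sequenceMask
		if f.sequence == 0 {
			return 0, errSequenceExpired
		}
	} else {
		f.sequence = 0
	}
	f.lastTimestamp = ts
	return ts<<sequenceBits | f.sequence, nil
}

func main() {
	clock := int64(1000)
	f := &idFactory{nowMillis: func() int64 { return clock }}

	_, err := f.next()
	fmt.Println("start:", err)
	clock = 5000 // step 2: time jumps forward
	_, err = f.next()
	fmt.Println("forward:", err)
	clock = 1000 // step 3: time jumps backward
	_, err = f.next()
	fmt.Println("backward:", err)
}
```

In this sketch every call fails after the backward jump until the clock catches up with the last timestamp used, so a caller that retries in a tight loop would burn CPU for that whole interval.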

@mreiferson
Member

@Huangyan9188 would you like to open a PR for discussion?

@ploxiln
Member

ploxiln commented Oct 12, 2015

We recognize that nsqd does not handle it very well when time goes backwards. It's more serious than high cpu usage; nsqd can't generate any usable GUIDs for messages.

In the short term, if you're trying to use nsqd, use something like ntpd, and make sure the time is OK before nsqd starts. Maybe use ntpdate to jump the time, then start ntpd, which will "slew" the time, and never cause it to jump backwards.

@ploxiln
Member

ploxiln commented Oct 22, 2015

So, uh, I'll just mention that an obvious alternative algorithm for generating IDs, which does not depend on millisecond-precise time, is a well-seeded, good pseudorandom generator (OpenBSD's arc4random() is just one off the top of my head). There is a 1% chance of at least one duplicate among random 64-bit IDs if you generate 610,000,000 of them (see this useful birthday paradox table).
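A rough sketch of that alternative, with the birthday estimate alongside it. The names `randomID` and `collisionProbability` are hypothetical, and crypto/rand is used here as a stand-in for a well-seeded generator; it is not a claim about what nsqd does or should do.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"math"
)

// randomID returns 64 random bits from the operating system's secure source.
// It never consults the wall clock.
func randomID() (uint64, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(b[:]), nil
}

// collisionProbability approximates the chance of at least one duplicate
// among n uniformly random IDs of the given width, using the birthday
// approximation 1 - exp(-n(n-1) / (2 * 2^bits)).
func collisionProbability(n float64, bits int) float64 {
	space := math.Ldexp(1, bits) // 2^bits
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	id, err := randomID()
	if err != nil {
		panic(err)
	}
	fmt.Printf("id: %016x\n", id)
	// For 610,000,000 IDs of 64 bits this comes out at roughly 1%.
	fmt.Printf("p(duplicate): %.4f\n", collisionProbability(610e6, 64))
}
```

The trade-off is that random IDs carry no ordering information, whereas timestamp-based IDs sort roughly by creation time.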

But it's probably not worth changing the algorithm and seeing what the performance is like, because there are other places in nsqd that I expect might be confused if time went backwards, like message expiry and defers (though those might be fixed by being based on system monotonic time?).

Anyway, this isn't really about ARM performance anymore, it might be worth closing.

@Huangyan9188
Author

[image]

I commented out the time-related if blocks, and then I got errors like these: [image]

@Huangyan9188
Author

Why are the IDs all 0784e362e6570000? We run nsqd on cloud devices, which always have to sync their time over NTP, so we need this issue fixed to keep using nsqd.

@mreiferson
Member

Going to close this, thanks for the discussion.

Although there might be room for improvement in the way nsqd generates IDs, the fundamental problem here seems to be managing time on the host nsqd is running on.
