Skip to content

Conversation

@zwoop
Copy link
Contributor

@zwoop zwoop commented Aug 8, 2023

This tries to fix the problem Yahoo is seeing with times on short lived requests are skewed now when the thread local times aren't updated as often as they need to be. This does

  • Revert the thread_local time cache
  • Always call clock_gettime() (it's cheap / fast, with native TSC handled by the kernel / VDSO
  • It also switches to CLOCK_REALTIME_COARSE, for now at least, we can reevaluate if that accuracy is enough.
  • Finally it removes some old code, and cleans up the interfaces.

co-author: Chris McFarlen, Craig Taylor

@zwoop zwoop added this to the 10.0.0 milestone Aug 8, 2023
@zwoop zwoop self-assigned this Aug 8, 2023
@zwoop zwoop added the Core label Aug 8, 2023
@ywkaras
Copy link
Contributor

ywkaras commented Aug 9, 2023

How about this instead? #10164

bneradt
bneradt previously approved these changes Aug 9, 2023
Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this does away with the get_hrtime_updated vs get_hrtime distinction. It sounds like your benchmarks shows it's performant too.

I approve. I would appreciate hearing @masaori335's thoughts as well.

@zwoop
Copy link
Contributor Author

zwoop commented Aug 9, 2023

How about this instead? #10164

I think 10164 essentially reverts everything back to as before the time "cache" became thread_local. Since these are high res values, odds are that you will always write the updated time. It will fix yahoo's problem, but essentially is the same as reverting 72e6602 in almost all cases.

@zwoop zwoop marked this pull request as draft August 9, 2023 04:26
@zwoop zwoop force-pushed the FixHRTime branch 2 times, most recently from 472f650 to 64bb4a9 Compare August 9, 2023 16:26
@zwoop zwoop marked this pull request as ready for review August 9, 2023 19:53
@c-taylor
Copy link

c-taylor commented Aug 9, 2023

To record here also as 'coarse' clocks are mentioned. Underlying kernel config affects the resolution of that clock source.
CONFIG_HZ does vary by distribution, but defaults (currently) to 1000Hz. Because of the variance I would continue to default (as the PR does) to CLOCK_REALTIME.

https://www.kernel.org/doc/html/latest/core-api/timekeeping.html

bneradt
bneradt previously approved these changes Aug 9, 2023
Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tested with this patch internally and confirmed it resolves the latency metrics issue we saw with thread_local.

Thanks for working on this!

Comment on lines 5207 to 5213
.. ts:cv:: CONFIG proxy.config.system_clock INT 0
*For advanced users only*. This allows to specify the underlying system clock
used by ATS. The default is ``CLOCK_REALTIME`` (``0``), but a higher performance
option could be ``CLOCK_REALTIME_COARSE`` (``5``). See ``clock_gettime(2)``_ for
more details.
Copy link
Contributor

@bneradt bneradt Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say something about how to ascertain the values of these? My man page for clock_gettime lists the names but not values (0, 5, etc). Or, since it is low level, do we assume they can figure out how to get the values from the appropriate .h file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add something about where to find the numbers. But if they can’t find that they probably shouldn’t be messing with this :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defined in the time.h on Linux.

Linux (https://github.com/torvalds/linux/blob/v6.4/include/uapi/linux/time.h)

#define CLOCK_REALTIME			0
#define CLOCK_REALTIME_COARSE		5

Now, another CAVEAT is the value depends on the platform.

FreeBSD (https://github.com/freebsd/freebsd-src/blob/release/13.2.0/sys/sys/_clock_id.h)

#define CLOCK_REALTIME		0
#define CLOCK_REALTIME_FAST	10
#define	CLOCK_REALTIME_COARSE	CLOCK_REALTIME_FAST

macOS doesn't support CLOCK_REALTIME_COARSE

_CLOCK_REALTIME __CLOCK_AVAILABILITY = 0,
#define CLOCK_REALTIME _CLOCK_REALTIME

@masaori335
Copy link
Contributor

I'm +1 of getting rid of the cur_time cache from ATS. It looks like the modern Linux Kernel can handle it well from our benchmark.

  • vanilla (includes thread_local cur_time patch): 541,994 req/sec
  • Revert PR#9184: 463,053 req/sec (85.4%)
  • PR#10163 with CLOCK_REALTIME: 535,093 req/s (98.7%)
  • PR#10163 with CLOCK_REALTIME_COARSE: 536,563 req/s (99.0%)

@zwoop zwoop force-pushed the FixHRTime branch 2 times, most recently from 5e8fd06 to 4488f6a Compare August 11, 2023 03:20
@apache apache deleted a comment from ezelkow1 Aug 11, 2023
@zwoop zwoop merged commit ba467a8 into apache:master Aug 14, 2023
@zwoop zwoop deleted the FixHRTime branch August 14, 2023 16:06
bneradt pushed a commit to bneradt/trafficserver that referenced this pull request Aug 14, 2023
* Make sure that the thread local time is updated timely

* Remove dead TSC code from the dark ages

* Adds a new configuration option, and defaults to CLOCK_REALTIME

(cherry picked from commit ba467a8)
bneradt added a commit to bneradt/trafficserver that referenced this pull request Aug 14, 2023
This adds back the get_htrime call to the constructor of
QUICSentPacketInfo that was accidentally removed in apache#10163.
bneradt pushed a commit to bneradt/trafficserver that referenced this pull request Aug 14, 2023
* Make sure that the thread local time is updated timely

* Remove dead TSC code from the dark ages

* Adds a new configuration option, and defaults to CLOCK_REALTIME

(cherry picked from commit ba467a8)
bneradt added a commit that referenced this pull request Aug 14, 2023
This adds back the get_htrime call to the constructor of
QUICSentPacketInfo that was accidentally removed in #10163.
zwoop added a commit that referenced this pull request Aug 14, 2023
* Make sure that the thread local time is updated timely

* Remove dead TSC code from the dark ages

* Adds a new configuration option, and defaults to CLOCK_REALTIME

(cherry picked from commit ba467a8)

Co-authored-by: Leif Hedstrom <zwoop@apache.org>
masaori335 pushed a commit to masaori335/trafficserver that referenced this pull request Sep 26, 2023
…apache#10183)

* Make sure that the thread local time is updated timely

* Remove dead TSC code from the dark ages

* Adds a new configuration option, and defaults to CLOCK_REALTIME

(cherry picked from commit ba467a8)

Co-authored-by: Leif Hedstrom <zwoop@apache.org>
cmcfarlen pushed a commit to cmcfarlen/trafficserver that referenced this pull request Jun 3, 2024
* asf/master: (221 commits)
  LSan: Fix leak of test_Metrics (apache#10179)
  LSan: Fix memory leak of test_EventSystem (apache#10178)
  LSan: Fix memory leak of test_X509HostnameValidator (apache#10161)
  Remove cqtx log field (apache#10111)
  Require ATS plugins to be compiled with C++17. (apache#10007)
  Fix conf_remap plugin build on macOS (apache#10177)
  libswoc: Update to 1.5.4 (apache#10155)
  Makes cmake build again, on macOS (apache#10172)
  Check SNI in h3 (apache#10184)
  Remove autoconf headers during CMake configuration (apache#10173)
  test_QUICLossDetector.cc: Add back get_hrtime() (apache#10185)
  ink_ink_get_hrtime -> ink_get_hrtime (apache#10182)
  mgmt: make libconfigmanager a true static library (apache#10181)
  Make sure that the thread local time is updated timely (apache#10163)
  Unrequire remap rules for OCSP (apache#10146)
  cache_range test performance improvement (apache#10170)
  Clean up certifier plugin debug messages. (apache#9975)
  cmake: add check for clock_gettime (apache#10169)
  Remove Http3NoError allocations (apache#10165)
  Fix Throttler initialization. (apache#10154)
  ...
@maskit maskit mentioned this pull request Aug 15, 2024
91 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants