Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add start time to distinguish between cached and active entries #43

Open
colinbendell opened this issue Dec 13, 2017 · 9 comments
Open
Milestone

Comments

@colinbendell
Copy link

colinbendell commented Dec 13, 2017

The UA may receive Server-Timing entries that were cached (eg: by a CDN), provide ancestral detail but may not be relevant to visualizing the current timeline from the UA's perspective. To allow the UA to distinguish between recent and historical Server-Timing values, servers should add a start= timestamp. This way the UA can distinguish between recent Server-Timing entries and past entries when visualizing the timeline.

example

For example, the following request:
curl -D - -o /dev/null -s https://colinbendell.cloudinary.com/image/upload/c_crop,f_auto,g_auto,h_350,w_400/v2/Wizard-Clap-by-Markus-Magnusson.gif

Consider the two responses:

date: Wed, 13 Dec 2017 04:04:39 GMT
server: cloudinary
content-type: image/gif
content-length: 580970

server-timing: cloudinary;dur=55;start=2017-12-13T04:04:39.239Z;total=55
server-timing: akam;dur=59;start=2017-12-13T04:04:39Z;total=114;desc="miss"
server-timing: fastly;dur=48;start=2017-12-13T04:04:38.677Z;total=162;desc="miss"

Here we have UA -> fastly -> akamai -> Origin and all layers are active in this response which populates the CDN layers with a cach miss.

A few minutes later we re-request the resource and we get:

date: Wed, 13 Dec 2017 06:25:29 GMT
server: cloudinary
content-type: image/gif
content-length: 580970

server-timing: cloudinary;dur=55;start=2017-12-13T04:04:39.239Z;total=55
server-timing: akam;dur=1;start=2017-12-13T06:25:29Z;total=1;desc="hit"
server-timing: akam;dur=13;start=2017-12-13T06:25:29Z;total=14;desc="miss"
server-timing: fastly;dur=36;start=2017-12-13T06:25:28.991Z;total=50;desc="miss"

now we have UA -> CDN -> CDN -> CDN.

In the second example, the UA had a total TTFB of 116ms.

Using Server-Timing the UA is now able to distinguish between network time and server time. However, if the UA simply summed all the Server-Timing dur= values it would yield a total of 105ms as server think time. In reality 55ms of that 105ms total should not apply to visualizing this request.

To solve this, the UA should look at the start time and ignore entries that happen before the requested start time.

Can we trust the start time? What about time drift?

In this scenario, we can trust the start time because we are only worried about approximation to the request time to the UA as a mechanism to ignore older entries. In reality, servers can often be out of sync by 10s of milliseconds (maybe as high as a few hundreds). Modern infrastructure makes this less pervasive since encryption often requires a higher level of time synchronization. Regardless, this is intended to be directional and allow the UA to determine order and filter out 'past' entries, vs 'current' entries.

Why not mark the 'last' server-timing entry as last?

I'm assuming the principle that each layer in the chain should have isolated awareness. that is, a request should only be responsible for adding itself and a) should never touch another server-timing entry and b) should never need to be aware of other server-timing entries that may or may not be present. In short, each layer should be ignorant of any other timing metrics in the eco system and therefore oblivious if it is the last or first hop in the chain.

Shouldn't only the current server-timing entries be sent?

Just as the principle of isolated awareness, each hop should not be required to expunge past entries (though this may be desirable for business reasons). The ancestral details that Server-Timing provide are insightful for other purposes - debugging, triaging, utilization analytics, etc. For example, the Server-Timing values in the description may describe the A:B state from the origin and useful when combining with an analytics engine. The preservation of historical server-timing should be a business decision and not the responsibility of the spec.

[edit: remove extraneous desc details]
[edit: changed from epoch to ISO8601. In our implementation, we found that it was mostly humans looking at this value and humans can't read epoch very well. switching to ios8601 made for this to be more practically useful in production at the expense of a few extra characters]

@yoavweiss yoavweiss added this to the Level 2 milestone Oct 17, 2018
@colinbendell colinbendell changed the title add epoch start time to distinguish between cached and active entries add start time to distinguish between cached and active entries Apr 17, 2019
@hauleth
Copy link

hauleth commented Dec 16, 2019

I would use at keyword for that and leave start to denote the "start time" within processed event itself. That would allow server-timing to be used to display real trace of the events that happened on the server.

@colinbendell
Copy link
Author

Are you suggesting this would allow you to group at on a process layer with different start for each inner-process? Can you give a proposed example?

@hauleth
Copy link

hauleth commented Feb 1, 2020

@colinbendell sorry for late response but the idea is to use start keyword as a millisecond/nanosecond since imaginary start of the request handling by the server, so for example with headers like:

server-timing: request;dur=40;start=0
server-timing: db;dur=10;start=10
server-timing: json;dur=5;start=35

Would result in diagram like:

|--------------------------------------| request
          |--------| db
                                   |---| json

And start would be by default set to 0 (however no timing would be required to have start=0, so in theory all can start after starting handling request).

EDIT:

However I now see that this is just different approach to #44, but instead of total which is "end" time it uses start as "begin" time.

@noamr
Copy link
Contributor

noamr commented Aug 16, 2022

Since server timing exposes timings that come from different clocks, which makes not 100% comparable, and OTOH considering that this data could be useful for DevTools et al, I believe perhaps a reasonable way to look at it is that one of the attributes would provide a start time approximation. I believe this would also solve #44.

Developers that want to look at it or at waterfalls created based on this data should be aware of the possible clock differences and that any waterfall is an approximation, though in some cases still useful.

Perhaps approx_start=${ISO-8601}?

/cc @yoavweiss @paulirish

@murraylchapman
Copy link

Lack of a standard for "start time" means that different vendors are implementing their own incompatible solutions:

Fastly is going with:

Server-Timing: time-start-msec;dur=1544707555517,time-elapsed;dur=81

Cloudinary is going with:

server-timing: cloudinary;dur=94;start=1516653847.986;total=94

@hauleth
Copy link

hauleth commented Jan 19, 2023

One of the biggest problem I see there is that there is no API for fetching extra metadata from the entry (see #72). This would allow digesters to circumvent this problem in more uniform way.

@paulirish
Copy link
Member

A month ago Cloudflare started using a new convention:

server-timing: cfL4;desc="?proto=TCP&rtt=7337&sent=8&recv=8&lost=0&retrans=0&sent_bytes=3419&recv_bytes=832&delivery_rate=548023&cwnd=25&unsent_bytes=0&cid=94dae6b578f91145&ts=225

@LPardue
Copy link

LPardue commented Oct 23, 2024

Hi @paulirish. I added cfL4. Mainly as a way to tightly couple with other RUM data. Other header fields aren't as accessible from the client-side javascript we have running.

We are shoehorning a lot into the desc but on balance it seemed like the most effective way to accommodate data values of arbitrary type or units. At least in the absence of something like #72. It's in early deployment but already proving useful to divide up our cohorts into groups based on things like RTT or packet loss.

Not sure exactly what cfL4 has to do with the OP issue of start time. Am I missing something?

Happy to answer any questions on our approach, or entertain other ideas.

@paulirish
Copy link
Member

@LPardue Yup! Sorry I wasn't casting aspersions. :) The solution seems clever. All good.

I've returned to this thread a few times to look at how various teams have created conventions for including a start time. So, I was adding mention to be comprehensive. Admittedly, it's a tad offtopic from Colin's original question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants