Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replay subject/async scheduler susceptible to clock disciplining race conditions #5514

Closed
joshribakoff opened this issue Jun 21, 2020 · 17 comments

Comments

@joshribakoff
Copy link
Contributor

joshribakoff commented Jun 21, 2020

Bug Report

Current Behavior

Reproduction
I did not reproduce the bug, but I am confident this will cause bugs.

Consider that a user could simply adjust their clock manually, and the web app would erase chat message(s) or other values pending in a replay subject, prematurely (consider the consuming web app specified these values should remain in the replay subject for 30 seconds by passing the windowTime argument to the constructor of ReplaySubject.)

Other reasons the NTP time must not be used to calculate durations of time, is NTP will slew the clock so 1000ms elapse in 995ms of wall time, and will adjust the time abruptly if the time goes out of sync by over 128ms, yielding negative durations if relied upon to time the duration of something.

Expected behavior
Use performance.now() to time how long something took, since it not subject to clock disciplining, and timing the durations with it would be totally transparent to users, aside from fixing the bug.

Environment
n/a

Possible Solution
I would change Date.now() to performance.now(), in every single case where it is used to time the duration of how long something to took. Users should not suffer from clock drift issues by default.

Additional context/Screenshots
Please read https://www.w3.org/TR/hr-time-2/#introduction
Also please read http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-CLOCK-DISCIPLINE "how will NTP adjust my clock" for additional context
https://caniuse.com/#search=performance.now

@kwonoj
Copy link
Member

kwonoj commented Jun 21, 2020

Not directly related but somwhat related discussion: #5475

Changing default provider would be potentially breaking changes but I think it's worth to consider at least to provide a consistent way to specify custom provider, then consider change default values later.

@cartant
Copy link
Collaborator

cartant commented Jun 21, 2020

In general, I'm in favour of this, but there are a couple of potential problems:

  • IDK whether or not Jest mocks performance.now. I know that Sinon does. If Jest does not, changing this could break developer's tests.
  • performance.now was introduced in Node 8.5, so that's fine. However, it's not a global. That means that RxJS would have to deal with consuming it in Node environments and that would complicate things.

I'm wondering if this could not be handled in the wrappers - mentioned in our last meeting - that need to be introduced for testing purposed? If we are going to put thin a wrapper around the default provider, it could be made configurable and a developer could configure their own default provider using the performance.now for their environment.

@kwonoj
Copy link
Member

kwonoj commented Jun 21, 2020

If we are going to put thin a wrapper around the default provider

Is reason I thought attached issue might be related. If we have single point of abstracted interface access to timestamp which is configurable, I think that would solve these maybe.

@joshribakoff
Copy link
Contributor Author

For pre node 8.5 there is also process.hrtime(), for it not being a global, would it be an option to import it in a node specific build of RxJS, or fall back to Date.now() with a console warning?

In general, NodeJS users should not unwittingly be writing code that is susceptible to clock drift. Unknowingly introducing bugs far outweighs the downsides of having a warning, or jumping through hoops, in my opinion. At the very least there should be some sort of warning in the docs or elsewhere if there's going to be "unsafe" defaults, in my opinion.

In Jest I usually do jest.spyOn(performance, 'now').mockImplementation(...)

@kwonoj
Copy link
Member

kwonoj commented Jun 21, 2020

For pre node 8.5 there is also process.hrtime(), for it not being a global, would it be an option to import it in a node specific build of RxJS, or fall back to Date.now() with a console warning?

I think what @cartant meant is rxjs core itself cannot just do process.xxxx as if it could run in node - we have to create package agnostic to environment out of box, otherwise browser side bundler like webpack pulls in own polyfill (or other bundler might have different side effect) which will make user faces unexpected end results.

@joshribakoff
Copy link
Contributor Author

joshribakoff commented Jun 21, 2020

I'm not sure I understand why RxJS cannot do feature detection to check which global is present, and use the appropriate global as the default? Why couldn't it do something like

if(performance && performance.now && typeof performance.now === 'function') {
 tsProvider = performance.now
} else if(process && process.hrtime && ... ) {
 tsProvider = ()=>process.hrtime/1000
} else if (...){
  ... 
}

Wherein the last fallback, or last resort is Date.now, which emits warnings each time its called if NODE_ENV is dev. You can also access the node version at runtime, when running in node. When webpack / terser compiles for production, the warnings would be stripped. Regarding the API not being global in node, RxJS could polyfill the global to require() the module, and expose this polyfill either globally or internally. This would solve the issue with "unsafe" clock usage by default, what would be the objections to this approach?

otherwise browser side bundler like webpack pulls in own polyfill (or other bundler might have different side effect) which will make user faces unexpected end results.

I would trust the polyfill'd performance.now() more than I trust Date.now() to give accurate durations, and the user can always opt out of the polyfill since their bundler is in their control. Just playing devil's advocate here, these are all good things to think about. In theory the user can already be getting unexpected results today, in RxJS.

@kwonoj
Copy link
Member

kwonoj commented Jun 21, 2020

above example only helps runtime detection, bundler / compiler cannot detect it statically and strips out code.

I would trust the polyfill'd performance.now() more than I trust Date.now() to give accurate durations,

I'm not talking about trust of implementation, it means consumer of rxjs unnecessarily pulls in those bytes even though some env never able to use it: i.e browser should not need performance polyfill, but once rxjs core have top-level side effect to detect those runtimes, some bundler will pull those things in right away, and there's no way to us as lib prevent those thing to happen unless user explicitly configures their application's build config. As a library, we cannot afford to ask all user to config their app to a specific way.

@kwonoj
Copy link
Member

kwonoj commented Jun 21, 2020

Somewhat similar crux of change we had is #3566 - this is not runtime, but compile time enforcement user have to bring dom type definition lib even though their runtime never intend to use browser-like env. In result, we decide to take it out.

@joshribakoff
Copy link
Contributor Author

joshribakoff commented Jun 21, 2020

i.e browser should not need performance polyfill

If the code is going to run on an old browser that does not support performance.now(), and the code times the duration of something, it absolutely should be polyfilled, but we're talking about an API that has existed since Chrome 6 and is not polyfilled by Webpack or any other modern FE bundler I'm aware of. We used performance.now() on Twitch also, which is the 4th largest consumer of peak internet traffic, where we also used Webpack & did not have any issues with unwanted polyfills.

Regarding NodeJS, does solving issues there block fixing this for the browser by doing feature detection on the globals, given the code could fallback if the function is not present, and given that doing so is not going to cause extra bytes for the browser & will actually only fix unexpected behavior? Or is there a requirement to make nodeJS behavior match browser behavior more exactly? If the latter, I could look into solutions, but I'd want to make sure there's no other objections before I invest more time in that...

@kwonoj
Copy link
Member

kwonoj commented Jun 21, 2020

In my opinion, this discussion doesn't need to be distracted to pursue those directions: If rx provides proper configurable single, consistent entrypoint accepts any timestamp provider you want to use? then I guess what should be default value would be discussed separately, esp changing default will be definitely some kind of breaking but configurable entrypoint should not be considered as breaking.

@cartant
Copy link
Collaborator

cartant commented Jun 22, 2020

FWIW, process.hrtime uses a frame of reference that is different to that of performance.now, so the two are not interchangeable:

> console.log(performance.now(), process.hrtime())
80711.52759999037 [ 495473, 158099100 ]

IMO, the keeping the current Date implementation as the default, but providing an API for configuring the default provider is the way forward.

@joshribakoff
Copy link
Contributor Author

joshribakoff commented Jun 22, 2020

this discussion doesn't need to be distracted to pursue those directions
I guess what should be default value would be discussed separately

I started this discussion to talk about a potentially "unsafe" default, FYI.

FWIW, process.hrtime uses a frame of reference that is different to that of performance.now

They're both monotonically increasing clocks with known units. They can be wrapped in a simple pure function by RxJS that returns the number of ms since process start / page load, making them interchangeable.

IMO, the keeping the current Date implementation as the default, but providing an API for configuring the default provider is the way forward.

IMO, if that's the path forward, Rx should have more of a warning about the potential pitfalls of capturing durations with a non monotonically increasing clock, maybe a callout in the docs, highlighted in red or something... My issue here isn't lack of a workaround, its unsafe defaults & the fact that people that could unknowingly write "unsafe" code

@cartant
Copy link
Collaborator

cartant commented Jun 22, 2020

... making them interchangeable

They are not interchangeable from a documentation perspective. The docs cannot say the the default provider is performance.now if the values returned from calling now() on a scheduler use a different frame of reference. And having - documented or otherwise - different providers for the browser and Node is something to be avoided, IMO.

Rx should have more of a warning about the potential pitfalls ...

That is fine.

IMO, changing the default and breaking tests - if Jest does not mock performance.now() in useFakeTimers - is a deal breaker. Making a change to the default provider that could break tests for developers that potentially have no exposure to any misbehaviour would not be acceptable.

@joshribakoff
Copy link
Contributor Author

joshribakoff commented Jun 22, 2020

Agreed on both those points, the only other point I'd make is if there are any Rx operators that specifically try to measure the durations of things, it is worth accepting differing time origins in browser vs node, to calculate those durations in those operators specifically.

Replay subject works this way today. It diffs two NTP times instead of using setInterval like bufferTime operator. Ideally it would use setTimeout/setInterval, rather than subtracting relative timestamps. Would this also be in scope, in addition to the docs change?

I would try to avoid the time drift, where possible. You're right we should not break BC wholesale, just out of principle. In the case of replay subject, there is some workaround that does not depend on performance.now actually, which is to use setInterval. In fact if we made ReplaySubject monadic like I suggested in #5515, we could get rid of the potentially error prone logic in ReplaySubject that depends on relative NTP times, and it's behavior could be implemented with the existing operators that buffer with the async scheduler, eg. bufferTime

@joshribakoff
Copy link
Contributor Author

I guess one other after thought, if its so simple to override the timestamp provider to performance.now(), doesn't that also mean it'd also be easy for people to fix their test suites, by setting it back to the "unsafe" Date.now() if they care more about their tests than the time drift? At least people going forward would start writing code that is "safe". Seems better than a warning in the docs, to have the "warning" actually just be an upgrade note.

The docs cannot say the the default provider is performance.now

They can say they use monotonically increasing clock that approximates time since page load/process start in milliseconds. In theory the behavior should be the exact same of the time origins. It in theory would make timing more deterministic & consistent in Rx.

Is the tradeoff actually just people with legacy test suites adding 1 line, vs people without legacy test suited having to add a line to get "sane" defaults? Could RxJS implement a timeProvider that implements a monotonic platform independent clock, if it were not the default? Would it really be that egregious to make a breaking change that requires changing 1 line? I'm sure this library has done it before, for good reasons, pipe-able operators come to mind.

For a library designed to deal with events emitted over time & abstract away time, I really would be nice if it were not susceptible to the time drift by default, without having to change lines of code or write polyfilled clocks as the end user.

@joshribakoff
Copy link
Contributor Author

  • if Jest does not mock performance.now() in useFakeTimers - is a deal breaker

This is also misguided, I think. Jest does not mock Date.now in fake timers either, fake timers literally only mocks timers, eg setTimeout / setInterval. I frequently use jest fake timers, and still need to mock out Date.now & performance.now, with or without Rxjs.

@joshribakoff
Copy link
Contributor Author

joshribakoff commented Sep 10, 2020

This is what I'm currently doing to be able to write code that isn't going to suffer from clock drift:

myOperator(
  ...args,
  scheduler: SchedulerLike = new AsyncScheduler({} as any, () => performance.now())
)

It also says this is deprecated, and there's nothing in the docs calling out that the default behavior will yield to clock drift. IMO I still think it would be great if there could be an officially supported way to use a reliable time, without using undocumented / deprecated APIs (if not just using the "right" time method if it exists, it should at least be easier to override, I would propose)

This doesn't really have anything to do with jest. I specifically want to use marble tests, which means I have to get the current time from the scheduler, but I don't want to do that (at least not with the official scheduler) because it will make my code suffer from race conditions. I don't think it make sense this is blocked on anything to do with jests fake timers, which seem to be pretty unrelated to this issue

@benlesh benlesh closed this as completed May 4, 2021
@ReactiveX ReactiveX locked and limited conversation to collaborators May 4, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants