Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The problems from my 1+ year experience with CLS #59

Open
iliakan opened this issue Feb 14, 2016 · 22 comments
Open

The problems from my 1+ year experience with CLS #59

iliakan opened this issue Feb 14, 2016 · 22 comments

Comments

@iliakan
Copy link

iliakan commented Feb 14, 2016

I'm writing it as a note to myself and maybe other guys who plan/use CLS (continuation-local-storage).

I met several practical problems with it. But my another issue hangs about half-year+ without an answer (not even sure if it's actual any more, after AL updates), so I'm not asking anything from devs. Anyway I'm not using CLS for anything serious any more.

Just noting the things I met for a year+ of trying to make it work (with occasional success), so that people know the difficulties ahead, cause some of them are rather hidden.

The patching problem:

  • There are many modules which do not support CLS. Namely, bluebird, mongoose, a bunch of others. CLS just doesn't work with them or works with serous bugs (ye I know why).
  • Most modules can be monkeypatched to support CLS. But that's only possible if the needed functionality is exported, otherwise I need to fork/really patch it.
  • In practice the monkeypatch tends to break on new major versions of the modules. Then repatching takes time to dive deep into the updated module once again.

The private versions patching problem:

  • Even if I manage to patch something, like bluebird v3, there's a chance that a 3rd-party module in my app uses it's own version of bluebird, say v2, deep inside it's node_modules hierarchy.
  • So I must watch not only over modules I need, but also track any module in the node_modules tree of my app, to see if there's a private version of a CLS-unfriendly module, and patch that. Every module/submodule/subsubmodule/... must support CLS or be patched for it.
  • That greatly extends the area of maintenance I need to do to keep everything CLS'able. Also gives so much space for bugs in case if something updates and I miss it.

The dangerous bugs problem:

  • The bugs introduced by a missed CLS-unfriendly updates of a module or it's submodule may be subtle yet really destructive and hard to fix back. Imagine a payment going to a wrong account. Or a regular user getting an admin view of page with all the secret information.
  • That prevents relying on CLS even if though mostly works.

I admire the idea behind the module/async-listener. Async chains in Node.JS are like green threads. A context for an async chain is so cool, it's a must. It's like thread local variables, widely used by so many systems.

Maybe there's something that can be done in the node core to make it work everywhere, make every module CLS'able without critical perf loss?

It would be great to get more attention to the module and its functionality from the minds of the community.

@holm
Copy link

holm commented Mar 5, 2016

All of the above exactly mirrors my experiences from using CLS for about a year. We still use it, as I have not found any other way to have "thread local storage" in Node. I admire CLS for coming up with a solution that mostly works.

Coming to Node with a background in Java, Python and Ruby, the lack of async chains was the biggest surprise. I still wonder how people properly track transactions, authentication and other contextual data through a large system.

@othiym23
Copy link
Owner

As of Node 6, Trevor Norris's AsyncWrap API is a documented, stable part of the core Node API. It would be excellent to port CLS to take advantage of AsyncWrap when it's available, as I believe that it addresses a number of the points raised above (and Trevor is a much better steward of AsyncWrap than I've been able to be for async-listener, which is after all only my backport of Trevor's previous sketch at this kind of API). It would also be excellent to see CLS extended to directly incorporate all of the known shims for packages like Bluebird that require a little extra help to work with CLS.

If anyone felt like taking on that work, I would be happy to add them to the repo and list of owners on npm, because it's quite clear to me that my other commitments don't afford me the time to take on that work, which is pretty time-consuming and finicky.

@ofrobots
Copy link

AsyncWrap is indeed a good alternative to async-listener, but it doesn't provide enough functionality to cover the continuation-local-storage use cases.

OP mentioned 'the patching problem'. I call it the 'user space queuing problem' (more details here). It stems from the fact that any user space code (JS or native) could queue callbacks can resume them in a context different. The status quo here has been to discover such code, and get them monkey patch them to ensure they continue working. This is a never ending battle.

IMO a longer term solution would be to

  1. Get the concept of 'async context' into the language and/or host environment.
  2. Have the module authors (e.g. bluebird) uphold the contract with the language, by ensuring their modules propagate the async context.

Ideally we want this to work with no monkey patching at all.

Authors of bluebird have indicated that they have little interest in propagating context for continuation-local-storage; they do propagate the current domain, however, since domains are part of core. This is an understandable position, and bluebird is not the only module with this disposition.

In my opinion, to get a solution that is better than 'mostly works', we need:

  1. AsyncWrap in node.js.
  2. Zones in TC39. This brings in the concept of async context into the language.

@RobinQu
Copy link

RobinQu commented Aug 20, 2016

I think cls should remains as a basic component to deal with async contexts/zones.
Should we propose a new repo to implement APIs compatible to Zone in TC39?
Or a major version bump should happen if we shift to those APIs.

@Qard
Copy link
Collaborator

Qard commented Aug 22, 2016

I don't think CLS would ever change its API to match zones. Likely a new polyfill would implement the zones API and people would just start using that.

It's possible CLS might start using a zones polyfill internally though, at some point. Currently it uses an async-listener polyfill, and there's some effort happening to upgrade to an async_wrap polyfill.

@RobinQu
Copy link

RobinQu commented Aug 23, 2016

What's the point of using CLS API if Zone from TC39 has already provided a
similar set of APIs?

On Tue, Aug 23, 2016 at 6:21 AM, Stephen Belanger notifications@github.com
wrote:

I don't think CLS would ever change its API to match zones. Likely a new
polyfill would implement the zones API and people would just start using
that.

It's possible CLS might start using a zones polyfill internally though, at
some point. Currently it uses an async-listener polyfill, and there's some
effort happening to upgrade to an async_wrap polyfill.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#59 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAU6FT80f9AUCg8_YlQTb9DFRp_qLvsiks5qiiDigaJpZM4HZ9ff
.

@Qard
Copy link
Collaborator

Qard commented Aug 23, 2016

Zones are lower level. CLS is purely just meant to be a data structure to store data scoped to the execution context, and likely it'd remain as just that.

If someone wants deeper control of how the async contexts are linked, they can reach for whatever Zones API polyfill surfaces.

@ckitt
Copy link

ckitt commented Apr 7, 2018

As of 2018, with node 8.0 async hook features, wonder if CLS implementation still the best way for implementing async context in node.js, if yes, do you guys have any better suggestion? Thank you very much and appreciate your help!

@overlookmotel
Copy link
Contributor

@ckitt I think the plan is for CLS to be rewritten to utilise Async Hooks.

cls-hooked module claims to do exactly this, but I don't know how stable it is at present.

Also, Async Hooks has been going through some changes to tackle problems that became evident at its first release, particularly as it relates to Promises. Again, I can't tell you where this work is up to - I'm out of date - and whether Async Hooks is really ready for prime time at this point.

@holm
Copy link

holm commented Apr 7, 2018

The overall tracking issue for any changes to async-hooks is at nodejs/diagnostics#124.

For what it is worth we use cls-hooked on top of async-hooks in production without any real issues. It generally just works. I am pretty sure there is nothing else out there that is any better, and to be honest I also think it is good enough in the vast majority of cases.

@ckitt
Copy link

ckitt commented Apr 7, 2018

Thanks for the prompt answer! @holm @overlookmotel

I am asking because we hit some strange problem when using cls under node 8.x environment (not sure if this is the root cause), the context is not removed properly after the execution ended, which causes cls to return the other execution context from other async execution (this is very serious in our use cases as we use cls for storing permission information of user). After studying the source code from cls, we suspect that this related to bugs in async-listener, thus we are asking if there's any plan for cls to be integrated with Async Hook.

@holm
Copy link

holm commented Apr 7, 2018

I would definitely recommend using cls-hooked. It has much less quirks than this original module, since it requires much less monkey patching of other libraries. Before we had to monkey-patch many libraries to get the context updated at all times.

With cls-hooked using async-hooks you get pretty much everything working out of the box, you only really need to monkey-patch promise libraries.

@ckitt
Copy link

ckitt commented Apr 7, 2018

Thanks for your advices @holm, quite disapointed that we cannot even figure out the root cause for the stange behavior encountered, but will definitely give cls-hooked a try.

@Qard
Copy link
Collaborator

Qard commented Apr 7, 2018

Yeah, definitely just use cls-hooked. This module is more just a legacy support module at this point. Unless you are on a pre-6.x version, you don’t actually need all the monkey-patching and the async-listener polyfill this module provides. The cls-hooked module provides the same interface, so you can easily just swap this one out for that.

@iliakan
Copy link
Author

iliakan commented Apr 7, 2018

I'm the author of the original issue.

Is cls-hooked totally free of the described problems, and it provides "async-local variables" just like CLS?

@Qard
Copy link
Collaborator

Qard commented Apr 7, 2018

Yep. It’s the same module, they just pulled out all the monkey-patching and async-listener parts and replaced them with async-hooks. The user-facing surface of the module is the same. The only difference being that this module only works on older versions of Node.js while cls-hooked only works on newer versions.

@dazld
Copy link

dazld commented Apr 8, 2018

@Qard that's really super great news, updating the readme would be cool if you can.

@iliakan
Copy link
Author

iliakan commented Jun 25, 2018

@holm and others who use cls-hooked in production.

After a close look, I really see no working difference between this module and cls-hooked. The low-level internals are different: hook-patching instead of monkey-patching :), but the problems are the same. The principle is same.

Mongoose, bluebird - exactly same failures.

Or please explain :)

@overlookmotel
Copy link
Contributor

@iliakan I don't know about Mongoose, but I can explain the failure with bluebird.

Bluebird internally does it's own queueing which by default doesn't work with async_hooks (or indeed async-listener, which is why it didn't work with this old version of CLS). Bluebird needs to be patched to support async_hooks. There's an issue open on bluebird repo for this, but I don't think it's been done yet.

You can patch bluebird using cls-bluebird module to play nice with CLS. That module has been tested with old CLS, but not cls-hooked, so I can't guarantee it'll work but in principle it should (I wrote the current version).

The other alternative is to use native JS Promises - those support async_hooks.

Likely Mongoose gets CLS contexts mixed up for same reasons as bluebird - it does some internal queuing. Since async_hooks is part of Node core, Mongoose should support it - I'd suggest opening an issue for that if there isn't already. This will be affecting not just CLS, but other uses of async_hooks e.g. tracing.

Hope that helps somewhat.

@iliakan
Copy link
Author

iliakan commented Jul 1, 2018

Yes, that’s exactly the same issue that I was talking about in the beginning of the thread. So the async hook solution is no different.

My original issue is still actual, unfortunately.

@iamnewspecies
Copy link

Thanks for the prompt answer! @holm @overlookmotel

I am asking because we hit some strange problem when using cls under node 8.x environment (not sure if this is the root cause), the context is not removed properly after the execution ended, which causes cls to return the other execution context from other async execution (this is very serious in our use cases as we use cls for storing permission information of user). After studying the source code from cls, we suspect that this related to bugs in async-listener, thus we are asking if there's any plan for cls to be integrated with Async Hook.

I still see this issue in cls hooked ("^4.2.2"). Generally the context of previous api call appears. It is like it holding some reference to the old context and it is there till it is not garbage collected (Speculating -as it suddenly changes to some new value). @ckitt were you able to figure out why it is really happening?

@joebowbeer
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests