Administering TaskBotJS
So my day job is not application development. I build everything from Ruby web frameworks to video mixer control applications, but what pays the bills is devops consulting. That day job isn't the only reason I've tried to make TaskBotJS as stump-dumb simple for an administrator to handle as I could, but...it's a big part of it.
tl;dr: whatever's the newest that AWS ElastiCache supports, I support
As of this writing (`0.1.0-wip`), TaskBotJS works with almost any moderately recent version of Redis. The set of commands used is small, we currently aren't using any Lua scripting, and so on--a large part of TaskBotJS was developed and built against Redis 2.8.20, released on 4 June 2015, and it has since been bumped to 3.2.10, released in July 2017. I would expect, though I have not exhaustively tested (an integration test matrix is planned for the 1.0 release, or not long after it), that TaskBotJS will play nicely with any future Redis version so long as backwards compatibility is maintained--and antirez, he of the Redis wizardry, seems very into maintaining compatibility.
One guarantee I can make is that, until I declare otherwise, TaskBotJS will always run on the most recent version of AWS ElastiCache (3.2.10 at time of writing). It's much harder to determine what versions of Redis the managed Redis-alikes (or even fully managed Redis services--sometimes it's hard to tell!) such as Google Cloud Memorystore or Azure Redis Cache actually run, but I consider those to be first-class citizens as well. TaskBotJS will consistently run on those platforms, too.
tl;dr: use systemd or runit or supervisord or whatever, but use something
TaskBotJS tries to be well-behaved with regard to running as a service. It doesn't daemonize itself, expecting instead to be run under a process handler. As a CentOS user by habit, I use systemd, but it's not exactly rocket science and whatever you want to use should be fine.
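If systemd is your process handler too, a unit along these lines is all it takes. Treat this as a sketch rather than a blessed configuration: the unit name, user, install path, and `ExecStart` command are placeholders that depend entirely on how you package and launch your service.

```ini
[Unit]
Description=TaskBotJS worker service
After=network.target redis.service

[Service]
Type=simple
User=taskbot
WorkingDirectory=/opt/myapp
# Placeholder: however your application actually starts its TaskBotJS service.
ExecStart=/usr/bin/node /opt/myapp/dist/worker.js
Restart=on-failure
# systemd sends SIGTERM on stop; give in-flight jobs a moment to be put back.
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target
```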
tl;dr: JSON logs in production unless you tell Bunyan otherwise
TaskBotJS standardizes on the awesome Bunyan logging library. (If you use Winston or something else, no worries--there exist converters to sync everything up.) A big part of why I like Bunyan, and why IMO you should too, is that it goes hard for structured logging: every log entry comes out as a single JSON object. I apologize if you're already familiar with the Good News and understand why it's most cromulent that we do our logs with Bunyan, but for everyone else, the big win is that it's really easy to add context to your log entries. And that context makes debugging a whole lot easier on you later. Consider this line:
{"name":"consumer","hostname":"bigboss","pid":31800,"component":"ArgJob","jobId":"crash-anarchical-531252","level":30,"arg":25,"msg":"I have an arg: 25","time":"2018-05-03T06:02:13.416Z","v":0}
Here we've got the hostname/pid of the running service to answer the age-old question "where the heck is this coming from?", the `component` field against which your friendly aggregation system can filter down to `ArgJob`, and our job ID inline--everything that's useful to you. And, because it's Bunyan, custom fields are trivial to add, like the `arg` field attached to this entry. Bunyan also supports child loggers, which make it easy to pass a logger into a method and retain the context that belongs to the current flow of execution.
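As a sketch of how Bunyan's child loggers and custom fields produce an entry shaped like the one above--the logger name, component, and job ID here just mirror that example and aren't taken from TaskBotJS's own code:

```typescript
import * as bunyan from "bunyan";

// Root logger for the service; Bunyan writes each entry as one JSON object.
const log = bunyan.createLogger({ name: "consumer" });

// A child logger stamps its extra context onto every entry it emits.
const jobLog = log.child({ component: "ArgJob", jobId: "crash-anarchical-531252" });

// Custom fields ride along in the first argument; the message string follows.
jobLog.info({ arg: 25 }, "I have an arg: 25");
```

By default Bunyan writes that JSON to stdout, which is exactly what your log aggregation system wants to slurp up.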
TaskBotJS catches SIGTERM and SIGINT and treats them both effectively the same. The server process waits for the intake and worker loops to stop (thus halting the fetching and queueing of new jobs), then puts back any jobs that are currently in progress or are waiting to be launched.
TaskBotJS Enterprise includes a graceful shutdown option.
tl;dr: the defaults are usually okay, but check your metrics
By default, a newly created `Config` object in TaskBotJS has a `concurrency` value of 20. This means that TaskBotJS will attempt to keep a maximum of about 20 jobs in flight at any given time. (This is not an iron-clad limit; occasionally you may see 21 jobs with a concurrency of 20, and that's intentional.) That default makes the most sense for heavily IO-bound jobs or ones that offload a lot of work to C++ extensions--after all, we're still in a NodeJS process, and not blocking the main thread of execution is kind of important. However, there certainly exist programmatically complex, high-CPU jobs headed for your workers; in those situations, ramping concurrency down to something lower, such as 1 to 1.5 jobs per execution thread, will reduce thrashing and improve throughput.
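The exact bootstrapping is up to your application, so treat the following as a sketch: the import path and the way the `Config` object is constructed and handed to the service are assumptions, but it shows the shape of dialing `concurrency` down for CPU-heavy workloads.

```typescript
import { Config } from "taskbotjs"; // placeholder import path, for illustration only

// Assumed construction: concurrency defaults to 20, which suits IO-bound jobs.
const config = new Config();

// CPU-heavy workload on a 4-thread box: keep roughly 1 to 1.5 jobs per execution thread.
config.concurrency = 6;
```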
If you're using TaskBotJS Pro, per-worker metrics can help you tease out how your configuration is handling your workload.