Operability #2

eldavido · 2014-04-15T18:44:25Z

A few days ago, we tried to deploy a new revision of our node.js-based data collection software into production. The revision in question required this library using the usual line in package.json, and invoked it using the parse method.

Everything worked fine in local development.

When we went to deploy it, we found the library was downloading a new user agent strings file into our production environment. After reading the code, I understand this is intended behavior, and the library does this automatically to ensure freshness of the user-agent definitions.

However, during our deployment, we hit a snag where the user-agent strings site was temporarily unavailable, causing our deployment to fail. After discussing this with our director of operations, we decided it's too risky to have a core part of our UA parse logic depending on the uptime of a site that's out of our control, especially given that this library doesn't appear to fail gracefully if it can't update itself.

To summarize, the current strategy has the following problems:

- Deployments can fail if the remote user-agent strings site isn't available
- Our deployment packages don't fully capture the state of the system
- Behavior can differ between instances of the code, if they've loaded different versions of the updates file (we run this code on a cluster of dozens of machines)
- Bug reproduction becomes more complicated, as the behavior of the system won't stay constant over time, which complicates change management and issue tracking

Before I try to write a patch, I wonder how @GUI feels about any of the following approaches:

1. Allow specification of a filesystem path from which the user-agent strings file will be loaded. Our operations team would be responsible for updating it (probably via a deployment package or a cronjob, but let us handle that as part of our deployment/change management procedures).
2. Allow automatic updates to be disabled, instructing the code to use the snapshot of the data which ships with the module.

I think my preference would be (1) but I'm just brainstorming here, any input would be appreciated.
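
To make the two options concrete, here's a rough sketch of the loading behavior I have in mind. None of this is the library's current API: `loadDefinitions`, `definitionsPath`, and `bundledPath` are hypothetical names, and I'm assuming the definitions can be read as JSON purely for illustration. The idea is to read only from local files, prefer the ops-managed file, and fall back to the bundled snapshot without ever touching the network.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical loader; these names do not exist in the library today.
//   options.definitionsPath: file kept up to date by ops (option 1)
//   options.bundledPath:     snapshot that ships with the module (option 2)
function loadDefinitions(options) {
  const candidates = [options.definitionsPath, options.bundledPath].filter(Boolean);
  for (const file of candidates) {
    try {
      // Assumes a JSON file for illustration; the real format may differ.
      return { source: file, data: JSON.parse(fs.readFileSync(file, 'utf8')) };
    } catch (err) {
      // Missing or corrupt file: try the next candidate instead of failing.
    }
  }
  throw new Error('No usable user-agent definitions in: ' + candidates.join(', '));
}

// Demo: the ops-managed file is missing, so the bundled snapshot is used.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ua-defs-'));
const bundled = path.join(dir, 'bundled.json');
fs.writeFileSync(bundled, JSON.stringify({ version: 'bundled' }));

const result = loadDefinitions({
  definitionsPath: path.join(dir, 'missing.json'),
  bundledPath: bundled,
});
console.log(result.data.version);
```

With something like this, a deployment only fails if neither local file is readable, and every machine in the cluster parses against whatever file we shipped to it.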

GUI · 2014-04-15T21:16:41Z

Great points all around. Thanks for bringing this up. I like the sounds of option 1 too. If you wanted to write a patch for that, I'm definitely open to pull requests.

eldavido · 2014-04-15T21:42:48Z

I chatted with the team. Unfortunately, we've decided to go with the useragent NPM package over this one, as it already does some of these things. Good lessons to learn for future software dev though. - D

GUI · 2014-04-15T22:44:01Z

Thanks for raising these issues, in any case. I'm going to reopen this since I would still like to tackle these issues (cleaning up the auto-update mechanism, making it more configurable, and handling potential outages better). I'm not quite sure when I'll be able to get to this, but hopefully it will be sometime soonish.

Thanks again!

eldavido · 2014-04-15T23:13:12Z

On behalf of ops teams everywhere, thanks. I wore the pager at my current place (Crittercism) for 1.5 years, and it was amazing how much I learned about deployment, safety, and operability.

eldavido closed this as completed Apr 15, 2014

GUI reopened this Apr 15, 2014
