forked from ddopson/underscore-cli
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.template
371 lines (223 loc) · 19.8 KB
/
README.template
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
<%
var formats = require('./lib/output-formats.js');
var sample = require('./example-data/complex.js');
var program = require('./bin/underscore');
%># Overview
JSON is an excellent data interchange format and rapidly becoming the preferred format for Web APIs.
Thusfar, most of the tools to process it are very limited. Yet, when working in Javascript, JSON is fluid and natural.
<b>Why can't command-line Javascript be easy?</b>
Underscore-CLI can be a simple pretty printer:
cat data.json | underscore print --color
![example.png](https://raw.github.com/ddopson/underscore-cli/master/doc/example.png)
Or it can form the backbone of a rich, full-powered Javascript command-line, inspired by "perl -pe", and doing for structured data what sed, awk, and grep do for text.
cat example-data/earthporn.json | underscore extract 'data.children' | underscore pluck data | underscore pluck title
See [Real World Example] (#real_world_example) for the output and more examples.
### Underscore-CLI is:
* **FLEXIBLE** - THE "swiss-army-knife" tool for processing JSON data - can be used as a simple pretty-printer, or as a full-powered Javascript command-line
* **POWERFUL** - Exposes the full power and functionality of [underscore.js] (http://documentcloud.github.com/underscore/) (plus [underscore.string] (https://github.com/epeli/underscore.string), [json:select](http://jsonselect.org/#overview), and [CoffeeScript](http://coffeescript.org/))
* **SIMPLE** - Makes it simple to write JS one-liners similar to using "perl -pe"
* **CHAINED** - Multiple command invokations can be chained together to create a data processing pipeline
* **MULTI-FORMAT** - Rich support for input / output formats - pretty-printing, strict JSON, etc. See [Data Formats] (#data_formats)
* **DOCUMENTED** - Excellent command-line documentation with multiple examples for every command
### A Bit More Explanation ...
Underscore-CLI is built on [Node.js](http://nodejs.org/#download), which is less than a 4M download and [very easy to install](#installing_node). Node.js is rapidly gaining mindshare as a tool for writing scalable services in Javascript.
Unfortutately, out-of-the-box, Node.js is a pretty horrible as a command-line tool. This is what it takes to simply echo stdin:
cat foo.json | node -e '
var data = "";
process.stdin.setEncoding("utf8");
process.stdin.on("data", function (d) {
data = data + d;
});
process.stdin.on("end", function () {
// put all your code here
console.log(data);
});
process.stdin.resume();
'
Ugly. Underscore-CLI handles all the verbose boilerplate, making it easy to do simple data manipulations:
echo '[1, 2, 3, 4]' | underscore process 'map(data, function (value) { return value+1 })'
If you are used to seeing "_.map", note that because we arn't worried about keeping the global namespace clean, [many useful functions](dead_link_for_now) (including all of underscore.js) are exposed as globals.
Of course 'mapping' a function to a dataset is super common, so as a shortcut, it's exposed as a first-class command, and the expression you provide is auto-wrapped in "function (value, key, list) { return ... }".
echo '[1, 2, 3, 4]' | underscore map 'value+1'
Also, while you can pipe data in, if the data is just a string like the example above, there's a shortcut for that too:
underscore -d '[1, 2, 3, 4]' map 'value+1'
Or if it's stored in a file, and you want to write the output to another file:
underscore -i data.json map 'value+1' -o output.json
Here's what it takes to increment the minor version number for an NPM package (straight from our Makefile):
underscore -i package.json process 'vv=data.version.split("."),vv[2]++,data.version=vv.join("."),data' -o package.json
<a name="installing" />
# Installing Underscore-CLI
<a name="installing_node"></a>
### Installing Node (command-line javascript)
Installing Node is easy. It's only a 4M download:
[Download Node](http://nodejs.org/#download)
Alternatively, if you do [homebrew](http://mxcl.github.com/homebrew/), you can:
brew install node
For more details on what node is, see [this StackOverflow thread](http://stackoverflow.com/questions/1884724/what-is-node-js/6782438#6782438)
### Installing
npm install -g underscore-cli
underscore help
<a name="documentation" />
# Documentation
<a name="usage"/>
### Usage
If you run the tool without any arguments, this is what prints out:
<%= program.helpInformation().replace(/^/gm, ' ').replace(/\033\[[;0-9]*m/g, "") %>
<a name="data_formats"/>
### Data Formats
<%
_.map(formats, function (f, name) {
%>
#### <%= name %>
<%= f.description.replace('<', '<') %>
<%
if (name !== 'text') {
%>
<pre><code><%=
(f.stringify(sample).toString('binary'))
.replace(/\033\[[;0-9]*m/g, "")
.replace(/./g, function (x) {
var cc = x.charCodeAt(0);
if (cc > 31 && cc < 127 && !/[<>]/.test(x)) {
return x;
} else {
return '&#' + cc + ';';
}
})
%></code></pre>
<%
}
});
%>
<a name="real_world_example"/>
# Real World Examples
### Playing with data from a webservice
Let's play with a real data source, like http://www.reddit.com/r/earthporn.json. For convenience (and consistent test results), an abbreviated version of this data is stored in example-data/earthporn.json.
First of all, note how raw unformatted JSON is really hard to parse with your eyes ...
{"kind":"Listing","data":{"modhash":"","children":[{"kind":"t3","data":{"domain":"i.imgur.com","banned_by":null,"media_e
mbed":{},"subreddit":"EarthPorn","selftext_html":null,"selftext":"","likes":null,"saved":false,"id":"rwoa4","clicked":fa
lse,"title":"Eating breakfast in the Norwegian woods! Captured with my phone [2448x3264] ","num_comments":70,"score":960
,"approved_by":null,"over_18":false,"hidden":false,"thumbnail":"http://b.thumbs.redditmedia.com/mytr7zvc0zZdPVV7.jpg","s
ubreddit_id":"t5_2sbq3","author_flair_css_class":null,"downs":352,"is_self":false,"permalink":"/r/EarthPorn/comments/rwo
a4/eating_breakfast_in_the_norwegian_woods_captured/","name":"t3_rwoa4","created":1333763527,"url":"http://i.imgur.com/R
hBFe.jpg","author_flair_text":null,"author":"pansermannen","created_utc":1333738327,"media":null,"num_reports":null,"ups
":1312}},{"kind":"t3","data":{"domain":"imgur.com","banned_by":null,"media_embed":{},"subreddit":"EarthPorn","selftext_h
tml":null,"selftext":"","likes":null,"saved":false,"id":"rwgmb","clicked":false,"title":"The Rugged Beauty of Zion NP Ut
ah at Sunrise [OC] (1924x2579)","num_comments":5,"score":72,"approved_by":null,"over_18":false,"hidden":false,"thumbnail
":"http://f.thumbs.redditmedia.com/0v2GKlqrj35YUaVw.jpg","subreddit_id":"t5_2sbq3","author_flair_css_class":null,"downs"
:20,"is_self":false,"permalink":"/r/EarthPorn/comments/rwgmb/the_rugged_beauty_of_zion_np_utah_at_sunrise_oc/","name":"t
3_rwgmb","created":1333755348,"url":"http://imgur.com/veRJD","author_flair_text":null,"author":"TeamLaws","created_utc":
1333730148,"media":null,"num_reports":null,"ups":92}},{"kind":"t3","data":{"domain":"flickr.com","banned_by":null,"media
_embed":{},"subreddit":"EarthPorn","selftext_html":null,"selftext":"","likes":null,"saved":false,"id":"rvuiu","clicked":
false,"title":"Falls and island near Valdez, AK on a rainy day [4200 x 3000]","num_comments":10,"score":573,"approved_by
As I've already mentioned, it would be trivial to pretty print the data with 'underscore print'. However, if we are just trying to get a sense of the structure of the data, we can do one better:
TODO: working on a 'summarize' command -- INSERT_THAT_HERE (2012-05-04)
Now, let's say that we want a list of all the image titles; using a [json:select](http://jsonselect.org#overview) query, this is downright trivial:
cat example-data/earthporn.json | underscore select .title
Which prints:
[ 'Fjaðrárgljúfur canyon, Iceland [OC] [683x1024]',
'New town, Edinburgh, Scotland [4320 x 3240]',
'Sunrise in Bryce Canyon, UT [1120x700] [OC]',
'Kariega Game Reserve, South Africa [3584x2688]',
'Valle de la Luna, Chile [OS] [1024x683]',
'Frosted trees after a snowstorm in Laax, Switzerland [OC] [1072x712]' ]
If we want to grep the results, 'text' is a better format choice:
cat example-data/earthporn.json | underscore select .title --outfmt text
Fjaðrárgljúfur canyon, Iceland [OC] [683x1024]
New town, Edinburgh, Scotland [4320 x 3240]
Sunrise in Bryce Canyon, UT [1120x700] [OC]
Kariega Game Reserve, South Africa [3584x2688]
Valle de la Luna, Chile [OS] [1024x683]
Frosted trees after a snowstorm in Laax, Switzerland [OC] [1072x712]
Let's create code-style names for those images using the 'camelize' function from [underscore.string] (https://github.com/epeli/underscore.string).
cat earthporn.json | underscore select '.data .title' | underscore map 'camelize(value.replace(/\[.*\]/g,"")).replace(/[^a-zA-Z]/g,"")' --outfmt text
Which prints ...
FjarrgljfurCanyonIceland
NewTownEdinburghScotland
SunriseInBryceCanyonUT
KariegaGameReserveSouthAfrica
ValleDeLaLunaChile
FrostedTreesAfterASnowstormInLaaxSwitzerland
Try doing THAT with any other CLI one-liner!
### Version-bump in package.json
This one is straight out of our own [Makefile](https://github.com/ddopson/underscore-cli/blob/master/Makefile):
underscore -i package.json process 'vv=data.version.split("."); vv[2]++; data.version=vv.join("."); data;' -o package.json
### Getting a greppable list of URLs fetched during the load of a website
This is one I did at work the other day. Chrome --> Dev Console (CMD-OPT-J) --> Network Tab --> (right click context menu) --> Save All as HAR. I have no idea why it's called a "HAR" file, but it's pure JSON data ... pretty verbose stuff, but I just want the urls ...
cat site.har | underscore select '.url' --outfmt text | grep mydomain > urls.txt
Well, I'd also like to [ack](http://betterthangrep.com) through the contents of all those files. Best to get a local snapshot of it all:
cat urls.txt | while read line; do curl $line > $(echo $line | perl -pe 's/https?://([^?]*)[?]?.*/$1'); done
And I'm off to the races analyzing the behavior and load ordering of a complex production site that dynamically loads (literally) hundreds of individual resources off the network. Sure, I could have viewed all that stuff inside Chrome, but I wanted a local directory-structured snapshot that I could serve on a local Nginx instance by adding entries in /etc/hosts that mapped the production domains to 127.0.0.1. Now I can run the exact production site locally, make changes, and see what they would do.
Look at [Examples.md](https://github.com/ddopson/underscore-cli/blob/master/Examples.md) for a more comprehensive list of examples.
<a name="polish" />
# Polish: 1001 Little Conveniences
This section is intended to capture all the places where I spent a great deal of effort to get the best possible behavior on something subtle. aka "polish". It also captures some of the intended "best behaviors" that I haven't had cycles to implement yet.
### Templates as first class NPM modules - ie, real stack traces
When using the 'template' command, we go to great length to provide a fully debuggable experience. We have a custom version of the template compilation code (templates are compiled to JS and then evaluated) that ensures a 1:1 mapping between line numbers in the original *.template file and line numbers in the generated JS code. This code is then loaded as if it were a real Node.js module (literally, using a require() statement). This means that should anything go wrong, the resulting stack traces and sytax exceptions will have correct line numbers from the original template file.
### Expressions auto-return the last value
This one is a bit [CoffeeScript](http://coffee-script.org) inspired. When we parse command-line expressions for commands like 'map', they are evaluated as NodeScript objects. This allows us to retrieve the last value in the expression. In a previous version we wrapped expressions in function boilerplate; however this blocked the use of semicolons within an expression. With first class Script objects, we can evaluate multiple semicolon delimited expressions and still capture the value from the last expression evaluated. Thus, all of the following expressions will return "10".
underscore run '5 + 5'
underscore run 'x=5; y=5; x+y;'
underscore run 'x=5, y=5, x+y;'
This even works to find the last evaluated value inside conditional branches (these also return 10):
underscore run 'x=5; if (x > 0) { 10; } else { 0; }' # last value is 10
underscore run 'x=5; if (x > 0) { y=5; } else { y=-99; } x+y;' # last value is 'x+y'
In general, the principle here is that the code should just return what you intuitively expect without requiring much thought.
### Autodetection of CoffeeScript
If you type a CoffeeScript expression and forget to use the '--coffee' flag, Underscore-CLI will first attempt to parse it as JavaScript, and if that fails, parse it as CoffeeScript.
However, a warning is emitted:
"Warning: Parsing user expression 'foo?.bar?.baz' as CoffeeScript. Use '--coffee' to be more explicit."
Why do we print a warning? Unfortunately, there are a number of language features that are ambiguous between JS and Coffee. ie, expressions that are valid in both languages but with different meaning. For example:
test ? 10 : 20; // JS: if test is true, then 10, else 20
test ? 10 : 20; // Coffee: if test is true, then test, else {10: 20}. Tragic.
### Lazy loading of CoffeeScript/Msgpack/JSONSelect libraries
Loading the 'coffee-script' npm module takes 50+ ms. JSONSelect is another 5ms. That may not sound like much time, but it's the difference between 153 ms and 93 ms, and 153 ms is definitely human perceivable. It will also make a difference if you are writing a quick-and-dirty bash loop that executes underscore-CLI repeatedly. Plus, fast just feels good.
A few more notes... Node.js takes about 33 ms to run "hello world", and 45ms if you either "require('fs')" or 'require' anything that's not pre-compiled into the node executable (pretty hard to avoid that). Adding underscore, underscore.string and a few of node's pre-compiled modules, and basic code loading takes ~60 ms. That leaves ~33 ms spent on actually running code that initializes the command list and decides what to do with the command-line args that were passed in.
3rdly, as of v0.2.16, underscore-CLI now marks these packages as "optionalDependencies", meaning that on the minority of systems where there is an issues installing one of those packages (there was [a report](https://github.com/ddopson/underscore-cli/issues/12) of problems with msgpack), the overall underscore-CLI installation won't fail.
### Smart whitespace in output
As mentioned above, dense JSON is nearly unreadable to human beings, so we want to pretty print it. JSON.stringify will accept an 'indentation' parameter that does make JSON much more readible; however, this will put _everything_ on a new line resulting in output that is silly verbose -- printing "[1, 2, 3, 4]" will take up 6 lines despite having only 12ish characters. Node's "util.inspect" is a bit better, but it doesn't print valid JSON (eg, inspect uses single instead of double quotes). I don't want to compromize on JSON compatibility just to get pretty output. So I wrote my own formatter that gives the best of both worlds. The default output format is strictly JSON compatible, human readible, yet avoids excessive verbosity by putting small objects and arrays on a single line where possible. The formatting code is also pretty flexible, allowing me to support colorization and a bunch of other nifty features; at some point, I may break the formatter into it's own npm module.
### Smart auto-consumption of STDIN
TBI - as of this version, if there is no data, we will block for reading STDIN. We should only do this if the user expression refers to the well-known 'data' variable. This would unify the 'process' and 'run' commands.
### Smart auto-detection of return value
TBI - as of this version, the last evaluated expression value is always returned. However, sometimes, you want to mutate the existing data instead of returning a new value. This should be easy. If the expression does something like 'data.key = value', then the return value should be 'data'. Today, you have to write 'data.key = value; data'. I want that last part to be implicit, but only if you mutate the data variable. And there should be a command-line flag "--retval={expr,data,auto}", with 'auto' being the default.
### Efficienct stream processing for set-oriented commands like 'map'
TBI - as of this version, all commands slurp the entire input stream and parse it before doing any data manipulation. This works fine for the vast majority of scenarios, but if you actually had a 30GB JSON file, it would be a bit clunky. For set-oriented commands like 'map', a smarter core engine plus a smarter JSON parser could enable stream-oriented processing where data processing occurs continuously as the input is read and streamed to the output without ever needing to store the entire dataset in memory at once. This feature requires a custom JSON-parser and some serious fancy, but I'll get to it eventually. If you have any performance-sensitive use-cases, post an issue on Github, and I'd be glad to work with you.
# Reporting Bugs / Requesting Features
I strongly encourage bug reports and feature requests. I'll look at all of them eventually, though if I'm slammed at work or have something happening in my personal life, I might get a little bit behind. It is my hobby project after-all, and by all means, you are welcome to submit a pull-request which I'll get to a heck of a lot faster than a feature I have to build myself :)
When reporting a bug that might be related to a dependency, it's usually helpful to list out which platform you are on. Here's my info (as of 2012-11-05):
# uname -a
Darwin ddopson.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 i386 MacBookPro10,1 Darwin
# node -v
v0.8.1
# npm -v
1.1.35
# npm ls
underscore-cli@0.2.16 /Users/Dopson/work/other/underscore-cli
├── coffee-script@1.4.0
├─┬ commander@1.0.5
│ └── keypress@0.1.0
├── JSONSelect@0.4.0
├─┬ mocha@1.6.0
│ ├── commander@0.6.1
│ ├── debug@0.7.0
│ ├── diff@1.0.2
│ ├── growl@1.5.1
│ ├─┬ jade@0.26.3
│ │ └── mkdirp@0.3.0
│ ├── mkdirp@0.3.3
│ └── ms@0.3.0
├── msgpack@0.1.7
├── underscore@1.4.2
└── underscore.string@2.3.0
<a name="alternatives" />
# Alternatives
* [jsonpipe] (https://github.com/dvxhouse/jsonpipe) - Python focused, w/ a featureset centered around a single scenario
* [jshon] (http://kmkeen.com/jshon/) - Has a lot of functions, but very terse
* [json-command] (https://github.com/zpoley/json-command) - very limited
* [TickTick] (https://github.com/kristopolous/TickTick) - Bash focused JSON manipulation. Iteresting w/ heavy Bash integration. Complements this tool.
* [json] (https://github.com/trentm/json) - Similar idea.
* [jsawk] (https://github.com/micha/jsawk) - Similar idea. Uses a custom JS environment. Good technical documentation.
* [jsonpath] (http://code.google.com/p/jsonpath/wiki/Javascript) - this is not a CLI tool. It's a runtime JS library.
* [json:select()] (http://jsonselect.org/#tryit) - this is not a CLI tool. CSS-like selectors for JSON. Very interesting idea.... now available as an Underscore-CLI command.
* [RecordStream] (https://github.com/benbernard/RecordStream) - PERL based tool intended to process JSON "rows". I need to add similar multi-record support to underscore-CLI.
Please add a Github issue if I've missed any.