-
Notifications
You must be signed in to change notification settings - Fork 0
Home
ls | pyp "p.replace('maybe','yes') | pp.sort() | pp[1:3] |p , p , p.strip('abc') | whitespace | p[3], 'no' | p.upper()
Pyp is a linux command line text manipulation tool similar to awk or sed, but which uses standard python string and list methods as well as custom functions evolved to generate fast results in an intense production environment. Pyed Pyper was developed at Sony Pictures Imageworks to facilitate the construction of complex image manipulation "one-liner" commands during visual effects work on Alice in Wonderland, Green Lantern, and the The Amazing Spiderman.
Because pyp employs it's own internal piping syntax ("|") similar to unix pipes, complex operations can be proceduralized by feeding the output of one python command to the input of the next. This greatly simplifies the generation and troubleshooting of multistep operations without the use of temporary variables or nested parentheses. The variable "p" represents each line as a string, while "pp" is entire input as python list: > ls | pyp "p[0] | pp.sort() | p + ' first letter, sorted!'" #gives sorted list of first letters of every line
In practice, the ability to easily construct complicated command sequences can largely replace "for each" loops on the command line, thus significantly speeding up work-flow using standard unix command recycling.
pyp output has been optimized for typical production scenarios. For example, if text is broken up into an array using the "split()" method, the output will be automatically numbered by field making selecting a particular field trivial. Numerous other conveniences have been included, such as an accessible history of all inter-pipe sub-results, an ability to perform mathematical operations, and a complement of variables based on common metacharcter split/join operations.
For power users, commands can be easily saved and recalled from disk as macros, providing an alternative to quick and dirty scripting. For the truly advanced user, additional methods can be added to the pyp class via a config file, allowing tight integration with larger facilities data structures or custom toolsets.
Thank you to code guru Max Smythe who has added these REGEX functions. Let's face it, sometimes a good REGEX is a great quick and dirty solution, and Max has brought us these straightforward and pythonic functions to get the job done! These changes can be found in the pyp_beta, available here. Let us know if you find any issues... p.rekill(RE1,RE2...) removes matches of specified regexes p.rereplace(re, str) replaces all matches of regex with str p.refindall(re) returns a list of all matches of regex Max also added the ability to kill colored output for you purists out there.
pyp includes can perform basic statistics using pp.stats(), allowing pyp to be used to quickly analyse any numeric sequence on the linux command line. Common attributes, such as MAX, MIX, STD-DEV, and MEAN (average) are computed for the entire input, and then "stamped" onto every line in the form of a python dictionary for further analysis. For example, consider the analysis of the following data set: [1,5,10] >printf "1\n5\n10" | pyp_beta "pp.stats()" [0]{max: 10.0, mean: 5.33333333333, min: 1.0, n: 3, original: 1.0, stddev: 3.68178700573, sum: 16.0} [1]{max: 10.0, mean: 5.33333333333, min: 1.0, n: 3, original: 5.0, stddev: 3.68178700573, sum: 16.0} [2]{max: 10.0, mean: 5.33333333333, min: 1.0, n: 3, original: 10.0, stddev: 3.68178700573, sum: 16.0}
Individual items are easily recalled using standard python dictionary notation. So, to divide the original value of each line by the max value of the entire inputs, just use this: ``` > printf "1\n5\n10" | pyp_beta "pp.stats() | p['original'] / p['max']" 0.1 0.5 1.0
Basic Match functions can be performed using the math module:
/>printf "1\n5\n10" | pyp_beta "p | float(p) | math.log(p)"
0.0
1.60943791243 2.30258509299
Please note, you must recast any strings to float before using the math module, but this isn't necessary when using pp.stats()
This release also incorporates several bug fixes.
pyp keeps all previous subresults with the history or h variable. You can access any previous subresult simply by using the h variable. For example, h[0] is the inintial input, while h[3] is the fourth subresult.
Download the beta here
# A Quick Tour
The simplest pyp example shows how python string methods can be used easily on the command line. For example, to split up the different columns of a linux long listing, we just use the split method with pyp's line-by-line variable "p"
ls -l | pyp "p.split()"
we can then use standard python indexing to select the column. For example, to select the last column, we can just use this:
ls -l | pyp "p.split()[-1]"
Any other python string methods can be used; for example p.lower() will make everything lowercase.
For a more complicated example, we take a linux long listing, capture every other of the 5th through the 10th lines, keep username and file name fields, replace "hello" with "goodbye", capitalize the first letter of every word, and then add the text "is splendid" to the end:
ls -l | pyp "pp[5:11:2] | whitespace[2], w[-1] | p.replace('hello','goodbye') | p.title(),'is splendid'"
This uses pyp's built-in line-by-line and entire input variables (p and pp), as well as the variable whitespace and it's shortcut w, which both represent a list based on splitting each line on whitespace (whitespace = w = p.split()).
The other functions and selection techniques are all standard python. Notice the pipes ("|") are inside the pyp command.
We can then save this as a macro to disk using the flag --macro_save splendid_example
The next time we need to perform this operation, we can simply use this:
ls -l | pyp splendid_example
# File and Second Streams
Typically, data is piped into pyp just like any other unix utility through STD-IN. There are, however, two other completely independent ways to get data into pyp. These other streams can then be manipulated separately as well as combined in various ways with the main stream.
File inputs can be specified using the --text_file flag, and are referred to on a line-by line-basis using the variable 'fp'. For example, to print out the std-in line-by-line next to each line in a file use this:
pyp "p + fp" --text_file example.txt
Second stream inputs, on the other hand, are anything after the pyp command that is not associated with an option flag. This can then be accessed separately from the primary stream by using the variable 'sp'. To print out the std-in line-by-line next to each sp_example, use this:
pyp "p + sp" sp_example1 sp_example2 sp_example3
# File and Second Stream List Variables
List operations can now be performed on file inputs and second stream inputs using the variables spp and fpp, respectively. These essentially treat each input as a list, so you can use standard python list methods as well as any of the specialized pyp list methods. Once modified, these changes will "stick", so future references to the list will use the new list.
For example to sort a file input use: ```
fpp.sort()
``` Once this operation takes place, the sorted fpp will be used for all future operations, such as referring to the file input line-by-line using fp.
If you need arbitrary strings added to your list, you can just use simple python list additions:
fpp + ['last list']
You can also add these inputs to the std-in stream using like this:
pp+fpp
If pp is 10 lines, and fpp is 10 line, this will result in a new pp stream of 20 lines, with the first 10 being from pp and the last 10 being from fpp. fpp will remain untouched; only pp will change with this operation.
Of course, you can trim these to your needs using standard python list selection techniques: pp[0:5]+ fpp[0:5]
This will result in a new composite input stream of 10 lines.
Keep in mind that the length of fpp and spp is trimmed to reflect that of std-in. If you need to see more of your file second stream input, you can extend your std-in stream simply:
pp+['']*10
will add 10 blank lines to std-in, and thus reveal another 10 lines of fpp if available.
Also, there are a few useful python math functions that work on lists of integers or floats like sum, min, and max. For example, to add up all of the integers in the last column input: ```
whitespace[-1] | int(p) | sum(pp)
Other Features p.ext string attribute added for file extension p.letters()|returns array of contiguous letters in p p.digits() |returns array of contiguous numbers in p.| p.punctuation()
The next functions are just some nice cleanup tools:
p.kill(STR1, STR2,...) takes multiple arguments. all STR* will be removed from p p.clean() |replaces "bad" metacharacters with underscores |
Useful Variables letters | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ digits | 0123456789 | | punctuation | !"#$%&'()*+,-./:;<=>?@[]^_{|}~` |
letters is great when used with the line counter "n"...for example, to name things a,b,c,d just use "letters[n]"
Useful flags --keep_false | Normally pyp filters out anything that tests False ([],'',0,etc). With this flag, pyp just prints out a blank line if something tests False.