-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.template
61 lines (42 loc) · 2.86 KB
/
README.template
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
# clixpath
A command-line tool to extract the values of *Xpath* expressions from *HTML* and *XML* documents.
clixpath's distinguishing features are intended ease of use and producing output that can be parsed
programmatically.
Supports value extraction analogous to captuure groups in regexp.
A companion utility to [clixmod](https://github.com/talwrii/clixmod).
Tested with Python 2.7 and Python 3.5.
# Installing
```
pip install git+https://github.com/talwrii/clixpath#egg=clixpath
```
# Related tools (and flagrant self-promotion)
The author maintains a [list of some tools he has written](https://github.com/talwrii/tools).
# Examples / Cheat sheet
```bash
{cheat_sheet}
```
# Usage
```
{usage}
```
# Alternatives considered before writing
- **xmlstarlet** I always forget how this works and I couldn't make it like HTML with 5 minutes work
- **xmllint** Easy to use, but I could not make the output easily parseable
- **xidel** Considered, but I doubted it would produce machine readable output, and it was in pascal
- **pq** Did not produce machine readable output
- **htmlpath** Did not produce machine readable output, project seemed too small to justify taking up the authors time pushing changes
- **cli-scrape** Didn't run when I used it, gave up after 5 minutes (sorry for the FUD)
- **XSLT** useful for more general tasks, boilerplate plus learning curve
- **lxml in python** mostly the same as xslt with a slightly more shallow learning curve at the price of requiring turing completeness.
- **Converting your XML / HTML into json** you can then use JSON tools like *jq*. See, for example, [xml2json](https://github.com/hay/xml2json). I found this approach problematic when using XML with namespaces.
- **XPath 2.0** and **XQuery** support [some transformation of data](https://stackoverflow.com/questions/11372160/how-to-do-group-capture-in-xpath) that could be used for this sort of task, albeit with some boiler plate. A cursory inspection of open source tools failed to find any tools that I would describe as "do what I mean convenient command line tools". Though the reader might like to be aware of [xqilla](http://xqilla.sourceforge.net/) and [galax](http://galax.sourceforge.net/)
# Testing
- If you write tests in the `tests` directory, then I will trust you code more.
- If you run tox then it will work when I run it before merging your code...
# Caveats
This tool is in the category of convenience utilities rather than a general-purpose tool.
It makes common tasks easy and general tasks impossible.
I may not accept your patch on the grounds of something being too complicated for this tool.
The most likely place that this will come up is support for building recursive *JSON* records using the `--extract` option.
In such cases, a more powerful tool like *XSLT* or *lxml* should be used.
Do not expect any code that depends on parsing *non*-JSON output to not getting broken.