The Go Cascadia package implements CSS selectors for HTML. This is its command line tool, which started as a thin wrapper around that package but is growing into a better tool for testing CSS selectors without writing Go code:
cascadia wrapper
Version 1.3.0 built on 2023-06-30
Copyright (C) 2016-2023, Tong Sun
Command line interface to go cascadia CSS selectors package
Usage:
cascadia -i in -c css -o [Options...]
Options:
-h, --help display help information
-i, --in *The html/xml file to read from (or stdin)
-o, --out *The output file (or stdout)
-c, --css *CSS selectors (can provide more if not using --piece)
-t, --text Text output for none-block selection mode
-R, --Raw Raw text output, no trimming of leading and trailing white space
-p, --piece sub CSS selectors within -css to split that block up into pieces
format: PieceName=[PieceStyle:]selector_string
PieceStyle:
RAW : will return the selected as-is
ATTR : will return the value of attribute selector_string
Else the text will be returned
-d, --delimiter delimiter for pieces csv output [= ]
-w, --wrap-html wrap up the output with html tags
-y, --style style component within the wrapped html head
-b, --base base href tag used in the wrapped up html
-q, --quiet be quiet
Its output has two modes, none-block selection mode and block selection mode, depending on whether the --piece
parameter is given on the command line or not.
For details about the concept of block and pieces, check out andrew-d/goscrape (in fact, cascadia was initially developed just for it, so that I don't need to tweak Go code, build & run it just to test out the block and pieces selectors). Here is an excerpt:
- Inside each page, there's 1 or more blocks - some logical method of splitting up a page into subcomponents.
- Inside each block, you define some number of pieces of data that you wish to extract. Each piece consists of a name, a selector, and what data to extract from the current block.
This all sounds rather complicated, but in practice it's quite simple. See the next section for details.
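To make the piece format from the help text (PieceName=[PieceStyle:]selector_string) more concrete, here is a minimal Go sketch of how such a spec could be split into its parts. It is not taken from the cascadia source: the Piece type and the parsePiece function are hypothetical names used only for illustration, and the piece specs in main are made-up examples.

```go
package main

import (
	"fmt"
	"strings"
)

// Piece is a hypothetical in-memory form of one --piece argument.
type Piece struct {
	Name     string // piece (column) name, the part before the first "="
	Style    string // "RAW", "ATTR", or "" when plain text is wanted
	Selector string // CSS selector, or the attribute name when Style is "ATTR"
}

// parsePiece splits a spec of the form PieceName=[PieceStyle:]selector_string.
// Only the first "=" separates name and selector, so selectors such as
// input[name=Sex] survive intact.
func parsePiece(spec string) (Piece, error) {
	name, rest, ok := strings.Cut(spec, "=")
	if !ok || name == "" || rest == "" {
		return Piece{}, fmt.Errorf("invalid piece spec %q", spec)
	}
	p := Piece{Name: name, Selector: rest}
	// Only the two documented styles are treated as prefixes; any other
	// colon (e.g. a:first-child) stays part of the selector.
	for _, style := range []string{"RAW", "ATTR"} {
		if strings.HasPrefix(rest, style+":") {
			p.Style = style
			p.Selector = strings.TrimPrefix(rest, style+":")
			break
		}
	}
	return p, nil
}

func main() {
	// Made-up piece specs, one per documented style.
	for _, spec := range []string{"Title=h2", "Link=ATTR:href", "Body=RAW:div.content"} {
		p, err := parsePiece(spec)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", p)
	}
}
```

Checking only for the RAW: and ATTR: prefixes, rather than splitting on every colon, keeps pseudo-class selectors usable as plain text pieces.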
In summary,
- The none-block selection mode will output the selection as HTML source by default
  - but if -t, or --text cli option is provided, the none-block selection mode will output as text instead.
    - By default, such text output will get their leading and trailing white space trimmed.
    - However, if -R, or --Raw cli option is provided, no trimming will be done.
- The block selection mode will output HTML as text in a tsv/csv table form by default
  - if the --piece selection is prefixed with RAW:, then that specific block selection will output in HTML instead. See the following for details.
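The text handling summarized above can be sketched in a few lines of Go. This is only an illustration of the described behavior, not the tool's actual code: formatText and joinRow are hypothetical helpers, and the tab delimiter is an assumption based on the tsv wording.

```go
package main

import (
	"fmt"
	"strings"
)

// formatText follows the rule described above: text is trimmed of leading
// and trailing white space unless raw output (-R, --Raw) is requested.
func formatText(s string, raw bool) string {
	if raw {
		return s
	}
	return strings.TrimSpace(s)
}

// joinRow joins the piece values of one block into one delimited line.
// It does no quoting, so a value containing the delimiter would break
// the table; a real CSV writer would have to handle that.
func joinRow(values []string, delimiter string) string {
	return strings.Join(values, delimiter)
}

func main() {
	fmt.Printf("%q\n", formatText("  Hello \n", false)) // trimmed
	fmt.Printf("%q\n", formatText("  Hello \n", true))  // untouched
	// Assumed delimiter: a tab, giving tsv-style rows.
	fmt.Println(joinRow([]string{"Title", "Link"}, "\t"))
}
```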
All three options, -i, -o and -c, are required. With no file name given, -i reads from stdin and -o writes to stdout:
$ echo '<input type="radio" name="Sex" value="F" />' | tee /tmp/cascadia.xml | cascadia -i -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>
Either the input or the output can be followed by a file name:
$ cascadia -i /tmp/cascadia.xml -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html
1 elements for 'input[name=Sex][value=F]':
$ cat /tmp/out.html
<input type="radio" name="Sex" value="F"/>
Other options can be applied too:
# using --wrap-html
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w
1 elements for 'input[name=Sex][value=F]':
$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">
</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>
# using --wrap-html with --style
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w -y '<link rel="stylesheet" href="styles.css">'
1 elements for 'input[name=Sex][value=F]':
$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">
<link rel="stylesheet" href="styles.css">
</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>
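For readers who want to reproduce the wrapping in their own Go code, the following is a rough sketch of how a selected fragment could be wrapped the way the two outputs above look. The wrapHTML function and its parameters are hypothetical and not taken from cascadia itself; unlike the outputs listed above, this sketch also emits a closing </html> tag.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapHTML wraps an HTML fragment in a minimal page. base fills the
// <base href="..."> tag, and style (if not empty) is copied verbatim
// into the head, like the -y value in the example above.
func wrapHTML(fragment, base, style string) string {
	var b strings.Builder
	b.WriteString("<!DOCTYPE html>\n<html>\n<head>\n")
	b.WriteString("<meta charset=\"utf-8\">\n")
	fmt.Fprintf(&b, "<base href=\"%s\">\n", base)
	if style != "" {
		b.WriteString(style + "\n")
	}
	b.WriteString("</head>\n<body>\n")
	b.WriteString(fragment + "\n")
	b.WriteString("</body>\n</html>\n")
	return b.String()
}

func main() {
	fragment := `<input type="radio" name="Sex" value="F"/>`
	style := `<link rel="stylesheet" href="styles.css">`
	fmt.Print(wrapHTML(fragment, "", style))
}
```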
- For more on using the --style option, check out "adding styles".
- For more examples, check out the wiki.
To install with apt (Debian/Ubuntu):
sudo apt install -y cascadia
- The latest binary executables are available as the result of the Continuous-Integration (CI) process.
- I.e., they are built automatically right from the source code at every git release by GitHub Actions.
- There are two ways to get/install such binary executables
  - Using the binary executables directly, or
  - Using packages for your distro
- The latest binary executables are directly available under https://github.com/suntong/cascadia/releases/latest
- Pick & choose the one that suits your OS and its architecture. E.g., for Linux, it would be the cascadia_verxx_linux_amd64.tar.gz file.
- Available OS for binary executables are
  - Linux
  - Mac OS (darwin)
  - Windows
- If your OS and its architecture is not available in the download list, please let me know and I'll add it.
- The manual installation is just to unpack it and move/copy the binary executable to somewhere in PATH. For example,
tar -xvf cascadia_*_linux_amd64.tar.gz
sudo mv -v cascadia_*_linux_amd64/cascadia /usr/local/bin/
rmdir -v cascadia_*_linux_amd64
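If you script the download, picking the right archive for the current machine can be sketched in Go as below. This assumes every archive follows the cascadia_verxx_linux_amd64.tar.gz naming shown above, with the OS and architecture spelled as Go's GOOS and GOARCH values; that is an assumption, and archives for other systems may be named or packed differently. The archivePattern and matchesArchive helpers, as well as the file name used in main, are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"runtime"
)

// archivePattern builds a glob for the release archive of one OS and
// architecture, with "*" standing in for the version part.
func archivePattern(goos, goarch string) string {
	return fmt.Sprintf("cascadia_*_%s_%s.tar.gz", goos, goarch)
}

// matchesArchive reports whether a file name fits the assumed naming
// scheme for the given OS and architecture.
func matchesArchive(name, goos, goarch string) bool {
	ok, err := path.Match(archivePattern(goos, goarch), name)
	return err == nil && ok
}

func main() {
	// Pattern for the machine this program runs on.
	fmt.Println(archivePattern(runtime.GOOS, runtime.GOARCH))
	// Made-up file name, checked against the Linux/amd64 pattern.
	fmt.Println(matchesArchive("cascadia_1.3.0_linux_amd64.tar.gz", "linux", "amd64"))
}
```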
The repo setup instruction URL has been given above. For example, for Debian:
curl -1sLf \
'https://dl.cloudsmith.io/public/suntong/repo/setup.deb.sh' \
| sudo -E bash
# That's it. You then can do your normal operations, like
sudo apt update
apt-cache policy cascadia
sudo apt install -y cascadia
To install from the source code instead:
go install github.com/suntong/cascadia@latest
Powered by WireFrame, the one-stop wire-framing solution for Go cli based projects, from init to deploy.
Thanks goes to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!