creating bash jupyter notebook guide
tednaleid committed Jul 29, 2024
1 parent 80d3f4e commit 9fbafe7
Showing 14 changed files with 1,381 additions and 75 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -10,3 +10,5 @@ dist
ganda-amd64

dist/
node_modules
package*.json
128 changes: 85 additions & 43 deletions README.md
@@ -2,29 +2,59 @@

Ganda lets you make HTTP/HTTPS requests to hundreds to millions of URLs in just a few minutes.

It's designed with the unix philosophy of ["do one thing well"](https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well) and wants to be used in a chain of command line pipes to make its requests in parallel.
It's designed with the Unix philosophy of ["do one thing well"](https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well) and wants to be used in a chain of command line pipes to make its requests in parallel.

By default, it will echo all response bodies to standard out but can optionally save the results of each request in a directory for later analysis.

### Documentation Links

* [Installation](#installation)
* [User Guide](docs/GUIDE.ipynb)

# Quick Examples

Given a file with a list of IDs in it, you could do something like:

cat id_list.txt | awk '{printf "https://api.example.com/resource/%s?apikey=foo\n", $1}' | ganda
```
cat id_list.txt | awk '{printf "https://api.example.com/resource/%s?apikey=foo\n", $1}' | ganda
```

and that will pipe a stream of urls into `ganda` in the format `https://api.example.com/resource/<ID>?apikey=foo`.
and that will pipe a stream of URLs into `ganda` in the format `https://api.example.com/resource/<ID>?apikey=foo`.

Alternatively, if you have a file full of urls (one per line), you can just tell `ganda` to run that:
Alternatively, if you have a file full of URLs (one per line), you can just tell `ganda` to run that:

ganda my_file_of_urls.txt
```
ganda my_file_of_urls.txt
```

If you give `ganda` a `-o <directory name>` parameter, it will save the body of each response in a separate file inside `<directory name>`. If you want a single file, just redirect stdout the normal way: `... | ganda > result.txt`.

For many more examples, see ["Using HTTP APIs on the Command Line - Part 3 - ganda"](http://www.naleid.com/2018/04/04/using-http-apis-on-the-command-line-3-ganda.html).
For many more examples, take a look at the [User Guide](docs/GUIDE.ipynb).

# Why use `ganda` over `curl` (or `wget`, `httpie`, `postman-cli`, ...)?

# Installing
All existing CLI tools for making HTTP requests are oriented around making a single request at a time. They're great
at starting a pipe of commands (ex: `curl <url> | jq .`) but they're awkward to use beyond a few requests.

The easiest way to use them is in a bash `for` loop or with something like `xargs`. This is slow and expensive as they open up a new HTTP connection on every request.

`ganda` makes many requests in parallel and can maintain context between the request and response. It's designed to
be used in a pipeline of commands and can be used to make hundreds of thousands of requests in just a few minutes.

`ganda` reuses HTTP connections and lets you specify how many "worker" threads to use, so you can tightly control parallelism.

The closest CLIs I've found to `ganda` are load-testing tools like `vegeta`. They're able to make many requests in
parallel, but they're not designed to only call each URL once, don't maintain context between the request and response,
and don't have the same flexibility in how the response is handled.

`ganda` isn't for load testing; it's for making lots of requests in parallel and processing the results in a pipeline.


# Installation

You currently have 3 options:

1. on MacOS you can install with [homebrew](https://brew.sh/)
1. on MacOS you can install using [homebrew](https://brew.sh/)
```
brew tap tednaleid/homebrew-ganda
brew install ganda
@@ -38,42 +68,54 @@ brew install ganda
go install github.com/tednaleid/ganda@latest
```

to install in your `$GOPATH/bin` (which you want in your `$PATH`)
or, if you have this repo downloaded locally:

# Usage


-- TODO: update

# Example

This command takes the first 1000 words from the macOS dictionary file, then turns each of them into a [Wikipedia API](https://www.mediawiki.org/wiki/API:Main_page) url.

Those urls are then piped into `ganda` and saved in a directory called `out` in the current directory.


head -1000 /usr/share/dict/words |\
awk '{printf "https://en.wikipedia.org/w/api.php?action=query&titles=%s&prop=revisions&rvprop=content&format=json\n", $1}' |\
ganda -o out --subdir-length 2

Output (shows the HTTP status code of 200 OK for each along with the resulting output file that each was saved at):

Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=aam&prop=revisions&rvprop=content&format=json -> out/95/https-en-wikipedia-org-w-api-php-action-query-titles-aam-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=A&prop=revisions&rvprop=content&format=json -> out/71/https-en-wikipedia-org-w-api-php-action-query-titles-A-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=aal&prop=revisions&rvprop=content&format=json -> out/99/https-en-wikipedia-org-w-api-php-action-query-titles-aal-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=a&prop=revisions&rvprop=content&format=json -> out/69/https-en-wikipedia-org-w-api-php-action-query-titles-a-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=aardwolf&prop=revisions&rvprop=content&format=json -> out/31/https-en-wikipedia-org-w-api-php-action-query-titles-aardwolf-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=aalii&prop=revisions&rvprop=content&format=json -> out/91/https-en-wikipedia-org-w-api-php-action-query-titles-aalii-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=aa&prop=revisions&rvprop=content&format=json -> out/ae/https-en-wikipedia-org-w-api-php-action-query-titles-aa-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=Aani&prop=revisions&rvprop=content&format=json -> out/7f/https-en-wikipedia-org-w-api-php-action-query-titles-Aani-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=Aaron&prop=revisions&rvprop=content&format=json -> out/db/https-en-wikipedia-org-w-api-php-action-query-titles-Aaron-prop-revisions-rvprop-content-format-json
Response: 200 https://en.wikipedia.org/w/api.php?action=query&titles=aardvark&prop=revisions&rvprop=content&format=json -> out/c4/https-en-wikipedia-org-w-api-php-action-query-titles-aardvark-prop-revisions-rvprop-content-format-json
... 990 more lines

As `ganda` is designed to make many thousands of requests, you can use the `--subdir-length` to avoid making your filesystem unhappy with 1M files in a single directory. That switch will hash each url and place the response in a subdirectory (similar to how git stores its objects).
```
make install
```

example run:
to install in your `$GOPATH/bin` (which you want in your `$PATH`)

![ganda example run against wikipedia API](https://cdn.rawgit.com/tednaleid/ganda/gh-pages/images/ganda-example.gif)
# Usage

```
ganda help
NAME:
ganda - make http requests in parallel
USAGE:
<urls/requests on stdout> | ganda [options]
VERSION:
1.0.0
DESCRIPTION:
Pipe urls to ganda over stdout for it to make http requests to each url in parallel.
AUTHOR:
Ted Naleid <contact@naleid.com>
COMMANDS:
echoserver Starts an echo server, --port <port> to override the default port of 8080
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--base-retry-millis value the base number of milliseconds to wait before retrying a request, exponential backoff is used for retries (default: 1000)
--response-body value, -B value transforms the body of the response. Values: 'raw' (unchanged), 'base64', 'discard' (don't emit body), 'escaped' (JSON escaped string), 'sha256' (default: raw)
--connect-timeout-millis value number of milliseconds to wait for a connection to be established before timeout (default: 10000)
--header value, -H value [ --header value, -H value ] headers to send with every request, can be used multiple times (gzip and keep-alive are already there)
--insecure, -k if flag is present, skip verification of https certificates (default: false)
--json-envelope, -J emit result with JSON envelope with url, status, length, and body fields, assumes result is valid json (default: false)
--color if flag is present, add color to success/warn messages (default: false)
--output value, -o value if flag is present, save response bodies to files in the specified directory
--request value, -X value HTTP request method to use (default: "GET")
--response-workers value number of concurrent workers that will be processing responses, if not specified will be same as --workers (default: 0)
--retry value max number of retries on transient errors (5XX status codes/timeouts) to attempt (default: 0)
--silent, -s if flag is present, omit showing response code for each url only output response bodies (default: false)
--subdir-length value, -S value length of hashed subdirectory name to put saved files when using -o; use 2 for > 5k urls, 4 for > 5M urls (default: 0)
--throttle value, -t value max number of requests to process per second, default is unlimited (default: -1)
--workers value, -W value number of concurrent workers that will be making requests, increase this for more requests in parallel (default: 1)
--help, -h show help (default: false)
--version, -v print the version (default: false)
```
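
The README's description of parallel workers sharing reused HTTP connections could look roughly like the Go sketch below. This is not ganda's implementation: `fetchFunc`, `httpFetcher`, and `fetchAll` are hypothetical names introduced for illustration, and the stub in `main` stands in for real requests so the example runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// fetchFunc performs one request and returns its HTTP status code.
// (Hypothetical type, not part of ganda.)
type fetchFunc func(url string) (int, error)

// httpFetcher builds a fetchFunc around one shared client so that
// keep-alive connections can be reused across requests.
func httpFetcher(client *http.Client) fetchFunc {
	return func(url string) (int, error) {
		resp, err := client.Get(url)
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		// Drain the body; an unread body prevents connection reuse.
		_, err = io.Copy(io.Discard, resp.Body)
		return resp.StatusCode, err
	}
}

// fetchAll fans urls out to a fixed number of workers and collects
// the status code returned for each URL.
func fetchAll(urls []string, workers int, fetch fetchFunc) map[string]int {
	if workers < 1 {
		workers = 1 // at least one worker must drain the channel
	}
	urlCh := make(chan string)
	statuses := make(map[string]int, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range urlCh {
				status, err := fetch(url)
				if err != nil {
					continue // a real tool would report or retry here
				}
				mu.Lock()
				statuses[url] = status
				mu.Unlock()
			}
		}()
	}

	for _, u := range urls {
		urlCh <- u
	}
	close(urlCh)
	wg.Wait()
	return statuses
}

func main() {
	// Stub fetcher so the sketch runs offline; swap in
	// httpFetcher(&http.Client{}) to make real requests.
	stub := func(url string) (int, error) { return 200, nil }
	urls := []string{
		"https://api.example.com/resource/1",
		"https://api.example.com/resource/2",
	}
	statuses := fetchAll(urls, 2, stub)
	fmt.Println(len(statuses), "responses collected")
}
```

The number of goroutines plays the role of the `--workers` value here: raising it increases how many requests are in flight at once, while the single shared client keeps connections open between requests.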
25 changes: 21 additions & 4 deletions cli/cli.go
@@ -11,9 +11,11 @@ import (
"github.com/tednaleid/ganda/responses"
"github.com/urfave/cli/v3"
"io"
"math"
"os"
"os/signal"
"syscall"
"time"
)

type BuildInfo struct {
@@ -49,7 +51,7 @@ func SetupCommand(
ErrWriter: stderr,
Flags: []cli.Flag{
&cli.IntFlag{
Name: "base-retry-ms",
Name: "base-retry-millis",
Usage: "the base number of milliseconds to wait before retrying a request, exponential backoff is used for retries",
Value: conf.BaseRetryDelayMillis,
Destination: &conf.BaseRetryDelayMillis,
@@ -80,7 +82,7 @@ func SetupCommand(
},
},
&cli.IntFlag{
Name: "connect-timeout-ms",
Name: "connect-timeout-millis",
Usage: "number of milliseconds to wait for a connection to be established before timeout",
Value: conf.ConnectTimeoutMillis,
Destination: &conf.ConnectTimeoutMillis,
@@ -111,6 +113,7 @@ func SetupCommand(
&cli.StringFlag{
Name: "output",
Aliases: []string{"o"},
Usage: "if flag is present, save response bodies to files in the specified directory",
Destination: &conf.BaseDirectory,
},
&cli.StringFlag{
@@ -169,10 +172,16 @@ func SetupCommand(
Usage: "Port number to start the echo server on",
Value: 8080, // Default port number
},
&cli.IntFlag{
Name: "delay-millis",
Usage: "Number of milliseconds to delay responding",
Value: 0, // Default delay is 0 milliseconds
},
},
Action: func(ctx ctx.Context, cmd *cli.Command) error {
port := cmd.Int("port")
shutdown, err := echoserver.Echoserver(port, io.Writer(os.Stdout))
delayMillis := cmd.Int("delay-millis")
shutdown, err := echoserver.Echoserver(port, delayMillis, io.Writer(os.Stdout))
if err != nil {
fmt.Println("Error starting server:", err)
os.Exit(1)
@@ -231,7 +240,15 @@ func ProcessRequests(context *execcontext.Context) {
requestsWithContextChannel := make(chan parser.RequestWithContext)
responsesWithContextChannel := make(chan *responses.ResponseWithContext)

requestWaitGroup := requests.StartRequestWorkers(requestsWithContextChannel, responsesWithContextChannel, context)
var rateLimitTicker *time.Ticker

// don't throttle if we're not limiting the number of requests per second
if context.ThrottlePerSecond != math.MaxInt32 {
rateLimitTicker = time.NewTicker(time.Second / time.Duration(context.ThrottlePerSecond))
defer rateLimitTicker.Stop()
}

requestWaitGroup := requests.StartRequestWorkers(requestsWithContextChannel, responsesWithContextChannel, rateLimitTicker, context)
responseWaitGroup := responses.StartResponseWorkers(responsesWithContextChannel, context)

err := parser.SendRequests(requestsWithContextChannel, context.In, context.RequestMethod, context.RequestHeaders)
10 changes: 5 additions & 5 deletions cli/cli_test.go
@@ -36,7 +36,7 @@ func TestTimeout(t *testing.T) {
}))
defer server.Server.Close()

runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-ms", "1"}, server.stubStdinUrl("bar"))
runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-millis", "1"}, server.stubStdinUrl("bar"))

url := server.urlFor("bar")

@@ -60,7 +60,7 @@ func TestRetryEnabledShouldRetry5XX(t *testing.T) {
}))
defer server.Server.Close()

runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))

url := server.urlFor("bar")

@@ -81,7 +81,7 @@ func TestRunningOutOfRetriesShouldStopProcessing(t *testing.T) {
}))
defer server.Server.Close()

runResults, _ := RunGanda([]string{"ganda", "--retry", "2", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
runResults, _ := RunGanda([]string{"ganda", "--retry", "2", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))

url := server.urlFor("bar")

@@ -102,7 +102,7 @@ func TestRetryEnabledShouldNotRetry4XX(t *testing.T) {
}))
defer server.Server.Close()

runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))

url := server.urlFor("bar")

@@ -125,7 +125,7 @@ func TestRetryEnabledShouldRetryTimeout(t *testing.T) {
}))
defer server.Server.Close()

runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-ms", "10", "--retry", "1", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-millis", "10", "--retry", "1", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))
url := server.urlFor("bar")

//assert.Equal(t, 2, requestCount, "expected a second request")
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
scratch
1 change: 1 addition & 0 deletions docs/.python-version
@@ -0,0 +1 @@
gandadocs-3.12.2