creating bash jupyter notebook guide

tednaleid · Jul 29, 2024 · 9fbafe7
1 parent 80d3f4e

Showing 14 changed files with 1,381 additions and 75 deletions.
diff --git a/.gitignore b/.gitignore
@@ -10,3 +10,5 @@ dist
 ganda-amd64
 
 dist/
+node_modules
+package*.json
diff --git a/README.md b/README.md
@@ -2,29 +2,59 @@
 
 Ganda lets you make HTTP/HTTPS requests to hundreds to millions of URLs in just a few minutes.
 
-It's designed with the unix philosophy of ["do one thing well"](https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well) and wants to be used in a chain of command line pipes to make its requests in parallel. 
+It's designed with the Unix philosophy of ["do one thing well"](https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well) and wants to be used in a chain of command line pipes to make its requests in parallel. 
 
 By default, it will echo all response bodies to standard out but can optionally save the results of each request in a directory for later analysis.
 
+### Documentation Links
+
+* [Installation](#installation)
+* [User Guide](docs/GUIDE.ipynb)
+
+# Quick Examples
+
 Given a file with a list of IDs in it, you could do something like:
 
-    cat id_list.txt | awk '{printf "https://api.example.com/resource/%s?apikey=foo\n", $1}' | ganda
+```
+cat id_list.txt | awk '{printf "https://api.example.com/resource/%s?apikey=foo\n", $1}' | ganda
+```
 
-and that will pipe a stream of urls into `ganda` in the format `https://api.example.com/resource/<ID>?apikey=foo`.
+and that will pipe a stream of URLs into `ganda` in the format `https://api.example.com/resource/<ID>?apikey=foo`.
 
-Alternatively, if you have a file full of urls (one per line), you can just tell `ganda` to run that:
+Alternatively, if you have a file full of URLs (one per line), you can just tell `ganda` to run that:
 
-    ganda my_file_of_urls.txt
+```
+ganda my_file_of_urls.txt
+```
 
 If you give `ganda` a `-o <directory name>` parameter, it will save the body of each in a separate file inside `<directory name>`.  If you want a single file, just pipe stdout the normal way `... | ganda > result.txt`.
 
-For many more examples, see ["Using HTTP APIs on the Command Line - Part 3 - ganda"](http://www.naleid.com/2018/04/04/using-http-apis-on-the-command-line-3-ganda.html).
+For many more examples, take a look at the [User Guide](docs/GUIDE.ipynb).
+
+# Why use `ganda` over `curl` (or `wget`, `httpie`, `postman-cli`, ...)?
 
-# Installing
+All existing CLI tools for making HTTP requests are oriented around making a single request at a time.  They're great
+at starting a pipe of commands (e.g. `curl <url> | jq .`), but they're awkward to use beyond a few requests.
+
+To make many requests with them, the easiest approach is a bash `for` loop or something like `xargs`.  This is slow and expensive because each invocation opens a new HTTP connection.
+
+`ganda` makes many requests in parallel and can maintain context between the request and response.  It's designed to
+be part of a pipeline of commands and can make hundreds of thousands of requests in just a few minutes.
+
+`ganda` reuses HTTP connections and lets you specify how many "worker" threads to use, giving you tight control over parallelism.
+
+The closest CLIs I've found to `ganda` are load-testing tools like `vegeta`.  They're able to make many requests in
+parallel, but they're not designed to only call each URL once, don't maintain context between the request and response,
+and don't have the same flexibility in how the response is handled.
+
+`ganda` isn't for load testing; it's for making lots of requests in parallel and processing the results in a pipeline.
+
+
+# Installation
 
 You currently have 3 options:
 
-1. on MacOS you can install with [homebrew](https://brew.sh/)
+1. on macOS you can install using [Homebrew](https://brew.sh/)
 ```
 brew tap tednaleid/homebrew-ganda
 brew install ganda
@@ -38,42 +68,54 @@ brew install ganda
 go install github.com/tednaleid/ganda@latest
 ```
 
-to install in your `$GOPATH/bin` (which you want in your `$PATH`)
+or, if you have this repo downloaded locally:
 
-# Usage
-
-
--- TODO: update
-
-       
-# Example
-
-This command takes the first 1000 words from the macOS dictionary file, then turns each of them into a [Wikipedia API](https://www.mediawiki.org/wiki/API:Main_page) url.
-
-Those urls are then piped into `ganda` and saved in a directory called `out` in the current directory.
-
-
-    head -1000 /usr/share/dict/words |\
-    awk '{printf "https://en.wikipedia.org/w/api.php?action=query&titles=%s&prop=revisions&rvprop=content&format=json\n", $1}' |\
-    ganda -o out --subdir-length 2
-
-Output (shows the HTTP status code of 200 OK for each along with the resulting output file that each was saved at):
-
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aam&prop=revisions&rvprop=content&format=json -> out/95/https-en-wikipedia-org-w-api-php-action-query-titles-aam-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=A&prop=revisions&rvprop=content&format=json -> out/71/https-en-wikipedia-org-w-api-php-action-query-titles-A-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aal&prop=revisions&rvprop=content&format=json -> out/99/https-en-wikipedia-org-w-api-php-action-query-titles-aal-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=a&prop=revisions&rvprop=content&format=json -> out/69/https-en-wikipedia-org-w-api-php-action-query-titles-a-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aardwolf&prop=revisions&rvprop=content&format=json -> out/31/https-en-wikipedia-org-w-api-php-action-query-titles-aardwolf-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aalii&prop=revisions&rvprop=content&format=json -> out/91/https-en-wikipedia-org-w-api-php-action-query-titles-aalii-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aa&prop=revisions&rvprop=content&format=json -> out/ae/https-en-wikipedia-org-w-api-php-action-query-titles-aa-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=Aani&prop=revisions&rvprop=content&format=json -> out/7f/https-en-wikipedia-org-w-api-php-action-query-titles-Aani-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=Aaron&prop=revisions&rvprop=content&format=json -> out/db/https-en-wikipedia-org-w-api-php-action-query-titles-Aaron-prop-revisions-rvprop-content-format-json
-    Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aardvark&prop=revisions&rvprop=content&format=json -> out/c4/https-en-wikipedia-org-w-api-php-action-query-titles-aardvark-prop-revisions-rvprop-content-format-json
-    ... 990 more lines
-
-As `ganda` is designed to make many thousands of requests, you can use the `--subdir-length` to avoid making your filesystem unhappy with 1M files in a single directory.  That switch will hash each url and place the response in a subdirectory (similar to how git stores its objects).
+```
+make install
+```
 
-example run:
+to install in your `$GOPATH/bin` (which you want in your `$PATH`)
 
-![ganda example run against wikipedia API](https://cdn.rawgit.com/tednaleid/ganda/gh-pages/images/ganda-example.gif)
+# Usage
 
+```
+ganda help
+
+NAME:
+ganda - make http requests in parallel
+
+USAGE:
+<urls/requests on stdout> | ganda [options]
+
+VERSION:
+1.0.0
+
+DESCRIPTION:
+Pipe urls to ganda over stdout for it to make http requests to each url in parallel.
+
+AUTHOR:
+Ted Naleid <contact@naleid.com>
+
+COMMANDS:
+echoserver  Starts an echo server, --port <port> to override the default port of 8080
+help, h     Shows a list of commands or help for one command
+
+GLOBAL OPTIONS:
+--base-retry-millis value                              the base number of milliseconds to wait before retrying a request, exponential backoff is used for retries (default: 1000)
+--response-body value, -B value                        transforms the body of the response. Values: 'raw' (unchanged), 'base64', 'discard' (don't emit body), 'escaped' (JSON escaped string), 'sha256' (default: raw)
+--connect-timeout-millis value                         number of milliseconds to wait for a connection to be established before timeout (default: 10000)
+--header value, -H value [ --header value, -H value ]  headers to send with every request, can be used multiple times (gzip and keep-alive are already there)
+--insecure, -k                                         if flag is present, skip verification of https certificates (default: false)
+--json-envelope, -J                                    emit result with JSON envelope with url, status, length, and body fields, assumes result is valid json (default: false)
+--color                                                if flag is present, add color to success/warn messages (default: false)
+--output value, -o value                               if flag is present, save response bodies to files in the specified directory
+--request value, -X value                              HTTP request method to use (default: "GET")
+--response-workers value                               number of concurrent workers that will be processing responses, if not specified will be same as --workers (default: 0)
+--retry value                                          max number of retries on transient errors (5XX status codes/timeouts) to attempt (default: 0)
+--silent, -s                                           if flag is present, omit showing response code for each url only output response bodies (default: false)
+--subdir-length value, -S value                        length of hashed subdirectory name to put saved files when using -o; use 2 for > 5k urls, 4 for > 5M urls (default: 0)
+--throttle value, -t value                             max number of requests to process per second, default is unlimited (default: -1)
+--workers value, -W value                              number of concurrent workers that will be making requests, increase this for more requests in parallel (default: 1)
+--help, -h                                             show help (default: false)
+--version, -v                                          print the version (default: false)
+```
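The `--response-body` option above lists five body transforms. A minimal Go sketch of what each mode could produce, assuming `sha256` is printed as lowercase hex and `escaped` means the body marshaled as a quoted JSON string (the help text doesn't pin down either format, and `transformBody` is a hypothetical name):

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// transformBody applies one of the --response-body modes from the help text.
// ASSUMPTIONS: "sha256" is rendered as lowercase hex and "escaped" is the body
// marshaled as a quoted JSON string; ganda's exact output may differ.
func transformBody(mode string, body []byte) ([]byte, error) {
	switch mode {
	case "raw":
		return body, nil // unchanged
	case "base64":
		return []byte(base64.StdEncoding.EncodeToString(body)), nil
	case "discard":
		return nil, nil // emit no body at all
	case "escaped":
		// json.Marshal quotes the string and escapes quotes, control characters,
		// and (by default) <, >, and & as \u003c, \u003e, \u0026.
		return json.Marshal(string(body))
	case "sha256":
		sum := sha256.Sum256(body)
		return []byte(hex.EncodeToString(sum[:])), nil
	default:
		return nil, fmt.Errorf("unknown --response-body value %q", mode)
	}
}

func main() {
	body := []byte(`{"id": 1}`)
	for _, mode := range []string{"raw", "base64", "discard", "escaped", "sha256"} {
		out, err := transformBody(mode, body)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s %s\n", mode, out)
	}
}
```

Hashing or discarding bodies is useful when you only need to know whether responses changed, or only care about status codes.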
diff --git a/cli/cli.go b/cli/cli.go
@@ -11,9 +11,11 @@ import (
 	"github.com/tednaleid/ganda/responses"
 	"github.com/urfave/cli/v3"
 	"io"
+	"math"
 	"os"
 	"os/signal"
 	"syscall"
+	"time"
 )
 
 type BuildInfo struct {
@@ -49,7 +51,7 @@ func SetupCommand(
 		ErrWriter:   stderr,
 		Flags: []cli.Flag{
 			&cli.IntFlag{
-				Name:        "base-retry-ms",
+				Name:        "base-retry-millis",
 				Usage:       "the base number of milliseconds to wait before retrying a request, exponential backoff is used for retries",
 				Value:       conf.BaseRetryDelayMillis,
 				Destination: &conf.BaseRetryDelayMillis,
@@ -80,7 +82,7 @@ func SetupCommand(
 				},
 			},
 			&cli.IntFlag{
-				Name:        "connect-timeout-ms",
+				Name:        "connect-timeout-millis",
 				Usage:       "number of milliseconds to wait for a connection to be established before timeout",
 				Value:       conf.ConnectTimeoutMillis,
 				Destination: &conf.ConnectTimeoutMillis,
@@ -111,6 +113,7 @@ func SetupCommand(
 			&cli.StringFlag{
 				Name:        "output",
 				Aliases:     []string{"o"},
+				Usage:       "if flag is present, save response bodies to files in the specified directory",
 				Destination: &conf.BaseDirectory,
 			},
 			&cli.StringFlag{
@@ -169,10 +172,16 @@ func SetupCommand(
 						Usage: "Port number to start the echo server on",
 						Value: 8080, // Default port number
 					},
+					&cli.IntFlag{
+						Name:  "delay-millis",
+						Usage: "Number of milliseconds to delay responding",
+						Value: 0, // Default delay is 0 milliseconds
+					},
 				},
 				Action: func(ctx ctx.Context, cmd *cli.Command) error {
 					port := cmd.Int("port")
-					shutdown, err := echoserver.Echoserver(port, io.Writer(os.Stdout))
+					delayMillis := cmd.Int("delay-millis")
+					shutdown, err := echoserver.Echoserver(port, delayMillis, io.Writer(os.Stdout))
 					if err != nil {
 						fmt.Println("Error starting server:", err)
 						os.Exit(1)
@@ -231,7 +240,15 @@ func ProcessRequests(context *execcontext.Context) {
 	requestsWithContextChannel := make(chan parser.RequestWithContext)
 	responsesWithContextChannel := make(chan *responses.ResponseWithContext)
 
-	requestWaitGroup := requests.StartRequestWorkers(requestsWithContextChannel, responsesWithContextChannel, context)
+	var rateLimitTicker *time.Ticker
+
+	// don't throttle if we're not limiting the number of requests per second
+	if context.ThrottlePerSecond != math.MaxInt32 {
+		rateLimitTicker = time.NewTicker(time.Second / time.Duration(context.ThrottlePerSecond))
+		defer rateLimitTicker.Stop()
+	}
+
+	requestWaitGroup := requests.StartRequestWorkers(requestsWithContextChannel, responsesWithContextChannel, rateLimitTicker, context)
 	responseWaitGroup := responses.StartResponseWorkers(responsesWithContextChannel, context)
 
 	err := parser.SendRequests(requestsWithContextChannel, context.In, context.RequestMethod, context.RequestHeaders)

diff --git a/cli/cli_test.go b/cli/cli_test.go
@@ -36,7 +36,7 @@ func TestTimeout(t *testing.T) {
 	}))
 	defer server.Server.Close()
 
-	runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-ms", "1"}, server.stubStdinUrl("bar"))
+	runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-millis", "1"}, server.stubStdinUrl("bar"))
 
 	url := server.urlFor("bar")
 
@@ -60,7 +60,7 @@ func TestRetryEnabledShouldRetry5XX(t *testing.T) {
 	}))
 	defer server.Server.Close()
 
-	runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
+	runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))
 
 	url := server.urlFor("bar")
 
@@ -81,7 +81,7 @@ func TestRunningOutOfRetriesShouldStopProcessing(t *testing.T) {
 	}))
 	defer server.Server.Close()
 
-	runResults, _ := RunGanda([]string{"ganda", "--retry", "2", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
+	runResults, _ := RunGanda([]string{"ganda", "--retry", "2", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))
 
 	url := server.urlFor("bar")
 
@@ -102,7 +102,7 @@ func TestRetryEnabledShouldNotRetry4XX(t *testing.T) {
 	}))
 	defer server.Server.Close()
 
-	runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
+	runResults, _ := RunGanda([]string{"ganda", "--retry", "1", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))
 
 	url := server.urlFor("bar")
 
@@ -125,7 +125,7 @@ func TestRetryEnabledShouldRetryTimeout(t *testing.T) {
 	}))
 	defer server.Server.Close()
 
-	runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-ms", "10", "--retry", "1", "--base-retry-ms", "1"}, server.stubStdinUrl("bar"))
+	runResults, _ := RunGanda([]string{"ganda", "--connect-timeout-millis", "10", "--retry", "1", "--base-retry-millis", "1"}, server.stubStdinUrl("bar"))
 	url := server.urlFor("bar")
 
 	//assert.Equal(t, 2, requestCount, "expected a second request")

diff --git a/docs/.gitignore b/docs/.gitignore
@@ -0,0 +1 @@
+scratch
diff --git a/docs/.python-version b/docs/.python-version
@@ -0,0 +1 @@
+gandadocs-3.12.2