Added pprof into the node API to debug memory leaks #67

Merged: 5 commits, Jun 5, 2024
2 changes: 2 additions & 0 deletions CODEOWNERS
@@ -0,0 +1,2 @@
# All the files for now
* @sparshev
18 changes: 17 additions & 1 deletion README.md
@@ -189,13 +189,29 @@ The election process:

Simplify cluster management, for example by adding labels or checking the status [#8](https://github.com/adobe/aquarium-fish/issues/8).

## Integration tests
## Development

Development is relatively easy: change the logic, run `./build.sh` to build a binary, test it, and
send a PR when you think it is ready. It would be great if you asked in the discussions or created
an issue on GitHub first, to align with the current direction and plans.

### Integration tests

To verify that everything works as expected, you can run the integration tests like this:
```sh
$ FISH_PATH=$PWD/aquarium-fish.darwin_amd64 go test -v -failfast -parallel 4 ./tests/...
```

### Profiling

Profiling is available through pprof:
```
$ go tool pprof 'https+insecure://<USER>:<TOKEN>@localhost:8001/api/v1/node/this/profiling/heap'
$ curl -ku "<USER>:<TOKEN>" 'https://localhost:8001/api/v1/node/this/profiling/?debug=1'
```

Or you can open https://localhost:8001/api/v1/node/this/profiling/ in a browser to see the index.

## API

There are a number of ways to communicate with the Fish cluster, and the most important one is the API.
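
Review note on the README change: the `curl` call from the Profiling section can also be scripted. The following is a minimal sketch in Go, not part of this PR. The helper names `profilingURL` and `fetchProfile`, the `FISH_USER`/`FISH_TOKEN` environment variables, and the `heap.pprof` output file are hypothetical; only the endpoint path, basic auth, and the skipped certificate check (the equivalent of `curl -k`) come from the README text.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// profilingURL builds the URL of a profiling endpoint on a Fish node.
// An empty handler addresses the pprof index page.
func profilingURL(addr, handler string) string {
	u := url.URL{
		Scheme: "https",
		Host:   addr,
		Path:   "/api/v1/node/this/profiling/" + handler,
	}
	return u.String()
}

// fetchProfile downloads one profile using HTTP basic auth.
func fetchProfile(client *http.Client, rawURL, user, token string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, token)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Hypothetical variable names; the README only shows <USER> and <TOKEN> placeholders.
	user, token := os.Getenv("FISH_USER"), os.Getenv("FISH_TOKEN")
	target := profilingURL("localhost:8001", "heap")
	if user == "" {
		fmt.Println("set FISH_USER and FISH_TOKEN to fetch", target)
		return
	}

	// Assumption: the node serves a self-signed certificate, so verification
	// is skipped here just as `curl -k` does. Do not use this outside of debugging.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	data, err := fetchProfile(client, target, user, token)
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("heap.pprof", data, 0o600); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Println("saved heap.pprof; inspect it with: go tool pprof heap.pprof")
}
```

Saving the profile to a file first makes it possible to compare several snapshots later, which is usually what a memory-leak investigation needs.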
48 changes: 48 additions & 0 deletions docs/openapi.yaml
@@ -681,6 +681,54 @@ paths:
      security:
        - basic_auth: []

  # This /profiling/ endpoint is separate from /profiling/{handler} because `required: false`
  # did not behave as expected. Because of that, /profiling/ routes to a separate method that
  # just calls the /profiling/{handler} endpoint with an empty string
  /api/v1/node/this/profiling/:
    get:
      summary: Shows pprof index page
      description:
        Shows debug information about heap, goroutines, symbols etc. Very helpful for figuring out
        memory issues and understanding the internals of the Fish Node execution.
      operationId: NodeThisProfilingIndexGet
      tags:
        - Node
      responses:
        '200':
          description: Successful operation
        '400':
          description: Bad parameter or conditions
        '401':
          $ref: '#/components/responses/UnauthorizedError'
      security:
        - basic_auth: []

  /api/v1/node/this/profiling/{handler}:
    get:
      summary: Gives profiling data from pprof
      description:
        Shows debug information about heap, goroutines, symbols etc. Very helpful for figuring out
        memory issues and understanding the internals of the Fish Node execution.
      operationId: NodeThisProfilingGet
      tags:
        - Node
      parameters:
        - name: handler
          in: path
          description: Which pprof handler to use. If empty - will show index
          required: false
          schema:
            type: string
      responses:
        '200':
          description: Successful operation
        '400':
          description: Bad parameter or conditions
        '401':
          $ref: '#/components/responses/UnauthorizedError'
      security:
        - basic_auth: []

  /api/v1/location/:
    get:
      summary: Get list of locations
37 changes: 37 additions & 0 deletions lib/openapi/api/api_v1.go
@@ -15,6 +15,7 @@ package api
import (
	"fmt"
	"net/http"
	"net/http/pprof"
	"time"

	"github.com/google/uuid"
@@ -481,6 +482,42 @@ func (e *Processor) NodeThisMaintenanceGet(c echo.Context, params types.NodeThis
	return c.JSON(http.StatusOK, params)
}

func (e *Processor) NodeThisProfilingIndexGet(c echo.Context) error {
	return e.NodeThisProfilingGet(c, "")
}

func (e *Processor) NodeThisProfilingGet(c echo.Context, handler string) error {
	user := c.Get("user")
	if user.(*types.User).Name != "admin" {
		message := "Only 'admin' can see profiling info"
		c.JSON(http.StatusBadRequest, H{"message": message})
		return fmt.Errorf(message)
	}

	switch handler {
	case "":
		// Show index if no handler name provided
		pprof.Index(c.Response().Writer, c.Request())
	case "allocs", "block", "goroutine", "heap", "mutex", "threadcreate":
		// PProf usual handlers
		pprof.Handler(handler).ServeHTTP(c.Response(), c.Request())
	case "cmdline":
		pprof.Cmdline(c.Response(), c.Request())
	case "profile":
		pprof.Profile(c.Response(), c.Request())
	case "symbol":
		pprof.Symbol(c.Response(), c.Request())
	case "trace":
		pprof.Trace(c.Response(), c.Request())
	default:
		message := "Unable to find requested profiling handler"
		c.JSON(http.StatusNotFound, H{"message": message})
		return fmt.Errorf(message)
	}

	return nil
}

func (e *Processor) VoteListGet(c echo.Context, params types.VoteListGetParams) error {
	user := c.Get("user")
	if user.(*types.User).Name != "admin" {
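
Review note on the handler dispatch: the name-to-handler switch in `NodeThisProfilingGet` can be exercised without a running node. The sketch below is not the PR's code. It assumes plain `net/http` instead of echo, leaves out the admin check, and uses hypothetical names (`profilingPrefix`, `profilingHandler`, `serveProfiling`, `statusFor`). It only illustrates that an empty handler name maps to the index, known names map to `net/http/pprof` handlers, and anything else yields 404.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
	"strings"
)

// profilingPrefix is the route prefix taken from docs/openapi.yaml.
const profilingPrefix = "/api/v1/node/this/profiling/"

// profilingHandler resolves a handler name to the matching net/http/pprof
// handler. The boolean is false when the name is not known.
func profilingHandler(name string) (http.Handler, bool) {
	switch name {
	case "":
		// No name: show the pprof index page.
		return http.HandlerFunc(pprof.Index), true
	case "allocs", "block", "goroutine", "heap", "mutex", "threadcreate":
		// Runtime profiles looked up by name.
		return pprof.Handler(name), true
	case "cmdline":
		return http.HandlerFunc(pprof.Cmdline), true
	case "profile":
		return http.HandlerFunc(pprof.Profile), true
	case "symbol":
		return http.HandlerFunc(pprof.Symbol), true
	case "trace":
		return http.HandlerFunc(pprof.Trace), true
	}
	return nil, false
}

// serveProfiling extracts the handler name from the request path and
// delegates to the resolved pprof handler, or answers 404.
func serveProfiling(w http.ResponseWriter, r *http.Request) {
	name := strings.TrimPrefix(r.URL.Path, profilingPrefix)
	h, ok := profilingHandler(name)
	if !ok {
		http.Error(w, "Unable to find requested profiling handler", http.StatusNotFound)
		return
	}
	h.ServeHTTP(w, r)
}

// statusFor performs an in-memory GET request and returns the status code.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	serveProfiling(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println("index:", statusFor(profilingPrefix))
	fmt.Println("goroutine:", statusFor(profilingPrefix+"goroutine?debug=1"))
	fmt.Println("unknown:", statusFor(profilingPrefix+"nope"))
}
```

Returning the handler together with a boolean keeps the lookup separate from the response writing, so the same table could back both the index route and the `{handler}` route.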