This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

NO MERGE docs(ebpf): update docs to use ebpf profiler from grafana agent #864

Open: wants to merge 2 commits into base: main
212 changes: 171 additions & 41 deletions docs/sources/configure-client/language-sdks/ebpf.md
@@ -30,76 +30,206 @@ For the eBPF integration to work you'll need:
### Step 1: Add the helm repo

```shell
helm repo add pyroscope-io https://pyroscope-io.github.io/helm-chart
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```

### Step 2: Install pyroscope agent

```yaml
agent:
  mode: 'flow'
  configMap:
    create: true
    content: |
      discovery.kubernetes "local_pods" {
        selectors {
          field = "spec.nodeName=" + env("HOSTNAME")
          role = "pod"
        }
        role = "pod"
      }
      pyroscope.ebpf "instance" {
        forward_to = [pyroscope.write.endpoint.receiver]
        targets = discovery.kubernetes.local_pods.targets
      }
      pyroscope.write "endpoint" {
        endpoint {
          basic_auth {
            password = "<PASSWORD>"
            username = "<USERNAME>"
          }
          url = "<URL>"
        }
      }

  securityContext:
    privileged: true
    runAsGroup: 0
    runAsUser: 0
```
Replace the `<URL>` placeholder with the appropriate server URL. This could be the Grafana Cloud URL or your own custom Phlare server URL.

If you need to send data to Grafana Cloud, you'll have to configure HTTP Basic authentication. Replace `<USERNAME>` with your Grafana Cloud stack user and `<PASSWORD>` with your Grafana Cloud API key.

```shell
helm install pyroscope-ebpf pyroscope-io/pyroscope-ebpf
helm install pyroscope-ebpf grafana/grafana-agent -f values.yaml
```

This installs the Pyroscope eBPF agent on all of your nodes and starts profiling applications across your cluster.

## Running eBPF profiler from binary
```shell
export PYROSCOPE_APPLICATION_NAME=my.ebpf.program
export PYROSCOPE_SERVER_ADDRESS=http://address-of-pyroscope-server:4040/
export PYROSCOPE_SPY_NAME=ebpfspy
# optionally, if authentication is enabled, specify the API key:
# export PYROSCOPE_AUTH_TOKEN={YOUR_API_KEY}
## Configuration

# to wrap an existing program and profile it
sudo -E pyroscope exec mongod
The component configures and starts a new ebpf profiling job to collect performance profiles from the current host.

# to profile the whole system
sudo -E pyroscope ebpf
```
You can use the following arguments to configure a `pyroscope.ebpf` component. Only the
`forward_to` and `targets` fields are required. Omitted fields take their default
values.

## Dealing with `[unknowns]`
| Name                      | Type                     | Description                                                  | Default | Required |
|---------------------------|--------------------------|--------------------------------------------------------------|---------|----------|
| `targets`                 | `list(map(string))`      | List of targets to group profiles by container ID            |         | yes      |
| `forward_to`              | `list(ProfilesReceiver)` | List of receivers to send collected profiles to              |         | yes      |
| `collect_interval`        | `duration`               | How frequently to collect profiles                           | `15s`   | no       |
| `sample_rate`             | `int`                    | How many times per second to collect profile samples         | `97`    | no       |
| `pid_cache_size`          | `int`                    | The size of the pid -> proc symbols table LRU cache          | `32`    | no       |
| `build_id_cache_size`     | `int`                    | The size of the ELF file build id -> symbols table LRU cache | `64`    | no       |
| `same_file_cache_size`    | `int`                    | The size of the ELF file -> symbols table LRU cache          | `8`     | no       |
| `container_id_cache_size` | `int`                    | The size of the pid -> container ID table LRU cache          | `1024`  | no       |
| `collect_user_profile`    | `bool`                   | A flag to enable/disable collection of userspace profiles    | `true`  | no       |
| `collect_kernel_profile`  | `bool`                   | A flag to enable/disable collection of kernelspace profiles  | `true`  | no       |

eBPF relies on having debugging symbols available for each program installed in your system. If you don't have those you'll see a lot of stacktraces full of `[unknown]`s. On most systems you can get debugging symbols for most packages with `debuginfo-install` command:
## Exported fields

```shell
sudo debuginfo-install -y <pkg>
```
`pyroscope.ebpf` does not export any fields that can be referenced by other
components.

## Configuration
## Component health

`pyroscope.ebpf` is only reported as unhealthy if given an invalid
configuration.

## Debug information

* `targets`: the active targets currently tracked.
* `pid_cache`: per-process ELF symbol tables and their sizes, in number of symbols.
* `elf_cache`: per-build-ID and per-file symbol tables and their sizes, in number of symbols.

## Debug metrics

* `pyroscope_fanout_latency` (histogram): Write latency for sending to direct and indirect components.
* `pyroscope_ebpf_active_targets` (gauge): Number of active targets the component tracks.
* `pyroscope_ebpf_profiling_sessions_total` (counter): Number of profiling sessions completed.
* `pyroscope_ebpf_profiling_sessions_failing_total` (counter): Number of profiling sessions failed.
* `pyroscope_ebpf_pprofs_total` (counter): Number of pprof profiles collected by the ebpf component.

## Profile collecting behavior

The `pyroscope.ebpf` component collects stack traces associated with a process running on the current host.
You can use the `sample_rate` argument to define the number of stack traces collected per second. The default is 97.

The following labels are automatically injected into the collected profiles if you have not defined them. These labels
can help you pin down a profiling target.

| Label | Description |
|--------------------|----------------------------------------------------------------------------------------------------------------------------------|
| `service_name` | Pyroscope service name. It's automatically selected from discovery meta labels if possible. Otherwise defaults to `unspecified`. |
| `__name__` | Pyroscope metric name. Defaults to `process_cpu`. |
| `__container_id__` | The container ID derived from target. |

### Container ID

Each collected stack trace is associated with a target from the `targets` list by container ID.
To make the association, the component checks the `__container_id__`, `__meta_docker_container_id`,
and `__meta_kubernetes_pod_container_id` labels of a target against the `/proc/{pid}/cgroup` of a process.

If a corresponding container ID is found, the stack traces are aggregated per target based on the container ID.
If a container ID is not found, the stack trace is associated with a `default_target`.

All parameters below are also supported as CLI arguments, a full list can be accessed via `pyroscope ebpf --help`. For brevity only environment variables are listed.
Any stack traces not associated with a listed target are ignored.
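
The matching described above can be illustrated with a minimal Python sketch. It is not the component's actual implementation: the function names are hypothetical, and it assumes that a container ID appears as a 64-character hex string in the cgroup path and that Kubernetes label values carry a `<runtime>://` prefix.

```python
import re
from typing import Optional

# Assumption: the container ID is a run of 64 hex characters in the cgroup path.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")

# Target labels that may carry a container ID, in the order they are checked here.
ID_LABELS = (
    "__container_id__",
    "__meta_docker_container_id",
    "__meta_kubernetes_pod_container_id",
)


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first container ID found in the text of /proc/{pid}/cgroup."""
    for line in cgroup_text.splitlines():
        match = _CONTAINER_ID.search(line)
        if match:
            return match.group(0)
    return None


def target_container_id(target: dict) -> Optional[str]:
    """Return the container ID carried by a target's labels, if any."""
    for label in ID_LABELS:
        value = target.get(label)
        if value:
            # Kubernetes reports IDs as "<runtime>://<id>"; keep only the ID part.
            return value.split("://")[-1]
    return None


def find_target(cgroup_text: str, targets: list) -> Optional[dict]:
    """Return the target whose container ID matches the process, or None."""
    container_id = container_id_from_cgroup(cgroup_text)
    if container_id is None:
        return None
    for target in targets:
        if target_container_id(target) == container_id:
            return target
    return None
```

A caller would read `/proc/{pid}/cgroup` for a sampled process and pass its contents to `find_target` together with the discovered targets. A `None` result corresponds to a stack trace that matches no listed target.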

* `PYROSCOPE_KUBERNETES_NODE` Set to current k8s Node.nodeName for service discovery and labeling
* `PYROSCOPE_ONLY_SERVICES` Ignore processes unknown to service discovery
* `PYROSCOPE_SYMBOL_CACHE_SIZE` Max size of symbols cache (1 entry per process)
### Service name

| env var | default | description |
| -------------------------- | -------------------------------- | ---------------------------------------------- |
| `PYROSCOPE_KUBERNETES_NODE` | `""` | Used by service discovery. It's automatically set in the Helm Chart. |
| `PYROSCOPE_ONLY_SERVICES` | `false` | Ignore processes unknown to service discovery. In a Kubernetes cluster it ignores processes like `containerd, runc, kubelet` etc |
| `PYROSCOPE_SYMBOL_CACHE_SIZE` | `256` | Max size of symbols cache (1 entry per process). Change this value if you’re experiencing memory pressure or have many individual services. |
The special label `service_name` is required and must always be present. If it's not specified, the
component attempts to infer it from the following sources:

## Sending data to Grafana Cloud or Phlare with Pyroscope eBPF integration
- `__meta_kubernetes_pod_annotation_pyroscope_io_service_name` which is a `pyroscope.io/service_name` pod annotation.
- `__meta_kubernetes_namespace` and `__meta_kubernetes_pod_container_name`
- `__meta_docker_container_name`

Starting with [weekly-f8](https://hub.docker.com/r/grafana/phlare/tags) you can ingest pyroscope profiles directly to phlare.
If `service_name` is not specified and could not be inferred, it is set to `unspecified`.
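
The fallback order described in this section can be sketched in Python as follows. The function name is hypothetical, and the way the namespace and container name are joined (`namespace/container`) is an assumption made for illustration. The format the component actually produces may differ.

```python
def infer_service_name(labels: dict) -> str:
    """Pick a service name from target labels, following the documented order."""
    # An explicitly set service_name always wins.
    if labels.get("service_name"):
        return labels["service_name"]

    # 1. The pyroscope.io/service_name pod annotation.
    annotation = labels.get(
        "__meta_kubernetes_pod_annotation_pyroscope_io_service_name"
    )
    if annotation:
        return annotation

    # 2. Kubernetes namespace plus container name.
    namespace = labels.get("__meta_kubernetes_namespace")
    container = labels.get("__meta_kubernetes_pod_container_name")
    if namespace and container:
        # Assumption: the joined format below is illustrative only.
        return f"{namespace}/{container}"

    # 3. Docker container name.
    docker_name = labels.get("__meta_docker_container_name")
    if docker_name:
        return docker_name

    # Nothing could be inferred.
    return "unspecified"
```

For example, a target that carries only `__meta_docker_container_name` resolves to that container name, and a target with none of these labels resolves to `unspecified`.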

## Troubleshooting unknown symbols

Symbols are extracted from various sources, including:

- The `.symtab` and `.dynsym` sections in the ELF file.
- The `.symtab` and `.dynsym` sections in the debug ELF file.
- The `.gopclntab` section in Go language ELF files.

The search for debug files follows the [gdb algorithm](https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html).
For example, if the profiler wants to find the debug file
for `/lib/x86_64-linux-gnu/libc.so.6`
with a `.gnu_debuglink` set to `libc.so.6.debug` and a build ID `0123456789abcdef`, it examines the following paths:

- `/usr/lib/debug/.build-id/01/23456789abcdef.debug`
- `/lib/x86_64-linux-gnu/libc.so.6.debug`
- `/lib/x86_64-linux-gnu/.debug/libc.so.6.debug`
- `/usr/lib/debug/lib/x86_64-linux-gnu/libc.so.6.debug`
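
As a rough illustration of this lookup order, the following Python sketch builds the candidate paths for a given ELF file. It follows the gdb convention in which the first two hex characters of the build ID name a directory and the remaining characters name the file. The function name and the single `/usr/lib/debug` global debug directory are assumptions, not the profiler's actual code.

```python
import posixpath


def debug_file_candidates(elf_path, debuglink, build_id, debug_dir="/usr/lib/debug"):
    """List the paths to probe for a separate debug file, in lookup order."""
    elf_dir = posixpath.dirname(elf_path)
    candidates = []

    # Build ID method: <debug_dir>/.build-id/<first 2 hex chars>/<rest>.debug
    if build_id and len(build_id) > 2:
        candidates.append(
            posixpath.join(
                debug_dir, ".build-id", build_id[:2], build_id[2:] + ".debug"
            )
        )

    # Debug link method: next to the binary, in a .debug subdirectory,
    # then under the global debug directory mirroring the binary's path.
    if debuglink:
        candidates.append(posixpath.join(elf_dir, debuglink))
        candidates.append(posixpath.join(elf_dir, ".debug", debuglink))
        candidates.append(posixpath.join(debug_dir, elf_dir.lstrip("/"), debuglink))

    return candidates
```

Calling `debug_file_candidates("/lib/x86_64-linux-gnu/libc.so.6", "libc.so.6.debug", "0123456789abcdef")` yields one build ID path followed by three debug link paths.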

### Dealing with unknown symbols

Unknown symbols in the profiles you’ve collected indicate that the profiler couldn't access an ELF file associated with a given address in the trace.

This can occur for several reasons:

- The process has terminated, making the ELF file inaccessible.
- The ELF file is either corrupted or not recognized as an ELF file.
- There is no corresponding ELF file entry in `/proc/{pid}/maps` for the address in the stack trace.
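
To check the last case by hand, you can look up an address in the process's memory map. The Python sketch below parses the standard `/proc/{pid}/maps` line format (`start-end perms offset dev inode pathname`) and returns the file backing an address. The function name is hypothetical and the sketch is a debugging aid, not part of the profiler.

```python
from typing import Optional


def find_mapped_file(maps_text: str, address: int) -> Optional[str]:
    """Return the file path that backs `address`, or None if there is none."""
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start_text, end_text = fields[0].split("-")
        start, end = int(start_text, 16), int(end_text, 16)
        if start <= address < end:
            # Anonymous mappings have no pathname; pseudo-paths such as
            # [stack] or [heap] are not files on disk.
            if len(fields) == 6 and fields[5].startswith("/"):
                return fields[5]
            return None
    return None
```

Pass the contents of `/proc/{pid}/maps` and an address taken from a stack trace. A `None` result means the address falls outside every file-backed mapping, which matches the situation described in the last bullet.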

### Addressing unresolved symbols

If you only see module names (e.g., `/lib/x86_64-linux-gnu/libc.so.6`) without corresponding function names, this
indicates that the symbols couldn't be mapped to their respective function names.

This can occur for several reasons:

- The binary has been stripped, leaving no `.symtab`, `.dynsym`, or `.gopclntab` sections in the ELF file.
- The debug file is missing or could not be located.

To fix this for your binaries, ensure that they are either not stripped or that you have separate
debug files available. You can achieve this by running:

```bash
./pyroscope ebpf \
--application-name=phlare.ebpf.app \
--server-address=<URL> \
--basic-auth-user="<User>" \
--basic-auth-password="<Password>" \
--tenant-id=<TenantID> \
objcopy --only-keep-debug elf elf.debug
strip elf -o elf.stripped
objcopy --add-gnu-debuglink=elf.debug elf.stripped elf.debuglink
```

To configure eBPF integration to send data to Phlare, replace the `<URL>` placeholder with the appropriate server URL. This could be the Grafana Cloud URL or your own custom Phlare server URL.
For system libraries, ensure that debug symbols are installed. On Ubuntu, for example, you can install them by
executing:

If you need to send data to Grafana Cloud, you'll have to configure HTTP Basic authentication. Replace `<User>` with your Grafana Cloud stack user and `<Password>` with your Grafana Cloud API key.
```bash
apt install libc6-dbg
```

### Understanding flat stack traces

If your profiles show many shallow stack traces, typically 1-2 frames deep, your binary might have been compiled without frame pointers.

To compile your code with frame pointers, include the `-fno-omit-frame-pointer` flag in your compiler options.

### Profiling interpreted languages

Profiling interpreted languages such as Python, Ruby, and JavaScript with this implementation is not ideal.
JIT-compiled methods in these languages are typically not stored in ELF format, so profiling them requires
additional steps, for instance using perf-map-agent and enabling frame pointers for Java.

If your Phlare server has multi-tenancy enabled, you'll need to configure a tenant ID. Replace `<TenantID>` with your Phlare tenant ID.
Interpreted methods will display the interpreter function’s name rather than the actual function.

## Examples

Check out the following resources to learn more about eBPF profiling:
- [The pros and cons of eBPF profiling](https://pyroscope.io/blog/ebpf-profiling-pros-cons) blog post (for more context on flamegraphs below)
- [Demo](https://demo.pyroscope.io/?query=rideshare-cluster-ebpf.cpu%7B%7D) showing breakdown of our examples cluster
- [docker-compose example](https://github.com/github/pyroscope/blob/main/examples/ebpf) in our repository
- Grafana Agent documentation for the [pyroscope.ebpf](/docs/agent/next/flow/reference/components/pyroscope.ebpf/), [pyroscope.write](/docs/agent/next/flow/reference/components/pyroscope.write/), [discovery.kubernetes](/docs/agent/next/flow/reference/components/discovery.kubernetes/), and [discovery.relabel](/docs/agent/next/flow/reference/components/discovery.relabel/) components