Commit bf4578d

Change guest tracing to use flatbuffers serialization
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
1 parent 8340ac3 commit bf4578d

19 files changed, +680 -490 lines changed


.github/event.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+{
+  "pull_request": {
+    "number": 844,
+    "base": { "ref": "main" },
+    "head": { "ref": "HEAD" }
+  }
+}

AI_PROMPT.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# AI Prompt: Hyperlight sandbox correlation ID and logging improvements
+
+## Context
+- Project: Hyperlight (Rust). This crate provides the host-side runtime for executing guest code in micro VMs.
+- Target area: `hyperlight-host` crate. We need robust per-sandbox correlation to filter/attribute logs and traces in multi-tenant hosts.
+- Platform: Linux and Windows; primary dev focus often Linux with KVM/MSHV.
+
+## Repo standards and constraints
+- Follow existing Rust style and lint checks. Run: `just fmt-apply`, `just clippy debug`, `just clippy release`.
+- No new crates/dependencies without discussion. Prefer zero public API breakage.
+- Add tests for new behavior; follow existing testing patterns under `src/hyperlight_host/{tests,examples}` and top-level `tests/`.
+- Docs: Add Rust documentation comments on public APIs; update docs/ if user-facing behavior changes.
+- CI/dev flow: `just build`, `just guests` (before tests) and for CI-like runs `just test-like-ci`.
+- Commit hygiene: signed and DCO sign-off; keep commits small and logically ordered.
+
+## Current implementation snapshot (Done)
+- A per-sandbox correlation ID is generated upon `UninitializedSandbox::new()` using a UUID v4 hyphenated string.
+- The ID is stored on `UninitializedSandbox` and propagated on evolve into `MultiUseSandbox`.
+- Public getters exist on both:
+  - `impl UninitializedSandbox { pub fn correlation_id(&self) -> &str }`
+  - `impl MultiUseSandbox { pub fn correlation_id(&self) -> &str }`
+- Thread-safety: ID is immutable string owned by the sandbox, safe to share by reference.
+- Tests: A unit test asserts ID is generated and preserved across evolve.
+
+## Goals next (this iteration)
+1. Ensure correlation ID is consistently attached to all host-emitted logs/traces/metrics related to a sandbox instance.
+2. Make it easy for callers to include the correlation ID in their own logs when handling a sandbox handle.
+3. Keep changes additive and backwards-compatible.
+
+## Scope in this phase (What to implement)
+- Tracing/logging attachment:
+  - When creating a sandbox (both uninitialized and multi-use lifecycle), create a tracing span or log context field that includes `correlation_id` and is parented appropriately for subsequent operations.
+  - Ensure guest call paths (`MultiUseSandbox::call` and friends) include `correlation_id` either by span field or structured log fields on events.
+  - Ensure error logs produced during sandbox creation/evolution and guest calls include `correlation_id`.
+- Metrics tags (if practical and already supported): include `correlation_id` as a label on per-sandbox metrics where cardinality is acceptable; otherwise, document why we avoid this and limit to logs/traces.
+- Examples: Update logging example(s) to print the sandbox `correlation_id()` to demonstrate usage.
+
+## Non-goals in this phase
+- Changing guest binaries or guest-side logging wire format.
+- Persisting correlation ID in snapshots beyond in-memory association (ID is already part of snapshot ownership via sandbox ID; do not serialize correlation ID unless needed).
+
+## API contract (public)
+- Keep the two getters as implemented; no additional public API required right now.
+- Avoid adding new types in public API unless justified.
+
+## Implementation guidelines
+- Use existing `tracing` integration. Prefer `#[instrument]` spans or manual spans with `correlation_id` as a field.
+- Keep correlation ID value as the hyphenated UUID string already generated. Do not regenerate after evolve.
+- Do not add new dependencies.
+- File touch points likely include:
+  - `src/hyperlight_host/src/sandbox/uninitialized.rs` (span at `new()`),
+  - `src/hyperlight_host/src/sandbox/uninitialized_evolve.rs` (span during evolve/initialization),
+  - `src/hyperlight_host/src/sandbox/initialized_multi_use.rs` (spans for `call()`, `snapshot()`, `restore()`, error paths),
+  - Possibly error/log helpers to append a structured `correlation_id` field.
+
+## Edge cases to consider
+- Multiple sandboxes created concurrently: ensure spans/fields don’t leak between instances.
+- Errors early in `new()` before correlation ID would be used elsewhere: still include `correlation_id` in emitted logs from that point onward.
+- Reused sandboxes: ID must remain stable.
+- Multi-threaded guest calls and interrupt handles: ensure context propagation does not require thread-local state; prefer explicit span entry.
+
+## Testing requirements
+- Unit tests:
+  - Correlation ID is present and non-empty on `UninitializedSandbox` and equals the one on `MultiUseSandbox` after evolve.
+  - Logs/traces for guest call include the expected `correlation_id` field (use existing tracing test harness patterns under `sandbox::uninitialized::tests::{test_trace_trace, test_log_trace}` as a model to assert fields; add analogous tests for guest calls).
+- Integration tests:
+  - Update a simple example or integration test to print correlation ID and verify basic behavior without changing guest artifacts.
+
+## Documentation updates
+- Add Rust documentation comments for the new getters (already present) explaining purpose.
+- Update docs/examples to show how to fetch and use the correlation ID.
+
+## How to build and run
+- Before tests: `just guests`
+- Build: `just build`
+- Tests: `just test` (or `just test-like-ci`)
+
+## Acceptance criteria
+- All tests pass (unit and integration) on Linux.
+- No new linter warnings in debug and release.
+- Correlation ID appears in tracing/log outputs for sandbox lifecycle and guest call paths, and remains consistent for the sandbox.
+
+## Reviewer checklist
+- Backwards compatibility preserved; no breaking API changes.
+- No new dependencies added.
+- Adequate tests added, including a minimal happy-path and at least one error-path assertion with `correlation_id` field present.
+- Logging/tracing fields are consistent and not duplicated; no span leaks across sandboxes.
+
+---
+
+Add more requirements below (product or engineering), or mark items as out of scope for this iteration:
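
The prompt above motivates the correlation ID as a way to filter and attribute logs per sandbox. As a rough illustration of that use, here is a minimal sketch that picks out the lines belonging to one sandbox from a mixed log stream. It assumes, purely for illustration, that log lines are whitespace-separated `key=value` tokens carrying a `correlation_id=<uuid>` token; the helper name `filter_by_correlation_id`, the log layout, and the sample IDs are all hypothetical and not taken from Hyperlight.

```bash
# Hypothetical helper: print only the log lines whose `correlation_id=` token
# exactly equals the given ID. Reads the log stream from stdin.
filter_by_correlation_id() {
    # Compare whole tokens so that one ID never matches as a prefix of another.
    awk -v want="correlation_id=$1" '{
        for (i = 1; i <= NF; i++) {
            if ($i == want) { print; break }
        }
    }'
}

# Sample (made-up) log lines from two sandboxes, filtered down to the first one.
printf '%s\n' \
    'level=INFO correlation_id=1b4e28ba-2fa1-4d3b-a3f5-ef19b5a2c8d1 msg=created' \
    'level=INFO correlation_id=9f1c3c5e-7a55-4c0e-8a57-2d9b1f0e6a77 msg=created' \
    'level=ERROR correlation_id=1b4e28ba-2fa1-4d3b-a3f5-ef19b5a2c8d1 msg=guest_call_failed' |
    filter_by_correlation_id 1b4e28ba-2fa1-4d3b-a3f5-ef19b5a2c8d1
```

With these sample lines, only the first and third lines are printed. A real deployment would more likely filter on structured fields in whatever tracing backend is in use; the text layout here is only a stand-in.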

Cargo.lock

Lines changed: 0 additions & 21 deletions
Some generated files are not rendered by default.

automate.sh

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+help() {
+    echo "Usage: $0 [options] <BASE> <BRANCH>"
+    echo ""
+    echo "Options:"
+    echo "  -a, --all-runs     Fetch logs for all workflow runs associated with the PR"
+    echo "  -f, --failed-only  Only download logs for failed jobs"
+    echo "  -L, --with-logs    Also download job logs (by default prints summary only)"
+    echo "  -h, --help         Show this help"
+    echo ""
+    echo "Arguments:"
+    echo "  BASE    Upstream repository as OWNER/REPO (e.g. hyperlight-dev/hyperlight) or a git remote name (e.g. upstream)"
+    echo "  BRANCH  Head branch selector for PR search: either 'owner:branch' or just 'branch'"
+    echo ""
+    echo "Examples:"
+    echo ""
+    echo "  $0 hyperlight-dev/hyperlight dblnz:tracing-improvements"
+    echo "  $0 -a -f hyperlight-dev/hyperlight dblnz:tracing-improvements"
+}
+
+parse_args() {
+    ALL_RUNS=0
+    FAILED_ONLY=0
+    WITH_LOGS=0
+
+    # Parse options ("${1:-}" keeps `set -u` from aborting when no arguments remain)
+    while [[ "${1:-}" == -* ]]; do
+        case "$1" in
+            -a|--all-runs) ALL_RUNS=1; shift ;;
+            -f|--failed-only) FAILED_ONLY=1; shift ;;
+            -L|--with-logs) WITH_LOGS=1; shift ;;
+            -h|--help) help; exit 0 ;;
+            *) echo "Unknown option: $1" >&2; help; exit 1 ;;
+        esac
+    done
+
+    BASE="${1:-}"; shift || true
+    BRANCH="${1:-}"; shift || true
+}
+
+pr_ci_logs_from_fork() {
+    local BASE="${1:-hyperlight-dev/hyperlight}" # remote name or OWNER/REPO
+    local BRANCH="${2:-tracing-improvements}"
+    local ALL_RUNS_FLAG="${3:-0}"
+    local FAILED_ONLY_FLAG="${4:-0}"
+    local WITH_LOGS_FLAG="${5:-0}"
+
+    resolve_base_repo() {
+        local ref="$1"
+        if [[ "$ref" != */* ]]; then
+            local resolved
+            resolved=$(gh repo view "$ref" --json nameWithOwner --jq .nameWithOwner 2>/dev/null) || true
+            if [[ -n "$resolved" ]]; then
+                echo "$resolved"
+                return 0
+            fi
+        fi
+        echo "$ref"
+    }
+
+    find_pr_number() {
+        local base="$1" branch_q="$2"
+        gh pr list --repo "$base" --state all \
+            --search "$branch_q" \
+            --json number --jq '.[0].number'
+    }
+
+    get_pr_head_sha() {
+        local base="$1" pr="$2"
+        gh pr view "$pr" --repo "$base" --json headRefOid --jq .headRefOid
+    }
+
+    list_run_ids_all_commits() {
+        local base="$1" pr="$2"
+        gh api "repos/${base}/actions/runs" --paginate --method GET -f event=pull_request \
+            --jq ".workflow_runs[] | select(any(.pull_requests[]; .number == ${pr})) | .id"
+    }
+
+    list_run_ids_for_sha() {
+        local base="$1" sha="$2"
+        gh run list --repo "$base" --event pull_request --limit 200 \
+            --json databaseId,headSha,createdAt \
+            --jq "[.[] | select(.headSha==\"${sha}\")] | sort_by(.createdAt) | .[].databaseId"
+    }
+
+    summarize_run() {
+        local base="$1" rid="$2"
+        gh run view "$rid" --repo "$base" \
+            --json databaseId,headSha,headBranch,status,conclusion,url,jobs,createdAt,updatedAt \
+            --jq '{
+                run_id: .databaseId,
+                head_sha: .headSha,
+                head_branch: (.headBranch // null),
+                status: .status,
+                conclusion: .conclusion,
+                url: .url,
+                created_at: (.createdAt // null),
+                updated_at: (.updatedAt // null),
+                jobs: (.jobs | map({id:.databaseId, name, status, conclusion}))
+            }'
+    }
+
+    download_logs_for_summary() {
+        local base="$1" pr="$2" rid="$3" summary_json="$4" failed_only="$5"
+        echo "$summary_json" | jq '.jobs[]' > "pr${pr}-run${rid}-jobs.json"
+        local jq_filter
+        if [[ "$failed_only" == "1" ]]; then
+            jq_filter='.jobs[] | select(.conclusion=="failure") | [.id, .name] | @tsv'
+        else
+            jq_filter='.jobs[] | [.id, .name] | @tsv'
+        fi
+        echo "$summary_json" | jq -r "$jq_filter" | while IFS=$'\t' read -r jid name; do
+            [[ -z "$jid" ]] && continue
+            local safe
+            safe=$(printf '%s' "$name" | tr -cs '[:alnum:]._-' '-')
+            echo "Fetching log for job $jid ($name) from run $rid..." >&2
+            gh run view "$rid" --repo "$base" --job "$jid" --log > "pr${pr}-run${rid}-${safe}-${jid}.log"
+        done
+    }
+
+    aggregate_summaries() {
+        local sha="$1" tmp_file="$2" pr="$3"
+        local short_sha
+        short_sha=${sha:0:7}
+        if command -v jq >/dev/null 2>&1; then
+            jq -s '.' "$tmp_file" | tee "pr${pr}-sha${short_sha}-runs-summary.json"
+        else
+            {
+                echo '['
+                paste -sd, "$tmp_file"
+                echo ']'
+            } | tee "pr${pr}-sha${short_sha}-runs-summary.json"
+        fi
+    }
+
+    # Resolve repo reference
+    BASE=$(resolve_base_repo "$BASE")
+
+    # Find PR and head SHA
+    local pr sha run
+    pr=$(find_pr_number "$BASE" "$BRANCH") || return 1
+    [[ -z "$pr" ]] && { echo "No PR found for ${BRANCH} in $BASE" >&2; return 1; }
+    sha=$(get_pr_head_sha "$BASE" "$pr") || return 1
+
+    # Determine run ids
+    local run_ids=()
+    if [[ "$ALL_RUNS_FLAG" == "1" ]]; then
+        while IFS= read -r rid; do
+            [[ -n "$rid" ]] && run_ids+=("$rid")
+        done < <(list_run_ids_all_commits "$BASE" "$pr")
+    else
+        while IFS= read -r rid; do
+            [[ -n "$rid" ]] && run_ids+=("$rid")
+        done < <(list_run_ids_for_sha "$BASE" "$sha")
+        [[ ${#run_ids[@]} -eq 0 ]] && { echo "No workflow runs found for PR #$pr (sha $sha) in $BASE" >&2; return 1; }
+    fi
+
+    echo "Using $BASE PR #$pr, SHA $sha, runs (${#run_ids[@]}): ${run_ids[*]}" >&2
+
+    local rid
+    local tmp_summary_ndjson
+    tmp_summary_ndjson=$(mktemp)
+
+    for rid in "${run_ids[@]}"; do
+        echo "Inspecting run $rid" >&2
+        local summary
+        summary=$(summarize_run "$BASE" "$rid") || { echo "Failed to load run details for $rid" >&2; continue; }
+        echo "$summary" | tee "pr${pr}-run${rid}-summary.json" >/dev/null
+        echo "$summary" >> "$tmp_summary_ndjson"
+        if [[ "$WITH_LOGS_FLAG" == "1" ]]; then
+            download_logs_for_summary "$BASE" "$pr" "$rid" "$summary" "$FAILED_ONLY_FLAG"
+        fi
+    done
+
+    aggregate_summaries "$sha" "$tmp_summary_ndjson" "$pr"
+    rm -f "$tmp_summary_ndjson"
+}
+
+main() {
+    # Parse CLI arguments
+    parse_args "$@"
+
+    pr_ci_logs_from_fork "$BASE" "$BRANCH" "$ALL_RUNS" "$FAILED_ONLY" "$WITH_LOGS"
+}
+
+main "$@"
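
Because the script runs under `set -u`, its option loop has to cope with an empty argument list, and arguments that contain spaces only survive if they are forwarded as `"$@"`. The following is a minimal standalone sketch of that parsing pattern, not the script's actual code: the function reuses the name `parse_args` and the same global variable names only for readability, and it returns an error on unknown options instead of printing help.

```bash
#!/usr/bin/env bash
set -u

# Illustrative stand-in for an option parser; sets global variables.
parse_args() {
    ALL_RUNS=0
    FAILED_ONLY=0
    WITH_LOGS=0
    # Checking "$#" first means "$1" is never expanded once the arguments run out.
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -a|--all-runs)    ALL_RUNS=1; shift ;;
            -f|--failed-only) FAILED_ONLY=1; shift ;;
            -L|--with-logs)   WITH_LOGS=1; shift ;;
            --)               shift; break ;;
            -*)               echo "Unknown option: $1" >&2; return 1 ;;
            *)                break ;;
        esac
    done
    # ${1:-} expands to an empty string instead of aborting under `set -u`.
    BASE="${1:-}"
    BRANCH="${2:-}"
}

# Quoted arguments keep embedded spaces intact.
parse_args -a --failed-only "my org/repo" "owner:feature branch"
printf '%s|%s|%s|%s|%s\n' "$ALL_RUNS" "$FAILED_ONLY" "$WITH_LOGS" "$BASE" "$BRANCH"
# prints: 1|1|0|my org/repo|owner:feature branch
```

Leaving `BASE` and `BRANCH` empty when they are missing lets a caller apply its own defaults with `${1:-default}`, since that expansion treats an empty value the same as an unset one.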

gdb-cmds.txt

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+
+file ./src/tests/rust_guests/bin/debug/simpleguest
+target remote :8080
+set disassembly-flavor intel
+set disassemble-next-line on
+enable pretty-printer
+layout regs
+layout src
+

lldb-cmds.txt

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+lldb /path/to/executable -c /path/to/core/dump
+
+image list
+setting show target.source-map
+bt
+frame select 0
+source list
+
+disassemble --frame
