Description
🤖 Filed with agentic-workflows agent mid-design conversation
Problem
The add-labels safe output currently supports allowed: [list] (allowlist) and max: N (count cap), but has no way to deny labels matching a pattern.
In large repositories like microsoft/vscode with 600+ labels, maintaining an exhaustive allowlist is impractical. However, there are classes of labels that should never be applied by an agentic workflow — for example:
- Labels prefixed with ~ (tilde) are used as workflow trigger labels (e.g., ~stale triggers the triage workflow). An agent applying these could cause unintended workflow cascades.
- Labels prefixed with * have special administrative meaning.
Without infrastructure-level enforcement, these constraints can only be expressed in the prompt ("please don't apply labels starting with ~"), which is a weak defense against prompt injection attacks on workflows that process untrusted public input.
Proposed Solution
Add a blocked: field to add-labels (and potentially remove-labels) safe outputs that supports pattern matching:
    safe-outputs:
      add-labels:
        blocked: ["~*", "*\\**"]  # deny labels starting with ~ or *
        max: 5

Ideally this would support at minimum prefix matching (e.g., ~* matches any label starting with ~), and potentially the same glob/wildcard syntax used elsewhere in gh-aw (e.g., forks: patterns).
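To make the requested semantics concrete, here is a minimal Python sketch of how a deny-list check could sit in front of label application. It is not gh-aw's implementation. The function names (`compile_blocked_pattern`, `filter_labels`), the escaping rule (a backslash makes the next character literal), and the choice to drop blocked labels and truncate to `max` rather than reject the whole output are all assumptions made for illustration.

```python
import re


def compile_blocked_pattern(pattern: str) -> re.Pattern:
    """Translate a minimal glob into a regex (hypothetical helper).

    Assumed semantics, not confirmed by gh-aw:
      - an unescaped '*' matches any run of characters (including none)
      - a backslash makes the following character literal
      - everything else matches itself
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: match it literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "*":
            parts.append(".*")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def filter_labels(requested, blocked, max_count):
    """Drop labels matching any blocked pattern, then cap the count.

    Assumption: blocked labels are silently dropped and the remainder is
    truncated to max_count; a real implementation might instead reject
    the whole request.
    """
    compiled = [compile_blocked_pattern(p) for p in blocked]
    kept = [
        label
        for label in requested
        # fullmatch anchors the pattern to the entire label name.
        if not any(rx.fullmatch(label) for rx in compiled)
    ]
    return kept[:max_count]


print(filter_labels(["bug", "~stale", "*admin", "feature"], ["~*", "\\**"], 5))
# ['bug', 'feature']
```

Under these assumed semantics, a prefix-only pattern for a literal asterisk is written `\**` (an escaped `*` followed by a wildcard). A pattern of the form `*\**` would instead match any label that contains an asterisk anywhere, which is broader than prefix matching, so the exact escaping rules are worth pinning down in the final design.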
Why This Matters
For workflows that triage public issues (where the issue content is untrusted and may contain prompt injection payloads), the safe-outputs config is the hard security boundary — it's the "you literally can't" layer vs. the prompt-level "please don't" layer. Being able to deny dangerous label patterns at this layer would meaningfully reduce the attack surface for agentic triage workflows.
Current Workaround
Prompt-level instructions telling the agent not to apply certain labels. This works under normal conditions but is not a reliable defense against adversarial input.