Skip HTML comment security checks - comments are stripped during compilation#16505
Skip HTML comment security checks - comments are stripped during compilation#16505
Conversation
…during compilation Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adjusts the markdown security scanner to ignore HTML/XML comments that are stripped during workflow compilation, reducing false positives for documentation content.
Changes:
- Strip HTML/XML comments in
ScanMarkdownSecurity()after frontmatter removal (viaremoveXMLComments()). - Remove the redundant HTML-comment payload detection from
scanHiddenContent(). - Update tests to expect no findings when suspicious content appears only inside HTML comments.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/workflow/markdown_security_scanner.go | Strips XML comments prior to scanning and removes in-scanner HTML comment checks. |
| pkg/workflow/markdown_security_scanner_test.go | Updates expectations so suspicious content inside HTML comments is ignored. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // It automatically strips YAML frontmatter (delimited by ---) and HTML/XML comments | ||
| // so that only the active markdown content is scanned. Line numbers in returned findings | ||
| // are adjusted to match the original file. Returns a list of findings. If non-empty, | ||
| // the content should be rejected. | ||
| func ScanMarkdownSecurity(content string) []SecurityFinding { |
There was a problem hiding this comment.
ScanMarkdownSecurity now strips XML/HTML comments via removeXMLComments, but line-number adjustment still only accounts for stripped frontmatter. removeXMLComments can drop lines for multi-line comments (see xml_comments.go: it continues for lines fully inside a comment), which will shift subsequent line numbers and make findings point at the wrong place in the original file.
Consider either (a) making the comment-stripping step preserve the original newline count (replace removed comment lines with empty lines), or (b) tracking a line mapping/offset introduced by comment stripping and applying it to all findings. If you keep stripping comments here, adding a regression test for line numbers after a multi-line HTML comment would help prevent future drift.
| func scanHiddenContent(content string) []SecurityFinding { | ||
| var findings []SecurityFinding | ||
| lines := strings.Split(content, "\n") | ||
|
|
||
| // Check for HTML comments containing suspicious content | ||
| matches := htmlCommentPattern.FindAllStringSubmatchIndex(content, -1) | ||
| for _, match := range matches { | ||
| commentBody := content[match[2]:match[3]] | ||
| commentLine := lineNumberAt(content, match[0]) | ||
|
|
||
| // Flag comments that contain code-like content, URLs, or suspicious keywords | ||
| lowerComment := strings.ToLower(commentBody) | ||
| if containsSuspiciousCommentContent(lowerComment) { | ||
| findings = append(findings, SecurityFinding{ | ||
| Category: CategoryHiddenContent, | ||
| Description: "HTML comment contains suspicious content (code, URLs, or executable instructions)", | ||
| Line: commentLine, | ||
| Snippet: truncateSnippet(strings.TrimSpace(commentBody), 80), | ||
| }) | ||
| } | ||
| } | ||
|
|
||
| // Check for CSS-hidden elements |
There was a problem hiding this comment.
After removing the HTML-comment scanning logic from scanHiddenContent, the related regex/helpers (htmlCommentPattern, suspiciousCommentWordPatterns, and containsSuspiciousCommentContent) are no longer referenced anywhere in the package. Keeping them (and their comments) around makes the hidden-content implementation harder to understand and can mislead future changes.
Recommend deleting the unused pattern/helpers (or repurposing them elsewhere if still needed) so the remaining hidden-content checks reflect the new behavior (CSS hiding + entity obfuscation only).
Security scanner was rejecting workflows with documentation in HTML comments, even though these comments are removed by
removeXMLComments()during compilation and never reach the final workflow.Changes
ScanMarkdownSecurity()by callingremoveXMLComments()after frontmatter strippingscanHiddenContent()since comments are now pre-strippedExample
This now passes security scanning:
Contains URLs like http://example.com and code examples that previously
triggered security alerts but are stripped during compilation.
-->
Actual workflow content...