Skip to content

Conversation

gheyderov
Copy link

Problem:
GAD slugs like go.etcd.io/etcd/client/v3 previously resulted in incorrect product names such as v3, lib, or client. This happened because the parser simply used the last path segment (parts[-1]) as the product name.

Solution:
Introduced a new helper function _derive_vendor_product_from_slug(slug) with conservative heuristics to extract more meaningful vendor/product values:
• Strip trailing /vN suffixes (e.g., v3, v10, v3.1)
• Remove common non-product tails like lib, client, clients, sync, pkg, cmd, internal, src, test
• Map github.com// → vendor = org, product = repo
• For custom hosts (e.g., go.etcd.io/etcd/...) → use the second segment as product and set vendor = UNKNOWN for now

Replaced the previous parts[-1] logic with this helper in gad_source.py.

Tests:
Added dedicated unit tests (test/test_gad_slug_parser.py) covering:
• go.etcd.io/etcd/client/v3 → product = etcd
• go.mozilla.org/sops/v3 → product = sops
• github.com/cloudflare/cfrpki/sync/lib → vendor = cloudflare, product = cfrpki

pytest -k gad → 9 passed, 3 skipped.

Notes:
• The heuristics are deliberately conservative; no guessing or LLM-based inference.
• Additional host-specific rules can be added later if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant