Comby-like structural search and rewrite for code
A Python implementation of Comby patterns for structural code matching and transformation.
from pycomby import pycomby
# Extract matches
matches = pycomby("Hello, world!", "Hello, :[greeting:word]!")
# [{'greeting': 'world'}]
# Rewrite code
code = "foo(bar(1, 2))"
result = pycomby(code, ":[func:()]", "[CALL :[func]]")
# "[CALL foo(bar(1, 2))]"

# Extract
echo "John is 30. Jane is 25." | pycomby ':[name:word] is :[age:digit]'
{"name":"John","age":"30"}
{"name":"Jane","age":"25"}
# Replace
echo "John is 30" | pycomby ':[name:word] is :[age:digit]' 'Person: :[name.upper]'
Person: JOHN

- Balanced delimiters – Match `()`, `[]`, `{}` with proper nesting (no regex hacks)
- Macros – `:[x:word]`, `:[x:digit]`, `:[x:num]` for common patterns
- Structural macros – `:[x:()]` for balanced parentheses, `:[x:(_)]` for content only
- Optional holes – `:[x?]` or `:[x:word?]` for optional matching
- Regex constraints – `:[x~\d{3}]` for custom patterns
- Transformations – Chain operations: `:[path.basename.upper]`
- Comment/string aware – Ignores delimiters inside strings and comments
git clone https://github.com/bardo84/pycomby.git
cd pycomby
pip install -e .

Then use:

pycomby [OPTIONS] PATTERN [REPLACEMENT] < input.txt

See SYNTAX.md for complete documentation.
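When extraction is run from the command line, the sample output shown earlier is one JSON object per match, one per line. A minimal sketch of consuming that output from Python follows. `parse_matches` is a hypothetical helper name, and the sample text is copied from the extraction example rather than produced by running the tool here.

```python
import json


def parse_matches(cli_output: str) -> list[dict]:
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in cli_output.splitlines() if line.strip()]


# Sample output copied from the extraction example in this README.
sample = '{"name":"John","age":"30"}\n{"name":"Jane","age":"25"}\n'

people = parse_matches(sample)

# Captures arrive as strings, so numeric fields need an explicit conversion.
ages = {p["name"]: int(p["age"]) for p in people}
print(ages)  # {'John': 30, 'Jane': 25}
```

In a real script the `sample` string would presumably come from the captured standard output of the `pycomby` command.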
| Pattern | Matches |
|---|---|
| `:[name]` | Any text, captured as `name` |
| `:[_]` | Any text, not captured |
| `...` | Any text (shorthand for `:[_]`) |
| `:[x:word]` | Word characters (`\w+`) |
| `:[x:digit]` | Digits (`\d+`) |
| `:[x:num]` | Numbers (int, float, scientific) |
| `:[x:()]` | Balanced parentheses with content |
| `:[x:(_)]` | Content inside parentheses only |
| `:[x~[a-z]+]` | Custom regex pattern |
| `:[x?]` | Optional match |
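For the holes in the table that correspond to plain regular expressions (`:[x:word]` as `\w+`, `:[x:digit]` as `\d+`, and the untyped `:[name]`), the mapping can be approximated with Python's `re` module. This is only a sketch to make the table concrete, not pycomby's matcher: the names `to_regex` and `extract` are hypothetical, and structural macros, optional holes, `...`, and regex constraints are deliberately left out.

```python
import re

# Regex equivalents taken from the table; other macros are not covered
# and raise KeyError in this sketch.
MACROS = {"word": r"\w+", "digit": r"\d+"}

# Matches :[name] and :[name:kind] where both parts are word characters.
HOLE = re.compile(r":\[(\w+)(?::(\w+))?\]")


def to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern with simple holes into a regex with named groups."""
    parts, pos = [], 0
    for m in HOLE.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text, escaped
        name, kind = m.group(1), m.group(2)
        body = r".*?" if kind is None else MACROS[kind]  # untyped: lazy wildcard
        # :[_] matches without capturing; every other hole becomes a named group.
        parts.append(f"(?:{body})" if name == "_" else f"(?P<{name}>{body})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def extract(text: str, pattern: str) -> list[dict]:
    """Return one dict of captures per match."""
    return [m.groupdict() for m in to_regex(pattern).finditer(text)]


print(extract("John is 30. Jane is 25.", ":[name:word] is :[age:digit]"))
# [{'name': 'John', 'age': '30'}, {'name': 'Jane', 'age': '25'}]
```

The structural rows (`:[x:()]`, `:[x:(_)]`) have no regex equivalent here because balanced nesting is not something a plain regular expression can express.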
# String operations
pycomby(text, pattern, ":[name.upper]") # Uppercase
pycomby(text, pattern, ":[name.lower]") # Lowercase
pycomby(text, pattern, ":[name.capitalize]") # Capitalize
# Arithmetic (on numbers)
pycomby(text, pattern, ":[num.inc]") # +1
pycomby(text, pattern, ":[num.dec]") # -1
# Path operations
pycomby(text, pattern, ":[path.basename]") # Filename without extension
pycomby(text, pattern, ":[path.extension]") # File extension
pycomby(text, pattern, ":[path.filename]") # Full filename
# Chain operations
pycomby(text, pattern, ":[x.basename.upper]") # basename, then uppercase

- Code refactoring – Find and rewrite patterns across files
- API migration – Update function calls (e.g., `old_api()` → `new_api()`) with captured context
- String interpolation in rewrites – Inject captured identifiers into string literals (e.g., inject module name into `maybe_provider("...")`)
- Linting – Detect problematic patterns in code
- Code generation – Template-based transformations
- Log parsing – Extract structured data from unformatted text
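Returning to the transformation chains shown before this list (such as `:[x.basename.upper]`), one way to picture them is as a left-to-right fold over named string functions. The sketch below is an assumption-laden illustration, not pycomby's code: the function table mirrors the comments in the examples (for instance, `basename` as the filename without its extension), `apply_chain` is a hypothetical name, and details such as whether `extension` keeps the leading dot or how `inc` treats floats are guesses.

```python
import os


def _stem(path: str) -> str:
    """Filename without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


# Hypothetical transformation table, modelled on the README comments.
TRANSFORMS = {
    "upper": str.upper,
    "lower": str.lower,
    "capitalize": str.capitalize,
    "inc": lambda s: str(int(s) + 1),               # assumes an integer capture
    "dec": lambda s: str(int(s) - 1),               # assumes an integer capture
    "basename": _stem,
    "extension": lambda p: os.path.splitext(p)[1],  # keeps the dot (assumption)
    "filename": os.path.basename,
}


def apply_chain(value: str, chain: str) -> str:
    """Apply dot-separated transformations from left to right."""
    for name in chain.split("."):
        value = TRANSFORMS[name](value)
    return value


print(apply_chain("src/utils/parser.py", "basename.upper"))  # PARSER
print(apply_chain("41", "inc"))                              # 42
```

Order matters in a chain: `basename.upper` strips the directory and extension first and only then uppercases what is left.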
python -m unittest discover -p "*test*.py" -v

All 28 tests pass. See pycomby_test.py and test_cli.py.
Unlike regex engines, pycomby:
- Tokenizes patterns into literals and holes
- Backtracks intelligently to find matches
- Handles structure with a stack-based scanner for balanced delimiters
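The first step in this list, splitting a pattern into literals and holes, can be sketched as follows. This is an illustrative tokenizer rather than pycomby's own: the `tokenize` name and the `('lit', ...)` / `('hole', ...)` tuple shape are hypothetical. It counts bracket depth so that a regex constraint such as `:[x~[a-z]+]` stays inside a single hole. Escaped brackets inside a constraint are not handled.

```python
def tokenize(pattern: str) -> list[tuple[str, str]]:
    """Split a pattern into ('lit', text) and ('hole', spec) tokens."""
    tokens, lit, i = [], [], 0
    while i < len(pattern):
        if pattern.startswith(":[", i):
            # Scan to the bracket that closes this hole, counting nesting.
            depth, j = 1, i + 2
            while j < len(pattern) and depth:
                if pattern[j] == "[":
                    depth += 1
                elif pattern[j] == "]":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError(f"unterminated hole at offset {i}")
            if lit:  # flush the literal text collected so far
                tokens.append(("lit", "".join(lit)))
                lit = []
            tokens.append(("hole", pattern[i + 2 : j - 1]))
            i = j
        else:
            lit.append(pattern[i])
            i += 1
    if lit:
        tokens.append(("lit", "".join(lit)))
    return tokens


print(tokenize("Hello, :[greeting:word]!"))
# [('lit', 'Hello, '), ('hole', 'greeting:word'), ('lit', '!')]
print(tokenize("id=:[x~[a-z]+];"))
# [('lit', 'id='), ('hole', 'x~[a-z]+'), ('lit', ';')]
```

A matcher built on such tokens would match literals verbatim and try candidate spans for each hole, which is where the backtracking in the second bullet comes in.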
This means you get:
- No regex escaping needed for literal text
- Proper handling of nested delimiters (unlike `\(.*\)`)
- Whitespace flexibility without explicit patterns
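To see why a stack-based scan handles nesting where the regex `\(.*\)` does not, consider the simplified sketch below. It is not pycomby's scanner: `match_balanced` is a hypothetical name, and unlike the comment/string awareness described in the feature list, this version treats every delimiter character as structural.

```python
import re

PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def match_balanced(text: str, start: int) -> int:
    """Return the index just past the delimiter that closes text[start].

    Raises ValueError if text[start] is not an opener or nesting is broken.
    """
    if text[start] not in PAIRS:
        raise ValueError("not an opening delimiter")
    stack = []
    for i in range(start, len(text)):
        ch = text[i]
        if ch in PAIRS:
            stack.append(PAIRS[ch])  # remember which closer we expect
        elif ch in CLOSERS:
            if not stack or stack.pop() != ch:
                raise ValueError(f"mismatched {ch!r} at offset {i}")
            if not stack:  # the opener at `start` is now closed
                return i + 1
    raise ValueError("unbalanced input")


code = "foo(bar(1, 2)) + baz(3)"
start = code.index("(")
end = match_balanced(code, start)

print(code[start:end])                      # (bar(1, 2))
print(re.search(r"\(.*\)", code).group(0))  # (bar(1, 2)) + baz(3)
```

The greedy regex runs on to the last `)` in the line, while the stack stops at the parenthesis that actually closes the first one.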
A powerful use case is injecting captured text into string literals during a rewrite, for example when consolidating every legacy_api::call(...) site into a shared guard.
# API migration with context: before/after
before = '''legacy_api::provider::<core::math::tan>::call(...)
legacy_api::provider::<core::array::add>::call(...)'''
# Capture the module path and inject it into the new guard call
result = pycomby(
before,
"legacy_api::provider::<:[module]>::call",
'shared_guard::maybe_provider(":[module]")?.call',
)
# Result (Tier 1):
# shared_guard::maybe_provider("core::math::tan")?.call(...)
# shared_guard::maybe_provider("core::array::add")?.call(...)

This is Tier 1 (structural injection): when the literal you need is already present in the matched text and can be captured, pycomby injects it directly into the replacement string literal. For cases where the injected literal must be derived, see pycomby_forward.md for the path toward Tier 2/3.
- Find: target the existing `legacy_api::provider::<...>::call` invocations; capture the module path or guard label that now drives the helper.
- Replace: feed the captured literal into the new guard call so every rewritten site expresses the same API and logging behavior.
- Verify: rerun the integration or boundary tests that exercise the helper (e.g., those that expect the guard to emit a warning when the provider is missing).
This keeps every call site aligned with the shared guard and prevents legacy_api::provider from proliferating across the tree.
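As a lightweight complement to the verification step, a post-rewrite scan can confirm that no legacy call sites remain before the heavier test suites run. The sketch below is one possible check, not part of pycomby: `find_leftovers` is a hypothetical helper, the `rewritten` text is copied from the result shown in the example, and the `core::io::read` line is invented purely to show a failing case.

```python
def find_leftovers(source: str, marker: str = "legacy_api::provider::<"):
    """Return (line_number, line) pairs that still contain the legacy marker."""
    return [
        (n, line.strip())
        for n, line in enumerate(source.splitlines(), start=1)
        if marker in line
    ]


# Result of the rewrite, copied from the example in this README.
rewritten = '''shared_guard::maybe_provider("core::math::tan")?.call(...)
shared_guard::maybe_provider("core::array::add")?.call(...)'''

# Hypothetical file in which one call site was missed.
partially = rewritten + "\nlegacy_api::provider::<core::io::read>::call(...)"

print(find_leftovers(rewritten))  # []
print(find_leftovers(partially))
# [(3, 'legacy_api::provider::<core::io::read>::call(...)')]
```

A plain substring check is enough here because the goal is only to flag stragglers for review, not to rewrite them.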
- Backtracking can be quadratic in the worst case (use specific patterns when possible)
- Structural macros are language-agnostic (don't skip language-specific comments)
- Entire input loaded into memory
- String interpolation of captures inside literals currently requires the capture to already contain the desired text; if you need to derive or compute the literal, use multi-pass or external scripting
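The last limitation points to external scripting when the injected literal has to be derived rather than copied. A minimal sketch of that route uses Python's `re.sub` with a callback as a stand-in for the second pass. Everything specific here is hypothetical: the derivation rule (keep only the last `::` segment) and the names `short_label` and `derive_labels` are chosen only to illustrate the idea.

```python
import re

# Matches the guard call produced by the first (structural) pass.
CALL = re.compile(r'shared_guard::maybe_provider\("([^"]*)"\)')


def short_label(module_path: str) -> str:
    """Hypothetical derivation: keep only the last `::` segment."""
    return module_path.rsplit("::", 1)[-1]


def derive_labels(source: str) -> str:
    """Second pass: replace each captured literal with its derived form."""
    return CALL.sub(
        lambda m: f'shared_guard::maybe_provider("{short_label(m.group(1))}")',
        source,
    )


first_pass = 'shared_guard::maybe_provider("core::math::tan")?.call(...)'
print(derive_labels(first_pass))
# shared_guard::maybe_provider("tan")?.call(...)
```

Keeping the derivation in ordinary code means it can be arbitrarily complex (lookups, renaming tables) without the pattern language needing to express it.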
Contributions welcome! See CONTRIBUTING.md.
MIT License. See LICENSE.
Related: Comby – The original Comby implementation