SQL performance analysis and optimization advisor with SQLMesh integration
Lombardi is a powerful SQL performance analysis tool that helps you identify bottlenecks, detect antipatterns, and optimize your queries across different data warehouses. Built on top of SQLGlot's semantic parsing, it provides actionable insights for better query performance.
- Complexity Scoring: Quantitative assessment (0-100) based on joins, subqueries, nesting depth (a scoring sketch follows this list)
- Antipattern Detection: Identifies common performance killers like SELECT *, cartesian joins, functions in WHERE clauses
- Optimization Suggestions: Actionable recommendations with specific examples
- Snowflake: Clustering keys, result caching, VARIANT optimization, time travel suggestions
- BigQuery: Partitioning recommendations, slot efficiency, materialized views, array operations
- Universal: Cross-platform optimizations that work everywhere
- SQLMesh Integration: Custom audit for performance checks during `sqlmesh plan`
- CLI Tool: Rich terminal output with colors, tables, and formatting
- Python API: Programmatic access for custom workflows
- Multiple Output Formats: Rich terminal, JSON, plain text
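The exact weighting behind the 0-100 complexity score is internal to Lombardi, but it can be read as a capped, weighted sum of the structural counts above. The sketch below is illustrative only; the weights and helper function are invented for explanation and are not Lombardi's actual scoring:

# Illustrative sketch only: these weights are made up and are not
# Lombardi's real scoring function.
def toy_complexity_score(joins: int, subqueries: int, ctes: int, nesting: int) -> float:
    score = joins * 8 + subqueries * 10 + ctes * 4 + nesting * 12
    return min(float(score), 100.0)  # clamp to the 0-100 range

print(toy_complexity_score(joins=1, subqueries=0, ctes=0, nesting=0))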
pip install lombardi
# Analyze a SQL file
lombardi analyze query.sql --dialect snowflake
# Analyze a query string
lombardi analyze --sql "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id"
# Get just complexity metrics
lombardi complexity --sql "SELECT COUNT(*) FROM large_table WHERE date_col >= '2024-01-01'"
# Export results as JSON
lombardi analyze query.sql --format json > analysis.json
# BigQuery-specific analysis
lombardi analyze --sql "SELECT * FROM dataset.table" --dialect bigquery
from lombardi import ComplexityAnalyzer, AntipatternDetector, OptimizationSuggester
# Analyze query complexity
analyzer = ComplexityAnalyzer(dialect="snowflake")
metrics = analyzer.analyze("SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id")
print(f"Complexity Score: {metrics.complexity_score}/100")
print(f"Join Count: {metrics.join_count}")
print(f"Subqueries: {metrics.subquery_count}")
# Detect antipatterns
detector = AntipatternDetector()
issues = detector.detect("SELECT * FROM users WHERE UPPER(name) = 'JOHN'")
for issue in issues:
print(f"โ {issue.pattern}: {issue.description}")
# Get optimization suggestions
suggester = OptimizationSuggester(dialect="bigquery")
suggestions = suggester.suggest("SELECT * FROM large_table WHERE date_col >= '2024-01-01'")
for suggestion in suggestions:
print(f"๐ก {suggestion.title}: {suggestion.description}")
$ lombardi analyze --sql "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE UPPER(c.name) LIKE '%ACME%'"
╭────────────────────────────── SQL Analysis Results ──────────────────────────────╮
│ Complexity Score: 12.0/100                                                        │
╰───────────────────────────────────────────────────────────────────────────────────╯
    Complexity Metrics
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Metric         ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Join Count     │ 1     │
│ Subquery Count │ 0     │
│ CTE Count      │ 0     │
│ Function Count │ 1     │
│ Nesting Depth  │ 0     │
│ Table Count    │ 2     │
│ Column Count   │ 3     │
└────────────────┴───────┘
                             Detected Issues
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pattern                     ┃ Severity ┃ Description                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Select Star                 │ MEDIUM   │ SELECT * can be inefficient   │
│ Function On Column In Where │ HIGH     │ Functions prevent index usage │
└─────────────────────────────┴──────────┴───────────────────────────────┘
                          Optimization Suggestions
┏━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category    ┃ Priority ┃ Suggestion                                       ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Performance │ HIGH     │ Function On Column In Where: Rewrite to avoid    │
│             │          │ applying functions to indexed columns            │
│ Indexing    │ HIGH     │ JOIN Index Optimization: Ensure indexes exist on │
│             │          │ JOIN columns: o.customer_id, c.id                │
│ Performance │ MEDIUM   │ Select Star: Explicitly list required columns    │
└─────────────┴──────────┴──────────────────────────────────────────────────┘
Integrate Lombardi into your SQLMesh workflow for automated performance checks:
# In your SQLMesh model
MODEL (
  name my_project.optimized_table,
  kind FULL,
  audits (performance_check)
);
SELECT
    region_code,
    species_code,
    COUNT(*) as observation_count
FROM @DEV.my_project.raw_observations
WHERE observation_date >= '2024-01-01'
GROUP BY 1, 2;
# Configure the audit
from lombardi.integrations.sqlmesh_audit import create_performance_audit
performance_audit = create_performance_audit(
    complexity_threshold=30.0,
    min_severity="medium",
    dialect="snowflake"
)
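The same thresholds can also be enforced outside of `sqlmesh plan`, for example as a CI gate. The sketch below uses only the analyzer classes shown earlier; the pass/fail policy and the example query are illustrative assumptions, not part of Lombardi's API:

import sys

from lombardi import AntipatternDetector, ComplexityAnalyzer

# Hypothetical CI gate mirroring the audit thresholds configured above.
MODEL_SQL = "SELECT * FROM my_project.raw_observations WHERE UPPER(region_code) = 'US'"

metrics = ComplexityAnalyzer(dialect="snowflake").analyze(MODEL_SQL)
issues = AntipatternDetector().detect(MODEL_SQL)

if metrics.complexity_score > 30.0 or issues:
    print(f"Complexity: {metrics.complexity_score}/100")
    for issue in issues:
        print(f"{issue.pattern}: {issue.description}")
    sys.exit(1)  # fail the pipeline when the query exceeds the thresholds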
lombardi/
├── analyzers/                      # Core analysis engines
│   ├── complexity_analyzer.py      # Complexity scoring
│   ├── antipattern_detector.py     # Antipattern detection
│   └── optimization_suggester.py   # Optimization recommendations
├── rules/                          # Warehouse-specific optimizations
│   ├── snowflake_rules.py          # Snowflake optimizations
│   └── bigquery_rules.py           # BigQuery optimizations
├── integrations/                   # Third-party integrations
│   └── sqlmesh_audit.py            # SQLMesh audit integration
└── cli.py                          # Command-line interface
Lombardi identifies these common SQL performance issues (a detection sketch follows the table):
| Pattern            | Severity | Description                      |
|--------------------|----------|----------------------------------|
| SELECT *           | Medium   | Can be inefficient and fragile   |
| Cartesian Joins    | Critical | Missing JOIN conditions          |
| Functions in WHERE | High     | Prevents index usage             |
| Leading Wildcards  | Medium   | LIKE '%pattern' prevents indexes |
| Missing WHERE      | Medium   | Full table scans                 |
| Subquery in SELECT | Medium   | Often better as JOINs            |
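A quick way to confirm a rewrite actually clears these patterns is to run the detector on both versions of a query. This is a small sketch using the `AntipatternDetector` API shown earlier; the example queries are made up:

from lombardi import AntipatternDetector

detector = AntipatternDetector()

# A query that hits several patterns from the table above:
# SELECT *, a function on the filtered column, and a leading wildcard.
before = "SELECT * FROM users WHERE UPPER(email) LIKE '%@EXAMPLE.COM'"
# A rewrite with explicit columns and a direct equality filter.
after = "SELECT id, email FROM users WHERE email = 'john@example.com'"

for label, sql in (("before", before), ("after", after)):
    issues = detector.detect(sql)
    print(f"{label}: {len(issues)} issue(s)")
    for issue in issues:
        print(f"  {issue.pattern}: {issue.description}")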
- Clustering Keys: Recommendations for large table partitioning
- Result Caching: Identify cacheable query patterns
- Warehouse Sizing: Complexity-based sizing suggestions
- Time Travel: Optimize historical queries
- VARIANT: Semi-structured data query optimization
- Partitioning: Date/timestamp partitioning opportunities
- Clustering: Multi-column clustering recommendations
- Slot Efficiency: Cost optimization suggestions
- Materialized Views: Pre-aggregation opportunities
- Array Operations: UNNEST and array function optimization
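These warehouse-specific rules surface through the same `OptimizationSuggester` interface used earlier; presumably the `dialect` argument selects the matching rule set. A brief sketch that runs one query under both dialects to compare the advice (the output depends on Lombardi's rule sets, and the example query is made up):

from lombardi import OptimizationSuggester

# Hypothetical query; the date filter is the kind of pattern the
# partitioning/clustering rules above are meant to catch.
sql = "SELECT * FROM analytics.events WHERE event_date >= '2024-01-01'"

for dialect in ("snowflake", "bigquery"):
    print(f"--- {dialect} ---")
    for suggestion in OptimizationSuggester(dialect=dialect).suggest(sql):
        print(f"{suggestion.title}: {suggestion.description}")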
lombardi analyze [OPTIONS] [SQL_FILE]
Options:
--sql, -s TEXT SQL query string to analyze
--dialect, -d TEXT SQL dialect (snowflake, bigquery, etc.)
--threshold, -t FLOAT Complexity threshold (0-100) [default: 50.0]
--severity TEXT Minimum severity (low, medium, high, critical) [default: medium]
--format, -f TEXT Output format (rich, json, plain) [default: rich]
--suggestions/--no-suggestions Include optimization suggestions [default: suggestions]
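When scripting around the CLI instead of the Python API, the `--format json` output can be consumed programmatically. A minimal sketch that shells out to the command shown above; the JSON schema is not documented here, so no particular keys are assumed:

import json
import subprocess

# Run the CLI with JSON output and pretty-print whatever it returns.
# Only the documented flags (--sql, --format) are used; the CLI's exit-code
# behavior is not assumed here.
result = subprocess.run(
    ["lombardi", "analyze", "--sql", "SELECT * FROM orders", "--format", "json"],
    capture_output=True,
    text=True,
)
report = json.loads(result.stdout)
print(json.dumps(report, indent=2))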
# Initialize with specific dialect
analyzer = ComplexityAnalyzer(dialect="snowflake")
detector = AntipatternDetector(dialect="bigquery")
# Configure thresholds and rules
suggester = OptimizationSuggester(dialect="snowflake")
suggestions = suggester.suggest(sql_query)
# Warehouse-specific analysis
from lombardi.rules.snowflake_rules import SnowflakeRules
snowflake_rules = SnowflakeRules()
sf_suggestions = snowflake_rules.analyze(sql_query)
git clone https://github.com/Doctacon/lombardi.git
cd lombardi
uv sync
# Run tests
uv run pytest tests/ -v
# Test CLI
uv run lombardi analyze --sql "SELECT * FROM test_table"
# Test with coverage
uv run pytest tests/ --cov=lombardi
# Build package
uv build
# Test locally
pip install dist/lombardi-*.whl
Lombardi helps identify optimizations that can provide:
- Query Speed: 10x-100x faster execution through proper indexing
- Cost Reduction: 50-90% lower warehouse costs via efficient queries
- Resource Usage: Reduced CPU, memory, and I/O through better query patterns
- Maintainability: Cleaner, more readable SQL through antipattern detection
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes and add tests
- Run tests: `uv run pytest`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- SQLGlot: Powers our semantic SQL parsing
- SQLMesh: Inspiration for the audit integration pattern
- Rich: Beautiful terminal output
- Typer: Excellent CLI framework
- Documentation: [Coming Soon]
- PyPI: https://pypi.org/project/lombardi/
- Issues: https://github.com/Doctacon/lombardi/issues
- Discussions: https://github.com/Doctacon/lombardi/discussions
Named after Vince Lombardi, who believed "Perfection is not attainable, but if we chase perfection we can catch excellence" - the same philosophy we apply to SQL optimization.