
Lombardi 🎯

SQL performance analysis and optimization advisor with SQLMesh integration

PyPI version · Python 3.11+ · License: MIT

Lombardi is a SQL performance analysis tool that helps you identify bottlenecks, detect antipatterns, and optimize queries across different data warehouses. Built on SQLGlot's semantic parsing, it provides actionable insights for better query performance.

✨ Features

๐Ÿ” Semantic SQL Analysis

  • Complexity Scoring: Quantitative assessment (0-100) based on joins, subqueries, and nesting depth
  • Antipattern Detection: Identifies common performance killers like SELECT *, cartesian joins, functions in WHERE clauses
  • Optimization Suggestions: Actionable recommendations with specific examples
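To make the complexity score concrete, here is a minimal sketch of how a 0-100 score could be derived from counted query features. The weights and the cap are illustrative assumptions for this example, not Lombardi's actual internals.

```python
# Hypothetical complexity scoring: a weighted sum of query features,
# capped at 100. The weights below are illustrative, not Lombardi's.
from dataclasses import dataclass


@dataclass
class ComplexityMetrics:
    join_count: int = 0
    subquery_count: int = 0
    nesting_depth: int = 0


def complexity_score(m: ComplexityMetrics) -> float:
    """Weighted sum of structural features, capped at 100."""
    raw = 8.0 * m.join_count + 10.0 * m.subquery_count + 6.0 * m.nesting_depth
    return min(raw, 100.0)
```

A single-join query would score low under these weights, while deeply nested queries with many subqueries saturate toward 100 — mirroring the 0-100 scale the real analyzer reports.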

๐Ÿข Warehouse-Specific Rules

  • Snowflake: Clustering keys, result caching, VARIANT optimization, time travel suggestions
  • BigQuery: Partitioning recommendations, slot efficiency, materialized views, array operations
  • Universal: Cross-platform optimizations that work everywhere

๐Ÿ› ๏ธ Integration Ready

  • SQLMesh Integration: Custom audit for performance checks during sqlmesh plan
  • CLI Tool: Rich terminal output with colors, tables, and formatting
  • Python API: Programmatic access for custom workflows
  • Multiple Output Formats: Rich terminal, JSON, plain text

🚀 Quick Start

Installation

pip install lombardi

CLI Usage

# Analyze a SQL file
lombardi analyze query.sql --dialect snowflake

# Analyze a query string
lombardi analyze --sql "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id"

# Get just complexity metrics
lombardi complexity --sql "SELECT COUNT(*) FROM large_table WHERE date_col >= '2024-01-01'"

# Export results as JSON
lombardi analyze query.sql --format json > analysis.json

# BigQuery-specific analysis
lombardi analyze --sql "SELECT * FROM dataset.table" --dialect bigquery
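The JSON export can be consumed downstream in scripts. A minimal sketch of filtering a report for severe findings follows — note that the key names (`issues`, `pattern`, `severity`) are assumptions about the JSON schema, so check them against your actual `analysis.json`.

```python
# Sketch of post-processing `lombardi analyze --format json` output.
# The "issues"/"pattern"/"severity" keys are assumed, not documented here.
import json


def high_severity_issues(report: dict) -> list[dict]:
    """Return only issues at HIGH or CRITICAL severity."""
    return [
        issue
        for issue in report.get("issues", [])
        if issue.get("severity") in ("HIGH", "CRITICAL")
    ]


# Example payload shaped like the analysis in the README.
sample = json.loads(
    '{"issues": [{"pattern": "Select Star", "severity": "MEDIUM"},'
    ' {"pattern": "Function On Column In Where", "severity": "HIGH"}]}'
)
for issue in high_severity_issues(sample):
    print(issue["pattern"])
```

A pattern like this is handy for failing CI only on severe findings while letting lower-severity hints through.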

Python API

from lombardi import ComplexityAnalyzer, AntipatternDetector, OptimizationSuggester

# Analyze query complexity
analyzer = ComplexityAnalyzer(dialect="snowflake")
metrics = analyzer.analyze("SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id")

print(f"Complexity Score: {metrics.complexity_score}/100")
print(f"Join Count: {metrics.join_count}")
print(f"Subqueries: {metrics.subquery_count}")

# Detect antipatterns
detector = AntipatternDetector()
issues = detector.detect("SELECT * FROM users WHERE UPPER(name) = 'JOHN'")

for issue in issues:
    print(f"โŒ {issue.pattern}: {issue.description}")

# Get optimization suggestions
suggester = OptimizationSuggester(dialect="bigquery")
suggestions = suggester.suggest("SELECT * FROM large_table WHERE date_col >= '2024-01-01'")

for suggestion in suggestions:
    print(f"💡 {suggestion.title}: {suggestion.description}")

📊 Example Output

$ lombardi analyze --sql "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE UPPER(c.name) LIKE '%ACME%'"
╭─────────────────────────── SQL Analysis Results ───────────────────────────╮
│ Complexity Score: 12.0/100                                                 │
╰────────────────────────────────────────────────────────────────────────────╯

    Complexity Metrics    
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Metric         ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Join Count     │ 1     │
│ Subquery Count │ 0     │
│ CTE Count      │ 0     │
│ Function Count │ 1     │
│ Nesting Depth  │ 0     │
│ Table Count    │ 2     │
│ Column Count   │ 3     │
└────────────────┴───────┘

                                Detected Issues                                 
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pattern                     ┃ Severity ┃ Description                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Select Star                 │ MEDIUM   │ SELECT * can be inefficient   │
│ Function On Column In Where │ HIGH     │ Functions prevent index usage │
└─────────────────────────────┴──────────┴───────────────────────────────┘

                            Optimization Suggestions                            
┏━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category    ┃ Priority ┃ Suggestion                                       ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Performance │ HIGH     │ Function On Column In Where: Rewrite to avoid    │
│             │          │ applying functions to indexed columns            │
│ Indexing    │ HIGH     │ JOIN Index Optimization: Ensure indexes exist on │
│             │          │ JOIN columns: o.customer_id, c.id                │
│ Performance │ MEDIUM   │ Select Star: Explicitly list required columns    │
└─────────────┴──────────┴──────────────────────────────────────────────────┘

🔌 SQLMesh Integration

Integrate Lombardi into your SQLMesh workflow for automated performance checks:

-- In your SQLMesh model
MODEL (
  name my_project.optimized_table,
  kind FULL,
  audits [performance_check]
);

SELECT 
  region_code,
  species_code,
  COUNT(*) as observation_count
FROM @DEV.my_project.raw_observations 
WHERE observation_date >= '2024-01-01'
GROUP BY 1, 2;

# Configure the audit
from lombardi.integrations.sqlmesh_audit import create_performance_audit

performance_audit = create_performance_audit(
    complexity_threshold=30.0,
    min_severity="medium",
    dialect="snowflake"
)

๐Ÿ—๏ธ Architecture

lombardi/
├── analyzers/           # Core analysis engines
│   ├── complexity_analyzer.py      # Complexity scoring
│   ├── antipattern_detector.py     # Antipattern detection
│   └── optimization_suggester.py   # Optimization recommendations
├── rules/               # Warehouse-specific optimizations
│   ├── snowflake_rules.py          # Snowflake optimizations
│   └── bigquery_rules.py           # BigQuery optimizations
├── integrations/        # Third-party integrations
│   └── sqlmesh_audit.py            # SQLMesh audit integration
└── cli.py               # Command-line interface

🎯 Detected Antipatterns

Lombardi identifies these common SQL performance issues:

Pattern             Severity  Description
SELECT *            Medium    Can be inefficient and fragile
Cartesian Joins     Critical  Missing JOIN conditions
Functions in WHERE  High      Prevents index usage
Leading Wildcards   Medium    LIKE '%pattern' prevents indexes
Missing WHERE       Medium    Full table scans
Subquery in SELECT  Medium    Often better as JOINs
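To illustrate the idea behind a couple of these checks, here is a minimal regex-based sketch. Lombardi itself detects antipatterns on SQLGlot's parsed AST, which is far more robust than string matching; this is only a toy illustration of the patterns, not the library's implementation.

```python
# Toy antipattern detection via regular expressions (illustration only;
# the real detector walks a SQLGlot AST rather than matching strings).
import re

CHECKS = {
    "Select Star": re.compile(r"\bSELECT\s+\*", re.IGNORECASE),
    "Leading Wildcard": re.compile(r"\bLIKE\s+'%", re.IGNORECASE),
}


def detect(sql: str) -> list[str]:
    """Return the names of all checks whose pattern matches the query."""
    return [name for name, pattern in CHECKS.items() if pattern.search(sql)]


print(detect("SELECT * FROM users WHERE name LIKE '%smith'"))
```

String matching like this breaks on comments, string literals, and dialect quirks — which is exactly why an AST-based approach is preferable.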

๐Ÿข Warehouse-Specific Optimizations

Snowflake

  • โ„๏ธ Clustering Keys: Recommendations for large table partitioning
  • ๐Ÿš€ Result Caching: Identify cacheable query patterns
  • ๐Ÿ“Š Warehouse Sizing: Complexity-based sizing suggestions
  • ๐Ÿ•’ Time Travel: Optimize historical queries
  • ๐Ÿ“„ VARIANT: Semi-structured data query optimization

BigQuery

  • 📅 Partitioning: Date/timestamp partitioning opportunities
  • 🗂️ Clustering: Multi-column clustering recommendations
  • 💰 Slot Efficiency: Cost optimization suggestions
  • 🔄 Materialized Views: Pre-aggregation opportunities
  • 📊 Array Operations: UNNEST and array function optimization
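As a sketch of what a warehouse-specific rule can look like, the snippet below flags a BigQuery-style query that has a FROM clause but never references a partition pseudo-column. This is a hypothetical illustration of the rule's shape, not Lombardi's actual `bigquery_rules.py` logic.

```python
# Hypothetical BigQuery rule: warn when a query scans a table without
# filtering on a partition pseudo-column. Illustration only.
import re

PARTITION_COLUMNS = ("_PARTITIONDATE", "_PARTITIONTIME")


def missing_partition_filter(sql: str) -> bool:
    """True when the query reads from a table but references no partition column."""
    has_from = re.search(r"\bFROM\b", sql, re.IGNORECASE) is not None
    uses_partition = any(col.lower() in sql.lower() for col in PARTITION_COLUMNS)
    return has_from and not uses_partition
```

A real rule would inspect the parsed WHERE clause and know each table's actual partitioning column; the point is only that warehouse rules are predicates over the query that emit targeted suggestions.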

🔧 Configuration

CLI Options

lombardi analyze [OPTIONS] [SQL_FILE]

Options:
  --sql, -s TEXT              SQL query string to analyze
  --dialect, -d TEXT          SQL dialect (snowflake, bigquery, etc.)
  --threshold, -t FLOAT       Complexity threshold (0-100) [default: 50.0]
  --severity TEXT             Minimum severity (low, medium, high, critical) [default: medium]
  --format, -f TEXT           Output format (rich, json, plain) [default: rich]
  --suggestions/--no-suggestions  Include optimization suggestions [default: suggestions]

Python API Configuration

# Initialize with specific dialect
analyzer = ComplexityAnalyzer(dialect="snowflake")
detector = AntipatternDetector(dialect="bigquery")

# Configure thresholds and rules
suggester = OptimizationSuggester(dialect="snowflake")
suggestions = suggester.suggest(sql_query)

# Warehouse-specific analysis
from lombardi.rules.snowflake_rules import SnowflakeRules
snowflake_rules = SnowflakeRules()
sf_suggestions = snowflake_rules.analyze(sql_query)

🧪 Development

Setup

git clone https://github.com/Doctacon/lombardi.git
cd lombardi
uv sync

Testing

# Run tests
uv run pytest tests/ -v

# Test CLI
uv run lombardi analyze --sql "SELECT * FROM test_table"

# Test with coverage
uv run pytest tests/ --cov=lombardi

Building

# Build package
uv build

# Test locally
pip install dist/lombardi-*.whl

📈 Performance Impact

Lombardi helps identify optimizations that can provide:

  • Query Speed: 10x-100x faster execution through proper indexing
  • Cost Reduction: 50-90% lower warehouse costs via efficient queries
  • Resource Usage: Reduced CPU, memory, and I/O through better query patterns
  • Maintainability: Cleaner, more readable SQL through antipattern detection

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run tests: uv run pytest
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • SQLGlot: Powers our semantic SQL parsing
  • SQLMesh: Inspiration for the audit integration pattern
  • Rich: Beautiful terminal output
  • Typer: Excellent CLI framework


Named after Vince Lombardi, who believed "Perfection is not attainable, but if we chase perfection we can catch excellence" - the same philosophy we apply to SQL optimization.
