2 changes: 2 additions & 0 deletions .cargo/config.toml
@@ -0,0 +1,2 @@
[alias]
xtask = "run -p xtask --"
64 changes: 64 additions & 0 deletions Cargo.lock

6 changes: 5 additions & 1 deletion Cargo.toml
@@ -1,5 +1,5 @@
[workspace]
members = ["crates/*", "lib/*", "xtask/codegen", "xtask/rules_check", "docs/codegen"]
members = ["crates/*", "lib/*", "xtask/codegen", "xtask/rules_check", "xtask/agentic", "docs/codegen"]
resolver = "2"

[workspace.package]
@@ -23,7 +23,9 @@ biome_js_syntax = "0.5.7"
biome_rowan = "0.5.7"
biome_string_case = "0.5.8"
bpaf = { version = "0.9.15", features = ["derive"] }
camino = "1.1.9"
crossbeam = "0.8.4"
dir-test = "0.4.1"
enumflags2 = "0.7.11"
ignore = "0.4.23"
indexmap = { version = "2.6.0", features = ["serde"] }
@@ -78,6 +80,8 @@ pgt_lexer_codegen = { path = "./crates/pgt_lexer_codegen", version = "0
pgt_lsp = { path = "./crates/pgt_lsp", version = "0.0.0" }
pgt_markup = { path = "./crates/pgt_markup", version = "0.0.0" }
pgt_plpgsql_check = { path = "./crates/pgt_plpgsql_check", version = "0.0.0" }
pgt_pretty_print = { path = "./crates/pgt_pretty_print", version = "0.0.0" }
pgt_pretty_print_codegen = { path = "./crates/pgt_pretty_print_codegen", version = "0.0.0" }
pgt_query = { path = "./crates/pgt_query", version = "0.0.0" }
pgt_query_ext = { path = "./crates/pgt_query_ext", version = "0.0.0" }
pgt_query_macros = { path = "./crates/pgt_query_macros", version = "0.0.0" }
21 changes: 21 additions & 0 deletions crates/pgt_pretty_print/Cargo.toml
@@ -0,0 +1,21 @@
[package]
authors.workspace = true
categories.workspace = true
description = "<DESCRIPTION>"
edition.workspace = true
homepage.workspace = true
keywords.workspace = true
license.workspace = true
name = "pgt_pretty_print"
repository.workspace = true
version = "0.0.0"


[dependencies]
pgt_pretty_print_codegen.workspace = true
pgt_query.workspace = true

[dev-dependencies]
camino.workspace = true
dir-test.workspace = true
insta.workspace = true
165 changes: 165 additions & 0 deletions crates/pgt_pretty_print/INTEGRATION_PLAN.md
@@ -0,0 +1,165 @@
# PostgreSQL Pretty Printer Integration Plan

## Current Status

The pretty printer foundation is **complete and working**. Basic SQL formatting currently covers:
- ✅ SELECT statements with aliases and schema qualification
- ✅ Line length-based breaking (configurable via filename suffix)
- ✅ Proper comma placement and indentation
- ✅ Comprehensive test suite with snapshot testing
- ✅ AST integrity verification (location-aware comparison)

## Architecture Overview

```
SQL Input → pgt_query::parse() → AST → ToTokens → Layout Events → Renderer → Formatted SQL
```

**Key Components:**
- **ToTokens trait**: Converts AST nodes to layout events
- **Layout Events**: `Token`, `Space`, `Line(Hard/Soft/SoftOrSpace)`, `GroupStart/End`, `IndentStart/End`
- **Renderer**: Two-phase Prettier-style algorithm (try to fit the group on a single line, otherwise break it)
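
As an illustration of the pipeline above, here is a minimal sketch of an event stream for a simple SELECT and its single-line ("flat") rendering. The `Event` type, `select_events`, and `render_flat` are hypothetical stand-ins that use plain strings as tokens; the crate's real events carry a generated `TokenKind`.

```rust
/// Simplified stand-ins for the crate's layout events.
/// Assumption: tokens are plain strings here, not the generated `TokenKind`.
#[derive(Debug, Clone, PartialEq)]
pub enum LineType {
    Hard,
    Soft,
    SoftOrSpace,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Token(String),
    Space,
    Line(LineType),
    GroupStart,
    GroupEnd,
    IndentStart,
    IndentEnd,
}

/// Emit events for `SELECT <columns> FROM <table>` (hypothetical helper).
pub fn select_events(columns: &[&str], table: &str) -> Vec<Event> {
    let mut ev = vec![
        Event::GroupStart,
        Event::Token("SELECT".into()),
        Event::IndentStart,
        Event::Line(LineType::SoftOrSpace),
    ];
    for (i, col) in columns.iter().enumerate() {
        if i > 0 {
            // The comma stays on the line of the preceding column.
            ev.push(Event::Token(",".into()));
            ev.push(Event::Line(LineType::SoftOrSpace));
        }
        ev.push(Event::Token((*col).into()));
    }
    ev.push(Event::IndentEnd);
    ev.push(Event::Line(LineType::SoftOrSpace));
    ev.push(Event::Token("FROM".into()));
    ev.push(Event::Space);
    ev.push(Event::Token(table.into()));
    ev.push(Event::GroupEnd);
    ev
}

/// Render the events as if every group fits on one line.
pub fn render_flat(events: &[Event]) -> String {
    let mut out = String::new();
    for e in events {
        match e {
            Event::Token(t) => out.push_str(t),
            Event::Space | Event::Line(LineType::SoftOrSpace) => out.push(' '),
            Event::Line(LineType::Soft) => {}
            Event::Line(LineType::Hard) => out.push('\n'),
            Event::GroupStart | Event::GroupEnd | Event::IndentStart | Event::IndentEnd => {}
        }
    }
    out
}
```

With these definitions, `render_flat(&select_events(&["a", "b"], "t"))` yields `SELECT a, b FROM t`. A full renderer would first check whether that flat form fits within the line limit and only then fall back to breaking.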

## Renderer Implementation Status

### ✅ Completed
- **Core rendering pipeline**: Event processing, text/space/line output
- **Basic grouping**: Single-line vs multi-line decisions
- **Indentation**: Configurable spaces/tabs with proper nesting
- **Line length enforcement**: Respects `max_line_length` config
- **Token rendering**: Keywords, identifiers, punctuation
- **Break propagation**: Child groups with `break_parent: true` force parent groups to break
- **Nested group independence**: Inner groups make independent fit decisions when outer groups break
- **Stack overflow elimination**: Fixed infinite recursion in renderer
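
Break propagation can be pictured as a single iterative pass over the event stream. The sketch below is an assumption about one possible approach, not the renderer's actual code: a group is marked as forced to break when it contains a hard line or a nested group flagged `break_parent`, while sibling and child groups stay free to make their own fit decision. The `Event` type here is a reduced, hypothetical version of the crate's layout events.

```rust
/// Reduced, hypothetical event type for illustrating break propagation.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Text(&'static str),
    HardLine,
    GroupStart { break_parent: bool },
    GroupEnd,
}

/// For each group, in the order its `GroupStart` appears, report whether
/// the group is forced to break. Uses an explicit stack, so no recursion.
pub fn forced_breaks(events: &[Event]) -> Vec<bool> {
    let mut forced: Vec<bool> = Vec::new(); // one entry per group
    let mut open: Vec<usize> = Vec::new(); // indices of currently open groups

    for event in events {
        match event {
            Event::GroupStart { break_parent } => {
                let index = forced.len();
                forced.push(false);
                if *break_parent {
                    // Every enclosing group must break; the child itself may still fit.
                    for &ancestor in &open {
                        forced[ancestor] = true;
                    }
                }
                open.push(index);
            }
            Event::HardLine => {
                // A hard line breaks every group that contains it.
                for &group in &open {
                    forced[group] = true;
                }
            }
            Event::GroupEnd => {
                open.pop();
            }
            Event::Text(_) => {}
        }
    }
    forced
}
```

In this sketch an outer group containing one `break_parent` child and one ordinary child comes back as `[true, false, false]`: the outer group breaks, and both inner groups remain independent.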

### ❌ Missing Features (Priority Order)

#### 1. **Group ID References** (Medium Priority)
**Issue**: Groups can't reference each other's break decisions.

```rust
// Missing: Conditional formatting based on other groups
GroupStart { id: Some("params") }
// ... later reference "params" group's break decision
```

**Implementation**:
- Track group break decisions by ID
- Add conditional breaking logic
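
A minimal sketch of the bookkeeping this would need, assuming the renderer records each named group's decision as it is made. `GroupDecisions` and its methods are hypothetical names, not part of the crate.

```rust
use std::collections::HashMap;

/// Hypothetical store for the break decisions of named groups.
#[derive(Debug, Default)]
pub struct GroupDecisions {
    broken: HashMap<String, bool>,
}

impl GroupDecisions {
    /// Record whether the group with this ID was rendered broken.
    pub fn record(&mut self, id: &str, broke: bool) {
        self.broken.insert(id.to_string(), broke);
    }

    /// `None` means the group has not been rendered yet (or has no such ID).
    pub fn did_break(&self, id: &str) -> Option<bool> {
        self.broken.get(id).copied()
    }

    /// Pick output depending on another group's decision.
    /// Assumption: an unknown group is treated as not broken.
    pub fn if_group_breaks<'a>(&self, id: &str, broken: &'a str, flat: &'a str) -> &'a str {
        if self.did_break(id).unwrap_or(false) {
            broken
        } else {
            flat
        }
    }
}
```

One open design question this leaves out is ordering: a reference to a group that is rendered later cannot be resolved in a single pass, so the real implementation may need a pre-pass or deferred output.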

#### 2. **Advanced Line Types** (Medium Priority)
**Issue**: `LineType::Soft` vs `LineType::SoftOrSpace` handling could be more sophisticated.

**Current behavior**:
- `Hard`: Always breaks
- `Soft`: Breaks if group breaks, disappears if inline
- `SoftOrSpace`: Breaks if group breaks, becomes space if inline

**Enhancement**: Better handling of soft line semantics in complex nesting.
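
The current behavior listed above can be written down as one small function. This is a sketch under the stated semantics only; `render_line` and its parameters are hypothetical, and indentation is assumed to be spaces.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LineType {
    Hard,
    Soft,
    SoftOrSpace,
}

/// Text contributed by a line event.
/// `group_breaks` is the decision already made for the enclosing group;
/// `indent` is the current indentation width in spaces.
pub fn render_line(line: LineType, group_breaks: bool, indent: usize) -> String {
    match (line, group_breaks) {
        // Hard lines always break; soft variants break when the group breaks.
        (LineType::Hard, _) | (_, true) => format!("\n{}", " ".repeat(indent)),
        // Inline: a soft line disappears.
        (LineType::Soft, false) => String::new(),
        // Inline: a soft-or-space line collapses to a single space.
        (LineType::SoftOrSpace, false) => " ".to_string(),
    }
}
```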

#### 3. **Performance Optimizations** (Low Priority)
- **Early bailout**: Stop single-line calculation when length exceeds limit
- **Caching**: Memoize group fit calculations for repeated structures
- **String building**: More efficient string concatenation
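
The early-bailout idea could look roughly like the following. `fits` is a hypothetical helper that measures the flat width of a group's pieces and stops as soon as the budget is exceeded, instead of measuring the whole group first.

```rust
/// Return whether the pieces fit into `remaining` columns when laid out flat.
/// Stops at the first piece that pushes the width over the limit.
pub fn fits(pieces: &[&str], remaining: usize) -> bool {
    let mut used = 0usize;
    for piece in pieces {
        // Count characters, not bytes, so non-ASCII identifiers are measured sensibly.
        used += piece.chars().count();
        if used > remaining {
            return false; // early bailout: no need to look at the rest
        }
    }
    true
}
```

For a long argument list this keeps the cost of a failed fit check proportional to the line limit rather than to the size of the group.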

## AST Node Coverage Status

### ✅ Implemented ToTokens
- `SelectStmt`: Basic SELECT with FROM clause
- `ResTarget`: Column targets with aliases
- `ColumnRef`: Column references (schema.table.column)
- `String`: String nodes holding the name parts of a column reference
- `RangeVar`: Table references with schema
- `FuncCall`: Function calls with break propagation support

### ❌ Missing ToTokens (Add as needed)
- `InsertStmt`, `UpdateStmt`, `DeleteStmt`: DML statements
- `WhereClause`, `JoinExpr`: WHERE conditions and JOINs
- `AExpr`: Binary/unary expressions (`a = b`, `a + b`)
- `AConst`: Literals (numbers, strings, booleans)
- `SubLink`: Subqueries
- `CaseExpr`: CASE expressions
- `WindowFunc`: Window functions
- `AggRef`: Aggregate functions
- `TypeCast`: Type casting (`::int`)
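
To give a feel for what adding one of these nodes involves, here is a sketch of a `ToTokens`-style implementation for a binary expression. Everything in it is a simplified assumption: `Expr` is a hypothetical tree rather than the real `pgt_query` node types, and events are pushed into a plain `Vec` instead of the crate's `EventEmitter`.

```rust
/// Reduced, hypothetical event type with string tokens.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Token(String),
    Space,
    SoftLineOrSpace,
    GroupStart,
    GroupEnd,
}

/// Local stand-in for the crate's `ToTokens` trait.
pub trait ToTokens {
    fn to_tokens(&self, out: &mut Vec<Event>);
}

/// Hypothetical expression tree; the real nodes come from the parser.
pub enum Expr {
    Column(String),
    Binary {
        left: Box<Expr>,
        op: String,
        right: Box<Expr>,
    },
}

impl ToTokens for Expr {
    fn to_tokens(&self, out: &mut Vec<Event>) {
        match self {
            Expr::Column(name) => out.push(Event::Token(name.clone())),
            Expr::Binary { left, op, right } => {
                // Wrap the expression in a group so it can break as a unit.
                out.push(Event::GroupStart);
                left.to_tokens(out);
                out.push(Event::Space);
                out.push(Event::Token(op.clone()));
                // Break after the operator if the group does not fit.
                out.push(Event::SoftLineOrSpace);
                right.to_tokens(out);
                out.push(Event::GroupEnd);
            }
        }
    }
}
```

Whether to break before or after the operator is a style decision this sketch makes arbitrarily.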

## Testing Infrastructure

### ✅ Current
- **dir-test integration**: Drop SQL files → automatic snapshot testing
- **Line length extraction**: `filename_80.sql` → `max_line_length: 80`
- **AST integrity verification**: Ensures no data loss during formatting
- **Location field handling**: Clears location differences for comparison
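
The filename convention can be parsed with the standard library alone. The helper below is a sketch, not the test suite's actual code; the function name and the fallback `default` parameter are assumptions.

```rust
use std::path::Path;

/// Extract a line length from a `name_<N>.sql` style path.
/// Falls back to `default` when there is no numeric suffix.
pub fn max_line_length_from_path(path: &str, default: usize) -> usize {
    Path::new(path)
        .file_stem() // "select_80.sql" -> "select_80"
        .and_then(|stem| stem.to_str())
        .and_then(|stem| stem.rsplit_once('_')) // -> ("select", "80")
        .and_then(|(_, suffix)| suffix.parse::<usize>().ok())
        .unwrap_or(default)
}
```

A name such as `func_call.sql` has an underscore but no numeric suffix, so it falls back to the default rather than failing.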

### 🔄 Enhancements Needed
- **Add more test cases**: Complex queries, edge cases
- **Performance benchmarks**: Large SQL file formatting speed
- **Configuration testing**: Different indent styles, line lengths
- **Break propagation testing**: Currently verified only through the `FuncCall` implementation

## Integration Steps

### ✅ Phase 1: Core Renderer Fixes (COMPLETED)
1. ✅ **Fix break propagation**: Implemented proper `break_parent` handling
2. ✅ **Fix nested groups**: Allow independent fit decisions
3. ✅ **Fix stack overflow**: Eliminated infinite recursion in renderer
4. ✅ **Test with complex cases**: Added `FuncCall` with break propagation test

### Phase 2: AST Coverage Expansion (2-4 days)
1. **Add WHERE clause support**: `WhereClause`, `AExpr` ToTokens
2. **Add basic expressions**: `AConst`, binary operators
3. **Add INSERT/UPDATE/DELETE**: Basic DML statements

### Phase 3: Advanced Features (1-2 days)
1. **Implement group ID system**: Cross-group references
2. **Add performance optimizations**: Early bailout, caching
3. **Enhanced line breaking**: Better soft line semantics

### Phase 4: Production Ready (1-2 days)
1. **Comprehensive testing**: Large SQL files, edge cases
2. **Performance validation**: Benchmark against alternatives
3. **Documentation**: API docs, integration examples

## API Integration Points

```rust
// Main formatting function
pub fn format_sql(sql: &str, config: RenderConfig) -> Result<String, Error> {
    let parsed = pgt_query::parse(sql)?;
    let ast = parsed.root()?;

    let mut emitter = EventEmitter::new();
    ast.to_tokens(&mut emitter);

    let mut output = String::new();
    let mut renderer = Renderer::new(&mut output, config);
    renderer.render(emitter.events)?;

    Ok(output)
}

// Configuration
pub struct RenderConfig {
    pub max_line_length: usize,    // 80, 100, 120, etc.
    pub indent_size: usize,        // 2, 4, etc.
    pub indent_style: IndentStyle, // Spaces, Tabs
}
```

## Estimated Completion Timeline

- ✅ **Phase 1** (Core fixes): COMPLETED → **Fully functional renderer**
- **Phase 2** (AST coverage): 2-4 days → **Supports most common SQL**
- **Phase 3** (Advanced): 1-2 days → **Production-grade formatting**
- **Phase 4** (Polish): 1-2 days → **Integration ready**

**Total: 4-8 days remaining (roughly one week)** for a complete, production-ready PostgreSQL pretty printer.

## Current Limitations

1. **Limited SQL coverage**: Only basic SELECT statements and function calls
2. **No error recovery**: Unimplemented AST nodes cause panics
3. **No configuration validation**: Invalid configs are not checked
4. **Missing group ID system**: Cross-group conditional formatting not yet implemented
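
For the configuration validation gap, one possible shape is a `validate` method that rejects obviously unusable settings before rendering. The struct mirrors the `RenderConfig` fields shown earlier, but the error type and the specific rules are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IndentStyle {
    Spaces,
    Tabs,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RenderConfig {
    pub max_line_length: usize,
    pub indent_size: usize,
    pub indent_style: IndentStyle,
}

/// Hypothetical error type; the rules below are illustrative, not exhaustive.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    ZeroLineLength,
    ZeroIndentSize,
    IndentExceedsLineLength,
}

impl RenderConfig {
    pub fn validate(&self) -> Result<(), ConfigError> {
        if self.max_line_length == 0 {
            return Err(ConfigError::ZeroLineLength);
        }
        // With space indentation, a zero indent would make nesting invisible.
        if self.indent_style == IndentStyle::Spaces && self.indent_size == 0 {
            return Err(ConfigError::ZeroIndentSize);
        }
        // A single indent level must leave room for content.
        if self.indent_size >= self.max_line_length {
            return Err(ConfigError::IndentExceedsLineLength);
        }
        Ok(())
    }
}
```

Calling `validate` at the start of formatting would turn a misconfiguration into an error value instead of surprising output.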

The core renderer foundation is now solid, with proper break propagation and nested group handling. The remaining work is primarily expanding AST node coverage.
1 change: 1 addition & 0 deletions crates/pgt_pretty_print/src/codegen/mod.rs
@@ -0,0 +1 @@
pub mod token_kind;
1 change: 1 addition & 0 deletions crates/pgt_pretty_print/src/codegen/token_kind.rs
@@ -0,0 +1 @@
pgt_pretty_print_codegen::token_kind_codegen!();
68 changes: 68 additions & 0 deletions crates/pgt_pretty_print/src/emitter.rs
@@ -0,0 +1,68 @@
pub use crate::codegen::token_kind::TokenKind;

#[derive(Debug, Clone, PartialEq)]
pub enum LineType {
    /// Must break (semicolon, etc.)
    Hard,
    /// Break if group doesn't fit
    Soft,
    /// Break if group doesn't fit, but collapse to space if it does
    SoftOrSpace,
}

/// A single layout instruction produced while walking the AST.
#[derive(Debug, Clone, PartialEq)]
pub enum LayoutEvent {
    Token(TokenKind),
    Space,
    Line(LineType),
    GroupStart {
        /// Optional identifier for the group.
        id: Option<String>,
        /// When true, forces the enclosing groups to break.
        break_parent: bool,
    },
    GroupEnd,
    IndentStart,
    IndentEnd,
}

/// Collects layout events in the order they are emitted.
pub struct EventEmitter {
    pub events: Vec<LayoutEvent>,
}

impl EventEmitter {
    pub fn new() -> Self {
        Self { events: Vec::new() }
    }

    pub fn token(&mut self, token: TokenKind) {
        self.events.push(LayoutEvent::Token(token));
    }

    pub fn space(&mut self) {
        self.events.push(LayoutEvent::Space);
    }

    pub fn line(&mut self, line_type: LineType) {
        self.events.push(LayoutEvent::Line(line_type));
    }

    pub fn group_start(&mut self, id: Option<String>, break_parent: bool) {
        self.events
            .push(LayoutEvent::GroupStart { id, break_parent });
    }

    pub fn group_end(&mut self) {
        self.events.push(LayoutEvent::GroupEnd);
    }

    pub fn indent_start(&mut self) {
        self.events.push(LayoutEvent::IndentStart);
    }

    pub fn indent_end(&mut self) {
        self.events.push(LayoutEvent::IndentEnd);
    }
}

/// Implemented by AST nodes that can emit layout events for themselves.
pub trait ToTokens {
    fn to_tokens(&self, emitter: &mut EventEmitter);
}
6 changes: 6 additions & 0 deletions crates/pgt_pretty_print/src/lib.rs
@@ -0,0 +1,6 @@
mod codegen;
pub mod emitter;
mod nodes;
pub mod renderer;

pub use crate::codegen::token_kind::TokenKind;