Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update comments / docs for Spanned #1549

Merged
merged 4 commits into from
Nov 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 12 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -100,13 +100,18 @@ similar semantics are represented with the same AST. We welcome PRs to fix such
issues and distinguish different syntaxes in the AST.


## WIP: Extracting source locations from AST nodes
## Source Locations (Work in Progress)

This crate allows recovering source locations from AST nodes via the [Spanned](https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html) trait, which can be used for advanced diagnostics tooling. Note that this feature is a work in progress and many nodes report missing or inaccurate spans. Please see [this document](./docs/source_spans.md#source-span-contributing-guidelines) for information on how to contribute missing improvements.
This crate allows recovering source locations from AST nodes via the [Spanned]
trait, which can be used for advanced diagnostics tooling. Note that this
feature is a work in progress and many nodes report missing or inaccurate spans.
Please see [this ticket] for information on how to contribute missing
improvements.

```rust
use sqlparser::ast::Spanned;
[Spanned]: https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html
[this ticket]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548

```rust
// Parse SQL
let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();

Expand All @@ -123,9 +128,9 @@ SQL was first standardized in 1987, and revisions of the standard have been
published regularly since. Most revisions have added significant new features to
the language, and as a result no database claims to support the full breadth of
features. This parser currently supports most of the SQL-92 syntax, plus some
syntax from newer versions that have been explicitly requested, plus some MSSQL,
PostgreSQL, and other dialect-specific syntax. Whenever possible, the [online
SQL:2016 grammar][sql-2016-grammar] is used to guide what syntax to accept.
syntax from newer versions that have been explicitly requested, plus various
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive by clarification

other dialect-specific syntax. Whenever possible, the [online SQL:2016
grammar][sql-2016-grammar] is used to guide what syntax to accept.

Unfortunately, stating anything more specific about compliance is difficult.
There is no publicly available test suite that can assess compliance
Expand Down
52 changes: 0 additions & 52 deletions docs/source_spans.md

This file was deleted.

64 changes: 59 additions & 5 deletions src/ast/helpers/attached_token.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,25 +19,73 @@ use core::cmp::{Eq, Ord, Ordering, PartialEq, PartialOrd};
use core::fmt::{self, Debug, Formatter};
use core::hash::{Hash, Hasher};

use crate::tokenizer::{Token, TokenWithSpan};
use crate::tokenizer::TokenWithSpan;

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

#[cfg(feature = "visitor")]
use sqlparser_derive::{Visit, VisitMut};

/// A wrapper type for attaching tokens to AST nodes that should be ignored in comparisons and hashing.
/// This should be used when a token is not relevant for semantics, but is still needed for
/// accurate source location tracking.
/// A wrapper over [`TokenWithSpan`]s that ignores the token and source
/// location in comparisons and hashing.
///
/// This type is used when the token and location is not relevant for semantics,
/// but is still needed for accurate source location tracking, for example, in
/// the nodes in the [ast](crate::ast) module.
///
/// Note: **All** `AttachedTokens` are equal.
///
/// # Examples
///
/// Same token, different location are equal
/// ```
/// # use sqlparser::ast::helpers::attached_token::AttachedToken;
/// # use sqlparser::tokenizer::{Location, Span, Token, TokenWithLocation};
/// // commas @ line 1, column 10
/// let tok1 = TokenWithLocation::new(
/// Token::Comma,
/// Span::new(Location::new(1, 10), Location::new(1, 11)),
/// );
/// // commas @ line 2, column 20
/// let tok2 = TokenWithLocation::new(
/// Token::Comma,
/// Span::new(Location::new(2, 20), Location::new(2, 21)),
/// );
///
/// assert_ne!(tok1, tok2); // token with locations are *not* equal
/// assert_eq!(AttachedToken(tok1), AttachedToken(tok2)); // attached tokens are
/// ```
///
/// Different token, different location are equal 🤯
///
/// ```
/// # use sqlparser::ast::helpers::attached_token::AttachedToken;
/// # use sqlparser::tokenizer::{Location, Span, Token, TokenWithLocation};
/// // commas @ line 1, column 10
/// let tok1 = TokenWithLocation::new(
/// Token::Comma,
/// Span::new(Location::new(1, 10), Location::new(1, 11)),
/// );
/// // period @ line 2, column 20
/// let tok2 = TokenWithLocation::new(
/// Token::Period,
/// Span::new(Location::new(2, 10), Location::new(2, 21)),
/// );
///
/// assert_ne!(tok1, tok2); // token with locations are *not* equal
/// assert_eq!(AttachedToken(tok1), AttachedToken(tok2)); // attached tokens are
/// ```
/// // period @ line 2, column 20
#[derive(Clone)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct AttachedToken(pub TokenWithSpan);

impl AttachedToken {
/// Return a new Empty AttachedToken
pub fn empty() -> Self {
AttachedToken(TokenWithSpan::wrap(Token::EOF))
AttachedToken(TokenWithSpan::new_eof())
}
}

Expand Down Expand Up @@ -80,3 +128,9 @@ impl From<TokenWithSpan> for AttachedToken {
AttachedToken(value)
}
}

impl From<AttachedToken> for TokenWithSpan {
fn from(value: AttachedToken) -> Self {
value.0
}
}
16 changes: 14 additions & 2 deletions src/ast/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -596,9 +596,21 @@ pub enum CeilFloorKind {

/// An SQL expression of any type.
///
/// # Semantics / Type Checking
///
/// The parser does not distinguish between expressions of different types
/// (e.g. boolean vs string), so the caller must handle expressions of
/// inappropriate type, like `WHERE 1` or `SELECT 1=1`, as necessary.
/// (e.g. boolean vs string). The caller is responsible for detecting and
/// validating types as necessary (for example `WHERE 1` vs `SELECT 1=1`)
/// See the [README.md] for more details.
///
/// [README.md]: https://github.com/apache/datafusion-sqlparser-rs/blob/main/README.md#syntax-vs-semantics
///
/// # Equality and Hashing Does not Include Source Locations
///
/// The `Expr` type implements `PartialEq` and `Eq` based on the semantic value
/// of the expression (not bitwise comparison). This means that `Expr` instances
/// that are semantically equivalent but have different spans (locations in the
/// source tree) will compare as equal.
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(
Expand Down
5 changes: 3 additions & 2 deletions src/ast/query.rs
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,7 @@ impl fmt::Display for Table {
pub struct Select {
/// Token for the `SELECT` keyword
pub select_token: AttachedToken,
/// `SELECT [DISTINCT] ...`
pub distinct: Option<Distinct>,
/// MSSQL syntax: `TOP (<N>) [ PERCENT ] [ WITH TIES ]`
pub top: Option<Top>,
Expand Down Expand Up @@ -511,7 +512,7 @@ impl fmt::Display for NamedWindowDefinition {
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct With {
// Token for the "WITH" keyword
/// Token for the "WITH" keyword
pub with_token: AttachedToken,
pub recursive: bool,
pub cte_tables: Vec<Cte>,
Expand Down Expand Up @@ -564,7 +565,7 @@ pub struct Cte {
pub query: Box<Query>,
pub from: Option<Ident>,
pub materialized: Option<CteAsMaterialized>,
// Token for the closing parenthesis
/// Token for the closing parenthesis
pub closing_paren_token: AttachedToken,
}

Expand Down
50 changes: 40 additions & 10 deletions src/ast/spans.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,21 +21,51 @@ use super::{

/// Given an iterator of spans, return the [Span::union] of all spans.
fn union_spans<I: Iterator<Item = Span>>(iter: I) -> Span {
iter.reduce(|acc, item| acc.union(&item))
.unwrap_or(Span::empty())
Span::union_iter(iter)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was useful enough I felt making it a method in Span with more documentation and examples would make it easier to find

}

/// A trait for AST nodes that have a source span for use in diagnostics.
/// Trait for AST nodes that have a source location information.
///
/// Source spans are not guaranteed to be entirely accurate. They may
/// be missing keywords or other tokens. Some nodes may not have a computable
/// span at all, in which case they return [`Span::empty()`].
/// # Notes:
///
/// Source [`Span`] are not yet complete. They may be missing:
///
/// 1. keywords or other tokens
/// 2. span information entirely, in which case they return [`Span::empty()`].
///
/// Note Some impl blocks (rendered below) are annotated with which nodes are
/// missing spans. See [this ticket] for additional information and status.
///
/// [this ticket]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
///
/// # Example
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt an example will help people understand how to use the Spans

/// ```
/// # use sqlparser::parser::{Parser, ParserError};
/// # use sqlparser::ast::Spanned;
/// # use sqlparser::dialect::GenericDialect;
/// # use sqlparser::tokenizer::Location;
/// # fn main() -> Result<(), ParserError> {
/// let dialect = GenericDialect {};
/// let sql = r#"SELECT *
/// FROM table_1"#;
/// let statements = Parser::new(&dialect)
/// .try_with_sql(sql)?
/// .parse_statements()?;
/// // Get the span of the first statement (SELECT)
/// let span = statements[0].span();
/// // statement starts at line 1, column 1 (1 based, not 0 based)
/// assert_eq!(span.start, Location::new(1, 1));
/// // statement ends on line 2, column 15
/// assert_eq!(span.end, Location::new(2, 15));
/// # Ok(())
/// # }
/// ```
///
/// Some impl blocks may contain doc comments with information
/// on which nodes are missing spans.
pub trait Spanned {
/// Compute the source span for this AST node, by recursively
/// combining the spans of its children.
/// Return the [`Span`] (the minimum and maximum [`Location`]) for this AST
/// node, by recursively combining the spans of its children.
///
/// [`Location`]: crate::tokenizer::Location
fn span(&self) -> Span;
}

Expand Down
59 changes: 58 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@
//! 1. [`Parser::parse_sql`] and [`Parser::new`] for the Parsing API
//! 2. [`ast`] for the AST structure
//! 3. [`Dialect`] for supported SQL dialects
//! 4. [`Spanned`] for source text locations (see "Source Spans" below for details)
//!
//! [`Spanned`]: ast::Spanned
//!
//! # Example parsing SQL text
//!
Expand Down Expand Up @@ -61,13 +64,67 @@
//! // The original SQL text can be generated from the AST
//! assert_eq!(ast[0].to_string(), sql);
//! ```
//!
//! [sqlparser crates.io page]: https://crates.io/crates/sqlparser
//! [`Parser::parse_sql`]: crate::parser::Parser::parse_sql
//! [`Parser::new`]: crate::parser::Parser::new
//! [`AST`]: crate::ast
//! [`ast`]: crate::ast
//! [`Dialect`]: crate::dialect::Dialect
//!
//! # Source Spans
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This content is largely migrated from docs/source_spans.md

//!
//! Starting with version `0.53.0` sqlparser introduced source spans to the
//! AST. This feature provides source information for syntax errors, enabling
//! better error messages. See [issue #1548] for more information and the
//! [`Spanned`] trait to access the spans.
//!
//! [issue #1548]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
//! [`Spanned`]: ast::Spanned
//!
//! ## Migration Guide
//!
//! For the next few releases, we will be incrementally adding source spans to the
//! AST nodes, trying to minimize the impact on existing users. Some breaking
//! changes are inevitable, and the following is a summary of the changes:
//!
//! #### New fields for spans (must be added to any existing pattern matches)
//!
//! The primary change is that new fields will be added to AST nodes to store the source `Span` or `TokenWithLocation`.
//!
//! This will require
//! 1. Adding new fields to existing pattern matches.
//! 2. Filling in the proper span information when constructing AST nodes.
//!
//! For example, since `Ident` now stores a `Span`, to construct an `Ident` you
//! must provide now provide one:
//!
//! Previously:
//! ```text
//! # use sqlparser::ast::Ident;
//! Ident {
//! value: "name".into(),
//! quote_style: None,
//! }
//! ```
//! Now
//! ```rust
//! # use sqlparser::ast::Ident;
//! # use sqlparser::tokenizer::Span;
//! Ident {
//! value: "name".into(),
//! quote_style: None,
//! span: Span::empty(),
//! };
//! ```
//!
//! Similarly, when pattern matching on `Ident`, you must now account for the
//! `span` field.
//!
//! #### Misc.
//! - [`TokenWithLocation`] stores a full `Span`, rather than just a source location.
//! Users relying on `token.location` should use `token.location.start` instead.
//!
//![`TokenWithLocation`]: tokenizer::TokenWithLocation

#![cfg_attr(not(feature = "std"), no_std)]
#![allow(clippy::upper_case_acronyms)]
Expand Down
Loading