sql: add a SQL IR and factor out optimizations. #80

Merged 3 commits on Apr 27, 2023

jacksonrnewhouse
Contributor

This introduces arroyo-sql/src/plan_graph.rs, a new intermediate representation for our SQL planning. Prior to this change we converted DataFusion's logical plan to a SqlOperator, and the SqlOperator then produced an arroyo_datastream::Program. Because the SqlOperator is not in a graph, it was challenging to perform optimization passes, so some optimizations were applied in-line during the conversion from LogicalPlan to SqlOperator. As we added more logic like this it was getting unwieldy. The new sequence for compiling SQL to a pipeline is:

  • Parse the request using DataFusion's PostgreSqlDialect.
  • Have DataFusion translate this to a logical plan.
  • Run DataFusion optimization over the logical plan.
  • Convert the logical plan to a SqlOperator.
  • Convert the SqlOperator to a PlanGraph.
  • Run arroyo-sql optimizations over PlanGraph.
  • Convert to arroyo_datastream::Program.
  • Run arroyo-api optimizations over that program.
  • Produce the crate for the pipeline.
  • Compile the crate.

It's a lot of steps, but firming up what happens where should allow us to continue to improve our engine.
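To illustrate why holding operators in a graph makes optimization passes easier to write, here is a minimal sketch of a pass that fuses adjacent record-level operators. Every name in it (`PlanOp`, `PlanGraph`, `fuse_once`, `example_graph`) is a hypothetical stand-in invented for this example; it is not the contents of arroyo-sql/src/plan_graph.rs, and the fusion rule is only one plausible optimization.

```rust
// Hypothetical, simplified stand-ins; NOT the types in arroyo-sql/src/plan_graph.rs.
#[derive(Debug, Clone, PartialEq)]
enum PlanOp {
    Source(String),
    Map(String),        // record-level expression, kept as text for the sketch
    Filter(String),     // record-level predicate, kept as text for the sketch
    Sink(String),
    Fused(Vec<PlanOp>), // several record-level operators merged into one node
}

#[derive(Debug, Default)]
struct PlanGraph {
    nodes: Vec<Option<PlanOp>>, // None marks a node removed by a pass
    edges: Vec<(usize, usize)>, // (from, to) node indices
}

impl PlanGraph {
    fn add_node(&mut self, op: PlanOp) -> usize {
        self.nodes.push(Some(op));
        self.nodes.len() - 1
    }

    fn add_edge(&mut self, from: usize, to: usize) {
        self.edges.push((from, to));
    }

    fn is_record_level(op: &PlanOp) -> bool {
        matches!(op, PlanOp::Map(_) | PlanOp::Filter(_) | PlanOp::Fused(_))
    }

    /// One step of the pass: find an edge a -> b where both ends are
    /// record-level, a has a single outgoing edge and b a single incoming
    /// edge, then merge b into a. Returns true if the graph changed.
    fn fuse_once(&mut self) -> bool {
        let candidate = self.edges.iter().copied().find(|&(a, b)| {
            let out_a = self.edges.iter().filter(|e| e.0 == a).count();
            let in_b = self.edges.iter().filter(|e| e.1 == b).count();
            out_a == 1
                && in_b == 1
                && self.nodes[a].as_ref().map_or(false, Self::is_record_level)
                && self.nodes[b].as_ref().map_or(false, Self::is_record_level)
        });
        let Some((a, b)) = candidate else {
            return false;
        };
        let first = self.nodes[a].take().unwrap();
        let second = self.nodes[b].take().unwrap();
        let mut ops = match first {
            PlanOp::Fused(v) => v,
            op => vec![op],
        };
        match second {
            PlanOp::Fused(v) => ops.extend(v),
            op => ops.push(op),
        }
        self.nodes[a] = Some(PlanOp::Fused(ops));
        // Drop the fused edge and re-route b's outgoing edges so they leave from a.
        self.edges.retain(|&e| e != (a, b));
        for e in self.edges.iter_mut() {
            if e.0 == b {
                e.0 = a;
            }
        }
        true
    }

    /// Run the pass until it reaches a fixed point.
    fn optimize(&mut self) {
        while self.fuse_once() {}
    }

    fn live_nodes(&self) -> Vec<&PlanOp> {
        self.nodes.iter().flatten().collect()
    }
}

/// Builds source -> filter -> map -> sink (operator contents are made up).
fn example_graph() -> PlanGraph {
    let mut g = PlanGraph::default();
    let src = g.add_node(PlanOp::Source("nexmark".into()));
    let filter = g.add_node(PlanOp::Filter("bid is not null".into()));
    let map = g.add_node(PlanOp::Map("bid.auction".into()));
    let sink = g.add_node(PlanOp::Sink("output".into()));
    g.add_edge(src, filter);
    g.add_edge(filter, map);
    g.add_edge(map, sink);
    g
}
```

Calling `optimize()` on `example_graph()` merges the filter and map into a single `Fused` node and leaves the source and sink untouched. The pass only inspects nodes and edges, so it can run as a separate step after conversion instead of being interleaved with the conversion itself.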

jacksonrnewhouse requested a review from mwylde on April 26, 2023, 17:25
@mwylde
Member

mwylde commented Apr 26, 2023

Getting a compilation error for this query:

WITH bids as (SELECT bid.auction as auction, bid.datetime as datetime
        FROM (select bid from nexmark) where bid is not null)
        SELECT AuctionBids.auction as auction, AuctionBids.num as count
        FROM (
          SELECT
            B1.auction,
            HOP(INTERVAL '2' SECOND, INTERVAL '10' SECOND) as window,
            count(*) AS num
          FROM bids B1
          GROUP BY 1,2) AS AuctionBids
        JOIN (
          SELECT max(num) AS maxn, window
          FROM (
            SELECT count(*) AS num, HOP(INTERVAL '2' SECOND, INTERVAL '10' SECOND) AS window
            FROM bids B2
            GROUP BY B2.auction, 2
        ) AS CountBids
          GROUP BY 2
        ) AS MaxBids
        ON
           AuctionBids.num = MaxBids.maxn
           and AuctionBids.window = MaxBids.window;

@mwylde
Member

mwylde commented Apr 26, 2023

This query compiles (interestingly enough, it fails on master) but doesn't produce any output:

SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY window 
        ORDER BY bids DESC) as row_num   
    FROM (SELECT count(*) as bids,
        bid.auction as auction_id,
        hop(interval '2 seconds', interval '1 minute') as window
            FROM nexmark
            WHERE bid is not null
            group by auction_id, window)) WHERE row_num <= 3

@jacksonrnewhouse
Contributor Author

Looking into those.

@jacksonrnewhouse
Contributor Author

This is working now.

filter: None,
}) => {
if args.len() != 1 {
bail!("unexpected arg length");
Member

is this an error that would be sent back to the user? or does it imply a bug in our code? if it's the latter, then it should probably be a panic instead.
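The distinction raised in this comment can be sketched as follows. This is a minimal illustration using only the standard library; `PlanError`, `single_arg`, and `single_arg_checked` are hypothetical names invented for the example, and whether the argument count in the reviewed code is user-controlled is exactly the open question, so neither variant is claimed to be the right one for this PR.

```rust
use std::fmt;

/// Hypothetical error type for problems caused by the user's query.
#[derive(Debug, PartialEq)]
struct PlanError(String);

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for PlanError {}

/// User-facing case: the argument count comes straight from the SQL text,
/// so a mismatch is reported back to the caller as an error value.
fn single_arg<'a>(fn_name: &str, args: &'a [String]) -> Result<&'a str, PlanError> {
    match args {
        [only] => Ok(only.as_str()),
        _ => Err(PlanError(format!(
            "{fn_name} expects exactly 1 argument, got {}",
            args.len()
        ))),
    }
}

/// Internal-invariant case: an earlier stage is ASSUMED to have validated
/// the arity, so a mismatch here can only mean a planner bug and panics.
fn single_arg_checked(args: &[String]) -> &str {
    match args {
        [only] => only.as_str(),
        _ => unreachable!(
            "planner bug: arity was validated earlier, got {} args",
            args.len()
        ),
    }
}
```

With the first form a malformed query yields an `Err` that can be surfaced to the user; with the second a violated invariant fails loudly at the point where the assumption broke.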

},
JoinListFlatten(JoinType, StructPair),
JoinPairFlatten(JoinType, StructPair),
// TODO: figure out naming of various things called 'window'
Member

SqlWindowFunction?
