Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON version of display_indent() #2889

Closed
avantgardnerio opened this issue Jul 12, 2022 · 2 comments · Fixed by #2892
Closed

JSON version of display_indent() #2889

avantgardnerio opened this issue Jul 12, 2022 · 2 comments · Fixed by #2892
Labels
enhancement New feature or request

Comments

@avantgardnerio
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

A significant portion of my time on #2885 was spent manually formatting subqueries like:

Projection: #SUM(lineitem.l_extendedprice) / Float64(7) AS avg_yearly
  Aggregate: groupBy=[[]], aggr=[[SUM(#lineitem.l_extendedprice)]]
    Filter: #part.p_brand = Utf8("Brand#23") AND #part.p_container = Utf8("MED BOX") AND #lineitem.l_quantity < (Subquery: Projection: Float64(0.2) * #AVG(lineitem.l_quantity)
  Aggregate: groupBy=[[]], aggr=[[AVG(#lineitem.l_quantity)]]
    Filter: #lineitem.l_partkey = #part.p_partkey
      TableScan: lineitem)
      Inner Join: #lineitem.l_partkey = #part.p_partkey
        TableScan: lineitem
        TableScan: part

into

Projection: #SUM(lineitem.l_extendedprice) / Float64(7) AS avg_yearly
  Aggregate: groupBy=[[]], aggr=[[SUM(#lineitem.l_extendedprice)]]
    Filter: #part.p_brand = Utf8("Brand#23") AND #part.p_container = Utf8("MED BOX") 
        AND #lineitem.l_quantity < (
            Subquery: Projection: Float64(0.2) * #AVG(lineitem.l_quantity)
                Aggregate: groupBy=[[]], aggr=[[AVG(#lineitem.l_quantity)]]
                    Filter: #lineitem.l_partkey = #part.p_partkey
                        TableScan: lineitem
        )
      Inner Join: #lineitem.l_partkey = #part.p_partkey
        TableScan: lineitem
        TableScan: part

because the indents aren't trustworthy when subqueries are taken into account.

Also, I've had a hard time writing unit tests for optimizers because building them programmatically is verbose and error-prone.

Describe the solution you'd like

If we were to create a visitor method like display_json(), and an equivalent LogicalPlan::from_json() then both problems would be solved fairly elegantly. As a bonus, plans would be readable and manipulable by common off the the shelf tools and readable by the uninitiated.

Describe alternatives you've considered

Fixing the indentation on the formatter, writing different types of tests, crying into my beer.

@avantgardnerio avantgardnerio added the enhancement New feature or request label Jul 12, 2022
@avantgardnerio
Copy link
Contributor Author

As a potential implementation option, we could use something like https://docs.serde.rs/serde_json/ with our existing visitor pattern. I know some folks are reticent to add dependencies though, so I thought I would open this up for discussion. I'm after the ends, the means I'm flexible on.

tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jul 13, 2022
tustvold added a commit that referenced this issue Jul 17, 2022
* Add optional serde support to datafusion-proto (#2889)

* Add public methods for JSON serde (#64)

* Misc suggestions

* Update datafusion/proto/Cargo.toml

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* Fixes

* Fixup Cargo.toml

* Format Cargo.toml

Co-authored-by: Brent Gardner <bgardner@squarelabs.net>
@avantgardnerio
Copy link
Contributor Author

Thank you!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant