Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds parsing and modeling for INSERT INTO <table name> #1621

Draft
wants to merge 2 commits into
base: prep-v0_14_10
Choose a base branch
from

Conversation

johnedquinn
Copy link
Member

Relevant Issues

Description

  • Adds parsing and modeling for INSERT INTO <table name>
  • Updates release version to 0.14.10 and updates README/CHANGELOG.

Low Level Details

  • ANTLR
    • Adds <table name> and <identifier chain> rules to PartiQLParser.g4, which largely matches the SQL:1999 EBNF.
  • PIG AST
    • Adds table_name and identifier_chain rules to partiql.ion.
    • table_name really just wraps an identifier_chain.
    • insert's target is now a table_name -- not an expression (which realistically was always expr.id, which isn't right because it held a scope qualifier (AKA @).
    • I didn't update insert_value, remove, or set -- since it's not part of the feature request. However, we might want to consider updating these in the future.
  • PIG LOGICAL
    • Modifies dml_target to hold a table_name -- no longer holds an unqualified identifier.
    • This has direct implications on dml_insert and dml_update and indirect implications on UPSERT/REPLACE since these are modeled using insert.
  • LOGICAL TO LOGICAL RESOLVED
    • I had to update the GlobalVariableResolver to allow for the resolution of qualified identifiers, also resulting in a BindingId, which is essentially a qualified BindingName.
    • I also needed to add a new GlobalResolutionResult that allows for namespaced names. See the Javadocs there.

Why in draft?

  • I'm waiting for some preliminary feedback before I write more comprehensive tests for this.

Other Information

  • Updated Unreleased Section in CHANGELOG: YES

  • Any backward-incompatible changes? [YES/NO]

    • < If YES, why? >
    • < For this purpose, we define backward-incompatible changes as changes that—when consumed—can potentially result in
      errors for users that are using our public APIs or the entities that have public visibility in our code-base. >
  • Any new external dependencies? [YES/NO]

    • < If YES, which ones and why? >
    • < In addition, please also mention any other alternatives you've considered and the reason they've been discarded >
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES/NO]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@johnedquinn johnedquinn requested a review from dlurton October 22, 2024 17:11
Copy link
Member

@dlurton dlurton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my I do

partiql-ast/src/main/pig/partiql.ion Outdated Show resolved Hide resolved
@@ -165,9 +202,26 @@ removeCommand
insertCommandReturning
: INSERT INTO pathSimple VALUE value=expr ( AT pos=expr )? onConflictLegacy? returningClause?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be a reason you're leaving pathSimple here instead of changing it to tableName like the other DML statements? Not that I need this for my product...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limited scope, since this change is for your use-case.

partiql-ast/src/main/pig/partiql.ion Outdated Show resolved Hide resolved
Comment on lines +27 to +33
/**
* When attempting to resolve a qualified identifier, say `FROM cat1.schema1.table1.attr1.attr2`, the [GlobalVariableResolver]
* _may_ resolve only the first few steps of the identifier, say `cat1.schema1.table1`. Therefore, it must
* return the resolved [uniqueId] (say `cat1:::schema1:::table1`) and the remaining steps `attr1.attr2` in order for the planner
* to convert these to path expressions.
*/
abstract fun getRemainingSteps(): List<BindingName>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this might be a little too forward thinking... i.e. trying to design for a future we know little about. Or is there more to this than I am aware of?

In my case, all parts of a table identifier will be completely resolved or we will abort w/undefined table error. This might make sense in the event that we have only part of the schema... but that is not in my foreseeable future. I suggest dropping this.

If this is dropped, I don't think we need a distinction between a GlobalVariable and a NamespacedVariable either... GlobalVariable will still be fine for our case... Even though we're kind of using it in a way not entirely consistent with the name. Maybe, "ResolvedVariable" is a better name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again about this... was the intention to handle the scenario when some of the parts of the qualified identifier could be resolved but others not? I guess I'm not 100% sure I understand the intent here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again about this... was the intention to handle the scenario when some of the parts of the qualified identifier could be resolved but others not?

Yes, that is the intention. In the INSERT INTO <table name> use-case, you wouldn't allow remainingSteps to have anything. But, in a query such as the one mentioned in the Javadocs, it is useful to know what is actually a global vs what is a path expression. This allows for the engine-implementer to enforce the accuracy of the path expression and for the global-variable-resolver implementer to only take care of returning "global" variables.

This is something that has been added to v1.

@@ -56,6 +72,18 @@ fun interface GlobalVariableResolver {
*/
fun resolveGlobal(bindingName: BindingName): GlobalResolutionResult
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can drop this--it is redundant when you have resolveGlobal(BindingId). Or, more accurately, drop this function and move its documentation (with modification) to the other overload. Then, remove the default implementation of resolveGlobal(BindingId) so we're forced to implement it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backwards compatibility purposes, when adding new APIs, we've been adding default implementations that "may" utilize existing implemented methods, so that users who don't want to update their code don't have to. That being said, you're the only customer here, so it's up to you.

@@ -17,6 +18,21 @@ sealed class GlobalResolutionResult {
*/
data class GlobalVariable(val uniqueId: String) : GlobalResolutionResult()
Copy link
Member

@dlurton dlurton Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change I've been wanting to make forever is to make uniqueId here an IonElement so we can put arbitrary Ion data here instead of having to "abuse" this field by putting Ion-text here instead. I'll leave a note in the main PIG domain where we need to change this as well.... EDIT: github won't let me comment there--to do this we would also need to change partiql_logical_resolved.expr.global_id.unique_id to the ion type.

Since we're breaking this API anyway, now might be a good time to do that as well...

Updates products to records in PIG

Updates naming of root/parts/qualifier

Simplifies BindingId and moves to separate file

Simplifies modeling of identifier chain
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants