Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL-chunking algorithm may treat invalid SQL as a comment #14

Open
koyae opened this issue May 17, 2016 · 7 comments
Open

SQL-chunking algorithm may treat invalid SQL as a comment #14

koyae opened this issue May 17, 2016 · 7 comments

Comments

@koyae
Copy link

koyae commented May 17, 2016

At least on my own system, the following passes with exit-code 0:

SELECT '
-- this is not really a comment' AS c;
SELECT '
-- neither is this' AS c spam;

Despite the fact the above statements should actually fail with something like:

psql:shouldfail3.sql:4: ERROR:  syntax error at or near "spam"
LINE 2: -- neither is this' AS c spam;

No error is emitted because the statement is incorrectly prepared as:

EXEC SQL SELECT '
-- this is not really a comment' AS c;
SELECT '
-- neither is this' AS c spam;

I presume that ECPG just ignores everything after the first semicolon above, and therefore the invalid SQL in this example is simply not encountered.

I discovered this when I was trying to understand how the chunking algorithm did its thing, having wanted to grok it before trying to implement the ability to step over psql-commands.

More realistic instances similar to the above scenario likely exist under use of $$.

Short of fully integrating a tokenizer to get around this, I could probably just up the rock-paper-scissors game that's already going on in prepare_sql() with the if not in_block_comment business. Doing so should allow psql-stuff to be handled more readily.

What's your take?

@markdrago
Copy link
Owner

It's funny - when I was reading this code while reviewing your earlier PR I was kind of surprised that there was no quote token in the list of tokens. I guess I just shrugged it off, but that seems to be the issue here.

If you're willing to do the rock-paper-scissor upgrade then I say go for it. At some point this whole thing should get some attention, but it can probably stand up to a little bit more complexity before it needs to be gutted. It's your call.

@koyae
Copy link
Author

koyae commented May 18, 2016

Here's the blueprint. This burns down any evil tricks I can come up with.
pgsanity state diagram
I'll probably do a naïve implementation first (just reading one character at a time). If that stands up to functional testing but runs sluggishly, we can bring regexes and token-hopping into the equation similar to how you had it already.

@markdrago
Copy link
Owner

Nice diagram! This sounds good but I'm curious about a few things, some of which may be due to me misunderstanding the diagram so feel free to correct me.

Does it handle the following:

  • a block comment coming to an end with */
  • quotes that may appear in a comment //here's a comment with a quote
  • within double dollar signs I'm pretty sure anything goes as you could be putting non-SQL in there

Also, the goal of the original sql_prep is to replace certain parts of the SQL with their C equivalents (mostly comments) - is that represented here? How is that being handled?

@koyae
Copy link
Author

koyae commented May 19, 2016

To address your points:

  • Yes. C-like block comments started with /* do indeed come to an end with */, sorry; I had a typo where I'd notated it as both starting _and_ stopping with /*. I've edited my earlier comment to update the graphic. (_NOTE:_ for the moment, we'll be ignoring the psql -related stuff until we have the essentials down.)
  • I tried commenting with // against a Postgres database and it doesn't appear to be treated specially as anything other than a syntax error. I assume you meant -- here's a comment with a quote. If that's the case, we can handle that just fine; note that the only way for the machine's state to transition away from the ---island is to encounter \n; if other characters are encountered in the meantime, they will be consumed, but the state will not change. *
  • Yep. The only thing that breaks you out of $$ is another $$. The $$-island will _mostly_ just help the chunking loop identify semicolons which don't actually represent the end of an SQL-statement. The states for -- and /* have the same purpose, as do those for ' and " although the transitions to-and-from them are slightly more complicated since they are escape-characters depending on context. *

* I have added the self-transitions which should help to illustrate the behavior described. I had notated _some_ of them in the previous version of the diagram but omitted others. They should now all be present.

Correct me if I'm wrong below... The goal of implementing the map of states and transitions – illustrated in my previous comment – should be to:

  1. Allow the machine to reliably identify where an SQL-chunk should actually stop, so that individual, correctly-separated SQL-statements e.g. INSERT INTO mouth VALUES ('food'); can all be passed to ECPG without giving it bad information about the full contents of a given statement.

    ...To be specific, each complete SQL-statement ought to be transformed to EXEC SQL <full SQL-statement>; ECPG may give false positives/negatives if statements are not correctly identified and presented in this way.
  2. Allow the machine to remove certain elements which are not valid, such as the directives that are meant for psql, and – potentially – comments in general. **

** I was leaning towards simply stripping out all the block _and_ single-line comments and not even giving them to ECPG; it seems slightly easier than doing a transformation on them _in some cases_ (-- -> //), and leaving them as-is in others (/* -> /*).

@markdrago
Copy link
Owner

Yes. This sounds great. Thanks for updating the diagram - it's much more clear to me now. All of your plans look sound to me. I'll leave it up to you, but the reason I opted to convert -- to // was that I felt it left less room for error. However, finding the next \n and stripping the contents should be pretty easy to get right. Like I said - your call. I'm looking forward to seeing what you come up with!

koyae added a commit to koyae/pgsanity that referenced this issue May 30, 2016
@koyae
Copy link
Author

koyae commented May 30, 2016

I forgot to state it in my commit-comment, but 46573e8 fixes this issue.

@koyae
Copy link
Author

koyae commented May 30, 2016

Although I did end up stripping the contents of end-of-line -- comments, I discovered that expressions like /*/**/ are indeed syntax errors, at least when tested against Postgres 9.4:

ERROR:  unterminated /* comment at or near "/*/**/"
LINE 1: /*/**/
        ^

For that reason, I decided to leave block comments alone when preprocessing input so that any related errors can be correctly detected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants