Comments at the top of the file are deleted #23

ghost · 2019-04-22T22:28:23Z

-- hello

select 1

Running pgpp master on the file above returns

SELECT 1

I would prefer that it retain the comments :-)

The text was updated successfully, but these errors were encountered:

lelit · 2019-04-23T05:55:52Z

Unfortunately that's difficult: see libpg_query's #15 for details.

As Lukas suggests, with lot of gymnastics we could

with a pre-parser replace all comments with equivalent-length string of spaces, keeping them safe with their original position
parse the resulting comment-free statement
while printing back, we look at each node position, and if it "higher" than the current comment, we emit the comment first, removing it from the "queue"

As an alternative approach, replace point 3 with

inject "syntetic" Comment nodes in the parsed tree, again using original position to find the right place
properly implement Comment printer

Hultner · 2020-10-25T12:47:58Z

I discussed this briefly with Lele over email and would like to add an excerpt from reply here.

I looked at the issues you referenced and what you mention in #23 as an alternative approach is very similar to the way yapf goes about it from the reference in my previous email. I do agree and think that altering the PG parser is likely to require significant effort, and the alternative approach of “re-injecting” the comment nodes into the AST sounds like a much more viable starting-point to achieve something like that. I am not very familiar with lbpg_query but as long as we have a reference between AST-nodes and their original place in the code doing a injection should be reasonable without too much hacking about, I might even be able to hack a PoC together in a fork if such is the case.

Here's the mentioned comment splicer from yapf. From my point of view this approach is very similar to 3 & 4 above and seems like a reasonable way to move forward. Would love to hear your comments on this.

lelit · 2020-10-25T12:59:31Z

I think a first step toward the goal is implementing the pre-parser speculated in point 1, able to extract only the comments and keeping them apart, with their absolute position in the original statement.

SKalt · 2021-04-20T18:11:06Z

Looks like the linked libpg_query issue #15 on preserving comments was closed as implemented:

Closing this, since this has been implemented in the scanner method released in pg_query 2.0:
https://github.com/pganalyze/libpg_query#usage-scanning-a-query-into-its-tokens-using-the-postgresql-scannerlexer
(note it doesn't really fit the logic to put this information in the parser, so you would have to use both the scanner and the parser method in some cases)
-- pganalyze/libpg_query#15 (comment)

Does this news make implementing comment preservation easier? If so, what would the steps be to using the new pg_query functionaility via libpg_query?

lelit · 2021-04-21T05:37:39Z

I will try to understand the effort needed to expose the pg_query_scan() function as time permits.

This is a preliminary implementation of the first step mentioned in commit #23 (comment) of issue #23.

lelit · 2021-04-25T12:53:22Z

I've implemented a preliminary scan() function in the parser module, that exposes libpgquery's pg_query_scan(). This is just the first step...

SKalt · 2021-04-25T14:24:19Z

I'm seeing commit 55aed47 on the v3 branch. What would the next steps be? I can contribute test cases if nothing else.

lelit · 2021-04-25T17:33:42Z

The plan is: add a boolean option to the prettify() function, that when True it first collects all comments in the original statement the pass them to the IndentedStream constructor. At that point, its print_node() method should emit in a way to be determined the comments as soon as the node carries a location greater than that of the comments.

…tring The C functions exposed by the underlying libpg_query operates on the UTF-8 encoded C string, and thus both the nodes "location" and the possible errors position are relative to that, not to the original Python string: when it contains non-ASCII, the offsets are clearly different. This fixes the issue with the exceptions raised in case of errors and with the tokens info returned by the new scan() function, but more work is needed to adjust the "location" slot of the AST nodes: we need that to make further steps of issue #23.

This complement commit 795f32c, passing down the needed information to the node's constructors. Now we can move on to the easy parts of issue #23.

This implements the basic behaviour requested in issue #23: it isn't neither perfect nor beautiful, but... what is?

lelit · 2021-05-03T12:53:29Z

A first cut of this feature is present in the just released 3.0dev0 version.
I'm not really satisfied of the outcome, but it's a start... Any feedback is appreciated!

SKalt · 2021-05-04T15:40:01Z

Initial feedback:

I'd like the comments to be left in their original -- form if possible; maybe test for newlines in the comment body?
some comments get added to the next line when they follow a ,-preceded item.

demos

echo '-- hello

select 1
' > ./normal.sql;

echo 'CREATE TABLE foo(
  bar INT -- an informative comment;
  , baz TEXT -- another comment,
);' > ./tricky.sql

echo '
CREATE OR REPLACE FUNCTION my_func(x INT) RETURNS TABLE(x INT) LANGUAGE plpgsql AS $$ BEGIN return query select * from foo; END $$;
'  > ./plpg.sql

docker run -it -v $(pwd):/workspace --workdir /worksapce python:3.8-buster bash
pip install pglast==3.0.dev0 

function compare_pre_post() {
  local file_to_format=${1:?missing required positional argument}; shift;
  echo "before ----------------------";
  cat $file_to_format | tee /tmp/pre;
  echo "after ------------------------";
  python -m pglast $@ $file_to_format /dev/stdout | tee /tmp/post;
  echo "compared ------------------";
  diff -u /tmp/pre /tmp/post;
}

compare_pre_post ./normal.sql --preserve-comments | sed 's/^/# /g'
# before ----------------------
# -- hello
# 
# select 1
# 
# after ------------------------
# /* hello */
# SELECT 1
# compared ------------------
# --- /tmp/pre  2021-05-04 15:09:26.569431088 +0000
# +++ /tmp/post 2021-05-04 15:09:26.726431096 +0000
# @@ -1,4 +1,2 @@
# --- hello
# -
# -select 1
# -
# +/* hello */
# +SELECT 1
compare_pre_post ./tricky.sql --preserve-comments | sed 's/^/# /g'
# before ----------------------
# CREATE TABLE foo(
#   bar INT -- an informative comment;
#   , baz TEXT -- another comment,
# );
# after ------------------------
# CREATE TABLE foo (
#     bar integer
#   , /* an informative comment; */ baz text
# ) /* another comment, */ 
# compared ------------------
# --- /tmp/pre  2021-05-04 15:10:35.471343512 +0000
# +++ /tmp/post 2021-05-04 15:10:35.639343520 +0000
# @@ -1,4 +1,4 @@
# -CREATE TABLE foo(
# -  bar INT -- an informative comment;
# -  , baz TEXT -- another comment,
# -);
# +CREATE TABLE foo (
# +    bar integer
# +  , /* an informative comment; */ baz text
# +) /* another comment, */ 
compare_pre_post ./malformed.plpg.sql --preserve-comments | sed 's/^/# /g'
# before ----------------------
# 
# CREATE OR REPLACE FUNCTION my_func(x INT) RETURNS TABLE(x INT) LANGUAGE plpgsql AS $$ BEGIN return query select * from foo; END $$;
# 
# after ------------------------
# CREATE OR REPLACE FUNCTION my_func(x integer)
# RETURNS TABLE (x integer)LANGUAGE plpgsql
# AS $$ BEGIN return query select * from foo; END $$
# compared ------------------
# --- /tmp/pre  2021-05-04 15:38:25.591646027 +0000
# +++ /tmp/post 2021-05-04 15:38:25.747646035 +0000
# @@ -1,3 +1,3 @@
# -
# -CREATE OR REPLACE FUNCTION my_func(x INT) RETURNS TABLE(x INT) LANGUAGE plpgsql AS $$ BEGIN return query select * from foo; END $$;
# -
# +CREATE OR REPLACE FUNCTION my_func(x integer)
# +RETURNS TABLE (x integer)LANGUAGE plpgsql
# +AS $$ BEGIN return query select * from foo; END $$

lelit · 2021-05-04T16:34:04Z

Thank you.

The first is doable, at least when one using the prettifying printer, but not when using the compact one: the latter does not emit newlines, so embedded comments obviously must be in the C format.

The second is not possible, as not all nodes carry their original position (for example literal values), so pglast cannot determine whether the comment is before of after them.

lelit · 2021-05-04T16:39:19Z

The logic is pretty simple: the print_comment() method emits the text, and it gets called by print_node() when the given node was originally after the next available comment.
Maybe you can do some experiment and suggest different approaches?

Keep the original comment style, at least when prettifying code. See issue #23.

lelit · 2021-05-14T17:46:37Z

I improved the printing of comments, maintaining the original style.

lelit · 2021-05-14T17:48:07Z

See here for an example.

SKalt · 2021-05-14T18:36:30Z

Looks good to me! I'll see if I can get a chance to play around with the code further in the near future.

lelit · 2021-05-17T06:58:00Z

Present in release v3.0.dev1.
Feel free to re-open this (or a new issue) if there is something else that can be done.
Thank you!

lelit added the cantdo Cannot be done label Dec 7, 2019

lelit closed this as completed Dec 7, 2019

lelit reopened this Oct 25, 2020

lelit added difficult Quite hard to achieve and removed cantdo Cannot be done labels Oct 26, 2020

lelit added a commit that referenced this issue Apr 25, 2021

Expose the pg_query_scan() function as parser.scan()

55aed47

This is a preliminary implementation of the first step mentioned in commit #23 (comment) of issue #23.

lelit added a commit that referenced this issue Apr 30, 2021

Adjust also the nodes location

cc5409c

This complement commit 795f32c, passing down the needed information to the node's constructors. Now we can move on to the easy parts of issue #23.

lelit added a commit that referenced this issue May 3, 2021

Preliminary implementation of the "preserve comments" feature

649ecb5

This implements the basic behaviour requested in issue #23: it isn't neither perfect nor beautiful, but... what is?

lelit added a commit that referenced this issue May 14, 2021

Refine printing of SQL comments

3be9045

Keep the original comment style, at least when prettifying code. See issue #23.

lelit closed this as completed May 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments at the top of the file are deleted #23

Comments at the top of the file are deleted #23

ghost commented Apr 22, 2019

lelit commented Apr 23, 2019

Hultner commented Oct 25, 2020

lelit commented Oct 25, 2020

SKalt commented Apr 20, 2021

lelit commented Apr 21, 2021

lelit commented Apr 25, 2021

SKalt commented Apr 25, 2021

lelit commented Apr 25, 2021

lelit commented May 3, 2021

SKalt commented May 4, 2021

lelit commented May 4, 2021

lelit commented May 4, 2021

lelit commented May 14, 2021

lelit commented May 14, 2021

SKalt commented May 14, 2021

lelit commented May 17, 2021