Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments at the top of the file are deleted #23

Closed
ghost opened this issue Apr 22, 2019 · 16 comments
Closed

Comments at the top of the file are deleted #23

ghost opened this issue Apr 22, 2019 · 16 comments
Labels
difficult Quite hard to achieve

Comments

@ghost
Copy link

ghost commented Apr 22, 2019

-- hello

select 1

Running pgpp master on the file above returns

SELECT 1

I would prefer that it retain the comments :-)

@lelit
Copy link
Owner

lelit commented Apr 23, 2019

Unfortunately that's difficult: see libpg_query's #15 for details.

As Lukas suggests, with lot of gymnastics we could

  1. with a pre-parser replace all comments with equivalent-length string of spaces, keeping them safe with their original position
  2. parse the resulting comment-free statement
  3. while printing back, we look at each node position, and if it "higher" than the current comment, we emit the comment first, removing it from the "queue"

As an alternative approach, replace point 3 with

  1. inject "syntetic" Comment nodes in the parsed tree, again using original position to find the right place
  2. properly implement Comment printer

@lelit lelit added the cantdo Cannot be done label Dec 7, 2019
@lelit lelit closed this as completed Dec 7, 2019
@Hultner
Copy link

Hultner commented Oct 25, 2020

I discussed this briefly with Lele over email and would like to add an excerpt from reply here.

I looked at the issues you referenced and what you mention in #23 as an alternative approach is very similar to the way yapf goes about it from the reference in my previous email. I do agree and think that altering the PG parser is likely to require significant effort, and the alternative approach of “re-injecting” the comment nodes into the AST sounds like a much more viable starting-point to achieve something like that. I am not very familiar with lbpg_query but as long as we have a reference between AST-nodes and their original place in the code doing a injection should be reasonable without too much hacking about, I might even be able to hack a PoC together in a fork if such is the case.

Here's the mentioned comment splicer from yapf. From my point of view this approach is very similar to 3 & 4 above and seems like a reasonable way to move forward. Would love to hear your comments on this.

@lelit
Copy link
Owner

lelit commented Oct 25, 2020

I think a first step toward the goal is implementing the pre-parser speculated in point 1, able to extract only the comments and keeping them apart, with their absolute position in the original statement.

@lelit lelit reopened this Oct 25, 2020
@lelit lelit added difficult Quite hard to achieve and removed cantdo Cannot be done labels Oct 26, 2020
@SKalt
Copy link

SKalt commented Apr 20, 2021

Looks like the linked libpg_query issue #15 on preserving comments was closed as implemented:

Closing this, since this has been implemented in the scanner method released in pg_query 2.0:
https://github.com/pganalyze/libpg_query#usage-scanning-a-query-into-its-tokens-using-the-postgresql-scannerlexer
(note it doesn't really fit the logic to put this information in the parser, so you would have to use both the scanner and the parser method in some cases)
-- pganalyze/libpg_query#15 (comment)

Does this news make implementing comment preservation easier? If so, what would the steps be to using the new pg_query functionaility via libpg_query?

@lelit
Copy link
Owner

lelit commented Apr 21, 2021

I will try to understand the effort needed to expose the pg_query_scan() function as time permits.

lelit added a commit that referenced this issue Apr 25, 2021
This is a preliminary implementation of the first step mentioned in commit
#23 (comment) of issue #23.
@lelit
Copy link
Owner

lelit commented Apr 25, 2021

I've implemented a preliminary scan() function in the parser module, that exposes libpgquery's pg_query_scan(). This is just the first step...

@SKalt
Copy link

SKalt commented Apr 25, 2021

I'm seeing commit 55aed47 on the v3 branch. What would the next steps be? I can contribute test cases if nothing else.

@lelit
Copy link
Owner

lelit commented Apr 25, 2021

The plan is: add a boolean option to the prettify() function, that when True it first collects all comments in the original statement the pass them to the IndentedStream constructor. At that point, its print_node() method should emit in a way to be determined the comments as soon as the node carries a location greater than that of the comments.

lelit added a commit that referenced this issue Apr 28, 2021
…tring

The C functions exposed by the underlying libpg_query operates on the UTF-8 encoded C string,
and thus both the nodes "location" and the possible errors position are relative to that, not
to the original Python string: when it contains non-ASCII, the offsets are clearly different.

This fixes the issue with the exceptions raised in case of errors and with the tokens info
returned by the new scan() function, but more work is needed to adjust the "location" slot of
the AST nodes: we need that to make further steps of issue #23.
lelit added a commit that referenced this issue Apr 30, 2021
This complement commit 795f32c, passing
down the needed information to the node's constructors.

Now we can move on to the easy parts of issue #23.
lelit added a commit that referenced this issue May 3, 2021
This implements the basic behaviour requested in issue #23: it isn't neither perfect nor
beautiful, but... what is?
@lelit
Copy link
Owner

lelit commented May 3, 2021

A first cut of this feature is present in the just released 3.0dev0 version.
I'm not really satisfied of the outcome, but it's a start... Any feedback is appreciated!

@SKalt
Copy link

SKalt commented May 4, 2021

Initial feedback:

  • I'd like the comments to be left in their original -- form if possible; maybe test for newlines in the comment body?
  • some comments get added to the next line when they follow a ,-preceded item.
demos
echo '-- hello

select 1
' > ./normal.sql;

echo 'CREATE TABLE foo(
  bar INT -- an informative comment;
  , baz TEXT -- another comment,
);' > ./tricky.sql

echo '
CREATE OR REPLACE FUNCTION my_func(x INT) RETURNS TABLE(x INT) LANGUAGE plpgsql AS $$ BEGIN return query select * from foo; END $$;
'  > ./plpg.sql

docker run -it -v $(pwd):/workspace --workdir /worksapce python:3.8-buster bash
pip install pglast==3.0.dev0 

function compare_pre_post() {
  local file_to_format=${1:?missing required positional argument}; shift;
  echo "before ----------------------";
  cat $file_to_format | tee /tmp/pre;
  echo "after ------------------------";
  python -m pglast $@ $file_to_format /dev/stdout | tee /tmp/post;
  echo "compared ------------------";
  diff -u /tmp/pre /tmp/post;
}

compare_pre_post ./normal.sql --preserve-comments | sed 's/^/# /g'
# before ----------------------
# -- hello
# 
# select 1
# 
# after ------------------------
# /* hello */
# SELECT 1
# compared ------------------
# --- /tmp/pre  2021-05-04 15:09:26.569431088 +0000
# +++ /tmp/post 2021-05-04 15:09:26.726431096 +0000
# @@ -1,4 +1,2 @@
# --- hello
# -
# -select 1
# -
# +/* hello */
# +SELECT 1
compare_pre_post ./tricky.sql --preserve-comments | sed 's/^/# /g'
# before ----------------------
# CREATE TABLE foo(
#   bar INT -- an informative comment;
#   , baz TEXT -- another comment,
# );
# after ------------------------
# CREATE TABLE foo (
#     bar integer
#   , /* an informative comment; */ baz text
# ) /* another comment, */ 
# compared ------------------
# --- /tmp/pre  2021-05-04 15:10:35.471343512 +0000
# +++ /tmp/post 2021-05-04 15:10:35.639343520 +0000
# @@ -1,4 +1,4 @@
# -CREATE TABLE foo(
# -  bar INT -- an informative comment;
# -  , baz TEXT -- another comment,
# -);
# +CREATE TABLE foo (
# +    bar integer
# +  , /* an informative comment; */ baz text
# +) /* another comment, */ 
compare_pre_post ./malformed.plpg.sql --preserve-comments | sed 's/^/# /g'
# before ----------------------
# 
# CREATE OR REPLACE FUNCTION my_func(x INT) RETURNS TABLE(x INT) LANGUAGE plpgsql AS $$ BEGIN return query select * from foo; END $$;
# 
# after ------------------------
# CREATE OR REPLACE FUNCTION my_func(x integer)
# RETURNS TABLE (x integer)LANGUAGE plpgsql
# AS $$ BEGIN return query select * from foo; END $$
# compared ------------------
# --- /tmp/pre  2021-05-04 15:38:25.591646027 +0000
# +++ /tmp/post 2021-05-04 15:38:25.747646035 +0000
# @@ -1,3 +1,3 @@
# -
# -CREATE OR REPLACE FUNCTION my_func(x INT) RETURNS TABLE(x INT) LANGUAGE plpgsql AS $$ BEGIN return query select * from foo; END $$;
# -
# +CREATE OR REPLACE FUNCTION my_func(x integer)
# +RETURNS TABLE (x integer)LANGUAGE plpgsql
# +AS $$ BEGIN return query select * from foo; END $$

@lelit
Copy link
Owner

lelit commented May 4, 2021

Thank you.

The first is doable, at least when one using the prettifying printer, but not when using the compact one: the latter does not emit newlines, so embedded comments obviously must be in the C format.

The second is not possible, as not all nodes carry their original position (for example literal values), so pglast cannot determine whether the comment is before of after them.

@lelit
Copy link
Owner

lelit commented May 4, 2021

The logic is pretty simple: the print_comment() method emits the text, and it gets called by print_node() when the given node was originally after the next available comment.
Maybe you can do some experiment and suggest different approaches?

lelit added a commit that referenced this issue May 14, 2021
Keep the original comment style, at least when prettifying code.

See issue #23.
@lelit
Copy link
Owner

lelit commented May 14, 2021

I improved the printing of comments, maintaining the original style.

@lelit
Copy link
Owner

lelit commented May 14, 2021

See here for an example.

@SKalt
Copy link

SKalt commented May 14, 2021

Looks good to me! I'll see if I can get a chance to play around with the code further in the near future.

@lelit
Copy link
Owner

lelit commented May 17, 2021

Present in release v3.0.dev1.
Feel free to re-open this (or a new issue) if there is something else that can be done.
Thank you!

@lelit lelit closed this as completed May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficult Quite hard to achieve
Projects
None yet
Development

No branches or pull requests

3 participants