Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[P4fmt]: attaching comments to IR Nodes #4845

Merged
merged 15 commits into from
Oct 24, 2024
Merged

Conversation

snapdgn
Copy link
Contributor

@snapdgn snapdgn commented Aug 1, 2024

No description provided.

backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
@snapdgn snapdgn force-pushed the comments-handling branch 3 times, most recently from 82edb45 to 37344eb Compare August 8, 2024 13:13
@snapdgn snapdgn marked this pull request as ready for review August 8, 2024 13:14
@snapdgn snapdgn marked this pull request as draft August 8, 2024 14:33
@snapdgn snapdgn marked this pull request as ready for review August 8, 2024 14:43
@snapdgn snapdgn marked this pull request as draft August 8, 2024 14:49
@snapdgn
Copy link
Contributor Author

snapdgn commented Aug 8, 2024

This takes care of attachment of comments to basic parent Node types(Typedef, Type_Name, control blocks, structs etc).
The Node List is not exhaustive and underdeveloped, and support for further nodes will be added gradually.
It currently attempts to store the comments related to each node in two lists, 'before' (holds comments that are directly above the nodes) and after(contains comments that are on the same line as the nodes).
Plan is to hold all the comments related to a node and attach them at all at appropriate positions based on column information.
The comment list will store a list of code points, with each entry containing a pair of the SourcePosition and the corresponding comment.

backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
}

//////////////////////// TYPES ////////////////////
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the criterion for a supported node here? Otherwise I would use IR::Statement, IR::Type_Declaration, or IR::StatOrDecl here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the algorithm presented in the blog post (the Bazel approach) is that the preorder traversal will visit all AST nodes, and it can pick the right AST node to attach comments to automatically. The first node with a comment right before it will get that comment attached to it, and that comment will be removed from the list of comments to be attached, so that any child AST node on the same line won't get the same comment attached again.

This algorithm can guarantee a pretty reasonable attachment of line comments. And a similar algorithm can be applied in a postorder traversal to attach the inline trailing/suffix comments.

@fruffy I don't have enough knowledge on this, but is there a more convenient way in P4C to express that I want to do the same thing for all AST nodes regardless of its type? Maybe as you mentioned in another comment, adding IR::Node to a preorder is the way to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *) and postoder(Node *). Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ... one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder would likely place it. But I guess that can't be really said in general and either will be good enough.

Copy link
Contributor

@vlstill vlstill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the point of the compiler in general, I think there should not the the mutable in the SourceInfo and I am concern if this could impact compiler performance.

Other than that, I don't think naming all the node types is a good approach.

lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Outdated Show resolved Hide resolved
Comment on lines 219 to 223
mutable std::vector<Util::Comment *> before;
mutable std::vector<Util::Comment *> after;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point -- this make the SourceInfo, which is ubiquitous in the compiler, quite a bit larger (by 6 pointers with the common impl of vector). It would be good to have some benchmarks to see this does not unduly show down the compiler (not p4fmt backend) or increase its memory consumption too much.

Copy link
Contributor

@asl asl Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we'd go this way, I'd strongly suggest using small vectors that could hold up to 1 entry inline. But overall I concur that enlarging every IR node by this amount should be carefully benchmarked. We do create lots of nodes. There is huge malloc traffic everywhere. And while another 48 bytes might look small, it is an overhead paid on every node. Even if there are zero comments.

Also, comments are quite rare and here the price is paid by every node. Maybe the better solution is to use the technique similar to TrailingObjects in LLVM / clang.

Copy link
Contributor Author

@snapdgn snapdgn Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of storing those prefix and suffix comments as a part of SourceInfo, I was thinking how about storing that extra object in a hashmap, Hashmap<node-id, extra-comment-object>. This would avoid the unnecessary overhead to the compiler.
Would this be a good and straightforward solution for the time being?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your plan to update this side map on IR change? Overall, side long-living maps should be discouraged as they could easily go stale

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to add some more context. We were thinking about using this side map only in p4fmt. With this constraint, there probably isn't any IR change expected I guess.

@snapdgn has also looked into using TrailingObjects. I agree that would be a better solution. My suggestion is to go with a simple solution for now, to unblock further p4fmt experimentation, as long as the solution doesn't influence the rest of P4C. Then switching to the TrailingObjects technique could be left as a future optimization.

@asl What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this side map is local, then certainly we're fine.

lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Show resolved Hide resolved
}

//////////////////////// TYPES ////////////////////
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *) and postoder(Node *). Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ... one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder would likely place it. But I guess that can't be really said in general and either will be good enough.

@snapdgn snapdgn force-pushed the comments-handling branch 4 times, most recently from 2aff0aa to da4b314 Compare August 20, 2024 08:30
@snapdgn snapdgn marked this pull request as ready for review August 21, 2024 05:52
@snapdgn
Copy link
Contributor Author

snapdgn commented Aug 22, 2024

Comments are stored in a side map for now, as discussed.

Copy link
Collaborator

@fruffy fruffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick round of comments.

It would be good to have a reference output here. We can use the checker that you wrote. This way we can see the current behavior.

lib/source_file.cpp Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
lib/source_file.h Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Show resolved Hide resolved
backends/p4fmt/attach.cpp Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
@qobilidop
Copy link
Member

There are merge conflicts with the main branch. Please rebase/merge and resolve them.

@snapdgn snapdgn force-pushed the comments-handling branch 4 times, most recently from b87606d to 0dfc44a Compare August 26, 2024 10:04
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. There are no tests as well.

backends/p4fmt/attach.h Outdated Show resolved Hide resolved
@asl
Copy link
Contributor

asl commented Sep 12, 2024

@asl @vlstill Any remaining concerns? I'll try to merge this if no new issues are raised.

@qobilidop See my comments on concerns

@snapdgn
Copy link
Contributor Author

snapdgn commented Sep 13, 2024

See comments. There are no tests as well.

The tests are intended to be included as part of this PR.

@snapdgn snapdgn requested a review from asl September 24, 2024 15:50
private:
/// This Hashmap tracks each comment’s attachment status to IR nodes. Initially, all comments
/// are set to 'false'.
std::unordered_map<const Util::Comment *, bool> processedComments;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need bool value here? Why can't you simply have std::unordered_set<Util::Comment *>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, I deleted the comment from the set once it was attached, but modifying the container while iterating over it didn't seem like a good approach. I could use a separate 'Used Comments' set for this, if that seems better? Open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see why you need to modify this set during the iteration. Essentially, your code is:

    for (auto &[comment, isAttached] : processedComments) {
        // Skip if already attached
        if (isAttached) {
            continue;
        }

        switch (...) {
        case A:
            ...
            isAttached = true;
        case B:
            ...
            isAttached = true;
        default:
            error();
        }
}

Why do you need to modify anything?

  • The first continue seems to be redundant to me, as you always either attaching everything or bail out
  • You are always attaching the whole set.

So why can't you simply attach everything and then do processedComments.clear()?

Copy link
Contributor Author

@snapdgn snapdgn Oct 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the dealyed reply, missed the notification :( . Also Sorry for not clarifying enough earlier. processedComments is a shared list of all the comments from the file that may get attached to different nodes. I'm using the marking thing to ensure that the same comment won't get attached to another node later. Wouldn't clearing it remove everything from the container, making it difficult to use them for attaching comments to other nodes in the future ? Please let me know if I'm still missing something. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asl any further comments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fruffy I need to study the logic again, will try to do today

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snapdgn Ok, so essentially you're having "working set" and "visited list" and you're fusing all of them in the single map.

It seems you're making the copy of this map, is this intentional? Your top-level globalCommentsMap will still have all the things there. Maybe you'd need to move it inside? Or take by reference?

Copy link
Contributor Author

@snapdgn snapdgn Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asl Yeah, it was not intentional. Modified to take it by ref now. Thanks

@qobilidop qobilidop added the p4fmt Topics related to P4 formatter. label Sep 30, 2024
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>

Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Copy link
Contributor

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!

@fruffy fruffy enabled auto-merge October 24, 2024 16:11
@fruffy fruffy added this pull request to the merge queue Oct 24, 2024
Merged via the queue into p4lang:main with commit eb7a897 Oct 24, 2024
18 checks passed
@snapdgn snapdgn deleted the comments-handling branch October 25, 2024 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p4fmt Topics related to P4 formatter.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants