-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[P4fmt]: attaching comments to IR Nodes #4845
Conversation
82edb45
to
37344eb
Compare
This takes care of attachment of comments to basic parent Node types(Typedef, Type_Name, control blocks, structs etc). |
backends/p4fmt/attach.cpp
Outdated
} | ||
|
||
//////////////////////// TYPES //////////////////// | ||
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the criterion for a supported node here? Otherwise I would use IR::Statement
, IR::Type_Declaration
, or IR::StatOrDecl
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the algorithm presented in the blog post (the Bazel approach) is that the preorder traversal will visit all AST nodes, and it can pick the right AST node to attach comments to automatically. The first node with a comment right before it will get that comment attached to it, and that comment will be removed from the list of comments to be attached, so that any child AST node on the same line won't get the same comment attached again.
This algorithm can guarantee a pretty reasonable attachment of line comments. And a similar algorithm can be applied in a postorder traversal to attach the inline trailing/suffix comments.
@fruffy I don't have enough knowledge on this, but is there a more convenient way in P4C to express that I want to do the same thing for all AST nodes regardless of its type? Maybe as you mentioned in another comment, adding IR::Node
to a preorder
is the way to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *)
and postoder(Node *)
. Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ...
one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder
would likely place it. But I guess that can't be really said in general and either will be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the point of the compiler in general, I think there should not the the mutable
in the SourceInfo
and I am concern if this could impact compiler performance.
Other than that, I don't think naming all the node types is a good approach.
lib/source_file.h
Outdated
mutable std::vector<Util::Comment *> before; | ||
mutable std::vector<Util::Comment *> after; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another point -- this make the SourceInfo
, which is ubiquitous in the compiler, quite a bit larger (by 6 pointers with the common impl of vector
). It would be good to have some benchmarks to see this does not unduly show down the compiler (not p4fmt
backend) or increase its memory consumption too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we'd go this way, I'd strongly suggest using small vectors that could hold up to 1 entry inline. But overall I concur that enlarging every IR node by this amount should be carefully benchmarked. We do create lots of nodes. There is huge malloc traffic everywhere. And while another 48 bytes might look small, it is an overhead paid on every node. Even if there are zero comments.
Also, comments are quite rare and here the price is paid by every node. Maybe the better solution is to use the technique similar to TrailingObjects
in LLVM / clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of storing those prefix and suffix comments as a part of SourceInfo
, I was thinking how about storing that extra object in a hashmap, Hashmap<node-id, extra-comment-object>
. This would avoid the unnecessary overhead to the compiler.
Would this be a good and straightforward solution for the time being?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is your plan to update this side map on IR change? Overall, side long-living maps should be discouraged as they could easily go stale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to add some more context. We were thinking about using this side map only in p4fmt. With this constraint, there probably isn't any IR change expected I guess.
@snapdgn has also looked into using TrailingObjects
. I agree that would be a better solution. My suggestion is to go with a simple solution for now, to unblock further p4fmt experimentation, as long as the solution doesn't influence the rest of P4C. Then switching to the TrailingObjects
technique could be left as a future optimization.
@asl What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this side map is local, then certainly we're fine.
backends/p4fmt/attach.cpp
Outdated
} | ||
|
||
//////////////////////// TYPES //////////////////// | ||
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *)
and postoder(Node *)
. Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ...
one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder
would likely place it. But I guess that can't be really said in general and either will be good enough.
2aff0aa
to
da4b314
Compare
Comments are stored in a side map for now, as discussed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick round of comments.
It would be good to have a reference output here. We can use the checker that you wrote. This way we can see the current behavior.
There are merge conflicts with the main branch. Please rebase/merge and resolve them. |
b87606d
to
0dfc44a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments. There are no tests as well.
@qobilidop See my comments on concerns |
The tests are intended to be included as part of this PR. |
backends/p4fmt/attach.h
Outdated
private: | ||
/// This Hashmap tracks each comment’s attachment status to IR nodes. Initially, all comments | ||
/// are set to 'false'. | ||
std::unordered_map<const Util::Comment *, bool> processedComments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need bool
value here? Why can't you simply have std::unordered_set<Util::Comment *>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I deleted the comment from the set once it was attached, but modifying the container while iterating over it didn't seem like a good approach. I could use a separate 'Used Comments' set for this, if that seems better? Open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see why you need to modify this set during the iteration. Essentially, your code is:
for (auto &[comment, isAttached] : processedComments) {
// Skip if already attached
if (isAttached) {
continue;
}
switch (...) {
case A:
...
isAttached = true;
case B:
...
isAttached = true;
default:
error();
}
}
Why do you need to modify anything?
- The first
continue
seems to be redundant to me, as you always either attaching everything or bail out - You are always attaching the whole set.
So why can't you simply attach everything and then do processedComments.clear()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the dealyed reply, missed the notification :( . Also Sorry for not clarifying enough earlier. processedComments
is a shared list of all the comments from the file that may get attached to different nodes. I'm using the marking thing to ensure that the same comment won't get attached to another node later. Wouldn't clearing it remove everything from the container, making it difficult to use them for attaching comments to other nodes in the future ? Please let me know if I'm still missing something. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asl any further comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fruffy I need to study the logic again, will try to do today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snapdgn Ok, so essentially you're having "working set" and "visited list" and you're fusing all of them in the single map.
It seems you're making the copy of this map, is this intentional? Your top-level globalCommentsMap
will still have all the things there. Maybe you'd need to move it inside? Or take by reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asl Yeah, it was not intentional. Modified to take it by ref now. Thanks
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me> Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
5c5feae
to
a3c5c10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
No description provided.