(System)Verilog: escaped identifiers (LRM 5.6.1) #4129

Merged
merged 1 commit into from
Dec 8, 2024

cousteaulecommandant
Contributor

@cousteaulecommandant cousteaulecommandant commented Nov 25, 2024

Add support for escaped identifiers in Verilog and SystemVerilog: \ + zero or more non-whitespace characters + whitespace.

Note that the \ itself isn't part of the identifier, and that \foo is the same as just foo (unlike in VHDL), but that identifiers identical to keywords such as \begin are also allowed.

This definition also theoretically allows an empty identifier, \, which ctags doesn't support.
To avoid issues, empty identifiers (\) or identifiers that start with (a second) \ keep the leading \ even if it's technically not part of the name.
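For readers less familiar with the lexer, the rule above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the code in this PR: `charSource`, `nextChar()` and `readEscapedIdentifier()` are hypothetical names, `nextChar()` stands in for the parser's `vGetc()`, and `isgraph()` is used as in the PR (i.e. assuming the C locale).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical character source standing in for the parser's vGetc(). */
typedef struct {
    const char *text;
    size_t pos;
} charSource;

/* Returns the next character as an unsigned char value, or EOF at the end. */
static int nextChar (charSource *src)
{
    unsigned char c = (unsigned char) src->text[src->pos];
    if (c == '\0')
        return EOF;
    src->pos++;
    return c;
}

/* Reads an escaped identifier; src must be positioned on the leading '\'.
 * Writes the tag name into buf (bufSize must be at least 1), dropping the
 * leading '\' unless the name would be empty or would start with '\'.
 * The terminating whitespace character (if any) is consumed. */
static void readEscapedIdentifier (charSource *src, char *buf, size_t bufSize)
{
    size_t len = 0;
    int c;

    nextChar (src);            /* consume the leading '\' */
    c = nextChar (src);        /* first character after it */

    /* isgraph(EOF) is false, so '\' followed by EOF is handled like
     * '\' followed by whitespace: an empty name, which keeps the '\'. */
    if (!isgraph (c) || c == '\\')
    {
        if (len + 1 < bufSize)
            buf[len++] = '\\';
    }

    /* 'while' rather than 'do ... while': the name may be empty. */
    while (isgraph (c))
    {
        if (len + 1 < bufSize)
            buf[len++] = (char) c;
        c = nextChar (src);
    }
    buf[len] = '\0';
}
```

With this sketch, `\foo ` yields `foo`, both `\ ` and a lone `\` at end of input yield `\`, and `\\x ` yields `\\x`.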

Escaped identifiers are defined in the standard in section 3.7.1 (Verilog) / 5.6.1 (SystemVerilog).

Support for escaped identifiers was listed in the Verilog TODO list in #2674.

codecov bot commented Nov 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.90%. Comparing base (e5650e9) to head (040f96c).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4129   +/-   ##
=======================================
  Coverage   85.90%   85.90%           
=======================================
  Files         239      239           
  Lines       58727    58733    +6     
=======================================
+ Hits        50447    50453    +6     
  Misses       8280     8280           

@cousteaulecommandant
Contributor Author

By the way, I also added a unit test for the feature. I don't know what the protocol is here: whether I should add the unit test myself to demonstrate the feature, or whether someone else should add it to verify that what I did works.

@masatake
Member

The error looks strange. I will look into this.

@cousteaulecommandant
Contributor Author

cousteaulecommandant commented Nov 27, 2024

The error looks strange. I will look into this.

Which error do you mean?

If you mean the CI error, I checked and it seems to be a false positive on an unrelated test I didn't touch. (CI does that sometimes.)
I ran the unit tests locally and they worked.

@cousteaulecommandant cousteaulecommandant force-pushed the v_escaped branch 2 times, most recently from 38eda10 to 5dc2b1f on December 1, 2024 15:38
@leleliu008
Member

The CI failure has been fixed in the master branch. Please rebase onto master.

@cousteaulecommandant
Contributor Author

Rebased and all CI checks passed 😃 Thanks!

c = vGetc (); // skip leading '\'
// A single `\` would result in an empty identifier, which is unsupported.
// Add the initial `\` in that case, and also in case it starts with `\\`.
if (!isgraph (c) || c == '\\')
Member

I wonder what happens if c is EOF (== -1).
isgraph(c) would return false, so \ would be appended to token->name. That's not harmful enough to worry about, but putting something like goto end; there would be better.

Contributor Author

I think the behavior is correct.

We treat the sequence \<EOF> the same as \<whitespace>, i.e. the token finishes right there so it's just a single "\". This would generate an empty identifier which ctags doesn't like, so as suggested by @roccomao we add the leading \ to that corner case, and also to all identifier names starting with \. Note that this is only done for the first character.

If I'm interpreting the C standard (§7.4) correctly, passing EOF to isgraph() et al. is valid and returns false, since EOF is not a graphic character. So EOF will be treated the same as space, whitespace, control characters, etc., which is what we want.

(PS: Note that I'm using while and not do while to handle this case, since the sequence of characters after the \ may be empty, whereas in the case of regular identifiers there's one or more characters, hence do while)

Member

I see.

masatake previously approved these changes Dec 3, 2024
@hirooih
Contributor

hirooih commented Dec 3, 2024

First, thank you for your great contribution.

To avoid issues, empty identifiers (\) or identifiers that start with (a second) \ keep the leading \ even if it's technically not part of the name.

I doubt that this specification is correct.


This definition also theoretically allows an empty identifier,

LRM just says:

Neither the leading backslash character nor the terminating white space is considered to be part of the identifier. Therefore, an escaped identifier \cpu3 is treated the same as a nonescaped identifier cpu3.

It is not clear an empty identifier is allowed.
On the other hand, "\\ should be treated the same as a nonescaped identifier \" from this definition.

tags (5)

The tags file is a list of lines, each line in the format:

{tagname}<Tab>{tagfile}<Tab>{tagaddress}

{tagname}

Any identifier, not containing white space..

EXCEPTION: Universal Ctags violates this item of the proposal; tagname may contain spaces. However, tabs are not allowed.

This format can support an empty identifier "theoretically". Even if it were not currently supported, it would not be worth the bother to support it, I think.

@cousteaulecommandant
Contributor Author

To avoid issues, empty identifiers (\) or identifiers that start with (a second) \ keep the leading \ even if it's technically not part of the name.

I doubt that this specification is correct.

Well, it's what the standard says; the \ is not considered to be part of the identifier, so it's there just to indicate that an escaped identifier follows. But we're just discussing semantics at this point.
(Although you may argue that ctags should explicitly indicate "dangerous" identifiers such as \end, \1+1, or \\x by explicitly adding the leading \ always, to remind the user that they need to include the \ to refer to the identifier, with the exception being identifiers that CAN be written without the \, such as \foo. That behavior could be implemented; I think ctags provides all the tools for that; but I think it's overcomplicating things.)

I can remove that part from the commit message if you prefer.

It is not clear an empty identifier is allowed.

Not in that paragraph, but from the LRM [1], section A.9.3:

escaped_identifier ::= \ {any_printable_ASCII_character_except_white_space} white_space

with { } meaning zero or more. If they meant one or more, they would've put an extra character explicitly (they do so e.g. in the definition of system_task_identifier).

On the other hand, "\\ should be treated the same as a nonescaped identifier \" from this definition.

But \ is not a (valid) nonescaped identifier, so that statement doesn't apply. Same for things like \1+1 or \#CS. It does apply to cpu3 which is a valid nonescaped identifier though.

This format can support an empty identifier "theoretically". Even if it were not currently supported, it would not be worth the bother to support it, I think.

Yeah, definitely not. As I said, I don't think ctags must encode exactly what the standard says the identifier is, and even if it could, I imagine it would cause all sorts of trouble. So adding the \ is fine in my opinion.

Footnotes

  1. The Verilog-2005 standard says it's "any character except white space", and doesn't specify that it has to be printable, but I'm ignoring that detail and assuming they didn't mean to include all the ASCII control characters, so isgraph() will do.

@hirooih
Contributor

hirooih commented Dec 3, 2024

escaped_identifier ::= \ {any_printable_ASCII_character_except_white_space} white_space

OK. It is clear an empty identifier is allowed.

But \ is not a (valid) nonescaped identifier, so that statement doesn't apply.

I don't understand what you mean, here...

On the other hand, "\\ should be treated the same as a nonescaped identifier \" from this definition.

I meant;

localparam \\ = \ ;

this line should emit;

\\	input.v	/^localparam \\\\ = \\ ;$/;"	c	module:themodule#(x=42)

instead of (the line 6 of expected.tags);

\\\\	input.v	/^localparam \\\\ = \\ ;$/;"	c	module:themodule#(x=42)

How can we explain that the latter follows the LRM?

@masatake
Member

masatake commented Dec 4, 2024

You can make a tag with an empty string when you set true to ParserDefinition::allowNullTag.
If you don't want to touch the parser-wide configuration, we can add a new member allowNullTag to tagEntryInfo.

@masatake masatake self-requested a review December 4, 2024 01:12
@masatake masatake dismissed their stale review December 4, 2024 01:13

The issue solved in this pull request is more complex than I assumed.

@roccomao
Contributor

roccomao commented Dec 4, 2024

How can we explain that the latter follows the LRM?

This is indeed not in accordance with the LRM. But let me first explain the previous suggestion in detail and sort out the different opinions; then we can decide how to make the trade-off.

Because the LRM is ambiguous, I verified the code in my previous comment in QuestaSim, and it works fine. I haven't tested other compilers, but this should give us a reference. Since we don't care what the official name of an identifier is, we only need to give each identifier a tag name.

First, here is the example given in LRM section 5.6.1. The document only describes escaped keywords as an exception.

| declaration            | LRM identifier        |
| :--------------------- | :-------------------- |
| \busa+index            | busa+index            |
| \-clock                | -clock                |
| \***error-condition*** | ***error-condition*** |
| \net1/\net2            | net1/\net2            |
| \{a,b}                 | {a,b}                 |
| \a*(b+c)               | a*(b+c)               |
| \net                   | \net                  |

The compiler does the opposite: it does not consider keywords, and only looks at whether the identifier contains special characters to determine whether it should have a leading \ character.

| declaration            | compiler identifier    |
| :--------------------- | :--------------------- |
| \busa+index            | \busa+index            |
| \-clock                | \-clock                |
| \***error-condition*** | \***error-condition*** |
| \net1/\net2            | \net1/\net2            |
| \{a,b}                 | \{a,b}                 |
| \a*(b+c)               | \a*(b+c)               |
| \net                   | net                    |

Next, let's look at the empty identifier (a single \), again using my previous example to test:

  1. The compiler supports the empty identifier (a single \ followed by space).
  2. If there are special characters at any position, we should add \ at the beginning (like \r\n, \test*var).
  3. Normal identifiers, even keywords, ignore the leading \ (like \begin, \test).

So, finally, how do we create tag names?

  1. We can follow LRM completely, that is, remove the leading \ from all identifiers except \keyword (escaped keyword). The example is as follows:

    (identifier) ->  (tag name)
    \busa+index  ->  busa+index
    \net1/\net2  ->  net1/\net2
    \*           ->  *
    \\           ->  \
    \\\          ->  \\
    \test        ->  test
    \begin       ->  \begin
    \            ->  ERROR
    

    But this will require comparing all keywords, and will cause problems if a single \, the empty identifier, is parsed.

  2. We can also follow the compiler's parsing ideas and only consider whether there are special characters in any position, regardless of keywords. The example is as follows:

    (identifier) ->  (tag name)
    \busa+index  ->  \busa+index
    \net1/\net2  ->  \net1/\net2
    \*           ->  \*
    \\           ->  \\
    \\\          ->  \\\
    \test        ->  test
    \begin       ->  begin
    \            ->  \
    

    This solves the problem of empty identifiers (a single \), allowing us to create a tag name \ for empty identifier, but this implementation requires us to scan the entire identifier to see if it contains special characters to decide whether to add a leading \ to the tag name.

  3. So my personal idea and previous suggestions combine the above two points, mainly to make the parsing of tags easier. The example is as follows:

    (identifier) ->  (tag name)
    \busa+index  ->  busa+index
    \net1/\net2  ->  net1/\net2
    \*           ->  *
    \\           ->  \\
    \\\          ->  \\\
    \test        ->  test
    \begin       ->  begin
    \            ->  \
    

    That is, by default, we do not consider keywords, and then follow LRM: "Neither the leading backslash character nor the terminating white space is considered to be part of the identifier". But in order to accept the empty identifier (a single \) and give it the tag name \, we also need to add an extra \ to all identifiers that start with \, otherwise, for example, \ and \\ will have the same tag name \.

    The advantage of this idea is that it accepts empty identifiers (a single \). At the same time, there is no need to compare whether the entire identifier is a keyword (escaped keyword), and there is no need to scan the entire identifier to see if it contains special characters. It only needs to determine whether the character after the beginning \ character is also \. This is also what the current latest PR implements, and it is probably the view that @cousteaulecommandant agrees with. @hirooih what do you think?
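To make the three proposals concrete, here is a rough C sketch that maps the characters after the leading `\` (without the terminating whitespace) to a tag name under each option. `makeTagName()`, the strategy names and the four-entry keyword list are hypothetical and for illustration only; a real implementation would use the parser's full keyword table.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately incomplete keyword list, for illustration. */
static const char *const keywords[] = { "begin", "end", "module", "wire" };

static int isKeyword (const char *s)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp (s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* simple_identifier ::= [a-zA-Z_]{[a-zA-Z0-9_$]}, checked as plain ASCII. */
static int isSimpleIdentifier (const char *s)
{
    if (!((*s >= 'a' && *s <= 'z') || (*s >= 'A' && *s <= 'Z') || *s == '_'))
        return 0;                       /* also rejects the empty string */
    for (s++; *s; s++)
        if (!((*s >= 'a' && *s <= 'z') || (*s >= 'A' && *s <= 'Z')
              || (*s >= '0' && *s <= '9') || *s == '_' || *s == '$'))
            return 0;
    return 1;
}

enum tagNameStrategy { FOLLOW_LRM = 1, FOLLOW_COMPILER = 2, PR_COMPROMISE = 3 };

/* body: the characters after the leading '\'. Writes the tag name to out
 * (truncated if out is too small). Returns 0 if no tag name is produced. */
static int makeTagName (enum tagNameStrategy strategy, const char *body,
                        char *out, size_t outSize)
{
    int addBackslash;

    switch (strategy)
    {
    case FOLLOW_LRM:
        if (body[0] == '\0')
            return 0;                             /* empty identifier: error */
        addBackslash = isKeyword (body);          /* only escaped keywords */
        break;
    case FOLLOW_COMPILER:
        addBackslash = !isSimpleIdentifier (body); /* any special character */
        break;
    case PR_COMPROMISE:
    default:
        addBackslash = (body[0] == '\0' || body[0] == '\\'); /* first char only */
        break;
    }
    snprintf (out, outSize, "%s%s", addBackslash ? "\\" : "", body);
    return 1;
}
```

Note that option 3 only ever inspects the first character of the name, which is the simplicity argument made above.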

@cousteaulecommandant
Copy link
Contributor Author

How can we explain that the latter follows the LRM?

My bad; I misunderstood what you meant, but I get you now.

As @roccomao already mentioned, the extra \ was added following their suggestion; the original commit didn't have it and just removed the \ from ALL escaped identifiers, even the empty one. Ctags didn't like this and complained. So I thought that the suggestion to add the \ to handle that case was a good idea (and to any tag starting with \, because otherwise \ and \\ would be indistinguishable).

The options I see to handle escaped identifiers are:

  1. Follow LRM strictly and always remove the \, even for empty identifiers. We should modify the behavior so that empty identifiers are allowed.
  2. Follow LRM strictly but ignore the \ identifier; if someone decides to use it, don't list it as a tag.
  3. Add a leading \ only to identifiers starting with \ to disambiguate (@roccomao's suggestion).
  4. Add a leading \ to all identifiers that aren't valid nonescaped identifiers by checking their characters, but not to identifiers that look like keywords (e.g. \end -> end). (Note that my current code will still interpret these as identifiers and not keywords because I'm assigning the token kind explicitly.)
  5. Add a leading \ to all identifiers that need to be escaped, either because they contain special characters or because they're keywords. This approach matches "what you would need to write" to use the identifier.

Option 1 is the most strictly compliant with the LRM "semantic" definition of an identifier, and option 5 is technically also what the LRM defines as an escaped_identifier (but this is from a syntax point of view). The rest are less "compliant" compromise solutions.

If, as @masatake said, setting allowNullTag is an option (and doesn't involve changing anything outside of the Verilog parser and can be enabled for Verilog only), then I'd say that (1.) is the best approach.

@hirooih
Contributor

hirooih commented Dec 4, 2024

@roccomao and @cousteaulecommandant,

Thank you for your detailed explanation.

If, as @masatake said, setting allowNullTag is an option (and doesn't involve changing anything outside of the Verilog parser and can be enabled for Verilog only), then I'd say that (1.) is the best approach.

@cousteaulecommandant

Could you try "1." with the following patch?

diff --git a/parsers/verilog.c b/parsers/verilog.c
index 7e69640ef..754c16026 100644
--- a/parsers/verilog.c
+++ b/parsers/verilog.c
@@ -2192,6 +2192,7 @@ extern parserDefinition* VerilogParser (void)
        def->parser     = findVerilogTags;
        def->initialize = initializeVerilog;
        def->selectLanguage  = selectors;
+       def->allowNullTag = true;
        return def;
 }
 
@@ -2208,5 +2209,6 @@ extern parserDefinition* SystemVerilogParser (void)
        def->extensions = extensions;
        def->parser     = findVerilogTags;
        def->initialize = initializeSystemVerilog;
+       def->allowNullTag = true;
        return def;
 }

With this patch, the current master passes all tests (none of which contain null tags).

@masatake
Member

masatake commented Dec 4, 2024

If you want, I will add this change.

commit d78b623bee0d963f39274ce28cfd2d5318e14fbf (HEAD -> master)
Author: Masatake YAMATO <yamato@redhat.com>
Date:   Wed Dec 4 22:05:10 2024 +0900

    main: add allowNullTag flag to tagEntryInfo
    
    We have allowNullTag in parserDefinition already but enabling the new
    flag has a much smaller impact.
    
    Signed-off-by: Masatake YAMATO <yamato@redhat.com>

diff --git a/main/entry.c b/main/entry.c
index 6795c1b82..bdd509271 100644
--- a/main/entry.c
+++ b/main/entry.c
@@ -1938,7 +1938,7 @@ extern int makeTagEntry (const tagEntryInfo *const tag)
 
    if (tag->name [0] == '\0' && (!tag->placeholder))
    {
-       if (!doesInputLanguageAllowNullTag())
+       if (! (doesInputLanguageAllowNullTag() || tag->allowNullTag))
            error (NOTICE, "ignoring null tag in %s(line: %lu, language: %s)",
                   getInputFileName (), tag->lineNumber,
                   getLanguageName (tag->langType));
diff --git a/main/entry.h b/main/entry.h
index 19ac0865d..f600fa652 100644
--- a/main/entry.h
+++ b/main/entry.h
@@ -76,6 +76,7 @@ struct sTagEntryInfo {
                                             * Set in the cork queue; don't touch this.*/
    unsigned int boundaryInfo: 2; /* info about nested input stream */
    unsigned int inIntevalTab:1;
+   unsigned int allowNullTag:1;    /* allow a tag with an empty string. */
 
    unsigned long lineNumber;     /* line number of tag;
                                     use updateTagLine() for updating this member. */

@hirooih
Contributor

hirooih commented Dec 4, 2024

@masatake san,

Thank you. But I don't think the Verilog parser needs this.

As far as I can tell, the escaped identifier we are discussing in this thread is the only tag that can be null in Verilog and SystemVerilog, so we don't have to specify it for each tag.

@cousteaulecommandant
Contributor Author

cousteaulecommandant commented Dec 4, 2024

@cousteaulecommandant

Could you try "1." with the following patch?

I tested it with a null tag and it didn't create an entry for it. Is that expected? (It did suppress the warning message though; without @hirooih's patch it displays the warning.)

@@ -2,11 +2,10 @@ themodule#(x=42)	input.v	/^module \\themodule#(x=42) ($/;"	m
 clk,rst,	input.v	/^    input wire \\clk,rst, , \\d ,$/;"	p	module:themodule#(x=42)
 d	input.v	/^    input wire \\clk,rst, , \\d ,$/;"	p	module:themodule#(x=42)
 1+1=2	input.v	/^    output reg \\1+1=2$/;"	p	module:themodule#(x=42)
-\\	input.v	/^localparam \\ = 1;$/;"	c	module:themodule#(x=42)
-\\\\	input.v	/^localparam \\\\ = \\ ;$/;"	c	module:themodule#(x=42)
-\\\\\\	input.v	/^localparam \\\\\\ = \\\\ ;$/;"	c	module:themodule#(x=42)
+\\	input.v	/^localparam \\\\ = \\ ;$/;"	c	module:themodule#(x=42)
+\\\\	input.v	/^localparam \\\\\\ = \\\\ ;$/;"	c	module:themodule#(x=42)
 r\\n	input.v	/^localparam \\r\\n = \\\\\\ ;$/;"	c	module:themodule#(x=42)
-\\\\r\\n	input.v	/^localparam \\\\r\\n = \\r\\n ;$/;"	c	module:themodule#(x=42)
-\\\\\\r\\n	input.v	/^localparam \\\\\\r\\n = \\\\r\\n ;$/;"	c	module:themodule#(x=42)
+\\r\\n	input.v	/^localparam \\\\r\\n = \\r\\n ;$/;"	c	module:themodule#(x=42)
+\\\\r\\n	input.v	/^localparam \\\\\\r\\n = \\\\r\\n ;$/;"	c	module:themodule#(x=42)
 end	input.v	/^always @(posedge \\clk,rst, ) begin : \\end$/;"	b	module:themodule#(x=42)
-\\\\end	input.v	/^    if (\\\\\\r\\n ) begin : \\\\end$/;"	b	block:themodule#(x=42).end
+\\end	input.v	/^    if (\\\\\\r\\n ) begin : \\\\end$/;"	b	block:themodule#(x=42).end

If this is the expected behavior I'll push it.

(PS: please don't pay attention to how I accidentally closed and reopened this PR.)

@cousteaulecommandant
Contributor Author

In fact, is allowNullTag tested at all? The only other language that uses it is JSON and the unit tests don't include empty labels.

{
  "": "empty",
  "x": "nonempty"
}

only yields:

x	input.json	/^  "x": "nonempty"$/;"	s

@masatake
Member

masatake commented Dec 5, 2024

I misunderstood. Ctags cannot make an empty tag.
Setting allowNullTag = true only suppresses the warning when a parser tries to make an empty tag. Sorry for confusing you.
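A minimal model of the behaviour described here, assuming it matches the `makeTagEntry()` check quoted earlier in this thread: an empty name is never written, and `allowNullTag` only decides whether the notice is printed. `wouldEmitTag()` is a hypothetical helper, not part of the ctags API, and placeholder tags are left out for simplicity.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: returns true if a tag with this name would be written.
 * allowNullTag does not make empty tags possible; it only silences the
 * notice (message text abbreviated here). */
static bool wouldEmitTag (const char *name, bool allowNullTag)
{
    if (name[0] == '\0')
    {
        if (!allowNullTag)
            fprintf (stderr, "Notice: ignoring null tag\n");
        return false;   /* dropped either way */
    }
    return true;
}
```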

@cousteaulecommandant
Contributor Author

Don't worry :)

So what do we do then? Option 1 is not available. Shall we go for option 2? That means not creating tags for the null identifier, i.e., there is one identifier ctags will ignore. This can be done via the allowNullTag option (the name is misleading; should be "tagMayBeNull"), explicitly in the parsing code (if identifier is empty, return without creating a tag) or simply trigger a warning because we want to inform the user that a tag was dropped.

If we do want to include a tag for \, that leaves us on options 3 to 5 – we have to give it a made up name as a replacement for the empty string. It doesn't even have to start with \; it could be (null) or (null identifier) (the latter contains a space so it's impossible that it is mistaken for a literal \(null)).

@hirooih
Copy link
Contributor

hirooih commented Dec 5, 2024

That means not creating tags for the null identifier, i.e., there is one identifier ctags will ignore.

I think this is the way to go.

This can be done via the allowNullTag option (the name is misleading; should be "tagMayBeNull"), explicitly in the parsing code (if identifier is empty, return without creating a tag)

I think ignoreNullTag is a better name.

or simply trigger a warning because we want to inform the user that a tag was dropped.

I think this is better, but just ignoring is enough.


And I hope someone will implement a true allowNullTag. Won't you, Masatake-san? :-)

@cousteaulecommandant
Contributor Author

I think ignoreNullTag is a better name.

Perhaps, yeah.

or simply trigger a warning because we want to inform the user that a tag was dropped.

I think this is better, but just ignoring is enough.

I mentioned this because triggering a warning was the behavior you got in my original commit, since the default behavior for ctags is to display the message "Notice: ignoring null tag". If I remove the def->allowNullTag = true; I have added (but not pushed yet) we'll get this behavior.
Then again, keeping it enabled "just in case support for null tags is added in the future" might be a good idea.
Whichever you decide I'll do :)

And I hope someone will implement the true allowNullTag. Don't you? Masatake-san :-)

This could be useful for a variety of languages. JSON is currently the only one that enables this, but e.g. Tcl allows empty variable/array/function names. (No idea how this would be handled internally, but I'll let the ctags gurus figure it out.) :)

@hirooih
Copy link
Contributor

hirooih commented Dec 5, 2024

If I remove the def->allowNullTag = true; I have added (but not pushed yet) we'll get this behavior.

I see. Let's remove it!

@cousteaulecommandant
Contributor Author

cousteaulecommandant commented Dec 6, 2024

Do I remove the localparam \ from the unit test as well?
EDIT: I've preserved it for the time being, since it doesn't interfere with the unit tests and may result in a pleasant surprise in the future when these identifiers become supported.
EDIT 2: Nope! It seems to cause CI tests to fail (even though it worked for me locally). Removed. (And rebased onto master.)

@cousteaulecommandant cousteaulecommandant force-pushed the v_escaped branch 2 times, most recently from 3fd3c3c to fb0084d on December 6, 2024 00:56
Add support for escaped identifiers in Verilog and SystemVerilog:
`\` + zero or more non-whitespace characters + whitespace.

Note that the `\` itself isn't part of the identifier, and that `\foo`
is the same as just `foo` (unlike in VHDL), but that identifiers
identical to keywords such as `\begin` are also allowed.
This definition also theoretically allows an empty identifier, `\`,
which ctags currently ignores, issuing a notice.

Escaped identifiers are defined in the standard in section
3.7.1 (Verilog) / 5.6.1 (SystemVerilog).
@masatake
Member

masatake commented Dec 7, 2024

I have two concerns:

  1. The current implementation of libreadtags skips lines with empty names.
[yamato@dev64]~/var/ctags-github% cat tags | tail -3
x	/tmp/foo.sh	/^function x$/;"	f
	/tmp/foo.sh	/^function ""$/;"	f
y	/tmp/foo.sh	/^function y$/;"	f
[yamato@dev64]~/var/ctags-github% ./readtags -t tags -l
x	/tmp/foo.sh	/^function x$/
y	/tmp/foo.sh	/^function y$/

libreadtags skips such lines at

static int readTagLine (tagFile *const file, int *err)
{
	int result;
	do
	{
		result = readTagLineRaw (file, err);
	} while (result && *file->name.buffer == '\0');
	return result;
}

The line comparing with '\0' comes from Exuberant Ctags.
Our ancestors assumed an empty tag is invalid.

  2. The other file formats

u-ctags supports etags and xref. Do they consider null tags?
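The first concern can be reproduced with a simplified C sketch of the skip loop in `readTagLine()`: a line whose name field (the text before the first tab) is empty is silently dropped. `listNonEmptyTags()` is a hypothetical helper working on in-memory lines; it is not the libreadtags API.

```c
#include <stddef.h>
#include <stdio.h>

/* Prints the tag lines whose name field (text before the first tab) is
 * non-empty, roughly mimicking readTagLine()'s skip loop.
 * Returns the number of lines kept. */
static int listNonEmptyTags (const char *const lines[], size_t count)
{
    int kept = 0;
    for (size_t i = 0; i < count; i++)
    {
        if (lines[i][0] == '\t' || lines[i][0] == '\0')
            continue;   /* empty tag name: skipped */
        puts (lines[i]);
        kept++;
    }
    return kept;
}
```

Fed the three lines of the `tags` file shown above, it keeps only `x` and `y`, matching the `readtags` output.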

Could you open a separate issue to support a null tag?

@cousteaulecommandant
Contributor Author

I have two concerns:

This is regarding the idea of handling null tags as a future addition and not something blocking this PR I need to handle right now, right?

Could you open a separate issue to support a null tag?

I suppose this was addressed at @hirooih, right?

@hirooih hirooih mentioned this pull request Dec 7, 2024
Contributor

@hirooih hirooih left a comment

LGTM

Thanks a lot!

c = vGetc (); // skip leading '\'
// A single `\` would result in an empty identifier, which is unsupported.
// Add the initial `\` in that case, and also in case it starts with `\\`.
if (!isgraph (c) || c == '\\')
Contributor

5.6.1 Escaped identifiers

any of the printable ASCII characters in an identifier (the decimal values 33 through 126, or 21 through 7E in hexadecimal).

the C standard, §7.4

3 The term printing character refers to a member of a locale-specific set of characters, each of which occupies one printing position on a display device;
...
7.4.1.6 [The isgraph function]
...
2 The isgraph function tests for any printing character except space (' ').

Considering locales other than C, I think isgraph() cannot be used here.
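One locale-independent alternative, sketched under the assumption that the LRM range quoted above is the intended definition, is to test the byte range directly instead of calling `isgraph()`. `isVerilogPrintable()` is a hypothetical name; a helper like this could replace the `isgraph()` calls in the escaped-identifier loop. Bytes at or above 0x80 and EOF are rejected, so they would end the identifier.

```c
#include <stdio.h>   /* EOF */

/* LRM 5.6.1: printable ASCII is decimal 33..126 (0x21..0x7E).
 * Unlike isgraph(), the result does not depend on the current locale.
 * EOF (a negative value) falls outside the range and is rejected. */
static int isVerilogPrintable (int c)
{
    return c >= 0x21 && c <= 0x7E;
}
```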

@hirooih hirooih merged commit 4d5547e into universal-ctags:master Dec 8, 2024
75 checks passed
@hirooih
Contributor

hirooih commented Dec 8, 2024

I've merged this PR.

@cousteaulecommandant,

Again thank you very much!

@cousteaulecommandant
Contributor Author

Nice! And thank you for all the feedback :)

@cousteaulecommandant cousteaulecommandant deleted the v_escaped branch December 13, 2024 17:09