Fixed ANTLRInputStream and ANTLRFileStream #3113

mike-lischke · 2021-03-09T08:47:50Z

A previous change to add std::string_view support to ANTLRInputStream for C++17 caused some trouble because of ABI changes. This has been changed to define two constructors: one taking std::string_view (C++17 only) and the original one taking std::string.
This in turn caused an error in ANTLRFileStream, which also takes a string in its constructor (but treats it as a file name instead of input). To make this clearer, the c-tor taking a std::string has been deleted in ANTLRFileStream and the class now requires input to be loaded via the loadFile() function. This might cause some trouble for users who relied on the std::string constructor of ANTLRFileStream, but I think the better error reporting outweighs the little annoyance.

This is supposed to handle issue #3109.
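For readers who want to see the shape of the change, here is a minimal sketch of the two ideas in the description: a string_view overload that only exists under C++17, and a file stream whose std::string constructor is deleted so a file name has to go through an explicit load call. The class names `SketchInputStream` and `SketchFileStream`, the `text()` accessor, and the `bool` return value of `loadFile()` are assumptions made for illustration; this is not the runtime's actual code.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <type_traits>
#if __cplusplus >= 201703L
#include <string_view>
#endif

// Hypothetical stand-in for the input stream class; names are illustrative only.
class SketchInputStream {
public:
  SketchInputStream() = default;

  // Original overload, available in every language mode.
  explicit SketchInputStream(const std::string &input) : _data(input) {}

#if __cplusplus >= 201703L
  // Additional overload, only compiled for C++17 and later.
  explicit SketchInputStream(std::string_view input) : _data(input) {}
#endif

  const std::string &text() const { return _data; }

protected:
  std::string _data;
};

// Hypothetical stand-in for the file stream class.
class SketchFileStream : public SketchInputStream {
public:
  SketchFileStream() = default;

  // A string argument could mean "file name" or "input text", so the
  // overload is deleted and misuse becomes a compile-time error.
  SketchFileStream(const std::string &) = delete;

  // Reads the whole file as raw bytes; returns false if it cannot be opened.
  bool loadFile(const std::string &fileName) {
    assert(!fileName.empty() && "caller must supply a file name");
    std::ifstream in(fileName, std::ios::binary);
    if (!in) return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();
    _data = buffer.str();
    return true;
  }
};

// Constructing the file stream from a string no longer compiles.
static_assert(!std::is_constructible<SketchFileStream, std::string>::value,
              "a file name can no longer be passed to the constructor");
```

With this layout, code that previously wrote something like `SketchFileStream stream(fileName);` fails at compile time with a "deleted function" diagnostic instead of silently parsing the file name as input, which is the error-reporting benefit the description refers to.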


mike-lischke · 2021-03-09T14:53:29Z

We again have a timeout in one of the C++ builds (@ericvergnaud, your box again?). Unfortunately, I'm not allowed to rerun that task.

ericvergnaud · 2021-03-09T15:14:42Z

@mike-lischke not one of my boxes
I restarted it, not sure why you're not able to do that...

mike-lischke · 2021-03-09T15:28:27Z

I see the dropdown but all entries are disabled for me. And it looks as if the restarted build will again fail.

- Added a default c-tor to the input stream to avoid an ambiguity.
- Changed the input stream API so that it can take a string pointer + length and use that for UTF conversion, avoiding unnecessary copies. Convenience methods exist to use a std::string or a std::string_view.
- With that, only a single load() method is necessary.
- In ANTLRFileStream the other c-tors are now also deleted, as they make no sense there.
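The pointer-plus-length design in this commit message can be sketched roughly as follows. One `load()` does the work on a raw character range, and the std::string and std::string_view constructors merely forward to it, so no intermediate string is created on the way in. The class name `SketchStream` and its accessors are hypothetical, and the UTF conversion step is replaced here by a plain byte copy to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#if __cplusplus >= 201703L
#include <string_view>
#endif

// Hypothetical class illustrating the delegation pattern only.
class SketchStream {
public:
  // Default c-tor: an empty stream that can be filled later via load().
  SketchStream() = default;

  // Core constructor working on a raw character range.
  SketchStream(const char *data, std::size_t length) { load(data, length); }

  // Convenience constructors forward the caller's buffer without copying it first.
  explicit SketchStream(const std::string &input)
      : SketchStream(input.data(), input.size()) {}
#if __cplusplus >= 201703L
  explicit SketchStream(std::string_view input)
      : SketchStream(input.data(), input.size()) {}
#endif

  // The single load method. The runtime would run its UTF conversion here;
  // this sketch just copies the bytes (an assumption made for brevity).
  void load(const char *data, std::size_t length) {
    assert(data != nullptr || length == 0);
    _data.assign(data, length);
  }

  std::size_t size() const { return _data.size(); }
  const std::string &text() const { return _data; }

private:
  std::string _data;
};
```

Note that with both a `const std::string &` and a `std::string_view` overload present, passing a string literal directly is ambiguous under C++17, since either conversion is equally good. Callers pass a std::string, a std::string_view, or a pointer and length explicitly.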

mike-lischke · 2021-03-10T10:22:47Z

@xTachyon I picked up your suggestions and also fixed a number of issues I found.

runtime/Cpp/runtime/src/ANTLRInputStream.cpp

xTachyon · 2021-03-10T10:35:33Z

Looks good except that comment I made.

mike-lischke · 2021-03-10T10:42:55Z

@ericvergnaud You added GitHub Actions to this repo in January, so you may be able to answer a question in this regard. How is that supposed to work for forks? I constantly get check errors now when I push something to my fork of antlr4. Any tips you can give me to handle this correctly?

xTachyon · 2021-03-10T10:59:59Z

runtime/Cpp/runtime/src/ANTLRInputStream.cpp

-  if (input.compare(0, 3, bom, 3) == 0)
-    _data = antlrcpp::utf8_to_utf32(input.data() + 3, input.data() + input.size());
+  const char *bom = "\xef\xbb\xbf";
+  if (length > 3 && strncmp(data, bom, 3) == 0)


Maybe I'm just nitpicking at this point but wouldn't an empty bom be a valid empty string that could be valid for some grammars? So the check would be length >= 3 to accept and skip if it's just the bom. Or maybe I don't know my unicode. Sorry for all the disturbance.

No worries, a good code review is often inconvenient.

The approach there is that if there's enough to have a possible BOM then check it, if not just go ahead and load what you got. So even an empty string is "loaded" correctly in the else branch.

I'm assuming here that by "empty bom" you actually mean an empty input string.

Here's what I mean: if only "\xef\xbb\xbf" is passed in the function, length will be 3 and the else branch will be taken. Then, at least when used with utfcpp conversion functions, it will throw utf8::invalid_code_point because it can't parse the bom.

Yes, I also thought about this case and decided it was not worth considering, since it would mean no useful input was given. However, there could be a workflow where a BOM is always attached automatically, regardless of what input was given, and empty input is often valid input. So it makes sense to strip off the BOM and deal with the empty input anyway. I have therefore changed the check.
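The outcome of this thread (accept input that consists of nothing but a BOM) can be illustrated with a small standalone helper. This is only a sketch under assumptions: the function name `stripUtf8Bom` is hypothetical, and it returns the remaining bytes as a std::string instead of feeding them to a UTF-8 to UTF-32 conversion as the runtime does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the input with a leading UTF-8 BOM removed.
std::string stripUtf8Bom(const char *data, std::size_t length) {
  assert(data != nullptr || length == 0);
  const char *bom = "\xef\xbb\xbf";

  // ">= 3" rather than "> 3": input that is exactly the BOM is treated as
  // empty input instead of being handed to the decoder as payload.
  if (length >= 3 && std::strncmp(data, bom, 3) == 0) {
    data += 3;
    length -= 3;
  }
  return std::string(data, length);
}
```

With `length > 3`, a three-byte input holding only the BOM would skip the stripping branch and reach the conversion with the BOM bytes still in place, which is the case the reviewer reports as throwing in utfcpp-based conversion. With `length >= 3` the same input yields an empty string.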

mike-lischke · 2021-03-10T14:23:41Z

@parrt Here's a new C++ patch to fix ABI problems for file + input stream. Ready to be merged.

mike-lischke added 4 commits January 3, 2021 19:05

Merge branch 'master-upstream'

e24f97d

Merge branch 'master-upstream'

8391136

Merge branch 'master-upstream'

faa64fd

mike-lischke mentioned this pull request Mar 9, 2021

[C++] string_view for all target compilations #3109

Closed

mike-lischke added 2 commits March 9, 2021 11:01

Fixed C++ tests and build warnings

93c0621

Wrong load call fixed.

f881e3e

xTachyon reviewed Mar 10, 2021

mike-lischke added 2 commits March 10, 2021 11:49

Added a sanity check for input size.

5731e64

Build fix

9d1737f

xTachyon reviewed Mar 10, 2021

mike-lischke added 2 commits March 10, 2021 12:07

Another build fix

ff629d5

C++ is not <any other language where you don't need a useless constructor>

Small improvement

4431f1f

xTachyon approved these changes Mar 10, 2021

parrt added the target:cpp label Mar 10, 2021

parrt merged commit d889ba8 into antlr:master Mar 10, 2021

parrt added this to the 4.9.2 milestone Mar 11, 2021
