-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed ANTLRInputStream and ANTLRFileStream #3113
Conversation
A previous change to add std::string_view support to ANTLRInputStream for C++17 caused some trouble because of ABI changes. This has been changed to define 2 constructors, one for std::string_view (for C++17) and the original one for std::string. This is turn caused an error in ANTLRFileStream, which also takes a string in the constructor (but handling it as file name instead of input). To make this clearer the c-tor taking a std::string has been deleted in ANTLRFileStream and the class now requires to load input via the loadFile() function. This might cause some trouble for those users who had used the std::string constructor of ANTLRFileStream, but I think the better error reporting outweighs the little annoyance.
We have again a timeout in one of the C++ builds (@ericvergnaud, your box again?) . Unfortunately, I'm not allowed to rerun that task. |
@mike-lischke not one of my boxes |
I see the dropdown but all entries are disabled for me. And it looks as if the restarted build will again fail. |
- Added a default c-tor to the input stream to avoid an ambiquity. - Changed the input stream API so that it can take a string pointer + length and use that for UTF conversion, avoiding so unnecessary copies. Convenience methods exist to use a std::string or a std::string_view. - With that only a single load() method is necessary. - In ANTLRFileStream the other c-tors are now also deleted, as they make no sense there.
@xTachyon I picked up your suggestions and also fixed a number of issues I found. |
Looks good except that comment I made. |
@ericvergnaud You added Github actions to this repo in January, so you may be able to answer a question in this regard. How's that supposed to work for forks? I constantly get check errors now when I push something to my fork of antlr4. Any tip you can give me to handle this correctly? |
if (input.compare(0, 3, bom, 3) == 0) | ||
_data = antlrcpp::utf8_to_utf32(input.data() + 3, input.data() + input.size()); | ||
const char *bom = "\xef\xbb\xbf"; | ||
if (length > 3 && strncmp(data, bom, 3) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm just nitpicking at this point but wouldn't an empty bom be a valid empty string that could be valid for some grammars? So the check would be length >= 3
to accept and skip if it's just the bom. Or maybe I don't know my unicode. Sorry for all the disturbance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, a good code review is often inconvenient.
The approach there is that if there's enough to have a possible BOM then check it, if not just go ahead and load what you got. So even an empty string is "loaded" correctly in the else branch.
I'm assuming here that by "empty bom" you actually mean an empty input string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's what I mean: if only "\xef\xbb\xbf"
is passed in the function, length
will be 3 and the else
branch will be taken. Then, at least when used with utfcpp conversion functions, it will throw utf8::invalid_code_point
because it can't parse the bom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also thought about this case and found it would not be worth to be considered as it would mean no useful input was given. However, there could be a workflow where a BOM is always attached automatically, regardless of what input was given and empty input is often valid input. So it would make sense to strip off the BOM and deal with the empty input anyway. I changed the check therefore.
C++ is not <any other language where you don't need a useless constructor>
@parrt Here's a new C++ patch to fix ABI problems for file + input stream. Ready to be merged. |
A previous change to add std::string_view support to ANTLRInputStream for C++17 caused some trouble because of ABI changes. This has been changed to define 2 constructors, one for std::string_view (for C++17) and the original one for std::string.
This is turn caused an error in ANTLRFileStream, which also takes a string in the constructor (but handling it as file name instead of input). To make this clearer the c-tor taking a std::string has been deleted in ANTLRFileStream and the class now requires to load input via the loadFile() function. This might cause some trouble for those users who had used the std::string constructor of ANTLRFileStream, but I think the better error reporting outweighs the little annoyance.
This is supposed to handle issue #3109.