Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed ANTLRInputStream and ANTLRFileStream #3113

Merged
merged 11 commits into from
Mar 10, 2021
Merged

Fixed ANTLRInputStream and ANTLRFileStream #3113

merged 11 commits into from
Mar 10, 2021

Conversation

mike-lischke
Copy link
Member

A previous change to add std::string_view support to ANTLRInputStream for C++17 caused some trouble because of ABI changes. This has been changed to define 2 constructors, one for std::string_view (for C++17) and the original one for std::string.
This is turn caused an error in ANTLRFileStream, which also takes a string in the constructor (but handling it as file name instead of input). To make this clearer the c-tor taking a std::string has been deleted in ANTLRFileStream and the class now requires to load input via the loadFile() function. This might cause some trouble for those users who had used the std::string constructor of ANTLRFileStream, but I think the better error reporting outweighs the little annoyance.

This is supposed to handle issue #3109.

A previous change to add std::string_view support to ANTLRInputStream for C++17 caused some trouble because of ABI changes. This has been changed to define 2 constructors, one for std::string_view (for C++17) and the original one for std::string.
This is turn caused an error in ANTLRFileStream, which also takes a string in the constructor (but handling it as file name instead of input). To make this clearer the c-tor taking a std::string has been deleted in ANTLRFileStream and the class now requires to load input via the loadFile() function. This might cause some trouble for those users who had used the std::string constructor of ANTLRFileStream, but I think the better error reporting outweighs the little annoyance.
@mike-lischke
Copy link
Member Author

We have again a timeout in one of the C++ builds (@ericvergnaud, your box again?) . Unfortunately, I'm not allowed to rerun that task.

@ericvergnaud
Copy link
Contributor

@mike-lischke not one of my boxes
I restarted it, not sure why you're not able to do that...

@mike-lischke
Copy link
Member Author

I see the dropdown but all entries are disabled for me. And it looks as if the restarted build will again fail.

- Added a default c-tor to the input stream to avoid an ambiquity.
- Changed the input stream API so that it can take a string pointer + length and use that for UTF conversion, avoiding so unnecessary copies. Convenience methods exist to use a std::string or a std::string_view.
- With that only a single load() method is necessary.
- In ANTLRFileStream the other c-tors are now also deleted, as they make no sense there.
@mike-lischke
Copy link
Member Author

@xTachyon I picked up your suggestions and also fixed a number of issues I found.

@xTachyon
Copy link
Contributor

Looks good except that comment I made.

@mike-lischke
Copy link
Member Author

@ericvergnaud You added Github actions to this repo in January, so you may be able to answer a question in this regard. How's that supposed to work for forks? I constantly get check errors now when I push something to my fork of antlr4. Any tip you can give me to handle this correctly?

if (input.compare(0, 3, bom, 3) == 0)
_data = antlrcpp::utf8_to_utf32(input.data() + 3, input.data() + input.size());
const char *bom = "\xef\xbb\xbf";
if (length > 3 && strncmp(data, bom, 3) == 0)
Copy link
Contributor

@xTachyon xTachyon Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm just nitpicking at this point but wouldn't an empty bom be a valid empty string that could be valid for some grammars? So the check would be length >= 3 to accept and skip if it's just the bom. Or maybe I don't know my unicode. Sorry for all the disturbance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, a good code review is often inconvenient.

The approach there is that if there's enough to have a possible BOM then check it, if not just go ahead and load what you got. So even an empty string is "loaded" correctly in the else branch.

I'm assuming here that by "empty bom" you actually mean an empty input string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I mean: if only "\xef\xbb\xbf" is passed in the function, length will be 3 and the else branch will be taken. Then, at least when used with utfcpp conversion functions, it will throw utf8::invalid_code_point because it can't parse the bom.

Copy link
Member Author

@mike-lischke mike-lischke Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also thought about this case and found it would not be worth to be considered as it would mean no useful input was given. However, there could be a workflow where a BOM is always attached automatically, regardless of what input was given and empty input is often valid input. So it would make sense to strip off the BOM and deal with the empty input anyway. I changed the check therefore.

C++ is not <any other language where you don't need a useless constructor>
@mike-lischke
Copy link
Member Author

@parrt Here's a new C++ patch to fix ABI problems for file + input stream. Ready to be merged.

This was referenced Mar 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants