Since boost 1.72 spirit::unicode::char_ fails to parse non-ASCII #678
#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

int main()
{
    typedef std::string::const_iterator iterator_type;
    namespace qi = boost::spirit::qi;
    namespace unicode = boost::spirit::unicode;

    // UTF-8 input: a quoted string containing U+23F3 (bytes E2 8F B3).
    std::string input("\"Test \xe2\x8f\xb3\"");
    qi::rule<iterator_type, std::string(), unicode::space_type> quoted_string =
        qi::lexeme['"' >> +(unicode::char_ - '"') >> '"'];

    iterator_type iter = input.begin();
    iterator_type end = input.end();
    std::string output;
    bool r = qi::phrase_parse(iter, end, quoted_string, unicode::space, output);

    if (r && iter == end)
        std::cout << "successfully parsed " << input << " to " << output << std::endl;
    else
        std::cout << "failed to parse " << input << std::endl;
    return 0;
}
Edit: Typo.
This sounds like a duplicate of #675.
Use a UTF-8 to UTF-32 conversion iterator; if that does not help, try to replace
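To make the suggestion above concrete, here is a minimal sketch of what a UTF-8 to UTF-32 conversion does to the input from the report. It is a stdlib-only stand-in, not Boost's implementation; in real code one would more likely wrap the string iterators in `boost::u8_to_u32_iterator` (from `boost/regex/pending/unicode_iterator.hpp`) and hand those to the parser. The names `decode_utf8` and `check_decode` are hypothetical, and the decoder deliberately skips overlong-encoding and surrogate checks.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-32 code points.
// Rejects bad lead bytes, truncated sequences and bad continuation bytes;
// does NOT reject overlong encodings or surrogates (sketch only).
inline std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Usage: the three bytes E2 8F B3 from the report become one code point.
inline void check_decode() {
    const std::u32string cps = decode_utf8("\"Test \xe2\x8f\xb3\"");
    assert(cps.size() == 8);                        // " T e s t space U+23F3 "
    assert(cps[6] == static_cast<char32_t>(0x23F3));
}
```

After such a conversion the parser sees one 32-bit code point per character instead of three raw bytes.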
I suppose this is not a duplicate. While the reason is basically the same (and it is well described here), mine is about character classification checks, and this one happens before that, when both Qi and X3 implicitly convert a value of signed type to
spirit/include/boost/spirit/home/support/char_encoding/unicode.hpp Lines 41 to 46 in db8bdf3
(0xE2 For example,
spirit/include/boost/spirit/home/support/char_encoding/standard.hpp Lines 36 to 42 in db8bdf3
@timo-schluessler your example would work if you use Also, as it turned out, my issue with
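The implicit conversion of a signed value described above can be illustrated without Spirit. This is a sketch under the assumption that the target is a 32-bit unsigned code point type; the function names are hypothetical and the range check is only a stand-in for whatever validation the encoding performs. On platforms where plain `char` is signed, the byte 0xE2 is a negative value and sign-extends when widened directly.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Widen a plain char directly. Where char is signed, the byte 0xE2 is
// the value -30 and becomes 0xFFFFFFE2 after conversion.
inline std::uint32_t widen_naive(char c) {
    return static_cast<std::uint32_t>(c);
}

// Widen via unsigned char first, which preserves the byte value 0xE2.
inline std::uint32_t widen_via_unsigned(char c) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}

// Hypothetical stand-in for a "valid code point" range check.
inline bool in_unicode_range(std::uint32_t cp) {
    return cp <= 0x10FFFF;
}

// Usage: compare both conversions for the first byte of U+23F3 in UTF-8.
inline void check_widening() {
    const char byte = '\xE2';
    assert(widen_via_unsigned(byte) == 0xE2u);
    assert(in_unicode_range(widen_via_unsigned(byte)));
    if (std::numeric_limits<char>::is_signed) {
        // Sign extension pushes the value far outside the Unicode range.
        assert(widen_naive(byte) == 0xFFFFFFE2u);
        assert(!in_unicode_range(widen_naive(byte)));
    }
}
```

Either way, a single widened byte is not a code point: 0xE2 on its own is only the first third of the UTF-8 encoding of U+23F3.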
Thanks for your replies.
I'm also hit by this while upgrading to boost 1.76.0. I'm still a bit puzzled after reading the documentation, as there is minimal info about character encoding namespaces. What's the difference between How I see it is that
Obviously the difference is encoding.
How to print unicode _attr?
#define BOOST_SPIRIT_X3_UNICODE
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
auto f1 = [](auto& ctx){ std::cout << _attr(ctx) << std::endl; };
x3::rule<class tree, ast::tree> const tree = "tree";
auto const tree_def =
    lexeme[+(char_ - eol)][f1]
    >> int_
    ;
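One possible approach to the printing question, assuming the attribute arrives as a sequence of 32-bit code points (for example a `std::u32string`), is to encode it back to UTF-8 before writing it to `std::cout`. The sketch below is stdlib-only; `encode_utf8` and `check_encode` are hypothetical names, not part of Spirit, and the encoder does not reject surrogate code points.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-32 code points as a UTF-8 byte string.
// Sketch only: surrogates (U+D800..U+DFFF) are not rejected.
inline std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp <= 0x10FFFF) {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            throw std::invalid_argument("code point out of Unicode range");
        }
    }
    return out;
}

// Usage: U+23F3 round-trips to the bytes E2 8F B3.
inline void check_encode() {
    assert(encode_utf8(U"Test \u23F3") == "Test \xe2\x8f\xb3");
}
```

Inside a semantic action, the result of such a helper could then be streamed to `std::cout` as an ordinary `std::string`, provided the terminal expects UTF-8.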
Any updates on this issue? Will this be fixed or is there any workaround? I use code like this:
Does this mean I have to change my char into some UTF-8 type now to have a basic_istream for UTF-8? While the design decision might make sense, it breaks older code.
@tdauth I would suggest porting your code to X3, and use the char-related parsers provided in the correct namespaces, like @Kojoley mentioned: #678 (comment) Also: (quoting from #678 (comment))
I feel that the problem in this issue is thoroughly explained here, and I think that there's no actual issue in Spirit's implementation. As a sidenote:
So for UTF-8 files I could simply use std::u8string and char8_t instead of std::string and char in Boost Spirit 2? I don't know yet how to migrate to X3, so I am looking for the easiest solution.
@tdauth Yes, you should pass unicode iterators to Spirit.
Sample code to reproduce the issue:
Thanks to sehe who bisected the issue down to commit 16159fb.
Maybe this behavior is intentional, in which case I simply don't understand the use and meaning of spirit::unicode and BOOST_SPIRIT_UNICODE.