ReaderError with ASCII-8BIT encoding. #142
Comments
This can be simplified to the following:
The problem is in RDF::NTriples::Reader#read_literal, which fails when it tries to match on LITERAL_PLAIN. There must be some other place where the input is not converted to UTF-8, which is certainly annoying.
The RDF reader chokes on certain character combinations if the raw data is not encoded as UTF-8. See ruby-rdf/rdf#142. Fedora does not store character encoding, so by default the data comes back as ASCII-8BIT.
The problem seems to be that the object is not valid ASCII-8BIT. #force_encoding masks this, but if you use #encode instead, you get an undefined conversion error. What seems to be tripping things up is transcoding data that is not legal in its declared encoding. As you note in your patch, always giving RDF.rb UTF-8 is probably the right thing. In any case, RDF Literals only cover Unicode (see RDF Concepts), so giving them anything else is nonsensical.
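To make the difference between #force_encoding and #encode concrete, here is a minimal sketch using only core Ruby String methods. The sample bytes and variable names are made up for illustration and are not taken from the failing data in this issue; the sketch only assumes that the raw bytes are in fact UTF-8 that lost its encoding tag.

```ruby
# Hypothetical sample: the UTF-8 bytes for "café", tagged as binary the way
# data arrives from a store that does not record the character encoding.
raw = "caf\xC3\xA9".b # String#b returns a copy tagged ASCII-8BIT

# #encode tries to transcode byte by byte. Bytes >= 0x80 have no defined
# mapping from ASCII-8BIT to UTF-8, so this raises.
transcode_error =
  begin
    raw.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

# #force_encoding only changes the encoding tag; the bytes stay untouched.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)

# Relabelling does not validate anything, so check explicitly before parsing.
utf8_ok = utf8.valid_encoding?

# Bytes that are not well-formed UTF-8 (here a lone Latin-1 0xE9) remain
# invalid after relabelling; #force_encoding just hides the problem.
bad = "caf\xE9".b.force_encoding(Encoding::UTF_8)
bad_ok = bad.valid_encoding?
```

Checking #valid_encoding? after relabelling is one way to fail early with a clear message instead of hitting a regular expression mismatch deep inside the reader.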
This code works:
This causes an error: