Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ox SAX Parser not Raising Errors #55

Closed
mgeyer opened this issue Apr 11, 2017 · 9 comments
Closed

Ox SAX Parser not Raising Errors #55

mgeyer opened this issue Apr 11, 2017 · 9 comments

Comments

@mgeyer
Copy link

mgeyer commented Apr 11, 2017

Hi There,

I was debugging some fault XML (invalid byte sequence for UTF-8) and noticed that Saxerator just ignored the error and stopped processing. It would process all nodes successfully up to that point, and then stop finding more elements.

At first I thought that was Saxerator having an issue, but upon closer inspection it turns out that is how the Ox SAX parser is configured by default. In short, unless an error method is defined for the Ox SAX parser it will silently close the remaining tags. You can check out (ohler55/ox#166) to get more details.

How do people feel about adding the error method and raising an exception when an error is found by the Ox parser?

@fanantoxa
Copy link
Collaborator

@mgeyer I'll research about that for OX, but also take a look how Rexml and Nokogiri will work on this kind of situation. Response with proposals after. Shouldn't take much time.

@soulcutter What do you think about it?

@mgeyer
Copy link
Author

mgeyer commented Apr 14, 2017

@fanantoxa Thanks for taking a look :) I would hope that both Rexml and Nokogiri at least have support for throwing errors on parsing errors. If they do then we should make sure that either all adapters throw errors or none do.

@soulcutter
Copy link
Owner

I think it makes sense for Saxerator to implement the #error method and raise an error.

@fanantoxa
Copy link
Collaborator

@soulcutter, @mgeyer Nokogiri and Ox have a error calbek. I have to test manualy with REXML, since error calback not documenter. Same for Added recently oga parser

@soulcutter
Copy link
Owner

I'm not -too- concerned that all parsers raise errors (or none of them) - I think the expectation is that different parsers may have slightly different behavior based on what they provide. If we can raise an error, I would like to, though, since I think it's a better practice than being silent.

@mgeyer
Copy link
Author

mgeyer commented Apr 28, 2017

Agreed, or at least a way to configure the parsers to raise errors.

@fanantoxa
Copy link
Collaborator

@mgeyer , @soulcutter Yep. Same for me. As I understand Oga raise and exception on error.
Didn't check REXML yet, but it might be the same.
I think the best idea here would be to catch this exceptions (or add error callback for first 2 parser) and raise our exception like Saxerator::DataCorruptedException. In this case any parser on error would behave the same.
How you look on that?

@fanantoxa
Copy link
Collaborator

@mgeyer, @soulcutter Finished in master

@mgeyer
Copy link
Author

mgeyer commented Jun 20, 2017

Thanks for the hard work @fanantoxa :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants