Consider using DOMDocument recovery mode #8
Comments
The problem is that false results could lead to subsequent errors in parsing and handling of the entire feed. Maybe it's an option to inject your own
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
You can use the recovery mode yourself:

// Import by URI
$httpClient = Zend\Feed\Reader\Reader::getHttpClient();
$response   = $httpClient->get(
    'https://github.com/zendframework/zend-feed/releases.atom'
);
$xmlString = $response->getBody();

// Create DOMDocument
$dom = new DOMDocument;
$dom->recover = true;
$dom->loadXML(trim($xmlString));

// Detect type
$type = Zend\Feed\Reader\Reader::detectType($dom);

// Create reader
if (0 === strpos($type, 'rss')) {
    $reader = new Zend\Feed\Reader\Feed\Rss($dom, $type);
}
if (0 === strpos($type, 'atom')) {
    $reader = new Zend\Feed\Reader\Feed\Atom($dom, $type);
}

var_dump($reader->getTitle()); // "Release notes from zend-feed"

Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
Thanks for the help! This is indeed what I ended up doing:
Originally posted by @Isinlor at zendframework/zend-feed#73 (comment)

@Isinlor Can you provide a link to a feed which is malformed and needs the recovery mode?
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
Here is one example: http://itbrokeand.ifixit.com/atom.xml

Code I used for testing:

<?php
$libxmlErrflag = libxml_use_internal_errors(true);
// Note: libxml_disable_entity_loader() is deprecated as of PHP 8.0.
$oldValue = libxml_disable_entity_loader(true);
$dom = new \DOMDocument;
//$dom->recover = true; // Allows parsing of slightly malformed feeds
$status = $dom->loadXML(file_get_contents("http://itbrokeand.ifixit.com/atom.xml"));
if (!$status) {
    // Build error message
    $error = libxml_get_last_error();
    if ($error instanceof \LibXMLError && $error->message != '') {
        $error->message = trim($error->message);
        $errormsg = "DOMDocument cannot parse XML: {$error->message}";
    } else {
        $errormsg = "DOMDocument cannot parse XML: Please check the XML document's validity";
    }
    throw new Exception($errormsg);
}

Originally posted by @Isinlor at zendframework/zend-feed#73 (comment)
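
The two approaches in this thread (strict parsing with an error message, and the `recover` flag) can be combined. The following is only a minimal sketch, not something zend-feed provides: the function name `loadXmlWithRecovery` is hypothetical, and it assumes that falling back to recovery mode is acceptable whenever strict parsing fails.

```php
<?php
/**
 * Hypothetical helper (not part of zend-feed): parse strictly first and
 * retry with libxml's recovery mode only when strict parsing fails.
 */
function loadXmlWithRecovery(string $xml): DOMDocument
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        $dom = new DOMDocument();
        if ($dom->loadXML($xml)) {
            return $dom; // well-formed, no recovery needed
        }
        libxml_clear_errors();

        // Second attempt: let libxml repair what it can.
        $dom = new DOMDocument();
        $dom->recover = true;
        if (!$dom->loadXML($xml) || $dom->documentElement === null) {
            $error  = libxml_get_last_error();
            $detail = $error instanceof LibXMLError ? trim($error->message) : 'unknown error';
            throw new RuntimeException("DOMDocument cannot parse XML: {$detail}");
        }
        return $dom;
    } finally {
        // Restore the caller's libxml error handling.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

// Usage: the root element is never closed, so strict parsing fails.
$dom = loadXmlWithRecovery('<feed><title>Hi</title>');
```

The resulting `DOMDocument` could then be passed to `Reader::detectType()` as in the earlier comment. Keep in mind that a recovered document may silently drop content, which is the "false results" concern raised at the start of the thread.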
@Isinlor
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)

I think your initial reaction was correct.
I missed it when I was working on it myself. Even so, I'm really curious how Firefox handles it, because I have no issues if I open:
Originally posted by @Isinlor at zendframework/zend-feed#73 (comment)

@Isinlor
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
https://blog.noredink.com/rss
There were some problems, but now I have not found anything.

http://itbrokeand.ifixit.com/atom.xml
Problem is (Also fails in a browser.)

http://aasnova.org/feed/
Two problems: 403 and wrong header. (Also fails in a browser. [Download])

https://blog.floydhub.com/rss/
Many feeds contain characters out of the legal range. Try the following:

preg_replace(
    '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
    ' ',
    $string
)

This should eliminate problems like "CData section not finished". (Also fails in a browser.)

Thanks for the examples. At the moment I do not know if we should do something in zend-feed, because it opens the door to many pitfalls or ugly workarounds. I see the benefit for the user, but also the maintenance burden. I remain open to suggestions and improvements.

Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
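
To illustrate the `preg_replace` suggestion, here is a small sketch that strips characters outside the XML 1.0 legal range before handing the string to `DOMDocument`. The helper name `stripIllegalXmlChars` and the sample string are made up for illustration, and the sketch assumes the feed is UTF-8 encoded.

```php
<?php
/**
 * Hypothetical helper (not part of zend-feed): replace each run of
 * characters outside the XML 1.0 legal range with a single space.
 */
function stripIllegalXmlChars(string $xml): string
{
    $clean = preg_replace(
        '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        ' ',
        $xml
    );

    // preg_replace() returns null on failure, e.g. for input that is not
    // valid UTF-8; fall back to the untouched string in that case.
    return $clean ?? $xml;
}

// Sample input with a backspace control character (0x08), which XML 1.0 forbids.
$raw = "<feed><title>Bad\x08char</title></feed>";

$dom    = new DOMDocument();
$loaded = $dom->loadXML(stripIllegalXmlChars($raw));
```

This only removes illegal characters. It does not repair structural problems such as unclosed tags, so it complements the recovery mode rather than replacing it.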
See Stack Overflow for details: https://stackoverflow.com/a/9281963/893222
The idea is to handle malformed XML thanks to the recovery option in libxml that is implemented in userland:
Originally posted by @Isinlor at zendframework/zend-feed#73