-
-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete empty lines in retrieved HTML code #187
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok but could you add a test about it?
Note: unit tests are failing (as could be expected) because of code not having empty lines any more. |
Sorry, did not see your review. I am absolutely not comfortable with unit tests (I know what it is, but never wrote one, especially in PHP...). |
Maybe because it might not be the best way to fix it? I didn't took time to investigate just asking. |
I am trying a new regex right now, which might be less of a problem. (And sorry for spamming with commits, but I don't know how to run my test locally, so I push and wait for Github's Travis to do the job 😏...). In any way, I think that from the beginning going at the HTML code with regex is a call for future disaster (as eloquently put in php-readability : HACK: dirty cleanup to replace some stuff; shouldn't use regexps with HTML but well...). Edit: It seems that the new regex allows for fixing the issue without breaking the tests. |
It reminds me that post on SO https://stackoverflow.com/q/1732348/569101 🤕 |
It would be interesting to see if it's possible to improve graby & php-readability to get rid of regexes (except for replacements in site configuration files). I'll try to have a go at it someday. |
Delete empty lines in retrieved HTML code to avoid runaway evaluation of empty node stripping regex on badly coded websites.
Related wallabag/wallabag#3825