Delete empty lines in retrieved HTML code #187

Merged on Jan 4, 2019 (2 commits)

Contributor

techexo commented Jan 4, 2019

Delete empty lines in retrieved HTML code to avoid runaway evaluation of empty node stripping regex on badly coded websites.

Related wallabag/wallabag#3825
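
To illustrate the idea, here is a minimal sketch of stripping empty lines before any later regex-based cleanup runs. It is written in Python for illustration only (graby itself is PHP). The helper name `strip_empty_lines` and the pattern are assumptions, not the regex from this PR, which is not shown in this thread.

```python
import re

# Hypothetical pattern: a line holding only spaces/tabs, plus its line ending.
# The character class and the newline cannot match the same character, so the
# pattern gives the regex engine very little room to backtrack.
_BLANK_LINE = re.compile(r"^[ \t]*\r?\n", re.MULTILINE)


def strip_empty_lines(html: str) -> str:
    """Remove lines that are empty or contain only spaces or tabs."""
    return _BLANK_LINE.sub("", html)


if __name__ == "__main__":
    sample = "<div>\n\n   \n<p>Hi</p>\n\t\n</div>\n"
    print(strip_empty_lines(sample))
```

Running this pass first means a later empty-node regex sees far fewer whitespace runs between tags. A whitespace-only last line with no trailing newline is left untouched by this sketch.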

Owner

j0k3r left a comment

Looks OK, but could you add a test for it?

coveralls commented Jan 4, 2019

Coverage increased (+0.002%) to 98.473% when pulling 6b86ad1 on techexo:master into 5c9b33d on j0k3r:master.

Contributor Author

techexo commented Jan 4, 2019

Note: unit tests are failing (as could be expected) because of code not having empty lines any more.

Contributor Author

techexo commented Jan 4, 2019

Sorry, I did not see your review. I am really not comfortable with unit tests (I know what they are, but I have never written one, especially in PHP...).
What would you like it to test? The runaway evaluation, or just the line stripping?

Owner

j0k3r commented Jan 4, 2019

> Note: unit tests are failing (as could be expected) because of code not having empty lines any more.

Maybe that's because it isn't the best way to fix it? I didn't take the time to investigate, I'm just asking.

Contributor Author

techexo commented Jan 4, 2019

I am trying a new regex right now, which might be less of a problem. (And sorry for spamming with commits, but I don't know how to run the tests locally, so I push and wait for Travis on GitHub to do the job 😏...).

In any case, I think that going at HTML code with regexes was, from the beginning, a recipe for future disaster (as eloquently put in php-readability: "HACK: dirty cleanup to replace some stuff; shouldn't use regexps with HTML but well...").

Edit: It seems that the new regex fixes the issue without breaking the tests.

Owner

j0k3r commented Jan 4, 2019

It reminds me of that post on SO: https://stackoverflow.com/q/1732348/569101 🤕

j0k3r merged commit fd793a3 into j0k3r:master on Jan 4, 2019

Contributor Author

techexo commented Jan 6, 2019

It would be interesting to see whether graby and php-readability can be improved to get rid of regexes (except for replacements in site configuration files). I'll try to have a go at it someday.

j0k3r changed the title from "Fix wallabag#3825" to "Delete empty lines in retrieved HTML code" on Jan 7, 2019
3 participants