Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplehtmldom update #959

Closed
em92 opened this issue Dec 11, 2018 · 2 comments
Closed

simplehtmldom update #959

em92 opened this issue Dec 11, 2018 · 2 comments

Comments

@em92
Copy link
Contributor

em92 commented Dec 11, 2018

@logmanoriginal just checked out this: https://sourceforge.net/p/simplehtmldom/news/2018/12/php-simple-html-dom-parser-17/

So, I remind you to update vendor directory.

Btw, https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.7/

str_get_html() is deprecated and should be replaced by new simple_html_dom()

I suggest not to remove str_get_html, not to break code in already existing projects.

@logmanoriginal
Copy link
Contributor

So, I remind you to update vendor directory.

Thanks for the reminder 👍

Actually, I haven't merged it on purpose so we can finish the release of this month before (possibly) breaking things again.

I suggest not to remove str_get_html, not to break code in already existing projects.

This is not the right place to discussing it, but I see your point.

That being said, anyone could add a simple wrapper to their project to replace it, so no harm done. Unfortunately, in its current implementation str_get_html prevents you from loading strings longer than 600 000 characters by default (RSS-Bridge increased that limit to 10 000 000 manually). I'm still unsure why that limit even exists in the first place. Than again, if you remove it, that function basically calls new simple_html_dom 🤷‍♂️

logmanoriginal added a commit that referenced this issue Dec 11, 2018
- Update parser to version 1.7
https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.7/

References #959

-------------------- CHANGELOG --------------------

- Added code documentation to improve readability
- Added unit tests for `simple_html_dom::$self_closing_tags`
- Added unit tests for `simple_html_dom::$optional_closing_tags`
- Added unit tests for bug reports
  - Added test for bug [#56](https://sourceforge.net/p/simplehtmldom/bugs/56/)
  - Added test for bug [#97](https://sourceforge.net/p/simplehtmldom/bugs/97/)
  - Added test for bug [#116](https://sourceforge.net/p/simplehtmldom/bugs/116/)
  - Added test for bug [#121](https://sourceforge.net/p/simplehtmldom/bugs/127/)
  - Added test for bug [#127](https://sourceforge.net/p/simplehtmldom/bugs/127/)
  - Added test for bug [#154](https://sourceforge.net/p/simplehtmldom/bugs/154/)
  - Added test for bug [#160](https://sourceforge.net/p/simplehtmldom/bugs/160/)
- Added unit tests for memory management of the parser
- Added bit flags to `simple_html_dom::load()`
  - Added bit flag `HDOM_SMARTY_AS_TEXT` to optionally filter Smarty scripts (#154)\
  **Note**: Smarty scripts are no longer filtered by default!\
- Added build script to automate releases
- Added support for attributes without whitespace to separate them
- Improved documentation and readability for `$self_closing_tags`
- Improved documentation and readability for `$block_tags`
- Improved documentation and readability for `$optional_closing_tags`
- Updated list of `simple_html_dom::$self_closing_tags`
  - Removed 'spacer' (obsolete)
  - Added 'area'
  - Added 'col'
  - Added 'meta'
  - Added 'param'
  - Added 'source'
  - Added 'track'
  - Added 'wbr'
- Updated list of `simple_html_dom::$optional_closing_tags`
  - Removed "nobr" (obsolete)
  - Added 'th' as closable element to 'td'
  - Added 'td' as closable element to 'th'
  - Added 'optgroup' with 'optgroup' and 'option' as closable elements
  - Added 'optgroup' as closable element to 'option'
  - Added 'rp' with 'rp' and 'rt' as closable elements
  - Added 'rt' with 'rt' and 'rp' as closable elements
- Clarified meaning of `simple_html_dom->parent`
- Changed default `$offset` for `file_get_html()` from -1 to 0 (#161)
- Changed `simple_html_dom::load()` to remove script tags before replacing newline characters
- `simple_html_dom_node::text()` no longer adds whitespace to top level span elements (only to sub-elements)
- `simple_html_dom_node::text()` adds blank lines between paragraphs
- Normalized line endings in the repository to LF via `.gitattributes`
- Improved performance of `simple_html_dom::parse_charset()` by approximately 25%
- Improved performance of `simple_html_dom::parse()` by approximately 10%
- `str_get_html()` is deprecated and should be replaced by `new simple_html_dom()`
- Removed protected function `simple_html_dom::copy_until_char_escaped()`
- Fixed compatibility issues with PHP 7.3
- Fixed typo (#147)
- Fixed handling of incorrectly escaped text (#160)
- Restore functionality of `$maxLen` in `file_get_html()`
- Fixed load_file breaks if an error ocurred in another script
@logmanoriginal
Copy link
Contributor

Done.

Note to self: Never include the changelog of another project in the commit message 🤦‍♂️

infominer33 pushed a commit to web-work-tools/rss-bridge that referenced this issue Apr 17, 2020
- Update parser to version 1.7
https://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/1.7/

References RSS-Bridge#959

-------------------- CHANGELOG --------------------

- Added code documentation to improve readability
- Added unit tests for `simple_html_dom::$self_closing_tags`
- Added unit tests for `simple_html_dom::$optional_closing_tags`
- Added unit tests for bug reports
  - Added test for bug [RSS-Bridge#56](https://sourceforge.net/p/simplehtmldom/bugs/56/)
  - Added test for bug [RSS-Bridge#97](https://sourceforge.net/p/simplehtmldom/bugs/97/)
  - Added test for bug [RSS-Bridge#116](https://sourceforge.net/p/simplehtmldom/bugs/116/)
  - Added test for bug [RSS-Bridge#121](https://sourceforge.net/p/simplehtmldom/bugs/127/)
  - Added test for bug [RSS-Bridge#127](https://sourceforge.net/p/simplehtmldom/bugs/127/)
  - Added test for bug [RSS-Bridge#154](https://sourceforge.net/p/simplehtmldom/bugs/154/)
  - Added test for bug [RSS-Bridge#160](https://sourceforge.net/p/simplehtmldom/bugs/160/)
- Added unit tests for memory management of the parser
- Added bit flags to `simple_html_dom::load()`
  - Added bit flag `HDOM_SMARTY_AS_TEXT` to optionally filter Smarty scripts (RSS-Bridge#154)\
  **Note**: Smarty scripts are no longer filtered by default!\
- Added build script to automate releases
- Added support for attributes without whitespace to separate them
- Improved documentation and readability for `$self_closing_tags`
- Improved documentation and readability for `$block_tags`
- Improved documentation and readability for `$optional_closing_tags`
- Updated list of `simple_html_dom::$self_closing_tags`
  - Removed 'spacer' (obsolete)
  - Added 'area'
  - Added 'col'
  - Added 'meta'
  - Added 'param'
  - Added 'source'
  - Added 'track'
  - Added 'wbr'
- Updated list of `simple_html_dom::$optional_closing_tags`
  - Removed "nobr" (obsolete)
  - Added 'th' as closable element to 'td'
  - Added 'td' as closable element to 'th'
  - Added 'optgroup' with 'optgroup' and 'option' as closable elements
  - Added 'optgroup' as closable element to 'option'
  - Added 'rp' with 'rp' and 'rt' as closable elements
  - Added 'rt' with 'rt' and 'rp' as closable elements
- Clarified meaning of `simple_html_dom->parent`
- Changed default `$offset` for `file_get_html()` from -1 to 0 (RSS-Bridge#161)
- Changed `simple_html_dom::load()` to remove script tags before replacing newline characters
- `simple_html_dom_node::text()` no longer adds whitespace to top level span elements (only to sub-elements)
- `simple_html_dom_node::text()` adds blank lines between paragraphs
- Normalized line endings in the repository to LF via `.gitattributes`
- Improved performance of `simple_html_dom::parse_charset()` by approximately 25%
- Improved performance of `simple_html_dom::parse()` by approximately 10%
- `str_get_html()` is deprecated and should be replaced by `new simple_html_dom()`
- Removed protected function `simple_html_dom::copy_until_char_escaped()`
- Fixed compatibility issues with PHP 7.3
- Fixed typo (RSS-Bridge#147)
- Fixed handling of incorrectly escaped text (RSS-Bridge#160)
- Restore functionality of `$maxLen` in `file_get_html()`
- Fixed load_file breaks if an error ocurred in another script
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants