consider making html5lib.tokenizer public #532

Open
mgrandi opened this issue Apr 12, 2021 · 3 comments

Comments

@mgrandi

mgrandi commented Apr 12, 2021

Hello,

In version 0.999999999 (https://github.com/html5lib/html5lib-python/releases/tag/0.999999999), html5lib.tokenizer was made private.

The wpull project (https://github.com/ArchiveTeam/wpull) uses this library. If we were ever to migrate to the 1.x versions, this would hurt the application: instead of just tokenizing a webpage (see https://github.com/ArchiveTeam/wpull/blob/a4ff4a93f613ce18ad3c515aa3d4f5848a88b98c/wpull/document/htmlparse/html5lib_.py), we would have to use full tree parsing, which is slower and uses more RAM.

Is there any reason this was made private when the 1.x branch was released?

@theRealProHacker
Contributor

I don't understand what you mean by private. How can something be made private in Python?

@mgrandi
Author

mgrandi commented Feb 20, 2023

This project seems abandoned... but to answer: yes, obviously you can't actually make something private in Python. By private I mean that the module changed location, and private modules are usually denoted with a leading underscore, which means there is no guarantee that it will stay in the same place, keep the same name, etc. in future releases.

Making it public = making it part of the public API, so that even if the underlying implementation changes, the API stays the same.
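
To illustrate the convention described above, here is a minimal sketch of how downstream code could cope with the module moving between releases: try the old public path first, then fall back to the underscore-prefixed one. The helper name `import_first_available` is hypothetical and not part of html5lib. The two module paths are the pre-1.0 and 1.x locations discussed in this issue, and the fallback path still comes with no stability guarantee.

```python
import importlib


def import_first_available(candidates):
    """Return the first importable module from `candidates`, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too: try the next candidate.
            continue
    return None


# Older releases exposed html5lib.tokenizer; 1.x uses html5lib._tokenizer.
tokenizer_module = import_first_available(
    ["html5lib.tokenizer", "html5lib._tokenizer"]
)

if tokenizer_module is None:
    print("html5lib is not installed, or neither tokenizer module exists")
else:
    print("using", tokenizer_module.__name__)
```

This only hides the rename. It does nothing if the tokenizer's behaviour or class names change, which is the point of asking for a public API.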

@theRealProHacker
Contributor

Yeah, it's really annoying. If you are the maintainer of a project, then at least answer some issues every month, even if you don't want to write any new code.

I now understand what you mean. A simple solution would be to pin the version and then just use the undocumented (or, to use your word, private) part of the code.
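
A rough sketch of the pinning idea, under some assumptions: the pin itself would normally live in a requirements file as a line such as `html5lib==1.1` (1.1 is only an example release here, not a recommendation from this thread), and the application could additionally check at startup that the installed version is the one its private imports were tested against. The helper names `installed_version` and `matches_pin` are hypothetical, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only when the installed version is exactly the pinned one."""
    return installed is not None and installed == pinned


# Example pin (assumption); use whichever release the code was tested against.
PINNED_HTML5LIB = "1.1"

if matches_pin(installed_version("html5lib"), PINNED_HTML5LIB):
    print("html5lib matches the pin; private imports are as tested")
else:
    print("html5lib is missing or not at the pinned version")
```

An exact-match check is deliberately strict: since the private module has no compatibility promise, even a patch release could in principle move or rename it.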
