Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instantiate parsers only once #153

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

Gallaecio
Copy link
Member

@Gallaecio Gallaecio commented Oct 1, 2019

Changes:

  • A new module, parsel.parser, hosts two new classes: HTMLParser and XMLParser. These classes have a single, common method, parse. They are wrappers for the corresponding lxml parsers.

  • Two new constants have been created: parsel.parser.html.HTML_PARSER and parsel.parser.xml.XML_PARSER, which are instances of those classes. They exist in separate modules so that they are only loaded on demand, lazily, and only instantiated once.

  • _ctgroup[*]['_parser'] is now a string with the import path of one of those constants, which the Selector constructor loads. As a result, when you create multiple Selector objects from text, a new parser is no longer created each time; the same parser is reused instead.

  • parsel.selector.create_root_node and parsel.selector.SafeXMLParser are now deprecated.

  • Now that parser constructor parameters are defined on the new wrapping classes (parsel.parser.HTMLParser and parsel.parser.XMLParser), it’s easier to implement new wrappers for parsers that do not support the same constructor parameters (recover, encoding), such as an HTML5 parser.

@codecov
Copy link

codecov bot commented Oct 1, 2019

Codecov Report

Merging #153 into master will decrease coverage by 0.54%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #153      +/-   ##
==========================================
- Coverage   99.66%   99.11%   -0.55%     
==========================================
  Files           5        8       +3     
  Lines         296      338      +42     
  Branches       52       53       +1     
==========================================
+ Hits          295      335      +40     
- Misses          1        2       +1     
- Partials        0        1       +1
Impacted Files Coverage Δ
parsel/parser/__init__.py 100% <100%> (ø)
parsel/parser/xml.py 100% <100%> (ø)
parsel/parser/html.py 100% <100%> (ø)
parsel/selector.py 98.82% <100%> (-1.18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15e4326...0e5f2aa. Read the comment docs.

@Gallaecio
Copy link
Member Author

The missing coverage is due to a line of the deprecated create_root_node which I don’t know how to reach. Since the implementation of the function has not changed and should not change before it’s removed, I think it’s safe to ignore.

@eliasdorneles
Copy link
Member

I haven't had time to check this with depth.

But it will need updating docs, maybe we should remove create_root_node from the docs to make explicit that deprecation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants