Convert Python objects to C objects for further speedups #1

Open
sylvinus opened this issue Feb 7, 2016 · 3 comments

Comments

@sylvinus
Contributor

sylvinus commented Feb 7, 2016

This is already done for nesting_limit. Each of the other options should be converted to a C object during init so that _traverse_node() uses as few Python objects as possible.

The gumbocy.html file generated after make cythonize is useful for seeing which lines use Python objects.

What would be the most efficient C type for lookups, to replace the Python sets like attributes_whitelist?

@sylvinus
Contributor Author

sylvinus commented Jul 6, 2016

Most of the options have been converted to C variables.

There are probably some more optimizations left in the parsing of CSS class names (split in C instead of using Python's re?), but we should do more profiling first to see where the real bottlenecks are.
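A minimal sketch of the class-name idea, written in plain Python rather than C, so it only shows the shape of the change and not the final speedup. The `\s+` pattern and both helper names are assumptions for illustration; the regex actually used in gumbocy is not shown in this thread.

```python
import re

# Hypothetical stand-in for a regex-based splitter; the real pattern
# used by the project may differ.
_RE_WHITESPACE = re.compile(r"\s+")


def split_classes_re(value):
    """Split a class attribute value with a regex, dropping empty tokens."""
    return [token for token in _RE_WHITESPACE.split(value) if token]


def split_classes_plain(value):
    """Split on runs of whitespace with str.split(), no regex involved."""
    return value.split()
```

For ordinary ASCII whitespace both helpers return the same tokens, so the regex-free version could be ported to a C loop over the raw char* later if profiling shows it is worth it.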

In my tests, more than 80% of the time is usually spent in gumbo.parse. I'm not sure what we can do about that other than look upstream for the largest speedups.
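One rough way to check how time divides between phases is to time each phase separately and compare the fractions. This is only a sketch: `time_share`, `fake_parse`, and `fake_traverse` are hypothetical names, and the stand-in functions do trivial string work, so the printed percentages say nothing about gumbocy itself.

```python
import time


def time_share(phases, payload):
    """Run payload through each (name, fn) phase in order.

    Returns a dict mapping each phase name to its fraction of the total
    wall-clock time.
    """
    timings = {}
    for name, fn in phases:
        start = time.perf_counter()
        payload = fn(payload)  # output of one phase feeds the next
        timings[name] = time.perf_counter() - start
    total = sum(timings.values()) or 1.0  # guard against a zero total
    return {name: elapsed / total for name, elapsed in timings.items()}


# Hypothetical stand-ins: a real run would call the gumbo parse step and
# the tree traversal here instead.
def fake_parse(html):
    return html.split("<")


def fake_traverse(nodes):
    return [node.lower() for node in nodes]


shares = time_share(
    [("parse", fake_parse), ("traverse", fake_traverse)],
    "<div class='A'><p>Hello</p></div>" * 10000,
)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Swapping the stand-ins for the real parse and traversal calls, and running over a representative set of documents, would show whether the parse step still dominates after each round of changes.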

@sylvinus
Contributor Author

sylvinus commented Jul 6, 2016

This one is also a good candidate for micro-optimization: 8e864c8#diff-51db9a1af8644d65b7f79981d2b0a7c2R62

sylvinus changed the title from "Convert options to C objects for further speedups" to "Convert Python objects to C objects for further speedups" on Jul 14, 2016
@sylvinus
Contributor Author

A huge general speedup was gained thanks to #8, but it also re-introduced a lot of Python objects in the code.

There are a number of places where we go through Python strings just to lowercase them, for instance. The attribute values are also stored in a Python dict, but a C++ map would probably be much faster (mostly because it would keep all its values as char*).
