Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement #284

Open
Mulugruntz opened this issue Mar 14, 2018 · 0 comments
Open

Performance improvement #284

Mulugruntz opened this issue Mar 14, 2018 · 0 comments

Comments

@Mulugruntz
Copy link

Hi,

Just checking this "High Performance" port, I noticed a few things:

inlineTags ( https://github.com/syrusakbary/pyjade/blob/master/pyjade/compiler.py#L18 )
selfClosing ( https://github.com/syrusakbary/pyjade/blob/master/pyjade/compiler.py#L40 )
autocloseCode ( https://github.com/syrusakbary/pyjade/blob/master/pyjade/compiler.py#L51 )
should be sets and not lists.
We seem to only do lookups. This costs O(n) and a set would make it O(1).
Replacing [...] by {...} would make the change. I haven't checked, but if they're constants, then, we should replace [...] by frozenset([...]) instead.
As autocloseCode can mutate, it should be a set and the register method would become:

    @classmethod
    def register_autoclosecode(cls, name):
        cls.autocloseCode.add(name)

In https://github.com/syrusakbary/pyjade/blob/master/pyjade/compiler.py#L199
I believe textOnly = tag.textOnly or not bool(len(tag.block.nodes)) could be replaced by textOnly = tag.textOnly or not tag.block.nodes. The truthy value of a container is already False if it's empty, True otherwise. But maybe I've missed something.

About format_path ( https://github.com/syrusakbary/pyjade/blob/master/pyjade/compiler.py#L273 ):
Checking if a character is in a string (especially one that is likely to be at the end of the string) may be slightly costly. Well, it's not that costly, but for extension, we have a function made for that job: os.path.splitext.

I've created the following snippet. I encourage you to run it on your own system.

from os.path import splitext, basename
import os.path
import timeit

def format_path(path, extension):
    has_extension = '.' in os.path.basename(path)
    if not has_extension:
        path += extension
    return path

def format_path2(path, extension):
    has_extension = '.' in basename(path)
    if not has_extension:
        path += extension
    return path

def format_path3(path, extension):
    base, ext = splitext(path)
    return '.'.join([base, ext if ext else extension])

if __name__ == '__main__':
    import timeit
    print 'format_path (noext)  ', timeit.timeit("format_path('path/to/file', 'rar')", setup="import os.path;from __main__ import format_path", number=100000)
    print 'format_path2 (noext) ', timeit.timeit("format_path2('path/to/file', 'rar')", setup="from os.path import basename;from __main__ import format_path2", number=100000)
    print 'format_path3 (noext) ', timeit.timeit("format_path3('path/to/file', 'rar')", setup="from os.path import splitext;from __main__ import format_path3", number=100000)
    print 'format_path (ext)    ', timeit.timeit("format_path('path/to/file.rar', 'rar')", setup="import os.path;from __main__ import format_path", number=100000)
    print 'format_path2 (ext)   ', timeit.timeit("format_path2('path/to/file.rar', 'rar')", setup="from os.path import basename;from __main__ import format_path2", number=100000)
    print 'format_path3 (ext)   ', timeit.timeit("format_path3('path/to/file.rar', 'rar')", setup="from os.path import splitext;from __main__ import format_path3", number=100000)
$> python format_path_time.py
format_path (noext)   0.206955702071
format_path2 (noext)  0.208543457039
format_path3 (noext)  0.150185659479
format_path (ext)     0.243948536038
format_path2 (ext)    0.245148734991
format_path3 (ext)    0.173444946305

It's a 25-30% performance gain.

In https://github.com/syrusakbary/pyjade/blob/master/pyjade/compiler.py#L296 TYPE_CODE is a constant and should be declared outside of the function (to avoid recreating it for every function call).

Another place about set/list:
https://github.com/syrusakbary/pyjade/blob/master/pyjade/compiler.py#L307


def compare_list(s):
    return s in ['if', 'unless']
def compare_set(s):
    return s in {'if', 'unless'}

word_list = ['if', 'unless']
word_set = {'if', 'unless'}
def compare_list_external(s):
    return s in word_list
def compare_set_external(s):
    return s in word_set

if __name__ == '__main__':
    import timeit
    tests = [
        [
            ("'if' in word_list                   ", "word_list = ['if', 'unless']"),
            ("'unless' in word_list               ", "word_list = ['if', 'unless']"),
            ("'unknown' in word_list              ", "word_list = ['if', 'unless']"),

            ("'if' in word_set                    ", "word_set = {'if', 'unless'}"),
            ("'unless' in word_set                ", "word_set = {'if', 'unless'}"),
            ("'unknown' in word_set               ", "word_set = {'if', 'unless'}"),
        ], [
            ("compare_list('if')                  ", "from __main__ import compare_list"),
            ("compare_list('unless')              ", "from __main__ import compare_list"),
            ("compare_list('unknown')             ", "from __main__ import compare_list"),

            ("compare_set('if')                   ", "from __main__ import compare_set"),
            ("compare_set('unless')               ", "from __main__ import compare_set"),
            ("compare_set('unknown')              ", "from __main__ import compare_set"),
        ], [
            ("compare_list_external('if')         ", "from __main__ import compare_list_external, word_list"),
            ("compare_list_external('unless')     ", "from __main__ import compare_list_external, word_list"),
            ("compare_list_external('unknown')    ", "from __main__ import compare_list_external, word_list"),

            ("compare_set_external('if')          ", "from __main__ import compare_set_external, word_set"),
            ("compare_set_external('unless')      ", "from __main__ import compare_set_external, word_set"),
            ("compare_set_external('unknown')     ", "from __main__ import compare_set_external, word_set"),
        ]
    ]
    for test_suite in tests:
        for test in test_suite:
            print test[0], timeit.timeit(test[0], setup=test[1], number=1000000)
        print
$>python format_path_time.py
'if' in word_list                    0.033300187615
'unless' in word_list                0.0445049416355
'unknown' in word_list               0.0521335926522
'if' in word_set                     0.0403978750395
'unless' in word_set                 0.0390250339062
'unknown' in word_set                0.0376751063804

compare_list('if')                   0.113347689894
compare_list('unless')               0.125640445203
compare_list('unknown')              0.133013886041
compare_set('if')                    0.183080118218
compare_set('unless')                0.18463073734
compare_set('unknown')               0.187830741133

compare_list_external('if')          0.121922119809
compare_list_external('unless')      0.132501885434
compare_list_external('unknown')     0.140963722624
compare_set_external('if')           0.118811005011
compare_set_external('unless')       0.11445860479
compare_set_external('unknown')      0.118301769839

What does it mean?

  • Unless the first term is a match, a set will always be faster than a list to lookup. Especially if there are non-matching candidates.
  • As expected, the set always costs the same.
  • However, a set is more costly to build than a list.
  • If we can store this constant container outside the function, it won't recreate it every time the function gets called. The set becomes cheaper again.

Here again, we can expect up to 17% performance improvement (if there are many misses).

That's pretty much it, in terms of what pops to the eyes, in compiler.py.

Keep up the good work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant