Use for comprehension to speed up python code #191

mgzuber · 2024-12-05T10:21:06Z

Hi,

I noticed your Python could use some simple improvements to speed up execution time. For loops in Python are slow, but comprehensions help a lot. On my machine, this implementation speeds things up by about 2x.

bddicken · 2024-12-06T18:42:38Z

Thanks!

Can you (or someone) provide a deeper explanation of this change?

mmurrian · 2024-12-06T21:30:16Z

Maybe it would be helpful to see a literal C translation of the given Python algorithm? (...or maybe not at all)

The operational change is this: a = [sum([j % u for j in range(100_000)]) + r for _ in range(10_000)]

And breaking it down from inner-most scope outwards...

[j % u for j in range(100_000)] has this literal C equivalent:

int jmodu[100000];
for (int j = 0; j < 100000; j++) {
  jmodu[j] = j % u;
}

And sum(...) is this:

int sum = 0;
for (int j = 0; j < 100000; j++) {
  sum += jmodu[j];
}

Finally, a = [sum(...) + r for _ in range(10_000)] is this:

int a[10000];
for (int i = 0; i < 10000; i++) {
  a[i] = sum + r;
}

I will say that it appears Python recomputes sum 10,000 times, instead of only once. So, there is at least no unfair advantage hiding there.

Altogether, the literal C translation might look like this:

int a[10000] = {0};
for (int i = 0; i < 10000; i++) {
  int jmodu[100000];
  for (int j = 0; j < 100000; j++) {
    jmodu[j] = j % u;
  }
  for (int j = 0; j < 100000; j++) {
    a[i] += jmodu[j];
  }
  a[i] += r;
}
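The remark that Python recomputes the sum on every outer iteration can be checked with a small counter. This is only a sketch: the helper name inner_sum, the global calls counter, and the reduced loop sizes are made up for illustration and are not part of the PR.

```python
# Sketch: count how often the inner sum is evaluated inside the comprehension.
# `inner_sum`, `calls`, and the small sizes are illustrative assumptions.
calls = 0


def inner_sum(u, n):
    """Return sum(j % u for j in range(n)) and record that it was called."""
    global calls
    calls += 1
    return sum([j % u for j in range(n)])


u, r = 7, 3
a = [inner_sum(u, 1_000) + r for _ in range(10)]

# One evaluation per outer iteration: nothing is hoisted or cached.
print(calls)
```

Every element of a ends up with the same value, yet the inner sum still runs once per element, which matches the literal C translation.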

poudro · 2024-12-07T09:32:16Z

Python optimises list comprehensions compared to for loops. Here is a video that describes this: https://www.youtube.com/watch?v=U88M8YbAzQk

Using the dis (disassembler) module we can compare the bytecode between

Your code:

    a = [0] * 10000  # Array of 10k elements initialized to 0
    for i in range(10000):  # 10k outer loop iterations
        for j in range(100000):  # 100k inner loop iterations, per outer loop iteration
            a[i] += j%u  # Simple sum
        a[i] += r  # Add a random value to each element in array

which is

        >>  136 FOR_ITER                47 (to 234)
            140 STORE_FAST               3 (i)

 10         142 LOAD_GLOBAL             11 (NULL + range)
            152 LOAD_CONST               4 (100000)
            154 CALL                     1
            162 GET_ITER
        >>  164 FOR_ITER                18 (to 204)
            168 STORE_FAST               4 (j)

 11         170 LOAD_FAST                2 (a)
            172 LOAD_FAST                3 (i)
            174 COPY                     2
            176 COPY                     2
            178 BINARY_SUBSCR
            182 LOAD_FAST                4 (j)
            184 LOAD_FAST                0 (u)
            186 BINARY_OP                6 (%)
            190 BINARY_OP               13 (+=)
            194 SWAP                     3
            196 SWAP                     2
            198 STORE_SUBSCR
            202 JUMP_BACKWARD           20 (to 164)

 10     >>  204 END_FOR

 12         206 LOAD_FAST                2 (a)
            208 LOAD_FAST                3 (i)
            210 COPY                     2
            212 COPY                     2
            214 BINARY_SUBSCR
            218 LOAD_FAST                1 (r)
            220 BINARY_OP               13 (+=)
            224 SWAP                     3
            226 SWAP                     2
            228 STORE_SUBSCR
            232 JUMP_BACKWARD           49 (to 136)

  9     >>  234 END_FOR

and the list comprehension that has the same output

a = [sum(j%u for j in range(100000)) + r for i in range(10000)]

which is

        >>  134 FOR_ITER                34 (to 206)
            138 STORE_FAST               1 (i)
            140 LOAD_GLOBAL             13 (NULL + sum)
            150 LOAD_CLOSURE             3 (u)
            152 BUILD_TUPLE              1
            154 LOAD_CONST               4 (<code object <genexpr> at 0x1021ccc60, file "languages/loops/py/code.py", line 8>)
            156 MAKE_FUNCTION            8 (closure)
            158 LOAD_GLOBAL             11 (NULL + range)
            168 LOAD_CONST               5 (100000)
            170 CALL                     1
            178 GET_ITER
            180 CALL                     0
            188 CALL                     1
            196 LOAD_FAST                0 (r)
            198 BINARY_OP                0 (+)
            202 LIST_APPEND              2
            204 JUMP_BACKWARD           36 (to 134)
        >>  206 END_FOR

with the call to inner loop

Disassembly of <code object <genexpr> at 0x1021ccc60, file "languages/loops/py/code.py", line 8>:
              0 COPY_FREE_VARS           1

  8           2 RETURN_GENERATOR
              4 POP_TOP
              6 RESUME                   0
              8 LOAD_FAST                0 (.0)
        >>   10 FOR_ITER                 9 (to 32)
             14 STORE_FAST               1 (j)
             16 LOAD_FAST                1 (j)
             18 LOAD_DEREF               2 (u)
             20 BINARY_OP                6 (%)
             24 YIELD_VALUE              1
             26 RESUME                   1
             28 POP_TOP
             30 JUMP_BACKWARD           11 (to 10)
        >>   32 END_FOR
             34 RETURN_CONST             0 (None)
        >>   36 CALL_INTRINSIC_1         3 (INTRINSIC_STOPITERATION_ERROR)
             38 RERAISE                  1

The extra optimisation makes the comprehension about twice as fast. Most Python practitioners know this optimisation well, so they prefer to write comprehensions whenever possible.
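For anyone who wants to reproduce a comparison like the one above, here is a minimal sketch using the standard dis module. The function names loop_version, comprehension_version, and opnames are made up for this example, and the loop sizes are reduced. Opcode names, offsets, and whether a comprehension gets its own code object all vary between CPython versions, so the listing will probably not match the one above exactly.

```python
import dis


def loop_version(u, r):
    # Nested for loops, as in the original benchmark (smaller sizes).
    a = [0] * 10
    for i in range(10):
        for j in range(100):
            a[i] += j % u
        a[i] += r
    return a


def comprehension_version(u, r):
    # Comprehension with a generator expression feeding sum().
    return [sum(j % u for j in range(100)) + r for _ in range(10)]


def opnames(func):
    """Return the opcode names of a function's top-level bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


loop_ops = opnames(loop_version)
comp_ops = opnames(comprehension_version)
print(len(loop_ops), len(comp_ops))

# For the full human-readable listing, including nested code objects:
# dis.dis(loop_version)
# dis.dis(comprehension_version)
```

Note that get_instructions only covers the function's own code object. The generator expression lives in a nested code object, which dis.dis prints separately, as in the "Disassembly of <code object <genexpr> ...>" part above.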

Obviously it's still a long way away from C/Rust 😅

mgzuber · 2024-12-07T10:37:24Z

Thank you @mmurrian and @poudro for those wonderful explanations! I'll add my bit.

A python list comprehension of the form:

my_list = [item for item in other_list]

has the same effect as

my_list = []
for item in other_list:
    my_list.append(item)

So starting from the inner list:

[j % u for j in range(100_000)]

has the same effect as

my_list = []
for j in range(100_000):
    my_list.append(j % u)

Taking the sum of this list is therefore the same as:

val = 0
for j in range(100_000):
    val += j % u

This is the same as the inner loop in the original code, with val standing in for a[i]. Adding r to the sum is therefore the same as a[i] += r. This constructs each element in the array, so the outer list comprehension replaces the outer loop. We don't actually need to keep track of the i variable, so it is replaced with _ to give:

a = [sum([j % u for j in range(100_000)]) + r for _ in range(10_000)]

As explained by @poudro, most Python practitioners are well versed in the fact that list comprehensions are faster than Python for loops, and so will use these whenever possible. In this case, it leads to less code (only 1 line!) and faster performance.
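A quick way to check both claims (same result, different speed) on your own machine is a small timeit run. This is a sketch with reduced sizes so it finishes quickly. The function names and the values of u and r are placeholders, and the measured ratio will depend on the Python version and hardware.

```python
import timeit


def with_loops(u, r, outer=20, inner=2_000):
    # Original style: explicit nested loops with in-place accumulation.
    a = [0] * outer
    for i in range(outer):
        for j in range(inner):
            a[i] += j % u
        a[i] += r
    return a


def with_list_comprehension(u, r, outer=20, inner=2_000):
    # PR style: nested comprehensions with the built-in sum().
    return [sum([j % u for j in range(inner)]) + r for _ in range(outer)]


u, r = 7, 3  # placeholder inputs

# Both versions should build exactly the same list.
assert with_loops(u, r) == with_list_comprehension(u, r)

loop_time = timeit.timeit(lambda: with_loops(u, r), number=5)
comp_time = timeit.timeit(lambda: with_list_comprehension(u, r), number=5)
print(f"loops: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```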

axman6 · 2024-12-08T23:59:19Z

It's fun seeing how much this looks like the (un-needlessly obfuscated) Haskell implementation.

artemisart · 2024-12-09T16:45:28Z

You can use sum(...) instead of sum([...]) to avoid allocating the list.
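To illustrate the allocation difference, here is a small sketch comparing the two forms with sys.getsizeof. The value of u is a placeholder. getsizeof only reports the container itself (for a list, the array of references), and this says nothing about which form is faster, so it is worth timing both.

```python
import sys

u, n = 7, 100_000  # u is a placeholder value

as_list = [j % u for j in range(n)]  # materialises all n results up front
as_gen = (j % u for j in range(n))   # produces results one at a time

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes, gen_bytes)

# Both forms feed sum() the same values.
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)
```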

bddicken added the needs updates / explanation label Dec 6, 2024

Use for comprehension to speed up python code

b87ec4e

mgzuber force-pushed the python_speedup branch from d43c8a6 to b87ec4e Compare December 7, 2024 10:53
