Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant slowdown on nested tables depending on syntax #193

Closed
magistere opened this issue May 17, 2022 · 4 comments · Fixed by #324
Closed

Significant slowdown on nested tables depending on syntax #193

magistere opened this issue May 17, 2022 · 4 comments · Fixed by #324
Assignees

Comments

@magistere
Copy link

Reading TOML files with nested tables with same meaning, but different syntax makes a big difference in speed. Using full names is much slower (~300x on my machine) than using square brackets for common part:

# slow
[options]
a.b.c.d.min1 = '1'
a.b.c.d.typ1 = '2'
a.b.c.d.max1 = '3'
a.b.c.d.min2 = '4'
a.b.c.d.typ2 = '5'
a.b.c.d.max2 = '6'
a.b.c.d.min3 = '7'
a.b.c.d.typ3 = '8'
a.b.c.d.max3 = '9'
a.b.c.d.min4 = '10'
a.b.c.d.typ4 = '11'
a.b.c.d.max4 = '12'
# fast
[options.a.b.c.d]
min1 = '1'
typ1 = '2'
max1 = '3'
min2 = '4'
typ2 = '5'
max2 = '6'
min3 = '7'
typ3 = '8'
max3 = '9'
min4 = '10'
typ4 = '11'
max4 = '12'
@frostming
Copy link
Contributor

The former performs a lot of table merging when parsing, while the latter is a single table with plain table keys.

@magistere
Copy link
Author

magistere commented May 18, 2022

Yes, but seems the number of such mergings grows exponentially with input size. Looking into the profiler output we can see millions of calls for reading only 12 values.

Wed May 18 09:50:06 2022    stats_slow.dat

         5347517 function calls (5112573 primitive calls) in 4.658 seconds

   Ordered by: call count
   List reduced from 467 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1202989    0.676    0.000    1.479    0.000 {built-in method builtins.isinstance}
   841743    0.413    0.000    0.803    0.000 C:\Program Files\Python39\lib\abc.py:117(__instancecheck__)
   841743    0.390    0.000    0.390    0.000 {built-in method _abc._abc_instancecheck}
322156/322051    0.068    0.000    0.068    0.000 {built-in method builtins.len}
320452/320448    0.086    0.000    0.086    0.000 {built-in method builtins.hash}
   320251    0.187    0.000    0.273    0.000 C:\Users\PGR\AppData\Roaming\Python\Python39\site-packages\tomlkit\items.py:375(__hash__)
   171312    0.046    0.000    0.046    0.000 {method 'append' of 'list' objects}
    92702    0.030    0.000    0.030    0.000 C:\Users\PGR\AppData\Roaming\Python\Python39\site-packages\tomlkit\items.py:1200(value)
    89019    0.182    0.000    0.225    0.000 C:\Users\PGR\AppData\Roaming\Python\Python39\site-packages\tomlkit\container.py:756(_previous_item_with_index)
    89019    0.087    0.000    0.311    0.000 C:\Users\PGR\AppData\Roaming\Python\Python39\site-packages\tomlkit\container.py:768(_previous_item)
Wed May 18 09:49:39 2022    stats_fast.dat

         22214 function calls (21661 primitive calls) in 0.038 seconds

   Ordered by: call count
   List reduced from 455 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2300    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
1837/1737    0.000    0.000    0.000    0.000 {built-in method builtins.len}
     1627    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
      771    0.000    0.000    0.000    0.000 C:\Program Files\Python39\lib\sre_parse.py:233(__next)
      643    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
      621    0.000    0.000    0.001    0.000 C:\Program Files\Python39\lib\sre_parse.py:254(get)
      605    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
      527    0.000    0.000    0.001    0.000 C:\Program Files\Python39\lib\sre_parse.py:164(__getitem__)
      442    0.000    0.000    0.000    0.000 C:\Users\PGR\AppData\Roaming\Python\Python39\site-packages\tomlkit\parser.py:80(_current)
      442    0.000    0.000    0.000    0.000 C:\Users\PGR\AppData\Roaming\Python\Python39\site-packages\tomlkit\source.py:96(current)

@davemc0
Copy link

davemc0 commented Oct 19, 2023

@frostming or others, could I ask what are the chances of fixing this soon? I'm getting six minute(!) runtimes parsing a toml file. See #319. It's a problem so serious that I will have to switch to a different library soon if a fix will not be coming.

The workaround of switching from dotted keys to dotted groups doesn't work on my case because I need to be able to sort the toml file, so each key-value pair needs to be context-free, and groups prevent that.

There are other ways to handle merging tables that don't require exponential growth. I'm happy to try to help, if needed. If you could point me to the place in the code that does this I could take a look.

@frostming
Copy link
Contributor

I won't be able to in the short term, or you can try researching it yourself. I would be happy to review.

The parsing logic is a bit convoluted for such a style-preserving TOML library. I wonder if switching to a non-style parser like tomli would be an option?

@frostming frostming self-assigned this Nov 7, 2023
frostming added a commit that referenced this issue Nov 7, 2023
Fixes #193

Signed-off-by: Frost Ming <me@frostming.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants