
[hlopt] cache based on opcode array #11797

Open
wants to merge 7 commits into base: development
Conversation

yuxiaomao
Contributor

Related to #8082

Use the function path - or the function position if the path is empty - as the cache key, then compare the whole opcode array (except constant/function indexes) to ensure the cache entry is valid before replacing the code with the cached result.

The performance is not easy to measure, but I did some tests on different game projects.
For a game taking around 6s in generate-hl (2-2.5s of which measured for hlopt), this adds 1s of overhead on the first run (not all of it is measured inside hlopt, but it is likely related to Array.copy on the opcode array), and it reduces subsequent runs to 3.5-4.5s (a 2s gain compared to no cache).

Currently the cache can be disabled with -D hl_no_opt_cache. I would like to enable the cache only when running with a compilation server, but I haven't found out how to do that.

@Simn
Member

Simn commented Oct 22, 2024

It's a curious approach to avoid looking at every opcode by looking at every opcode... Is the goal here to map unoptimized code to optimized code? In that case I wonder why you even need to identify the function, in theory multiple functions could have the same code which then ends up as the same optimized code.

@yuxiaomao
Contributor Author

Yes, the goal is to map unoptimized code (f.code before _optimize / c_old_code) to optimized code (f.code after _optimize / c_code), because optimize can take a long time (it may be possible to do some optimization there too, but I chose to begin with this cache, which is almost ready to use).
My other attempts at creating a hash/id had higher overhead, which is why I ended up comparing opcodes x)

You're right, multiple functions could have the same code. That would require a more complex data structure for the cache and I'm not sure it would be a real gain; let me check.

@Simn
Member

Simn commented Oct 22, 2024

There might be a world where this is implemented as a (opcode list,opcode list) Hashtbl.t. The opcode data structure is fairly atomic with the one exception being OType as far as I can tell. If this was changed to carry some index instead of a ttype then I think it could work.

I didn't exactly think this through though.
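A minimal sketch of what that could look like, assuming a toy opcode type and a hypothetical `optimize` callback (the real hlopt opcodes have many more constructors, and as noted above `OType` would need to carry an index instead of a `ttype` for the key to be hashable):

```ocaml
(* Hedged sketch of the (opcode list, opcode list) Hashtbl.t idea,
   with a toy opcode type; not the real hlcode definition. *)

type opcode =
  | OMov of int * int   (* dst reg, src reg *)
  | OInt of int * int   (* dst reg, index into the global int table *)
  | ORet of int

(* Normalize away table indexes so two runs with different constant
   tables can still share a cache entry. *)
let strip_index = function
  | OInt (r, _) -> OInt (r, 0)
  | op -> op

let cache : (opcode list, opcode list) Hashtbl.t = Hashtbl.create 16

let optimize_with_cache optimize code =
  let key = List.map strip_index code in
  match Hashtbl.find_opt cache key with
  | Some optimized -> optimized   (* hit: indexes still need remapping *)
  | None ->
      let optimized = optimize code in
      Hashtbl.add cache key optimized;
      optimized
```

Note that on a cache hit the returned code still carries the first run's constant indexes, which is why something like the PR's c_remap_indexes step remains necessary.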

@yuxiaomao
Contributor Author

I tried several implementations; the performance is very hard to measure, so I will also describe my impressions.
For Hashtbl (opcode array, cache): subsequent runs are at least 0.2s slower than the current solution. I wonder whether it's related to Hashtbl's nature (slower on lookup compared to Map), or to the fact that I first "remove" all indexes and use the new array as the lookup key.
For Tree + Hashtbl/List: performance is clearly worse, probably due to complex node creation on the first run and walking the tree on subsequent runs.
(I found a very edge-case bug thanks to these implementations.)

I think a purely code-based cache can only reduce the memory footprint, and can potentially have a lower timing overhead for the first compilation, especially for very small functions (I think 40% of functions share the "same code", mostly < 10 opcodes); it's slower for subsequent runs' lookups due to the creation of a new array.
I don't think it's worth further optimizing the Hashtbl array solution.

@@ -1051,121 +1051,24 @@ let _optimize (f:fundecl) =

let same_op op1 op2 =
match op1, op2 with
| OMov (a1,b1), OMov (a2, b2) -> a1 = a2 && b1 = b2
| OInt (r1,_), OInt (r2, _) -> r1 = r2
Member

Looking at this and other lines here, I don't understand why OInt(1, 1) and OInt(1, 2) should be considered the same op.

Contributor Author

It's OInt of reg * int index; the only things that matter in optimized code are the used registers and the control flow, not the index in the global int table (e.g. in two different runs, the same int 55 can have index 10 or 35). The indexes are "fixed" afterwards by the code related to c_remap_indexes.

I should double-check that the replacement is always correct; I'm trying to also remap field indexes, but there are some errors x(
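A toy illustration of that comparison rule (hypothetical opcode type and function, not the real hlcode/hlopt definitions):

```ocaml
(* Toy sketch of an index-insensitive opcode comparison: registers and
   control flow must match, but constant-table indexes are ignored,
   because they are remapped afterwards (c_remap_indexes in the PR). *)
type op =
  | OMov of int * int   (* dst reg, src reg *)
  | OInt of int * int   (* dst reg, index into the global int table *)

let same_op_except_index op1 op2 =
  match op1, op2 with
  | OMov (a1, b1), OMov (a2, b2) -> a1 = a2 && b1 = b2
  | OInt (r1, _), OInt (r2, _) -> r1 = r2  (* index intentionally ignored *)
  | _ -> false
```

Under this rule OInt(1, 1) and OInt(1, 2) compare equal on purpose: they load some int into register 1, and which table slot holds that int varies between runs.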

Contributor Author

@yuxiaomao yuxiaomao Oct 24, 2024

same_op is confusing. I should probably rename the function, but I don't have a good idea. Maybe same_op_except_index.

Code can be sent as the optimize result as-is if there is no reg map and no nop
operations. Make a code copy if the cache entry is used in this run.
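A hedged sketch of the copy rule described in that commit message (hypothetical helper name and signature, not the PR's actual code):

```ocaml
(* Hypothetical helper: when the optimizer produced no register remapping
   and removed no nops, the opcode array is unchanged and can be returned
   as-is; but if the same array was stored as a cache entry that later
   lookups in this run will reuse, hand out a copy so callers cannot
   mutate the cached version. *)
let finalize_result ~used_as_cache_entry code =
  if used_as_cache_entry then Array.copy code else code
```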
@yuxiaomao
Contributor Author

I think I now need to bind the hl_no_opt_cache option to server mode (e.g. only activate the cache if --connect is in the params); do you have any suggestions?
