
[hlopt] cache based on opcode array #11797

Open
wants to merge 7 commits into base: development
Conversation

yuxiaomao
Contributor

Related to #8082

Use the function path - or the function position if the path is empty - as the cache key, then compare the whole opcode array (except constant/function indexes) to ensure the cache entry is valid before replacing the code with the cached result.

The performance is not easy to measure, but I did some tests on different game projects.
For a game taking around 6s in generate-hl (2-2.5s of which measured for hlopt), this adds 1s of overhead on the first run (not all of it is measured inside hlopt, but it is likely related to Array.copy on the opcode array), and it reduces subsequent runs to 3.5-4.5s (a 2s gain compared to no cache).

Currently the cache can be disabled with -D hl_no_opt_cache. I would like to enable the cache only when running with a compilation server, but I haven't found out how to do that.

@Simn
Member

Simn commented Oct 22, 2024

It's a curious approach to avoid looking at every opcode by looking at every opcode... Is the goal here to map unoptimized code to optimized code? In that case I wonder why you even need to identify the function, in theory multiple functions could have the same code which then ends up as the same optimized code.

@yuxiaomao
Contributor Author

Yes, the goal is to map unoptimized code (f.code before _optimize / c_old_code) to optimized code (f.code after _optimize / c_code), because optimize can take a long time (it may be possible to do some optimization there too, but I chose to begin with this cache, which is almost ready to use).
My other attempts at creating a hash/id had higher overhead, which is why I ended up comparing opcodes x)

You're right, multiple functions could have the same code. That would require a more complex data structure for the cache and I'm not sure it would be a real gain; let me check.

@Simn
Member

Simn commented Oct 22, 2024

There might be a world where this is implemented as a (opcode list,opcode list) Hashtbl.t. The opcode data structure is fairly atomic with the one exception being OType as far as I can tell. If this was changed to carry some index instead of a ttype then I think it could work.

I didn't exactly think this through though.
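A minimal sketch of what that could look like, assuming a toy opcode type and a hypothetical `optimize` callback (the real hlopt opcodes have many more constructors, and as noted above `OType` would need to carry an index instead of a `ttype` for the key to be hashable):

```ocaml
(* Hedged sketch of the (opcode list, opcode list) Hashtbl.t idea,
   with a toy opcode type; not the real hlcode definition. *)

type opcode =
  | OMov of int * int   (* dst reg, src reg *)
  | OInt of int * int   (* dst reg, index into the global int table *)
  | ORet of int

(* Normalize away table indexes so two runs with different constant
   tables can still share a cache entry. *)
let strip_index = function
  | OInt (r, _) -> OInt (r, 0)
  | op -> op

let cache : (opcode list, opcode list) Hashtbl.t = Hashtbl.create 16

let optimize_with_cache optimize code =
  let key = List.map strip_index code in
  match Hashtbl.find_opt cache key with
  | Some optimized -> optimized   (* hit: indexes still need remapping *)
  | None ->
      let optimized = optimize code in
      Hashtbl.add cache key optimized;
      optimized
```

Note that on a cache hit the returned code still carries the first run's constant indexes, which is why something like the PR's c_remap_indexes step remains necessary.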

@yuxiaomao
Contributor Author

I tried several implementations; the performance is very hard to measure, so I will also describe my impressions.
For Hashtbl (opcode array, cache): subsequent runs are at least 0.2s slower than the current solution. I wonder whether it's related to Hashtbl's nature (slower on lookup compared to Map), or to the fact that I first "remove" all indexes and use the new array as the lookup key.
For Tree + Hashtbl/List: performance is clearly worse, probably due to complex node creation on the first run and walking the tree on subsequent runs.
(I found a very edge-case bug thanks to these implementations.)

I think a purely code-based cache can only reduce the memory footprint, and can potentially have a lower timing overhead for the first compilation, especially for very small functions (I think 40% of functions share the "same code", mostly < 10 opcodes); it's slower for subsequent runs' lookups due to the creation of a new array.
I don't think it's worth further optimizing the Hashtbl array solution.

@@ -1051,121 +1051,24 @@ let _optimize (f:fundecl) =

let same_op op1 op2 =
match op1, op2 with
| OMov (a1,b1), OMov (a2, b2) -> a1 = a2 && b1 = b2
| OInt (r1,_), OInt (r2, _) -> r1 = r2
Member

Looking at this and other lines here, I don't understand why OInt(1, 1) and OInt(1, 2) should be considered the same op.

Contributor Author

It's OInt of reg * int index; the only things that matter in optimized code are the used registers and the control flow, not the index in the global int table (e.g. in two different runs, the same int 55 can have index 10 or 35). The indexes are "fixed" afterwards by the code related to c_remap_indexes.

I should double-check that the replacement is always correct; I'm trying to also remap field indexes, but there are some errors x(
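A toy illustration of that comparison rule (hypothetical opcode type and function, not the real hlcode/hlopt definitions):

```ocaml
(* Toy sketch of an index-insensitive opcode comparison: registers and
   control flow must match, but constant-table indexes are ignored,
   because they are remapped afterwards (c_remap_indexes in the PR). *)
type op =
  | OMov of int * int   (* dst reg, src reg *)
  | OInt of int * int   (* dst reg, index into the global int table *)

let same_op_except_index op1 op2 =
  match op1, op2 with
  | OMov (a1, b1), OMov (a2, b2) -> a1 = a2 && b1 = b2
  | OInt (r1, _), OInt (r2, _) -> r1 = r2  (* index intentionally ignored *)
  | _ -> false
```

Under this rule OInt(1, 1) and OInt(1, 2) compare equal on purpose: they load some int into register 1, and which table slot holds that int varies between runs.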

Contributor Author

@yuxiaomao yuxiaomao Oct 24, 2024

same_op is confusing. I should probably rename the function, but I don't have a good idea. Maybe same_op_except_index.

Code can be sent as the optimize result as-is if there is no reg map and no nop
operations. Make a code copy if the cache entry is used in this run.
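A hedged sketch of the copy rule described in that commit message (hypothetical helper name and signature, not the PR's actual code):

```ocaml
(* Hypothetical helper: when the optimizer produced no register remapping
   and removed no nops, the opcode array is unchanged and can be returned
   as-is; but if the same array was stored as a cache entry that later
   lookups in this run will reuse, hand out a copy so callers cannot
   mutate the cached version. *)
let finalize_result ~used_as_cache_entry code =
  if used_as_cache_entry then Array.copy code else code
```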
@yuxiaomao
Contributor Author

I think I now need to bind the hl_no_opt_cache option to server mode (e.g. only activate the cache if --connect is in the params); do you have any suggestions?
