Cranelift: alias analysis: track each individual table/heap separately #4166

Open
cfallin opened this issue May 19, 2022 · 4 comments
Labels: cranelift:goal:optimize-speed (Focus area: the speed of the code produced by Cranelift), cranelift (Issues related to the Cranelift code generator), enhancement

Comments

Member

cfallin commented May 19, 2022

In #4163 we are introducing an alias analysis and redundant-load elimination / store-to-load-forwarding transform.

This initial implementation categorizes every memory access as one of four kinds: to a "heap", to a "table", to the "vmctx", or to everything else. These four categories are optimized independently of one another; for example, a store to a table does not prevent a load from a heap from being merged with an earlier load of the same address.

This is correct and simple, and it allows us to keep just four bits in MemFlags and four u32s for the "last store" vector, per instruction. However, it is less precise than we would like, especially in the future when we expect multiple modules, memories, tables, etc. to become more common.
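
To make the four-category scheme concrete, here is a minimal Rust sketch of a per-category "last store" vector. It is an illustration only, not the code from #4163: the names `Category`, `LastStores`, `NO_STORE`, and `load_is_redundant` are hypothetical, and instructions are modeled as plain `u32` indices.

```rust
/// Hypothetical names for the four kinds of access described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Category {
    Heap = 0,
    Table = 1,
    Vmctx = 2,
    Other = 3,
}

/// Illustrative stand-in for an instruction index (an assumption).
pub type Inst = u32;

/// Sentinel meaning "no store to this category has been seen yet".
pub const NO_STORE: Inst = u32::MAX;

/// One "last store" slot per category: four u32s in total.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LastStores {
    slots: [Inst; 4],
}

impl LastStores {
    pub fn new() -> Self {
        Self { slots: [NO_STORE; 4] }
    }

    /// A store only clobbers the slot for its own category.
    pub fn record_store(&mut self, cat: Category, inst: Inst) {
        self.slots[cat as usize] = inst;
    }

    /// The most recent store that could alias an access in `cat`.
    pub fn last_store(&self, cat: Category) -> Inst {
        self.slots[cat as usize]
    }
}

/// Two loads of the same address in the same category can be merged when
/// both observe the same last store for that category, i.e. no store to
/// that category happened in between.
pub fn load_is_redundant(at_first: &LastStores, at_second: &LastStores, cat: Category) -> bool {
    at_first.last_store(cat) == at_second.last_store(cat)
}
```

In this sketch, a store tagged `Category::Table` leaves the `Heap` slot untouched, so a later heap load still matches the earlier one. A store tagged `Category::Heap` changes the slot and blocks the merge, even if it targets a different memory, which is the imprecision this issue is about.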

Thus, we should investigate ways of efficiently representing an arbitrary number of heaps or tables as separate categories of abstract state. This may require an extended MemFlags, or indirection of some kind, or some limit (first 16, 32, ... memories are privileged).

cfallin added the enhancement, cranelift, and cranelift:goal:optimize-speed labels May 19, 2022
Member

fitzgen commented May 20, 2022

One possibility is that we have "heap0", "heap1", "heap2", and finally "heap_other" (or even just heap0 and heap_other).

The CG has talked about using hints to indicate which memories need to be fast and should get the virtual-memory tricks, since browsers can't use those tricks for every memory. Maybe we could use those same hints to map onto heap0/1/2 vs. other.
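
A rough Rust sketch of the heap0/heap1/heap2/heap_other idea, under the assumption that the category is picked purely from the memory index (a hint-based mapping would replace `categorize`). The names `HeapCategory`, `categorize`, and `may_alias` are hypothetical and do not exist in Cranelift.

```rust
/// Hypothetical categories: three dedicated heaps plus a shared bucket.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HeapCategory {
    Heap0,
    Heap1,
    Heap2,
    HeapOther,
}

/// Map a memory index to its category. Only the first three memories get
/// a dedicated category; this cutoff is an assumption for illustration.
pub fn categorize(memory_index: u32) -> HeapCategory {
    match memory_index {
        0 => HeapCategory::Heap0,
        1 => HeapCategory::Heap1,
        2 => HeapCategory::Heap2,
        _ => HeapCategory::HeapOther,
    }
}

/// Accesses are treated as possibly aliasing only when they fall into the
/// same category. All memories in `HeapOther` share one bucket, so any two
/// of them are conservatively assumed to alias.
pub fn may_alias(memory_a: u32, memory_b: u32) -> bool {
    categorize(memory_a) == categorize(memory_b)
}
```

Here `may_alias(0, 1)` is false, so stores to memory 1 would not disturb loads from memory 0, while `may_alias(5, 9)` is true even though the memories are distinct: correct, just imprecise for everything past the privileged set.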

Member

fitzgen commented May 20, 2022

> or some limit (first 16, 32, ... memories are privileged).

Ah I think this is the same thing I was getting at with heap0/1/2 vs heap_other.

Contributor

bjorn3 commented May 20, 2022

> One possibility is that we have "heap0", "heap1", "heap2", and finally "heap_other" (or even just heap0 and heap_other).

That won't help for stack slots though. Those are really important for cg_clif. Maybe we could have a side table recording for each instruction which alias set it is part of?

Member Author

cfallin commented May 20, 2022

@bjorn3 yes, that could work, as long as it is optional (for memory-overhead reasons). The advantage of MemFlags now is that it's a u8 (or maybe extended to 16 or 32 bits) that can ride along in the InstructionData.
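
One way the optional side table could look, as a sketch only: the map is allocated lazily, so functions that never assign an alias set pay no memory for it, and unlisted instructions fall back to a default set. `AliasSideTable`, `AliasSet`, `DEFAULT_SET`, and the `u32` instruction index are all assumptions for illustration, not existing Cranelift types.

```rust
use std::collections::HashMap;

/// Illustrative stand-ins (assumptions, not Cranelift types).
pub type Inst = u32;
pub type AliasSet = u32;

/// Alias set used for any instruction without an explicit entry.
pub const DEFAULT_SET: AliasSet = 0;

/// Optional per-instruction alias-set table. `sets` stays `None` until the
/// first assignment, so the table costs nothing when it is unused.
#[derive(Default)]
pub struct AliasSideTable {
    sets: Option<HashMap<Inst, AliasSet>>,
}

impl AliasSideTable {
    /// Record that `inst` belongs to `set`, allocating the map on first use.
    pub fn assign(&mut self, inst: Inst, set: AliasSet) {
        self.sets.get_or_insert_with(HashMap::new).insert(inst, set);
    }

    /// Look up the alias set for `inst`, defaulting when no entry exists.
    pub fn alias_set(&self, inst: Inst) -> AliasSet {
        self.sets
            .as_ref()
            .and_then(|map| map.get(&inst).copied())
            .unwrap_or(DEFAULT_SET)
    }

    /// Whether any memory has been allocated for the table.
    pub fn is_allocated(&self) -> bool {
        self.sets.is_some()
    }
}
```

A frontend such as cg_clif could assign a distinct set per stack slot through `assign`, while a producer that never calls it leaves the table unallocated and keeps relying on the bits in MemFlags.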
