Add `orig_insn_map`: mapping from original to new insn indices. #50
I see why this would be wanted. But it's another 8 bytes per instruction that the allocator needs to sit on and then drag through the machine's caches, and we're already way slow. Is this mapping needed in production runs? If not, I'd prefer to have it implemented similar to the existing |
Had a few comments below, in addition to Julian's remark.
lib/src/inst_stream.rs
Outdated
@@ -463,15 +471,18 @@ fn add_spills_reloads_and_moves<F: Function>(
&& insts_to_add[curITA].point == InstPoint::new_reload(iix)
{
insns.push(insts_to_add[curITA].inst.construct(func));
orig_insn_map.push(None);
I suppose you maintain a mapping of new instructions to old instructions, hence the need to push `None`. Could it also work with the inverted mapping (old instructions to new instructions), which would remove the need for `None` elements in the array?
That would make debuginfo generation slower, as you would need to iterate through the whole map to find the original instruction corresponding to the instruction you are currently emitting lineinfo for. Also, you may want to attribute those extra insts to `iix`.
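For illustration, a minimal sketch of how a client could consume a new-to-old map when emitting line info. `SrcLoc` and `remap_srclocs` are hypothetical names, not part of regalloc.rs or Cranelift; the only assumption taken from the diff is that inserted instructions get a `None` entry.

```rust
/// Hypothetical source-location type; stands in for whatever the client uses.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
struct SrcLoc(u32);

/// Shuffle per-old-instruction source locations into new-instruction order.
/// `orig_insn_map[new]` is `Some(old)` for an instruction that existed before
/// register allocation and `None` for one the allocator inserted.
fn remap_srclocs(orig_insn_map: &[Option<u32>], old_srclocs: &[SrcLoc]) -> Vec<SrcLoc> {
    orig_insn_map
        .iter()
        .map(|entry| match entry {
            // One direct index per new instruction; no search through the map.
            Some(old) => old_srclocs[*old as usize],
            // Inserted spill/reload/move: fall back to a default location.
            None => SrcLoc::default(),
        })
        .collect()
}
```

Each new instruction costs one array lookup here, which is the property the new-to-old direction buys.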
Good point about attributing those extra insts to `iix`.
However, I think there's still a linear way to find the source information for a given new inst, since the source locations are handled all at once: start with the first old instruction and remember which new instruction it maps to; then, at the second old instruction, you see which new instruction it maps to. You then know that every new instruction from the first's new instruction up to (but not including) the second's new instruction maps to the first old instruction.
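A minimal sketch of that linear walk, assuming an old-to-new map whose entries are strictly increasing. The function name and signature are hypothetical and only illustrate the idea, not what the PR implements.

```rust
/// Hypothetical helper: `old_to_new[i]` is the index of the new instruction
/// that corresponds to old instruction `i` (assumed strictly increasing).
/// Returns, for each new instruction, the old instruction it is attributed to.
fn attribute_new_to_old(old_to_new: &[u32], num_new: u32) -> Vec<Option<u32>> {
    // The output is sized by the number of new instructions.
    let mut out = vec![None; num_new as usize];
    for (old, &start) in old_to_new.iter().enumerate() {
        // The run for `old` ends where the next old instruction's run begins,
        // or at the end of the new instruction stream for the last one.
        let end = old_to_new.get(old + 1).copied().unwrap_or(num_new);
        for new in start..end {
            out[new as usize] = Some(old as u32);
        }
    }
    out
}
```

Note that under this scheme a reload inserted just before old instruction `i + 1` is attributed to old instruction `i`, and anything inserted before the first old instruction is left unattributed.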
@bnjbvr I didn't fully grok your last sentence, but I think you're more-or-less suggesting to invert the map in one linear pass?
I actually originally had the regalloc side of this do exactly as you say, but the issue is that we then have to allocate an O(|new_insns|)-sized array anyway to do the inversion, and we have the extra loop with random memory accesses to do it.
IMHO with the `SourceLoc::default()` value you had mentioned in the other PR, this cost (4 bytes per new-insn) is a pretty reasonable tradeoff...
@julian-seward1 re: only doing this when necessary: it appears that the Per @bnjbvr in the related Cranelift PR, we have |
Err, I misspoke w.r.t. the default val (we're in the regalloc crate here -- we're operating with |
@cfallin ah well, so be it. If we have to have it, we have to have it. I'll try and quantify the cost in the next round of RA profiling. |
Just added a |
LGTM. The fact that the name `not_present` leaks in a few places where a default impossible value was used is a bit unfortunate, but the alternative doesn't seem much better (i.e. changing the vocabulary around this to "set/unset", "defined/undefined", or "default value vs non-default", after which regalloc.rs users may be misled by the name). Shipit!
This is needed in the Cranelift client code in order to track source-location mapping for debug info: we need to be able to shuffle the source locations into the new locations after the regalloc has done its instruction-stream editing. The `target_map` result is not quite good enough, because it only provides old --> new mappings at a basic block granularity. This commit also adds "invalid" values to the index types; this is useful to represent "new instruction maps to no old instruction" without the memory-layout waste of an `Option<u32>`.
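To illustrate the "invalid" index idea, here is a minimal sketch that assumes `u32::MAX` is reserved as the sentinel. `InsnIx` and its methods are hypothetical names chosen for this example and are not the types or API added by this commit.

```rust
/// Hypothetical instruction-index newtype; `u32::MAX` is reserved to mean
/// "this new instruction maps to no old instruction".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InsnIx(u32);

impl InsnIx {
    const INVALID: u32 = u32::MAX;

    /// Wrap a real index; the sentinel value itself is rejected.
    fn new(ix: u32) -> Self {
        assert!(ix != Self::INVALID, "index collides with the sentinel");
        InsnIx(ix)
    }

    /// The "maps to nothing" value.
    fn invalid() -> Self {
        InsnIx(Self::INVALID)
    }

    fn is_valid(self) -> bool {
        self.0 != Self::INVALID
    }

    /// Convert back to an `Option` at the API boundary if callers prefer it.
    fn get(self) -> Option<u32> {
        if self.is_valid() { Some(self.0) } else { None }
    }
}
```

With this layout each map entry occupies 4 bytes, whereas `Option<u32>` occupies 8 because `u32` has no spare bit pattern for the `None` case.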