| 1 | +/* ********************************************************** |
| 2 | + * Copyright (c) 2010-2021 Google, Inc. All rights reserved. |
| 3 | + * **********************************************************/ |
| 4 | + |
| 5 | +/* Dr. Memory: the memory debugger |
| 6 | + * |
| 7 | + * This library is free software; you can redistribute it and/or |
| 8 | + * modify it under the terms of the GNU Lesser General Public |
| 9 | + * License as published by the Free Software Foundation; |
| 10 | + * version 2.1 of the License, and no later version. |
| 11 | + |
| 12 | + * This library is distributed in the hope that it will be useful, |
| 13 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 14 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
| 15 | + * Library General Public License for more details. |
| 16 | + |
| 17 | + * You should have received a copy of the GNU Lesser General Public |
| 18 | + * License along with this library; if not, write to the Free Software |
| 19 | + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. |
| 20 | + */ |
| 21 | + |
| 22 | +/** |
| 23 | + **************************************************************************** |
| 24 | + **************************************************************************** |
| 25 | +\page page_arm_port ARM Port |
| 26 | + |
| 27 | +# ARM Port Design Document |
| 28 | + |
| 29 | +## Pattern Mode |
| 30 | + |
| 31 | +### Instrumentation to compare a memory value to an immediate |
| 32 | + |
| 33 | +We can't easily use our x86 instrumentation, because ARM's cmp takes neither a memory operand nor an arbitrary 32-bit immediate: |
| 34 | + |
| 35 | + cmp <memval>, 0xf1fdf1fd |
| 36 | + |
| 37 | +We may have to do something like: |
| 38 | + |
| 39 | + spill r0 |
| 40 | + spill r1 |
| 41 | + ldr r0, <memval> |
| 42 | + movw r1, 0xf1fd |
| 43 | + movt r1, 0xf1fd |
| 44 | + cmp r0, r1 |
| 45 | + restore r0 |
| 46 | + restore r1 |
| 47 | + |
| 48 | +#### Thumb mode: can repeat single byte |
| 49 | + |
| 50 | +Thumb-2's expanded (modified) immediates do allow a byte repeated in a fixed pattern: |
| 51 | + |
| 52 | + cmp r0, 0xf100f100 |
| 53 | + |
| 54 | +Or |
| 55 | + |
| 56 | + cmp r0, 0x00fd00fd |
| 57 | + |
| 58 | +Or |
| 59 | + |
| 60 | + cmp r0, 0xf1f1f1f1 |
| 61 | + |
| 62 | +For Thumb, anyway. |
| 63 | + |
| 64 | +It's probably worth changing the pattern to a repeated single byte, such as 0xf1f1f1f1, to avoid the extra spill and instrs. |
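
To make the constraint on the pattern value concrete, here is a minimal sketch of a check for whether a 32-bit constant fits the Thumb-2 modified-immediate forms listed above (plus the rotated 8-bit form). The helper names `rol32` and `thumb2_imm_encodable` are made up for illustration and are not part of DynamoRIO or Dr. Memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return (v << n) | (v >> ((32 - n) & 31));
}

/* Hypothetical helper: returns true if v fits a Thumb-2 modified immediate. */
static bool thumb2_imm_encodable(uint32_t v)
{
    uint32_t lo = v & 0xff;        /* candidate byte for the low-byte forms */
    uint32_t hi = (v >> 8) & 0xff; /* candidate byte for the 0xXY00XY00 form */
    if (v <= 0xff)
        return true;                          /* 0x000000XY */
    if (v == (lo | (lo << 16)))
        return true;                          /* 0x00XY00XY */
    if (v == ((hi << 8) | (hi << 24)))
        return true;                          /* 0xXY00XY00 */
    if (v == lo * 0x01010101u)
        return true;                          /* 0xXYXYXYXY */
    /* Remaining form: an 8-bit value with its top bit set, rotated right
     * by 8..31.  Undo the rotation and look for such a value. */
    for (unsigned n = 8; n < 32; n++) {
        uint32_t u = rol32(v, n);
        if (u >= 0x80 && u <= 0xff)
            return true;
    }
    return false;
}
```

Under this check 0xf1f1f1f1, 0xf100f100 and 0x00fd00fd are encodable while the current 0xf1fdf1fd pattern is not, which is the reason for considering a single-byte pattern.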
| 65 | + |
| 66 | +What it looks like with 0xf1f1f1f1: |
| 67 | + |
| 68 | + +22 m4 @0x5291e120 <label> |
| 69 | + +22 m4 @0x5291db00 f8ca c084 str %r12 -> +0x00000084(%r10)[4byte] |
| 70 | + +26 m4 @0x5291e088 f8d3 c004 ldr +0x04(%r3)[4byte] -> %r12 |
| 71 | + +30 m4 @0x5291e03c f1bc 3ff1 cmp %r12 $0xf1f1f1f1 |
| 72 | + +34 m4 @0x5291dfa4 e7fe b.ne @0x5291e0d4[4byte] |
| 73 | + +36 m4 @0x5291df58 de00 udf $0x00000000 |
| 74 | + +38 m4 @0x5291e0d4 <label> |
| 75 | + +38 L3 f843 1f04 str %r1 $0x00000004 %r3 -> +0x04(%r3)[4byte] %r3 |
| 76 | + |
| 77 | +With flags save: |
| 78 | + |
| 79 | + +12 m4 @0x550b2408 f8ca 0084 str %r0 -> +0x00000084(%r10)[4byte] |
| 80 | + +16 m4 @0x550b2920 f3ef 8000 mrs %cpsr -> %r0 |
| 81 | + +20 m4 @0x550b2454 f8ca 0080 str %r0 -> +0x00000080(%r10)[4byte] |
| 82 | + +24 m4 @0x550b1e98 f8d1 00e4 ldr +0x000000e4(%r1)[4byte] -> %r0 |
| 83 | + +28 m4 @0x550b24a0 f1b0 3ff1 cmp %r0 $0xf1f1f1f1 |
| 84 | + +32 m4 @0x550b231c e7fe b.ne @0x550b26fc[4byte] |
| 85 | + +34 m4 @0x550b22d0 de00 udf $0x00000000 |
| 86 | + +36 m4 @0x550b26fc <label> |
| 87 | + +36 L3 f8c1 20e4 str.hi %r2 -> +0x000000e4(%r1)[4byte] |
| 88 | + +40 m4 @0x550b1bd8 f8da 0080 ldr +0x00000080(%r10)[4byte] -> %r0 |
| 89 | + +44 m4 @0x550b252c f380 8c00 msr $0x0c %r0 -> %cpsr |
| 90 | + +48 m4 @0x550b26a8 f8da 0084 ldr +0x00000084(%r10)[4byte] -> %r0 |
| 91 | + |
| 92 | +#### To avoid spilling flags, try sub+cbnz in thumb mode |
| 93 | + |
| 94 | +Our scratch reg must be r0-r7 for cbnz though. |
| 95 | + |
| 96 | +And we'd have to add an IT block for sub (but cbnz cannot be inside it). |
| 97 | + |
| 98 | +So maybe we should only do it when the flags are live? That means two |
| 99 | +code sequences, adding more complexity to the fault identification code. |
| 100 | + |
| 101 | +#### ARM mode: cannot repeat an immed byte! Use OP_sub x4? |
| 102 | + |
| 103 | +But what about ARM? ARM immediates in GPR instrs are just an 8-bit value |
| 104 | +rotated: no repeating. Even the SIMD and VFP immeds aren't much help, |
| 105 | +except maybe the cmode=1111 combined with cmode=1100? Subtract one and |
| 106 | +then the other? |
| 107 | + |
| 108 | +We could use mvn if most bits are 1's: something like 0xfff1ffff, but we still |
| 109 | +need to spill a reg, and if we do that we may as well use movw,movt. |
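
As an illustration of the "8-bit value rotated" rule, the sketch below searches for an ARM (A32) data-processing immediate encoding of a constant. The function name `arm_imm_encode` is hypothetical; the 12-bit result is laid out as a 4-bit rotate field followed by the 8-bit value, where the value is rotated right by twice the rotate field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return (v << n) | (v >> ((32 - n) & 31));
}

/* Hypothetical helper: if v is an 8-bit value rotated right by an even
 * amount, store the 12-bit field (rot << 8 | imm8) and return true. */
static bool arm_imm_encode(uint32_t v, uint32_t *imm12)
{
    for (unsigned rot = 0; rot < 16; rot++) {
        /* v == ROR(imm8, 2*rot)  <=>  imm8 == ROL(v, 2*rot) */
        uint32_t imm8 = rol32(v, 2 * rot);
        if (imm8 <= 0xff) {
            *imm12 = (rot << 8) | imm8;
            return true;
        }
    }
    return false; /* no single-instruction immediate exists */
}
```

With this model neither 0xf1fdf1fd nor 0xf1f1f1f1 is encodable, while each single byte of the pattern at its own position is. The inverse of 0xfff1ffff is also encodable, which is what makes the mvn idea possible.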
| 110 | + |
| 111 | +#### Do 4 subtracts? |
| 112 | + |
| 113 | +Faster than a spill, though not if we have a (2nd) dead reg. |
| 114 | + |
| 115 | +So we'd do: |
| 116 | + |
| 117 | + sub r0, 0xf1000000 |
| 118 | + sub r0, 0x00f10000 |
| 119 | + sub r0, 0x0000f100 |
| 120 | + sub r0, 0x000000f1 |
| 121 | + cmp r0, 0 (cbnz is Thumb-only) |
| 122 | + bne skip |
| 123 | + udf |
| 124 | + skip: |
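
The arithmetic behind the four subtracts can be modeled in plain C. This is only a sketch of the idea, with a made-up function name: the pattern is split into four byte-aligned chunks, each of which is a valid ARM immediate, and the register ends up zero exactly when the loaded value equals the pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the sub x4 + cmp sequence.  Returns true when the
 * sequence would fall through to the udf, i.e. memval == pattern. */
static bool pattern_match_via_subs(uint32_t memval, uint32_t pattern)
{
    uint32_t r0 = memval; /* stands in for the scratch register */
    uint32_t sum = 0;
    for (int shift = 24; shift >= 0; shift -= 8) {
        uint32_t chunk = pattern & (0xffu << shift); /* one sub immediate */
        r0 -= chunk; /* wraps modulo 2^32, like the hardware sub */
        sum += chunk;
    }
    assert(sum == pattern); /* the four chunks add back up to the pattern */
    return r0 == 0;         /* cmp r0, 0 */
}
```

Because the chunks are taken per byte, the same sequence works for 0xf1fdf1fd as well as for a single-byte pattern such as 0xf1f1f1f1.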
| 125 | + |
| 126 | +We could use 0xf1fdf1fd here -- but maybe simplest to still limit to |
| 127 | +single-byte for consistency w/ Thumb? |
| 128 | + |
| 129 | +Versus movw,movt: the sub sequence costs 2 extra instrs if a 2nd reg is dead, and the |
| 130 | +same # of instrs with no mem access if it is live. Can we ask drreg whether a reg is dead? |
| 131 | +=> |
| 132 | +add drreg_is_register_dead() |
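
The instruction counts in that comparison can be written down as a small cost model. The enum and function names here are invented for this sketch and only count the instructions that differ between the two approaches (the first scratch spill, the ldr, the branch and the udf are common to both).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two ARM-mode comparison strategies. */
typedef enum { CMP_VIA_SUBS, CMP_VIA_MOVW_MOVT } cmp_strategy_t;

/* Number of strategy-specific instructions emitted. */
static int strategy_instr_count(cmp_strategy_t s, bool second_reg_dead)
{
    switch (s) {
    case CMP_VIA_SUBS:
        return 4 + 1; /* sub x4, cmp against 0 */
    case CMP_VIA_MOVW_MOVT:
        /* movw, movt, cmp, plus spill+restore if the 2nd reg is live */
        return 2 + 1 + (second_reg_dead ? 0 : 2);
    }
    assert(0 && "unknown strategy");
    return -1;
}

/* Pick movw,movt only when it needs no extra spill. */
static cmp_strategy_t pick_strategy(bool second_reg_dead)
{
    return second_reg_dead ? CMP_VIA_MOVW_MOVT : CMP_VIA_SUBS;
}
```

In real instrumentation the `second_reg_dead` input would come from a liveness query such as the proposed drreg_is_register_dead(); how that call would be wired in is not shown here.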
| 133 | + |
| 134 | +However, having 2 different versions complicates the fault handling. |
| 135 | + |
| 136 | +Double-checking the compiler doesn't have some trick: |
| 137 | + |
| 138 | + if (argc == 0xf1fdf1fd) |
| 139 | + return 1; |
| 140 | + => |
| 141 | + gcc thumb -O3: |
| 142 | + 8372: f24f 13fd movw r3, #61949 ; 0xf1fd |
| 143 | + 8376: f2cf 13fd movt r3, #61949 ; 0xf1fd |
| 144 | + 837a: 4298 cmp r0, r3 |
| 145 | + gcc arm -O3: |
| 146 | + 8374: e30f31fd movw r3, #61949 ; 0xf1fd |
| 147 | + 8378: e34f31fd movt r3, #61949 ; 0xf1fd |
| 148 | + 837c: e1500003 cmp r0, r3 |
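
For reference, the ARM-mode movw/movt words in the gcc output can be reproduced with a few lines of C. The helper name is hypothetical and the sketch hard-codes the "always" condition; it splits a 16-bit half into the imm4 and imm12 fields of the A32 encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode an unconditional A32 movw (top == false) or
 * movt (top == true) of imm16 into register rd. */
static uint32_t arm_encode_mov16(bool top, unsigned rd, uint16_t imm16)
{
    assert(rd < 16);
    uint32_t opc = top ? 0xe3400000u : 0xe3000000u;
    return opc
        | ((uint32_t)(imm16 >> 12) << 16) /* imm4: top 4 bits of the half */
        | ((uint32_t)rd << 12)            /* destination register */
        | (imm16 & 0xfffu);               /* imm12: low 12 bits of the half */
}
```

Loading 0xf1fdf1fd into r3 takes the low half (0xf1fd) through movw and the high half (0xf1fd) through movt, matching the two words in the listing.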
| 149 | + |
| 150 | +Real example: |
| 151 | + |
| 152 | + +4 m4 @0x4f8c5be8 e58a1084 str %r1 -> +0x00000084(%r10)[4byte] |
| 153 | + +8 m4 @0x4f8c60a8 e10f1000 mrs %cpsr -> %r1 |
| 154 | + +12 m4 @0x4f8c5c34 e58a1080 str %r1 -> +0x00000080(%r10)[4byte] |
| 155 | + +16 m4 @0x4f8c6134 e5901000 ldr (%r0)[4byte] -> %r1 |
| 156 | + +20 m4 @0x4f8c6180 e24114f1 sub %r1 $0xf1000000 -> %r1 |
| 157 | + +24 m4 @0x4f8c5ff4 e24118f1 sub %r1 $0x00f10000 -> %r1 |
| 158 | + +28 m4 @0x4f8c5b10 e2411cf1 sub %r1 $0x0000f100 -> %r1 |
| 159 | + +32 m4 @0x4f8c5f28 e24110f1 sub %r1 $0x000000f1 -> %r1 |
| 160 | + +36 m4 @0x4f8c5f68 e3510000 cmp %r1 $0x00000000 |
| 161 | + +40 m4 @0x4f8c5b50 1afffffe b.ne @0x4f8c60f4[4byte] |
| 162 | + +44 m4 @0x4f8c6250 e7f000f0 udf $0x00000000 |
| 163 | + +48 m4 @0x4f8c60f4 e7f000f0 <label> |
| 164 | + +48 L3 e5900000 ldr (%r0)[4byte] -> %r0 |
| 165 | + +52 m4 @0x4f8c6290 e59a1080 ldr +0x00000080(%r10)[4byte] -> %r1 |
| 166 | + +56 m4 @0x4f8c62dc e12cf001 msr $0x0c %r1 -> %cpsr |
| 167 | + +60 m4 @0x4f8c6334 e59a1084 ldr +0x00000084(%r10)[4byte] -> %r1 |
| 168 | + |
| 169 | +#### Switch to thumb mode just for the cmp? |
| 170 | + |
| 171 | +Breaks DR's rules: would mess up decode_fragment. |
| 172 | + |
| 173 | +Instead of inlining, could jump to separate gencode (need 15 forms, one for |
| 174 | +each scratch reg) -- if it's already in the cache, maybe it's ok that it's not local. |
| 175 | + |
| 176 | +#### Load immed from TLS slot |
| 177 | + |
| 178 | +If the TLS slot is in the data cache and we get an L1 hit, this may be as fast |
| 179 | +as movw,movt, and it's shorter code. |
| 180 | + |
| 181 | +#### Go w/ unified ARM+Thumb same approach for simpler code? |
| 182 | + |
| 183 | +#### Permanently steal another reg? |
| 184 | + |
| 185 | +Very complex w/ interactions w/ r10, though. |
| 186 | + |
| 187 | +#### Put the optimizations under an option and under option switch to single-byte pattern val |
| 188 | + |
| 189 | +#### For 2 spills, have drreg use ldm or ldrd? |
| 190 | + |
| 191 | +For 2 spills, is ldm or ldrd faster? Qin's initial tests showed it was no faster |
| 192 | +than two separate ldr instrs, so even though instr density is better, if it makes drreg |
| 193 | +really complex it's probably not worth doing. |
| 194 | + |
| 195 | + |
| 196 | +**************************************************************************** |
| 197 | +**************************************************************************** |
| 198 | +*/ |