The Return of Cast Operations #503

billhuffman · 2020-06-26T18:10:14Z

We've decided to require (the appearance of) SLEN=VLEN. Early in the discussion of the issue, we considered having cast operations that would rearrange for different element size because it was important in a small number of codes. We may still want to have those cast operations, though now only for performance on wide machines. The cast instruction will perform better on a wide, in-order implementation than the auto-inserted micro-op. As before, the cast is a nop on narrow machines.

I think all the issues about fragmentation are gone here. Code with and without the casts works on all implementations. But for performance optimization, the casts will want to be used on wide, in-order implementations and possibly also on wide, OoO implementations.

David-Horner · 2020-06-26T19:22:35Z

Is a vector move to itself (vmv.v.v v#,v#) sufficient for most cases? It could then be defined as the cast hint to the current SEW, as there is no longer a need for the cast to specify a to/from element width (EW). (I didn't see any specific proposals with arguments, format, et al., although I was on the lookout for them.)

I can envision a preemptive situation in which a register write is prefixed with its expected target EW (or a following cast instruction is fused to it), to give that register write a "preferred" structure. In the case of multiple reads of that register, the micro-ops are avoided. And this could be done in advance of the vsetvli to the new SEW (or EEW in the case of a narrowing op). However, is this a frequent use case? Frequent enough to justify a specific cast instruction? If it is sufficiently significant, I would rather propose a prefix cast hint. (The move, with or without the prefix, will suffice as a cast op.)


billhuffman · 2020-06-26T19:50:31Z

That might work. My hesitation would be that a move would likely take one cycle and a cast would likely take more. So the dispatch computation would need to have taken account of the current EW and the move instruction before dispatch. I assume the compiler would know it was changing widths and could leave additional cycles in its expectation of when the result would be available. So the extra dispatch recurrence time is what would worry me. Worth thinking about. It at least separates the rearrangement operation from the following use at a different width.

I wonder if a "cast to width" instruction would be better. It would be assumed to take longer to execute and could be dispatched without comparing widths and doing different things for change-of-EW and no-change-of-EW scenarios. So, move would assume no-change-of-EW and cast would assume change-of-EW. Maybe there's an encoding similar to vmv.v.v that could do this.

Bill


billhuffman · 2020-06-26T19:56:50Z

Ummm.... I didn't fully take in what you were saying before. You were saying, in my terminology, that dispatch would assume that a vmv.v.v with source and destination identical takes the additional cycles. In any case, vmv.v.v with source=destination would be assumed to represent a cast sort of operation to SEW. I take back my hesitation. I think that might work well. I would like to see at least non-normative text suggesting that, so that it works the same way across all implementations that need the hint. Bill

billhuffman · 2020-06-26T20:06:15Z

I'd say let's put a non-normative comment that vmv.v.v with dest=source is expected to be used as a hint, for implementations that need it, that element width is changing from an old element width to the current SEW. If that's added, we can close this issue.

David-Horner · 2020-06-26T21:25:38Z

The beauty is that it (vmv.v.v with source=destination) does the right thing as a hint or as a physical move when internal EW is tracked.

Although it only supports internal reformatting to "SEW friendly" internal state,
if more is needed, the preemptive formatting to explicit EW hint can be added post v1.0.

I agree, however, that v1.0 could benefit from the inclusion of vmv.v.v with dest=source described as a hint.

It helps by

- introducing hints to RVV,
- highlighting unique aspects of "nop"s in RVV, and
- disambiguating this obvious special case.

Do you have any proposed wording?

billhuffman · 2020-06-26T21:40:13Z

How about: The vmv.v.v instruction with source = destination is a functional nop. It is used as a "hint" to indicate the element width of the next use when the element width of the previous use likely was different. Implementations may execute the nop move, drop the instruction entirely, or rearrange the bytes of the vector register as needed for best performance assuming the element width of the next use will be the current SEW. Bill
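The three permitted behaviours in the wording above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `VReg` class, the `layout_ew` tag, and the `policy` parameter are hypothetical names invented for illustration, and the model ignores `vl`, masking, and register grouping. It is not how any particular implementation tracks internal layout.

```python
from dataclasses import dataclass


@dataclass
class VReg:
    data: bytes     # architectural contents, in memory (stride-1) byte order
    layout_ew: int  # hypothetical internal tag: EW the physical layout suits


def vmv_v_v_self(reg: VReg, sew: int, policy: str = "rearrange") -> VReg:
    """Toy model of vmv.v.v with source == destination.

    "execute":   perform the nop move (contents and layout unchanged)
    "drop":      discard the instruction entirely
    "rearrange": assume the next use is at the current SEW and re-tag
                 the internal layout accordingly
    In every case the architectural contents are unchanged.
    """
    if policy not in ("execute", "drop", "rearrange"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "rearrange":
        return VReg(reg.data, sew)
    return reg


# A register last written with 8-bit elements, about to be read at SEW=32.
v4 = VReg(data=bytes(range(16)), layout_ew=8)
hinted = vmv_v_v_self(v4, sew=32)                  # wide machine acts on the hint
ignored = vmv_v_v_self(v4, sew=32, policy="drop")  # narrow machine ignores it
```

In this model software cannot observe which policy was chosen, since `data` is identical in all three cases; only the hypothetical internal tag differs.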

David-Horner · 2020-06-26T22:58:25Z

A minor tweak in the last sentence: or *assume that the element width of the next use will be the current SEW* and rearrange the bytes of the vector register as needed for best performance.

What also needs to be added is a section on hints, like those in RVI and RVC. I will open an issue on this topic, as I believe the above hint description would benefit from that context.


billhuffman · 2020-06-26T23:03:44Z

> A minor tweak in the last sentence: or *assume that element width of the next use will be the current SEW* and rearrange the bytes of the vector register as needed for best performance.

Yes. Better.

Bill

kasanovic · 2020-07-03T09:07:25Z

This looks like a good resolution.

kasanovic · 2020-07-03T09:11:26Z

Might want both vmv.v.v form for hint that "vl" elements need to be rearranged, and vmv?r.v hints for whole vector register groups.


David-Horner · 2020-07-08T13:40:00Z

I disagree with multiple formulations of whole register (WR) loads to provide a hint to the microarchitecture of the requested internal structure, because (detailed below):

1. These hints only potentially benefit machines whose in-register order differs from in-memory order.
2. Better mechanisms should be possible to optimize behaviour for such machines.
3. The designation has diminished value for a multiple-register load with a single width hint.
4. The typical WR use case is for autonomous/independent spill and fill that are unaware of subsequent use.
5. It complicates both the hardware and the software, causing potential confusion to each (software needs to use a size-specific load in a context where such info is not immediately available, if at all).
6. The hint can lead to lower performance for machines not needing it (a consequence of 5).

I believe it is premature to provide these whole register (WR) load hints.
I propose we only use “e1024 encoding” (mop-width all ones) for WR loads and for WR stores.
The implementation will decide what internal format to use; software can assume the current SEW is used.
Conventions can be used to optimize the re-establishment of internal width information.

Further, I recommend we emphasize the in-memory format of the model. Specifically, that software can consider unit-strided elements as equivalent to unit-strided elements of narrower widths (and of wider elements if alignment requirements are met) with an appropriately adjusted count.
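The equivalence described above can be checked with a short byte-level sketch. It assumes little-endian byte order and uses a hypothetical helper, `reinterpret`, that is not part of any RVV toolchain; it only shows that the same unit-stride bytes can be read as elements of another width with the count adjusted.

```python
import struct

# struct format codes for unsigned elements of each width (standard sizes).
_FMT = {8: "B", 16: "H", 32: "I", 64: "Q"}


def reinterpret(mem: bytes, ew_bits: int) -> tuple:
    """View unit-stride memory as little-endian elements of ew_bits each."""
    elem_bytes = ew_bits // 8
    if len(mem) % elem_bytes:
        raise ValueError("length is not a multiple of the element size")
    count = len(mem) // elem_bytes  # the "appropriately adjusted count"
    return struct.unpack(f"<{count}{_FMT[ew_bits]}", mem)


# Two 32-bit elements stored with unit stride...
mem = struct.pack("<2I", 0x03020100, 0x07060504)

as_bytes = reinterpret(mem, 8)    # ...are eight 8-bit elements,
as_halves = reinterpret(mem, 16)  # four 16-bit elements,
as_double = reinterpret(mem, 64)  # or one 64-bit element.
```

The sketch does not model alignment; on real hardware the wider view additionally needs the alignment requirement mentioned above to be met.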

from RVI:

We considered but did not include static branch hints in the instruction encoding. These
can reduce the pressure on dynamic predictors, but require more instruction encoding space and
software profiling for best results, and can result in poor performance if production runs do not
match profiling runs.

1. Only physical implementations that have differing in-register order formats will potentially benefit from these hints. It thus makes sense to me that such hints be proposed as an extension rather than in the base. They would then get review for the matters considered below.
2. Better mechanisms should be possible to optimize behaviour for such machines.
a) Implementations may stack and pop internal width information (just as JALR uses a convention to stack and pop return addresses). This would be based on a convention or on pairing stores/loads.
A possible convention is to stack the width on a whole register store that writes based on x2, the stack pointer.
b) Another approach implementations could use is to store the width with the address of the store, so that a load from that address will be associated with the corresponding internal width format.
c) An implementation could always load with the internal format that is least disruptive for its ELEN range, or dynamically change the default given the recently executed whole register stores.
d) A similar tracking to what a microarchitecture performs over a loop: either
it decodes ahead and determines the future usage width for the register and stores according to that format, or
it profiles the use during the last use in the loop and applies the appropriate internal format from that.
This latter is an optimization that potentially benefits multi in-register order machines. It is questionable that this tracking will provide value, as subsequent use of a register with a different EEW than written is relatively rare (though significantly frequent and, as we decided, impactful for an explicit SLEN<VLEN).

As with JALR, refinement of conventions and recommended hardware response can evolve.

3. The designation has diminished value for a multiple-register load with a single width hint.
When a group of registers is loaded, the hint can only provide the width for one of them. Forcing all registers into a single width may optimize one register's subsequent use at the expense of the (up to 7) other registers.
4. The typical WR use case is for autonomous/independent spill and fill that are unaware of subsequent use.
In the envisioned typical use, called routines or interrupt routines will spill and fill registers as needed to perform the requested function. The internal format information is lost over such a process, potentially causing subsequent performance degradation. Whereas there is the potential for improved code performance, these width hints may actually yield poorer performance in dynamic use situations.
5. It complicates both the hardware and the software, causing potential confusion to each.
As a hint, hardware need only mask out all width designations and perform the same action regardless of width hint. However, providing the hint itself will be a cause of confusion, as even minimal vector implementations will need to understand the SLEN<VLEN issues to ascertain that they have no need to consider the hint. Similarly, software exception handlers and even vector routines will need substantial analysis to determine an optimal hint for a given target class (that may not be relevant to specific machines in that class as described above). Exception handlers especially will be challenged on the optimal use, as no mechanism exists to access the current internal format or even the last written EEW for any set of registers.
6. The hint can lead to lower performance for machines not needing it (a consequence of 5).
A machine design may (erroneously) use the designated hint width for a WR load, potentially increasing memory side activity (for example, performing byte transfers when word or cache line would be appropriate and more performant).
I am sure there will be other examples.
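Alternative (b) above, remembering the width by store address, could look roughly like the following. This is a speculative sketch rather than a proposed design: `WidthTracker`, its method names, and the default width are all hypothetical, and an unbounded dictionary stands in for what would have to be a small, finite hardware table.

```python
class WidthTracker:
    """Toy model: remember the internal EW used at each whole-register store."""

    def __init__(self, default_ew: int = 8):
        self.default_ew = default_ew  # hypothetical fallback when nothing is known
        self._ew_by_addr = {}         # store address -> EW at the time of the store

    def on_whole_reg_store(self, addr: int, current_ew: int) -> None:
        # Record the internal format the register had when it was spilled.
        self._ew_by_addr[addr] = current_ew

    def on_whole_reg_load(self, addr: int) -> int:
        # A fill from a known spill address restores the recorded format;
        # any other load falls back to the default.
        return self._ew_by_addr.get(addr, self.default_ew)


tracker = WidthTracker()
tracker.on_whole_reg_store(addr=0x1000, current_ew=32)  # spill
restored = tracker.on_whole_reg_load(addr=0x1000)       # matching fill
unknown = tracker.on_whole_reg_load(addr=0x2000)        # no matching spill
```

The model says nothing about capacity, aliasing, or stores by other agents, which are among the complications a real design would have to address.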

billhuffman · 2020-07-08T16:15:28Z

David,

We have agreed on a model where registers always behave as they would if they were stride-1 memory. The hints significantly help wide machines, don't take enough encoding space to matter, and are never required to be any particular value. Specific comments interleaved.

> 1. these hints only potentially benefit machines with in-register not in-mem order

We make simple additions and choices for various classes of machine.

> 2. better mechanisms should be possible to optimize behaviour for such machines

I think the "better" mechanisms you suggest below are significantly more complex and I'm not clear for most of them that they work.

> 3. the designation has diminished value for multiple register load with a single width hint.

A compiler that cares about this issue can avoid it most of the time. There will sometimes be a tradeoff.

> 4. The typical WR use case is for autonomous/independent spill and fill that are unaware of subsequent use

The performance sensitive cases are for spill/fill in the same function and the compiler knows how the register is being used. In complex functions, spill/fill performance matters in a number of codes that are currently heavily used for us. I expect a factor greater than 2x loss for some functions without this. Cases like full context switch where the software has no idea hardly matter.

> 5. it complicates both the hardware and software causing potential confusion to each (software needs to use a size specific load in a context where such info is not immediately available if at all)

Hardware either knows how the bits are to be used or ignores them. Most hardware (and all simple hardware) will ignore the bits. Broadly used software should know how to set the bits correctly most of the time. Where they're not correct, it's a performance loss and those that care about the loss can upstream improvements. Code that's used only on machines that don't care can not care.

> 6. hint can lead to lower performance for machines not needing them (a consequence of 5).

As far as I can tell, there's absolutely no loss of performance. Ever. Can you suggest a case where there's a loss?

Bill

David-Horner · 2020-07-08T17:22:33Z

Your comments introduced renumbering to the expanded points.

> As far as I can tell, there's absolutely no loss of performance. Ever. Can you suggest a case where there's a loss?

From item 6 detail:

> hint can lead to lower performance for machines not needing them (a consequence of 5).
> A machine design may (erroneously) use the designated hint width for a WR load potentially increasing memory side activity (for example performing byte transfers when word or cache line would be appropriate and more performant).

Your compelling argument is

> The performance sensitive cases are for spill/fill in the same function and the compiler knows how the register is being used. In complex functions, spill/fill performance matters in a number of codes that are currently heavily used for us. I expect a factor greater than 2x loss for some functions without this.

This is an immediate application with significant anticipated benefit.

As for "better" alternatives, tracking branch behaviour was once "too complex" and did not give a significant return, such that branch hints were the preferred method. Times change; the RV architecture in particular is designed to be "forever".

I expect the method to apply these hints will evolve as calling conventions and register allocation conventions "improve".

So, although I would now agree with providing the hints, I believe at least one should be reserved for when the compiler cannot make an informed choice, such as in interrupt routines.
My choice would be, as previously suggested, e1024.

billhuffman · 2020-07-08T19:36:35Z

David,

Comments interleaved.

> From item 6 detail: hint can lead to lower performance for machines not needing them (a consequence of 5). A machine design may (erroneously) use the designated hint width for a WR load potentially increasing memory side activity (for example performing byte transfers when word or cache line would be appropriate and more performant).

I did read that. Of course some hints, such as branch predictions, can cost performance. I don't understand any mechanism by which this one can. Since this is stride-1, memory activity is precisely the same regardless of element width. For implementations that don't do any "interesting" byte arrangements, all loads are the same, with or without the hint. For implementations that do "interesting" byte arrangements, they do loads of different widths already. This hint tells them which of those loads to do. All should take the same time with or without a hint.

So maybe you can be specific about what performance loss you had in mind.

> So, although I would now agree with providing the hints, I believe at least one should be reserved for when the compiler cannot make an informed choice, such as in interrupt routines. My choice would be as previously suggested, e1024.

I agree. I would like to see one used when the compiler doesn't know. And e1024 seems reasonable. In most codes it won't ever happen.

I would add a code to the store because I'd like to differentiate the store where the compiler knows how to set the load element size from the store where it is not expected to know. Same reason we use a jump rather than a branch on equal of x0 and x0.

Bill

kasanovic · 2020-07-10T14:48:09Z

I think this is resolved in favor of retaining the EEW hint. I just added #529, where implementations would be simpler if EEW worked as in other load/stores and implied an alignment constraint, making use of e1024 for "unknown" problematic. Krste


David-Horner mentioned this issue Jun 26, 2020

provide section on HINTs #505

Open

kasanovic added the Resolve for v1.0 To be resolved for v1.0 draft label Jun 28, 2020

kasanovic added a commit that referenced this issue Jul 3, 2020

Added description on HINTS on register move instructons (vmv and vm<n…

2144559

…>r). Discussed in issue #503.

kasanovic closed this as completed in 20f673c Jul 3, 2020

David-Horner mentioned this issue Jul 11, 2020

Element width in whole register move load/stores affects misalignment exceptions? #529

Closed
