Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#3700: Add IR for AArch64's SIMD LD1/ST1 instructions #3710

Merged
merged 5 commits into from
Jul 3, 2019

Conversation

AssadHashmi
Copy link
Contributor

These are the multiple single element structure load/stores
from/to one vector register variants:

ST1 { <Vt>.<T> }, [<Xn|SP>]
LD1 { <Vt>.<T> }, [<Xn|SP>]

These are the multiple single element structure load/stores
from/to one vector register variants:
ST1 { <Vt>.<T> }, [<Xn|SP>]
LD1 { <Vt>.<T> }, [<Xn|SP>]
@AssadHashmi
Copy link
Contributor Author

run arm tests

@AssadHashmi AssadHashmi requested review from fhahn and egrimley June 28, 2019 11:15
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some style nits. I did not look in the manual to check the encoding bits. Are there tests, either automated or manual pre-commit, to ensure this is the right encoding by comparing to some other decoder?


/**
* Creates an Advanced SIMD (NEON) LD1 instruction to load multiple
* single element structures to one vector register, e.g. LD1 {V0.4H},[X0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing trailing .


/**
* Creates an Advanced SIMD (NEON) ST1 instruction to store multiple
* single element structures from one vector register, e.g. ST1 {V1.2S},[X1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing trailing .

/* TODO: Remaining advanced SIMD (NEON) memory instructions:
#define INSTR_CREATE_ld2/3/4_multi_2/3/4()
#define INSTR_CREATE_ld1/2/3/4_single()
and st1 equivalents including post-index variants */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Add the issue number, so TODO i#2626: .

Also we prefer */ on its own line.

@fhahn
Copy link
Contributor

fhahn commented Jun 28, 2019

Great, thanks Assad! Did you use the scripts to auto-generate them?

INSTR_CREATE_st1_multi_<n>() where <n> is 1, 2, 3 or 4

ST1 { <Vt>.<T> }, [<Xn|SP>]
ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, only the 1 operand version is tested, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Hopefully more will follow as time allows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...good spot. Perhaps I should remove those comments as they could be misleading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. The header comments are useful placeholders and each test call is explicitly commented to make it clear it's the 1 operand version being run.

INSTR_CREATE_ld1_multi_<n>() where <n> is 1, 2, 3 or 4

LD1 { <Vt>.<T> }, [<Xn|SP>]
LD1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, only the 1 operand version is tested, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Hopefully more will follow as time allows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...good spot. Perhaps I should remove those comments as they could be misleading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. The header comments are useful placeholders and each test call is explicitly commented to make it clear it's the 1 operand version being run.

@@ -756,7 +756,7 @@ x0110111xxxxxxxxxxxxxxxxxxxxxxxx tbnz tbz
0x001100000000000010xxxxxxxxxxxx st1 memvm : vmsz vt0 vt1 vt2 vt3
0x001100000000000100xxxxxxxxxxxx st3 memvm : vmsz vt0 vt1 vt2
0x001100000000000110xxxxxxxxxxxx st1 memvm : vmsz vt0 vt1 vt2
0x001100000000000111xxxxxxxxxxxx st1 memvm : vmsz vt0
0x001100000000000111xxxxxxxxxxxx st1 memvm : vt0 vmsz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we flip the position of the vmsz operand here to be more in line with the XXX_sz operands, which come last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Is there a reason you did not change it for ld1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Is there a reason you did not change it for ld1?

ld1 doesn't need to change as its fields are already correctly positioned w.r.t. encoding string:

0x001100010000000111xxxxxxxxxxxx  ld1    vt0 : memvm vmsz

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right!

@AssadHashmi
Copy link
Contributor Author

Just some style nits. I did not look in the manual to check the encoding bits. Are there tests, either automated or manual pre-commit, to ensure this is the right encoding by comparing to some other decoder?

In this case I've tested the instruction macros in a client which copies vectors between buffers. It's an application specific client. At some point I'll port it to a client targeted at testing which is safe to be upstreamed as part of the test suite. It can be the basis of further AArch64 SIMD work.

@AssadHashmi
Copy link
Contributor Author

Great, thanks Assad! Did you use the scripts to auto-generate them?

No, unfortunately. There's still some issues and work required and resourcing is . . . variable ;-)

@@ -74,7 +74,7 @@
0b9f13ff : add wzr, wzr, wzr, asr #4 : add %wzr %wzr asr $0x04 -> %wzr
0c0007ff : st4 {v31.4h, v0.4h, v1.4h, v2.4h}, [sp]: st4 $0x01 %d31 %d0 %d1 %d2 -> (%sp)[32byte]
0c0067ff : st1 {v31.4h, v0.4h, v1.4h}, [sp]: st1 $0x01 %d31 %d0 %d1 -> (%sp)[24byte]
0c0077ff : st1 {v31.4h}, [sp] : st1 $0x01 %d31 -> (%sp)[8byte]
0c0077ff : st1 {v31.4h}, [sp] : st1 %d31 $0x01 -> (%sp)[8byte]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add tests for all sizes? same for ld1.

@@ -335,6 +335,12 @@ reg_is_gpr(reg_id_t reg)
return (DR_REG_X0 <= reg && reg <= DR_REG_WSP);
}

bool
reg_is_simd(reg_id_t reg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added reg_is_simd() in order for AArch64 to conform to the API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -756,7 +756,7 @@ x0110111xxxxxxxxxxxxxxxxxxxxxxxx tbnz tbz
0x001100000000000010xxxxxxxxxxxx st1 memvm : vmsz vt0 vt1 vt2 vt3
0x001100000000000100xxxxxxxxxxxx st3 memvm : vmsz vt0 vt1 vt2
0x001100000000000110xxxxxxxxxxxx st1 memvm : vmsz vt0 vt1 vt2
0x001100000000000111xxxxxxxxxxxx st1 memvm : vmsz vt0
0x001100000000000111xxxxxxxxxxxx st1 memvm : vt0 vmsz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Is there a reason you did not change it for ld1?

@@ -756,7 +756,7 @@ x0110111xxxxxxxxxxxxxxxxxxxxxxxx tbnz tbz
0x001100000000000010xxxxxxxxxxxx st1 memvm : vmsz vt0 vt1 vt2 vt3
0x001100000000000100xxxxxxxxxxxx st3 memvm : vmsz vt0 vt1 vt2
0x001100000000000110xxxxxxxxxxxx st1 memvm : vmsz vt0 vt1 vt2
0x001100000000000111xxxxxxxxxxxx st1 memvm : vmsz vt0
0x001100000000000111xxxxxxxxxxxx st1 memvm : vt0 vmsz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right!

@@ -335,6 +335,12 @@ reg_is_gpr(reg_id_t reg)
return (DR_REG_X0 <= reg && reg <= DR_REG_WSP);
}

bool
reg_is_simd(reg_id_t reg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool
reg_is_simd(reg_id_t reg)
{
return (DR_REG_Q0 <= reg && reg <= DR_REG_B31);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit surprised that S, H and B registers are included here. I thought the only vector registers used by instructions are the 128 bit Q, as in v8.2d, v8.4s, v8.8h and 64 bit D as in v8.2s, v8.4h. Am I missing something and are there encodings that use S, H or B registers as vector register?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My (simple) understanding is that all reg_is_simd() is saying is whether a given register is SIMD or not, and B/H/S/D are components of a SIMD Q register. That's independent of any encoding issues. Have I missed something?

@AssadHashmi AssadHashmi merged commit bb78eb5 into master Jul 3, 2019
@derekbruening
Copy link
Contributor

Remember the 'Fixes #NNN' or 'Issue: #NNN' for future commits: this one had neither and so has no link to the issue.

Can #3700 be closed now?

@AssadHashmi
Copy link
Contributor Author

AssadHashmi commented Jul 5, 2019

Remember the 'Fixes #NNN' or 'Issue: #NNN' for future commits: this one had neither and so has no link to the issue.

Gah! My bad.

Can #3700 be closed now?

Not yet. There are still variants of the SIMD ld/st which need to be implemented and the test coverage increased in general.

@AssadHashmi AssadHashmi deleted the i3700-aarch64-simd-ld1-st1 branch July 5, 2019 16:56
@derekbruening
Copy link
Contributor

derekbruening commented Jul 5, 2019

Remember the 'Fixes #NNN' or 'Issue: #NNN' for future commits: this one had neither and so has no link to the issue.

Gah! My bad.

If you've run the devsetup script, it sets a commit message template which makes it easier to remember by pre-populating each commit message with the suggested form.

egrimley-arm pushed a commit that referenced this pull request Nov 26, 2024
Minor conflict in core/arch/aarch64/codec.txt due to PR:
#3710
egrimley-arm pushed a commit that referenced this pull request Nov 26, 2024
This is an internal implementation of a feature we would like to
enable in upstream DynamoRIO. IMO proposing such a change upstream
is more efficient if a working example is available. Once this is
merged an IP clean proposal patch can be created which represents
the interface we would like.

This patch required an upstream change:
#3710

Change-Id: Ifbb14e21083dc68d50609401032ddc2d40c9f7fc
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/193795
Reviewed-by: Al Grant <al.grant@arm.com>
Tested-by: mdc-bbot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants