Audio: Volume: Add HiFi5 implementation. #9419
This is a new version of Andrula's #8900 with the merge conflicts fixed. I've run it successfully in the testbench HiFi5 environment at 48 kHz and 44.1 kHz rates with the s16 and s32 formats. However, the testbench is currently IPC3, so the test didn't exercise the IPC4 code parts.
src/audio/volume/volume.c
Outdated
const uint32_t byte_align = 16;

/*There is no limit for frame number, so both source and sink set it to be 1*/
const uint32_t frame_align_req = 1;
Should this come from Kconfig? That is, the selection of HiFi3, HiFi4, HiFi5, AVX, etc. would set a generic CONFIG_FRAME_BYTE_ALIGN macro that could be used everywhere?
9e4ec96
to
394c08d
Compare
src/audio/volume/volume.c
Outdated
   * xtensa intrinsics ask for 8-byte aligned. 5.1 format SSE audio
   * requires 16-byte aligned.
   */
- const uint32_t byte_align = audio_stream_get_channels(source) == 6 ? 16 : 8;
+ const uint32_t byte_align = audio_stream_get_channels(source) == 6 ?
+ SOF_FRAME_BYTE_ALIGN_6CH : SOF_FRAME_BYTE_ALIGN;
@lgirdwood I don't remember where this alignment requirement for 6ch comes from. I could not find any discussion of it in #5266. Is it specific to peakvolume, or is it generic for loading and storing the format in 64-bit or 128-bit chunks? If it is internal to peakvolume, then this SOF_FRAME_BYTE_ALIGN_6CH in common.h would make no sense.
6ch is 5.1 via DisplayPort.
394c08d
to
69e2d33
Compare
src/include/sof/common.h
Outdated
# else
# define SOF_MAX_XCHAL_HIFI NONE
# endif
#endif

#if SOF_MAX_XCHAL_HIFI == NONE
# ifndef SOF_FRAME_BYTE_ALIGN
# define SOF_FRAME_BYTE_ALIGN 4
or 1?
If this is in bytes, we should state that in a comment next to the definition. It should never be 1 if it is in bytes.
I think 1 is the default when nothing is set, which leaves a component free to provide any number of frames, but it makes no sense as a byte alignment constraint. Also, I think the #if SOF_MAX_XCHAL_HIFI == NONE above could be left out as redundant. Before these definitions there could be a macros section for SSE or AVX specific definitions.
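For readers following this thread, here is a minimal sketch of the kind of compile-time selection being discussed, with a plain #else as the generic fallback instead of an explicit "== NONE" test. The alignment values (16 for HiFi5, 8 for HiFi3 and HiFi4, 4 otherwise) are taken from the hunks quoted in this review. The EXAMPLE_ names and the helper function are hypothetical and are not the final contents of common.h.

```c
#include <stdint.h>

/* Hypothetical defaults so the sketch also builds on a host compiler,
 * where the Xtensa core configuration macros are not defined.
 */
#ifndef XCHAL_HAVE_HIFI5
#define XCHAL_HAVE_HIFI5 0
#endif
#ifndef XCHAL_HAVE_HIFI4
#define XCHAL_HAVE_HIFI4 0
#endif
#ifndef XCHAL_HAVE_HIFI3
#define XCHAL_HAVE_HIFI3 0
#endif

/* Alignment in bytes, never in frames or samples. */
#if XCHAL_HAVE_HIFI5
#define EXAMPLE_FRAME_BYTE_ALIGN 16
#elif XCHAL_HAVE_HIFI4 || XCHAL_HAVE_HIFI3
#define EXAMPLE_FRAME_BYTE_ALIGN 8
#else
#define EXAMPLE_FRAME_BYTE_ALIGN 4 /* generic C fallback */
#endif

/* Return 1 when a block of 'bytes' bytes ends on the alignment boundary. */
static inline int example_size_is_aligned(uint32_t bytes)
{
	return (bytes % EXAMPLE_FRAME_BYTE_ALIGN) == 0;
}
```

A component could then use the single macro regardless of the core it is built for, which is the point of the Kconfig question earlier in the thread.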
src/include/sof/common.h
Outdated
# elif XCHAL_HAVE_HIFI4
# define SOF_MAX_XCHAL_HIFI 4
# elif XCHAL_HAVE_HIFI3
# define SOF_FRAME_BYTE_ALIGN 8
# define SOF_FRAME_BYTE_ALIGN_6CH 16
I'll move this 6ch-specific definition into volume. I can't see how 6ch alignment would be a special case in general. Every combination of word length and channel count has some numbers of frames that do not align with 8 or 16 bytes (64 or 128 bits).
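To illustrate the arithmetic behind this remark, the sketch below computes the smallest number of frames whose total size is a multiple of a given byte alignment. It is only a worked example: the function names are hypothetical, and this is not claimed to be how SOF derives its frame alignment requirement.

```c
#include <stdint.h>

/* Greatest common divisor by the Euclidean algorithm. */
static uint32_t example_gcd(uint32_t a, uint32_t b)
{
	while (b) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Smallest frame count N such that N * channels * sample_bytes is a
 * multiple of byte_align. Equals byte_align / gcd(frame_bytes, byte_align).
 */
static uint32_t example_frames_for_byte_align(uint32_t channels,
					      uint32_t sample_bytes,
					      uint32_t byte_align)
{
	uint32_t frame_bytes = channels * sample_bytes;

	return byte_align / example_gcd(frame_bytes, byte_align);
}
```

For example, 6ch s16 frames are 12 bytes, so four frames (48 bytes) are needed to reach a 16-byte boundary, while 2ch s16 frames are 4 bytes and also need four frames for the same boundary. In this arithmetic 6ch is not special, which matches the point made above.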
src/audio/volume/volume.c
Outdated
/* Both source and sink buffer in HiFi5 processing version,
 * xtensa intrinsics ask for 16-byte aligned.
 *
 * Both source and sink buffer in HiFi 3 or HiFi4 processing version,
maybe we can converge one way or another - with or without a space in "HiFi.N"
Yep, changing to without space.
src/audio/volume/volume_hifi5.c
Outdated
{
int32_t i;

/* using for loop instead of memcpy_s(), because for loop costs less cycles */
This loop can be replaced with a single memcpy(), and you've found out that the loop is faster? Interesting, then we have a problem with our memcpy().
It was a finding by Andrula that I haven't verified.
cd->vol[i] = cd->volume[i];
cd->vol[i + channels_count * 1] = cd->volume[i];
cd->vol[i + channels_count * 2] = cd->volume[i];
cd->vol[i + channels_count * 3] = cd->volume[i];
Although it looks like it wouldn't be a single memcpy(), but a loop of them. So you actually mean that 4 assignments are faster than a memcpy()? That would be logical.
I can remove the comment to avoid confusion. In most use cases there are just two channels. Especially with the recommended memcpy_s(), there is more overhead.
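As a sketch of the two alternatives compared in this thread, the functions below fill four back-to-back copies of the per-channel gains, once with the four assignments from the quoted hunk and once with a loop of memcpy() calls. The function names and plain array parameters are hypothetical stand-ins for the cd->vol and cd->volume fields. Which variant costs fewer cycles depends on the target and compiler, and was not verified in this review.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_COPIES 4

/* Variant from the quoted hunk: four assignments per channel. */
static void example_replicate_loop(int32_t *vol, const int32_t *volume,
				   int channels_count)
{
	int i;

	for (i = 0; i < channels_count; i++) {
		vol[i] = volume[i];
		vol[i + channels_count * 1] = volume[i];
		vol[i + channels_count * 2] = volume[i];
		vol[i + channels_count * 3] = volume[i];
	}
}

/* Alternative raised in review: one memcpy() per copy, so a loop of
 * four calls rather than a single call.
 */
static void example_replicate_memcpy(int32_t *vol, const int32_t *volume,
				     int channels_count)
{
	size_t bytes = (size_t)channels_count * sizeof(int32_t);
	int k;

	for (k = 0; k < EXAMPLE_COPIES; k++)
		memcpy(vol + k * channels_count, volume, bytes);
}
```

Both variants produce the same layout, so for two channels the choice is purely about call overhead versus eight scalar stores.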
src/audio/volume/volume_hifi5.c
Outdated
const int inc = sizeof(ae_int32x4);
int samples = channels_count * frames;

/** to ensure the adsress is 16-byte aligned and avoid risk of
is this really supposed to be a doxygen comment? More below
Fixed in next version
src/audio/volume/volume_hifi5.c
Outdated
m = audio_stream_samples_without_wrap_s32(sink, out);
n = MIN(m, n);
inu = AE_LA128_PP(in);
/* process four continuous samples once */
"once?" Did you mean "per iteration?"
yep
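The pattern in the quoted hunk, limiting each run to the samples available before either circular buffer wraps and then handling four samples per iteration, can be sketched in portable C as follows. This is an illustration only: it uses no Xtensa intrinsics, and struct example_ring, the helper names, and the Q16 gain format are hypothetical and are not the SOF audio_stream API.

```c
#include <stdint.h>

#define EXAMPLE_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical circular buffer of 32-bit samples. */
struct example_ring {
	int32_t *base; /* first sample of the buffer */
	int size;      /* buffer length in samples */
	int pos;       /* current position in samples, 0 <= pos < size */
};

/* Samples available from the current position before the buffer wraps. */
static int example_samples_without_wrap(const struct example_ring *r)
{
	return r->size - r->pos;
}

/* Scale one sample by a gain where 65536 represents unity. */
static int32_t example_scale(int32_t x, int32_t gain_q16)
{
	return (int32_t)(((int64_t)x * gain_q16) / 65536);
}

static void example_apply_gain(struct example_ring *source,
			       struct example_ring *sink,
			       int samples, int32_t gain_q16)
{
	while (samples > 0) {
		/* Longest run that wraps in neither buffer. */
		int n = EXAMPLE_MIN(example_samples_without_wrap(source),
				    example_samples_without_wrap(sink));
		const int32_t *in = source->base + source->pos;
		int32_t *out = sink->base + sink->pos;
		int i = 0;

		n = EXAMPLE_MIN(n, samples);

		/* Main loop: four consecutive samples per iteration. */
		for (; i + 4 <= n; i += 4) {
			out[i] = example_scale(in[i], gain_q16);
			out[i + 1] = example_scale(in[i + 1], gain_q16);
			out[i + 2] = example_scale(in[i + 2], gain_q16);
			out[i + 3] = example_scale(in[i + 3], gain_q16);
		}
		/* Tail: the remaining one to three samples of this run. */
		for (; i < n; i++)
			out[i] = example_scale(in[i], gain_q16);

		source->pos = (source->pos + n) % source->size;
		sink->pos = (sink->pos + n) % sink->size;
		samples -= n;
	}
}
```

In a SIMD version the four-sample body would be a single 128-bit load, multiply, and store, which is why "per iteration" describes the comment more accurately than "once".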
69e2d33
to
07df9a2
Compare
07df9a2
to
5b345b7
Compare
Only minor notes, looks good!
@@ -112,6 +112,9 @@ struct sof_ipc_ctrl_value_chan;
#define VOL_S16_SAMPLES_TO_BYTES(s) ((s) << 1)
#define VOL_S32_SAMPLES_TO_BYTES(s) ((s) << 2)

/** \brief PCM samples align requirement for HiFi3 an Hifi4 for volume component */
#define VOLUME_HIFI3_HIFI4_FRAME_BYTE_ALIGN_6CH 16
The comment says "PCM samples", which is a bit misleading as this is an alignment in bytes (as the define name says, FRAME_BYTE).
Oops yes, it looks quite confusing.
src/audio/volume/volume_hifi5.c
Outdated
const int inc = sizeof(ae_int32x4);
int samples = channels_count * frames;

/* to ensure the adsress is 16-byte aligned and avoid risk of
typo: adsress
yep
I added a commit to fix the same typos in the HiFi3 and HiFi4 versions.
Add HiFi5 implementation of volume functions, compared with HiFi3 version, can reduce about 28% cycles. Signed-off-by: Andrula Song <andrula.song@intel.com> Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Changed adsress -> address, also the comments are edited to avoid to be mistaken as Doxygen. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
5b345b7
to
1050ffc
Compare
The sof-docs failure and the Intel LNL failures are all known and tracked in https://github.com/thesofproject/sof/issues?q=is%3Aissue+is%3Aopen+label%3ACI
Add HiFi5 implementation of volume functions, compared with HiFi3 version, can reduce about 28% cycles.