Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audio: Volume: Add HiFi5 implementation. #9419

Merged
merged 2 commits into from
Oct 31, 2024

Conversation

singalsu
Copy link
Collaborator

Add HiFi5 implementation of volume functions, compared with HiFi3 version, can reduce about 28% cycles.

@singalsu
Copy link
Collaborator Author

singalsu commented Aug 29, 2024

This is a new merge conflicts fixed version of Andrula's #8900. I've run this successfully in testbench HiFi5 environment with 48 kHz and 44.1 kHz rates with s16 and s32 formats. Though testbench is currently IPC3, so the test didn't exercise IPC4 code parts.

Comment on lines 651 to 654
const uint32_t byte_align = 16;

/*There is no limit for frame number, so both source and sink set it to be 1*/
const uint32_t frame_align_req = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this come from Kconfig ?
i.e. The selection of HiFi3, HIFI4, HIFI5, AVX etc would set a generic CONFIG_FRAME_BYTE_ALIGN macro that could be used everywhere ?

* xtensa intrinsics ask for 8-byte aligned. 5.1 format SSE audio
* requires 16-byte aligned.
*/
const uint32_t byte_align = audio_stream_get_channels(source) == 6 ? 16 : 8;
const uint32_t byte_align = audio_stream_get_channels(source) == 6 ?
SOF_FRAME_BYTE_ALIGN_6CH : SOF_FRAME_BYTE_ALIGN;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood I don't remember from where this align requirement for 6ch comes from. There was no discussion that I could find for it in #5266. Is it specific to peakvolume or generic for loading/storing the format in 64 bit or 128 bit chunks. If internal to peakvolume, then this SOF_FRAME_BYTE_ALIGN_6CH in common.h would make no sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6ch is 5.1 via display port.

# else
# define SOF_MAX_XCHAL_HIFI NONE
# endif
#endif

#if SOF_MAX_XCHAL_HIFI == NONE
# ifndef SOF_FRAME_BYTE_ALIGN
# define SOF_FRAME_BYTE_ALIGN 4
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or 1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is bytes we should state in the comments next to the definition. Should never be 1 if bytes.

Copy link
Collaborator Author

@singalsu singalsu Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one is the default if it's not set to be free to provide any number of frames, but it makes no sense as align constraint. Also Above I think the #if SOF_MAX_XCHAL_HIFI == NONE could be left out as redundant. These could be before this a macros section for SSE or AVX specific definitions.

# elif XCHAL_HAVE_HIFI4
# define SOF_MAX_XCHAL_HIFI 4
# elif XCHAL_HAVE_HIFI3
# define SOF_FRAME_BYTE_ALIGN 8
# define SOF_FRAME_BYTE_ALIGN_6CH 16
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move this 6ch specific definition into volume, I can't see generally how 6ch alignement would be a special case. Every word length, channels count has some number of frames that is not matching align with 8 or 16 bytes / 64 or 128 bits.

/* Both source and sink buffer in HiFi5 processing version,
* xtensa intrinsics ask for 16-byte aligned.
*
* Both source and sink buffer in HiFi 3 or HiFi4 processing version,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can converge one way or another - with or without a space in "HiFi.N"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, changing to without space.

{
int32_t i;

/* using for loop instead of memcpy_s(), because for loop costs less cycles */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this loop can be replaced with a single memcpy() and you've found out that the loop is faster?.. Interesting, then we have a problem with our memcpy()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a finding by Andrula that I haven't verified.

cd->vol[i] = cd->volume[i];
cd->vol[i + channels_count * 1] = cd->volume[i];
cd->vol[i + channels_count * 2] = cd->volume[i];
cd->vol[i + channels_count * 3] = cd->volume[i];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although it looks like it wouldn't be a single memcpy(), but a loop of them. So you actually mean that 4 assignments are faster than a memcpy()? That would be logical

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove the comment to avoid it to confuse. It's most of use cases just two channels. Especially with recommended memcpy_s() there is more overhead.

const int inc = sizeof(ae_int32x4);
int samples = channels_count * frames;

/** to ensure the adsress is 16-byte aligned and avoid risk of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really supposed to be a doxygen comment? More below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next version

m = audio_stream_samples_without_wrap_s32(sink, out);
n = MIN(m, n);
inu = AE_LA128_PP(in);
/* process four continuous samples once */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"once?" Did you mean "per iteration?"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor notes, looks good!

@@ -112,6 +112,9 @@ struct sof_ipc_ctrl_value_chan;
#define VOL_S16_SAMPLES_TO_BYTES(s) ((s) << 1)
#define VOL_S32_SAMPLES_TO_BYTES(s) ((s) << 2)

/** \brief PCM samples align requirement for HiFi3 an Hifi4 for volume component */
#define VOLUME_HIFI3_HIFI4_FRAME_BYTE_ALIGN_6CH 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment says "PCM samples" which is a bit misleading as this is alignment in bytes (as tge define name says, FRAME_BYTE).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops yes, it looks quite confusing.

const int inc = sizeof(ae_int32x4);
int samples = channels_count * frames;

/* to ensure the adsress is 16-byte aligned and avoid risk of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: adsress

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added commit to fix same typos in HiFi3 and HiFi4 versions.

andrula-song and others added 2 commits October 29, 2024 15:47
Add HiFi5 implementation of volume functions, compared with
HiFi3 version, can reduce about 28% cycles.

Signed-off-by: Andrula Song <andrula.song@intel.com>
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Changed adsress -> address, also the comments are edited to avoid
to be mistaken as Doxygen.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
@singalsu singalsu requested a review from lyakh October 30, 2024 08:45
@kv2019i
Copy link
Collaborator

kv2019i commented Oct 31, 2024

sof-docs fail and Intel LNL fails all known and tracked in https://github.com/thesofproject/sof/issues?q=is%3Aissue+is%3Aopen+label%3ACI

@kv2019i kv2019i merged commit 2f4efa5 into thesofproject:main Oct 31, 2024
42 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants