ASoC:topology: bug fix oops cause by pointer dereference. #72

Closed


@ghost ghost commented Aug 10, 2018

  1. Check the result of the new widget allocation.
  2. Do the w->kcontrols allocation in the sub-function, and
    free it when an error happens in the sub-function.

These two steps avoid the oops caused by the pointer dereference.

Signed-off-by: Wu Zhigang <zhigang.wu@linux.intel.com>

@ghost
Author

ghost commented Aug 10, 2018

When some parameters in the tplg are incorrect, the snd_kcontrol{} instance is not allocated, but the snd_kcontrol_new{} is, because the snd_kcontrol_new{} is allocated first. One widget can hold several of these instances. When the parameters of the widget or of the snd_kcontrol_new{} are incorrect, snd_soc_dapm_new_widgets() does not allocate the snd_kcontrol{}.
For example: the widget's id is not set correctly, or the get/put/info ID is not correct.

When we then reload the module, the panic is hit.
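
The failure mode described here can be reduced to a few lines. The sketch below is a userspace model, not the kernel code: `struct widget`, `struct kcontrol`, `widget_alloc_slots()`, and `widget_free_kcontrols()` are hypothetical stand-ins for `snd_soc_dapm_widget`, `snd_kcontrol`, and the topology free path. It assumes the pointer array is allocated zeroed, so a slot that was never filled stays NULL and can be skipped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for snd_kcontrol and snd_soc_dapm_widget. */
struct kcontrol {
	int id;
};

struct widget {
	int num_kcontrols;
	struct kcontrol **kcontrols;	/* a slot stays NULL if creation failed */
};

/* Allocate the pointer array zeroed, as kzalloc() would in the kernel. */
int widget_alloc_slots(struct widget *w, int n)
{
	assert(n > 0);
	w->kcontrols = calloc((size_t)n, sizeof(*w->kcontrols));
	if (!w->kcontrols)
		return -1;
	w->num_kcontrols = n;
	return 0;
}

/*
 * Free path: a kcontrol is not guaranteed to have been created, so
 * validate each slot before dereferencing it. Returns the number of
 * controls that were actually released.
 */
int widget_free_kcontrols(struct widget *w)
{
	int freed = 0;
	int i;

	if (!w->kcontrols)
		return 0;

	for (i = 0; i < w->num_kcontrols; i++) {
		struct kcontrol *kc = w->kcontrols[i];

		if (!kc)
			continue;	/* never created: nothing to release */
		kc->id = 0;		/* stand-in for teardown that dereferences kc */
		free(kc);
		freed++;
	}
	free(w->kcontrols);
	w->kcontrols = NULL;
	w->num_kcontrols = 0;
	return freed;
}
```

Without the `if (!kc)` check, the loop would dereference a NULL slot, which is the userspace analogue of the oops on module reload.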

Member

@lgirdwood lgirdwood left a comment


Good catch, needs a small comment update.

int j;

/* in some case:tplg incorrect configuration.
* the kcontrol{} is not allocated successfully
Member


I would change both comments to
/* kcontrol not guaranteed to be created so validate */

Author


OK. I will update it. thanks

@plbossart
Member

plbossart commented Aug 10, 2018

Maybe a better option would be to report an error in the first place when the widget parameters are not correct, e.g. by something like the (untested) patch below. dapm/topology don't use return values that are available.
Also make sure that your patch subjects follow the usual ALSA/SOC conventions, e.g.
ASoC: topology: avoid oops on dereference during topology free

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index 6d54128e44b4..d23234a4406c 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -3048,6 +3048,7 @@ int snd_soc_dapm_new_widgets(struct snd_soc_card *card)
 {
        struct snd_soc_dapm_widget *w;
        unsigned int val;
+       int ret = -EINVAL;
 
        mutex_lock_nested(&card->dapm_mutex, SND_SOC_DAPM_CLASS_INIT);
 
@@ -3070,23 +3071,26 @@ int snd_soc_dapm_new_widgets(struct snd_soc_card *card)
                case snd_soc_dapm_switch:
                case snd_soc_dapm_mixer:
                case snd_soc_dapm_mixer_named_ctl:
-                       dapm_new_mixer(w);
+                       ret = dapm_new_mixer(w);
                        break;
                case snd_soc_dapm_mux:
                case snd_soc_dapm_demux:
-                       dapm_new_mux(w);
+                       ret = dapm_new_mux(w);
                        break;
                case snd_soc_dapm_pga:
                case snd_soc_dapm_out_drv:
-                       dapm_new_pga(w);
+                       ret = dapm_new_pga(w);
                        break;
                case snd_soc_dapm_dai_link:
-                       dapm_new_dai_link(w);
+                       ret = dapm_new_dai_link(w);
                        break;
                default:
                        break;
                }
 
+               if (ret < 0)
+                       return ret;
+
                /* Read the initial power state from the device */
                if (w->reg >= 0) {
                        soc_dapm_read(w->dapm, w->reg, &val);
diff --git a/sound/soc/soc-topology.c b/sound/soc/soc-topology.c
index ac3bbc142432..3d89189c1c29 100644
--- a/sound/soc/soc-topology.c
+++ b/sound/soc/soc-topology.c
@@ -1719,9 +1719,11 @@ static int soc_tplg_dapm_complete(struct soc_tplg *tplg)
        }
 
        ret = snd_soc_dapm_new_widgets(card);
-       if (ret < 0)
+       if (ret < 0) {
                dev_err(tplg->dev, "ASoC: failed to create new widgets %d\n",
                        ret);
+               return ret;
+       }
 
        return 0;
 }
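
One detail of the untested patch is worth a sketch: the new `return ret` sits after `mutex_lock_nested()` and inside the widget loop, so as written the error path would appear to leave `card->dapm_mutex` held. The userspace model below shows the usual single-exit shape. `struct card`, `card_lock()`, `card_unlock()`, `create_widget()`, and `new_widgets()` are hypothetical names for illustration only, and a plain counter stands in for the mutex.

```c
#include <assert.h>
#include <stddef.h>

struct card {
	int lock_depth;		/* stand-in for dapm_mutex: 1 while held */
	int widgets_created;
};

void card_lock(struct card *c)
{
	c->lock_depth++;
}

void card_unlock(struct card *c)
{
	assert(c->lock_depth > 0);
	c->lock_depth--;
}

/* Hypothetical per-widget constructor: a negative id models a bad widget. */
int create_widget(int id)
{
	return id < 0 ? -22 /* -EINVAL */ : 0;
}

int new_widgets(struct card *c, const int *ids, size_t n)
{
	int ret = 0;
	size_t i;

	card_lock(c);
	for (i = 0; i < n; i++) {
		ret = create_widget(ids[i]);
		if (ret < 0)
			goto out;	/* do not return with the lock held */
		c->widgets_created++;
	}
out:
	card_unlock(c);
	return ret;
}
```

The single `out:` label keeps the lock release in one place, so any error check added to the loop later cannot skip the unlock.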

@ghost
Author

ghost commented Aug 11, 2018

This solution seems better than mine. Can you send this PR? I will close mine.
But I have two questions:

  1. When the parameters of one widget are incorrect, we stop loading the tplg directly, which causes the firmware load to fail. Is that OK? I am not sure about this.
  2. When we return directly, should we care about the widgets that were already allocated successfully? Could this be a memory leak?

@ghost
Author

ghost commented Aug 11, 2018

Can we just print the error info in the snd_soc_dapm_new_widgets() function?
This will not block our system's work at present.
If someone makes a mistake in the tplg settings in the future, at least this will not block us.

@ghost
Copy link
Author

ghost commented Aug 11, 2018

If you agree not to return directly, we had better stay in snd_soc_dapm_new_widgets() and finish configuring all of the widgets, even if several widgets have errors. We just print the error info in the function when we detect an error, to remind the developer.

@ghost ghost changed the title from "bug fix:module reload will cause panic" to "ASoC: topology: avoid oops on dereference during topology free" Aug 13, 2018
@plbossart
Member

@zhigang-wu I don't have time to test this and I don't have a rt5651 board, can you try and resubmit a proposal. From my perspective, it's better to reject a bad topology upfront, and we need to make sure there is no memory leak on any error handling path. Thanks!

@ghost
Author

ghost commented Aug 14, 2018

@plbossart
OK, I will do more testing and get back to you. Thanks!

@ghost
Author

ghost commented Aug 14, 2018

@plbossart
I added the code shown below and got the log shown below.
You can see there are many failures in the widget allocation.
The "id" is the widget->id.
If we do this, I think our firmware cannot work with the current tplg.
This is what I tried on the BYT platform.
I will try on more platforms tomorrow.
Thanks!

the code to log:

int snd_soc_dapm_new_widgets(struct snd_soc_card *card)
{
struct snd_soc_dapm_widget *w;
unsigned int val;
int ret = -EINVAL;
mutex_lock_nested(&card->dapm_mutex, SND_SOC_DAPM_CLASS_INIT);

list_for_each_entry(w, &card->widgets, list)
{
	if (w->new)
		continue;

	if (w->num_kcontrols) {
		w->kcontrols = kzalloc(w->num_kcontrols *
					sizeof(struct snd_kcontrol *),
					GFP_KERNEL);
		if (!w->kcontrols) {
			mutex_unlock(&card->dapm_mutex);
			return -ENOMEM;
		}
	}

	switch(w->id) {
	case snd_soc_dapm_switch:
	case snd_soc_dapm_mixer:
	case snd_soc_dapm_mixer_named_ctl:
		ret = dapm_new_mixer(w);
		break;
	case snd_soc_dapm_mux:
	case snd_soc_dapm_demux:
		ret = dapm_new_mux(w);
		break;
	case snd_soc_dapm_pga:
	case snd_soc_dapm_out_drv:
		ret = dapm_new_pga(w);
		break;
	case snd_soc_dapm_dai_link:
		ret = dapm_new_dai_link(w);
		break;
	default:
		break;
	}

	if (ret < 0)
		dev_err(w->dapm->dev, "==wzg:err create widget:ret=%d, id=%d, num-kctl=%d\n",
			ret, w->id, w->num_kcontrols);

this is the log:

Aug 11 19:06:22 wzg-byt kernel: [ 13.810223] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=12
Aug 11 19:06:22 wzg-byt kernel: [ 13.810239] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=11
Aug 11 19:06:22 wzg-byt kernel: [ 13.810247] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=11
Aug 11 19:06:22 wzg-byt kernel: [ 13.810255] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=13
Aug 11 19:06:22 wzg-byt kernel: [ 13.810262] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=14
Aug 11 19:06:22 wzg-byt kernel: [ 13.810286] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810294] sof-audio sof-audio: ==wzg:err create widget:ret=-22, id=27
Aug 11 19:06:22 wzg-byt kernel: [ 13.810301] sof-audio sof-audio: ==wzg:err create widget:ret=-22, id=28
Aug 11 19:06:22 wzg-byt kernel: [ 13.810308] sof-audio sof-audio: ==wzg:err create widget:ret=-22, id=27
Aug 11 19:06:22 wzg-byt kernel: [ 13.810315] sof-audio sof-audio: ==wzg:err create widget:ret=-22, id=28
Aug 11 19:06:22 wzg-byt kernel: [ 13.810329] sof-audio sof-audio: ==wzg:err create widget:ret=-22, id=27
Aug 11 19:06:22 wzg-byt kernel: [ 13.810336] sof-audio sof-audio: ==wzg:err create widget:ret=-22, id=28
Aug 11 19:06:22 wzg-byt kernel: [ 13.810342] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810354] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810362] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810370] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810377] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810385] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810393] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810401] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810407] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810414] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810420] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810427] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810433] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810440] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810446] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810453] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810459] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19

@plbossart
Member

@zhigang-wu I am not sure I understand your point. Are you saying that with our current solution there were tons of unreported errors? If yes, that's not good and needs to be root-caused. The suggestion that we need to avoid this error handling because it "breaks" our solution is not quite right, it just means our solution is already broken and only works by accident, not by design.

@ghost
Author

ghost commented Aug 15, 2018

@plbossart
Yes, our current solution has such errors.
You can check the log I pasted in the comments.
For example, this part of the log:
Aug 11 19:06:22 wzg-byt kernel: [ 13.810223] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=12
Aug 11 19:06:22 wzg-byt kernel: [ 13.810239] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=11
Aug 11 19:06:22 wzg-byt kernel: [ 13.810247] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=11
Aug 11 19:06:22 wzg-byt kernel: [ 13.810255] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=13
Aug 11 19:06:22 wzg-byt kernel: [ 13.810262] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=14
Aug 11 19:06:22 wzg-byt kernel: [ 13.810286] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=19

This means that the widgets in the pre-defined table below, from the bytcr_rt5651.c file, fail to be created.

static const struct snd_soc_dapm_widget byt_rt5651_widgets[] = {
	SND_SOC_DAPM_HP("Headphone", NULL),
	SND_SOC_DAPM_MIC("Headset Mic", NULL),
	SND_SOC_DAPM_MIC("Internal Mic", NULL),
	SND_SOC_DAPM_SPK("Speaker", NULL),
	SND_SOC_DAPM_LINE("Line In", NULL),
	SND_SOC_DAPM_SUPPLY("Platform Clock", SND_SOC_NOPM, 0, 0,
			    platform_clock_control, SND_SOC_DAPM_PRE_PMU |
			    SND_SOC_DAPM_POST_PMD),
};

@ghost
Author

ghost commented Aug 15, 2018

@plbossart
for this part of log:

Aug 11 19:06:22 wzg-byt kernel: [ 13.810342] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810354] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810362] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810370] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810377] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810385] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810393] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19
Aug 11 19:06:22 wzg-byt kernel: [ 13.810401] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810407] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810414] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810420] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810427] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810433] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810440] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810446] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810453] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=0
Aug 11 19:06:22 wzg-byt kernel: [ 13.810459] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-22, id=19

This means that the pre-defined widgets in the rt5651_dapm_widgets[] table in the rt5651.c file fail during widget allocation.

This is the root cause.

@plbossart
Member

I have a different understanding of the word 'root cause', you need to dig deeper. It's unclear to me why standard widgets that are not SOF specific can't be added? It could very well be that you've discovered a problem that we've had forever, but these sort of widgets are pretty standard in all machine drivers...

@ghost
Author

ghost commented Aug 15, 2018

@plbossart
Because the snd_soc_dapm_new_widgets() function
does not support ids like "snd_soc_dapm_hp" or "snd_soc_dapm_mic",
but the driver code already pre-defines these widgets.
I think the elements in these tables mean that we can support these widgets (that is why they are listed here),
but that does not mean we must support all of them. We can pick several of them to support.
That is my understanding.
Thanks!

@ghost
Author

ghost commented Aug 15, 2018

@plbossart
That is a good question.
I will add a little more trace code in the snd_soc_dapm_new_widgets() function
to check each widget's name. Then we can confirm where these failing widgets come from.
If they come from the rt5651 module, we can confirm that these standard widgets are not supported at present.
Thanks!

@ghost
Author

ghost commented Aug 15, 2018

@plbossart
Below is the log I got from the snd_soc_dapm_new_widgets() function.
I printed out the widget names. They are the same as in the byt_rt5651_widgets[] table.
They failed to be created.
The "Headphone" id is "snd_soc_dapm_hp".
The "Headset Mic" id is "snd_soc_dapm_mic".
The "Internal Mic" id is "snd_soc_dapm_mic".
The "Speaker" id is "snd_soc_dapm_spk".
The "Line In" id is "snd_soc_dapm_line".
The "Platform Clock" id is "snd_soc_dapm_supply".

I checked the history of the patch that added byt_rt5651_widgets[] in bytcr_rt5651.c.
I do not understand why we use this table; I did not find the reason for it.
(I did not get enough information from the comments in that patch.)
These ids are not supported in the snd_soc_dapm_new_widgets() function,
and no other function handles them.

Aug 11 19:06:21 wzg-byt kernel: [ 11.171477] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=12, num-kctl=0, name=Headphone
Aug 11 19:06:21 wzg-byt kernel: [ 11.171674] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=11, num-kctl=0, name=Headset Mic
Aug 11 19:06:21 wzg-byt kernel: [ 11.171842] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=11, num-kctl=0, name=Internal Mic
Aug 11 19:06:21 wzg-byt kernel: [ 11.172043] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=13, num-kctl=0, name=Speaker
Aug 11 19:06:21 wzg-byt kernel: [ 11.172206] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=14, num-kctl=0, name=Line In
Aug 11 19:06:21 wzg-byt kernel: [ 11.172366] bytcr_rt5651 bytcr_rt5651: ==wzg:err create widget:ret=-22, id=19, num-kctl=0, name=Platform Clock

@plbossart
Member

@zhigang-wu This is a red herring (a wrong lead)

The error that you are reporting does not mean that these widgets are useless, there's something else you haven't looked into. -22 means -EINVAL, and that's the value I initialized in the example code. Change it to zero so that the default: case will not report any issues...

These widgets are used in ALL machine drivers, and you can see for yourself that the platform_clock_control() function is invoked when a stream is started/stopped.
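
To make the point about the initial value concrete, here is a small model of the loop. It is only a sketch: `count_reported_errors()` is a hypothetical function, and it assumes that ids 1 and 2 have a constructor that succeeds while every other id falls through `default:` without touching `ret`. With `ret` starting at `-EINVAL`, such widgets are counted as failed even though nothing failed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Counts how many widgets would trigger the error print in the loop. */
int count_reported_errors(const int *ids, size_t n, int initial_ret)
{
	int ret = initial_ret;
	int reported = 0;
	size_t i;

	assert(ids != NULL || n == 0);

	for (i = 0; i < n; i++) {
		switch (ids[i]) {
		case 1:
		case 2:
			ret = 0;	/* constructor ran and succeeded */
			break;
		default:
			break;		/* nothing to create; ret is left untouched */
		}

		if (ret < 0)
			reported++;	/* where the dev_err() call sits in the thread */
	}
	return reported;
}
```

For the id sequence {5, 6, 1, 7}, an initial value of `-EINVAL` counts the first two widgets as errors, while an initial value of 0 counts none. Note also that `ret` is never reset inside the loop, so once a constructor succeeds, later `default:` widgets stop being counted; this may be why the -22 log in the thread lists only some of the widgets.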

@ghost
Author

ghost commented Aug 16, 2018

@plbossart
The widgets in the byt_rt5651_widgets[] table are processed in the snd_soc_dapm_new_controls() function before the tplg is parsed. Each newly allocated widget is linked into the widget list. During tplg parsing, new widgets are constructed from the tplg's info, and these are also linked into the widget list. After tplg parsing, snd_soc_dapm_new_widgets() is called to construct the kcontrol{} instances: it tries to allocate the kcontrol{} for each widget in the widget list.

Yes, the pre-defined widgets in the byt_rt5651_widgets[] table are useful, but they do not get a kcontrol{} in the snd_soc_dapm_new_widgets() function because their widget->id is not handled there.
I think a widget can work without a kcontrol{}. You are right, w->event() is called when a stream starts or stops.
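
The ordering described in this comment can be modeled in userspace as follows. This is a sketch under stated assumptions, not the DAPM implementation: `struct card`, `card_add_widget()`, and `card_new_widgets()` are hypothetical, a fixed array stands in for the kernel's linked list, and a `needs_kcontrol` flag stands in for the `switch (w->id)` dispatch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WIDGETS 16

struct widget {
	const char *name;
	int needs_kcontrol;	/* 1 for mixer/mux/pga-like ids, 0 otherwise */
	int has_kcontrol;
};

struct card {
	struct widget widgets[MAX_WIDGETS];
	size_t count;
};

/* Steps 1 and 2: the driver table and the topology append to one list. */
void card_add_widget(struct card *c, const char *name, int needs_kcontrol)
{
	assert(c->count < MAX_WIDGETS);
	c->widgets[c->count].name = name;
	c->widgets[c->count].needs_kcontrol = needs_kcontrol;
	c->widgets[c->count].has_kcontrol = 0;
	c->count++;
}

/* Step 3: one pass over the whole list; returns how many controls were made. */
size_t card_new_widgets(struct card *c)
{
	size_t made = 0;
	size_t i;

	for (i = 0; i < c->count; i++) {
		if (!c->widgets[i].needs_kcontrol)
			continue;	/* valid widget, it just has no kcontrol */
		c->widgets[i].has_kcontrol = 1;
		made++;
	}
	return made;
}
```

In this model a widget such as "Headphone" stays on the list and remains usable after the pass; it simply ends up with `has_kcontrol == 0`, which is not an error.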

@plbossart
Member

plbossart commented Aug 16, 2018

@zhigang-wu please change the example code I provided to this:

int snd_soc_dapm_new_widgets(struct snd_soc_card *card)
{
struct snd_soc_dapm_widget *w;
unsigned int val;
int ret = 0;
mutex_lock_nested(&card->dapm_mutex, SND_SOC_DAPM_CLASS_INIT);

list_for_each_entry(w, &card->widgets, list)
{
	if (w->new)
		continue;

	if (w->num_kcontrols) {
		w->kcontrols = kzalloc(w->num_kcontrols *
					sizeof(struct snd_kcontrol *),
					GFP_KERNEL);
		if (!w->kcontrols) {
			mutex_unlock(&card->dapm_mutex);
			return -ENOMEM;
		}
	}

	switch(w->id) {
	case snd_soc_dapm_switch:
	case snd_soc_dapm_mixer:
	case snd_soc_dapm_mixer_named_ctl:
		ret = dapm_new_mixer(w);
		break;
	case snd_soc_dapm_mux:
	case snd_soc_dapm_demux:
		ret = dapm_new_mux(w);
		break;
	case snd_soc_dapm_pga:
	case snd_soc_dapm_out_drv:
		ret = dapm_new_pga(w);
		break;
	case snd_soc_dapm_dai_link:
		ret = dapm_new_dai_link(w);
		break;
	default:
		break;
	}

	if (ret < 0)
		dev_err(w->dapm->dev, "==wzg:err create widget:ret=%d, id=%d, num-kctl=%d\n",
			ret, w->id, w->num_kcontrols);

@ghost
Author

ghost commented Aug 20, 2018

I changed the code as shown below: if the w->id is not supported, ret is set to -65535.
From the log, you can see there are lots of unsupported w->id values.

int snd_soc_dapm_new_widgets(struct snd_soc_card *card)
{
struct snd_soc_dapm_widget *w;
unsigned int val;
int ret = 0;

mutex_lock_nested(&card->dapm_mutex, SND_SOC_DAPM_CLASS_INIT);

list_for_each_entry(w, &card->widgets, list)
{
	if (w->new)
		continue;

	if (w->num_kcontrols) {
		w->kcontrols = kzalloc(w->num_kcontrols *
					sizeof(struct snd_kcontrol *),
					GFP_KERNEL);
		if (!w->kcontrols) {
			mutex_unlock(&card->dapm_mutex);
			return -ENOMEM;
		}
	}

	switch(w->id) {
	case snd_soc_dapm_switch:
	case snd_soc_dapm_mixer:
	case snd_soc_dapm_mixer_named_ctl:
		ret = dapm_new_mixer(w);
		break;
	case snd_soc_dapm_mux:
	case snd_soc_dapm_demux:
		ret = dapm_new_mux(w);
		break;
	case snd_soc_dapm_pga:
	case snd_soc_dapm_out_drv:
		ret = dapm_new_pga(w);
		break;
	case snd_soc_dapm_dai_link:
		ret = dapm_new_dai_link(w);
		break;
	default:
		ret = -65535;
		break;
	}

	if (ret < 0)
		dev_err(w->dapm->dev, "==wzg:err create widget:ret=%d, id=%d, num-kctl=%d, name=%s\n",
			ret, w->id, w->num_kcontrols, w->name);

you can find the log it printed:

[ 14.910574] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=28, num-kctl=0, name=ssp0 Rx
[ 14.910730] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=27, num-kctl=0, name=ssp1 Tx
[ 14.910888] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=28, num-kctl=0, name=ssp1 Rx
[ 14.911045] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=27, num-kctl=0, name=ssp2 Tx
[ 14.911202] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=28, num-kctl=0, name=ssp2 Rx
[ 14.911362] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=I2S1 ASRC
[ 14.911528] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=I2S2 ASRC
[ 14.911691] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=STO1 DAC ASRC
[ 14.911859] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=STO2 DAC ASRC
[ 14.912095] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=ADC ASRC
[ 14.912266] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=LDO
[ 14.912423] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=micbias1
[ 14.912587] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=MIC1
[ 14.912760] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=MIC2
[ 14.912930] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=MIC3
[ 14.913391] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=IN1P
[ 14.913556] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=IN2P
[ 14.913712] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=IN2N
[ 14.913867] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=IN3P
[ 14.914024] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=DMIC L1
[ 14.914186] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=0, num-kctl=0, name=DMIC R1
[ 14.914345] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=DMIC CLK
[ 14.914636] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=8, num-kctl=0, name=ADC L
[ 14.914797] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=8, num-kctl=0, name=ADC R
[ 14.914955] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=ADC L Power
[ 14.915123] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=ADC R Power
[ 14.915384] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=Stereo1 Filter
[ 14.915556] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=Stereo2 Filter
[ 14.915812] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=I2S1
[ 14.915992] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=I2S2
[ 14.916306] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=23, num-kctl=0, name=AIF1RX
[ 14.916485] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=24, num-kctl=0, name=AIF1TX
[ 14.916646] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=23, num-kctl=0, name=AIF2RX
[ 14.916806] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=24, num-kctl=0, name=AIF2TX
[ 14.917036] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=Stero1 DAC Power
[ 14.917212] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=Stero2 DAC Power
[ 14.917515] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=9, num-kctl=0, name=DAC L1
[ 14.917676] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=9, num-kctl=0, name=DAC R1
[ 14.917834] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=DAC L1 Power
[ 14.918003] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=DAC R1 Power
[ 14.918383] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=HP L Amp
[ 14.920605] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=HP R Amp
[ 14.923530] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=19, num-kctl=0, name=Amp Power
[ 14.925902] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=18, num-kctl=0, name=HP Post
[ 14.928028] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=1, num-kctl=0, name=HPOL
[ 14.930166] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=1, num-kctl=0, name=HPOR
[ 14.932281] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=1, num-kctl=0, name=LOUTL
[ 14.934355] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=1, num-kctl=0, name=LOUTR
[ 14.936447] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=1, num-kctl=0, name=PDML
[ 14.938522] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=1, num-kctl=0, name=PDMR
[ 14.940701] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=27, num-kctl=0, name=AIF1 Playback
[ 14.942817] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=28, num-kctl=0, name=AIF1 Capture
[ 14.944989] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=27, num-kctl=0, name=AIF2 Playback
[ 14.947147] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=28, num-kctl=0, name=AIF2 Capture
[ 14.949355] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=23, num-kctl=0, name=PCM0P
[ 14.951592] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF1.0
[ 14.953747] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF1.1
[ 14.955748] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF1.2
[ 14.957738] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF1.3
[ 14.959569] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=27, num-kctl=0, name=SSP2.OUT
[ 14.961378] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=32, num-kctl=0, name=PIPELINE.1.SSP2.OUT
[ 14.963188] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=24, num-kctl=0, name=PCM0C
[ 14.965231] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF2.0
[ 14.967030] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF2.1
[ 14.968839] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=28, num-kctl=0, name=SSP2.IN
[ 14.970606] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=32, num-kctl=0, name=PIPELINE.2.SSP2.IN
[ 14.972425] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=23, num-kctl=0, name=PCM1P
[ 14.974970] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=34, num-kctl=0, name=SRC3.0
[ 14.980166] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF3.0
[ 14.981905] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF3.1
[ 14.983600] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF3.2
[ 14.988211] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=32, num-kctl=0, name=PIPELINE.3.SRC3.0
[ 14.989868] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=25, num-kctl=1, name=TONE5.0
[ 14.991561] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF5.0
[ 14.993290] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=31, num-kctl=0, name=BUF5.1
[ 14.994945] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=32, num-kctl=0, name=PIPELINE.5.TONE5.0
[ 15.000108] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=27, num-kctl=0, name=Media Playback 1
[ 15.001824] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=27, num-kctl=0, name=Low Latency Playback 0
[ 15.003559] sof-audio sof-audio: ==wzg:err create widget:ret=-65535, id=28, num-kctl=0, name=Low Latency Capture 0
[ 15.005394] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=30, num-kctl=0, name=HPO L Playback Switch Autodisable
[ 15.007260] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=30, num-kctl=0, name=HPO R Playback Switch Autodisable
[ 15.012868] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=30, num-kctl=0, name=LOUT L Playback Switch Autodisable
[ 15.014810] rt5651 i2c-10EC5651:00: ==wzg:err create widget:ret=-65535, id=30, num-kctl=0, name=LOUT R Playback Switch Autodisable

Member

@lgirdwood lgirdwood left a comment


Code looks fine, but change commit message to
ASoC: topology: fix pointer dereference during topology free

Some kcontrols may not be properly created during topology load. Check these during remove() prior to usage.

SoB

@lgirdwood
Member

@zhigang-wu The dereference fix code looks ok, but are we missing some other patch for checking and reporting kcontrol results during topology load as discussed with @plbossart. Will this be in a new PR ?

@ghost
Author

ghost commented Aug 20, 2018

@lgirdwood
That is what I am thinking about. I will send a PR for this.
Do you think it is ok to report the kcontrol allocated result in snd_soc_dapm_new_widgets() function?

@lgirdwood
Member

@zhigang-wu you should report any errors where they are discovered. So if they are discovered in snd_soc_dapm_new_widgets() then you should report errors here too.

@plbossart
Member

@zhigang-wu please try without setting an error value for the default case (as I suggested but you changed my code...). It's not clear to me why you want this to be an error when all the widgets listed are perfectly legit - their definition can be found in drivers or topology.

@ghost
Author

ghost commented Aug 22, 2018

@plbossart
I tried it. If I do not set the error value for the default case, there is no such log in dmesg.

@ghost ghost changed the title from "ASoC: topology: avoid oops on dereference during topology free" to "ASoC: topology: fix pointer dereference during topology free" Aug 22, 2018
@ghost
Author

ghost commented Aug 22, 2018

@plbossart
From this result, we can conclude:

  1. there are lots of unsupported widgets in the system.
  2. these unsupported widgets are not processed in the snd_soc_dapm_new_widgets() function,
    so the kcontrol{} instance is not allocated in this situation.
  3. we have to protect the pointer dereference in the topology free processing.
  4. we could add a print in the snd_soc_dapm_new_widgets() function to warn about this case,
    for example printing widget->id and widget->name to help developers trace it.

@ghost
Author

ghost commented Aug 22, 2018

@lgirdwood
I updated the PR's title and comments already.

@ghost ghost changed the title from "ASoC: topology: fix pointer dereference during topology free" to "ASoC:topology: check new widgets allocated result." Aug 22, 2018
@ghost
Author

ghost commented Aug 22, 2018

@plbossart
I have updated the PR already.
I will test it once I am back in the office.
We do three things here:

  1. if the return value from the sub-function is < 0, we have to free the kcontrol pointer before returning.
  2. if the default case is reached, w->id is not supported right now, so we free the kcontrol pointer if needed but do not return. Widget processing continues.
  3. we have to unlock the mutex before returning.

Member

@plbossart plbossart left a comment

Not clear on the problem statement still and if the solution can be accepted upstream.

break;
default:
need_free = 1;
Member

why is this needed? Why do you need to free the kcontrols in that case? You will need to add a strong explanation to explain why this is needed - since it's going to affect a lot of platforms using this code.

Author

yes, I will add the comments here to explain it.

@ghost
Author

ghost commented Aug 23, 2018

There are several cases:

  1. when w->id is not supported, the default case is entered. The w->kcontrols pointer array has been allocated in the new_widgets() function, but the pointers in w->kcontrols[] are empty, so the system will hit the oops in the topology free stage, for example in the remove_widget() function.
    So we have to free w->kcontrols to avoid this case.
  2. when w->id is supported and ret < 0, processing of the snd_kcontrol{} has failed.
    w->kcontrols[] holds empty pointers. If we do not free it here, the system will hit the oops
    in the topology free stage.
  3. remove_widget(), for example, only checks the w->kcontrols pointer, but does not check the
    pointers stored in w->kcontrols[].
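
The third case can be modeled in user space. This is only a minimal sketch under stated assumptions, not the kernel code: `struct kcontrol`, `struct widget` and `remove_widget_checked()` are hypothetical stand-ins for snd_kcontrol{}, the DAPM widget and remove_widget(). It shows a teardown loop that validates each entry of the pointer array instead of only the array pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for snd_kcontrol and the DAPM widget. */
struct kcontrol {
	int id;
};

struct widget {
	int num_kcontrols;
	struct kcontrol **kcontrols;	/* allocated up front; entries may stay NULL */
};

/*
 * Release every control the widget holds and return how many were
 * actually released. Checking only w->kcontrols is not enough: the
 * array can exist while some (or all) of its entries were never created.
 */
int remove_widget_checked(struct widget *w)
{
	int i, released = 0;

	if (!w->kcontrols)
		return 0;

	for (i = 0; i < w->num_kcontrols; i++) {
		struct kcontrol *kc = w->kcontrols[i];

		/* kcontrol not guaranteed to be created, so validate */
		if (!kc)
			continue;

		kc->id = -1;	/* stands in for teardown that dereferences kc */
		free(kc);
		released++;
	}

	free(w->kcontrols);
	w->kcontrols = NULL;
	w->num_kcontrols = 0;
	return released;
}
```

With a three-entry array where only one control was created, the function releases that one control and skips the two empty slots instead of dereferencing them.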

@ghost
Author

ghost commented Aug 23, 2018

If we free the pointer in the new_widgets() function, we do not need to add more pointer checks in remove_widget() to avoid the oops.

@plbossart
Member

@zhigang-wu I don't understand what 'supported' widget means. There is no such concept, we use widgets that are part of an existing ASoC list, so your explanations are impossible to follow. I believe you are treating the default case as an error, but I don't think it's always the case; the code , so your systematic free is likely to have negative side effects.
Take as an example the 'siggen' widget we use for tones: it'd be handled as part of the default case but it's fully legit. Why would you want to free the associated kcontrols when they are very much required?

@ghost
Author

ghost commented Aug 23, 2018

@plbossart
"unsupported" is not accurate here. Read it as "kcontrol not allocated".
The main purpose of the snd_soc_dapm_new_widgets() function is to allocate the snd_kcontrol{} instances. widget->kcontrols[] is the pointer array that holds these kcontrol instances.

In this oops case, the "siggen" widget is not processed, so the kcontrol for this widget is not allocated.
So I think that if the snd_kcontrol is not allocated, widget->kcontrols[] is useless
and we should free it right away.

There are some unclear points that I need you to confirm:

  1. when one widget needs a kcontrol to be allocated but the allocation fails, should we stop processing all widgets, or just drop this widget and continue to process the others?
    The old code seems to do nothing about this.
  2. if one widget needs multiple kcontrols and one of them fails to allocate, what should we do? Free all of the allocated kcontrols, or just let it go with a warning?

@ghost
Author

ghost commented Aug 24, 2018

@plbossart @lgirdwood
I discussed this PR with Keyon.
Here is another proposal:

  1. can we move the code below from snd_soc_dapm_new_widgets() into dapm_new_mixer()/dapm_new_mux()/dapm_new_pga()/dapm_new_dai_link() separately?
    Each of these functions then decides whether to free w->kcontrols based on its processing status.
    if (w->num_kcontrols) {
        w->kcontrols = kcalloc(w->num_kcontrols,
                               sizeof(struct snd_kcontrol *),
                               GFP_KERNEL);
        ..........
    }

  2. check the return value from these sub-functions. When ret < 0, we just return and abort the processing in the snd_soc_dapm_new_widgets() function.

In this situation, no kcontrol instance is allocated for the "siggen" widget, so we do not need to
care about the free processing.
What do you think of this idea?
Thanks!

@@ -3068,23 +3069,42 @@ int snd_soc_dapm_new_widgets(struct snd_soc_card *card)
 		case snd_soc_dapm_switch:
 		case snd_soc_dapm_mixer:
 		case snd_soc_dapm_mixer_named_ctl:
-			dapm_new_mixer(w);
+			ret = dapm_new_mixer(w);
Member

Checking the return is good, but dapm_new_mixer(), dapm_new_mux() etc. should free any resources they allocate on failure. Likewise, dapm_create_or_share_kcontrol() also needs to be checked to make sure it frees any resources it allocates on failure. So please

  1. Check return values and propagate them up the call stack.
  2. Free any resources in the function that allocates them on any failure.
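
The two rules above can be sketched in user-space C. This is a hedged illustration, not the DAPM implementation: `new_mixer()`, `new_widgets()`, `create_kcontrol()` and the `fail_at` test hook are hypothetical names, and `-ENOMEM` is just one plausible error code. The allocating function unwinds its own allocations on failure, and the caller only checks and forwards the result.

```c
#include <errno.h>
#include <stdlib.h>

struct kcontrol {
	int id;
};

struct widget {
	int num_kcontrols;
	int fail_at;	/* test hook: index whose creation fails, -1 for none */
	struct kcontrol **kcontrols;
};

/* Hypothetical creation step; fails on demand so the error path can be exercised. */
static struct kcontrol *create_kcontrol(const struct widget *w, int i)
{
	struct kcontrol *kc;

	if (i == w->fail_at)
		return NULL;

	kc = malloc(sizeof(*kc));
	if (kc)
		kc->id = i;
	return kc;
}

/* Allocates w->kcontrols and, on any failure, frees everything it allocated. */
int new_mixer(struct widget *w)
{
	int i, ret;

	if (w->num_kcontrols <= 0)
		return 0;

	w->kcontrols = calloc(w->num_kcontrols, sizeof(*w->kcontrols));
	if (!w->kcontrols)
		return -ENOMEM;

	for (i = 0; i < w->num_kcontrols; i++) {
		w->kcontrols[i] = create_kcontrol(w, i);
		if (!w->kcontrols[i]) {
			ret = -ENOMEM;
			goto err_free;
		}
	}
	return 0;

err_free:
	/* Undo only what this function created, in reverse order. */
	while (--i >= 0)
		free(w->kcontrols[i]);
	free(w->kcontrols);
	w->kcontrols = NULL;
	return ret;
}

/* The caller checks the result and passes it up the call stack. */
int new_widgets(struct widget *widgets, int count)
{
	int i, ret;

	for (i = 0; i < count; i++) {
		ret = new_mixer(&widgets[i]);
		if (ret < 0)
			return ret;
	}
	return 0;
}
```

In this sketch, widgets that were set up before the failing one are left as they are; whether the caller should also roll those back is a separate policy decision.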

Author

  1. I checked the code in dapm_create_or_share_kcontrol(): when it detects an error, it has already finished the free operation before returning. This function has only two allocations, for long_name and kcontrol.
    snd_soc_dapm_add_path() likewise already does this job when it returns with an error.
  2. we only care about w->kcontrols itself. When the sub-function returns an error, it frees the allocated w->kcontrols.
  3. But the updated PR also has a problem: when w->num_kcontrols is not equal to 0 but the code goes to the default case, w->kcontrols is allocated but never processed further.
    I think we have to free it when entering the default case.

@lgirdwood
Member

@zhigang-wu my general comment is that we check return values and send them up the call stack and free any resources in the functions that allocate them.

@ghost
Author

ghost commented Aug 24, 2018

@lgirdwood
That is clear to me.
I will update it accordingly.

@ghost ghost changed the title from "ASoC:topology: check new widgets allocated result." to "ASoC:topology: bug fix oops cause by pointer dereference." Aug 27, 2018
@ghost
Author

ghost commented Aug 27, 2018

@lgirdwood
I updated the PR and tested it. It seems it still has some problems.

  1. In this PR, I moved the kcontrols allocation into the sub-functions.
    Each sub-function does the kcontrols allocation and, if there is an error,
    frees the allocated kcontrols.
  2. The snd_soc_dapm_new_widgets() function checks the return value and,
    if it is < 0, returns directly with that error value.
  3. If a widget enters the default case, its kcontrols are not allocated.

But this still seems to have some problems. I will do more research on this PR.

@ghost
Author

ghost commented Aug 27, 2018

I tested it with the v4.18 kernel. I am not sure whether it is a new issue or not.
I have to do more debugging.

@@ -3054,37 +3110,32 @@ int snd_soc_dapm_new_widgets(struct snd_soc_card *card)
if (w->new)
continue;

if (w->num_kcontrols) {
Member

Keep the allocation for w->kcontrols here.

break;
default:
break;
}

if (ret < 0) {
mutex_unlock(&card->dapm_mutex);
return ret;
Member

free w->kcontrols here.

check the return value to free the kcontrols instance
to avoid oops caused by the pointer dereference.

Signed-off-by: Wu Zhigang <zhigang.wu@linux.intel.com>
 			break;
 		case snd_soc_dapm_dai_link:
-			dapm_new_dai_link(w);
+			ret = dapm_new_dai_link(w);
 			break;
 		default:
 			break;
Author

I think we should add this code in the default case:
if (w->num_kcontrols) {
	kfree(w->kcontrols);
	w->kcontrols = NULL;
}

Member

No, do it outside the default case, otherwise we won't kfree() all resources.

Author

@lgirdwood
What should I do in this case?
w->num_kcontrols != 0, and the default case is entered.

In this case, w->kcontrols is allocated but is not processed by a sub-function, because the default case is entered. The question is: should we kfree the allocated w->kcontrols at this point?

If yes, then I will roll back to the previous version: set a flag in the default case and do the kfree when that flag is set.

If no, I think we cannot cover the "siggen" case, which causes our panic in the tplg free stage.
Its w->num_kcontrols = 1, but "snd_soc_dapm_siggen" is not in this switch{}, so it enters the default case.

Member

@zhigang-wu I need you to improve the current patch to consider all error paths in the function. The default case is valid for other widget types, so it can't be used as a means of freeing resources. Just walk through this function line by line, see where things could fail, and then ask "where do I recover this?"

@lgirdwood that function adds new widgets by walking a linked list of them. If adding one of them fails, looks like previously successfully added widgets are kept. Without this patch in such a case the function could return an error, if allocation failed, or 0, if initialisation failed. I can see the following possibilities in case of a partial success:

  1. free all so far successfully added widgets, using snd_soc_dapm_free_widget() and return an error
  2. keep all successfully added widgets and return 0
  3. keep and return an error, which seems illogical to me

Which one would you prefer? Option 1 seems the most consistent to me. Also note that most users of the function don't even check its return code...
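
Option 1 could look roughly like the sketch below. It is a user-space model under stated assumptions, not the DAPM code: `add_widget()`, `free_widget()` and `add_all_or_nothing()` are hypothetical names (the real teardown helper named above is snd_soc_dapm_free_widget()), and `should_fail` is a test hook that forces an initialisation failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct widget {
	bool should_fail;	/* test hook: make initialisation fail */
	bool added;
};

/* Hypothetical per-widget setup step. */
static int add_widget(struct widget *w)
{
	if (w->should_fail)
		return -EINVAL;
	w->added = true;
	return 0;
}

/* Hypothetical teardown, standing in for the real free helper. */
static void free_widget(struct widget *w)
{
	w->added = false;
}

/*
 * Add every widget, or none of them: on the first failure, undo the
 * widgets that were already added in this call and return the error.
 */
int add_all_or_nothing(struct widget *widgets, size_t count)
{
	size_t i;
	int ret;

	for (i = 0; i < count; i++) {
		ret = add_widget(&widgets[i]);
		if (ret < 0)
			goto err_rollback;
	}
	return 0;

err_rollback:
	/* i is the index that failed; walk back over the ones before it. */
	while (i-- > 0)
		free_widget(&widgets[i]);
	return ret;
}
```

The rollback only touches widgets added during the current call, so a caller that ignores the return code is left with the state it had before the call.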

@ghost ghost closed this Sep 18, 2018
bardliao pushed a commit that referenced this pull request Jun 11, 2019
BugLink: https://bugs.launchpad.net/bugs/1821607

commit baef1c9 upstream.

Using the batch API from the interconnect driver sometimes leads to a
KASAN error due to an access to freed memory. This is easier to trigger
with threadirqs on the kernel commandline.

 BUG: KASAN: use-after-free in rpmh_tx_done+0x114/0x12c
 Read of size 1 at addr fffffff51414ad84 by task irq/110-apps_rs/57

 CPU: 0 PID: 57 Comm: irq/110-apps_rs Tainted: G        W         4.19.10 #72
 Call trace:
  dump_backtrace+0x0/0x2f8
  show_stack+0x20/0x2c
  __dump_stack+0x20/0x28
  dump_stack+0xcc/0x10c
  print_address_description+0x74/0x240
  kasan_report+0x250/0x26c
  __asan_report_load1_noabort+0x20/0x2c
  rpmh_tx_done+0x114/0x12c
  tcs_tx_done+0x450/0x768
  irq_forced_thread_fn+0x58/0x9c
  irq_thread+0x120/0x1dc
  kthread+0x248/0x260
  ret_from_fork+0x10/0x18

 Allocated by task 385:
  kasan_kmalloc+0xac/0x148
  __kmalloc+0x170/0x1e4
  rpmh_write_batch+0x174/0x540
  qcom_icc_set+0x8dc/0x9ac
  icc_set+0x288/0x2e8
  a6xx_gmu_stop+0x320/0x3c0
  a6xx_pm_suspend+0x108/0x124
  adreno_suspend+0x50/0x60
  pm_generic_runtime_suspend+0x60/0x78
  __rpm_callback+0x214/0x32c
  rpm_callback+0x54/0x184
  rpm_suspend+0x3f8/0xa90
  pm_runtime_work+0xb4/0x178
  process_one_work+0x544/0xbc0
  worker_thread+0x514/0x7d0
  kthread+0x248/0x260
  ret_from_fork+0x10/0x18

 Freed by task 385:
  __kasan_slab_free+0x12c/0x1e0
  kasan_slab_free+0x10/0x1c
  kfree+0x134/0x588
  rpmh_write_batch+0x49c/0x540
  qcom_icc_set+0x8dc/0x9ac
  icc_set+0x288/0x2e8
  a6xx_gmu_stop+0x320/0x3c0
  a6xx_pm_suspend+0x108/0x124
  adreno_suspend+0x50/0x60
 cr50_spi spi5.0: SPI transfer timed out
  pm_generic_runtime_suspend+0x60/0x78
  __rpm_callback+0x214/0x32c
  rpm_callback+0x54/0x184
  rpm_suspend+0x3f8/0xa90
  pm_runtime_work+0xb4/0x178
  process_one_work+0x544/0xbc0
  worker_thread+0x514/0x7d0
  kthread+0x248/0x260
  ret_from_fork+0x10/0x18

 The buggy address belongs to the object at fffffff51414ac80
  which belongs to the cache kmalloc-512 of size 512
 The buggy address is located 260 bytes inside of
  512-byte region [fffffff51414ac80, fffffff51414ae80)
 The buggy address belongs to the page:
 page:ffffffbfd4505200 count:1 mapcount:0 mapping:fffffff51e00c680 index:0x0 compound_mapcount: 0
 flags: 0x4000000000008100(slab|head)
 raw: 4000000000008100 ffffffbfd4529008 ffffffbfd44f9208 fffffff51e00c680
 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  fffffff51414ac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  fffffff51414ad00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >fffffff51414ad80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                    ^
  fffffff51414ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  fffffff51414ae80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

The batch API sets the same completion for each rpmh message that's sent
and then loops through all the messages and waits for that single
completion declared on the stack to be completed before returning from
the function and freeing the message structures. Unfortunately, some
messages may still be in process and 'stuck' in the TCS. At some later
point, the tcs_tx_done() interrupt will run and try to process messages
that have already been freed at the end of rpmh_write_batch(). This will
in turn access the 'needs_free' member of the rpmh_request structure and
cause KASAN to complain. Furthermore, if there's a message that's
completed in rpmh_tx_done() and freed immediately after the complete()
call is made we'll be racing with potentially freed memory when
accessing the 'needs_free' member:

	CPU0                         CPU1
	----                         ----
	rpmh_tx_done()
	 complete(&compl)
	                             wait_for_completion(&compl)
	                             kfree(rpm_msg)
	 if (rpm_msg->needs_free)
	 <KASAN warning splat>

Let's fix this by allocating a chunk of completions for each message and
waiting for all of them to be completed before returning from the batch
API. Alternatively, we could wait for the last message in the batch, but
that may be a more complicated change because it looks like
tcs_tx_done() just iterates through the indices of the queue and
completes each message instead of tracking the last inserted message and
completing that first.

Fixes: c8790cb ("drivers: qcom: rpmh: add support for batch RPMH request")
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
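
The fix described in the commit message above (one completion per message, with the batch freed only after all of them have fired) can be modeled in plain C. This is a minimal single-threaded sketch, not the rpmh driver code: `struct batch`, `batch_alloc()`, `msg_tx_done()` and `batch_try_free()` are hypothetical names, and a real implementation would sleep on each completion rather than poll and return -EBUSY.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for a kernel completion. */
struct completion {
	int done;
};

struct msg {
	struct completion compl;	/* one completion per message, not one shared */
	int payload;
};

struct batch {
	size_t count;
	struct msg *msgs;
};

/* Allocate a zeroed batch; every completion starts out not done. */
int batch_alloc(struct batch *b, size_t count)
{
	b->msgs = calloc(count, sizeof(*b->msgs));
	if (!b->msgs)
		return -ENOMEM;
	b->count = count;
	return 0;
}

/* Called from the (simulated) tx-done path for message idx. */
void msg_tx_done(struct batch *b, size_t idx)
{
	b->msgs[idx].compl.done = 1;
}

/*
 * Free the batch only once every message has completed. While any
 * message is still in flight the memory must stay valid, because the
 * tx-done path will still touch it.
 */
int batch_try_free(struct batch *b)
{
	size_t i;

	for (i = 0; i < b->count; i++) {
		if (!b->msgs[i].compl.done)
			return -EBUSY;
	}

	free(b->msgs);
	b->msgs = NULL;
	b->count = 0;
	return 0;
}
```

Completing only the last message of a three-message batch is not enough to free it in this model, which mirrors why waiting on a single shared completion was unsafe.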
This pull request was closed.
3 participants