Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpi-sound cards and upstreaming #3

Closed
msperl opened this issue May 15, 2016 · 38 comments
Closed

rpi-sound cards and upstreaming #3

msperl opened this issue May 15, 2016 · 38 comments

Comments

@msperl
Copy link
Owner

msperl commented May 15, 2016

@clivem: let us discuss these sound card things here...

I have tried the following upstream, but the card is not recognized:

/ {
    pcm5102a_codec: pcm5102a_codec {
        #sound-dai-cells = <0>;
        compatible = "ti,pcm5102a";
        status = "okay";
    };

    sound {
        compatible = "simple-audio-card";
        simple-audio-card,format = "i2s";
        simple-audio-card,name = "pcm5102a";
        status = "okay";
        simple-audio-card,cpu {
            sound-dai = <&i2s>;
        };
        simple-audio-card,codec {
            sound-dai = <&pcm5102a_codec>;
        };
    };
};
/* for some reasons this is required - simple-sound-card complains otherwise */
&i2s {
    #sound-dai-cells = <0>;
};

If I can make this work we can take things further from here...

As for the extension: we can look into it if we know how to parametrise things...

@clivem
Copy link

clivem commented May 15, 2016

I didn't need that sound-dai-cells=<0> for i2s def. I am using the simple-pcm5102a-dac overlay config that I showed in the gist....

https://gist.github.com/clivem/9d1da750f666dc1d1ede4df761079742

dtoverlay=simple-pcm5102a-dac,card-name="snd_rpi_hifiberry_dac"

asoc-simple-card soc:sound: New simple-card: snd_rpi_hifiberry_dac
asoc-simple-card soc:sound: Revert to legacy daifmt parsing
asoc-simple-card soc:sound:         name : bcm2835-i2s-pcm5102a-hifi
asoc-simple-card soc:sound:         format : 4001
asoc-simple-card soc:sound:         cpu : bcm2835-i2s / 0
asoc-simple-card soc:sound:         codec : pcm5102a-hifi / 0
asoc-simple-card soc:sound: pcm5102a-hifi <-> 20203000.i2s mapping ok

@msperl
Copy link
Owner Author

msperl commented May 15, 2016

At least upstream 4.6 requires it - or you get:

[   70.787180] [591] asoc-simple-card sound: New simple-card: HifiBerry DAC
[   70.787255] [591] asoc-simple-card sound: Revert to legacy daifmt parsing
[   70.787316] /sound/simple-audio-card,cpu: could not get #sound-dai-cells for /soc/i2s@7e203000
[   70.787342] asoc-simple-card sound: parse error -22
[   70.787401] asoc-simple-card: probe of sound failed with error -22

with this defined the driver loads...

[   60.427500] [589] asoc-simple-card sound: New simple-card: HifiBerry DAC
[   60.427584] [589] asoc-simple-card sound: Revert to legacy daifmt parsing
[   60.427701] [589] asoc-simple-card sound:    name : bcm2835-i2s-pcm5102a-hifi
[   60.427727] [589] asoc-simple-card sound:    format : 4001
[   60.427746] [589] asoc-simple-card sound:    cpu : bcm2835-i2s / 0
[   60.427765] [589] asoc-simple-card sound:    codec : pcm5102a-hifi / 0
[   60.429111] asoc-simple-card sound: pcm5102a-hifi <-> 20203000.i2s mapping ok

I still have no sound, but that is probably related to the *2 issue

@msperl
Copy link
Owner Author

msperl commented May 16, 2016

So going further: the snd_soc_dai_set_bclk_ratio(cpu_dai, sample_bits * 2)
is not really needed, as that is the default set in i2s_bcm2835 if nothing is set.

So we are actually covered there...

This mostly leaves the custom bclk settings...

@clivem
Copy link

clivem commented May 16, 2016

Sorry, not in front of computer at moment. ISTR only difference without card driver setting bclk ratio, is for 24 bit, where bclk_ratio from i2s driver will be 48. ie. 24 bits * 2 channels. Card driver uses physical width of bits, so S24_LE = 32 bits * 2 channels = 64.

hifiberry_dac card driver will request bclk_ratio of 32 for 16 bit and 64 for 24 and 32 bit. I2S driver will use 32 for 16 bit, 48 for 24 bit, and 64 for 32 bit.

@msperl
Copy link
Owner Author

msperl commented May 16, 2016

here an example that - i guess - I got to work:

        hw-params-rule@0 {
            match@0 {
                method = "asoc_generic_hw_params_match_sample_bits";
                value = <32>;
            };
            match@1 {
                method = "asoc_generic_hw_params_match_rate";
                value = <48000>;
            };
            match@2 {
                method = "asoc_generic_hw_params_match_channels";
                value = <2>;
            };
            action@0 {
                method = "asoc_generic_hw_params_set_fixed_bclk_size";
                value = <80>;
            };
        };

And then playing: speaker-test -c 2 -r 48000 -F S32_LE -f 440 -t sine
I get the following pcm clock:

root@raspcm:~# cat /sys/kernel/debug/clk/pcm/regdump
ctl = 0x00000001
div = 0x00005000

so parent clock 1 = OSC and a divider of 5.

Here the debug messages:

[root@raspcm:~# speaker-test -c 2 -r 48000 -F S32_LE -f 440 -t sine -l 1
speaker-test 1.0.28

Playback device is default
Stream parameters are 48000Hz, S32_LE, 2 channels
Sine wave rate is 440.0000Hz
Rate set to 48000Hz (requested 48000Hz)
Buffer size range from 64 to 65536
Period size range from 32 to 32768
Using max buffer size 65536
Periods = 4
was set period_size = 16384
was set buffer_size = 65536
 0 - Front Left
 1 - Front Right
Time per period = 4.196302
root@raspcm:~# cat /sys/kernel/debug/clk/pcm/regdump ctl = 0x00000001
div = 0x00005000
root@raspcm:~# dmesg
[  113.390048] [623] asoc-simple-card sound: New simple-card: HifiBerry DAC
[  113.390138] [623] asoc-simple-card sound: Revert to legacy daifmt parsing
[  113.390264] [623] asoc-simple-card sound:    name : bcm2835-i2s-pcm5102a-hifi
[  113.390291] [623] asoc-simple-card sound:    format : 4001
[  113.390312] [623] asoc-simple-card sound:    cpu : bcm2835-i2s / 3840000
[  113.390333] [623] asoc-simple-card sound:    codec : pcm5102a-hifi / 0
[  113.390380] [623] asoc-simple-card sound:    adding Rule: /sound/hw-params-rule@0
[  113.390454] [623] asoc-simple-card sound:        added match: /sound/hw-params-rule@0/match@0 - asoc_generic_hw_params_match_sample_bits [snd_soc_simple_card](00000020)
[  113.390512] [623] asoc-simple-card sound:        added match: /sound/hw-params-rule@0/match@1 - asoc_generic_hw_params_match_rate [snd_soc_simple_card](0000bb80)
[  113.390585] [623] asoc-simple-card sound:        added match: /sound/hw-params-rule@0/match@2 - asoc_generic_hw_params_match_channels [snd_soc_simple_card](00000002)
[  113.390655] [623] asoc-simple-card sound:        added action: /sound/hw-params-rule@0/action@0 - asoc_generic_hw_params_set_fixed_bclk_size [snd_soc_simple_card](00000050)
[  113.391991] asoc-simple-card sound: pcm5102a-hifi <-> 20203000.i2s mapping ok
--- here the speaker test ---
[  119.048124] [627]  bcm2835-i2s-pcm5102a-hifi: Trying to apply rule: /sound/hw-params-rule@0
[  119.048217] [627]  bcm2835-i2s-pcm5102a-hifi:    Running match asoc_generic_hw_params_match_sample_bits [snd_soc_simple_card](00000020)
[  119.048257] [627]  bcm2835-i2s-pcm5102a-hifi:    Running match asoc_generic_hw_params_match_rate [snd_soc_simple_card](0000bb80)
[  119.048309] [627]  bcm2835-i2s-pcm5102a-hifi:    Running match asoc_generic_hw_params_match_channels [snd_soc_simple_card](00000002)
[  119.048345] [627]  bcm2835-i2s-pcm5102a-hifi:    Running action asoc_generic_hw_params_set_fixed_bclk_size [snd_soc_simple_card](00000050)

So now in principle you can define (almost) any choice.

The way it is written this can easily get applied to any driver:

diff --git a/sound/soc/generic/simple-card.c b/sound/soc/generic/simple-card.c
index 2389ab4..6b95ea1 100644
--- a/sound/soc/generic/simple-card.c
+++ b/sound/soc/generic/simple-card.c
@@ -8,6 +8,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
+#include "hw-params-rules.h"
 #include <linux/clk.h>
 #include <linux/device.h>
 #include <linux/gpio.h>
@@ -33,6 +34,7 @@ struct simple_card_data {
        int gpio_hp_det_invert;
        int gpio_mic_det;
        int gpio_mic_det_invert;
+       struct list_head hw_params_rules;
        struct snd_soc_dai_link dai_link[];     /* dynamically allocated */
 };

@@ -51,7 +53,7 @@ static int asoc_simple_card_startup(struct snd_pcm_substream *substream)
        ret = clk_prepare_enable(dai_props->cpu_dai.clk);
        if (ret)
                return ret;
-
+
        ret = clk_prepare_enable(dai_props->codec_dai.clk);
        if (ret)
                clk_disable_unprepare(dai_props->cpu_dai.clk);
@@ -99,7 +101,9 @@ static int asoc_simple_card_hw_params(struct snd_pcm_substream *substream,
                if (ret && ret != -ENOTSUPP)
                        goto err;
        }
-       return 0;
+
+       return asoc_generic_hw_params_process_rules(
+               &priv->hw_params_rules, substream, params);
 err:
        return ret;
 }
@@ -509,7 +513,8 @@ static int asoc_simple_card_parse_of(struct device_node *node,
        if (!priv->snd_card.name)
                priv->snd_card.name = priv->snd_card.dai_link->name;

-       return 0;
+       return asoc_generic_hw_params_rules_parse_of(
+               dev, node, &priv->hw_params_rules);
 }

 /* Decrease the reference count of the device nodes */

@msperl
Copy link
Owner Author

msperl commented May 16, 2016

If you are interested look at my approach at:

@clivem
Copy link

clivem commented May 16, 2016

If you are interested look at my approach at:

Yes, more than interested, but at the moment I'm having to deal with a security breach issue, which means I won't get around to looking until this evening.

@msperl
Copy link
Owner Author

msperl commented May 16, 2016

extensions for other "matches" or "actions" can get easily get added.

I really would like to use kallsyms_lookup_name, but that comes with some implications and could get abused, so it is not included yet, but the static mapping is used.

With that one could also add those "choose clock" things of the pcm51xx into that driver and use it with simple-config...

@clivem
Copy link

clivem commented May 16, 2016

@msperl Martin, just thinking out loud, because I haven't even looked closely yet, are the match values, single values, or can be a list? Otherwise will have to have a hw-params-rule for every single sample rate at every single bit depth we want to cover. I'm thinking, something like this would result in the least lines of config needed in dts.....

        hw-params-rule@0 {
                match@0 {
                        method = "asoc_generic_hw_params_match_sample_bits";
                        value = <16>;
                };
                match@1 {
                        method = "asoc_generic_hw_params_match_rate";
                        value = <8000 16000 32000 48000 64000 96000>;
                };
                match@2 {
                        method = "asoc_generic_hw_params_match_channels";
                        value = <2>;
                };
                action@0 {
                        method = "asoc_generic_hw_params_set_fixed_bclk_size";
                        value = <50>;
                };
        };
        hw-params-rule@1 {
                match@0 {
                        method = "asoc_generic_hw_params_match_sample_bits";
                        value = <24 32>;
                };
                match@1 {
                        method = "asoc_generic_hw_params_match_rate";
                        value = <8000 16000 32000 48000 64000 96000>;
                };
                match@2 {
                        method = "asoc_generic_hw_params_match_channels";
                        value = <2>;
                };
                action@0 {
                        method = "asoc_generic_hw_params_set_fixed_bclk_size";
                        value = <100>;
                };
        };

@clivem
Copy link

clivem commented May 16, 2016

@msperl When you tested the of parsing code, did you have more than a single rule in your dt config????

msperl.txt

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

@clivem I have only implemented single value matchers for now - an array is certainly feasible...
Afair I have tested 2 rules at some point...

@clivem
Copy link

clivem commented May 17, 2016

Afair I have tested 2 rules at some point...

This doesn't look right to me. It looks like matches/actions for rules 0,1,2 and 3 are being added to the list for rule 3.


Rule: /soc/sound/hw-params-rule@3
 added match: /soc/sound/hw-params-rule@3/match@2
 added match: /soc/sound/hw-params-rule@3/match@1
 added match: /soc/sound/hw-params-rule@3/match@0
 added match: /soc/sound/hw-params-rule@2/match@2
 added match: /soc/sound/hw-params-rule@2/match@1
 added match: /soc/sound/hw-params-rule@2/match@0
 added match: /soc/sound/hw-params-rule@1/match@2
 added match: /soc/sound/hw-params-rule@1/match@1
 added match: /soc/sound/hw-params-rule@1/match@0
 added match: /soc/sound/hw-params-rule@0/match@2
 added match: /soc/sound/hw-params-rule@0/match@1
 added match: /soc/sound/hw-params-rule@0/match@0
 added action: /soc/sound/hw-params-rule@3/action@0
 added action: /soc/sound/hw-params-rule@2/action@0
 added action: /soc/sound/hw-params-rule@1/action@0
 added action: /soc/sound/hw-params-rule@0/action@0

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

i know - i guess I did not read the fine details when testing the first rule...
Working on the values thing right now and I guess that of_find_node_by_name does not stay within one parent and so I have to write my own matcher (and I had issue when i wrote that initially)...

@clivem
Copy link

clivem commented May 17, 2016

Working on the values thing right now

OK, great. I'll come back to this later today. I think if you can make those matches takes arrays of u32's rather than a single value, and the logic is that if any value in the array is matched...... and the of parsing gets a tweak, we have a winner..... The question is, will upstream take it, or is there going to be the usual sharp intake of breath, followed by a "why do you even need this".... LOL.

As far as this framework is concerned, leaving aside clock selection again for the moment.... I wonder if you can find a few mins to think about this, and how we could make this sort of need to initialise things work via the same sort of framework on top of simple-card, so something like this PR submitted yesterday could use simple-card and not require a dedicated machine/card driver.......

raspberrypi#1476 (comment)

I still have concerns that "simple" is going to end up being "complicated".... But as @pelwell said yesterday, "Yes, once again simple-card proves to be slightly too simple to be useful."

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

well - as for initialization we could in principle use the same interface - just:

sound {
  init@0 { method = "..."; values = "..."; };
};

It just depends on what is needed - but actually, as 4.6 just went out, i want to try to get those clock changes in so I would like to focus on those...

I will send that initial "patch" for hw_param hopefully today...

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

pushed an update for array - note that value changes to values for those that expect an array

@clivem
Copy link

clivem commented May 17, 2016

It just depends on what is needed - but actually, as 4.6 just went out, i want to try to get those clock changes in so I would like to focus on those...

Yes, you should do that. The 'core' clock stuff is much more important than this.

pushed an update for array - note that value changes to values for those that expect an array

Just making a coffee and then I'll pull the latest.

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

just wait until I pushed the last change to fix that bug...

[ 2117.352855] [824] asoc-simple-card sound: New simple-card: HifiBerry DAC
[ 2117.352944] [824] asoc-simple-card sound: Revert to legacy daifmt parsing
[ 2117.353077] [824] asoc-simple-card sound:    name : bcm2835-i2s-pcm5102a-hifi
[ 2117.353105] [824] asoc-simple-card sound:    format : 4001
[ 2117.353127] [824] asoc-simple-card sound:    cpu : bcm2835-i2s / 7680000
[ 2117.353162] [824] asoc-simple-card sound:    codec : pcm5102a-hifi / 0
[ 2117.353201] [824] asoc-simple-card sound:    adding Rule: /sound/hw-params-rule@0
[ 2117.353271] [824] asoc-simple-card sound:            added match: /sound/hw-params-rule@0/match@0 - asoc_generic_hw_params_match_sample_bits [snd_soc_hw_params_rules](dace4450)
[ 2117.353338] [824] asoc-simple-card sound:            added match: /sound/hw-params-rule@0/match@1 - asoc_generic_hw_params_match_rate [snd_soc_hw_params_rules](dace4910)
[ 2117.353386] [824] asoc-simple-card sound:            added match: /sound/hw-params-rule@0/match@2 - asoc_generic_hw_params_match_channels [snd_soc_hw_params_rules](dace4350)
[ 2117.353452] [824] asoc-simple-card sound:            added action: /sound/hw-params-rule@0/action@0 - asoc_generic_hw_params_set_fixed_bclk_size [snd_soc_hw_params_rules](00000050)
[ 2117.353487] [824] asoc-simple-card sound:    adding Rule: /sound/hw-params-rule@1
[ 2117.353550] [824] asoc-simple-card sound:            added match: /sound/hw-params-rule@1/match@0 - asoc_generic_hw_params_match_sample_bits [snd_soc_hw_params_rules](dacbce90)
[ 2117.353601] [824] asoc-simple-card sound:            added match: /sound/hw-params-rule@1/match@1 - asoc_generic_hw_params_match_rate [snd_soc_hw_params_rules](dacbcf10)
[ 2117.353648] [824] asoc-simple-card sound:            added match: /sound/hw-params-rule@1/match@2 - asoc_generic_hw_params_match_channels [snd_soc_hw_params_rules](dacbcc10)
[ 2117.353711] [824] asoc-simple-card sound:            added action: /sound/hw-params-rule@1/action@0 - asoc_generic_hw_params_set_fixed_bclk_size [snd_soc_hw_params_rules](00000050)
[ 2117.355056] asoc-simple-card sound: pcm5102a-hifi <-> 20203000.i2s mapping ok
[ 2119.388367] [830]  bcm2835-i2s-pcm5102a-hifi: Trying to apply rule: /sound/hw-params-rule@0
[ 2119.388454] [830]  bcm2835-i2s-pcm5102a-hifi:        Running match asoc_generic_hw_params_match_sample_bits [snd_soc_hw_params_rules](dace4450)
[ 2119.388490] [830]  bcm2835-i2s-pcm5102a-hifi:        Running match asoc_generic_hw_params_match_rate [snd_soc_hw_params_rules](dace4910)
[ 2119.388511] [830]  bcm2835-i2s-pcm5102a-hifi: Trying to apply rule: /sound/hw-params-rule@1
[ 2119.388555] [830]  bcm2835-i2s-pcm5102a-hifi:        Running match asoc_generic_hw_params_match_sample_bits [snd_soc_hw_params_rules](dacbce90)
[ 2119.388589] [830]  bcm2835-i2s-pcm5102a-hifi:        Running match asoc_generic_hw_params_match_rate [snd_soc_hw_params_rules](dacbcf10)
[ 2119.388619] [830]  bcm2835-i2s-pcm5102a-hifi:        Running match asoc_generic_hw_params_match_channels [snd_soc_hw_params_rules](dacbcc10)
[ 2119.388648] [830]  bcm2835-i2s-pcm5102a-hifi:        Running action asoc_generic_hw_params_set_fixed_bclk_size [snd_soc_hw_params_rules](00000050)

Had to apply one more fix before fixing the dt-parsing...

Pushed and enjoy...

@clivem
Copy link

clivem commented May 17, 2016

Thanks. I'll stop my current build and pull the latest.....

@clivem
Copy link

clivem commented May 17, 2016

--- sound/soc/generic/hw-params-rules.c~    2016-05-17 13:51:52.000000000 +0100
+++ sound/soc/generic/hw-params-rules.c 2016-05-17 13:53:00.704744902 +0100
@@ -40,7 +40,7 @@
 static int asoc_generic_hw_params_read_u32array(
    struct device *dev, struct device_node *node, void **data)
 {
-   int i, size, ret;
+   int size, ret;
    struct snd_soc_size_u32array *array;

    size = of_property_count_elems_of_size(node, "values", sizeof(u32));
@@ -59,11 +59,9 @@

    array->size = size;

-   for (i = 0; i < size; i++) {
-       ret = of_property_read_u32(node, "values", &array->data[i]);
-       if (ret)
-           return ret;
-   }
+   ret = of_property_read_u32_array(node, "values", &array->data[0], size);
+   if (ret)
+       return ret;

    return 0;
 }

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

yes - there is that (but no of_for_each_child_with_name...)
Will fix that.

Except for that: does it work as expected?

@clivem
Copy link

clivem commented May 17, 2016

Except for that: does it work as expected?

No, don't submit it yet. At the moment I'm dealing with a security issue, which I need to concentrate on.

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

as a note: the new audioinjector-pi-soundcard that came into the foundation kernel would have - afaik - worked with the simple-card driver....

Or is there something missing?

@pelwell
Copy link

pelwell commented May 17, 2016

See raspberrypi#1476 (comment)

@clivem
Copy link

clivem commented May 17, 2016

Deja-vu......

@pelwell
Copy link

pelwell commented May 17, 2016

More like "Pas déjà lu".

@clivem
Copy link

clivem commented May 17, 2016

@msperl Sorry, I have to go sort out a problem for my mother, so I need to cut and run without giving a good explanation of where I think we are now.... I'll explain about sorting the rules by priority and that we need a default rule with lowest priority to set the bclk back to default (or bcm2835-i2s driver controlled) after having had it set via a match.....

https://gist.github.com/clivem/ff1ac5a793524bdd066aea7f906e1bf6

Ignore the dupe rule@2 in the overlay. I screwed up the cut and paste. Don't have time to fix now.

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

actually just the "numbers" after hw_param_rule@... will define the order (but alpha-numerically, so 1, 10, 11, 19, 2, 20, 21, ... ,9, 90, 99, ...)

so if you need a lowest order rule use hw_param_rule@9999 or something like that

@msperl
Copy link
Owner Author

msperl commented May 17, 2016

anyway - typically you would use reg = ; in such situations and not priority - at least as far as I understood...

@clivem
Copy link

clivem commented May 19, 2016

@msperl Sorry Martin, another crazy day today......

Yes, I hear what you say about priority, but people (non devs) wont understand why @99 > @100.
And I thought 'reg' was used for addresses, so I wouldn't have thought of using that as a descriptor for priority....

I have combined what I have been testing with (es9023, 384k and your hw_param_rule) on top of your upstream backport rpi-next-merged-patches-not-accepted-upstream dma/i2s, on top of downstream rpi-4.4.y HEAD.....

https://github.com/DigitalDreamtimeLtd/linux/commits/rpi-4.4.y-simple

For the ES9023 boards, I'm using/combining ....
dtoverlay=simple-es9023-audio,384k,card-name="Akkordion" plus
dtoverlay=simple-bclk-int-div-50-100 or dtoverlay=simple-bclk-int-div-40-80
For (HifiBerry 1st gen 'B' DAC and Pimoroni PHAT DAC on PiZero) pcm5102a ....
dtoverlay=simple-pcm5102a-audio,card-name="snd_rpi_hifiberry_dac"

I know I'm probably going to get my ears chewed for match_noop. ;) Hope you don't have an issue with match_rate_{lt,gt,div_by}.

woshihuangzhijie pushed a commit to woshihuangzhijie/linuxcode that referenced this issue Sep 20, 2017
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
amichelotti pushed a commit to amichelotti/ubuntu-vme-xenial that referenced this issue Oct 19, 2017
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
lianhaidong pushed a commit to lianhaidong/ubuntu-xenial that referenced this issue Nov 6, 2017
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
liuqun pushed a commit to liuqun/linux-raspi2 that referenced this issue Dec 14, 2017
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
liuqun pushed a commit to liuqun/linux-raspi2 that referenced this issue Dec 20, 2017
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
liuqun pushed a commit to liuqun/linux-raspi2 that referenced this issue Dec 20, 2017
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
polceanum pushed a commit to polceanum/ubuntu-xenial that referenced this issue Feb 6, 2018
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
APokorny pushed a commit to ubports/ubuntu_kernel_xenial that referenced this issue Oct 11, 2018
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
msperl pushed a commit that referenced this issue Jan 6, 2019
[ Upstream commit c5a94f4 ]

It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue Feb 15, 2019
Yonghong Song says:

====================
The previous BTF kind_flag support patch set introduced a bug
for kernel bpffs pretty printing and another bug for bpftool
map pretty printing. If a bitfield struct member offset is
greater than 256 bits, printed value for that struct
member will be incorrect.

- Patch #1 fixed the bug in kernel bpffs pretty printing.
- Patch #2 enhanced the test_btf test case to cover the
           issue exposed by patch #1.
- Patch #3 fixed the bug in bpftool map pretty printing.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
msperl pushed a commit that referenced this issue Feb 15, 2019
Taehee Yoo says:

====================
net: bpfilter: fix two bugs in bpfilter

This patches fix two bugs in the bpfilter_umh which are related in
iptables command.

The first patch adds an exit code for UMH process.
This provides an opportunity to cleanup members of the umh_info
to modules which use the UMH.
In order to identify UMH processes, a new flag PF_UMH is added.

The second patch makes the bpfilter_umh use UMH cleanup callback.

The third patch adds re-start routine for the bpfilter_umh.
The bpfilter_umh does not re-start after error occurred.
because there is no re-start routine in the module.

The fourth patch ensures that the bpfilter.ko module will not removed while
it's being used.
The bpfilter.ko is not protected by locks or module reference counter.
Therefore that can be removed while module is being used.
In order to protect that, mutex is used.

The first and second patch are preparation patches for the third and
fourth patch.

TEST #1
   while :
   do
	modprobe bpfilter
	kill -9 <pid of the bpfilter_umh>
	iptables -vnL
   done

TEST #2
   while :
   do
	iptables -I FORWARD -m string --string ap --algo kmp &
	iptables -F &
	modprobe -rv bpfilter &
   done

TEST #3
   while :
   do
	modprobe bpfilter &
	modprobe -rv bpfilter &
   done

The TEST1 makes a failure of iptables command.
This is fixed by the third patch.

The TEST2 makes a panic because of a race condition in the bpfilter_umh
module.
This is fixed by the fourth patch.

The TEST3 makes a double-create UMH process.
This is fixed by the third and fourth patch.

v4 :
 - declare the exit_umh() as static inline
 - check stop flag in the load_umh() to avoid a double-create UMH
v3 :
 - Avoid unnecessary list lookup for non-UMH processes
 - Add a new PF_UMH flag
v2 : add the first and second patch
v1 : Initial patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
msperl pushed a commit that referenced this issue Feb 15, 2019
[ Upstream commit 721b1d9 ]

kcopyd has no upper limit to the number of jobs one can allocate and
issue. Under certain workloads this can lead to excessive memory usage
and workqueue stalls. For example, when creating multiple dm-snapshot
targets with a 4K chunk size and then writing to the origin through the
page cache. Syncing the page cache causes a large number of BIOs to be
issued to the dm-snapshot origin target, which itself issues an even
larger (because of the BIO splitting taking place) number of kcopyd
jobs.

Running the following test, from the device mapper test suite [1],

  dmtest run --suite snapshot -n many_snapshots_of_same_volume_N

, with 8 active snapshots, results in the kcopyd job slab cache growing
to 10G. Depending on the available system RAM this can lead to the OOM
killer killing user processes:

[463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP),
              nodemask=(null), order=1, oom_score_adj=0
[463.492894] kthreadd cpuset=/ mems_allowed=0
[463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3
[463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[463.492952] Call Trace:
[463.492964]  dump_stack+0x7d/0xbb
[463.492973]  dump_header+0x6b/0x2fc
[463.492987]  ? lockdep_hardirqs_on+0xee/0x190
[463.493012]  oom_kill_process+0x302/0x370
[463.493021]  out_of_memory+0x113/0x560
[463.493030]  __alloc_pages_slowpath+0xf40/0x1020
[463.493055]  __alloc_pages_nodemask+0x348/0x3c0
[463.493067]  cache_grow_begin+0x81/0x8b0
[463.493072]  ? cache_grow_begin+0x874/0x8b0
[463.493078]  fallback_alloc+0x1e4/0x280
[463.493092]  kmem_cache_alloc_node+0xd6/0x370
[463.493098]  ? copy_process.part.31+0x1c5/0x20d0
[463.493105]  copy_process.part.31+0x1c5/0x20d0
[463.493115]  ? __lock_acquire+0x3cc/0x1550
[463.493121]  ? __switch_to_asm+0x34/0x70
[463.493129]  ? kthread_create_worker_on_cpu+0x70/0x70
[463.493135]  ? finish_task_switch+0x90/0x280
[463.493165]  _do_fork+0xe0/0x6d0
[463.493191]  ? kthreadd+0x19f/0x220
[463.493233]  kernel_thread+0x25/0x30
[463.493235]  kthreadd+0x1bf/0x220
[463.493242]  ? kthread_create_on_cpu+0x90/0x90
[463.493248]  ret_from_fork+0x3a/0x50
[463.493279] Mem-Info:
[463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0
[463.493285]  active_file:80216 inactive_file:80107 isolated_file:435
[463.493285]  unevictable:0 dirty:51266 writeback:109372 unstable:0
[463.493285]  slab_reclaimable:31191 slab_unreclaimable:3483521
[463.493285]  mapped:526 shmem:4903 pagetables:1759 bounce:0
[463.493285]  free:33623 free_pcp:2392 free_cma:0
...
[463.493489] Unreclaimable slab info:
[463.493513] Name                      Used          Total
[463.493522] bio-6                   1028KB       1028KB
[463.493525] bio-5                   1028KB       1028KB
[463.493528] dm_snap_pending_exception     236783KB     243789KB
[463.493531] dm_exception              41KB         42KB
[463.493534] bio-4                   1216KB       1216KB
[463.493537] bio-3                 439396KB     439396KB
[463.493539] kcopyd_job           6973427KB    6973427KB
...
[463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child
[463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB
[463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Moreover, issuing a large number of kcopyd jobs results in kcopyd
hogging the CPU, while processing them. As a result, processing of work
items, queued for execution on the same CPU as the currently running
kcopyd thread, is stalled for long periods of time, hurting performance.
Running the aforementioned test we get, in dmesg, messages like the
following:

[67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s!
[67501.195586] Showing busy workqueues and worker pools:
[67501.195591] workqueue events: flags=0x0
[67501.195597]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195611]     pending: cache_reap
[67501.195641] workqueue mm_percpu_wq: flags=0x8
[67501.195645]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195656]     pending: vmstat_update
[67501.195682] workqueue kblockd: flags=0x18
[67501.195687]   pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256
[67501.195698]     pending: blk_timeout_work
[67501.195753] workqueue kcopyd: flags=0x8
[67501.195757]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195768]     pending: do_work [dm_mod]
[67501.195802] workqueue kcopyd: flags=0x8
[67501.195806]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195817]     pending: do_work [dm_mod]
[67501.195834] workqueue kcopyd: flags=0x8
[67501.195838]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195848]     pending: do_work [dm_mod]
[67501.195881] workqueue kcopyd: flags=0x8
[67501.195885]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195896]     pending: do_work [dm_mod]
[67501.195920] workqueue kcopyd: flags=0x8
[67501.195924]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256
[67501.195935]     in-flight: 67:do_work [dm_mod]
[67501.195945]     pending: do_work [dm_mod]
[67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765

The root cause for these issues is the way dm-snapshot uses kcopyd. In
particular, the lack of an explicit or implicit limit to the maximum
number of in-flight COW jobs. The merging path is not affected because
it implicitly limits the in-flight kcopyd jobs to one.

Fix these issues by using a semaphore to limit the maximum number of
in-flight kcopyd jobs. We grab the semaphore before allocating a new
kcopyd job in start_copy() and start_full_bio() and release it after the
job finishes in copy_callback().

The initial semaphore value is configurable through a module parameter,
to allow fine tuning the maximum number of in-flight COW jobs. Setting
this parameter to zero initializes the semaphore to INT_MAX.

A default value of 2048 maximum in-flight kcopyd jobs was chosen. This
value was decided experimentally as a trade-off between memory
consumption, stalling the kernel's workqueues and maintaining a high
enough throughput.

Re-running the aforementioned test:

  * Workqueue stalls are eliminated
  * kcopyd's job slab cache uses a maximum of 130MB
  * The time taken by the test to write to the snapshot-origin target is
    reduced from 05m20.48s to 03m26.38s

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue Feb 15, 2019
[ Upstream commit 4f4b374 ]

This is the much more correct fix for my earlier attempt at:

https://lkml.org/lkml/2018/12/10/118

Short recap:

- There's not actually a locking issue, it's just lockdep being a bit
  too eager to complain about a possible deadlock.

- Contrary to what I claimed the real problem is recursion on
  kn->count. Greg pointed me at sysfs_break_active_protection(), used
  by the scsi subsystem to allow a sysfs file to unbind itself. That
  would be a real deadlock, which isn't what's happening here. Also,
  breaking the active protection means we'd need to manually handle
  all the lifetime fun.

- With Rafael we discussed the task_work approach, which kinda works,
  but has two downsides: It's a functional change for a lockdep
  annotation issue, and it won't work for the bind file (which needs
  to get the errno from the driver load function back to userspace).

- Greg also asked why this never showed up: To hit this you need to
  unregister a 2nd driver from the unload code of your first driver. I
  guess only gpus do that. The bug has always been there, but only
  with a recent patch series did we add more locks so that lockdep
  built a chain from unbinding the snd-hda driver to the
  acpi_video_unregister call.

Full lockdep splat:

[12301.898799] ============================================
[12301.898805] WARNING: possible recursive locking detected
[12301.898811] 4.20.0-rc7+ raspberrypi#84 Not tainted
[12301.898815] --------------------------------------------
[12301.898821] bash/5297 is trying to acquire lock:
[12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80
[12301.898841] but task is already holding lock:
[12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
[12301.898856] other info that might help us debug this:
[12301.898862]  Possible unsafe locking scenario:
[12301.898867]        CPU0
[12301.898870]        ----
[12301.898874]   lock(kn->count#39);
[12301.898879]   lock(kn->count#39);
[12301.898883] *** DEADLOCK ***
[12301.898891]  May be due to missing lock nesting notation
[12301.898899] 5 locks held by bash/5297:
[12301.898903]  #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0
[12301.898915]  #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190
[12301.898925]  #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
[12301.898936]  #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240
[12301.898950]  #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40
[12301.898960] stack backtrace:
[12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ raspberrypi#84
[12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
[12301.898982] Call Trace:
[12301.898989]  dump_stack+0x67/0x9b
[12301.898997]  __lock_acquire+0x6ad/0x1410
[12301.899003]  ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899010]  ? find_held_lock+0x2d/0x90
[12301.899017]  ? mutex_spin_on_owner+0xe4/0x150
[12301.899023]  ? find_held_lock+0x2d/0x90
[12301.899030]  ? lock_acquire+0x90/0x180
[12301.899036]  lock_acquire+0x90/0x180
[12301.899042]  ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899049]  __kernfs_remove+0x296/0x310
[12301.899055]  ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899060]  ? kernfs_name_hash+0xd/0x80
[12301.899066]  ? kernfs_find_ns+0x6c/0x100
[12301.899073]  kernfs_remove_by_name_ns+0x3b/0x80
[12301.899080]  bus_remove_driver+0x92/0xa0
[12301.899085]  acpi_video_unregister+0x24/0x40
[12301.899127]  i915_driver_unload+0x42/0x130 [i915]
[12301.899160]  i915_pci_remove+0x19/0x30 [i915]
[12301.899169]  pci_device_remove+0x36/0xb0
[12301.899176]  device_release_driver_internal+0x185/0x240
[12301.899183]  unbind_store+0xaf/0x180
[12301.899189]  kernfs_fop_write+0x104/0x190
[12301.899195]  __vfs_write+0x31/0x180
[12301.899203]  ? rcu_read_lock_sched_held+0x6f/0x80
[12301.899209]  ? rcu_sync_lockdep_assert+0x29/0x50
[12301.899216]  ? __sb_start_write+0x13c/0x1a0
[12301.899221]  ? vfs_write+0x17f/0x1b0
[12301.899227]  vfs_write+0xb9/0x1b0
[12301.899233]  ksys_write+0x50/0xc0
[12301.899239]  do_syscall_64+0x4b/0x180
[12301.899247]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[12301.899253] RIP: 0033:0x7f452ac7f7a4
[12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
[12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4
[12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001
[12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730
[12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d
[12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d

Looking around I've noticed that usb and i2c already handle similar
recursion problems, where a sysfs file can unbind the same type of
sysfs somewhere else in the hierarchy. Relevant commits are:

commit 356c05d
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Mon May 14 13:30:03 2012 -0400

    sysfs: get rid of some lockdep false positives

commit e9b526f
Author: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Date:   Fri May 17 14:56:35 2013 +0200

    i2c: suppress lockdep warning on delete_device

Implement the same trick for driver bind/unbind.

v2: Put the macro into bus.c (Greg).

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue Feb 15, 2019
commit aff9cf5 upstream.

We were experiencing a crash similar to the one reported as part of
commit:a5ba1d95e46e ("uart: fix race between uart_put_char() and
uart_shutdown()") in our testbed as well. We continue to observe the same
crash after integrating the commit a5ba1d9 ("uart: fix race between
uart_put_char() and uart_shutdown()")

On reviewing the change, the port lock should be taken prior to checking for
if (!circ->buf) in fn. __uart_put_char and other fns. that update the buffer
uart_state->xmit.

Traceback:

[11/27/2018 06:24:32.4870] Unable to handle kernel NULL pointer dereference
                           at virtual address 0000003b

[11/27/2018 06:24:32.4950] PC is at memcpy+0x48/0x180
[11/27/2018 06:24:32.4950] LR is at uart_write+0x74/0x120
[11/27/2018 06:24:32.4950] pc : [<ffffffc0002e6808>]
                           lr : [<ffffffc0003747cc>] pstate: 000001c5
[11/27/2018 06:24:32.4950] sp : ffffffc076433d30
[11/27/2018 06:24:32.4950] x29: ffffffc076433d30 x28: 0000000000000140
[11/27/2018 06:24:32.4950] x27: ffffffc0009b9d5e x26: ffffffc07ce36580
[11/27/2018 06:24:32.4950] x25: 0000000000000000 x24: 0000000000000140
[11/27/2018 06:24:32.4950] x23: ffffffc000891200 x22: ffffffc01fc34000
[11/27/2018 06:24:32.4950] x21: 0000000000000fff x20: 0000000000000076
[11/27/2018 06:24:32.4950] x19: 0000000000000076 x18: 0000000000000000
[11/27/2018 06:24:32.4950] x17: 000000000047cf08 x16: ffffffc000099e68
[11/27/2018 06:24:32.4950] x15: 0000000000000018 x14: 776d726966205948
[11/27/2018 06:24:32.4950] x13: 50203a6c6974755f x12: 74647075205d3333
[11/27/2018 06:24:32.4950] x11: 3a35323a36203831 x10: 30322f37322f3131
[11/27/2018 06:24:32.4950] x9 : 5b205d303638342e x8 : 746164206f742070
[11/27/2018 06:24:32.4950] x7 : 7520736920657261 x6 : 000000000000003b
[11/27/2018 06:24:32.4950] x5 : 000000000000817a x4 : 0000000000000008
[11/27/2018 06:24:32.4950] x3 : 2f37322f31312a5b x2 : 000000000000006e
[11/27/2018 06:24:32.4950] x1 : ffffffc0009b9cf0 x0 : 000000000000003b

[11/27/2018 06:24:32.4950] CPU2: stopping
[11/27/2018 06:24:32.4950] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P      D    O    4.1.51 #3
[11/27/2018 06:24:32.4950] Hardware name: Broadcom-v8A (DT)
[11/27/2018 06:24:32.4950] Call trace:
[11/27/2018 06:24:32.4950] [<ffffffc0000883b8>] dump_backtrace+0x0/0x150
[11/27/2018 06:24:32.4950] [<ffffffc00008851c>] show_stack+0x14/0x20
[11/27/2018 06:24:32.4950] [<ffffffc0005ee810>] dump_stack+0x90/0xb0
[11/27/2018 06:24:32.4950] [<ffffffc00008e844>] handle_IPI+0x18c/0x1a0
[11/27/2018 06:24:32.4950] [<ffffffc000080c68>] gic_handle_irq+0x88/0x90

Fixes: a5ba1d9 ("uart: fix race between uart_put_char() and uart_shutdown()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Samir Virmani <samir@embedur.com>
Acked-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@msperl msperl closed this as completed May 11, 2019
msperl pushed a commit that referenced this issue May 11, 2019
commit 9f0bbf3 upstream.

Because there may be random garbage beyond a string's null terminator,
it's not correct to copy the the complete character array for use as a
hist trigger key.  This results in multiple histogram entries for the
'same' string key.

So, in the case of a string key, use strncpy instead of memcpy to
avoid copying in the extra bytes.

Before, using the gdbus entries in the following hist trigger as an
example:

  # echo 'hist:key=comm' > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
  # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist

  ...

  { comm: ImgDecoder #4                      } hitcount:        203
  { comm: gmain                              } hitcount:        213
  { comm: gmain                              } hitcount:        216
  { comm: StreamTrans raspberrypi#73                    } hitcount:        221
  { comm: mozStorage #3                      } hitcount:        230
  { comm: gdbus                              } hitcount:        233
  { comm: StyleThread#5                      } hitcount:        253
  { comm: gdbus                              } hitcount:        256
  { comm: gdbus                              } hitcount:        260
  { comm: StyleThread#4                      } hitcount:        271

  ...

  # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
  51

After:

  # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
  1

Link: http://lkml.kernel.org/r/50c35ae1267d64eee975b8125e151e600071d4dc.1549309756.git.tom.zanussi@linux.intel.com

Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 79e577c ("tracing: Support string type key properly")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
msperl pushed a commit that referenced this issue May 11, 2019
commit 9951379 upstream.

Some users see panics like the following when performing fstrim on a
bcached volume:

[  529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  530.183928] #PF error: [normal kernel read fault]
[  530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0
[  530.750887] Oops: 0000 [#1] SMP PTI
[  530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3
[  531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[  531.693137] RIP: 0010:blk_queue_split+0x148/0x620
[  531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48
8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3
[  532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246
[  533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000
[  533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000
[  533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000
[  534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000
[  534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020
[  534.833319] FS:  00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000
[  535.224098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0
[  535.851759] Call Trace:
[  535.970308]  ? mempool_alloc_slab+0x15/0x20
[  536.174152]  ? bch_data_insert+0x42/0xd0 [bcache]
[  536.403399]  blk_mq_make_request+0x97/0x4f0
[  536.607036]  generic_make_request+0x1e2/0x410
[  536.819164]  submit_bio+0x73/0x150
[  536.980168]  ? submit_bio+0x73/0x150
[  537.149731]  ? bio_associate_blkg_from_css+0x3b/0x60
[  537.391595]  ? _cond_resched+0x1a/0x50
[  537.573774]  submit_bio_wait+0x59/0x90
[  537.756105]  blkdev_issue_discard+0x80/0xd0
[  537.959590]  ext4_trim_fs+0x4a9/0x9e0
[  538.137636]  ? ext4_trim_fs+0x4a9/0x9e0
[  538.324087]  ext4_ioctl+0xea4/0x1530
[  538.497712]  ? _copy_to_user+0x2a/0x40
[  538.679632]  do_vfs_ioctl+0xa6/0x600
[  538.853127]  ? __do_sys_newfstat+0x44/0x70
[  539.051951]  ksys_ioctl+0x6d/0x80
[  539.212785]  __x64_sys_ioctl+0x1a/0x20
[  539.394918]  do_syscall_64+0x5a/0x110
[  539.568674]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

We have observed it where both:
1) LVM/devmapper is involved (bcache backing device is LVM volume) and
2) writeback cache is involved (bcache cache_mode is writeback)

On one machine, we can reliably reproduce it with:

 # echo writeback > /sys/block/bcache0/bcache/cache_mode
   (not sure whether above line is required)
 # mount /dev/bcache0 /test
 # for i in {0..10}; do
	file="$(mktemp /test/zero.XXX)"
	dd if=/dev/zero of="$file" bs=1M count=256
	sync
	rm $file
    done
  # fstrim -v /test

Observing this with tracepoints on, we see the following writes:

fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4260112 + 196352 hit 0 bypass 1
fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4456464 + 262144 hit 0 bypass 1
fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4718608 + 81920 hit 0 bypass 1
fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5324816 + 180224 hit 0 bypass 1
fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5505040 + 262144 hit 0 bypass 1
fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5767184 + 81920 hit 0 bypass 1
fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 6373392 + 180224 hit 1 bypass 0
<crash>

Note the final one has different hit/bypass flags.

This is because in should_writeback(), we were hitting a case where
the partial stripe condition was returning true and so
should_writeback() was returning true early.

If that hadn't been the case, it would have hit the would_skip test, and
as would_skip == s->iop.bypass == true, should_writeback() would have
returned false.

Looking at the git history from 'commit 72c2706 ("bcache: Write out
full stripes")', it looks like the idea was to optimise for raid5/6:

       * If a stripe is already dirty, force writes to that stripe to
	 writeback mode - to help build up full stripes of dirty data

To fix this issue, make sure that should_writeback() on a discard op
never returns true.

More details of debugging:
https://www.spinics.net/lists/linux-bcache/msg06996.html

Previous reports:
 - https://bugzilla.kernel.org/show_bug.cgi?id=201051
 - https://bugzilla.kernel.org/show_bug.cgi?id=196103
 - https://www.spinics.net/lists/linux-bcache/msg06885.html

(Coly Li: minor modification to follow maximum 75 chars per line rule)

Cc: Kent Overstreet <koverstreet@google.com>
Cc: stable@vger.kernel.org
Fixes: 72c2706 ("bcache: Write out full stripes")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
msperl pushed a commit that referenced this issue May 11, 2019
…FCP devices

commit 242ec14 upstream.

Suppose more than one non-NPIV FCP device is active on the same channel.
Send I/O to storage and have some of the pending I/O run into a SCSI
command timeout, e.g. due to bit errors on the fibre. Now the error
situation stops. However, we saw FCP requests continue to timeout in the
channel. The abort will be successful, but the subsequent TUR fails.
Scsi_eh starts. The LUN reset fails. The target reset fails.  The host
reset only did an FCP device recovery. However, for non-NPIV FCP devices,
this does not close and reopen ports on the SAN-side if other non-NPIV FCP
device(s) share the same open ports.

In order to resolve the continuing FCP request timeouts, we need to
explicitly close and reopen ports on the SAN-side.

This was missing since the beginning of zfcp in v2.6.0 history commit
ea127f9 ("[PATCH] s390 (7/7): zfcp host adapter.").

Note: The FSF requests for forced port reopen could run into FSF request
timeouts due to other reasons. This would trigger an internal FCP device
recovery. Pending forced port reopen recoveries would get dismissed. So
some ports might not get fully reopened during this host reset handler.
However, subsequent I/O would trigger the above described escalation and
eventually all ports would be forced reopen to resolve any continuing FCP
request timeouts due to earlier bit errors.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #3.0+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
msperl pushed a commit that referenced this issue May 11, 2019
commit 23da958 upstream.

Syzkaller reports:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599
Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91
RSP: 0018:ffff8881d828f238 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267
RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178
RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259
R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4
R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000
FS:  00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629
 get_subdir fs/proc/proc_sysctl.c:1022 [inline]
 __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
 br_netfilter_init+0xbc/0x1000 [br_netfilter]
 do_one_initcall+0xfa/0x5ca init/main.c:887
 do_init_module+0x204/0x5f6 kernel/module.c:3460
 load_module+0x66b2/0x8570 kernel/module.c:3808
 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle
 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73]
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 770020de38961fd0 ]---

A new dir entry can be created in get_subdir and its 'header->parent' is
set to NULL.  Only after insert_header success, it will be set to 'dir',
otherwise 'header->parent' is set to NULL and drop_sysctl_table is called.
However in err handling path of get_subdir, drop_sysctl_table also be
called on 'new->header' regardless its value of parent pointer.  Then
put_links is called, which triggers NULL-ptr deref when access member of
header->parent.

In fact we have multiple error paths which call drop_sysctl_table() there,
upon failure on insert_links() we also call drop_sysctl_table().And even
in the successful case on __register_sysctl_table() we still always call
drop_sysctl_table().This patch fix it.

Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com
Fixes: 0e47c99 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>    [3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
msperl pushed a commit that referenced this issue May 11, 2019
[ Upstream commit aadcef6 ]

As Jiqun Li reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=202883

sometimes, dead lock when make system call SYS_getdents64 with fsync() is
called by another process.

monkey running on android9.0

1.  task 9785 held sbi->cp_rwsem and waiting lock_page()
2.  task 10349 held mm_sem and waiting sbi->cp_rwsem
3. task 9709 held lock_page() and waiting mm_sem

so this is a dead lock scenario.

task stack is show by crash tools as following

crash_arm64> bt ffffffc03c354080
PID: 9785   TASK: ffffffc03c354080  CPU: 1   COMMAND: "RxIoScheduler-3"
>> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8

crash-arm64> bt 10349
PID: 10349  TASK: ffffffc018b83080  CPU: 1   COMMAND: "BUGLY_ASYNC_UPL"
>> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
     PC: 00000033  LR: 00000000  SP: 00000000  PSTATE: ffffffffffffffff

crash-arm64> bt 9709
PID: 9709   TASK: ffffffc03e7f3080  CPU: 1   COMMAND: "IntentService[A"
>> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
>> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
     PC: ffffff8008274114  [compat_filldir64+120]
     LR: ffffff80083584d4  [f2fs_fill_dentries+448]
     SP: ffffffc001e67b80  PSTATE: 80400145
    X29: ffffffc001e67b80  X28: 0000000000000000  X27: 000000000000001a
    X26: 00000000000093d7  X25: ffffffc070d52480  X24: 0000000000000008
    X23: 0000000000000028  X22: 00000000d43dfd60  X21: ffffffc001e67e90
    X20: 0000000000000011  X19: ffffff80093a4000  X18: 0000000000000000
    X17: 0000000000000000  X16: 0000000000000000  X15: 0000000000000000
    X14: ffffffffffffffff  X13: 0000000000000008  X12: 0101010101010101
    X11: 7f7f7f7f7f7f7f7f  X10: 6a6a6a6a6a6a6a6a   X9: 7f7f7f7f7f7f7f7f
     X8: 0000000080808000   X7: ffffff800827409c   X6: 0000000080808000
     X5: 0000000000000008   X4: 00000000000093d7   X3: 000000000000001a
     X2: 0000000000000011   X1: ffffffc070d52480   X0: 0000000000800238
>> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
     PC: 0000003c  LR: 00000000  SP: 00000000  PSTATE: 000000d9
    X12: f48a02ff X11: d4678960 X10: d43dfc00  X9: d4678ae4
     X8: 00000058  X7: d4678994  X6: d43de800  X5: 000000d9
     X4: d43dfc0c  X3: d43dfc10  X2: d46799c8  X1: 00000000
     X0: 00001068

Below potential deadlock will happen between three threads:
Thread A		Thread B		Thread C
- f2fs_do_sync_file
 - f2fs_write_checkpoint
  - down_write(&sbi->node_change) -- 1)
			- do_page_fault
			 - down_write(&mm->mmap_sem) -- 2)
			  - do_wp_page
			   - f2fs_vm_page_mkwrite
						- getdents64
						 - f2fs_read_inline_dir
						  - lock_page -- 3)
  - f2fs_sync_node_pages
   - lock_page -- 3)
			    - __do_map_lock
			     - down_read(&sbi->node_change) -- 1)
						  - f2fs_fill_dentries
						   - dir_emit
						    - compat_filldir64
						     - do_page_fault
						      - down_read(&mm->mmap_sem) -- 2)

Since f2fs_readdir is protected by inode.i_rwsem, there should not be
any updates in inode page, we're safe to lookup dents in inode page
without its lock held, so taking off the lock to improve concurrency
of readdir and avoid potential deadlock.

Reported-by: Jiqun Li <jiqun.li@unisoc.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
…_map

[ Upstream commit 39df730 ]

Detected via gcc's ASan:

  Direct leak of 2048 byte(s) in 64 object(s) allocated from:
    6     #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370)
    7     #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43
    8     #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85
    9     #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250
   10     #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382
   11     #5 0x556b0f0e3231 in print_events util/parse-events.c:2514
   12     #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
   13     #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
   14     #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
   15     #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
   16     #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520
   17     #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 8989605 ("perf tools: Do not put a variable sized type not at the end of a struct")
Link: http://lkml.kernel.org/r/20190316080556.3075-3-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
[ Upstream commit 54569ba ]

Detected with gcc's ASan:

  Direct leak of 66 byte(s) in 5 object(s) allocated from:
      #0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x560c8761034d in collect_config util/config.c:597
      #2 0x560c8760d9cb in get_value util/config.c:169
      #3 0x560c8760dfd7 in perf_parse_file util/config.c:285
      #4 0x560c8760e0d2 in perf_config_from_file util/config.c:476
      #5 0x560c876108fd in perf_config_set__init util/config.c:661
      #6 0x560c87610c72 in perf_config_set__new util/config.c:709
      #7 0x560c87610d2f in perf_config__init util/config.c:718
      #8 0x560c87610e5d in perf_config util/config.c:730
      #9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442
      #10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Fixes: 20105ca ("perf config: Introduce perf_config_set class")
Link: http://lkml.kernel.org/r/20190316080556.3075-6-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
[ Upstream commit 8bde851 ]

Detected with gcc's ASan:

  Direct leak of 4356 byte(s) in 120 object(s) allocated from:
      #0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215
      #2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339
      #3 0x55719af66272 in print_events util/parse-events.c:2542
      #4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
      #5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520
      #9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 40218da ("perf list: Show SDT and pre-cached events")
Link: http://lkml.kernel.org/r/20190316080556.3075-7-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
[ Upstream commit 42dfa45 ]

Using gcc's ASan, Changbin reports:

  =================================================================
  ==7494==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 48 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e5330a5e in zalloc util/util.h:23
      #2 0x5625e5330a9b in perf_counts__new util/counts.c:10
      #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #7 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #10 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 72 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e532560d in zalloc util/util.h:23
      #2 0x5625e532566b in xyarray__new util/xyarray.c:10
      #3 0x5625e5330aba in perf_counts__new util/counts.c:15
      #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #8 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #11 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.

Reported-by: Changbin Du <changbin.du@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
…_event_on_all_cpus test

[ Upstream commit 93faa52 ]

  =================================================================
  ==7497==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45
      #2 0x5625e5326703 in cpu_map__read util/cpumap.c:103
      #3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120
      #4 0x5625e5326915 in cpu_map__new util/cpumap.c:135
      #5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36
      #6 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #7 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #9 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object")
Link: http://lkml.kernel.org/r/20190316080556.3075-15-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
[ Upstream commit f97a899 ]

  =================================================================
  ==7506==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 13 byte(s) in 3 object(s) allocated from:
      #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x5625e53aaef0 in expr__find_other util/expr.y:221
      #2 0x5625e51bcd3f in test__expr tests/expr.c:52
      #3 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #4 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #6 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON")
Link: http://lkml.kernel.org/r/20190316080556.3075-16-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
[ Upstream commit d982b33 ]

  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x55bd50005599 in zalloc util/util.h:23
      #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
msperl pushed a commit that referenced this issue May 11, 2019
commit 08b7c2f upstream.

If `vmk80xx_auto_attach()` returns an error, the core comedi module code
will call `vmk80xx_detach()` to clean up.  If `vmk80xx_auto_attach()`
successfully allocated the comedi device private data,
`vmk80xx_detach()` assumes that a `struct semaphore limit_sem` contained
in the private data has been initialized and uses it.  Unfortunately,
there are a couple of places where `vmk80xx_auto_attach()` can return an
error after allocating the device private data but before initializing
the semaphore, so this assumption is invalid.  Fix it by initializing
the semaphore just after allocating the private data in
`vmk80xx_auto_attach()` before any other errors can be returned.

I believe this was the cause of the following syzbot crash report
<https://syzkaller.appspot.com/bug?extid=54c2f58f15fe6876b6ad>:

usb 1-1: config 0 has no interface number 0
usb 1-1: New USB device found, idVendor=10cf, idProduct=8068, bcdDevice=e6.8d
usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-1: config 0 descriptor??
vmk80xx 1-1:0.117: driver 'vmk80xx' failed to auto-configure device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xe8/0x16e lib/dump_stack.c:113
 assign_lock_key kernel/locking/lockdep.c:786 [inline]
 register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
 __lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
 lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x44/0x60 kernel/locking/spinlock.c:152
 down+0x12/0x80 kernel/locking/semaphore.c:58
 vmk80xx_detach+0x59/0x100 drivers/staging/comedi/drivers/vmk80xx.c:829
 comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204
 comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156
 comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline]
 comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190
 comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline]
 comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880
 comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068
 usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
 really_probe+0x2da/0xb10 drivers/base/dd.c:509
 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
 __device_attach+0x223/0x3a0 drivers/base/dd.c:844
 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
 device_add+0xad2/0x16e0 drivers/base/core.c:2106
 usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
 generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
 usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
 really_probe+0x2da/0xb10 drivers/base/dd.c:509
 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
 __device_attach+0x223/0x3a0 drivers/base/dd.c:844
 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
 device_add+0xad2/0x16e0 drivers/base/core.c:2106
 usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
 hub_port_connect drivers/usb/core/hub.c:5089 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
 port_event drivers/usb/core/hub.c:5350 [inline]
 hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
 process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
 worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
 kthread+0x313/0x420 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Reported-by: syzbot+54c2f58f15fe6876b6ad@syzkaller.appspotmail.com
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
msperl pushed a commit that referenced this issue May 11, 2019
commit 660cf4c upstream.

If `ni6501_auto_attach()` returns an error, the core comedi module code
will call `ni6501_detach()` to clean up.  If `ni6501_auto_attach()`
successfully allocated the comedi device private data, `ni6501_detach()`
assumes that a `struct mutex mut` contained in the private data has been
initialized and uses it.  Unfortunately, there are a couple of places
where `ni6501_auto_attach()` can return an error after allocating the
device private data but before initializing the mutex, so this
assumption is invalid.  Fix it by initializing the mutex just after
allocating the private data in `ni6501_auto_attach()` before any other
errors can be retturned.  Also move the call to `usb_set_intfdata()`
just to keep the code a bit neater (either position for the call is
fine).

I believe this was the cause of the following syzbot crash report
<https://syzkaller.appspot.com/bug?extid=cf4f2b6c24aff0a3edf6>:

usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-1: config 0 descriptor??
usb 1-1: string descriptor 0 read error: -71
comedi comedi0: Wrong number of endpoints
ni6501 1-1:0.233: driver 'ni6501' failed to auto-configure device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 585 Comm: kworker/0:3 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xe8/0x16e lib/dump_stack.c:113
 assign_lock_key kernel/locking/lockdep.c:786 [inline]
 register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
 __lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
 lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
 __mutex_lock_common kernel/locking/mutex.c:925 [inline]
 __mutex_lock+0xfe/0x12b0 kernel/locking/mutex.c:1072
 ni6501_detach+0x5b/0x110 drivers/staging/comedi/drivers/ni_usb6501.c:567
 comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204
 comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156
 comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline]
 comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190
 comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline]
 comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880
 comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068
 usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
 really_probe+0x2da/0xb10 drivers/base/dd.c:509
 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
 __device_attach+0x223/0x3a0 drivers/base/dd.c:844
 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
 device_add+0xad2/0x16e0 drivers/base/core.c:2106
 usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
 generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
 usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
 really_probe+0x2da/0xb10 drivers/base/dd.c:509
 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
 __device_attach+0x223/0x3a0 drivers/base/dd.c:844
 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
 device_add+0xad2/0x16e0 drivers/base/core.c:2106
 usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
 hub_port_connect drivers/usb/core/hub.c:5089 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
 port_event drivers/usb/core/hub.c:5350 [inline]
 hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
 process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
 worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
 kthread+0x313/0x420 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Reported-by: syzbot+cf4f2b6c24aff0a3edf6@syzkaller.appspotmail.com
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hisenyiu2015 pushed a commit to hisenyiu2015/msm-4.14 that referenced this issue May 20, 2021
Add '#sound-dai-cells = <0>;' to the I2S definition in
bcm2708_common.dtsi

Not having it specified, whilst not causing an issue right now with
rpi-4.4.y, is going to cause an issue going forward with the use of
simple-card driver. So it doesn't fall through the cracks, patch it
in now.

Hopefully Martin has taken care of getting a patch submitted for the
upstream Pi dts, as it was he who first run into the issue with the
current upstream kernel....
msperl/linux-rpi#3 (comment)

Signed-off-by: DigitalDreamtime <clive.messer@digitaldreamtime.co.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants