-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BCM270X_DT: mz61581: Revert to spi-bcm2708 #1077
Conversation
When an init sequence is present in the Device Tree, fbtft_init_display_dt() is used to initialize the display. Add missing reset function call and activation of chip select for parallel bus. Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
The MZ61581 display does not work with spi-bcm2835 and software chip select. It works before the commit: spi: bcm2835: transform native-cs to gpio-cs on first spi_setup Revert to spi-bcm2708 until the cause has been detected and the issue resolved. Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Okay with me. @pelwell ? Ideally we would like to drop spi-bcm2708 (and other drivers that have an upstream alternative that appears fully functional), but I guess that won't happen for a while (perhaps rpi-4.2.y kernel tree?) |
BCM270X_DT: mz61581: Revert to spi-bcm2708
I can inform the manufacturer of the display that spi-bcm2708 probably will be removed the next time the Linux version is bumped on the official Pi kernel. That will give them some time to find out why it doesn't work with software controlled chip select. |
From what I remember, the upstream spi-bcm2835 doesn't have the auto-conversion patch. If the default is to always use cs-gpio in the dtb, and I'm not saying it should be, then there will need to be a way to defeat that with an overlay. |
@msperl Can spi-bcm2835 still be used with hw chip selects? Also with DMA? |
As explained there are situations when the CS gets asserted/de-asserted for short periods (a few usec) of time when the control register is written to. Does the issue also happen when the dt already sets up pins 8 and 7 correctly as output and the spi driver node contains "CS-gpios = <&gpio 8>, <&gpio 7>;"? |
Yes. I tried that when I was trying to understand the spi cs gpio code. |
If I had such a device then I could look into the why comparing the behavior on the bus using a logic-analyzer... Maybe it is related to a pull-up issue in the hw? The only difference I have seen is that the Native gpio has slightly less time between CS and clock-start. So nothing else comes to my mind that could explain this except maybe a different interpretation of a spi-transfer sequence with regards to chip-selects in the framework compared to the 2708 driver. Another thing: running at slower clocks solves the issue by chance? |
Out of curiosity: why not use a "downgrade to spi-bcm2708" device-tree overlay for this specific device? But obviously we also need to understand the issue and figure out why it does not work... |
Have you read this Pull Request? 8-) |
Ok, that's a place to start, adding a small delay after the gpio set in spi_set_cs().
No it doesn't.
This PR does that :-) |
As said: if you can have a look at the clocks and CS with a logic-analyzer then look into what is the difference on the bus-level (I can do this at up to 200MHz sample-rates if someone borrows me that device)... But as said: I actually guess that the implementation behavior is slightly different for the drivers with regards to cs_change and the corresponding delays (actually between spi-bcm2835 and the spi-framework standard implementation) |
One thing: is there any data that you read back from the device or can I also run this "blind" (similar to the st7739r I own). |
None of the FBTFT drivers rely on anything read back. I think that only one driver tries to read the device code if possible. |
One idea: maybe the switching the gpio 7/8 to output does not remove the internal pull-up which would shift the low level up higher than with the alt2 mode (which would also control the gpio pull-ups I assume) - the cpld may run 1.8v internally but may be 3v3 tolerant, so with low guaranteed to say 0.2 vcc that would mean 0.3v as low limit and with the pull-up we may not get to that level. So the test could be:
If it works that way then we know it is the pull-ups and you got your way around it (and then I would be actually be surprised that the gpio code allows to change to output without disabling pull-ups). |
Neither the DT change, nor adding delay around CS helped. Also tried going down to 4MHz. For reference, this is what I tried (tested ok on a rpi-display). diff --git a/arch/arm/boot/dts/bcm2708-rpi-b-plus.dts b/arch/arm/boot/dts/bcm2708-rpi-b-plus.dts
index 3ad2e0d..8faea0c 100644
--- a/arch/arm/boot/dts/bcm2708-rpi-b-plus.dts
+++ b/arch/arm/boot/dts/bcm2708-rpi-b-plus.dts
@@ -10,7 +10,8 @@
&gpio {
spi0_pins: spi0_pins {
brcm,pins = <7 8 9 10 11>;
- brcm,function = <4>; /* alt0 */
+ brcm,function = <1 1 4 4 4>; /* out out alt0 alt0 alt0 */
+ brcm,pull = <0 0 1 1 1>; /* none none down down down */
};
i2c0_pins: i2c0 {
@@ -46,6 +47,8 @@
pinctrl-names = "default";
pinctrl-0 = <&spi0_pins>;
+ cs-gpios = <&gpio 8 1>, <&gpio 7 1>;
+
spidev@0{
compatible = "spidev";
reg = <0>; /* CE0 */
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 50910d8..7d579a0 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -464,8 +464,11 @@ static void spi_set_cs(struct spi_device *spi, bool enable)
if (spi->mode & SPI_CS_HIGH)
enable = !enable;
- if (spi->cs_gpio >= 0)
+ if (spi->cs_gpio >= 0) {
+udelay(10);
gpio_set_value(spi->cs_gpio, !enable);
+udelay(10);
+ }
else if (spi->master->set_cs)
spi->master->set_cs(spi, !enable);
} Thanks for the suggestions. |
I would try to figure it out... |
I will share my findings with you and you can compare the measurements yourself as well... |
kernel: BCM2835_V4L2: Add support for V4L2_EXPOSURE_METERING_MATRIX See: raspberrypi/linux#1068 kernel: dmaengine: bcm2708-dmaengine: Fix memory leak when stopping a running transfer See: raspberrypi/linux#1072 kernel: BCM270X_DT: mz61581: Revert to spi-bcm2708 See: raspberrypi/linux#1077 kernel: bcm2708/2835-i2s: Fix for PCM register ranges in device trees See: raspberrypi/linux#1079 kernel: bcm2835-sdhost: Add the ERASE capability See: raspberrypi/linux#1076 kernel: bcm2835-sdhost: Ignore CRC7 for MMC CMD1 kernel: BCM270X_DT: Add unit address to gpio node name kernel: spi-bcm2835: merge upstream patches allowing DMA transfers See: raspberrypi/linux#1085 kernel: BCM270X_DT: Use i2c_arm for rtc and bmp085 overlays kernel: BCM2708_DT: CM dtparams for audio, watchdog and RNG firmware: video_decode: Don't wait for a valid timestamp to output frames See: raspberrypi/firmware#451
kernel: BCM2835_V4L2: Add support for V4L2_EXPOSURE_METERING_MATRIX See: raspberrypi/linux#1068 kernel: dmaengine: bcm2708-dmaengine: Fix memory leak when stopping a running transfer See: raspberrypi/linux#1072 kernel: BCM270X_DT: mz61581: Revert to spi-bcm2708 See: raspberrypi/linux#1077 kernel: bcm2708/2835-i2s: Fix for PCM register ranges in device trees See: raspberrypi/linux#1079 kernel: bcm2835-sdhost: Add the ERASE capability See: raspberrypi/linux#1076 kernel: bcm2835-sdhost: Ignore CRC7 for MMC CMD1 kernel: BCM270X_DT: Add unit address to gpio node name kernel: spi-bcm2835: merge upstream patches allowing DMA transfers See: raspberrypi/linux#1085 kernel: BCM270X_DT: Use i2c_arm for rtc and bmp085 overlays kernel: BCM2708_DT: CM dtparams for audio, watchdog and RNG firmware: video_decode: Don't wait for a valid timestamp to output frames See: #451
Now that I run the logic-analyzer I see that you use:
And then the transfer on this changes the initial settings... Would it be possible to set the first transfer to a dummy command and see if it works then? I would guess that just running:
would also do the trick... If it works then, then it is actually a bug in the spi-framework... (in that case I will explain why later - essentially something that has bitten me/us with CS-polarity before) |
Please make sure that you only test with a single spi-device! |
Reloading the module revealed a bug in fbtft which results in using a pointer to free'd memory. I thought I had fixed that issue. I need to build a kernel. |
To summarize the issue - as I see it: That is why I was asking if setting up a single dummy transfer solves the issue. Why does this not happen with native-CS? The reason is that in that case the polarities and chip-selects are set at the same time when writing the CS register. As for a fix it will require a patch, that moves the setup of polarity to master->prepare_master to set polarity prior to setting gpio-CS. |
@notro: sent you a (yet unverified) patch that should solve the polarity issue by moving the polarity settings to run before the chip-select not after. The patch applies cleanly against upstream (4.2-rc3), but there may be a reject if you apply it against the foundation kernel because of handle_err does not exist in 4.0 |
Sent the patch upstream - when it gets accepted I will create a merge request |
Spot on Martin! Your patch solved the problem as did reloading the driver (I had to apply the patch by hand on rpi-4.0.y). I tried enabling DMA, but I only got polling transfers. When I disabled polling, I got DMA transfers for pixel data.
Ref: https://github.com/notro/fbtft/wiki/Performance#mz61581-pi-ext |
Just a note: If you try the performance output on any other than rpi-4.0.y, you need to use debug=35, because of a fbtft reset patch recently applied there. If not you get a white display. |
I can try to run the same performance tests as you to see where/when we waste time with the logic-analyzer (especially the gaps)... One note: please try to use a bigger DMA buffer than 4k - 32k for example should work and compare it then... (I guess the spi framework mapping of pages is quite expensive and then dmaengine on top - the original dma driver did not have that but did everything on its own) |
As a note: when cleaning you run 51.3us between each 4k transfer with the default settings and 29.3us are spent with cs high. so there is still some interrupts (double maybe?) that run unneccesarily and stall the flow... (see my comment to your upstream patch) And we run 0.358s for a full screen update at 8MHz. |
32k still gives me polling transfers. unsigned long overflows in the equation:
Removing polling, 32k gives comparable values:
fps=50:
The fps parameter is a misnomer but fps=50 translates to a HZ/fps=2 jiffies delay between updates. What upstream patch are you referring to? |
So we need to fix the calculation as well (at least make xfer_time_us long long) |
Here the calculation patch (spaces instead of tabs, so does not apply cleanly)
That should fix the polling issue... |
set out a patch for this for upstream as well... |
I'm not sure xfer_time_us needs to be long long. 32-bit unsigned can handle over an hour. |
the bigger problem is that if I cast and cast back then we get issues with the 64/64 bit divide is not in eapi... |
@notro: as for speed: there does not seem to be a difference between 120MHz and 25MHz - at least from your dmesg reporting:
I will have a look at the output lines with the logic-analyzer to see where the "delay" can get found, |
The fps numbers are used confusingly in fbtft. It first started out as a way to say how often the display should be updated. So the fps DT property and module argument is reflected in the fps value reported when the driver loads: fps=33. This is the default value. Later when displays with bigger resolutions came along, fbtft couldn't always keep up with the value asked for, so I added a debug option to measure the real fps. This fps value is printed to the log for each frame sent to the display when debug=32, or for this particular display debug=35 has to be used because of a bug (fixed in rpi-4.0.y) in fbtft. https://github.com/notro/fbtft/wiki/FPS was written to try and clear up this confusion, but this has not been done in the code. |
The fps=33 here is not the measured frame rate. It's a misnomer that really means: we have a delay of HZ/fps=100/33=3 jiffies from the time of the first page touched in videomem until the update kernel thread is woken. I tried fps=100 (1 jiffy delay) but got the same measured 25 fps as with fps=50 (2 jiffies). This is how fbtft works:
Now SPI takes the buffer, allocates a rx buffer, dma maps buffers and sends them to dma engine. Looking at your numbers I wonder why the overhead goes from 259us to 379us when the speed is doubled. Shouldn't these numbers be more or less the same? Ideally, fbtft should have dynamically switched to a continous update mode when it detects that all of videomem is updated several times in a row. This could have shaved off the 2 jiffies delay. I tried increasing txbuflen to 64k, but that gave me some USB errors, and a lousy measured framerate:
Really nice to have access to an analyzer like this. Much easier to make descisions with access to hard facts. I have returned to working on fbtft now, especially the io path, so this is a helpful walkthrough for me. |
@notro: Why does it vary? I have only taken a single example to show the "situation" and did not go over it taking 10s of 100s of samples. Well, there are lots of factors here - some of them have to do with scheduling and also I know that the first time a code-path is used it is about 70% slower - this has to do with the code being in the different CPU caches - the more often it runs the quicker. So whenever a "thread" is switched CPU everything is slower as it needs to fill the caches... If you want I can run some more samples and I can send you a link to download the data - the SW itself you can get from the producer of the logic analyzer and you can read the files and analyze them... But I guess that this is better happening in a different discussion or offline. |
note that the patch for the "time estimate" has been accepted already. |
Thanks, no need to run more samples. I don't know much about cpu caches and process/thread switching time. Have you tried running the queue with realtime priority (rt=true), what's your experience with that? |
Rt does not make a huge difference any longer on a multi-CPU system. Also the designs of running the spi message pump inline for spi_sync calls when there is no activity makes this somewhat redundant - you would need to run the calling thread with rt priority to make it work. |
kernel: BCM2835_V4L2: Add support for V4L2_EXPOSURE_METERING_MATRIX See: raspberrypi/linux#1068 kernel: dmaengine: bcm2708-dmaengine: Fix memory leak when stopping a running transfer See: raspberrypi/linux#1072 kernel: BCM270X_DT: mz61581: Revert to spi-bcm2708 See: raspberrypi/linux#1077 kernel: bcm2708/2835-i2s: Fix for PCM register ranges in device trees See: raspberrypi/linux#1079 kernel: bcm2835-sdhost: Add the ERASE capability See: raspberrypi/linux#1076 kernel: bcm2835-sdhost: Ignore CRC7 for MMC CMD1 kernel: BCM270X_DT: Add unit address to gpio node name kernel: spi-bcm2835: merge upstream patches allowing DMA transfers See: raspberrypi/linux#1085 kernel: BCM270X_DT: Use i2c_arm for rtc and bmp085 overlays kernel: BCM2708_DT: CM dtparams for audio, watchdog and RNG firmware: video_decode: Don't wait for a valid timestamp to output frames See: raspberrypi#451
Our fbdev setup requires the device to be awake for access through the GTT. If one boots without connected displays and later plugs one in, we won't have any runtime PM references when the fbdev setup runs. Explicitly grab a runtime PM reference during the fbdev setup to avoid the following spew: [ 62.518435] RPM wakelock ref not held during HW access [ 62.518459] ------------[ cut here ]------------ [ 62.518546] WARNING: CPU: 3 PID: 37 at ../drivers/gpu/drm/i915/intel_drv.h:1800 i915_vma_pin_iomap+0x144/0x150 [i915] [ 62.518585] Modules linked in: i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_gtt agpgart netconsole nls_iso8859_1 nls_cp437 vfat fat efi_pstore coretemp hwmon intel_rapl x86_pkg_temp_thermal e1000e efivars ptp pps_core video evdev ip_tables x_tables ipv6 autofs4 [ 62.518741] CPU: 3 PID: 37 Comm: kworker/3:1 Not tainted 4.13.0-rc7-skl+ raspberrypi#1077 [ 62.518770] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017 [ 62.518827] Workqueue: events i915_hotplug_work_func [i915] [ 62.518853] task: ffff88046c00dc00 task.stack: ffffc90000184000 [ 62.518896] RIP: 0010:i915_vma_pin_iomap+0x144/0x150 [i915] [ 62.518919] RSP: 0018:ffffc90000187cc8 EFLAGS: 00010292 [ 62.518942] RAX: 000000000000002a RBX: ffff880460044000 RCX: 0000000000000006 [ 62.518969] RDX: 0000000000000006 RSI: ffffffff819c3e6f RDI: ffffffff819f1c0e [ 62.518996] RBP: ffffc90000187cd8 R08: ffff88046c00e4f0 R09: 0000000000000000 [ 62.519022] R10: ffff8804669ca800 R11: 0000000000000000 R12: ffff880461d20000 [ 62.519049] R13: ffffc90000187d48 R14: ffff880461d20000 R15: ffff880460044000 [ 62.519076] FS: 0000000000000000(0000) GS:ffff88047ed80000(0000) knlGS:0000000000000000 [ 62.519107] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 62.519130] CR2: 000056478ae213f0 CR3: 0000000002c0f000 CR4: 00000000003406e0 [ 62.519156] Call Trace: [ 62.519190] intelfb_create+0x176/0x360 [i915] [ 62.519216] __drm_fb_helper_initial_config_and_unlock+0x1c7/0x3c0 [drm_kms_helper] [ 62.519251] drm_fb_helper_hotplug_event.part.18+0xac/0xc0 [drm_kms_helper] [ 62.519282] drm_fb_helper_hotplug_event+0x1a/0x20 [drm_kms_helper] [ 62.519324] intel_fbdev_output_poll_changed+0x1a/0x20 [i915] [ 62.519352] drm_kms_helper_hotplug_event+0x27/0x30 [drm_kms_helper] [ 62.519395] i915_hotplug_work_func+0x24e/0x2b0 [i915] [ 62.519420] process_one_work+0x1d3/0x6d0 [ 62.519440] worker_thread+0x4b/0x400 [ 62.519458] ? schedule+0x4a/0x90 [ 62.519475] ? preempt_count_sub+0x97/0xf0 [ 62.519495] kthread+0x114/0x150 [ 62.519511] ? process_one_work+0x6d0/0x6d0 [ 62.519530] ? kthread_create_on_node+0x40/0x40 [ 62.519551] ret_from_fork+0x27/0x40 [ 62.519569] Code: c4 78 e6 e0 0f ff e9 08 ff ff ff 80 3d d5 bc 0c 00 00 0f 85 0b ff ff ff 48 c7 c7 d8 50 32 a0 c6 05 c1 bc 0c 00 01 e8 9d 78 e6 e0 <0f> ff e9 f1 fe ff ff 0f 1f 44 00 00 0f 1f 44 00 00 0f b6 87 98 [ 62.519771] ---[ end trace 5fbe271f991a58ae ]--- Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901195456.6386-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The MZ61581 display does not work with spi-bcm2835 and software chip selects. It works before the commit: spi: bcm2835: transform native-cs to gpio-cs on first spi_setup
Revert to spi-bcm2708 until the cause has been detected and the issue resolved.
Also increase reliability by adding a patch for missing display controller reset.
This fbtft patch is now in linux-next and will probably land in 4.3.