can vdma loop work ? #31

Closed
tangxu00 opened this issue Nov 19, 2017 · 27 comments

@tangxu00

tangxu00 commented Nov 19, 2017

With the DMA IP, the driver and the examples worked well, but with VDMA they don't:

Z-turn# ./axidma_benchmark -v
AXI DMA Benchmark Parameters:
Transmit Buffer Size: 7.91 Mb
Receive Buffer Size: 7.91 Mb
Number of DMA Transfers: 1000 transfers

Using transmit channel 0 and receive channel 1.
axidma: axidma_dma.c: axidma_start_transfer: 298: VDMA receive transaction timed out.
Failed to perform the AXI DMA read-write transfer: Timer expired

How can I use the VDMA driver?

@tangxu00 changed the title from "hello,banchmark wrong" to "can vdma loop work ?" on Nov 20, 2017
@bperez77
Owner

bperez77 commented Nov 21, 2017

The answer to that is maybe. The driver has support for VDMA, but I was never able to get it to work even in a simple loopback mode. As discussed in #15, I believe this may actually be because of a lack of support in the backend Xilinx driver for DMA. However, that may have changed since then. In issue #25, it seems that someone may have gotten AXI VDMA to work. Perhaps @yaobaishen can comment on this?

If you want some help debugging, can you send me your device tree entries for AXI VDMA? Also, can you send the output of dmesg immediately after you run the benchmark?

@tangxu00
Author

Thanks for your help. I have emailed yaobaishen, but got no reply.
I am using a Zynq 7010 and the latest linux-4.9.
Here is my pl.dtsi:
`/*
 * CAUTION: This file is automatically generated by Xilinx.
 * Version:
 * Today is: Mon Nov 20 15:56:55 2017
 */

/ {
    axidma_chrdev: axidma_chrdev@0 {
        compatible = "xlnx,axidma-chrdev";
        dmas = <&axi_vdma_0 0 &axi_vdma_0 1>;
        dma-names = "tx_channel", "rx_channel";
    };
    amba_pl: amba_pl {
        #address-cells = <1>;
        #size-cells = <1>;
        compatible = "simple-bus";
        ranges ;
        axi_vdma_0: dma@43000000 {
            #dma-cells = <1>;
            clock-names = "s_axi_lite_aclk", "m_axi_mm2s_aclk", "m_axi_mm2s_aclk", "m_axi_s2mm_aclk", "m_axi_s2mm_aclk";
            clocks = <&clkc 15>, <&clkc 15>, <&clkc 15>, <&clkc 15>, <&clkc 15>;
            compatible = "xlnx,axi-vdma-1.00.a";
            interrupt-parent = <&intc>;
            interrupts = <0 29 4 0 30 4>;
            reg = <0x43000000 0x10000>;
            xlnx,addrwidth = <0x20>;
            xlnx,flush-fsync = <0x1>;
            xlnx,num-fstores = <0x3>;
            dma-channel@43000000 {
                compatible = "xlnx,axi-vdma-mm2s-channel";
                interrupts = <0 29 4>;
                xlnx,datawidth = <0x18>;
                xlnx,device-id = <0x0>;
            };
            dma-channel@43000030 {
                compatible = "xlnx,axi-vdma-s2mm-channel";
                interrupts = <0 30 4>;
                xlnx,datawidth = <0x18>;
                xlnx,device-id = <0x1>;
            };
        };
    };
};
And here is dmesg after running the benchmark:
Z-turn# ./axidma_benchmark -v
AXI DMA Benchmark Parameters:
Transmit Buffer Size: 7.91 Mb
Receive Buffer Size: 7.91 Mb
Number of DMA Transfers: 1000 transfers

Using transmit channel 0 and receive channel 1.
axidma: axidma_dma.c: axidma_start_transfer: 298: VDMA receive transaction timed out.
Failed to perform the AXI DMA read-write transfer: Timer expired
Z-turn# dmesg
Booting Linux on physical CPU 0x0
Linux version 4.9.0-xilinx (osrc@osrc-virtual-machine) (gcc version 4.6.1 (Sourcery CodeBench Lite 2011.09-50) ) #1 SMP PREEMPT Sun Nov 19 19:34:06 CST 2017
CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
OF: fdt:Machine model: xlnx,zynq-7000
cma: Reserved 28 MiB at 0x3e400000
Memory policy: Data cache writealloc
On node 0 totalpages: 262144
free_area_init_node: node 0, pgdat c0a31500, node_mem_map ef7f8000
Normal zone: 1536 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 196608 pages, LIFO batch:31
HighMem zone: 65536 pages, LIFO batch:15
percpu: Embedded 14 pages/cpu @ef7d3000 s25984 r8192 d23168 u57344
pcpu-alloc: s25984 r8192 d23168 u57344 alloc=14*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 260608
Kernel command line: console=ttyPS0,115200 root=/dev/ram rw earlyprintk cma=25M
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 994776K/1048576K available (6144K kernel code, 200K rwdata, 1468K rodata, 1024K init, 230K bss, 25128K reserved, 28672K cma-reserved, 233472K highmem)
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xffc00000 - 0xfff00000 (3072 kB)
vmalloc : 0xf0800000 - 0xff800000 ( 240 MB)
lowmem : 0xc0000000 - 0xf0000000 ( 768 MB)
pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB)
random: fast init done
modules : 0xbf000000 - 0xbfe00000 ( 14 MB)
.text : 0xc0008000 - 0xc0700000 (7136 kB)
.init : 0xc0900000 - 0xc0a00000 (1024 kB)
.data : 0xc0a00000 - 0xc0a32100 ( 201 kB)
.bss : 0xc0a32100 - 0xc0a6bb1c ( 231 kB)
Preemptible hierarchical RCU implementation.
Build-time adjustment of leaf fanout to 32.
RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
NR_IRQS:16 nr_irqs:16 16
efuse mapped to f0800000
slcr mapped to f0802000
L2C: platform modifies aux control register: 0x72360000 -> 0x72760000
L2C: DT/platform modifies aux control register: 0x72360000 -> 0x72760000
L2C-310 erratum 769419 enabled
L2C-310 enabling early BRESP for Cortex-A9
L2C-310 full line of zeros enabled for Cortex-A9
L2C-310 ID prefetch enabled, offset 1 lines
L2C-310 dynamic clock gating enabled, standby mode enabled
L2C-310 cache controller enabled, 8 ways, 512 kB
L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x76760001
zynq_clock_init: clkc starts at f0802100
Zynq clock init
sched_clock: 64 bits at 333MHz, resolution 3ns, wraps every 4398046511103ns
clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0x4ce07af025, max_idle_ns: 440795209040 ns
Switching to timer-based delay loop, resolution 3ns
clocksource: ttc_clocksource: mask: 0xffff max_cycles: 0xffff, max_idle_ns: 537538477 ns
timer #0 at f080a000, irq=17
Console: colour dummy device 80x30
Calibrating delay loop (skipped), value calculated using timer frequency.. 666.66 BogoMIPS (lpj=3333333)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x100000 - 0x100058
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
Brought up 2 CPUs
SMP: Total of 2 processors activated (1333.33 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
cpuidle: using governor menu
hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 4 bytes.
zynq-ocm f800c000.ocmc: ZYNQ OCM pool: 256 KiB @ 0xf0880000
zynq-pinctrl 700.pinctrl: zynq pinctrl initialized
vgaarb: loaded
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
media: Linux media interface: v0.10
Linux video capture interface: v2.00
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti giometti@linux.it
PTP clock support registered
EDAC MC: Ver: 3.0.0
FPGA manager framework
fpga-region fpga-full: FPGA Region probed
Advanced Linux Sound Architecture Driver Initialized.
clocksource: Switched to clocksource arm_global_timer
NET: Registered protocol family 2
TCP established hash table entries: 8192 (order: 3, 32768 bytes)
TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
UDP hash table entries: 512 (order: 2, 16384 bytes)
UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 0 bytes, default 64
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (no cpio magic); looks like an initrd
Freeing initrd memory: 5804K (dfa55000 - e0000000)
hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
futex hash table entries: 512 (order: 3, 32768 bytes)
workingset: timestamp_bits=30 max_order=18 bucket_order=0
jffs2: version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc.
bounce: pool size: 64 pages
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
dma-pl330 f8003000.dmac: Loaded driver for PL330 DMAC-241330
dma-pl330 f8003000.dmac: DBUFF-128x8bytes Num_Chans-8 Num_Peri-4 Num_Events-16
xilinx-vdma 43000000.dma: Xilinx AXI VDMA Engine Driver Probed!!
e0001000.serial: ttyPS0 at MMIO 0xe0001000 (irq = 27, base_baud = 6249999) is a xuartps
console [ttyPS0] enabled
[drm] Initialized
brd: module loaded
loop: module loaded
libphy: Fixed MDIO Bus: probed
CAN device driver interface
libphy: MACB_mii_bus: probed
macb e000b000.ethernet eth0: Cadence GEM rev 0x00020118 at 0xe000b000 irq 28 (00:0a:35:00:01:22)
Generic PHY e000b000.etherne:03: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e000b000.etherne:03, irq=-1)
e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
usbcore: registered new interface driver usb-storage
mousedev: PS/2 mouse device common for all mice
i2c /dev entries driver
cdns-i2c e0005000.i2c: 400 kHz mmio e0005000 irq 24
cdns-wdt f8005000.watchdog: Xilinx Watchdog Timer at f099a000 with timeout 10s
EDAC MC: ECC not enabled
Xilinx Zynq CpuIdle Driver started
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
sdhci-pltfm: SDHCI platform and OF driver helper
mmc0: SDHCI controller on e0100000.sdhci [e0100000.sdhci] using ADMA
ledtrig-cpu: registered to indicate activity on CPUs
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
fpga_manager fpga0: Xilinx Zynq FPGA Manager registered
NET: Registered protocol family 10
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered protocol family 17
can: controller area network core (rev 20120528 abi 9)
NET: Registered protocol family 29
can: raw protocol (rev 20120528)
can: broadcast manager protocol (rev 20161123 t)
can: netlink gateway (rev 20130117) max_hops=1
Registering SWP/SWPB emulation handler
mmc0: new high speed SDHC card at address 1234
hctosys: unable to open rtc device (rtc0)
mmcblk0: mmc0:1234 SA16G 14.6 GiB
mmcblk0: p1
of_cfs_init
of_cfs_init: OK
ALSA device list:
No soundcards found.
RAMDISK: gzip image found at block 0
EXT4-fs (ram0): couldn't mount as ext3 due to feature incompatibilities
EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
VFS: Mounted root (ext4 filesystem) on device 1:0.
devtmpfs: mounted
Freeing unused kernel memory: 1024K (c0900000 - c0a00000)
FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
random: sshd: uninitialized urandom read (32 bytes read)
export_store: invalid GPIO 110
axidma: loading out-of-tree module taints kernel.
axidma: axidma_dma.c: axidma_dma_init: 705: DMA: Found 0 transmit channels and 0 receive channels.
axidma: axidma_dma.c: axidma_dma_init: 707: VDMA: Found 1 transmit channels and 1 receive channels.
axidma: axidma_dma.c: axidma_start_transfer: 298: VDMA receive transaction timed out.
Z-turn# `
I am working on using VDMA to capture an HDMI signal, but it has never worked under Linux. Does your driver or library provide a VDMA interface?
Thank you for your reply.

@yaobaishen

Sorry, I didn't get VDMA working either, so I am also looking forward to someone verifying it.

@tangxu00
Author

I will keep working on it. yaobaishen replied to me, and he has given up. Thanks for your great work!

@tangxu00
Author

tangxu00 commented Nov 23, 2017

Hi, I changed my hardware design in Vivado, and maybe the VDMA works now?

Z-turn# insmod axidma.ko
axidma: loading out-of-tree module taints kernel.
axidma: axidma_dma.c: axidma_dma_init: 705: DMA: Found 0 transmit channels and 0 receive channels.
axidma: axidma_dma.c: axidma_dma_init: 707: VDMA: Found 1 transmit channels and 1 receive channels.
Z-turn# ./axidma_benchmark -v
AXI DMA Benchmark Parameters:
Transmit Buffer Size: 7.91 Mb
Receive Buffer Size: 7.91 Mb
Number of DMA Transfers: 1000 transfers

Using transmit channel 0 and receive channel 1.
Single transfer test successfully completed!
Beginning performance analysis of the DMA engine.

random: fast init done

DMA Timing Statistics:
Elapsed Time: 27.71 s
Transmit Throughput: 285.42 Mb/s
Receive Throughput: 285.42 Mb/s
Total Throughput: 570.83 Mb/s
Z-turn#

Why does it take so long?

@bperez77
Owner

Interesting, well you're the first person to have VDMA working, so that's good. There must have been some issue with my design when I was testing.

Hmm, the output of the timing statistics doesn't make much sense. According to the bandwidth numbers, your transfer should be completing in far less time, but it's not. So, for some reason the numbers are inconsistent, which I haven't seen before.

I do notice a line of output random: fast init done, which may be causing some delay? Not sure if that is the source or not. Do you get the same results after running the benchmark several times in a row?

@tangxu00
Author

tangxu00 commented Nov 24, 2017

My data width is 24 bits, for RGB.
I tried several times. "random: fast init done" appeared only the first time; I don't know what it means. The time is always 27.71 s.
I just want to use the VDMA rx channel to receive one frame of HDMI video that has been converted to 24-bit RGB format, and store it in DDR. How can I use the VDMA driver? Can you give me some guidance? Thank you!

@bperez77
Owner

I see, so the time is consistent. That's very odd though, because 7.91 Mb / 27.71 s obviously does not come out to 285.42 Mb/s. The 285.42 Mb/s throughput number is about what I would expect, but the elapsed time is way too long. You haven't made any changes to the axidma benchmark code, correct?

Sure thing, that's pretty straightforward. You can just use axidma_malloc to allocate your single frame buffer. Then, you can use axidma_video_transfer to set up a loop transfer, where the buffer is continuously streamed out from DRAM to your IP. Then, it's up to your application code to update the buffer as needed.

Alternately, if you're planning on using Xilinx's DRM driver, you can use axidma_register_buffer to share the DRM driver's DMA buffer with my driver. Naturally, you need to get a handle to the DRM driver's buffer through libdrm first.

@bperez77
Owner

Oh one other thing I noticed is the following line:

axidma: loading out-of-tree module taints kernel.

This indicates that the kernel you're running on your board and the one that you built the driver against do not match. This isn't the cause of the timing issue, but it can cause the driver to crash, so I'd recommend making sure you're using the kernel you built the driver against.

@tangxu00
Author

I have looked at benchmark.c: in the "DMA Timing Statistics" function, the number of DMA transfers is 1000, so 7.91 Mb / 27.71 s * 1000 = 285.42 Mb/s.
In order to use your driver, I built linux-4.9 for my board, and I am sure that I used it to build the driver. This problem may be related to my kernel config.
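
The arithmetic behind the benchmark figure can be sketched as below. This is a minimal illustration rather than the benchmark's actual code: the function names are hypothetical, and "Mb" is simply the unit the benchmark prints. Total data moved in one direction is the buffer size times the number of transfers, divided by the elapsed time.

```c
/* Aggregate one-direction throughput: total data moved divided by wall time.
 * buffer_size: size of one transfer, in the unit the benchmark prints as "Mb"
 * transfers:   number of transfers performed
 * elapsed_s:   total elapsed time in seconds
 * (Hypothetical helper, not part of the axidma benchmark.) */
double throughput_per_s(double buffer_size, int transfers, double elapsed_s)
{
    return buffer_size * (double)transfers / elapsed_s;
}

/* Average time spent on a single transfer, in milliseconds. */
double ms_per_transfer(int transfers, double elapsed_s)
{
    return elapsed_s * 1000.0 / (double)transfers;
}
```

With the values from the log, throughput_per_s(7.91, 1000, 27.71) comes out at roughly 285 Mb/s, which agrees with the reported figure up to rounding of the printed inputs, and ms_per_transfer(1000, 27.71) shows that each individual transfer takes about 27.7 ms. So the 27.71 s is the time for all 1000 transfers, not for one.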

@bperez77
Owner

Sorry had a bit of a brain fart there, you're right about the time reported by the benchmark.

What throughput are you expecting for the transfers? I know that my driver does introduce a bit of an overhead, but you should still get near the maximum performance. Looking at table 2-3 of the AXI VDMA User Guide, I think it's near the expected. Naturally, the exact throughput depends on data width configured in your AXI VDMA IP block.

@tangxu00
Author

I see that the "axidma_video_transfer" function only uses the tx channel for display. On the contrary, I need data streamed from the IP into a DRAM buffer: not a loop, but a one-way road from the external video signal to DRAM. I have not seen any "video_read" function. Maybe you could add AXIDMA_DMA_VIDEO_READ to the ioctls when you have time; just a suggestion.

@bperez77
Owner

Ahh I see. Actually, if you're just doing transfers one at a time, you can utilize axidma_oneway_transfer or axidma_twoway_transfer with a VDMA channel. Of course, this will be slower, which I'm guessing is what you were referring to in your question as to why it's so slow.

Otherwise, I can send you a patch for continuous loop video read, analogous to axidma_video-write. Unfortunately, I'm busy the next few weeks, so I won't be able to test it fully, but I can still send you the patch.

@tangxu00
Author

Thank you so much, that will be a great help!

bperez77 added a commit that referenced this issue Dec 1, 2017
This was implemented as discussed in issue #31. The software stack now
supports receive video transfers. This is a continuous receive transfer
through VDMA, or a loop transfer. Transfers like this are useful for
cameras or other video input devices that utilize frames. The call
supports an arbitrary number of frame buffers.

NOTE: This code is untested, and was only checked for correct
compilation.
@bperez77
Owner

bperez77 commented Dec 1, 2017

Ok, I added support with c3181d8.

This code is currently untested; I only checked that it compiled against the most recent version of Xilinx's kernel. Unfortunately, I don't have a device to test with, so could you run and test the code, and let me know if you run into any issues?

@tangxu00
Author

tangxu00 commented Dec 3, 2017

vdma_test.txt
I wrote a simple app that uses a VDMA loop to transfer files, but it failed.
Here is the dmesg output:
axidma: axidma_dma.c: axidma_dma_init: 706: DMA: Found 0 transmit channels and 0 receive channels.
axidma: axidma_dma.c: axidma_dma_init: 708: VDMA: Found 1 transmit channels and 1 receive channels.
Unhandled fault: page domain fault (0x01b) at 0xbeea3cb0
pgd = ef318000
[beea3cb0] *pgd=3de24831
Internal error: Oops - BUG: 1b [#1] PREEMPT SMP ARM
Modules linked in: axidma(O)
CPU: 0 PID: 720 Comm: vdma_test Tainted: G O 4.9.0-xilinx #2
Hardware name: Xilinx Zynq Platform
task: ef1b8440 task.stack: ef174000
PC is at axidma_video_transfer+0xd4/0x194 [axidma]
LR is at 0x500
pc : [] lr : [<00000500>] psr: 80000013
sp : ef175df8 ip : beea3cb0 fp : beea3c34
r10: 00000000 r9 : ef174000 r8 : 00000005
r7 : 002a3000 r6 : ef12ad80 r5 : 00000000 r4 : ef175e7c
r3 : 000002d0 r2 : 00000000 r1 : ef25aa80 r0 : ef12ad80
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 18c5387d Table: 2f31804a DAC: 00000051
Process vdma_test (pid: 720, stack limit = 0xef174210)
Stack: (0xef175df8 to 0xef176000)
5de0: 002a3000 60000013
5e00: ef31ada8 00000001 ef25aa80 00000000 00000000 00000000 00000000 00000000
5e20: 00000000 00000001 00000001 00000000 00000022 ef1b8440 00000000 00000500
5e40: 00000003 000002d0 00000005 beea3c14 ef12ad80 00000051 80185707 bf000dc0
5e60: 00000707 c0a08648 ef3c0200 00000000 00000000 ef25a780 ee9714d0 00000000
5e80: 00000001 beea3cb0 00000500 00000003 000002d0 c01b7998 ef3b21c4 ee978c40
5ea0: ee9714d0 c01b8b28 ee9714d0 00000707 040444fb c01b8bdc ee9714d0 040444fb
5ec0: ef0c6780 c01ba354 ed85e748 00000000 ef0c6780 00000000 00000000 002a3000
5ee0: 00002000 ef0c6780 beea3c14 ef3b20f0 80185707 c01ded20 40049409 c01df680
5f00: ef0c6780 00000001 002a3000 00000003 00000000 ee978c40 000002a3 c01ba754
5f20: 00000000 00000000 00000000 b68d9000 00000001 ee978c78 ef0c6780 00000003
5f40: 002a3000 00000000 ef174000 00000000 beea3c34 c01aa3e8 00000001 00000000
5f60: 00000000 ef175f6c 00000005 ef0c6780 ef0c6780 beea3c14 80185707 00000005
5f80: ef174000 00000000 beea3c34 c01df70c 00008964 00000000 000000d8 00000036
5fa0: c0106f64 c0106da0 00008964 00000000 00000005 80185707 beea3c14 beea3c14
5fc0: 00008964 00000000 000000d8 00000036 b6f65000 00000000 b6fc3000 beea3c34
5fe0: 00000000 beea3c00 b6f91cac b6ef1c1c 60000010 00000005 00000000 00000000
[] (axidma_video_transfer [axidma]) from [] (axidma_ioctl+0x804/0x9a0 [axidma])
[] (axidma_ioctl [axidma]) from [] (vfs_ioctl+0x18/0x34)
[] (vfs_ioctl) from [] (do_vfs_ioctl+0x838/0x88c)
[] (do_vfs_ioctl) from [] (SyS_ioctl+0x38/0x54)
[] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x3c)
Code: e1a00006 e58d7000 e1a02005 e59d1010 (e79c3105)
---[ end trace 94a751a56bb75147 ]---

I guess something is wrong with the memory in my app (segmentation fault).
But I also tested your axidma_display_image.c and hit the same problem.

bperez77 added a commit that referenced this issue Dec 4, 2017
This was implemented as discussed in issue #31. The software stack now
supports receive video transfers. This is a continuous receive transfer
through VDMA, or a loop transfer. Transfers like this are useful for
cameras or other video input devices that utilize frames. The call
supports an arbitrary number of frame buffers.

NOTE: This code is untested, and was only checked for correct
compilation.
@bperez77
Owner

bperez77 commented Dec 4, 2017

Yeah, even if there's an issue with your app's memory, it shouldn't cause a segfault in the kernel. I've pushed something new, so can you pull and try again.

Also, can you post the output as a text file? It looks like the formatting of it got messed up a bit.

@tangxu00
Author

tangxu00 commented Dec 4, 2017

Hi, I tried again. I changed your axidma_transfer.c to use the axidma_get_vdma_tx(axidma_dev) function to switch to the VDMA channel, and it worked well for transferring a file. See the attached axidma_transfer.txt and axidma_transfer-dmsg.txt.
Then I changed axidma_twoway_transfer into axidma_oneway_transfer, and the same issue came back. See the attached vdma.txt and vdma-dmsg.txt.
axidma_display.c also failed.
I used axidma_oneway_transfer for tx and then axidma_oneway_transfer for rx. That should be equivalent to axidma_twoway_transfer, but it failed, while axidma_twoway_transfer worked.
Maybe something is wrong in the config of one channel.
axidma_transfer.txt
axidma_transfer-dmsg.txt

vdma.txt
vdma-dmsg.txt

axidma_display_image.txt
axidma_display_image-dmsg.txt

@bperez77
Owner

Huh, that's really odd. I don't think I've seen any error like that before. The segfault from your logs seems to be pointing to the crash happening in the driver's video transfer function. However, even after combing over the function, I can't find any obvious place where it would segfault. It's even more odd that the two-way transfer works perfectly fine for you.

Just to confirm, you are on the latest version of the driver?

@dlaurentiu

dlaurentiu commented Dec 25, 2017

Hi Brandon,

I've just tried to use example axidma_display_image with a VDMA core and I get the same crash. Running the latest xilinx-linux on a ZC702 board.

crash.txt

(PS. Thank you for this project; it's a lifesaver.)

(Later: After some testing I've noticed that the error comes from accessing axidma_video_transaction.frame_buffer[0], which should point to the userspace address of the buffer. This is where the kernel faults; it seems that the specified address should be in kernel space. As a quick test, I changed void **frame_buffers to void *frame_buffers, given that there is only one fb address, and it runs without the crash.)

@bperez77
Owner

bperez77 commented Dec 30, 2017

Oh got it, I think I might know what the issue is. I made a pretty simple mistake. In the IOCTL for AXIDMA_VIDEO_WRITE (and incidentally in the read IOCTL as well), I'm not copying the array of frame buffers from user space to kernel space, which is a big no-no. What is likely happening is that memory happens to be paged out, which leads to the segfault. The reason changing to void *frame_buffers works is because it's handled by the first copy_from_user call in that IOCTL.

So, I need a second call to copy_from_user to copy the array of frame buffers. Let me make that change and then push it.
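
The two-step copy described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver's code: fake_copy_from_user is a hypothetical stand-in for the kernel's copy_from_user (which, like this stand-in, returns the number of bytes it could not copy), and struct video_trans, MAX_FRAME_BUFFERS and copy_video_trans are made-up names that simplify the real ioctl argument.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound so an untrusted count cannot drive the allocation. */
#define MAX_FRAME_BUFFERS 32

/* Hypothetical stand-in for the kernel's copy_from_user(), so the sketch
 * builds in userspace. Returns the number of bytes NOT copied (0 on success). */
unsigned long fake_copy_from_user(void *to, const void *from, unsigned long n)
{
    if (to == NULL || from == NULL)
        return n;
    memcpy(to, from, n);
    return 0;
}

/* Simplified, hypothetical version of the ioctl argument. */
struct video_trans {
    int num_frame_buffers;
    void **frame_buffers;   /* points at an array owned by the caller */
};

/* Copy the struct, then the array it points to. On success,
 * out->frame_buffers is a private copy that the caller must free().
 * Returns 0 on success, -1 on failure. */
int copy_video_trans(struct video_trans *out, const struct video_trans *user_arg)
{
    void **private_bufs;
    size_t size;

    /* Step 1: shallow copy. out->frame_buffers still points at caller memory. */
    if (fake_copy_from_user(out, user_arg, sizeof(*out)) != 0)
        return -1;

    /* Validate the count before trusting it for a size calculation. */
    if (out->num_frame_buffers <= 0 || out->num_frame_buffers > MAX_FRAME_BUFFERS)
        return -1;

    size = (size_t)out->num_frame_buffers * sizeof(out->frame_buffers[0]);
    private_bufs = malloc(size);
    if (private_bufs == NULL)
        return -1;

    /* Step 2: copy the pointer array itself, then swap the pointer over. */
    if (fake_copy_from_user(private_bufs, out->frame_buffers, size) != 0) {
        free(private_bufs);
        return -1;
    }
    out->frame_buffers = private_bufs;
    return 0;
}
```

The point of the sketch is that the first copy only brings over the struct, so its frame_buffers member still refers to the caller's memory until the second copy replaces it. In a real driver the allocation and the copy would use the kernel's own facilities instead of malloc and memcpy.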

@tangxu00
Author

Sorry for the late reply. After a long time debugging, I couldn't get a complete frame using your VDMA driver, so I started my own driver that uses ioremap on the VDMA registers, and it was easy to get working. But that is not the standard way of doing things in the Linux kernel, so I still want to try your driver. Have you updated the code?
Thank you.

@bperez77 added the bug label on Jan 12, 2018
@bperez77
Owner

Not yet, but I was planning on getting it up this weekend. I'll update this issue once it's up.

@elektrokokke

elektrokokke commented Feb 20, 2018

I can confirm that this issue is caused by not calling copy_from_user on the frame buffer array.

My dirty fix for this was:

--- a/driver/axidma_chrdev.c
+++ b/driver/axidma_chrdev.c
@@ -345,6 +345,7 @@
     struct axidma_inout_transaction inout_trans;
     struct axidma_video_transaction video_trans;
     struct axidma_chan chan_info;
+    void *framebuffers[32];
 
     // Coerce the arguement as a userspace pointer
     arg_ptr = (void __user *)arg;
@@ -452,16 +453,13 @@
                            "AXIDMA_DMA_VIDEO_READ.\n");
                 return -EFAULT;
             }
-
-            // Verify that we can access the array of frame buffers
-            size = video_trans.num_frame_buffers *
-                   sizeof(video_trans.frame_buffers[0]);
-            if (!axidma_access_ok(video_trans.frame_buffers, size, true)) {
-                axidma_err("Unable to copy frame buffer addresses from "
-                           "userspace for AXIDMA_DMA_VIDEO_WRITE.\n");
-                return -EFAULT;
-            }
-
+            if (copy_from_user(framebuffers, video_trans.frame_buffers,
+            		video_trans.num_frame_buffers * sizeof(void*)) != 0) {
+				axidma_err("Unable to copy framebuffer pointers from userspace for "
+						   "AXIDMA_DMA_VIDEO_READ.\n");
+				return -EFAULT;
+			}
+            video_trans.frame_buffers = framebuffers;
             rc = axidma_video_transfer(dev, &video_trans, AXIDMA_READ);
             break;

There is a lot of room for improvement on this, however...

Cheers

P.S.: Thx for the nice piece of work. For simple interfacing with VDMA cores, however, I am beginning to think that the Xilinx driver makes it much, much too complicated compared to the bare-metal way...

@ImagotechGmbH

Hello Brandon,
first of all I would like to send you a huge THANK YOU for your great work!
I tried to get VDMA working for - believe it or not - months (!) now and I finally succeeded using your code, what a help!
I managed to get VDMA working on a Zynq Zybo 7010 board, running kernel xilinx 4.4.30 and ubuntu 16.04.
I modified the "image" size of axidma_benchmark a little and got the following data rates, I'm not yet sure if this is a good and plausible value for a PL clock of 150MHz.

./axidma_benchmark -v

AXI DMA Benchmark Parameters:
Transmit Buffer Size: 1.17 Mb
Receive Buffer Size: 1.17 Mb
Number of DMA Transfers: 1000 transfers

Using transmit channel 0 and receive channel 1.
Step #1
Single transfer test successfully completed!
Beginning performance analysis of the DMA engine.

DMA Timing Statistics:
Elapsed Time: 35.04 s
Transmit Throughput: 33.44 Mb/s
Receive Throughput: 33.44 Mb/s
Total Throughput: 66.88 Mb/s

Next step will be to use a custom IP as axi stream source, I will also give the axidma_transfer a try.

Please keep developing great software, your work is highly appreciated!
All the best from Munich, home of Oktoberfest ;-)
J.

@bperez77
Owner

@tangxu00 and @elektrokokke the most recent commit should resolve this issue. Let me know if you guys encounter any additional issues.

@bperez77
Owner

@juergenmuc thanks, I appreciate it! Those numbers seem reasonable, though it's hard to say at a high-level glance. The smaller your transfer size, the less throughput you will see from the driver. This is because the system calls to initiate transfers have a relatively high overhead, so the larger the transfer, the more this cost will be amortized.
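
The amortization argument can be put into a simple model. This is a rough sketch under an assumed cost model (a fixed per-transfer overhead plus streaming time at a constant peak rate), not a measurement of the driver; the function name and its parameters are hypothetical.

```c
/* Effective throughput when every transfer pays a fixed setup cost.
 * size:       data moved per transfer (any unit, e.g. MiB)
 * peak_rate:  streaming rate of the hardware, in the same unit per second
 * overhead_s: fixed per-transfer cost in seconds (system call, setup)
 * (Hypothetical model for illustration only.) */
double effective_rate(double size, double peak_rate, double overhead_s)
{
    double time_per_transfer = overhead_s + size / peak_rate;
    return size / time_per_transfer;
}
```

With made-up numbers, a peak rate of 400 units/s and 5 ms of overhead, a transfer of 1 unit yields about 133 units/s while a transfer of 8 units yields 320 units/s. The fixed cost is the same in both cases, but it is a much smaller share of the larger transfer.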
