See AMD Vitis™ Development Environment on xilinx.com |
The AI Engine Development Design Tutorials showcase the two major phases of AI Engine-ML application development: architecting the application and developing the kernels. Both phases are demonstrated in these tutorials.
Tutorial | Description |
Versal Custom Thin Platform Extensible System | This is a Versal system example design based on a VEK280 thin custom platform (Minimal clocks and AXI exposed to PL) that includes HLS/RTL kernels and AI Engine kernel using a full Makefile build-flow. |
AIE-ML Programming | This tutorial helps user understand the differences between AI Engine and AI Engine-ML architecture, it is based on matrix multiplication which is a usual algorithm in Machine Learning applications. |
Prime Factor FFT-1008 on AIE-ML | This Versal system example implements a 1008-pt FFT using the Prime Factor algorithm. The design uses both AI Engine and PL kernels working cooperatively. AI Engine elements are hand-coded using AIE API, PL elements are implemented using Vitis HLS. System integration in Vitis is managed using the new v++ Unified Command Line flow. This tutorial targets the AIE-ML architecture. |
AIE-ML LeNet Tutorial | This tutorial uses the LeNet algorithm to implement a system-level design to perform image classification using the AIE-ML architecture and PL logic, including block RAM (BRAM). The design demonstrates functional partitioning between the AIE-ML and PL. It also highlights memory partitioning and hierarchy among DDR memory, PL (BRAM), Memory tile and AI Engine memory. |
AIE API based FFT for Many Instances Applications | This tutorial walks the user through the design and the implementation of an FFT for many parallel signals on a Real-Time system, using the AI Engine APIs. The design performance objective is minimizing power and utilization, maintaining a high throughput to at least match the Real-Time acquisition bandwidth. Moreover, the design leverages the AIE-ML Memory Tiles to minimize programmable logic utilization. The considered case study comprises 128 parallel signals, each with a 125MSa/s sample rate and CINT16 datatype, with a total aggregated bandwidth of 64GBytes/s. |
Copyright © 2020–2024 Advanced Micro Devices, Inc