【CS Peer Talk #7】Automatic Code Generation for Rocket Chip RoCC Accelerators
Heterogeneous SoCs coupled with accelerators are becoming prevalent in deep learning applications thanks to their flexibility and performance. However, programming these platforms remains difficult because of their low-level programming interfaces and complex memory systems. Meanwhile, automatic code generation for tensor programs delivers reasonable performance while offering great accessibility and flexibility. In this work, we bring these two topics together by proposing an automatic code generation flow for heterogeneous SoCs. We show how to implement the proposed flow in TVM for RoCC (Rocket Custom Coprocessor) accelerators. We also develop a performance evaluation platform that makes automatic code generation practical on embedded devices. Experiments using TVM with the Gemmini GEMM accelerator show that the generated code reaches a peak throughput of 25.24 GIOPS and achieves up to a 3.6x speedup over the hand-tuned kernels provided by the Gemmini developers.