Deep learning (DL) workloads mainly run on accelerators like GPUs. Recent DL
quantization techniques demand a new matrix multiplication operator with mixed
input data types, further complicating GPU optimization. Prior high-level
compilers, such as Triton, lack the expressiveness to implement key optimizations
such as fine-grained data pipelines and hardware-friendly memory layouts for these
operators, while low-level programming models, such as Hidet, Graphene, and
CUTLASS, require significant programming effort. To balance expressiveness
with engineering effort, we propose Hexcute, a tile-based programming language
that exposes shared memory and register abstractions to enable fine-grained
optimization for these operators. Additionally, Hexcute leverages task mapping
to schedule the GPU program and, to reduce programming effort, automates
layout and task-mapping synthesis with a novel type-inference-based algorithm.
Our evaluation shows that Hexcute generalizes to a wide range of DL operators,
achieves a 1.7-11.28$\times$ speedup over existing DL compilers for mixed-type
operators, and brings up to a 2.91$\times$ speedup in the end-to-end evaluation.
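To make the mixed-input-type matrix multiplication concrete, the sketch below shows a deliberately naive CUDA kernel that multiplies fp16 activations by int4-packed weights, dequantizing each weight tile on the fly into shared memory. This is not Hexcute code; all names (gemm_fp16_int4, Bq, scale, TILE) are hypothetical, and a production kernel would add exactly the pieces the abstract highlights as hard to express: tensor-core MMA instructions, software-pipelined copies, and swizzled shared-memory layouts.

// Hedged illustration of a mixed-type GEMM (fp16 activations x int4 weights).
// Names and packing conventions are assumptions, not Hexcute's API.
#include <cuda_fp16.h>
#include <cstdint>

#define TILE 32

// A:  M x K fp16, row-major activations
// Bq: K x N int4 weights, two values packed per byte (element idx at Bq[idx / 2])
// scale: per-column fp16 dequantization scale (zero point fixed at 8)
// C:  M x N fp32 output
__global__ void gemm_fp16_int4(const half* A, const uint8_t* Bq,
                               const half* scale, float* C,
                               int M, int N, int K) {
    __shared__ half As[TILE][TILE];
    __shared__ half Bs[TILE][TILE];   // dequantized weight tile staged in shared memory

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.f;

    for (int k0 = 0; k0 < K; k0 += TILE) {
        // Stage the activation tile.
        int ka = k0 + threadIdx.x;
        As[threadIdx.y][threadIdx.x] =
            (row < M && ka < K) ? A[row * K + ka] : __float2half(0.f);

        // Unpack the int4 weight, subtract the zero point, and apply the column scale.
        int kb = k0 + threadIdx.y;
        half b = __float2half(0.f);
        if (kb < K && col < N) {
            int idx = kb * N + col;
            uint8_t byte = Bq[idx / 2];
            int q = (idx & 1) ? (byte >> 4) : (byte & 0xF);
            b = __hmul(__float2half((float)(q - 8)), scale[col]);
        }
        Bs[threadIdx.y][threadIdx.x] = b;
        __syncthreads();

        // Accumulate the tile product in fp32.
        for (int k = 0; k < TILE; ++k)
            acc += __half2float(As[threadIdx.y][k]) * __half2float(Bs[k][threadIdx.x]);
        __syncthreads();
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

Even in this simplified form, the on-the-fly dequantization into shared memory suggests why mixed-type operators demand fine-grained control over data movement and memory layout.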