Generative AI (GenAI) services powered by large language models (LLMs)
increasingly deliver real-time interactions, yet existing 5G multi-access edge
computing (MEC) architectures often treat communication and computing as
separate domains, limiting their ability to meet stringent latency
requirements. To address this challenge, we introduce an Integrated
Communication and Computing (ICC) framework in which computing capabilities
reside directly in radio access network (RAN) nodes, and bandwidth and computing
resources are managed jointly. Our queueing-theoretic analysis shows
that ICC outperforms 5G MEC, increasing service capacity (defined as the
maximum arrival rate at which a specified fraction of jobs still completes
within a given delay budget) by 98%. We corroborate these gains through
system-level simulations that account for transformer-based LLM workloads,
realistic GPU specifications, and a priority-based scheduling scheme. The
simulations show that ICC improves service capacity by 60%, demonstrating its
potential to enable efficient, cost-effective real-time GenAI services in 6G.
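
The service-capacity definition above can be made concrete with a small sketch. It assumes a single M/M/1 first-come-first-served queue, which is a deliberate simplification and not the queueing model analyzed in the paper; the function name and parameters are hypothetical and chosen only for illustration. Under that assumption the sojourn time is exponentially distributed, so the largest arrival rate that still meets the delay target has a closed form.

```python
import math


def mm1_service_capacity(service_rate, delay_budget, target_fraction):
    """Largest arrival rate (jobs/s) at which at least `target_fraction`
    of jobs finish within `delay_budget` seconds.

    Assumption (not from the paper): one M/M/1 FCFS queue with
    service rate `service_rate` (jobs/s).
    """
    if service_rate <= 0 or delay_budget <= 0:
        raise ValueError("service_rate and delay_budget must be positive")
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")

    # In a stable M/M/1 queue the sojourn time T is exponential with
    # rate (mu - lam), so P(T <= d) = 1 - exp(-(mu - lam) * d).
    # Requiring P(T <= d) >= alpha gives lam <= mu + ln(1 - alpha) / d.
    capacity = service_rate + math.log(1.0 - target_fraction) / delay_budget

    # A non-positive value means the target cannot be met at any load.
    return max(0.0, capacity)


if __name__ == "__main__":
    # Illustrative numbers only: 10 jobs/s, 1 s budget, 95% on-time target.
    print(mm1_service_capacity(10.0, 1.0, 0.95))
```

In this toy model, tightening the delay budget or raising the on-time target lowers the capacity, which is the trade-off the service-capacity metric is meant to capture.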



