Retrieval Augmented Generation (RAG) has emerged as a powerful application of
Large Language Models (LLMs), revolutionizing information search and
consumption. RAG systems combine traditional search capabilities with LLMs to
generate comprehensive answers to user queries, ideally with accurate
citations. However, in our experience developing a RAG product, LLMs often
struggle with source attribution, which is consistent with industry studies
reporting citation accuracy rates of only about 74% for popular generative
search engines. To address this, we present efficient post-processing
algorithms to improve citation accuracy in LLM-generated responses, with
minimal impact on latency and cost. Our approaches cross-check generated
citations against retrieved articles using methods including keyword and
semantic matching, a fine-tuned model with BERTScore, and a lightweight
LLM-based technique. Our experimental results demonstrate a relative improvement of
15.46% in the overall accuracy metrics of our RAG system. This improvement
potentially enables a shift from our current larger language model
to a smaller model that is approximately 12x more cost-effective and
3x faster in inference time, while maintaining comparable performance. This
research contributes to enhancing the reliability and trustworthiness of
AI-generated content in information retrieval and summarization tasks, which is
critical to gaining customer trust, especially in commercial products.
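
To make the cross-checking idea concrete, the following is a minimal sketch of the keyword-matching part only, not the method evaluated in this work. It assumes a generated sentence carries one cited article id and that the retrieved articles are available as plain text. The function names, the stopword list, and the 0.5 threshold are hypothetical choices for illustration. The semantic-matching, BERTScore, and LLM-based variants are not shown.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def keywords(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap_score(sentence, article):
    """Fraction of the sentence's keywords that also occur in the article."""
    sent = keywords(sentence)
    if not sent:
        return 0.0
    return len(sent & keywords(article)) / len(sent)


def recheck_citation(sentence, cited_id, articles, threshold=0.5):
    """Keep the cited id if it clears the (assumed) threshold; otherwise
    fall back to the best-matching article, or None if nothing clears it."""
    if not articles:
        return None
    scores = {aid: overlap_score(sentence, text) for aid, text in articles.items()}
    if scores.get(cited_id, 0.0) >= threshold:
        return cited_id
    best_id = max(scores, key=scores.get)
    return best_id if scores[best_id] >= threshold else None


articles = {
    "a1": "The new battery lasts twelve hours on a single charge.",
    "a2": "The company reported quarterly revenue growth of ten percent.",
}
corrected = recheck_citation("The battery lasts twelve hours.", "a2", articles)
```

In this sketch a citation that fails the check is reassigned to the best-matching retrieved article, and dropped when no article matches well enough. Because it uses only set operations over tokens, a check of this kind adds little latency compared with a second LLM call.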
Download PDF: 2504.15629v1