Embodied agents show strong potential across many domains,
which makes assuring their behavioral safety a prerequisite for
widespread deployment. However, existing research predominantly
concentrates on the security of general large language models and lacks
specialized methodologies for establishing safety benchmarks and input
moderation tailored to embodied agents. To bridge this gap, this paper
introduces an input moderation framework designed to
safeguard embodied agents. This framework encompasses the entire pipeline,
including taxonomy definition, dataset curation, moderator architecture, model
training, and evaluation. We introduce EAsafetyBench, a
safety benchmark built to support both the
training and the assessment of moderators designed for
embodied agents. We also propose Pinpoint, a
prompt-decoupled input moderation scheme that uses a masked attention
mechanism to isolate and mitigate the influence of functional
prompts on moderation tasks. Extensive experiments conducted on diverse
benchmark datasets and models validate the feasibility and efficacy of the
proposed approach. The results show that our methods achieve an
average detection accuracy of 94.58%, surpassing
existing state-of-the-art techniques, with a moderation
processing time of 0.002 seconds per instance.
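
The abstract does not spell out how the masked attention mechanism is implemented. As a rough illustration of the general idea only, the sketch below assumes that each input token carries a segment label ("functional" for the agent's functional prompt, "user" for the instruction being moderated) and that the moderation query attends only to "user" positions. The function names, segment labels, and toy vectors are all hypothetical and are not taken from Pinpoint.

```python
import math

def user_only_mask(segments):
    """Hypothetical mask: only tokens of the user instruction are visible."""
    return [seg == "user" for seg in segments]

def masked_attention(query, keys, values, visible):
    """Single-query scaled dot-product attention over visible positions only."""
    if not any(visible):
        raise ValueError("at least one position must be visible")
    scale = math.sqrt(len(query))
    # Masked positions get a score of -inf, so their softmax weight is 0.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if keep else float("-inf")
        for key, keep in zip(keys, visible)
    ]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Toy input: two functional-prompt tokens followed by two user tokens.
segments = ["functional", "functional", "user", "user"]
query = [0.2, 0.1]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [1.0, 2.0], [3.0, 4.0]]
mask = user_only_mask(segments)
out_a = masked_attention(query, keys, values, mask)

# Swap in a different functional prompt; the user tokens are unchanged.
keys_b = [[-3.0, 2.0], [4.0, 4.0]] + keys[2:]
values_b = [[-7.0, 7.0], [9.0, -9.0]] + values[2:]
out_b = masked_attention(query, keys_b, values_b, mask)
```

Under these assumptions, `out_a` and `out_b` coincide, because the masked functional-prompt positions receive zero attention weight. This is the sense in which a mask can decouple the moderation signal from whatever functional prompt the agent happens to use.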
arXiv:2504.15699v1