For robots to be useful, they must perform practically relevant tasks in the
real world, outside the lab. While vision-language-action (VLA)
models have demonstrated impressive results for end-to-end robot control, it
remains an open question how far such models can generalize in the wild. We
describe $\pi_{0.5}$, a new model based on $\pi_{0}$ that uses co-training on
heterogeneous tasks to enable broad generalization. $\pi_{0.5}$ draws on data
from multiple robots, high-level semantic prediction, web data, and other
sources to support broadly generalizable real-world robotic manipulation. Our
system uses a
combination of co-training and hybrid multi-modal examples that combine image
observations, language commands, object detections, semantic subtask
prediction, and low-level actions. Our experiments show that this kind of
knowledge transfer is essential for effective generalization, and we
demonstrate for the first time that an end-to-end learning-enabled robotic
system can perform long-horizon and dexterous manipulation skills, such as
cleaning a kitchen or bedroom, in entirely new homes.
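As a rough illustration of the hybrid multi-modal training recipe described above, the Python sketch below models each co-training example as a single record whose fields are populated depending on the data source, with co-training realized as weighted sampling over heterogeneous sources. All field names, source names, and mixture weights here are hypothetical assumptions for illustration, not values taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class HybridExample:
    """One co-training example; fields are filled per data source (hypothetical schema)."""
    image: Optional[bytes] = None            # image observation, if available
    command: Optional[str] = None            # natural-language command
    detections: Optional[list[str]] = None   # object detection labels
    subtask: Optional[str] = None            # high-level semantic subtask target
    actions: Optional[list[float]] = None    # low-level action chunk

# Heterogeneous data sources with illustrative mixture weights (not the paper's values).
MIXTURE = {
    "mobile_robot": 0.4,    # in-domain demonstrations with low-level actions
    "other_robots": 0.3,    # cross-embodiment robot data
    "web_data": 0.2,        # image-language data without actions
    "subtask_labels": 0.1,  # semantic subtask annotations
}

def sample_cotraining_batch(
    datasets: dict[str, list[HybridExample]], batch_size: int
) -> list[HybridExample]:
    """Draw a training batch by weighted sampling over the heterogeneous sources."""
    names, weights = zip(*MIXTURE.items())
    return [
        random.choice(datasets[random.choices(names, weights=weights, k=1)[0]])
        for _ in range(batch_size)
    ]

if __name__ == "__main__":
    datasets = {
        "mobile_robot": [HybridExample(command="clean the kitchen",
                                       subtask="pick up the sponge",
                                       actions=[0.1, -0.2, 0.05])],
        "other_robots": [HybridExample(command="fold the towel",
                                       actions=[0.0, 0.3, -0.1])],
        "web_data": [HybridExample(command="a photo of a messy bedroom",
                                   detections=["bed", "pillow", "shirt"])],
        "subtask_labels": [HybridExample(command="clean the bedroom",
                                         subtask="place the shirt in the hamper")],
    }
    for ex in sample_cotraining_batch(datasets, batch_size=4):
        print(ex.command, "->", ex.subtask or ex.actions or ex.detections)
```

Representing every source in a single schema is what allows action-free data, such as web image-language pairs, to supervise the same model that produces low-level actions; this is the kind of cross-source knowledge transfer the experiments identify as essential for generalization.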