LaViRA: Language-Vision-Robot Actions Translation for Zero-Shot Vision Language Navigation in Continuous Environments

Published in IEEE International Conference on Robotics and Automation (ICRA), 2026

Project page | Code | arXiv

LaViRA is a zero-shot vision-language navigation framework for continuous environments. It decomposes navigation into language action, vision action, and robot action to improve reasoning, grounding, and control without task-specific training.

Recommended citation: Hongyu Ding, Ziming Xu, Yudong Fang, You Wu, Zixuan Chen, Jieqi Shi, Jing Huo, Yifan Zhang, and Yang Gao. "LaViRA: Language-Vision-Robot Actions Translation for Zero-Shot Vision Language Navigation in Continuous Environments." In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2026.
Download Paper

Direct Link