LaViRA: Language-Vision-Robot Actions Translation for Zero-Shot Vision Language Navigation in Continuous Environments
Published in IEEE International Conference on Robotics and Automation (ICRA), 2026
LaViRA is a zero-shot vision-language navigation (VLN) framework for continuous environments. It decomposes navigation into three levels of action: language actions for high-level reasoning, vision actions for perceptual grounding, and robot actions for low-level control. This decomposition requires no task-specific training.
Recommended citation: Hongyu Ding, Ziming Xu, Yudong Fang, You Wu, Zixuan Chen, Jieqi Shi, Jing Huo, Yifan Zhang, and Yang Gao. "LaViRA: Language-Vision-Robot Actions Translation for Zero-Shot Vision Language Navigation in Continuous Environments." In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2026.