Sustainable Development Goals
Abstract/Objectives
Real-world applications such as robot vacuums, tour-guide robots, and delivery robots employ autonomous navigation systems as a core function. However, constructing such a system can be costly, as these systems typically incorporate high-cost sensors such as depth cameras or LIDARs. We therefore present an effective, easy-to-implement, and low-cost modular framework for completing complex autonomous navigation tasks. Our proposed method relies on a single monocular camera to localize, plan, and navigate. A localization module in our framework first estimates the robot’s pose, which is then forwarded to our planner module to generate a global path to the goal and its intermediate waypoints. This information, along with the pose of the robot, is then reinterpreted by our framework to form the “virtual guide”, which serves as a virtual lure that entices the robot to move in a specific direction. We evaluate our framework on a Husky Automatic Guided Vehicle (AGV) in a number of virtual and real-world environments, and validate that our framework is able to adapt to unfamiliar environments and is robust to various environmental conditions.
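As a rough illustration of the idea (not the authors’ actual implementation), the sketch below shows how a waypoint from the planner could be turned into a virtual guide drawn on a camera frame: the bearing from the robot’s estimated pose to the next waypoint is mapped to a horizontal image position and marked with a yellow ball. The function and parameter names (render_virtual_guide, fov_deg, guide_radius) are hypothetical.

```python
import numpy as np
import cv2


def render_virtual_guide(frame, robot_pose, waypoint, fov_deg=90.0, guide_radius=12):
    """Draw a yellow 'virtual guide' on `frame` at the image column corresponding
    to the bearing from the robot to the next waypoint.
    `robot_pose` is a planar (x, y, yaw) estimate; `waypoint` is (x, y)."""
    x, y, yaw = robot_pose
    wx, wy = waypoint

    # Bearing to the waypoint in the robot (camera) frame, wrapped to [-pi, pi].
    bearing = np.arctan2(wy - y, wx - x) - yaw
    bearing = (bearing + np.pi) % (2 * np.pi) - np.pi

    h, w = frame.shape[:2]
    half_fov = np.deg2rad(fov_deg) / 2.0

    # Map the bearing linearly onto the horizontal axis and clamp to the image
    # borders; positive bearings (waypoint to the left) land on the left half.
    u = int(np.clip((0.5 - bearing / (2.0 * half_fov)) * w, guide_radius, w - guide_radius))
    v = int(0.6 * h)  # fixed vertical placement, an arbitrary choice for this sketch

    guided = frame.copy()
    cv2.circle(guided, (u, v), guide_radius, (0, 255, 255), thickness=-1)  # yellow in BGR
    return guided
```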
Results/Contributions

Autonomous navigation has been attracting attention in recent years for controlling autonomous guided vehicles (AGVs) (e.g., Fig. 1), as self-driving cars [1], autonomous drones, and domestic delivery robots have become increasingly prevalent in everyday life. Modern approaches to autonomous navigation are conventionally built on the fusion of multiple sensors, and mostly split the task into two basic behaviors: local planning, which creates trajectories from a robot’s current position to a given local target; and path following, which guides the robot along a general direction to reach a desired goal. A popular means of achieving this objective is the adoption of LIDAR, which delivers high accuracy and stable performance. Although LIDAR is widely used and studied, its high cost makes it impractical for widespread deployment or low-budget projects. On the other hand, vision-based navigation, which makes use of rich information from the unstructured physical world, offers a lower-cost alternative. Interpreting and representing visual inputs to perform actions and interact with objects, however, is especially challenging in unstructured environments, as color images are typically complex and noisy [2, 3]. This makes it especially difficult to design rule-based robots that can comprehend visual scene semantics and navigate to specific desired destinations.


We propose a new modular framework (Fig. 2) for addressing the reality gap in the vision domain and navigating a robot via virtual signals. Our robot uses a single monocular camera for navigation, without assuming the use of LIDAR, a stereo camera, or odometry information from the robot. The proposed framework consists of four modules: a localization module, a planner module, a perception module, and a local controller module. The localization module is responsible for estimating the current pose of the robot from the visual input and conveying it to the planner module. The planner module then constructs a path between the robot’s current location and the desired goal, defining a global direction for the robot to follow. This global direction is then reinterpreted by our framework as a tendency, which is rendered as the “virtual guide” on the semantic segmentation generated by the perception module (the segmentation serves as the meta-state representation relating the modules in our framework [10]) and passed to the local controller module. The virtual guide is depicted as a yellow ball in this work; however, it is not restricted to any specific form of representation. In the proposed framework, the role of the virtual guide is similar to that of a carrot (i.e., a lure) that entices the robot to move in a specific direction. The local controller module is implemented as an RL agent and is trained entirely in simulated environments, with the aim of learning a policy for chasing the virtual guide while avoiding collisions with obstacles. In the execution phase, the local controller module receives actual segmented images with virtual guidance, enabling the agent to complete obstacle-avoidance and guide-following tasks simultaneously in the real world. Compared to conventional navigation approaches, our methodology is not only highly adaptable to diverse environments, but also generalizes to complex scenarios.
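To make the data flow between the four modules concrete, the following is a minimal control-loop sketch under the assumption that each module exposes a single method. The class and method names (localizer.estimate_pose, planner.plan, perception.segment, controller.act) are placeholders rather than the framework’s actual API, and render_virtual_guide refers to the illustrative sketch shown earlier.

```python
def navigation_step(frame, goal, localizer, planner, perception, controller):
    """One control iteration of the modular pipeline: localize, plan,
    render the virtual guide on the segmented frame, then query the RL policy."""
    pose = localizer.estimate_pose(frame)    # localization module: monocular pose estimate
    waypoints = planner.plan(pose, goal)     # planner module: global path and waypoints
    next_waypoint = waypoints[0]             # nearest intermediate waypoint

    seg = perception.segment(frame)          # perception module: semantic segmentation (meta-state)
    guided_seg = render_virtual_guide(seg, pose, next_waypoint)  # overlay the "virtual guide"

    # Local controller module: an RL policy trained in simulation to chase the
    # guide while avoiding obstacles; it outputs a motion command for the AGV.
    return controller.act(guided_seg)
```

Because the modules communicate only through the segmented, guide-annotated image, the same trained controller can in principle be reused across simulated and real environments.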


Keywords
Robot Navigation
Contact Information
李濬屹
cylee@cs.nthu.edu.tw