KAIST's VOTP Tech Learns Robot Behavior From Just a Few Videos

For years, the standard way of training physical AI has been a labor-intensive grind. To teach a robot a task, whether suturing tissue in an operating room or navigating a complex urban intersection, engineers have relied on human operators to provide thousands of individual feedback signals. This process, used to construct the reward function that guides learning, requires people to manually evaluate large volumes of behavioral data and judge which actions are 'good' or 'bad'. The result is a major bottleneck in robotics: the cost and time needed to generate enough training data grow steeply with the complexity of the task.
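To make the conventional pipeline concrete, here is a minimal sketch of preference-based reward learning in its common textbook form, assuming the Bradley-Terry preference model and a linear reward over per-step features. It is not KAIST's code. The function names, the two-dimensional features, and the simulated labeler `true_w` are illustrative assumptions. Each `(preferred, rejected)` pair stands in for one human judgment.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def feature_sum(traj):
    # A trajectory is a list of per-step feature vectors; sum each feature over time.
    return [sum(step[i] for step in traj) for i in range(len(traj[0]))]

def trajectory_return(w, traj):
    # Return under a linear per-step reward r(step) = w . step.
    return sum(wi * fi for wi, fi in zip(w, feature_sum(traj)))

def fit_reward(pairs, dim, lr=0.05, epochs=50):
    """Maximize the Bradley-Terry log-likelihood of (preferred, rejected) pairs,
    where P(a preferred over b) = sigmoid(return(a) - return(b))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            delta = [a - b for a, b in zip(feature_sum(preferred), feature_sum(rejected))]
            p = sigmoid(sum(wi * di for wi, di in zip(w, delta)))
            # Gradient of log sigmoid(w . delta) with respect to w is (1 - p) * delta.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, delta)]
    return w

# Simulated labeler: a hidden reward stands in for the human judge (an assumption).
rng = random.Random(0)
true_w = [1.0, -0.5]

def random_traj(steps=5):
    return [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(steps)]

def human_label(a, b):
    # Returns (preferred, rejected) according to the hidden reward.
    if trajectory_return(true_w, a) >= trajectory_return(true_w, b):
        return a, b
    return b, a

labels = [human_label(random_traj(), random_traj()) for _ in range(100)]  # 100 "human" judgments
w = fit_reward(labels, dim=2)
```

Every training pair costs one human evaluation, so the labeling budget grows directly with how much data the reward model needs. That is the bottleneck described above.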

KAIST Introduces Video-Based Optimal Transport for Preference Learning

On June 10, a research team led by Professor Chang-Dong Yoo of the KAIST School of Electrical Engineering unveiled a new approach to this problem: Video-based Optimal Transport for Preference (VOTP). The team set out to mimic the human ability to learn a new task from a handful of demonstrations rather than from exhaustive trial-and-error feedback. Given a small set of video clips showing both successful and unsuccessful executions of a task, the VOTP algorithm lets an AI extract human preference patterns on its own. This replaces much of the traditional, costly human evaluation pipeline with a few visual examples, addressing a fundamental barrier to the commercial deployment of physical AI.
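The core idea, letting a few labeled clips stand in for large-scale human labeling, can be illustrated with a simple label-propagation sketch. This is a hypothetical simplification, not the VOTP algorithm. It assumes each clip is a list of frame feature vectors (in practice these might come from a pretrained video encoder), compares clips by the Euclidean distance between their average frame features, and uses helper names invented for illustration. The output is a set of pseudo-preference pairs that a preference-based learner could consume in place of human judgments.

```python
import math

def mean_embedding(clip):
    # Average the frame feature vectors of a clip (discards timing, for brevity).
    dim = len(clip[0])
    return [sum(frame[d] for frame in clip) / len(clip) for d in range(dim)]

def clip_distance(clip_a, clip_b):
    # Euclidean distance between the clips' average frame features.
    ea, eb = mean_embedding(clip_a), mean_embedding(clip_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))

def success_score(clip, successes, failures):
    # Positive if the clip is closer to its nearest success than to its nearest failure.
    d_success = min(clip_distance(clip, s) for s in successes)
    d_failure = min(clip_distance(clip, f) for f in failures)
    return d_failure - d_success

def pseudo_preferences(unlabeled, successes, failures, margin=0.0):
    """Score unlabeled clips and emit (preferred_index, rejected_index) pairs."""
    scores = [success_score(c, successes, failures) for c in unlabeled]
    pairs = []
    for i in range(len(unlabeled)):
        for j in range(i + 1, len(unlabeled)):
            if scores[i] - scores[j] > margin:
                pairs.append((i, j))
            elif scores[j] - scores[i] > margin:
                pairs.append((j, i))
    return scores, pairs

# One labeled success and one labeled failure (hypothetical 2-D frame features).
successes = [[[0.9, 0.8], [1.0, 1.0]]]
failures = [[[0.1, 0.0], [0.0, 0.1]]]
unlabeled = [
    [[0.8, 0.9], [0.9, 1.0]],  # resembles the success
    [[0.2, 0.1], [0.1, 0.2]],  # resembles the failure
    [[0.5, 0.5], [0.5, 0.6]],  # somewhere in between
]
scores, pairs = pseudo_preferences(unlabeled, successes, failures)
```

Here two labeled clips yield three preference pairs among the unlabeled ones. Averaging frames throws away timing information, so a sequence-aware distance would be a natural replacement for `clip_distance`.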

Global Recognition at ICML 2026

The work has also earned strong recognition from the machine learning community. The paper, titled Video-Based Optimal Transport for Feedback-Efficient Offline Preference-Based Reinforcement Learning, was accepted to the International Conference on Machine Learning (ICML) 2026. Of 23,918 submissions, it was one of only 168 selected for oral presentation, roughly the top 0.7% of all submissions. The project was led by doctoral student Luu Minh Tung and supported by the Ministry of Science and ICT, the Institute of Information and Communications Technology Planning and Evaluation (IITP), and the National Research Foundation of Korea (NRF). The findings will be presented at the conference in Seoul this July.

How Optimal Transport Powers Generalization

The core innovation of VOTP lies in applying optimal transport theory to interpret human intent from sparse video data. Optimal transport computes the lowest-cost way to move one probability distribution onto another, which makes it a natural tool for comparing sequences of video frames. Rather than simply mimicking the visual input, the AI uses this framework to quantify human preferences and integrate them into its learning model. This allows the system to grasp the intent behind an action and generalize that knowledge to novel environments. In testing, the research team showed that the model could adapt its behavior to new scenarios even with an extremely limited amount of data, and reported data-efficiency gains of several orders of magnitude over conventional reinforcement learning methods.
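As a rough illustration of the optimal transport idea, the sketch below computes an entropy-regularized OT cost between two clips with the standard Sinkhorn algorithm, treating each clip as a uniform distribution over its frame features. The article does not describe VOTP's exact formulation, so the squared-Euclidean cost, the regularization strength `epsilon`, the toy one-dimensional features, and the function names are all assumptions. The example shows one property that makes OT attractive for video: a demonstration replayed at half speed still matches the original closely, while a clip that never leaves its starting pose does not.

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance between two frame feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sinkhorn_ot(clip_a, clip_b, epsilon=0.1, iters=200):
    """Entropy-regularized OT between two clips (lists of frame feature vectors).

    Each clip is treated as a uniform distribution over its frames. Returns the
    transport cost <plan, cost> and the transport plan itself.
    """
    n, m = len(clip_a), len(clip_b)
    cost = [[sq_dist(a, b) for b in clip_b] for a in clip_a]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    mu, nu = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan's marginals match mu and nu.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan

# Hypothetical 1-D "progress toward goal" features for three clips.
demo = [[0.0], [0.5], [1.0]]
slow_replay = [[0.0], [0.0], [0.5], [0.5], [1.0], [1.0]]  # same motion, half speed
stuck = [[0.0], [0.0], [0.0]]  # never leaves the start

cost_slow, plan_slow = sinkhorn_ot(demo, slow_replay)
cost_stuck, _ = sinkhorn_ot(demo, stuck)
```

A frame-by-frame comparison would penalize the slow replay for being out of step; OT instead matches each frame to wherever similar content appears. In practice Sinkhorn is often run in the log domain to avoid underflow when `epsilon` is small; the plain form is kept here for readability.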

Scaling Physical AI Across Industries

The implications of this technology extend far beyond basic robotics. The team validated the generalization capabilities of VOTP across a wide range of applications, including robotic arm control, humanoid robotics, autonomous driving, smart factory automation, drone navigation, and even AI agents designed to operate computer interfaces. By reducing reliance on massive human-labeled datasets, the approach could significantly lower the barrier to entry for companies deploying sophisticated physical AI systems. The future of physical AI will likely be defined not by the sheer volume of data collected, but by how efficiently algorithms can translate human intent into actionable, generalized behavior.