News Release 3-Dec-2025

A survey of embodied learning for object-centric robotic manipulation

Peer-Reviewed Publication

Beijing Zhongke Journal Publishing Co. Ltd.

An illustration of a robotic manipulation system and the typology of embodied learning methods for object-centric robotic manipulation.
Fig. 1(a) illustrates a typical robotic manipulation system: a robotic arm equipped with sensors such as cameras and with end-effectors such as grippers, enabling it to manipulate a wide range of objects. The system's intelligence rests on three key aspects, corresponding to the three types of embodied learning methods depicted in Fig. 1(b): 1) advanced perception, which uses data captured by different sensors to understand the target object and the external environment; 2) precise policy generation, which analyzes the perceived information to make optimal decisions; and 3) task orientation, which adapts the system to specific tasks by optimizing the execution process for maximum effectiveness.

Credit: Beijing Zhongke Journal Publishing Co. Ltd.

Over the past decade, machine learning research, driven largely by deep learning, has made remarkable progress, revolutionizing applications such as computer vision and natural language processing. Unlike traditional machine learning methods, which rely solely on pre-constructed datasets for pattern recognition and prediction, embodied learning, a cornerstone of embodied AI, aims to give intelligent agents the ability to perceive their environment and make decisions. Embodied learning allows robots to learn through physical interaction with the environment and feedback from sensors, enabling them to adapt to new situations. It emphasizes the robot's embodiment and the acquisition of knowledge through physical interaction and practical experience. Its data sources span a broad spectrum, including sensory inputs, bodily actions, and immediate environmental feedback. This learning mechanism is highly dynamic, continuously refining behaviors and manipulation strategies through real-time interaction and feedback loops. Embodied learning is essential in robotics because it gives robots greater environmental adaptability, enabling them to handle changing conditions and undertake more complex tasks.

While a plethora of embodied learning methods have been proposed, this survey, published in Machine Intelligence Research, focuses primarily on object-centric robotic manipulation. The inputs for this task are data collected from sensors, and the outputs are operational strategies and control signals that the robot uses to perform manipulation tasks. The objective is to enable the robot to perform diverse object-centric manipulation tasks efficiently and autonomously while improving its generality and flexibility across environments and tasks. The task is highly challenging because of the diversity of objects and manipulation tasks, the complexity and uncertainty of the environment, and real-world issues such as noise, occlusion, and real-time constraints.

In recent years, extensive research has been conducted on these three key aspects, particularly with the rise of large language models (LLMs), neural radiance fields (NeRFs), diffusion models, and 3D Gaussian splatting, leading to a host of innovative solutions. However, no comprehensive survey has yet captured the latest research in this rapidly evolving field. This gap motivated the researchers to write this survey, which systematically reviews cutting-edge advances and summarizes the open challenges and prospective research directions.

Over the past few years, many survey articles on embodied AI and robot learning have appeared, covering domains such as navigation, planning, grasping, and manipulation. In a table summarizing recent surveys on embodied AI and robot learning, the researchers compare these surveys with their own work using two key criteria: timeliness and systematicness. Timeliness assesses whether the reviewed papers are up to date and cover the latest research; specifically, a survey is considered timely if it includes work from the past three years, i.e., work published in 2022 or later. Systematicness applies specifically to surveys related to robotic manipulation (RM): a survey that addresses only certain aspects of RM, such as datasets or imitation learning, is deemed lacking in systematicness.

As the table shows, RM is the topic of the largest number of surveys, indicating the significance of research in this field. In addition, although robotic grasping (RG) can be considered the foundation of RM, it is often studied as an independent field because it involves many subtasks and specific problems. Existing surveys focus primarily on specific aspects of robotic manipulation, such as deep learning-based grasp synthesis and manipulation policy learning, while some of the latest surveys delve into recent advances in vision-language-action models and large language model-based autonomous agents. The researchers' survey is unique in that it provides a comprehensive overview of embodied learning methods for object-centric robotic manipulation, encompassing embodied perceptual learning, policy learning, and task-oriented learning.

The work most closely related to this survey is the survey by Cong et al. (2021), which primarily reviews research on 3D vision-based robotic manipulation up to 2021. In contrast, the researchers' work is not limited to specific input modalities; it systematically summarizes and categorizes representation methods based on 2D images, 3D-aware techniques, and tactile sensing. Moreover, it provides a comprehensive introduction to critical aspects of robotic manipulation, such as policy learning and task-oriented learning. Notably, the survey covers a wide range of recent research, mainly published after 2021, offering a more up-to-date and comprehensive perspective. Therefore, this work stands out as the only survey in the RM field that combines both timeliness and systematicness. The researchers hope it will serve as a valuable reference for researchers and practitioners in embodied learning for object-centric robotic manipulation.

To perform object-centric robotic manipulation, the robot must first learn to perceive the target object and its surrounding environment, which involves data representation, object pose estimation, and affordance learning. In Section 2, researchers provide a comprehensive overview of these works.

Embodied policy learning aims to give robots the sophisticated decision-making capabilities required to perform manipulation tasks efficiently. Section 3 divides embodied policy learning into two fundamental phases, policy representation and policy learning, and explains how these techniques enable robots to accomplish predefined objectives.

Embodied task-oriented learning not only involves strategic planning based on powerful perception but also requires robots to understand how their physical attributes influence decision-making and task execution. It helps robots develop the ability to make decisions in complex and dynamic scenarios. Existing work on embodied task-oriented learning centers on two domains: object grasping and object manipulation. Section 4 introduces methods tailored to these two tasks, showing how embodied learning improves the efficiency and precision of robots.

In Section 5, the researchers introduce the primary datasets and evaluation metrics in robotic manipulation. Datasets are categorized into object grasping and object manipulation. For evaluation, object grasping uses metrics such as accuracy and grasp success rate, while object manipulation primarily uses task success rate.
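Both success-rate metrics mentioned above come down to the fraction of evaluation attempts that meet a success criterion. The short Python sketch below illustrates that arithmetic; it is not taken from the survey, the `Trial` record and the task names are hypothetical, and individual benchmarks may define "success" differently (for example, requiring an object to stay in the gripper for a set time).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluation attempt (hypothetical record format, for illustration only)."""
    task: str      # e.g. "grasp" or a manipulation task name such as "open_drawer"
    success: bool  # whether the attempt met the benchmark's success criterion


def success_rate(trials: list[Trial]) -> float:
    """Fraction of successful trials: grasp success rate or task success rate."""
    if not trials:
        raise ValueError("success rate is undefined for zero trials")
    return sum(t.success for t in trials) / len(trials)


def per_task_success_rate(trials: list[Trial]) -> dict[str, float]:
    """Success rate computed separately for each task name."""
    groups: dict[str, list[Trial]] = defaultdict(list)
    for t in trials:
        groups[t.task].append(t)
    return {task: success_rate(ts) for task, ts in groups.items()}


if __name__ == "__main__":
    trials = [
        Trial("grasp", True), Trial("grasp", True),
        Trial("grasp", False), Trial("grasp", True),
        Trial("open_drawer", True), Trial("open_drawer", False),
    ]
    print(per_task_success_rate(trials))   # {'grasp': 0.75, 'open_drawer': 0.5}
    print(round(success_rate(trials), 3))  # 0.667
```

Reporting rates per task, rather than only a single pooled figure, avoids letting easy tasks with many trials mask failures on harder ones.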

With the continuous advancement of artificial intelligence, machine learning, and robotics technology, intelligent robots will be applied more extensively and deeply across various fields. Section 6 explores applications of embodied learning for object-centric robotic manipulation across various fields, including industrial robots, agricultural robots, domestic robots, surgical robots, and other areas like space exploration, education and research.

In the past few years, there has been a significant increase in research on embodied learning methods for object-centric robot manipulation tasks, leading to rapid development in this field. However, current technology still faces some highly challenging issues. Further exploration of these issues will be crucial in promoting the widespread application of intelligent robots in various fields. Section 7 discusses several challenges and potential future research directions.

Section 8 concludes the paper. In it, the researchers summarize their comprehensive survey of existing methods for embodied learning in object-centric robotic manipulation. They begin by introducing the task and its essential components and then compare their work with related survey articles. Next, they systematically present the main works across three categories. They then explore commonly used datasets and evaluation metrics and highlight some representative applications. Finally, they discuss the challenges and suggest promising directions for future research. The researchers hope the survey will give readers a comprehensive understanding of, and new insights into, this emerging field.

See the article:

A Survey of Embodied Learning for Object-centric Robotic Manipulation

http://doi.org/10.1007/s11633-025-1542-8

Journal

Machine Intelligence Research

DOI

10.1007/s11633-025-1542-8

Article Title

A Survey of Embodied Learning for Object-centric Robotic Manipulation

Article Publication Date

20-Jun-2025

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.