Image: Interaction between the pre- and post-synapse, tuned by the modulatory terminal, to realize the biological R-STDP characteristic. Illustration of an application scenario of the R-STDP algorithm in AI agents, in which a robot can locate and track an apple from a distance with low time and energy consumption.
Credit: ©Science China Press
Spiking neural networks (SNNs), one type of neuromorphic computing paradigm, have attracted great attention for their low latency and high energy efficiency in artificial intelligence (AI) tasks. Spike-timing-dependent plasticity (STDP), a bio-inspired learning rule, enables SNNs to achieve outstanding performance in learning and recognizing static samples. However, a network trained with STDP adapts poorly to dynamic environments. Inspired by the dopamine-dependent tuning of synaptic plasticity in biological systems, reward-modulated STDP (R-STDP) has been proposed to train SNN weights for adaptation to dynamic surroundings. However, a large number of devices is required to mimic R-STDP behavior, leading to complex hardware architectures and high power consumption.
To address this issue, a research team led by Prof. Qi Liu, Prof. Du Xiang, and Prof. Xumeng Zhang from Fudan University recently demonstrated R-STDP synaptic plasticity within a single two-dimensional ferroelectric floating-gate heterojunction device (graphene/CuInP2S6/MoS2) and constructed an SNN to simulate robotic capture tasks for both static and dynamic objects. In this floating-gate heterostructure device, independently applied bias and gate voltages induce ferroelectric polarization in CuInP2S6 (CIPS), which lowers the interfacial barrier height and facilitates carrier injection into the floating gate. As a result, the device shows pronounced non-volatile memory characteristics and long-term synaptic plasticity. Building on this tuning effect, Vg pulses were used as the reward signal to modulate the STDP behavior governed by the Vds pulses, enabling switching between STDP and anti-STDP. R-STDP was thus realized within a single device, which significantly simplifies the hardware architecture compared with previous reports.
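As a rough illustration of how a reward signal can switch a synapse between STDP and anti-STDP, the following Python sketch applies a signed reward to a textbook exponential STDP window. It is a behavioral toy model, not the device model from the study: the function names, the amplitudes `a_plus` and `a_minus`, the time constant `tau_ms`, and the weight bounds are all assumptions chosen for illustration.

```python
import math


def stdp_window(dt_ms, a_plus=0.05, a_minus=0.05, tau_ms=20.0):
    """Textbook exponential STDP window (hypothetical parameters).

    dt_ms = t_post - t_pre in milliseconds.
    """
    if dt_ms >= 0:
        # Pre-synaptic spike precedes post-synaptic spike: potentiation.
        return a_plus * math.exp(-dt_ms / tau_ms)
    # Post-synaptic spike precedes pre-synaptic spike: depression.
    return -a_minus * math.exp(dt_ms / tau_ms)


def r_stdp_update(weight, dt_ms, reward, w_min=0.0, w_max=1.0):
    """Reward-modulated update: the reward sign sets the plasticity polarity.

    reward = +1 keeps the ordinary STDP polarity;
    reward = -1 inverts it (anti-STDP).
    The result is clipped to the assumed range [w_min, w_max].
    """
    dw = reward * stdp_window(dt_ms)
    return min(w_max, max(w_min, weight + dw))


if __name__ == "__main__":
    w0 = 0.5
    print(r_stdp_update(w0, dt_ms=5.0, reward=+1))  # potentiated under STDP
    print(r_stdp_update(w0, dt_ms=5.0, reward=-1))  # depressed under anti-STDP
```

In this sketch the reward plays the role that the Vg pulse plays in the device, while the spike-timing term stands in for the Vds-governed STDP response; the real device characteristics would replace the idealized exponential window.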
Based on the R-STDP behavior of the device, an SNN was constructed to simulate a robot capturing both static and dynamic targets. In the simulation, the robot demonstrated excellent performance for both static and dynamic objects. In particular, the network achieved a success rate of up to 85.5% in the dynamic capture task and maintained a rate of around 80% even when the device-to-device and cycle-to-cycle variations reached 10%, demonstrating the strong robustness of the network. In summary, this work proposes a compact hardware structure that mimics R-STDP within a single device, facilitating the development of compact and efficient interactive AI agents.
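One common way to test this kind of robustness in simulation is to perturb each weight update with random gain factors. The Python sketch below assumes multiplicative Gaussian noise with a 10% standard deviation for both variation types; the release does not say how the variations were modeled, so the noise model and the helper names `make_device_gains` and `noisy_update` are hypothetical.

```python
import random


def make_device_gains(n_devices, d2d_sigma=0.10, seed=0):
    """Draw one fixed gain per device (device-to-device variation).

    Assumption: gains are Gaussian around 1.0 with std d2d_sigma.
    """
    rng = random.Random(seed)
    return [rng.gauss(1.0, d2d_sigma) for _ in range(n_devices)]


def noisy_update(weight, ideal_dw, d2d_gain, c2c_sigma=0.10, rng=random):
    """Apply a weight change perturbed by device and cycle variation.

    d2d_gain is fixed per device; the cycle-to-cycle gain is redrawn
    on every call, i.e. on every programming operation.
    The weight is clipped to the assumed range [0, 1].
    """
    c2c_gain = rng.gauss(1.0, c2c_sigma)
    return min(1.0, max(0.0, weight + ideal_dw * d2d_gain * c2c_gain))


if __name__ == "__main__":
    gains = make_device_gains(4)
    rng = random.Random(1)
    weights = [0.5] * 4
    # The same ideal update lands slightly differently on each device.
    weights = [noisy_update(w, 0.1, g, rng=rng) for w, g in zip(weights, gains)]
    print(weights)
```

Running the same training loop with and without such perturbations, and comparing success rates, is how a robustness figure like the one reported above would typically be obtained.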
Looking ahead, the research team plans to extend the work by fabricating ferroelectric memtransistor arrays using large-area ferroelectric films such as barium titanate (BTO), bismuth ferrite (BFO), or hafnium zirconium oxide (HZO). The writing/erasing speed of the memtransistor might be further improved to the microsecond or nanosecond range by optimizing the band alignment and interface of the ferroelectric heterostructure. Moreover, the programming voltage could be further reduced by thinning the ferroelectric dielectric. By boosting the speed and lowering the voltage together, it may be possible to bring the energy consumption per operation down to the sub-femtojoule level, which is comparable to that of a biological neural network. Furthermore, to implement the R-STDP-trained SNN at the hardware level, the peripheral circuits integrated with the ferroelectric memtransistors should be carefully co-developed, which is important for ensuring reliable voltage regulation, low-latency data conversion, and highly efficient power management across the whole system.
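To see why speed and voltage matter together, a first-order estimate treats the energy of one programming pulse as voltage times current times pulse width. The Python sketch below uses purely hypothetical operating points (they are not measurements from the study) and assumes a constant current during the pulse.

```python
FEMTOJOULE = 1e-15  # joules


def pulse_energy_joules(voltage_v, current_a, width_s):
    """First-order pulse energy E = V * I * t (constant current assumed)."""
    return voltage_v * current_a * width_s


if __name__ == "__main__":
    # Hypothetical operating points, for illustration only.
    slow = pulse_energy_joules(voltage_v=2.0, current_a=1e-9, width_s=1e-6)
    fast = pulse_energy_joules(voltage_v=1.0, current_a=1e-9, width_s=100e-9)
    print(slow / FEMTOJOULE, "fJ")  # above 1 fJ
    print(fast / FEMTOJOULE, "fJ")  # below 1 fJ
```

Under these assumed values, halving the voltage and shortening the pulse tenfold lowers the estimated energy by a factor of twenty, which is the kind of combined scaling the outlook above relies on.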
As interactive artificial intelligence continues to develop, this demonstration of the R-STDP learning rule in a single van der Waals ferroelectric memtransistor opens the door to compact neuromorphic systems for robotic recognition and tracking, and should further stimulate research and development of interactive AI agents for dynamic and complex learning environments.
For details, see the original article by Y. Cao et al., “Reward-modulated spike-timing-dependent plasticity in van der Waals ferroelectric memtransistor for robotic recognition and tracking,” Sci. Bull. (2025), doi: 10.1016/j.scib.2025.05.044.