image: Researchers used a multilayered multimodal latent Dirichlet allocation model to integrate bodily signals, sensory information and language from human participants. By learning emotion concepts from multimodal data and evaluating their consistency with human emotional categories, this computational model provides valuable insights into the mechanisms underlying human emotion formation.
Credit: Assistant Professor Chie Hieida from the Nara Institute of Science and Technology, Japan
Ikoma, Japan—Emotions are a fundamental part of human psychology and a complex process that has long distinguished us from machines. Even advanced artificial intelligence (AI) lacks the capacity to feel. However, researchers are now exploring whether the formation of emotions can be computationally modeled, providing machines with a deeper, more human-like understanding of emotional states.
In this vein, Assistant Professor Chie Hieida from the Nara Institute of Science and Technology (NAIST), Japan, in collaboration with Assistant Professor Kazuki Miyazawa and then-master's student Kazuki Tsurumaki from Osaka University, Japan, has been exploring computational approaches to modeling the formation of emotions. In a recent study, the team built a computational model that aims to explain how humans may form the concept of emotion. The study was made available online on July 3, 2025, and was published in Volume 16, Issue 4 of the journal IEEE Transactions on Affective Computing on December 3, 2025.
This model is based on the theory of constructed emotion, which proposes that emotions are not innate reactions but are built in the moment by the brain. Emotions arise from integrating internal bodily signals (interoception, like heart rate) with external sensory information (exteroception, like sight and sound), allowing the brain to create a concept, not just a reflex.
“Although there are theoretical frameworks addressing how emotions emerge as concepts through information processing, the computational processes underlying this formation remain underexplored,” says Dr. Hieida.
To model this process, the research team used multilayered multimodal latent Dirichlet allocation (mMLDA), a probabilistic generative model designed to discover hidden statistical patterns and categories by analyzing how different types of data co-occur, without being pre-programmed with emotional labels.
The model was trained on unlabeled data collected from human participants who viewed emotion-evoking images and videos. The system was never told which data corresponded to emotions such as fear, joy, or sadness; instead, it was left to identify patterns on its own.
Twenty-nine participants viewed 60 images from the International Affective Picture System, a stimulus set widely used in psychological research. While they viewed the images, the researchers recorded physiological responses such as heart rate with wearable sensors and collected the participants' verbal descriptions. Together, these data captured how people interpret emotions: what they see, how their bodies respond, and how they describe experiencing them.
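To make the general idea concrete, the sketch below uses an ordinary single-layer latent Dirichlet allocation from scikit-learn over a joint bag of discretized multimodal features. It is a simplified, hypothetical illustration of unsupervised category formation from co-occurring signals, not the authors' multilayered mMLDA implementation; all feature names and data in it are synthetic placeholders.

```python
# Minimal sketch only: a standard single-layer LDA over a joint
# "bag of multimodal features". This is NOT the study's mMLDA model,
# and all data below are synthetic placeholders.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Hypothetical discretized symbols, one column per symbol:
#   - visual codewords from the viewed images
#   - binned physiological readings (e.g., heart-rate level)
#   - words from participants' verbal descriptions
feature_names = (
    [f"vis_{i}" for i in range(20)]      # visual codewords
    + [f"physio_{i}" for i in range(5)]  # physiology bins
    + [f"word_{i}" for i in range(30)]   # description vocabulary
)

# Synthetic count matrix: one row per viewing trial, counting how often
# each symbol occurred in that trial (placeholder random counts here).
n_trials = 200
X = rng.poisson(lam=0.5, size=(n_trials, len(feature_names)))

# Fit an unsupervised topic model; each latent "topic" plays the role of
# an emergent emotion-like category that ties the modalities together,
# learned without ever seeing labels such as "joy" or "fear".
lda = LatentDirichletAllocation(n_components=4, random_state=0)
theta = lda.fit_transform(X)            # per-trial category proportions
predicted_category = theta.argmax(axis=1)

# Inspect which multimodal symbols most strongly characterize each category.
for k, weights in enumerate(lda.components_):
    top = [feature_names[i] for i in weights.argsort()[::-1][:5]]
    print(f"category {k}: {top}")
```

In the study itself, the multilayered structure of mMLDA goes beyond this flat sketch by integrating the modalities hierarchically, but the unsupervised, label-free principle is the same.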
When the trained model's emotion concepts were compared with participants' self-reported emotional evaluations, the agreement rate was about 75%, significantly higher than would be expected by chance. This suggests that the model formed emotion concepts that closely match how people experience emotions.
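For context on how such a figure can be checked against chance, a categorical agreement rate can be compared with a permutation baseline. The sketch below uses synthetic placeholder labels (four hypothetical categories over 60 trials) purely to illustrate that comparison; it does not reproduce the study's statistical analysis.

```python
# Illustrative only: comparing an agreement rate against a chance baseline
# via a permutation test, using synthetic placeholder labels.

import numpy as np

rng = np.random.default_rng(0)

def agreement(a, b):
    """Fraction of trials on which two label sequences agree."""
    return float(np.mean(np.asarray(a) == np.asarray(b)))

# Hypothetical labels over 4 categories: model labels mostly match the
# self-reports, mimicking a ~75% agreement scenario.
human = rng.integers(0, 4, size=60)
model = np.where(rng.random(60) < 0.75, human, rng.integers(0, 4, size=60))

observed = agreement(model, human)

# Permutation baseline: shuffle the human labels many times to estimate
# the agreement expected by chance, then compare the observed value.
null = np.array([agreement(model, rng.permutation(human)) for _ in range(10_000)])
p_value = float(np.mean(null >= observed))

print(f"observed agreement: {observed:.2f}")
print(f"chance level (mean of null): {null.mean():.2f}")
print(f"permutation p-value: {p_value:.4f}")
```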
By modeling emotion formation in a way that mirrors human experience, this research paves the way for more nuanced and responsive AI systems. “Integrating visual, linguistic, and physiological information into interactive robots and emotion-aware AI systems could enable more human-like emotion understanding and context-sensitive responses,” says Dr. Hieida.
Moreover, because the model can infer emotional states that people may struggle to express in words, it could be particularly useful in mental health support, healthcare monitoring, and assistive technologies for conditions such as developmental disorders or dementia.
“This research has important implications for both society and industry, as it provides a computational framework that connects emotion theory with empirical validation, addressing the long-standing question of how emotions are formed,” concludes Dr. Hieida.
###
Resource
Title: Study of Emotion Concept Formation by Integrating Vision, Physiology, and Word Information Using Multilayered Multimodal Latent Dirichlet Allocation
Authors: Kazuki Tsurumaki, Chie Hieida, and Kazuki Miyazawa
Journal: IEEE Transactions on Affective Computing
DOI: 10.1109/TAFFC.2025.3585882
Information about the Laboratory of Mathematical Informatics can be found at the following websites: https://www.hieida.com/ and https://sites.google.com/view/milab/home
About Nara Institute of Science and Technology (NAIST)
Established in 1991, Nara Institute of Science and Technology (NAIST) is a national university located in Kansai Science City, Japan. In 2018, NAIST underwent an organizational transformation to promote and continue interdisciplinary research in the fields of biological sciences, materials science, and information science. Known as one of the most prestigious research institutions in Japan, NAIST places a strong emphasis on integrated research and collaborative co-creation with diverse stakeholders. NAIST envisions conducting cutting-edge research in frontier areas and training students to become tomorrow's leaders in science and technology.
Journal
IEEE Transactions on Affective Computing
Method of Research
Computational simulation/modeling
Subject of Research
Not applicable
Article Title
Study of Emotion Concept Formation by Integrating Vision, Physiology, and Word Information Using Multilayered Multimodal Latent Dirichlet Allocation
Article Publication Date
3-Dec-2025
COI Statement
The authors declare they have no competing interests.