Feature Story | 11-Jul-2025

Innovating echocardiography video segmentation with temporal and noise-resilient techniques for refined cardiovascular diagnostic imaging

The Hong Kong Polytechnic University

Cardiovascular diseases are a leading health concern in Hong Kong, prompting many to undergo regular heart check-ups for their early detection and management. Echocardiography, a key diagnostic imaging tool, plays a crucial role in assessing heart function, offering non-invasive insights into cardiovascular health and aiding in timely intervention. However, interpreting these ultrasound images manually is challenging due to speckle noise and ambiguous boundaries, requiring significant expertise and time. Consequently, a heart check-up is rarely included in a regular annual body check scheme.

Prof. Harry QIN, Professor of the School of Nursing at The Hong Kong Polytechnic University, and his team have developed a novel model, MemSAM, which aims to revolutionise echocardiography video segmentation by adapting the artificial intelligence (AI) model Segment Anything Model (SAM) from Meta AI to meet the specific demands of medical imaging.

MemSAM introduces a unique approach to echocardiography video segmentation through a temporal-aware and noise-resilient prompting scheme. Released by Meta, SAM is an advanced AI model dedicated to image segmentation which can quickly identify elements in any image and segment the elements.

While traditional SAM applications excel in natural image segmentation, their direct application to medical videos has been limited due to the lack of temporal consistency and the presence of significant noise. MemSAM addresses these issues by incorporating a space-time memory mechanism that captures both spatial and temporal information, ensuring consistent and accurate segmentation across video frames.

The introduction of MemSAM has the potential to substantially mitigate financial and specialised expertise requirements, potentially alleviating the burden associated with prolonged wait times for advanced cardiac imaging modalities. Furthermore, it could enable the incorporation of simplified cardiac assessments into routine health screenings, enhancing accessibility and early detection.

Echocardiography videos are notoriously difficult to segment due to several inherent challenges. The presence of massive speckle noise and numerous artifacts, coupled with the ambiguous boundaries of cardiac structures, complicates the segmentation process.

Moreover, the dynamic nature of heart movements results in large variations of target objects across frames. MemSAM's memory reinforcement mechanism enhances the quality of memory prompts by leveraging predicted masks, effectively mitigating the adverse effects of noise and improving segmentation accuracy.

A standout feature of MemSAM is its ability to achieve state-of-the-art performance with limited annotation. In clinical practice, the labour-intensive nature of annotating echocardiographic videos often results in sparse labelling, typically restricted to key frames such as end-systole and end-diastole. MemSAM excels in a semi-supervised setting, demonstrating comparable performance to fully supervised models while requiring significantly fewer annotations and prompts.

MemSAM's efficacy has been rigorously tested on two public datasets, CAMUS and EchoNet-Dynamic, showcasing its superior performance over existing models. The model's ability to maintain high segmentation accuracy with minimal prompts is particularly noteworthy, highlighting its potential to streamline clinical workflows and reduce the burden on healthcare professionals.

The technology behind MemSAM is rooted in the integration of SAM with advanced memory prompting techniques. SAM, known for its powerful representation capabilities, has been adapted to handle the unique challenges posed by medical videos. The core innovation lies in the temporal-aware prompting scheme, which utilises a space-time memory to guide the segmentation process. This memory contains both spatial and temporal cues, allowing the model to maintain consistency across frames and avoid the pitfalls of misidentification caused by mask propagation.

The memory reinforcement mechanism is another critical component of MemSAM. Ultrasound images are often plagued by complex noise, which can degrade the quality of image embeddings. To counter this, MemSAM employs a reinforcement strategy that uses segmentation results to emphasise foreground features and reduce the impact of background noise. This approach not only enhances the discriminability of feature representations but also prevents the accumulation and propagation of errors in the memory.

MemSAM's architecture is built on SAMUS, a medical foundation which is in turn a model based on SAM and optimised for medical-used images. The model processes videos in a sequential frame-by-frame manner, relying on memory prompts rather than external prompts for subsequent frames. This design minimises the need for dense annotations and external prompts, making it particularly suited for semi-supervised tasks.

While MemSAM represents a significant leap forward in echocardiography video segmentation, future research aims to enhance the model's robustness, particularly in scenarios where initial frame quality is poor. Additionally, exploring the application of MemSAM across other medical imaging domains and optimising its computational efficiency are exciting avenues for further development.

MemSAM not only addresses the longstanding challenges of ultrasound video segmentation but also sets a new benchmark for integrating advanced machine learning techniques into medical imaging. By bridging the gap between cutting-edge technology and clinical application, MemSAM holds the promise of improving diagnostic accuracy and patient outcomes in cardiovascular care. This innovative model exemplifies the potential of AI to revolutionise healthcare, offering a glimpse into a future where automated, accurate and efficient diagnostic tools are the norm.

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.