Image credit: HIGHER EDUCATION PRESS
Semi-supervised learning (SSL) aims to improve performance by exploiting unlabeled data when labels are scarce. Conventional SSL studies typically assume closed environments, in which important factors (e.g., the label space, feature space, and data distribution) are consistent between labeled and unlabeled data.
However, many practical tasks involve open environments, in which important factors are inconsistent between labeled and unlabeled data. Exploiting inconsistent unlabeled data has been reported to cause severe performance degradation, with results even worse than those of a simple supervised learning baseline. Because manually verifying the quality of unlabeled data is not desirable, it is important to study robust SSL with inconsistent unlabeled data in open environments.
This paper, published in Frontiers of Computer Science by Higher Education Press and Springer Nature, briefly introduces some advances in this line of research, focusing on techniques that address three kinds of mismatch between labeled and unlabeled data: label space mismatch, feature space mismatch, and distribution mismatch.
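To make the idea of label space mismatch concrete, the following is a minimal sketch, not the authors' method. It assumes a toy nearest-centroid classifier and a hypothetical distance threshold (`max_dist`): unlabeled points close to a known class centroid receive a pseudo-label, while distant points, which may belong to classes never seen in the labeled data, are set aside instead of being used for training. All function names and data values here are illustrative assumptions.

```python
from math import dist


def class_centroids(points, labels):
    """Return the mean point of each class in the labeled set."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def filter_and_pseudo_label(labeled_x, labeled_y, unlabeled_x, max_dist):
    """Pseudo-label unlabeled points near a known class; reject the rest.

    max_dist is a hypothetical threshold: points farther than this from
    every class centroid are treated as possibly out-of-class.
    """
    cents = class_centroids(labeled_x, labeled_y)
    accepted, rejected = [], []
    for x in unlabeled_x:
        # Nearest known class and its distance to this unlabeled point.
        label, d = min(
            ((y, dist(x, c)) for y, c in cents.items()),
            key=lambda pair: pair[1],
        )
        if d <= max_dist:
            accepted.append((x, label))
        else:
            rejected.append(x)
    return accepted, rejected


# Toy data: two known classes, plus one unlabeled point far from both.
labeled_x = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9)]
labeled_y = [0, 0, 1, 1]
unlabeled_x = [(0.1, 0.0), (2.9, 3.1), (10.0, -10.0)]

accepted, rejected = filter_and_pseudo_label(
    labeled_x, labeled_y, unlabeled_x, max_dist=1.0
)
print(accepted)
print(rejected)
```

In this toy setup the first two unlabeled points are pseudo-labeled with their nearest class and the distant third point is rejected. A fixed distance threshold is only one simple way to guard against unseen classes; it is used here for illustration and is not taken from the paper.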
The authors also provide a new evaluation benchmark and performance metrics for the comprehensive evaluation of robust semi-supervised learning methods in open environments.
Moreover, the authors provide an open-source Python toolkit for semi-supervised learning studies.
Journal
Frontiers of Computer Science
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Robust semi-supervised learning in open environments
Article Publication Date
15-Aug-2025