Two stage iterative approach for addressing missing values in small-scale water quality data
Compuscript Ltd
image: Map of Xiaoqing River and the location of the Fanli station
Credit: Marine Development
https://doi.org/10.1007/s44312-024-00040-3
Announcing a new publication in the journal Marine Development. The availability of water quality data is crucial for water quality assessment and environmental management. In real-time monitoring, data are frequently missing for reasons such as natural hazards, facility malfunction, periodic maintenance, and parameter adjustment. Missingness can cause a mismatch in sample sizes and leave blank cells in data matrices, which reduces the utility and precision of the related statistical analysis. Handling missing values in real water quality monitoring systems is therefore essential for environmental analysis, particularly for small-scale datasets.
This study proposes a two-stage approach, based on accuracy assessment, for addressing missing values in small water quality datasets. Missingness is formulated as the coexistence of ‘random missing over short periods’ and ‘long-term continuous missing’. In the first stage, traditional mean imputation, median imputation, linear interpolation, k-nearest neighbor imputation, random forest imputation, and multiple imputation by chained equations are compared to select the optimal method. As the most suitable method across all variables, linear interpolation is used to fill in the small random missing portions of the original data, which expands the dataset available for subsequent imputation. In the second stage, the same imputation methods, together with the autoregressive integrated moving average (ARIMA) model, are evaluated in the same way on the data already filled in the first stage. The most suitable method obtained from this comparison is used to populate the remaining long-term continuous missing data. The efficacy of the proposed approach is validated on a real water quality dataset. The results demonstrate that the two-stage iterative approach offers a feasible roadmap for imputing missing values in small-scale water quality datasets.
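The first-stage comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only three of the listed methods (mean, median, and linear interpolation), scores them by hiding a few observed values and computing the root-mean-square error, and then fills only short gaps so that long gaps remain for a second stage. The function names, the `max_gap` threshold, the number of hidden points, and the assumption that the first and last readings are observed are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def mean_impute(series):
    """Replace every None with the mean of the observed values."""
    fill = statistics.mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]


def median_impute(series):
    """Replace every None with the median of the observed values."""
    fill = statistics.median(v for v in series if v is not None)
    return [fill if v is None else v for v in series]


def linear_interpolate(series, max_gap=None):
    """Fill interior gaps linearly between the nearest observed neighbours.

    Gaps longer than max_gap (a hypothetical threshold) are left as None,
    as are gaps that touch either end of the series.
    """
    out = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(observed, observed[1:]):
        gap = right - left - 1
        if gap == 0 or (max_gap is not None and gap > max_gap):
            continue
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + step * (i - left)
    return out


def rmse_on_hidden(series, impute, n_hidden=5, seed=0):
    """Hide n_hidden observed interior points, impute, and score the result.

    Assumes the first and last values of the series are observed, so every
    hidden point can be filled by each candidate method.
    """
    rng = random.Random(seed)  # same seed -> same hidden points per method
    candidates = [i for i in range(1, len(series) - 1) if series[i] is not None]
    hidden = rng.sample(candidates, n_hidden)
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    filled = impute(masked)
    squared = [(filled[i] - series[i]) ** 2 for i in hidden]
    return math.sqrt(sum(squared) / len(squared))


def select_best(series, methods):
    """Return the name of the method with the lowest RMSE, plus all scores."""
    scores = {name: rmse_on_hidden(series, fn) for name, fn in methods.items()}
    return min(scores, key=scores.get), scores


METHODS = {
    "mean": mean_impute,
    "median": median_impute,
    "linear": linear_interpolate,
}
```

In this sketch, `select_best(readings, METHODS)` would play the role of the stage-one comparison, and `linear_interpolate(readings, max_gap=2)` would fill only the short random gaps. A second stage would repeat the comparison on the partly filled series with a wider set of candidates, such as an ARIMA model, before filling the long continuous gaps.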
Because the integrity and reliability of monitoring data are critical in many fields, the findings may serve as a reference for imputing water quality monitoring data and for other research that relies heavily on observed data. Because imputation methods built on different mechanisms each have their own advantages and limitations, a suitable method needs to be selected for the circumstances of each dataset, particularly when the amount of data is insufficient. The proposed two-stage iterative imputation is recommended for adoption in other study areas and other data mining contexts. Given the scale of the data and the methods validated here, it would be useful to determine whether the proposed approach is a viable alternative for enhancing interpolation performance in other cases, which is expected to be studied in future work.
Article reference: Wang, F., Cui, X., Gui, Y. et al. Two stage iterative approach for addressing missing values in small-scale water quality data. Mar Dev 2, 27 (2024). https://doi.org/10.1007/s44312-024-00040-3
Keywords: Accuracy assessment, Methods evaluation, Missing data, Small-scale dataset, Two-stage iterative imputation, Water quality
# # # # # #
Marine Development aims to publish research papers in all relevant disciplines related to the ocean and the sea. Its scope spans diverse domains, including but not limited to marine resource management, marine environmental conservation, marine biodiversity, fisheries management, marine energy, marine policy, and international maritime law. The journal particularly values research that explores the complex links between marine issues and broader global challenges, such as climate change, sustainable economic development, and international cooperation. As a platform for interdisciplinary knowledge exchange, the journal will enable scholars to communicate their research and promote interdisciplinary research that advances our understanding of marine issues. It welcomes original research with a multidisciplinary focus and also encourages review articles that highlight the latest research trends and those with significant global impacts.
For more information, please visit https://link.springer.com/journal/44312
Editorial Board: https://link.springer.com/journal/44312/editorial-board
Marine Development is available on SpringerLink (https://link.springer.com/journal/44312/articles).
Submissions to Marine Development may be made using Editorial Manager (https://www.editorialmanager.com/made/default.aspx).
Abstracted and indexed in:
Astrophysics Data System (ADS)
Baidu
CLOCKSS
CNKI
CNPIEC
DOAJ
Dimensions
EBSCO
Google Scholar
Japanese Science and Technology Agency (JST)
Naver
OCLC WorldCat Discovery Service
Portico
ProQuest
TD Net Discovery Service
Wanfang
e-ISSN: 3004-832X
# # # # # #