Image: The multi-level testing framework is designed across spatial relations, spatial scenes, and prompt engineering strategies, with standardized scripts ensuring a consistent, normalized testing process.
Credit: Beijing Zhongke Journal Publishing Co. Ltd.
Recently, the Journal of Geo-Information Science published online a research study conducted by Ruoling Wu, a graduate student, and Prof. Danhuai Guo of the School of Information Science and Technology at Beijing University of Chemical Technology. Based on an analysis of the characteristics of existing Large Language Models (LLMs), the research develops a comprehensive standard for evaluating spatial cognition in LLMs. The standard is constructed along three dimensions: spatial object types, spatial relations, and prompt engineering strategies in spatial scenarios, covering three types of spatial objects, three categories of spatial relations, and three prompt engineering strategies. The result is a testing standard framework, SRT4LLM, together with standardized testing processes for evaluating and quantifying spatial cognition in LLMs.
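To make the three-dimensional design concrete, the sketch below enumerates the resulting 3 × 3 × 3 test matrix. The category names are illustrative assumptions only (the release states that there are three of each, not what they are); points/lines/polygons and topological/directional/distance relations are standard choices in geographic information science, not confirmed details of SRT4LLM.

```python
from itertools import product

# Hypothetical category names: the press release specifies only the counts
# (three object types, three relation categories, three prompt strategies).
OBJECT_TYPES = ["point", "line", "polygon"]
RELATION_TYPES = ["topological", "directional", "distance"]
PROMPT_STRATEGIES = ["zero-shot", "few-shot", "chain-of-thought"]

def build_test_matrix():
    """Enumerate the 3 x 3 x 3 = 27 test configurations."""
    return [
        {"object": o, "relation": r, "prompt": p}
        for o, r, p in product(OBJECT_TYPES, RELATION_TYPES, PROMPT_STRATEGIES)
    ]

if __name__ == "__main__":
    matrix = build_test_matrix()
    print(f"{len(matrix)} test configurations")  # 27
    print(matrix[0])
```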
The effectiveness of the SRT4LLM standard and the stability of its results were verified through multiple rounds of testing on eight LLMs of different parameter scales. The results indicate that the geometric complexity of input spatial objects influences the spatial cognition of LLMs: while different LLMs exhibit significant performance variations, the scores of a given model remain stable. As the geometric complexity of spatial objects and the complexity of spatial relations increase, the LLMs' accuracy in judging the three spatial relations decreases by only 7.2%, demonstrating the robustness of the test standard across scenarios. Improved prompt engineering strategies can partially enhance LLMs' spatial cognitive Question-Answering (Q&A) performance, with varying degrees of improvement across models, confirming the effectiveness of the standard for analyzing LLMs' spatial cognitive abilities. Additionally, multiple rounds of testing on the same LLM show that the results converge, and score differences between LLMs follow a stable distribution. SRT4LLM thus effectively measures the spatial cognitive abilities of LLMs and serves as a standardized evaluation tool.
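A minimal sketch, assuming per-question correctness scores, of how the reported multi-round stability could be quantified. The paper's actual scoring metric and aggregation procedure are not given in the release, so the function below is hypothetical:

```python
from statistics import mean, stdev

def score_model(results):
    """results: one inner list of per-question correctness values (1/0)
    per testing round. Returns the overall mean accuracy plus the spread
    of round means, so repeated runs of the same model can be checked
    for the convergence the study reports."""
    round_means = [mean(r) for r in results]
    return {
        "mean_accuracy": mean(round_means),
        "round_spread": stdev(round_means) if len(round_means) > 1 else 0.0,
    }

# Toy illustration: three rounds of five Q&A items for one model.
rounds = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1],
]
print(score_model(rounds))  # small round_spread => stable scores across rounds
```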
The study establishes a basis for future investigations, emphasizing the need to further optimize the SRT4LLM standard and to develop strategies for improving the spatial cognition of LLMs. The framework can be used to assess LLMs' spatial cognition and to support the development of native geographic large models, promoting deeper integration between LLMs and geographic information science.
For more details, please refer to the original article:
Research on Evaluation Standards for Spatial Cognitive Abilities in Large Language Models
https://www.sciengine.com/JGIS/doi/10.12082/dqxxkx.2025.240694 (For the English version of the full text, click “iFLYTEK Translation” on the article page.)
Article Title
Research on Evaluation Standards for Spatial Cognitive Abilities in Large Language Models
Article Publication Date
25-May-2025