Image: A UNC research team checks a plant specimen at the UNC Herbarium.
Credit: Shanna Oberreiter
A new study from UNC-Chapel Hill researchers shows that advanced artificial intelligence tools, specifically large language models (LLMs), can accurately determine the locations where plant specimens were originally collected, a process known as georeferencing. This task has traditionally been slow, expensive and dependent on significant manual effort. The team found that LLMs can complete this work with near-human accuracy while being significantly faster and more cost-effective.
“Our study explores how large language models can take on one of the biggest bottlenecks in digitizing plant collections,” said Yuyang Xie, first author and postdoctoral researcher in the department of biology at UNC. “We are pioneering the use of these tools for georeferencing, a breakthrough that will accelerate the digitization of plant specimens and unlock new possibilities for ecological research.”
The research set out to answer a central question: Can AI automate one of the most time-consuming steps in digitizing natural history collections? The Carolina team found that it can. LLMs georeferenced specimens with an error margin of less than 10 kilometers, outperforming traditional methods, and completed the task in a fraction of the time and at a fraction of the cost.
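To make the 10-kilometer figure concrete, the sketch below shows one common way a georeferencing error can be measured: as the great-circle distance between a predicted coordinate and a trusted reference coordinate. This release does not describe how the study computed its error, so the haversine formula, the function names and the sample coordinates here are all assumptions for illustration only.

```python
import math

# Mean Earth radius in kilometers (a standard approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(predicted, reference, threshold_km=10.0):
    """True if the predicted (lat, lon) lies within threshold_km of the reference.

    The 10 km default mirrors the figure quoted in the article; it is not
    taken from the study's code.
    """
    return haversine_km(*predicted, *reference) < threshold_km


# Hypothetical coordinates: a model's estimate vs. a manually verified location.
predicted = (35.90, -79.05)
reference = (35.91, -79.05)
print(within_threshold(predicted, reference))
```

Here the two points differ by 0.01 degrees of latitude, roughly one kilometer, so the estimate would count as falling within the 10-kilometer margin.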
“Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate,” said Xiao Feng, corresponding author and assistant professor in the department of biology at UNC. “This gives researchers unprecedented opportunities to advance our understanding of global biodiversity distributions.”
The implications are significant. An estimated 2–3 billion herbarium specimens exist worldwide, but only a small fraction have been digitized. Without digital records and spatial data, researchers face major limitations in tracking biodiversity loss, understanding species movement under climate change and analyzing ecosystem shifts. By deploying AI-powered georeferencing, scientists may soon be able to rapidly digitize vast natural history collections that have remained largely inaccessible.
“This technology allows us to unlock millions of records that are currently sitting in cabinets,” said Xie. “With the power of LLMs, we can rapidly digitize plant specimen data that will be critical for addressing global environmental challenges.”
Traditional approaches to georeferencing rely on manual interpretation, specialized software, or multiple rounds of expert review. The UNC study is among the first to apply LLMs to this task and to show they can outperform existing methods in accuracy, efficiency, and scalability. This new approach opens the door to digitizing natural history collections at a speed never before possible.
The research paper is available online in Nature Plants at: https://www.nature.com/articles/s41477-025-02162-y
Journal: Nature Plants
Article Title: Using large language models to address the bottleneck of georeferencing natural history collections
Article Publication Date: 5-Dec-2025