Achieving >97% on GSM8K: Deeply understanding the problems makes LLMs better solvers for math word problems
Peer-Reviewed Publication
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short on complex math word problems, as it typically suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors. Prior studies address calculation errors and step-missing errors but neglect semantic misunderstanding errors, which are the major factor limiting the reasoning performance of LLMs.
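The idea of targeting semantic misunderstanding can be illustrated as a two-stage prompting scheme: first ask the model to restate the core question and extract the relevant facts, then solve conditioned on that understanding. The sketch below is illustrative only; the function names, prompt wording, and the two-stage split are assumptions for exposition, not the paper's exact prompts.

```python
def understanding_prompt(problem: str) -> str:
    # Stage 1 (illustrative): before any arithmetic, ask the model to
    # restate the core question and list the facts needed to answer it.
    # This targets semantic misunderstanding errors specifically.
    return (
        "Problem: " + problem + "\n"
        "First, state the core question being asked.\n"
        "Then list the given quantities and relationships "
        "needed to answer it."
    )

def solving_prompt(problem: str, understanding: str) -> str:
    # Stage 2 (illustrative): solve step by step, conditioned on the
    # extracted understanding, so the reasoning chain starts from a
    # correct reading of the problem.
    return (
        "Problem: " + problem + "\n"
        "Understanding: " + understanding + "\n"
        "Using only the facts above, solve the problem step by step "
        "and give the final numeric answer."
    )
```

In use, the output of the first prompt would be fed back into the second call to the model; both stages here only construct prompt strings, since the actual model API is outside the scope of this sketch.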