Teaching AI to admit uncertainty

Johns Hopkins researchers show how different “odds” can teach AI models to admit when they’re not confident enough in an answer.
By Jaimie Patterson
In high-stakes situations like health care—or weeknight Jeopardy!—it can be safer to say “I don’t know” than to answer incorrectly. Doctors, game show contestants, and standardized test-takers understand this, but most artificial intelligence applications still prefer to give a potentially wrong answer rather than admit uncertainty.
Johns Hopkins computer scientists think they have a solution: a new method that allows AI models to spend more time thinking through problems and uses a confidence score to determine when the AI should say “I don’t know” rather than risk a wrong answer—crucial for high-stakes domains like medicine, law, or engineering.
The research team will present its findings at the 63rd Annual Meeting of the Association for Computational Linguistics, to be held July 27 through August 1 in Vienna, Austria.
“It all started when we saw that cutting-edge large language models spend more time thinking to solve harder problems. So we wondered—can this additional thinking time also help these models determine whether or not a problem has been solved correctly so they can report that back to the user?” says first author William Jurayj, a PhD student studying computer science who is affiliated with the Whiting School of Engineering’s Center for Language and Speech Processing.
To investigate, the team had large language models generate reasoning chains of different lengths as they answered difficult math problems and then measured how the chain length affected both the model’s final answer and its confidence in it. The researchers had the models answer only when their confidence exceeded a given threshold—meaning “I don’t know” was an acceptable response.
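The thresholding step the article describes can be sketched in a few lines of Python. Everything below—the ModelResponse container, the answer_or_abstain helper, and the way the confidence value is obtained—is illustrative rather than the researchers’ actual code; the confidence score could come, for example, from the probability the model assigns to its final answer.

```python
from dataclasses import dataclass

ABSTAIN = "I don't know"

@dataclass
class ModelResponse:
    """Hypothetical container for a model's output (not from the paper)."""
    answer: str        # the model's proposed final answer
    confidence: float  # model-reported confidence in [0, 1]

def answer_or_abstain(response: ModelResponse, threshold: float) -> str:
    """Return the answer only if its confidence clears the threshold;
    otherwise abstain with an explicit "I don't know"."""
    if response.confidence >= threshold:
        return response.answer
    return ABSTAIN

# The same response is released at a lenient threshold
# but withheld at a stricter one.
resp = ModelResponse(answer="42", confidence=0.62)
print(answer_or_abstain(resp, threshold=0.50))  # -> "42"
print(answer_or_abstain(resp, threshold=0.90))  # -> "I don't know"
```

Raising the threshold trades coverage for reliability: the model answers fewer questions, but the answers it does give are more likely to be correct.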