Chatbots can be really impressive when you watch them do the things they're good at, like writing realistic-sounding text or creating strange-looking images. But ask them to solve one of the puzzles you'd find in the back of a newspaper, and things can quickly go off the rails.
Researchers at the University of Colorado Boulder found as much when they challenged various large language models to solve Sudoku puzzles. And not even the standard 9×9 puzzles. An easier 6×6 puzzle was often beyond an LLM's capabilities without outside help (in this case, specialized puzzle-solving tools).
The more important finding came when the models were asked to show their work. For the most part, they couldn't. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they went off track and started talking about the weather.
If generative AI tools can't explain their decisions accurately or transparently, that should give us pause as we hand these tools more and more control over our lives and decisions, said Ashutosh Trivedi, a professor of computer science at the University of Colorado Boulder and one of the authors of the paper, which was published in July.
“We would really like those explanations to be transparent and reflect why the AI made that decision, and not the AI trying to manipulate the human by giving an explanation that a human might like,” Trivedi said.
When you make a decision, you can at least try to justify it or explain how you arrived at it. That's a foundational component of society. We're held accountable for the decisions we make. An AI model may not be able to accurately or transparently explain itself. Would you trust it?
Why LLMs struggle with Sudoku
We've seen AI models fail at basic games and puzzles before. OpenAI's ChatGPT (among others) has been thoroughly beaten at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can also struggle with other puzzles, like the Tower of Hanoi.
It has to do with how LLMs work and how they fill in gaps in information. These models try to complete those gaps based on what happens in similar cases in their training data or other things they've seen in the past. With a Sudoku, the question is one of logic. The AI might try to fill each gap in order, based on what seems like a reasonable answer, but to solve it properly, it has to look at the entire picture and find a logical order that changes from puzzle to puzzle.
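To make that contrast concrete, here is a minimal sketch, not taken from the researchers' paper and using a made-up example grid, of how a conventional solver treats a 6×6 Sudoku as a whole-board logic problem: it backtracks whenever a choice contradicts the rest of the grid, rather than committing to whatever looks plausible cell by cell.

```python
# Illustrative sketch of a conventional 6x6 Sudoku solver (not code from the
# researchers' paper). It treats the grid as one big constraint problem and
# backtracks whenever a guess contradicts the rest of the board, instead of
# filling cells one at a time based on what merely looks plausible.

def valid(grid, r, c, v):
    """Can value v go at row r, column c without breaking row/column/box rules?"""
    if v in grid[r]:                                  # row constraint
        return False
    if any(grid[i][c] == v for i in range(6)):        # column constraint
        return False
    br, bc = (r // 2) * 2, (c // 3) * 3               # top-left of the 2x3 box
    return all(grid[br + i][bc + j] != v for i in range(2) for j in range(3))

def solve(grid):
    """Fill zeros by trying digits 1-6 and backtracking out of dead ends."""
    for r in range(6):
        for c in range(6):
            if grid[r][c] == 0:
                for v in range(1, 7):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0                # undo and try the next digit
                return False                          # nothing fits here: backtrack
    return True                                       # no empty cells left

# Hypothetical example grid (0 marks an empty cell), just to show the idea.
puzzle = [
    [0, 0, 3, 0, 1, 0],
    [1, 0, 0, 0, 0, 4],
    [0, 1, 0, 4, 0, 0],
    [0, 0, 4, 0, 6, 0],
    [4, 0, 0, 0, 0, 1],
    [0, 6, 0, 2, 0, 0],
]
if solve(puzzle):
    for row in puzzle:
        print(row)
```

A solver like this searches and backtracks systematically across the whole grid, which is exactly the kind of global, constraint-driven reasoning that predicting the next plausible token doesn't guarantee.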
Chatbots are also bad at chess for similar reasons. They can find logical next moves, but they don't necessarily think three, four or five moves ahead, which is the basic skill needed to play chess well. Chatbots also sometimes tend to move chess pieces in ways that don't actually follow the rules, or to put pieces in meaningless jeopardy.
You might expect LLMs to be able to solve a Sudoku because they're computers and the puzzle is made of numbers, but the puzzles themselves aren't really mathematical; they're symbolic. “Sudoku is famous for being a puzzle with numbers that could be done with anything that's not numbers,” said Fabio Somenzi, a professor at CU Boulder and one of the paper's authors.
I took a sample prompt from the researchers' paper and gave it to ChatGPT. The tool showed its work, repeatedly telling me it had the answer before displaying a puzzle that didn't work, then going back and correcting it. It was as if the bot were turning in a presentation that kept getting edited at the last second: This is the final answer. No, actually, never mind, this is the final answer. It did get the answer eventually, through trial and error. But trial and error isn't a practical way for a person to solve a Sudoku in the newspaper. That's far too much erasing, and it ruins the fun.
AI and robots can be good at games if they're built to play them, but general-purpose tools like large language models can struggle with logic puzzles.
AI struggles to show its work
The Colorado researchers didn't just want to see whether the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.
Testing OpenAI's o1-preview reasoning model, the researchers found that the explanations, even for correctly solved puzzles, didn't accurately explain or justify the model's moves and got basic terms wrong.
“One thing they're good at is providing explanations that seem reasonable,” said Maria Pacheco, an assistant professor of computer science at CU Boulder. “They align to humans, so they learn to speak the way we like it, but whether they're faithful to the actual steps needed to solve the thing is where we're struggling a little bit.”
Sometimes the explanations were completely irrelevant. Since the paper's work was finished, the researchers have continued testing new models. Somenzi said that when he and Trivedi ran OpenAI's o4 reasoning model through the same tests, at one point it seemed to give up entirely.
“The next question that we asked, the answer was the weather forecast for Denver,” he said.
Explaining yourself is an important skill
When you solve a puzzle, you can almost certainly walk someone else through your thinking. The fact that these LLMs fail so spectacularly at that basic task isn't a trivial problem. With AI companies constantly talking about “AI agents” that can take actions on your behalf, being able to explain yourself is essential.
Now consider the kinds of jobs being handed to AI, or planned for it in the near future: driving, doing taxes, deciding business strategies and translating important documents. Imagine what would happen if you, a person, did one of those tasks and something went wrong.
“When humans have to put their face in front of their decisions, they had better be able to explain what led to that decision,” Somenzi said.
It isn't just a matter of getting a reasonable-sounding answer; it needs to be accurate. One day, an AI's explanation of itself might have to hold up in court, but how can its testimony be taken seriously if it's known to lie? You wouldn't trust a person who failed to explain themselves, and you wouldn't trust someone who told you what you wanted to hear instead of the truth.
“Having an explanation is very close to manipulation if it is done for the wrong reason,” Trivedi said. “We have to be very careful with respect to the transparency of these explanations.”


