OpenAI has raised fresh concerns over the reliability of AI chatbots, warning that hallucinations (plausible but false responses) remain a major challenge despite technological advancements.
According to a newly published research paper, large language models, including GPT-5 and ChatGPT, often generate convincing but inaccurate information, owing to limitations both in the pretraining process and in the methods used to evaluate models. The paper cites examples in which a chatbot repeatedly gave incorrect answers about a researcher's academic work and personal details.
Researchers argue that current accuracy-based evaluation systems encourage models to “guess” rather than admit uncertainty. They propose new testing methods that penalize confident errors while rewarding appropriate uncertainty, aiming to make AI responses more trustworthy.
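As a rough illustration of that idea, the sketch below (a hypothetical Python grader, not code from the paper) scores a correct answer at +1, an abstention at 0, and a wrong answer at -t/(1-t) for a confidence threshold t. Under this rule, guessing improves a model's expected score only when its true confidence exceeds t, so admitting uncertainty becomes the rational choice on hard questions.

```python
from typing import Optional

def grade(answer: Optional[str], correct: str, t: float = 0.75) -> float:
    """Score one response under a confidence-targeted rubric (illustrative).

    +1 for a correct answer, 0 for abstaining (answer is None),
    and -t/(1-t) for a wrong answer. The penalty is chosen so that
    a model whose true confidence is below t scores better, on
    average, by saying "I don't know" than by guessing.
    """
    if answer is None:                 # model abstained
        return 0.0
    if answer.strip() == correct:      # exact-match grading, for simplicity
        return 1.0
    return -t / (1.0 - t)             # confident error is penalized

# Expected score of guessing with true confidence p:
#   p * 1 + (1 - p) * (-t / (1 - t))
# which is positive only when p > t, so abstaining wins below the threshold.
```

With t = 0.75, for example, a wrong answer costs 3 points, so a model that is only 50% sure loses points on average by guessing, exactly the incentive the researchers argue current accuracy-only benchmarks fail to provide.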