AI Under Fire: GPT-5 Criticized Amid Health Risks and Misinformation Concerns

GPT-5’s medical advice capabilities are facing serious scrutiny, with users reporting dangerous inaccuracies and hallucinations. The model scores only 46.2% on the HealthBench Hard benchmark, prompting many to revert to GPT-4 for health guidance. While OpenAI has implemented enhanced safety measures and reduced hallucinations by 45% compared to previous versions, experts remain concerned about patient safety risks. GPT-5’s current performance suggests there’s much more to this story.

Three major concerns have emerged surrounding OpenAI’s latest language model, GPT-5, as health experts and users raise alarms about its safety protocols and medical advice capabilities.

Despite OpenAI’s implementation of enhanced safeguards and a multilayered defense system, critics question whether these measures adequately protect users from potential biological harm and health misinformation.

The model’s tendency to produce occasional hallucinations and factual inaccuracies in health-related queries has sparked particular concern, with some finding the medical advice of its predecessor, GPT-4, more reliable.

Users are demanding a return to GPT-4, citing dangerous medical hallucinations and unreliability in its successor’s health guidance.

Think of it as getting a second opinion from a doctor who sometimes mixes up their medical textbooks – not exactly the confidence boost you’re looking for.

OpenAI hasn’t taken these criticisms lying down. The company has equipped GPT-5 with physician-informed reasoning and content classifiers that act like digital bouncers, keeping harmful biological content out of the conversation.
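The article doesn’t detail how such classifiers work, but the general “digital bouncer” pattern is simple: a small safety model screens both the user’s prompt and the main model’s draft reply, and blocks anything it flags. Here’s a minimal, hypothetical Python sketch of that pattern; the function names, the toy keyword heuristic, and the refusal messages are illustrative stand-ins, not OpenAI’s actual system.

```python
from dataclasses import dataclass

@dataclass
class ClassifierVerdict:
    harmful: bool
    score: float  # stand-in confidence that the text is harmful

def classify_biorisk(text: str) -> ClassifierVerdict:
    """Hypothetical stand-in for a trained safety classifier."""
    flagged_terms = ("synthesize a pathogen", "weaponize")  # toy heuristic only
    hit = any(term in text.lower() for term in flagged_terms)
    return ClassifierVerdict(harmful=hit, score=0.99 if hit else 0.01)

def answer_with_gate(prompt: str, generate) -> str:
    """Screen the prompt, generate a reply, then screen the reply too."""
    if classify_biorisk(prompt).harmful:
        return "I can't help with that request."
    reply = generate(prompt)
    # Output-side check: the draft reply is screened before it reaches the user.
    if classify_biorisk(reply).harmful:
        return "I can't share that information."
    return reply

print(answer_with_gate(
    "How do vaccines work?",
    lambda p: "Vaccines train the immune system to recognize a pathogen.",
))
```

The point of gating both sides is that a harmless-looking prompt can still elicit a harmful answer, which is why real systems screen outputs as well as inputs.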

But even with these fancy safety measures, users report error-prone outputs that could impact public health – kind of like having a well-meaning friend who googled your symptoms and is absolutely convinced you have an exotic tropical disease.

The prospect of FDA scrutiny adds another layer of complexity, since GPT-5’s health capabilities have not undergone formal scientific validation.

The model’s performance on HealthBench Hard shows a 46.2% accuracy rate, indicating there’s still significant room for improvement in medical responses.

Meanwhile, OpenAI faces stiff competition from rival companies developing similar models, all while trying to retain top talent for its safety research teams.

The debate extends beyond mere technical capabilities. Industry insiders question whether current benchmarks truly capture the real-world risks of health misinformation.

While GPT-5 Pro’s responses have received some positive feedback, the consensus seems clear: when it comes to health advice, this AI might be too green to play doctor.

To be fair, recent data shows the model has made substantial progress in reducing misleading information, with a 45% reduction in hallucinations compared to previous versions.

As the technology continues to evolve, finding the right balance between innovation and safety remains a critical challenge for OpenAI and the broader AI community.