An AI Council Just Aced the US Medical Licensing Exam

Edd Gent in Singularity Hub:

Despite their usefulness, large language models still have a reliability problem. A new study shows that a team of AIs working together can score up to 97 percent on US medical licensing exams, outperforming any single AI. While recent progress in large language models (LLMs) has led to systems capable of passing professional and academic tests, their performance remains inconsistent. They’re still prone to hallucinations (plausible-sounding but incorrect statements), which has limited their use in high-stakes areas like medicine and finance.

Nonetheless, LLMs have scored impressive results on medical exams, suggesting the technology could be useful in this area if its inconsistencies can be controlled. Now, researchers have shown that getting a “council” of five AI models to deliberate over their answers rather than working alone can lead to record-breaking scores on the US Medical Licensing Examination (USMLE). “Our study shows that when multiple AIs deliberate together, they achieve the highest-ever performance on medical licensing exams,” Yahya Shaikh, from Johns Hopkins University, said in a press release. “This demonstrates the power of collaboration and dialogue between AI systems to reach more accurate and reliable answers.”

More here.
