Top AI models fail spectacularly when faced with slightly altered medical questions
Our findings reveal a robustness gap for LLMs in medical reasoning, demonstrating that evaluating these systems requires looking beyond standard accuracy metrics to assess their true reasoning capabilities.^6^ When forced to reason beyond familiar answer patterns, all models show declines in accuracy, challenging claims that artificial intelligence is ready for autonomous clinical deployment. A system dropping from 80% to 42% accuracy when confronted with a pattern disruption would be unreliable in clinical settings, where novel presentations are common. The results suggest that these systems are more brittle than their benchmark scores imply.
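As a rough illustration of the evaluation described above, here is a minimal sketch (hypothetical function names, not the paper's actual code) of how one might quantify the robustness gap: score a model on the original benchmark and on a pattern-disrupted variant, then report the accuracy drop.

```python
# Sketch of a robustness-gap measurement, assuming a model's answers
# to the same questions under original and perturbed conditions.

def accuracy(model_answers: list[str], correct: list[str]) -> float:
    """Fraction of questions answered correctly."""
    return sum(a == c for a, c in zip(model_answers, correct)) / len(correct)

def robustness_gap(orig_acc: float, perturbed_acc: float) -> float:
    """Absolute accuracy drop when familiar answer patterns are disrupted."""
    return orig_acc - perturbed_acc

# Using the figures quoted above: 80% on the standard benchmark
# vs. 42% after pattern disruption yields a 38-point gap.
gap = robustness_gap(0.80, 0.42)
print(f"Robustness gap: {gap:.0%}")  # Robustness gap: 38%
```

A large gap under such a perturbation is the signal that standard accuracy alone overstates the model's reasoning ability.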