ClockBench: Even the best AI models can't reliably read the clock
ClockBench AI Benchmark
ClockBench evaluates whether models can read analog clocks, a task that is trivial for humans but that current frontier models still struggle with.