Your LLM Won’t Stop Lying Any Time Soon
Researchers call it “hallucination”; you might more accurately refer to it as confabulation, hornswoggle, hogwash, or just plain BS. Anyone who has used an LLM has encountered it: some people seem to find it behind every prompt, others dismiss it as an occasional annoyance, but nobody claims it doesn’t happen. A recent paper by researchers at OpenAI (PDF) tries to drill down a bit deeper into just why that happens, and whether anything can be done.
Spoiler alert: not really. Not unless we completely re-think the way we’re training these models, anyway. The analogy used in the conclusion is to an undergraduate in an exam room. Every right answer earns a point, but wrong answers aren’t penalized, so why the heck not guess? You might not pass an exam that way going in blind, but if you have studied (i.e., sucked up the entire internet without permission as training data), you might squeeze out a few extra points. For an LLM’s training, like a student’s final grade, every point scored on the exam is a good point.
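If you want to see the arithmetic behind that incentive, here’s a minimal back-of-the-envelope sketch; the confidence values are our illustrative numbers, not the paper’s:

```python
# Expected points per question under binary grading: 1 for a correct
# answer, 0 otherwise. All numbers are illustrative, not from the paper.
def expected_score(confidence: float, wrong_penalty: float = 0.0) -> float:
    """Expected points from answering, given the chance of being right."""
    return confidence * 1.0 - (1.0 - confidence) * wrong_penalty

ABSTAIN = 0.0  # "I don't know" earns nothing

for confidence in (0.9, 0.5, 0.1):
    print(f"confidence={confidence:.1f}: guess={expected_score(confidence):+.2f}, "
          f"abstain={ABSTAIN:+.2f}")
# Even at 10% confidence, guessing (+0.10) beats abstaining (+0.00),
# so a grader like this rewards confident confabulation every time.
```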
The problem is that if you reward “I don’t know” in training, you may eventually produce a degenerate model that responds to every prompt with “IDK”. Technically, that’s true: the model is a stochastic mechanism; it doesn’t “know” anything. It’s also completely useless. Unlike some other studies, however, the authors do not conclude that so-called hallucinations are an inevitable result of the stochastic nature of LLMs.
While that may be true, they point out it’s only the case for “base models”: pure LLMs. If you wrap the LLM in a “dumb” program that can pull the numbers out of a prompt and feed them to a calculator, for example, suddenly the blasted thing can pretend to count. (That’s how undergrads do it these days, too.) You can also provide the LLM with a cheat-sheet of facts to reference instead of hallucinating; it sounds like what’s being proposed is a hybrid between an LLM and the sort of expert system you once queried through Wolfram Alpha. (A combo we’ve covered before.)
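As a rough sketch of what such a wrapper might look like (our toy illustration, with a hypothetical llm_generate() stand-in; not the paper’s architecture or any particular product):

```python
import ast
import operator

# Whitelisted operators for a tiny, safe calculator (no eval() here).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculate(expr: str) -> float:
    """Evaluate plain arithmetic via the AST, deterministically."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval").body)

def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to the base model."""
    return "(model output here)"

def answer(prompt: str) -> str:
    """Route arithmetic to the calculator; everything else to the model."""
    try:
        return str(calculate(prompt))   # the tool counts so the LLM doesn't have to
    except (ValueError, SyntaxError):
        return llm_generate(prompt)

print(answer("3 * (14 + 2.5)"))  # 49.5, computed rather than confabulated
```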
In the expert-system case, however, some skeptics might wonder why bother with the LLM at all if the knowledge in the expert system is “good enough”. (Having seen one AI boom before, we can say with the judgment of history that the knowledge in an expert system isn’t good enough often enough to make many viable products.)
Unfortunately, that “easy” solution runs back into the issue of grading: if you want your model to do well on the scoreboards and beat ChatGPT or DeepSeek at popular benchmarks, there’s a certain amount of “teaching to the test” involved, and a model that occasionally makes stuff up will apparently do better on the benchmarks than one that refuses to guess. The obvious solution, as the authors propose, is changing the benchmarks.
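To picture what a changed benchmark does to the incentive (again, our illustrative scoring rule, not one from the paper): once a wrong answer costs p points and “I don’t know” costs nothing, guessing only pays above a confidence of p/(1+p).

```python
# With penalty p for a wrong answer and 0 for abstaining, guessing pays only
# when confidence - (1 - confidence) * p > 0, i.e. confidence > p / (1 + p).
def guess_threshold(wrong_penalty: float) -> float:
    """Confidence above which guessing still beats "I don't know"."""
    return wrong_penalty / (1.0 + wrong_penalty)

for p in (0.0, 1.0, 3.0):
    print(f"penalty={p:.1f}: worth guessing above {guess_threshold(p):.0%} confidence")
# penalty=0.0 -> 0%  (today's benchmarks: always guess)
# penalty=1.0 -> 50% (the classic negative-marking exam)
# penalty=3.0 -> 75% (strongly rewards an honest "I don't know")
```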
If you’re interested in AI (and who isn’t, these days?), the paper makes an interesting read. Interesting, if perhaps disheartening, if you were hoping the LLMs would graduate from their eternal internship any time soon.
Via ComputerWorld, by way of whereisyouredat.
Where the DMA and the GDPR Meet: The Joint Guidelines from the EDPB and the EU Commission
The European Data Protection Board and the European Commission approved, on October 9, a document setting out the common ground between the two data regulations: the Digital Markets Act and the GDPR. All of this in order to…
PLA Gears Fail To Fail In 3D Printed Bicycle Drivetrain
Anyone who has ever snapped a chain or a crank knows how much torque a bicycle’s drivetrain has to absorb on a daily basis; it’s really more than one might naively expect. For that reason, [Well Done Tips]’s idea of 3D printing a gear chain from PLA did not seem like the most promising of hacks to us.
Contrary to expectations, though, it actually worked; at the end of the video (at about 13:25), he’s on camera going 20 km/h, which, while not speedy, is faster than we expected the fixed gearing to hold up to. The gears themselves, as you can see, are simple spurs, and were modeled in Fusion360 using a handy auto-magical gear tool. The idler gears are held in place by a steel bar he welded to the frame, and roll on good old-fashioned skateboard bearings, two each. (Steel ones, not 3D printed bearings.) The healthy width of the spur gears probably goes a long way toward explaining how this contraption is able to survive the test ride.
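For a sense of scale, here’s a quick gearing calculation; the tooth counts, wheel size, and cadence below are our assumptions, since the video doesn’t spell them out:

```python
import math

# Hypothetical figures; the video doesn't give exact tooth counts.
drive_teeth = 40          # printed spur gear at the cranks
wheel_teeth = 20          # steel-reinforced gear at the rear wheel
wheel_diameter_m = 0.66   # roughly a 26-inch wheel
cadence_rpm = 80          # a relaxed pedaling pace

ratio = drive_teeth / wheel_teeth              # wheel turns per crank turn
wheel_rpm = cadence_rpm * ratio
speed_kmh = wheel_rpm * math.pi * wheel_diameter_m * 60 / 1000

print(f"{speed_kmh:.1f} km/h at {cadence_rpm} rpm")  # about 19.9 km/h
```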
The drive gear at the wheel is steel-reinforced by part of the donor bike’s cassette, as [Well Done Tips] recognized that the shallow splines on the freewheel hub were not exactly an ideal fit for PLA. He does complain of a squeaking noise during the test ride, and we can’t help but wonder if switching to helical gears might help with that. That, or perhaps a bit of lubricant, as he’s currently riding the gears dry. (Given that he, too, expected them to break the moment his foot hit the pedal, we can hardly blame him for not wanting to bother with grease.)
We’ve seen studies suggesting PLA might not be the best choice of plastic for this application; if this weren’t just a fun hack for a YouTube video, we’d expect nylon would be his best bet. Even then, it’d still be a hack, not a reliable form of transportation. Good thing this isn’t reliable-transportation-a-day!
youtube.com/embed/PHHgMWuk23o?…