They don't.
LLMs aren't new technology; the underlying ideas have been around for some 60 years. We were promised that "moar compute" would solve all of the known problems with these systems, and it's just not happening. Whenever we're given metrics showing they're improving (the latest being "look at these competitions against humans, they've improved and/or are better than humans"), it turns out the metrics were carefully designed for them to excel at, rather than reflecting how the work is expected to be done in a functioning business.
The core problems with LLMs remain. They're just not reliable if you need actual accuracy in your work and can't afford to turn in output that invents data, lies, hallucinates, etc.