Some of us would say GPT-5 has failed to deliver. With all those posts asking GPT-5 to count characters or to solve logical problems, and it failing more often than not, it’s not that difficult to see why.
I don’t necessarily agree, even though one thing is certain – there is no AGI yet.
In that sense, comparing what some of the AI companies promised with what we ultimately got, well, with all due respect for the effort, I won’t be too surprised if investors start asking questions. That’s not why I’m writing this, though.
It was a twisted reality anyway, one where we expected a model trained essentially on text to figure out everything else about this world just by looking at the relationships between text tokens, even if those models can see far more relationships than we humans can.
Asking such a model to solve logical problems or to count something might not even be reasonable to begin with. Except that, of course, OpenAI could have equipped ChatGPT with a simple tool to count those characters after all this time. But that still would not be GPT-5; that would be a tool that would work with any other model capable of using tools.
With all this in mind, perhaps it’s good that GPT-5 has confirmed the limitations. From the market’s perspective, maybe some stocks will finally stop rising as if there were no limits. Not that I care too much (as long as they don’t fall too much). From the technical perspective, however, maybe it’s time to stop waiting for another magical LLM breakthrough and to start recognizing these models for what they are.
No, not just “stochastic parrots” – that’s not really fair, and it reduces the models too much. A better option would be to recognize them as very powerful text-understanding tools that can uncover relationships between different pieces of text faster and better than most of us can.
You want to make an email sound more formal? You want to organize your resume? You want a summary of a long blog post? LLMs can do it in no time.
You want LLMs to solve some problems for you along the way? That’s where they do require tools: they can formulate the task in a way those tools understand, but they can’t solve most of those problems themselves.
In that sense, the fact that GPT-5 did not seem to get much closer to AGI does not really change anything for “agentic AI”, except that it is now becoming obvious that, in the agentic AI world, tools mean as much as, or even more than, the LLMs.
Which likely means the show will go on, even if the focus and expectations shift somewhat – instead of expecting LLMs to become perfect on their own, the agentic AI world will double down on agent and tool development.
As in, perhaps all of the AI vendors will finally add a tool to count characters to stop that never-ending mockery.
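Just to make the tools-over-models point concrete, here is a minimal, hypothetical sketch of what such a character-counting tool and the surrounding dispatch step might look like in an agent loop. The tool name, the schema, and the dispatch function are all illustrative assumptions on my part, not any vendor’s actual API.

```python
# Hypothetical sketch: a trivial "count characters" tool plus the dispatch step
# an agent runtime would perform. Names and schema are illustrative only.

def count_characters(text: str, char: str | None = None) -> int:
    """Count all characters in `text`, or only occurrences of `char` if given."""
    return text.count(char) if char else len(text)

# A tool description in the JSON-schema style most tool-calling APIs expect.
COUNT_TOOL = {
    "name": "count_characters",
    "description": "Count characters (or a specific character) in a string.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "char": {"type": "string", "description": "Optional single character"},
        },
        "required": ["text"],
    },
}

def dispatch(tool_call: dict) -> str:
    """What an agent loop does when the model asks for a tool instead of guessing."""
    if tool_call["name"] == "count_characters":
        return str(count_characters(**tool_call["arguments"]))
    raise ValueError(f"Unknown tool: {tool_call['name']}")

# Simulated model output: the LLM formulates the call, the tool does the counting.
print(dispatch({"name": "count_characters",
                "arguments": {"text": "strawberry", "char": "r"}}))  # -> 3
```

The point, of course, is not this particular tool; it’s that the counting happens in ordinary code, and the model’s only job is to decide to call the tool with the right arguments.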
PS. And just to get back to a question that is very close to what I do: will software developers be replaced by coding agents? I used to think that’s where this was heading, but now I think it’s more complicated. With source code being text, LLMs seem very well suited to generating it. There is a lot of training data, and it’s even possible to test some of the generated code automatically. However, LLMs likely cannot be relied on to generate code that requires “fresh thinking”. Was there similar code somewhere? Great, the LLM can put it into your application. Was there no similar code at all? The LLM will have no clue how to do it. That said, not every requirement requires new code to be invented.
But there is another aspect, too. With Stack Overflow slowly dying, where will all those LLMs be getting new “reliable” code? Maybe Stack Overflow just needs to keep going for a bit longer, and, once the dust settles, it’ll be alive and well again?