One positive outcome of GPT 5's inability to demonstrate AGI-like behavior is that, well, we can now start questioning some of the claims and assumptions we might be making subconsciously when thinking about the GPT models.
Personally, I think we tend to overgeneralize AI-related statements because we sort of assume it’s “all or nothing”, which leaves any other possibility in the dust.
For example, when talking about coding agents, we would say: “AI will take over software development”. And the argument would start right away, with some folks supporting this statement and other folks feeling absolutely unhappy with their own attempts to use AI for vibe coding. But let’s think for a second: isn’t an LLM’s ability to answer questions essentially based on the quality of the training data? Yes, it also depends on the model size, but, to begin with, you need good, reliable training data. Hence, when trying to utilize a coding agent to develop an app for you, are you asking it to do what it’s been trained to do? Was there even enough training data for the model to reply properly, or is it just making things up, which is why you are not able to get good results?
Then one might say that there are reasoning models, and they are supposed to be able to reason. However, imagine something simple. Let’s say there were a hypothetical programming language where the “+” operator had to be literally spelled as “plus”. And let’s say that part were conveniently missing from the training dataset. The model might be able to look at other languages, it might be able to come up with everything else you need in that code, but it would not be able to reason that “+” has to be spelled differently. Yes, you could potentially give it few-shot examples, but this was an oversimplified scenario. In a more complex situation, what if you had thousands of code samples for one framework, yet only ten or so for another? Most likely, the quality of the generated code would be very different depending on which framework you’d ask the coding agent to use.
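To make the few-shot idea a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the “plus” language, the example pairs, and the `call_llm` placeholder, which just stands in for whatever model client you might actually be using.

```python
# A minimal sketch of few-shot prompting for the hypothetical "plus" language.
# The language, the example pairs, and call_llm are all made up for illustration.

FEW_SHOT_EXAMPLES = [
    # Each pair shows the model that "+" must be spelled out as "plus".
    ("Add two numbers a and b", "result = a plus b"),
    ("Sum the elements x, y and z", "total = x plus y plus z"),
]

def build_prompt(task: str) -> str:
    """Assemble a few-shot prompt that demonstrates the 'plus' convention."""
    lines = ["In this language the '+' operator is written as the word 'plus'.", ""]
    for description, code in FEW_SHOT_EXAMPLES:
        lines.append(f"Task: {description}")
        lines.append(f"Code: {code}")
        lines.append("")
    lines.append(f"Task: {task}")
    lines.append("Code:")
    return "\n".join(lines)

def call_llm(prompt: str) -> str:
    # Placeholder: plug in your model client of choice here.
    raise NotImplementedError

if __name__ == "__main__":
    # Print the assembled prompt instead of calling a real model.
    print(build_prompt("Increment the counter by the step value"))
```

The catch, of course, is that this only patches one small gap you already know about; it doesn’t help with all the gaps you don’t know about.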
Point being, a coding agent might be good in certain areas, it might be good in some specific languages, and it might be terrible in other areas and languages. It’s not “all or nothing”.
There is one very specific limitation of these models (at least the way I understand it): they cannot invent things. They can combine what they know in some unexpected (but still “correct”) ways, but they can’t invent what they absolutely don’t know.
We humans can experiment in the real world to see if our “mental” constructs are valid. But what can a model do in that sense? Funny enough, though, as far as software development goes, it’s relatively easy to try and run the code, maybe with the help of a live person. Except that, if the outcome is not satisfactory, the model is not going to be updated right away. Re-training is expensive, so the next developer working with the same model will likely hit the same issue until there is a newer model.
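As a rough sketch of what “try it and see” could look like, here is a toy verification loop in Python. The `generate_code` function is a hypothetical stand-in for a coding-agent call, and the check is deliberately simplistic; the point is only that running the output is cheap, while feeding the lesson back into the model is not.

```python
# A toy "generate, run, check" loop: execute model-generated code in a
# subprocess and compare its output against an expected value.
# generate_code is a hypothetical stand-in for a real coding-agent call.

import os
import subprocess
import sys
import tempfile

def generate_code(task: str) -> str:
    # Placeholder for the model call; returns some Python source text.
    return "print(sum(range(1, 11)))"

def run_and_check(source: str, expected_output: str) -> bool:
    """Run the generated code in a separate process and compare its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=10,
        )
        return result.returncode == 0 and result.stdout.strip() == expected_output
    finally:
        os.unlink(path)  # clean up the temporary file

if __name__ == "__main__":
    code = generate_code("sum the numbers from 1 to 10")
    print("looks correct" if run_and_check(code, "55") else "needs a human to step in")
```

Even when a loop like this catches a bad answer, the correction lives in your workflow, not in the model’s weights.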
Hence there is a lot for everyone to consider.
Companies like Microsoft might want to think about what it means for them. Incorporating AI into all those low-code platforms is going to be painful: there is no “stable” body of knowledge yet to just train the models on, so someone has to work on preparing all that training material. That’s expensive. An alternative would be to start switching towards more popular frameworks which are used all around by pro devs, which seems to be what Microsoft has started to do with custom code in Power Pages. Low Code might not be dead yet, but it might well be, at least in the long term.
Developers may need to consider the same scenario but from a different perspective. Whichever frameworks stay afloat after this, it’ll probably be for one (or both) of the following reasons:
(1) AI will learn to use a framework, and that means there will be enough training data
(2) A particular framework will not have enough training data for the AI, but it’ll be too widespread to just get rid of it, so human developers will need to continue supporting applications initially developed with such frameworks/platforms
I guess in the long term everything will shift towards the first option, but, in the short-to-mid term, the second option will be valid, too. Except that I’d probably not be betting on it if I were entering software development today.
However, just to illustrate how the same “all or nothing” approach might be limiting AI applications in other areas, what about using AI in health care? Those who don’t like this idea would say they wouldn’t trust AI with their health, of course, but, come to think of it, you don’t need to entrust “all” of your health to the AI.
Is there a particular health issue that can be diagnosed and treated without having to re-invent the wheel? Where all the steps and procedures have been documented, described, and do not require any guesswork? AI should likely be able to take that over.
Is there a health issue that is rare and unusual, or where there is still no consensus, or where doctors are not necessarily certain how to diagnose and treat it? Basically, a health issue where the training data would be hard to come up with. Well, that’s a case for a live person to get involved, even if AI may still be able to help with the more specific tasks.
Which is, again, why I think GPT 5 is almost a blessing in disguise. Had we not fallen into the trap of “all-mighty AI is going to take everything over” and stuck to “it depends, let’s try and see where it works well”, we could have saved ourselves a lot of time and energy by now.
PS. Which is, ultimately, why I think GPT 5 is neither AGI nor garbage. It’s, well, just another model. Had you asked me 5 years ago, I’d have said it would be a miracle to get this far. Then again, I have a lot of disdain at the moment because of how we’ve all been told to expect AGI if not today, then tomorrow at most.
