We are fast approaching two years of the generative AI revolution, sparked by the November 2022 release of ChatGPT by OpenAI. So far it’s been a mixed bag.
OpenAI recently announced it had crossed 200 million weekly active users – nothing to be sniffed at, but it got its first 100 million within two months of release. A recent YouGov study found that the inclusion of AI in a product is as likely to turn off a potential purchaser as it is to get them to hand over their cash.
Nevertheless, money keeps flowing into the sector, and advances keep coming. OpenAI is casting around among investors for money to fund future development, in a deal that would see the company valued at $150 billion. That would put it on a par with Cisco, Shell and McDonald’s. And last week, it unveiled its latest model, called o1, which it has touted as a step change in the development of generative AI.
The o1 model, previously codenamed Strawberry, is designed to reason through decisions, in much the same way humans do. The latest version of the model underpinning ChatGPT is actually a step backwards when it comes to speed of output and the size of the model, which is smaller for the time being. Think of it as GPT-4.5, rather than the rumoured next big iteration, GPT-5, which is reportedly still in development.
Mission: Impossible?
While on paper o1 is a damp squib, it does something that Alex had previously highlighted in this newsletter as an issue with LLM-based chatbots, and which he called the “Tom Cruise problem”. The issue was that ChatGPT could answer a question posed one way, but would balk when asked a second question that directly mirrored the first. For instance, researchers could ask who Tom Cruise’s mother is and get the right answer (Mary Lee Pfeiffer), but when they then asked who Mary Lee Pfeiffer’s son is (answer: Tom Cruise), the chatbot would come unstuck.
Ask o1 that pair of questions and it aces them. It even provides traces of how it gets to the answer – which OpenAI has cannily, and inaccurately because AI models don’t have a brain, called “thoughts”. (If you want to know why anthropomorphising AI models is an issue, check out this story I wrote in February.) When asked the second question, o1 “thought” for four seconds, including tracing out the family connections and confirming details.
So far, so good. OpenAI says o1 can reason. Many are less sure about such a declarative statement, but let’s let them have it for the purposes of marketing. That would mean a significant shift in how you can use generative AI: rather than regurgitating facts from its training data, or producing answers it statistically reckons are most likely to please users, it could consider information and respond.
“Could”, however, is the key word. We are still largely in the dark about how these things work – and “we” includes the developers of such tools. OpenAI has said this ability to reason is a big thing – the company has even trotted out a questionable claim that o1 is its most dangerous model yet (see here for how that’s sometimes more marketing spiel than anything). Those who have tried probing the limits of the o1 model seem to agree with the company’s point about the reasoning, but less so with the danger part.
Pay no attention to that man behind the curtain!
Well, sort of. Because the probing can only go so far. To understand the chain-of-thought process that underpins o1 – if you want a good primer, Simon Willison is ever-dependable – users wanting to look under the hood have been trying to get a little more detail on exactly what o1’s “thought” process is. The information users are currently shown is only a brief summary of each step in the chain of thought.
And because of that, they’ve been asking the model itself about how it comes up with its answers – though they have also received emails from OpenAI asking them to stop, and warning that their accounts will otherwise be suspended.
It all means that we’re left somewhat in the dark. This looks like a transformative step change in the world of AI, and something that could turn the tool from one whose output you have to look at with a side-eye of suspicion to a must-use.
What’s particularly interesting is that OpenAI’s dominance has effectively squeezed out coverage of any and all competitors of late. Mistral, the highly touted French competitor, released its first multimodal model last week. The Pixtral 12B model adds image recognition to text generation. It should have gained huge plaudits. But OpenAI and o1 sucked up all the oxygen.
Still, it all means the AI train keeps on rolling, and it’s finally starting to live up to its promise. Whether those who tried ChatGPT in its early days and found it lacking can be persuaded to come back to try the newer whizz-bang models is another question.