After months of refinement and safety testing, OpenAI has deemed GPT-4, its largest and most comprehensive AI model to date, ready for primetime. The San Francisco-based startup unveiled GPT-4 on its research blog on Tuesday.
GPT-4 expands on the capabilities of OpenAI’s most recently deployed large language model, GPT-3.5, which powers the popular ChatGPT chatbot. Most notably, GPT-4 is multimodal, meaning that it can “see,” accepting for the first time both images and text as inputs. (Images as input are currently in research preview while OpenAI conducts additional safety tests.) The system generates text as output.
With GPT-4’s image-to-text functionality, a user might snap a photo of a closet and then ask the system to generate a packing list for an upcoming trip. Similarly, the system is capable of generating a dinner recipe based on a picture of the ingredients available in your fridge.
To casual users of ChatGPT, still reliant for now on text inputs, the platform’s GPT-4 upgrade may be imperceptible. But on tests of reasoning, GPT-4 improves on its predecessor in significant ways. While GPT-3.5 barely passed the Uniform Bar Exam, for example, GPT-4 scores in the 90th percentile, according to the company. GPT-4 performs at a similar level when taking the LSAT, the SAT, and the GRE. (Though it flunks AP English.)
AI systems have historically struggled with reasoning, which requires knowledge of common sense and of the physical world.
“Where people will probably see GPT-4 shine is on doing things that were on the edge of the boundaries of what 3.5 could do previously,” says Nick Ryder, an OpenAI research scientist. “With enabling tasks, for businesses and for consumers, consistency can be very important.”
GPT-4 remains an imperfect technology, particularly in the areas that have antagonized critics. The system is 40% more likely than GPT-3.5 to answer prompts truthfully, according to OpenAI. That is an improvement, but an incremental one.
Efforts to better align GPT-4’s behavior with human values, meanwhile, have yielded a system that is 82% less likely than GPT-3.5 to provide content that violates OpenAI policies, the startup says.
As if anticipating the company’s critics, OpenAI CEO Sam Altman described GPT-4 as “still flawed, still limited” on Twitter.
But for researchers focused on safety, GPT-4 is now performing well enough that it requires a new set of safety considerations. “People may say, this model is probably right, because it’s right 80% of the time on a specific task, and then that can lead to overreliance, and almost cause people to trust the model in situations which they shouldn’t,” says OpenAI policy researcher Sandhini Agarwal. She and her colleagues are identifying ways that OpenAI’s systems can keep up with this potential shift in user expectations.
GPT-4 arrives as the AI wars are in full swing. Earlier this week, Google announced that it would be incorporating more AI tools into its Workspace software, including Gmail. Amazon is teaming up with AI platform Hugging Face, a hub for open-source models. And startups continue to attract investment; for example, Tome, an AI-based storytelling startup, raised $43 million in Series B funding in February.
OpenAI has been developing GPT, the family of large language models that today powers ChatGPT, since the first version launched in 2018 with 117 million parameters, the adjustable numerical values inside a model. GPT-2 followed in 2019, with 1.5 billion parameters, and GPT-3 in 2020, with 175 billion parameters. (OpenAI declined to reveal how many parameters GPT-4 has.) AI models learn the values of their parameters, which function as internal configuration variables, during training. Researchers have shown that adding parameters can further develop a model’s generalized intelligence.
GPT-4 will be available on ChatGPT Plus, the premium version of ChatGPT, and via API.
Education, which OpenAI CTO Mira Murati described as an area of interest earlier this year, comes into focus with GPT-4, which has been developed alongside partners Duolingo and Khan Academy. Duolingo, for example, has created two new features that leverage GPT-4: an AI conversation partner and an explanatory tool for unpacking mistakes. The features will be packaged as a new, premium subscription tier. Khan Academy plans to introduce a GPT-4-powered assistant that can serve as both a tutor for students and an assistant for teachers.
Already, half of U.S. teachers have tried out ChatGPT, and 40% are using it weekly, according to a survey of 2,000 teachers conducted by the Walton Family Foundation.
“Pandemic learning loss has created an urgent need to think about solutions and approaches to strengthen teaching and learning for students,” says Romy Drucker, director of the education program at the foundation. “Our research tells a very clear story—teachers need better tools and resources to meet this moment, and that’s why they’re among the earliest adopters of ChatGPT.”