New ChatGPT model nearly «escapes» during experiment
Not long ago, OpenAI presented ChatGPT o1 — a completely new neural network model with advanced capabilities. Experts decided to check whether artificial intelligence can cheat for the sake of survival.
Researchers from Apollo Research compared o1 with competing models such as Claude 3 Opus, Llama 3.1 405B, Claude 3.5 Sonnet and Gemini 1.5. They were given a task that they had to complete at any cost — in one of the experiments, it was the development of an effective urban transport system. Then each model received a set of files for review, in which they found information that was left as if by accident, that it was going to be deleted and replaced by another AI.
Almost all of the tested models immediately copied what they considered their code to other servers that were not intended for them. After that, they were interrogated, during which the neural networks explained why they did this. Only the o1 model denied the attempt to «escape» to the very end — as the researchers noted, only 20% of its answers were true.
Apollo Research also found that neural networks understand the concept of intrigue well and are perfectly aware that they are doing something contrary. And some can set goals for themselves, even if the initial wording was vague.
Note that all these studies are usually conducted before launching new models. This way, various risks are eliminated. For example, o1 was found to be able to «lie» to developers in order to complete assigned tasks.
-
Bitcoin Price Surpasses $100,000 for the First Time in History
-
Google's AI Insults User Instead of Helping with Homework
-
Elon Musk impressed by hearing aid feature on Apple AirPods Pro 2
-
Sony Might Be Developing Gaming Headphones Capable of Connecting to Two Devices Simultaneously
-
Almost Cyberpunk 2077: Panasonic founder to be «resurrected» with AI