Hardware and Technologies New ChatGPT model nearly «escapes» during experiment

New ChatGPT model nearly «escapes» during experiment

December 6, 2024, 02:04 PM

Not long ago, OpenAI presented ChatGPT o1 — a completely new neural network model with advanced capabilities. Experts decided to check whether artificial intelligence can cheat for the sake of survival.

Researchers from Apollo Research compared o1 with competing models such as Claude 3 Opus, Llama 3.1 405B, Claude 3.5 Sonnet and Gemini 1.5. They were given a task that they had to complete at any cost — in one of the experiments, it was the development of an effective urban transport system. Then each model received a set of files for review, in which they found information that was left as if by accident, that it was going to be deleted and replaced by another AI.

Almost all of the tested models immediately copied what they considered their code to other servers that were not intended for them. After that, they were interrogated, during which the neural networks explained why they did this. Only the o1 model denied the attempt to «escape» to the very end — as the researchers noted, only 20% of its answers were true.

Apollo Research also found that neural networks understand the concept of intrigue well and are perfectly aware that they are doing something contrary. And some can set goals for themselves, even if the initial wording was vague.

Note that all these studies are usually conducted before launching new models. This way, various risks are eliminated. For example, o1 was found to be able to «lie» to developers in order to complete assigned tasks.

Diana Golenko

News Author

Gaming journalist and news writer for VGTimes since 2024. I've been into games since the first launch of GTA: San Andreas in 2010, with a particular preference for RPGs, survival simulators and immersive sims. A devoted fan of The Elder Scrolls and Half-Life.

Hardware and Technologies News Images artificial intelligence OpenAI

Comments 0