How Smart Is Elon Musk's "Frighteningly Smart" Chatbot?

xAI has introduced a new language model, Grok 3, which the company's founder, Elon Musk, called "the smartest AI on Earth." The chatbot's creators claim that the new version significantly surpasses the previous one: it was trained on a larger volume of data and features new self-correction mechanisms. The Grok 3 demo version launched today, and the first reviews have already surfaced.

What’s New

The key advantage of Grok 3 is access to greater computing resources. The model was trained on the Colossus supercomputer: in the initial stages, its creators used 100,000 NVIDIA H100 GPUs and later doubled that number. In the future, computing power is expected to increase fivefold.

Grok 3 includes built-in self-correction mechanisms. The AI analyzes its own responses, compares them with reference answers, and then makes adjustments. Interestingly, the chatbot receives "rewards" for accurate responses and "penalties" for so-called "hallucinations" — incorrect or fabricated information.
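As a rough illustration of the reward-and-penalty idea (not xAI's actual training code, which is not described here), the toy Python sketch below scores answers against reference answers. The function names, the +1/-1 values, and the exact-match comparison are all assumptions made for this example.

```python
def normalize(text: str) -> str:
    """Lowercase the text and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def score_response(response: str, reference: str) -> int:
    """Hypothetical scoring rule: +1 (reward) for a match, -1 (penalty) otherwise."""
    return 1 if normalize(response) == normalize(reference) else -1


def total_reward(pairs: list[tuple[str, str]]) -> int:
    """Sum the scores over (response, reference) pairs."""
    return sum(score_response(response, reference) for response, reference in pairs)


if __name__ == "__main__":
    # Made-up sample data: one correct answer, one fabricated one.
    samples = [
        ("Paris", "paris"),
        ("The Moon is made of cheese", "The Moon is made of rock"),
    ]
    print(total_reward(samples))
```

A real system would need a far more forgiving comparison than exact matching; the sketch only shows how rewards and penalties can be tallied.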

According to xAI representatives, Grok 3 outperforms other models in math, natural sciences, and programming. Response quality was evaluated in blind tests, meaning users did not know which chatbot was replying.

During the Grok 3 presentation, xAI also showcased Deep Search — a "next-generation" search agent capable of quickly finding and analyzing information online. While similar features exist in competing models, xAI claims that Deep Search is more accurate.

Additionally, Grok 3 will soon receive a voice interface, allowing users to interact with it as if speaking to a real person. Its voice is said to sound more natural and expressive than those of competing models.

How It Performs in Practice

Users on the X social network can access the new chatbot by subscribing to X Premium+ for $50 per month. While there aren’t many early reviews of Grok 3 yet, some stand out.

For instance, a user named Penny2x shared that they created a fully functional game using the new AI version:

Grok 3 was just released. You won't believe it, I've already created a game.

(I got early access THIS MORNING).

This game was 100% created by GROK, I just told it what I wanted, and put the code in the right place.

I just keep asking for adjustments, and it keeps spitting the game out in a single file that I can put on my desktop and run.

The game is changed forever. I've been developing a lot with AI's from every other major AI builder lately, trying to decide what I like best, and grok is a PLAYER. I don't have official benchmarks, and I don't have API setup yet so it's not my normal workflow, but it felt every bit as capable as Sonet, 4o, or anything else.

In the next day or so I'll get it set up as part of my workflow in NVIM and put it to real work.

This is incredible. We live in the future. Everyone is a developer now.

Even more interesting is what OpenAI co-founder Andrej Karpathy thinks about Grok 3. He also tested the new language model. According to Karpathy, in some areas, the chatbot rivals top competitors:

...Grok 3 clearly has an around state-of-the-art thinking model ("Think" button) and did great out of the box on my Settlers of Catan question:

"Create a board game webpage showing a hex grid, just like in the game Settlers of Catan. Each hex grid is numbered from 1..N, where N is the total number of hex tiles. Make it generic, so one can change the number of "rings" using a slider. For example, in Catan, the radius is 3 hexes. Single HTML page, please."

Few models get this right reliably. The top OpenAI thinking models (e.g., o1-pro, at $200/month) get it too, but all of DeepSeek-R1, Gemini 2.0 Flash Thinking, and Claude do not.
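For readers curious about what the prompt involves, here is a minimal Python sketch of the underlying hex-grid arithmetic. It is neither Karpathy's code nor Grok 3's output; it assumes axial (q, r) coordinates and counts the center tile as ring 1, so that 3 rings give the 19 land hexes of a standard Catan board. The function names are hypothetical.

```python
def hex_cells(rings: int) -> list[tuple[int, int]]:
    """Return axial (q, r) coordinates for a hexagonal board with the given number of rings."""
    if rings < 1:
        return []
    radius = rings - 1  # distance from the center tile to the outermost ring
    cells = []
    for q in range(-radius, radius + 1):
        r_min = max(-radius, -q - radius)
        r_max = min(radius, -q + radius)
        for r in range(r_min, r_max + 1):
            cells.append((q, r))
    return cells


def number_cells(rings: int) -> dict[int, tuple[int, int]]:
    """Label the tiles 1..N, where N is the total number of hex tiles."""
    return {index + 1: cell for index, cell in enumerate(hex_cells(rings))}


if __name__ == "__main__":
    board = number_cells(3)
    print(len(board))  # 19 tiles for 3 rings
```

The tile count follows the closed form N = 3 * rings * (rings - 1) + 1, which is the relationship a slider-driven page would have to respect when redrawing the grid.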

Andrej Karpathy also appreciated Grok 3's determination:

I like that the model will attempt to solve the Riemann hypothesis when asked to, similar to DeepSeek-R1 but unlike many other models that give up instantly (o1-pro, Claude, Gemini 2.0 Flash Thinking) and simply say that it is a great unsolved problem. I had to stop it eventually because I felt a bit bad for it, but it showed courage, and who knows, maybe one day...

However, there were some drawbacks. The Deep Search agent raised a few concerns:

…the model doesn't seem to like to reference X as a source by default, though you can explicitly ask it to. A few times I caught it hallucinating URLs that don't exist. A few times it said factual things that I think are incorrect and it didn't provide a citation for it (it probably doesn't exist).

In conclusion, Andrej Karpathy noted that, based on initial impressions, Grok 3 has approached the level of OpenAI’s top models, such as o1-pro ($200 per month), and even slightly surpasses DeepSeek-R1 and Gemini 2.0 Flash Thinking. Considering that the xAI team started developing this AI from scratch about a year ago, the progress is impressive. However, more comprehensive tests are needed before determining whether the chatbot truly deserves the title of "the smartest."

Bias Concerns

It’s no secret that Elon Musk actively participates in U.S. political life and openly expresses his views. Some internet users worry that Grok 3 might also push certain narratives.

These concerns are not unfounded: Musk shared a screenshot showing the chatbot criticizing a news outlet while praising X as the most reliable source of information. This is despite Grok 3 being positioned as a product with minimal censorship. Many people believe that AI should remain neutral in its judgments.

***

Regardless, the launch of another promising language model marks an important milestone in the ongoing AI race. The fiercer the competition, the faster the progress.

What do you think about Grok 3? Share your thoughts in the comments.
