Google finds AI chatbots are only 69% accurate… at best

Written by Manisha Priyadarshini on December 16, 2025

AI chatbots still get one in three answers wrong

[Image: a phone showing AI chatbots. Credit: Solen Feyissa / Unsplash]

Google has published a blunt assessment of how reliable today’s AI chatbots really are, and the numbers are not flattering. Using its newly introduced FACTS Benchmark Suite, the company found that even the best AI models struggle to break past a 70% factual accuracy rate. The top performer, Gemini 3 Pro, reached 69% overall accuracy, while other leading systems from OpenAI, Anthropic, and xAI scored even lower. The takeaway is simple and uncomfortable. These chatbots still get roughly one out of every three answers wrong, even when they sound confident doing it.

The benchmark matters because most existing AI tests focus on whether a model can complete a task, not whether the information it produces is actually true. For industries like finance, healthcare, and law, that gap can be costly. A fluent response that sounds confident but contains errors can do real damage, especially when users assume the chatbot knows what it is talking about.

What Google’s accuracy test reveals


The FACTS Benchmark Suite was built by Google’s FACTS team with Kaggle to directly test factual accuracy across four real-world use cases. One test measures parametric knowledge, which checks whether a model can answer fact-based questions using only what it learned during training. Another evaluates search performance, testing how well models use web tools to retrieve accurate information. A third focuses on grounding, meaning whether the model sticks to a provided document without adding false details. The fourth examines multimodal understanding, such as reading charts, diagrams, and images correctly.
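To make the four-part structure concrete, here is a minimal sketch of how per-category accuracies could roll up into a single benchmark-style score. The category names, weights, and numbers below are illustrative assumptions, not Google's published methodology or per-category results.

```python
# Hypothetical sketch: averaging four category accuracies into one
# overall score. Equal weighting is an assumption for illustration.

def facts_style_score(category_accuracy: dict) -> float:
    """Average per-category accuracy into a single overall score."""
    return sum(category_accuracy.values()) / len(category_accuracy)

# Illustrative numbers only -- not the published per-category results.
example_model = {
    "parametric_knowledge": 0.78,  # answers from training data alone
    "search": 0.74,                # retrieving facts with web tools
    "grounding": 0.77,             # sticking to a provided document
    "multimodal": 0.47,            # reading charts, diagrams, images
}

print(round(facts_style_score(example_model), 2))  # 0.69
```

Note how a weak multimodal category drags the overall number down even when the text-only categories are much stronger, which matches the pattern the rankings describe.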

[Image: AI accuracy rankings by FACTS score. Credit: Google]

The results show sharp differences between models. Gemini 3 Pro led the leaderboard with a 69% FACTS score, followed by Gemini 2.5 Pro and OpenAI’s ChatGPT-5, both near 62%. Claude 4.5 Opus landed at roughly 51%, while Grok 4 scored about 54%. Multimodal tasks were the weakest area across the board, with accuracy often below 50%. That matters because these tasks involve reading charts, diagrams, or images, where a chatbot can confidently misread a sales graph or pull the wrong number from a document, producing mistakes that are easy to miss but hard to undo.

The takeaway isn’t that chatbots are useless, but that blind trust is risky. Google’s own data suggests AI is improving, yet it still needs verification, guardrails, and human oversight before it can be treated as a reliable source of truth.
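What might a simple guardrail look like in practice? One cheap check, in the spirit of the benchmark's grounding test, is to flag any number a model cites that never appears in the source document it was given. This is a toy illustration of the idea, not a production fact-checker, and the function name and examples are invented for this sketch.

```python
import re

def unsupported_numbers(answer: str, source: str) -> list:
    """Return numbers cited in the answer that are absent from the source."""
    answer_nums = re.findall(r"\d+(?:\.\d+)?", answer)
    source_nums = set(re.findall(r"\d+(?:\.\d+)?", source))
    return [n for n in answer_nums if n not in source_nums]

source = "Q3 revenue was 4.2 million, up from 3.9 million in Q2."
answer = "Revenue rose from 3.9 million to 4.8 million."

print(unsupported_numbers(answer, source))  # ['4.8']
```

A check this crude misses paraphrased errors entirely, which is exactly why the article's call for human oversight still stands.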

Manisha Priyadarshini

Manisha likes to cover technology that is a part of everyday life, from smartphones & apps to gaming & streaming…
