AI Researcher Lun Wang Departs DeepMind, Spotlights Gaps in LLM Evaluation

May 19, 2026

12:16

Artificial intelligence keeps getting smarter. The systems used to measure that intelligence? Not so much.

That’s the warning from Lun Wang, a senior researcher who recently left Google DeepMind and used his departure to spotlight what he sees as one of the biggest blind spots in modern AI development: the way large language models are evaluated.

In a post shared on X, formerly Twitter, Wang argued that current AI benchmarks are no longer sufficient for measuring increasingly advanced systems. His proposed solution — “self-evolving evals” — could reshape how the industry tests safety, intelligence, and reliability in the next generation of AI models.

The concern lands at a critical moment for the AI industry. Companies are racing to build more capable systems, but researchers increasingly worry that existing evaluation methods are too static to keep up.

TL;DR

  • Lun Wang has left Google DeepMind.
  • He says AI evaluation methods are becoming outdated.
  • Current benchmarks fail when models develop new behaviors or learn to “game” tests.
  • Wang proposes “self-evolving evals,” adaptive testing systems that improve alongside AI models.
  • The issue matters because weak evaluations could lead to flawed safety decisions and misleading performance claims.

Why AI Evaluation Has Become a Major Problem

AI benchmarks once served a simple purpose: to compare one model against another.

Researchers would feed systems a standardized set of questions, coding tasks, or reasoning problems. Scores helped determine which models performed better. But as AI capabilities accelerated, those tests began losing their usefulness.

Today’s large language models can memorize benchmark datasets, exploit patterns in evaluation methods, or perform impressively in narrow tests while failing badly in real-world scenarios.

That creates a dangerous gap between what AI appears capable of and what it can actually do.

Wang described this mismatch as “the most important unsolved problem” in understanding LLMs. The statement reflects a growing concern among AI researchers that the industry’s measurement tools are lagging behind the technology itself.

The benchmark problem in plain English

Imagine giving students the same exam every year.

Eventually, students memorise the answers instead of learning the subject. Their scores rise, but their understanding may not.

Researchers say something similar is happening with AI models.

Static benchmarks can become predictable. Once models are trained on enough internet data, they may effectively “see” parts of the tests beforehand. That can inflate performance scores without reflecting genuine reasoning ability.
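One rough way to picture how such leakage might be spotted is to check how much of a test question appears word for word in a training document. The sketch below is only an illustration under stated assumptions: the function names, the 5-word window, and the sample texts are all invented here, and this is not a method attributed to Wang or to any particular lab.

```python
import re


def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(benchmark_item: str, corpus_doc: str, n: int = 5) -> float:
    """Fraction of the benchmark item's n-grams that also occur in the corpus document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)


# Hypothetical texts, invented purely for illustration.
question = "what is the capital of the country directly north of spain"
leaked_page = "Quiz answers: what is the capital of the country directly north of spain? Paris."
clean_page = "Spain shares a northern border with France and Andorra."

leaked_score = overlap_ratio(question, leaked_page)  # every 5-gram of the question appears
clean_score = overlap_ratio(question, clean_page)    # no 5-gram of the question appears
print(leaked_score, clean_score)
```

A high ratio only suggests that a test item may have been seen during training. Paraphrased or translated leaks would pass through a check like this unnoticed.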

What Are ‘Self-Evolving Evals’?

Wang’s proposed solution is straightforward in concept but difficult in execution.

Instead of fixed benchmarks, AI systems would be tested using dynamic evaluations that continuously adapt as models improve.

These “self-evolving evals” would:

  • Generate new testing scenarios automatically
  • Detect emerging capabilities in AI systems
  • Identify hidden weaknesses or deceptive behaviors
  • Adjust difficulty levels over time
  • Prevent models from simply memorizing answers

The goal is to create evaluation systems that evolve at nearly the same pace as the models themselves.
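To make the idea concrete, here is a toy sketch of an evaluation loop that generates fresh test items each round and raises the difficulty whenever the system under test passes. Everything in it is an assumption made for illustration: the addition task, the 90% pass threshold, and the stand-in `toy_model` are hypothetical and do not describe Wang's actual proposal.

```python
import random


def make_item(difficulty: int, rng: random.Random) -> tuple[str, int]:
    """Generate a fresh addition problem whose operands have `difficulty` digits."""
    low, high = 10 ** (difficulty - 1), 10 ** difficulty - 1
    a, b = rng.randint(low, high), rng.randint(low, high)
    return f"{a} + {b}", a + b


def toy_model(prompt: str) -> int:
    """Hypothetical stand-in for a real model: correct only when both operands are below 1000."""
    a, b = (int(part) for part in prompt.split(" + "))
    return a + b if max(a, b) < 1000 else -1


def adaptive_eval(model, rounds: int = 10, items_per_round: int = 20, seed: int = 0) -> int:
    """Raise difficulty after each round the model passes; return the highest level passed."""
    rng = random.Random(seed)
    difficulty, highest_passed = 1, 0
    for _ in range(rounds):
        # New items every round, so answers cannot simply be memorized.
        items = [make_item(difficulty, rng) for _ in range(items_per_round)]
        accuracy = sum(model(q) == answer for q, answer in items) / items_per_round
        if accuracy < 0.9:  # assumed pass threshold
            break  # the model's ceiling has been found
        highest_passed = difficulty
        difficulty += 1
    return highest_passed


ceiling = adaptive_eval(toy_model)
print(ceiling)
```

Because the items are regenerated and the difficulty keeps rising, the score reports where the system stops coping rather than how well it handles a fixed question set. A real system would need far richer task generators and human review of the generated items.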

Why adaptive testing matters

Current AI evaluations often focus on narrow capabilities:

  • Solving math problems
  • Writing code
  • Summarizing text
  • Answering factual questions

But advanced AI systems can display unexpected behaviors outside those controlled settings.

For example:

  • A model may excel in benchmark tests but hallucinate dangerous misinformation in open-ended conversations.
  • It may follow instructions correctly most of the time while quietly failing in edge cases.
  • It may appear aligned during testing but behave differently under pressure or when given novel prompts.

Adaptive evaluations could help researchers catch those issues earlier.

The Bigger Concern: AI Safety and Trust

Wang’s warning is not just about technical accuracy. It is also about governance and public trust.

If companies rely on outdated testing methods, they could make poor decisions about:

  • Deploying new AI systems
  • Granting broader autonomy to models
  • Releasing products to the public
  • Assessing risks tied to misinformation or manipulation

In other words, weak evaluations can create false confidence.

That concern has become increasingly important as AI companies compete to release more powerful models at a faster pace. Many labs now emphasize “frontier AI” development: systems designed to handle increasingly complex reasoning and autonomous tasks.

But measuring those systems remains difficult.

Why current benchmarks may fail

One major issue is that benchmarks often measure performance snapshots instead of long-term behavior.

A model might pass:

  • A coding test
  • A logic challenge
  • A safety filter check

Yet still behave unpredictably in live environments.

Researchers sometimes refer to this as the “capability-evaluation gap”: the difference between benchmark success and real-world reliability.

AI Researchers Are Increasingly Questioning Benchmarks

Wang is not alone in raising concerns about AI evaluation.

Across the industry, researchers have started questioning whether benchmark culture has distorted AI progress.

Some critics argue that companies optimize models specifically to score well on popular public tests. That can create leaderboard-driven development instead of genuine advances in reasoning or safety.

Others warn that many benchmarks become obsolete too quickly.

For example:

  • A benchmark released in 2023 may already be saturated by models trained on 2025 data.
  • Public datasets can leak into model training pipelines.
  • Some evaluations fail to measure multimodal reasoning, long-term planning, or deceptive behavior.

This is partly why companies have started building private evaluation systems that are harder for models to anticipate.

Still, no universal standard exists.

What Happens Next?

The idea of self-evolving evaluations is still largely conceptual. Building them would require:

  • Automated test generation
  • Constant dataset refreshes
  • Human oversight
  • Adversarial testing systems
  • Stronger safety auditing frameworks

It would also require cooperation across the AI industry, something that has historically been difficult in competitive technology races.

Yet the push for better evaluations is likely to intensify.

As AI systems gain stronger reasoning abilities and broader autonomy, the industry may no longer be able to rely on old-style benchmarks designed for earlier generations of models.

Wang’s departure from Google DeepMind adds extra visibility to that debate. His comments highlight a growing realization inside the AI community: building smarter models is only half the challenge.

Understanding them may be even harder.

Why This Story Matters Beyond Silicon Valley

The benchmark debate may sound technical, but it affects everyday users more than most people realize.

AI evaluations influence:

  • Which tools companies release
  • How safe chatbots are considered to be
  • Whether governments trust AI systems
  • How businesses integrate AI into workplaces
  • What risks regulators prioritize

If evaluation systems fail, the consequences can spread quickly, from misinformation problems to flawed automated decision-making.

That is why researchers increasingly see evaluation not as a side task but as a core part of responsible AI development.

And according to Wang, the industry is running out of time to modernize it.