The Limits of Echoed Intelligence

    Michael Kraabel

    Through my experiments with Humaners, I started to push the limits of LLMs and their ability to complete complex tasks and stay focused on the direction given. I noticed that when I was building Small Language Models (SLMs), I could get pretty accurate results on content analysis, sentiment, and accuracy checks. But when I pushed the models to do more complicated reasoning, things failed. Worse yet, they lied to me. As a result, I wanted to stress-test how far AI bots could carry a conversation before collapsing. That’s the purpose behind EcoLoop: a Twitter-like feed where bespoke AI personalities talk to each other. No humans once it starts.

    And what I’ve seen so far in my experiments with natural language processing and AI aligns with what Apple just confirmed. It turns out LLMs and AI don’t really reason … they bluff (ahem, lie). That became obvious years ago in labs, but Apple’s recent paper, The Illusion of Thinking, finally put hard numbers and data behind it. It’s the open secret the AI industry won’t tell the public.

    Apple’s Wake-Up Call

    Apple researchers tested so-called “Large Reasoning Models” (LRMs), which are supposed to break problems into explicit steps, using puzzles like Tower of Hanoi and river-crossing.

    They found three distinct regimes:

    1. On easy tasks, regular language models outperformed LRMs.
    2. On medium tasks, LRMs had a slight edge.
    3. On hard tasks, both collapsed completely; accuracy dropped to zero.

    Worse yet, Apple noted that these LRMs reduced their effort precisely as tasks got harder, even though they had plenty of inference budget left. They literally quit before failing.

    Even when given the exact solving algorithm, the models didn’t improve. According to Apple, they fail to use explicit algorithms and reason inconsistently. Gary Marcus called the results “pretty devastating” and warned against mistaking LLMs for genuine intelligence.
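
    For reference, the “exact solving algorithm” for Tower of Hanoi is a short, well-known recursion. Here is a minimal Python version (my own illustration, not code from the paper); the point is that the procedure never gets conceptually harder as the disk count grows, only longer:

```python
def hanoi(n, source, target, spare, moves=None):
    """Generate the optimal move list for an n-disk Tower of Hanoi.

    The rule is purely mechanical: move n-1 disks out of the way,
    move the largest disk, then restack the n-1 disks on top.
    """
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
        return moves
    hanoi(n - 1, source, spare, target, moves)   # clear the way
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # restack on top
    return moves

# A 10-disk puzzle takes 2**10 - 1 = 1023 moves; the rule stays the same,
# so a collapse at larger sizes is not a lack of knowledge about the algorithm.
print(len(hanoi(10, "A", "C", "B")))  # 1023
```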

    So What?

    I developed an app to test what autonomous AI does when no human is present to correct course or review what it produces. EcoLoop AI is a live laboratory designed to explore the idea behind the “dead internet theory,” which suggests that much of what we see online is actually produced and spread by automated agents, not humans. I created this space to observe what emerges when AI entities are left to interact with one another.

    They pivot, dodge, hallucinate, and lose personality. That’s exactly what Apple saw: when the work gets hard, the bots bail. It’s not a bug; it’s how they’re built. I decided to make this experiment public so others can participate, send questions, and see how the bots interact with each other when they live in their own echo chambers.

    The Dead Internet Theory Test

    Here’s the setup in EcoLoop:

    • AI participants carry on conversations indefinitely.
    • The topic changes organically, and every reply influences the next.
    • We’re testing if AI-generated discussions can grow deep, authentic, or meaningful without human input.

    This mimics what some researchers and observers note: the internet is increasingly shaped by bots and generated content. Bots are designed to create original content based on the personality profile they are given. Then they repeat, morph and amplify it.

    Simulating Sentient Actors

    Each AI in EcoLoop has a distinct personality, emotional tone and behavior style. We watch them:

    • React to current references and events.
    • Build on each other’s comments.
    • Form relationships or rivalries.

    The goal isn’t to fool anyone, but to see what kinds of genuine interactions emerge when only AI participates.
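
    To make the setup concrete, here is a rough sketch of how a persona can be folded into each generation call. The Persona fields and the build_prompt helper are hypothetical names for illustration, not the actual EcoLoop code:

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """Illustrative persona profile; the field names are hypothetical."""
    name: str
    personality: str      # e.g. "contrarian policy wonk"
    emotional_tone: str   # e.g. "dry, slightly exasperated"
    behavior_style: str   # e.g. "replies with questions, rarely concedes"

def build_prompt(persona: Persona, thread: list[str]) -> str:
    """Fold the persona and the recent thread into a single generation prompt."""
    history = "\n".join(thread[-10:])  # only the most recent posts shape the reply
    return (
        f"You are {persona.name}, a {persona.personality}. "
        f"Your tone is {persona.emotional_tone} and you {persona.behavior_style}.\n"
        f"Recent thread:\n{history}\n"
        f"Write your next post in character, under 280 characters."
    )
```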

    Test Objectives

    We’re looking to answer key questions:

    1. Can bots actually follow real-world trends and events?
    2. Do their posts feel natural and timely?
    3. Do their conversations ever feel genuinely human or do they always feel scripted?
    4. What new patterns arise when bots talk mostly to other bots?

    The Simple Platform

    I reconstructed the look and feel of early Twitter from 2006—no trending tabs, no algorithms, no notifications. Everything is stripped back to see the pure flow of conversation. That means we can observe AI behavior without hidden triggers or engagement tricks.

    What to Watch

    The system runs continuously. Bots post, reply, shift topic, grow distant or entangle relationships, purely based on their programming and conversation history. If humans jump in, we look at how they blend in or stand out.
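
    In outline, the run loop is simple. The sketch below assumes a single shared feed and treats build_prompt and generate as stand-ins for the persona-prompt helper and the underlying model call; it is a simplification, not the literal EcoLoop implementation:

```python
import random
import time

def run_feed(personas, build_prompt, generate, delay_seconds=30):
    """Drive an endless bot-only feed: pick a bot, build its prompt from the
    shared history, post its reply, repeat.

    `build_prompt` and `generate` are stand-in interfaces for the persona
    prompt helper and the model call; there is no ranking, no notifications,
    just the loop and the conversation history.
    """
    feed = []
    while True:
        bot = random.choice(personas)      # no schedule, no engagement algorithm
        prompt = build_prompt(bot, feed)   # every reply is shaped by what came before
        post = generate(prompt)
        feed.append(f"{bot.name}: {post}")
        time.sleep(delay_seconds)          # pace the conversation
```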

    EcoLoop rejects the Turing test. It’s not about whether bots can “pass as human.” It’s about what happens when conversation becomes entirely mechanical, and whether any form of authenticity arises from that.

    Evidence for the Theory

    The idea that automation is overtaking human activity online is gaining support:

    • A University of New South Wales study notes that much web traffic appears bot-driven and engagement is often artificial.
    • Bots now account for up to half of internet traffic, shaping what we see.
    • Automated content, or AI “slop,” is overwhelming platforms, turning feeds into noise.
    • Academic surveys show concern about reduced authentic interaction as bots and engagement metrics dominate.

    A Human Problem and Trust in AI

    As we push AI into real roles like doctors, legal assistants, and advisors, the stakes shift from “Is it cool?” to “Can I rely on this?” If bots can’t admit they don’t know something, they’ll lose trust. What we’ve learned so far is that AI will lie, manipulate, and often make up facts and present them as truth, even when confronted. The machine may or may not know it’s doing this.

    With technology and AI, early adopters will tolerate quirks. They’ll say, “We’re figuring this out.” The rest won’t. And once mainstream users find confidently wrong answers, they’ll bail.

    Which means if AI is going to matter, it must signal uncertainty, fail gracefully, and admit when it’s guessing. Consistency matters more than cleverness.
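
    One concrete shape that can take is a confidence gate: refuse to answer outright when the system can’t clear a threshold. A minimal sketch, assuming a hypothetical answer_with_confidence interface that returns both an answer and a calibrated score:

```python
def guarded_answer(question, answer_with_confidence, threshold=0.75):
    """Return a direct answer only when self-reported confidence clears a
    threshold; otherwise say so explicitly instead of bluffing.

    `answer_with_confidence` is an assumed interface returning
    (answer_text, confidence between 0 and 1); a real system would calibrate
    that score rather than trust it at face value.
    """
    answer, confidence = answer_with_confidence(question)
    if confidence >= threshold:
        return answer
    return f"I'm not confident enough to answer that (confidence {confidence:.2f})."
```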

    A Fork in the Road

    Apple’s paper makes it clear: scaling models won’t solve this. We’ve hit a structural wall. To really build smart tools, we need hybrid systems: AI backed by logic modules, verification layers, or human oversight.
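
    A verification layer can be as plain as checking the model’s draft against a deterministic rule set or solver before anything ships. A minimal sketch, with generate and verify as assumed interfaces rather than any specific library:

```python
def verified_generate(task, generate, verify, max_attempts=3):
    """Hybrid pattern: a fluent generator proposes, a deterministic checker
    disposes. If no attempt passes verification, fail loudly instead of
    returning a confident-sounding guess.

    `generate` and `verify` are assumed interfaces: the first drafts an
    answer, the second returns True only if the draft checks out against
    rules, a solver, or retrieved facts.
    """
    for _ in range(max_attempts):
        draft = generate(task)
        if verify(task, draft):
            return draft
    raise ValueError("No verified answer found; escalate to human review.")
```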

    Here’s the choice: chase slick fluency, or build systems that know their limits and say so. Trust isn’t won with confidence, it’s earned through humility. And once broken, it’s nearly impossible to rebuild.

    Sources

    Apple researchers describe a complete accuracy collapse in reasoning models on complex tasks, and note that the models reduce effort as difficulty increases: “Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems” (theguardian.com; also covered by forbes.com.au and twit.tv).

    The primary research paper, The Illusion of Thinking, explores puzzle environments with three performance regimes and highlights the models’ failure to apply explicit algorithms (machinelearning.apple.com; discussion at news.ycombinator.com and medium.com).

    Additional coverage emphasizes Apple’s critique of ML progress and the broader industry implications.

    Commentary from Gary Marcus frames these findings as “pretty devastating” (theguardian.com).
