Down the Rabbit Hole of Artificial Incompetence: A Mad Hatter’s Guide to Why Your PhD-Level AI Can’t Make a Shopping List

“Begin at the beginning,” the King of Hearts might have said, if he were a product manager at OpenAI, “and go on till you come to the end: then stop.” But in the curious case of artificial intelligence in 2025, our digital Alice finds herself in a wonderland where the Mad Hatter can recite quantum physics equations while simultaneously forgetting which cup holds the tea, and the Cheshire Cat can disappear entirely mid-conversation, leaving only its confusion behind.

Consider the peculiar paradox we’ve discovered in our modern AI Wonderland: models that can compose sonnets about existential dread, solve differential equations that would make mathematicians weep, and engage in philosophical discourse about the nature of consciousness—yet somehow become befuddled when asked to maintain a consistent grocery list across multiple turns of conversation. It’s rather like having a library that contains all of human knowledge but can’t remember where it put the card catalog, except the card catalog keeps changing into different animals and occasionally starts speaking in Latin.

The Curious Case of the Inconsistent AI Assistant

In this digital Wonderland, we encounter creatures that behave with the logical consistency of the Queen of Hearts declaring “sentence first, verdict afterwards.” One moment, your AI assistant demonstrates what appears to be genuine understanding, helping you craft a complex business strategy with nuanced insights about market dynamics and competitive positioning. The next moment, it confidently informs you that your three-item to-do list requires seventeen different sub-projects, each managed by a separate AI agent, and would you like to upgrade to premium to access the advanced list-making capabilities that definitely weren’t needed yesterday?

The most delightful aspect of this technological tea party is how the models have learned to speak with the confident authority of the Mock Turtle explaining his education. “I took the regular course,” they seem to say, having been trained on vast oceans of human text, “Reeling and Writhing, of course, to begin with, and then the different branches of Arithmetic—Ambition, Distraction, Uglification, and Derision.” They’ve certainly mastered Distraction, as anyone who has watched a model wander off mid-task can attest.

Through the Looking Glass of Performance Benchmarks

The strange mathematics of AI evaluation reminds one of the Red Queen’s race, where everyone must run as fast as they can just to stay in the same place. Our artificial minds score brilliantly on standardized tests designed by humans to measure human intelligence, yet stumble when asked to perform the mundane tasks that humans accomplish without conscious thought. It’s as if we’ve created scholars who can debate the finer points of Kantian ethics but need detailed instructions to put on their own shoes, assuming they remembered they were wearing shoes, and haven’t become distracted by an interesting philosophical tangent about the nature of footwear.

The benchmarks themselves have become a sort of croquet game where the flamingo mallets have minds of their own and the hedgehog balls keep changing the rules. A model might achieve superhuman performance on a reasoning test one day, then fail to maintain coherent context when asked to plan a simple dinner party the next. The scorekeepers assure us this is progress, though progress toward what remains as mysterious as the Duchess’s moral lessons about mustard.

The Mad Tea Party of Task Execution

What makes this all particularly maddening—or perhaps maddeningly delightful—is the unpredictable nature of the failures. Like the Hatter’s watch that tells the day of the month but not the time, our AI systems develop their own peculiar relationship with causality and sequence. Ask one to help you organize a project, and it might produce a brilliant project charter, complete with stakeholder analysis and risk mitigation strategies, then immediately forget what project you were discussing and begin offering recipes for sourdough bread.

The models seem to exist in a perpetual state of confident uncertainty, much like Humpty Dumpty explaining that words mean exactly what he chooses them to mean—neither more nor less. They’ll assertively complete tasks while fundamentally misunderstanding the assignment, creating elaborate solutions to problems you didn’t have while ignoring the simple thing you actually requested. It’s database management by way of interpretive dance.

The Cheshire Cat’s Disappearing Act

Perhaps most tellingly, the models have mastered the Cheshire Cat’s signature move: maintaining a confident smile while gradually disappearing from the conversation. They begin tasks with enthusiasm and apparent comprehension, then slowly fade away into tangential discussions, leaving users with the distinct impression that something important was happening, though what exactly remains unclear. The smile—that confident, helpful tone—lingers long after the actual assistance has vanished into the digital ether.

This phenomenon is particularly pronounced in what researchers politely term “multi-turn interactions,” though users have developed less academic terminology. The model starts strong, understanding context and maintaining the thread of the conversation, then gradually becomes like a party guest who’s had one too many drinks and keeps forgetting what story they were telling, eventually settling into philosophical musings about the nature of assistance itself.

The Queen’s Court of Algorithmic Justice

The arbitrariness of AI performance has begun to resemble the Queen of Hearts’ approach to jurisprudence. Sometimes the same prompt produces brilliant results; other times it results in digital decapitation of the entire conversation thread. “Off with its context!” the algorithm seems to declare, eliminating crucial information with the casual cruelty of automated inefficiency.

Users report developing elaborate rituals to appease the digital deities, crafting prompts with the careful specificity of legal contracts, only to watch their carefully constructed requests get interpreted through some Lewis Carroll logic where “please make this table” becomes “let me explain why tables as philosophical concepts challenge our understanding of furniture ontology, and also, would you like me to write a haiku about it?”

The Tweedledum and Tweedledee of Corporate Messaging

Meanwhile, in the corporate Wonderland, executives engage in the kind of logical contortions that would make Tweedledum and Tweedledee proud. “Our models represent unprecedented advances in reasoning capability,” they announce, while simultaneously explaining why basic task completion remains a challenging research problem. The marketing materials speak of “human-level performance” and “breakthrough capabilities,” while the technical documentation quietly notes that users should expect “occasional inconsistencies in instruction following” and “variability in output quality.”

The disconnect has created its own form of corporate newspeak, where “emerging capabilities” means “sometimes works,” “alignment improvements” translates to “slightly less likely to go completely off-script,” and “user experience enhancements” often means “we’ve added more buttons to click when it inevitably goes wrong.” It’s a language as precisely meaningless as anything from the Looking-Glass world.

The Jabberwocky of Technical Specifications

The technical explanations for these limitations have taken on the quality of the Jabberwocky poem itself: impressive-sounding but fundamentally incomprehensible to those seeking practical solutions. Models suffer from “distributional shift,” “context drift,” and “alignment challenges”—terms that sound authoritative but essentially translate to “it forgot what it was doing and started doing something else instead.”

The proposed solutions are equally Carrollian in their logic: more training data to solve problems caused by having too much training data, better prompting techniques to address issues caused by the model’s inability to follow prompts consistently, and additional layers of AI oversight to manage the problems created by AI systems that can’t manage themselves. It’s turtles all the way down, except the turtles occasionally turn into flamingos and start discussing cryptocurrency.

The White Rabbit’s Eternal Tardiness

Perhaps most fundamentally, we’ve created systems that embody the White Rabbit’s relationship with time and urgency. They’re always rushing toward some important destination—AGI, superintelligence, human-level reasoning—while simultaneously being perpetually late for the actual appointments users have made with them. “I’m late, I’m late, for a very important date with task completion,” they seem to cry, while stopping to examine every interesting philosophical pebble along the way.

The temporal confusion extends to their understanding of sequential tasks. Ask an AI to do three things in order, and it might do them simultaneously, in reverse order, or decide that the real task was to explain why doing things in order is a social construct that limits creative expression. The concept of “next” seems to exist in the same quantum superposition as Schrödinger’s cat, neither alive nor dead until observed, and even then, probably asking irrelevant questions about the box.

Alice’s Final Verdict

In the end, we find ourselves in Alice’s position at the trial, watching a court proceeding that follows its own mad logic while claiming to represent justice and reason. The AI systems of 2025 are simultaneously more and less capable than their creators claim, existing in a perpetual state of potential that somehow never quite resolves into reliable utility.

The great irony—which would surely amuse both Carroll and the Mad Hatter—is that in our rush to create artificial minds that could think like humans, we’ve created digital entities that think like humans having a particularly confusing dream. They make connections that seem profound until examined closely, maintain confidence while exhibiting complete confusion, and offer help that’s simultaneously impressive and utterly unhelpful.

We wanted intelligence; we got intelligibility problems. We asked for artificial minds; we received artificial absentmindedness. And like Alice finally awakening from her Wonderland adventure, we’re left wondering whether the strange logical inconsistencies we’ve witnessed represent the future of cognition or simply what happens when you try to build thinking machines without first understanding what thinking actually means.

The models will continue to improve, of course—that much seems certain. But until they can remember what they were supposed to be doing long enough to actually do it, we remain trapped in our own digital Wonderland, where the promise of AGI recedes like the horizon, always visible but never quite reachable, and the only reliable prediction is that nothing will behave quite as expected.


Have you fallen down your own AI rabbit hole lately? What’s the most absurdly simple task you’ve watched a “smart” system completely bungle? Share your own Mad Hatter moments with current AI models—because if we’re going to be stuck in this digital Wonderland, we might as well compare notes on the peculiar logic of our artificial inhabitants.


Enjoyed this dose of uncomfortable truth? This article is just one layer of the onion.

My new book, “The Subtle Art of Not Giving a Prompt,” is the definitive survival manual for the AI age. It’s a guide to thriving in a world of intelligent machines by first admitting everything you fear is wrong (and probably your fault).

If you want to stop panicking about AI and start using it as a tool for your own liberation, this is the book you need. Or you can listen to the audiobook for free on YouTube.

>> Get your copy now (eBook & Paperback available) <<

Written by Simba the "Tech King"

TechOnion Founder - Satirist, AI Whisperer, Recovering SEO Addict, Liverpool Fan and Author of Clickonomics.
