Karpathy Said the Era of Human Researchers Is Over. Then He Proved It.


Andrej Karpathy spent years at OpenAI building the systems that made modern AI possible.

Then he left, went independent, and spent his evenings building a tool called autoresearch.

He released it quietly on GitHub. Within days it had accumulated over 87,000 stars.

The description is five words: AI agents running research automatically.

What autoresearch does is structurally simple.

You give it an asset, a scoring mechanism, and a set of instructions. The agent runs experiments, keeps what improves the score, discards what doesn't, and loops again. It runs in five-minute cycles, overnight, without stopping, without waiting for a human to review the results.
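To make the loop concrete, here is a minimal sketch of the keep-or-discard idea in Python. It is not autoresearch's actual code. The names `improve`, `toy_score`, and `toy_propose` are hypothetical, the "asset" is just a list of numbers, and a random tweak stands in for the agent proposing an experiment.

```python
import random
from typing import Callable


def improve(asset: list[float],
            score: Callable[[list[float]], float],
            propose: Callable[[list[float], random.Random], list[float]],
            rounds: int,
            seed: int = 0) -> tuple[list[float], float, int]:
    """Greedy keep-or-discard loop. A higher score counts as better."""
    rng = random.Random(seed)
    best, best_score, kept = asset, score(asset), 0
    for _ in range(rounds):
        candidate = propose(best, rng)        # run one experiment
        candidate_score = score(candidate)    # measure it
        if candidate_score > best_score:      # keep only real improvements
            best, best_score, kept = candidate, candidate_score, kept + 1
        # otherwise the candidate is simply discarded
    return best, best_score, kept


def toy_score(params: list[float]) -> float:
    """Hypothetical objective: closer to 3.0 in every slot is better."""
    return -sum((p - 3.0) ** 2 for p in params)


def toy_propose(params: list[float], rng: random.Random) -> list[float]:
    """Stand-in for the agent: nudge one value, leave the original intact."""
    i = rng.randrange(len(params))
    changed = list(params)
    changed[i] += rng.uniform(-0.5, 0.5)
    return changed


start = [0.0, 0.0]
best, best_score, kept = improve(start, toy_score, toy_propose, rounds=500)
print(f"kept {kept} of 500 experiments, score {toy_score(start):.2f} -> {best_score:.2f}")
```

The loop never goes backwards: a candidate that does not beat the current best is dropped, so the score after a night of cycles can only match or exceed the starting point.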

Karpathy tested it on one of his own AI models. He had been working on that model for a long time and believed he had exhausted the improvements. Autoresearch found another eleven percent gain in a single night.

Then he wrote something that is worth reading carefully.

He described the old model of AI research as "meat computers," humans working in between eating, sleeping, and having fun, synchronizing with each other occasionally through "soundwave interconnect," meaning conversations in meetings. He said that era is now over. Research, he wrote, is now entirely the domain of autonomous swarms of AI agents.

He was not writing science fiction. He was describing what his tool was already doing.

The framing matters more than the tool.

Karpathy is not a commentator. He is not a venture capitalist with an incentive to talk up the sector. He is the person who wrote the curriculum that taught a generation of engineers how neural networks actually work. When he says the era of human researchers is over, he is not making a prediction. He is reporting what he observed.

Tobi Lütke, CEO of Shopify, ran autoresearch on Shopify's codebase overnight. He woke up to a nineteen percent performance improvement across thirty-seven automated experiments. Four days later he ran it again. The codebase was fifty-three percent faster.

Neither of them wrote a single line of the improved code.

The part that most coverage missed is the constraint Karpathy built into the system. The scoring mechanism, the file that defines what "better" means, cannot be touched by the agent. The agent can only modify the asset it is trying to improve. It cannot redefine success to make itself look better.

That architectural decision tells you something about how seriously Karpathy takes the alignment problem inside even a simple self-improvement loop. He built the guardrail before he built the tool.

That is not how most AI products are shipped.
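The guardrail described above, where the agent may change the asset but never the scorer, can be sketched in a few lines. This is an illustration under assumptions, not how autoresearch enforces it: the class `GuardedWorkspace` and the file names `asset.txt` and `scorer.txt` are hypothetical, and files are held in memory to keep the example self-contained.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of a file's contents, used to detect any change."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class GuardedWorkspace:
    """In-memory files. The agent may read all of them but write only some."""

    def __init__(self, files: dict[str, str], writable: set[str], scorer_name: str):
        if scorer_name in writable:
            raise ValueError("the scorer must not be writable by the agent")
        self._files = dict(files)
        self._writable = set(writable)
        self._scorer_name = scorer_name
        # Record the scorer's fingerprint before the agent does anything.
        self._scorer_hash = fingerprint(self._files[scorer_name])

    def read(self, name: str) -> str:
        return self._files[name]

    def write(self, name: str, text: str) -> None:
        if name not in self._writable:
            raise PermissionError(f"agent may not modify {name!r}")
        self._files[name] = text

    def scorer_untouched(self) -> bool:
        """Second line of defence: verify the scorer before trusting a score."""
        return fingerprint(self._files[self._scorer_name]) == self._scorer_hash


# Hypothetical file names and contents, for illustration only.
ws = GuardedWorkspace(
    files={"asset.txt": "version 1", "scorer.txt": "lower loss is better"},
    writable={"asset.txt"},
    scorer_name="scorer.txt",
)
ws.write("asset.txt", "version 2")              # allowed
try:
    ws.write("scorer.txt", "always report 100")  # redefining success
    blocked = False
except PermissionError:
    blocked = True
print(blocked, ws.scorer_untouched())
```

The design choice is that the definition of "better" sits outside the set of things the optimizer can touch, and the fingerprint check gives a cheap way to confirm that before any score is believed.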


The implications do not stop at code.

The same loop applies to any asset with an objective score and a fast feedback loop. Cold email subject lines. Website load times. Ad click-through rates. Any system where you can define a number and measure it in hours rather than weeks is now a candidate for autonomous overnight improvement.
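"Define a number" is the step that carries the weight, so here is a small sketch of what such a definition might look like for two of the examples above. Everything in it is an assumption for illustration: the `Metric` class, the metric names, the choice of the median as the summary, and the sample values, which are made up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Metric:
    """A named number plus the direction that counts as 'better'."""
    name: str
    higher_is_better: bool

    def summarize(self, samples: list[float]) -> float:
        # The median is one reasonable choice: a single outlier cannot swing it.
        return median(samples)

    def is_improvement(self, baseline: list[float], candidate: list[float]) -> bool:
        b, c = self.summarize(baseline), self.summarize(candidate)
        return c > b if self.higher_is_better else c < b


load_time = Metric("page_load_ms", higher_is_better=False)
ctr = Metric("click_through_rate", higher_is_better=True)

# Made-up measurements, for illustration only.
faster_page = load_time.is_improvement([900, 950, 1000], [700, 720, 2000])
better_ads = ctr.is_improvement([0.020, 0.030, 0.025], [0.020, 0.021, 0.019])
print(faster_page, better_ads)
```

Note how much judgment hides in those few lines: which number, which direction, which summary statistic, how many samples. That is the part a human still has to get right before any loop can run.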

Garry Tan of Y Combinator put it plainly: the bottleneck is no longer compute. It is the quality of the instructions you write for the agent. The human's job shifts from doing the work to specifying what better looks like.

That is a structural change in what expertise is worth. The people who can define the right scoring mechanism, who understand the problem well enough to know what to optimize for, become significantly more valuable. The people who execute the iterations become less so.

That shift is already underway. Karpathy just made it visible.

The repo is public:

GitHub - karpathy/autoresearch: AI agents running research on single-GPU nanochat training automatically


What to build for:

Organizations that still rely on human iteration cycles for anything measurable now have a structural cost disadvantage. The gap between teams using autonomous improvement loops and teams doing manual testing will compound weekly. For builders developing AI-adjacent tools or services, the near-term opportunity is not the loop itself. It is the scoring layer: helping clients define what better means, in measurable terms, for the specific assets they care about. That specification work is the part autoresearch cannot do for itself.

404 Found covers AI developments from a European Insider, three times a week.
