Skip to work

The agent that worked while I slept

I set up a Claude Code loop to refactor a noisy module overnight. I woke up to a clean diff, 47 passing tests, and a commit message that was funnier than mine usually are. Here's what I learned about scoping work for an agent that doesn't sleep.

On a Wednesday in February I went to bed at midnight with an open laptop, a long-running Claude Code session, and a single instruction: refactor the BOL ingestion module to use the shared error helper. Run tests after each file. Stop if anything red.

I woke up at 7am to a clean diff, 47 passing tests, and a commit message that opened with "this one was annoying." I had not, at any point during the previous seven hours, been awake.

This is not the part of AI I expected to find useful. I’m a paranoid reviewer. I don’t trust generated code without reading it. The idea of letting an agent loose overnight on production code felt insane. And then it kept working, and now it’s a habit.

What actually works overnight

Not everything works. I’ve killed agents at 3am via a phone notification more than once. After a few months of doing this regularly, the rough taxonomy of overnight work I trust looks like:

What I do not trust overnight:

The scoping mistake I kept making

The first three times I tried this, I wrote the brief like I would write a Jira ticket. "Refactor the X module, simplify the Y logic, clean up the Z helpers." I’d wake up to a diff that touched 40 files, with three things I liked, two things I didn’t, and one thing that broke a feature in production. Net negative.

What I do now is the opposite. I write the brief like a contract: one verb, one scope, one stop condition.

One verb: "Rename." Not "rename and clean up." Just rename.

One scope: "Files matching src/lib/billing/*." Not "wherever it seems necessary."

One stop condition: "Tests pass. If you can’t make them pass, stop and leave a note." Not "make it work."

The diff I wake up to with this kind of brief is small, surgical, and reviewable in fifteen minutes over coffee. The diff I used to wake up to was sprawling and required half a day to disentangle. Same agent. Different brief. Different outcome.

The economic insight

The thing I didn’t expect about overnight agents is how much it changes the calculus of "is this worth doing." There is a class of work — boring refactors, test backfill, dependency upgrades, dead code removal — that I knew was valuable but never prioritized because the activation energy was too high. "It would take me a day. I don’t have a day."

An overnight agent costs me about ten minutes of writing the brief and forty minutes the next morning of reviewing the diff. Fifty minutes for a day’s worth of code hygiene. That math is not subtle. Things that used to be "maybe next sprint" now happen on a Tuesday because I had a free hour before bed.

The thing that still surprises me

What I keep finding is that the agent does the work I find boring and exhausting better than I do it. Not because it’s smarter — it isn’t, on those tasks — but because it doesn’t get bored. The 47th identical rename in the 47th file is just as careful as the first. I can’t honestly say that about my own work. By the 30th rename I’m skimming.

The agent doesn’t skim. That’s the underrated thing.

Uzair SaleemUzair Saleem — full-stack engineer, Islamabad. see my work →
← previousAI is leverage. It's not autocomplete.