The expensive reflex
The instinct right now is to put AI on everything. It’s the wrong instinct. AI is the costliest, least predictable component you can put in a workflow — you pay for every call, and it can hand you a different answer today than it gave yesterday. The operator’s job isn’t to use more AI. It’s to call it in only where judgment is genuinely required, and let cheap, deterministic automation do everything else.
They’re not the same tool — they’re opposites
Automation and AI get talked about as one thing. They behave in opposite ways, and that’s exactly why a good system needs both.
| | Automation (n8n, Make, code) | AI (LLM prompts) |
|---|---|---|
| Cost to run | Effectively free per run | You pay tokens on every call; grows with volume |
| Behaviour | Same input, same output, every time | Varies run to run; can be confidently wrong |
| Messy language / judgment | Can’t do it | Yes — this is the entire point of using it |
| Testable / auditable | Yes, you can prove it works | Harder — you sample, validate, and monitor |
| Best at | Routing, scheduling, moving data, math, formatting | Reading, summarising, classifying, drafting |
Where the money quietly leaks
Every AI call costs tokens, and the bill grows with two things: how often you call it, and how much context you cram in each time. Take routing an incoming email to the right team. If the rule is “sales@ goes to sales, billing keywords go to billing,” that’s a filter — it runs ten thousand times a month for nothing and never gets it wrong. Hand the same job to AI and you’re paying for ten thousand calls to make a decision a rule makes perfectly. Multiply that across a dozen steps and you’ve built something more expensive and less reliable than the free version. The question for every step is simple: does this actually need judgment, or am I paying AI to do arithmetic?
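As a rough illustration of the email example, the rule can be written as a plain function. This is a minimal sketch in Python rather than an actual n8n workflow; the keyword list, the team names, and the `route_email` and `monthly_ai_cost` helpers are all assumptions made for illustration, and any price you pass in would have to come from your own provider's rate card.

```python
from dataclasses import dataclass

# Hypothetical keyword list; tune it to your own inbox.
BILLING_KEYWORDS = ("invoice", "refund", "payment", "billing")


@dataclass
class Email:
    to: str
    subject: str
    body: str


def route_email(email: Email) -> str:
    """Deterministic routing: same input, same team, every time."""
    if email.to.lower().startswith("sales@"):
        return "sales"
    text = f"{email.subject} {email.body}".lower()
    if any(word in text for word in BILLING_KEYWORDS):
        return "billing"
    # No rule matched: only here might judgment (AI or a person) be worth paying for.
    return "needs_review"


def monthly_ai_cost(calls_per_month: int, tokens_per_call: int,
                    price_per_1k_tokens: float) -> float:
    """The bill scales with call volume and with context size per call."""
    return calls_per_month * tokens_per_call / 1000 * price_per_1k_tokens
```

The router costs nothing per run and can be unit-tested once. The cost helper only makes the two multipliers explicit: doubling either the call count or the context per call doubles the bill.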
The pattern that actually holds
The systems that hold aren’t “an AI” or “an automation.” They’re automation as the backbone, with AI dropped in only at the points that need it — which is most of what applied AI means in practice. The automation layer (this is what n8n is for) does the triggering, sequencing, routing, retries, logging, and guaranteed delivery. The AI does only the steps that truly need language or judgment: summarise this, is this customer angry, draft this in her voice, pull the date out of this mess. And crucially, the automation wraps the AI: it checks the output is valid, retries when it isn’t, falls back or flags a human when it can’t, and logs every call. That harness is what turns a probabilistic tool into a repeatable result. Running n8n alongside your AI prompts isn’t belt-and-braces — it’s the only way the AI part is trustworthy twice in a row.
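The harness described above (validate, retry, fall back, log) might look something like the following. It is a sketch under stated assumptions, not a definitive implementation: `call_model` is a hypothetical stand-in for whatever function sends the prompt to your model and returns its raw text, and the label set and JSON output shape are invented for the example.

```python
import json
import logging
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_harness")

# Hypothetical label set for an "is this customer angry?" step.
ALLOWED_LABELS = {"angry", "neutral", "happy"}


def validate(raw: str) -> Optional[str]:
    """Return the label if raw is JSON like {"label": "angry"}, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    label = data.get("label") if isinstance(data, dict) else None
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return None


def classify_with_harness(text: str, call_model: Callable[[str], str],
                          max_attempts: int = 3) -> dict:
    """Wrap one AI step: log every call, retry invalid output, then fall back."""
    for attempt in range(1, max_attempts + 1):
        try:
            raw = call_model(text)  # assumed: returns the model's raw text
        except Exception as exc:
            log.warning("attempt %d raised: %s", attempt, exc)
            continue
        log.info("attempt %d raw output: %r", attempt, raw)
        label = validate(raw)
        if label is not None:
            return {"label": label, "source": "ai", "attempts": attempt}
    # Never guess: hand the item to a person when no valid output arrives.
    log.error("no valid output after %d attempts; flagging for review", max_attempts)
    return {"label": None, "source": "human_review", "attempts": max_attempts}
```

The design choice is that the caller always gets a structured result: either a validated label or an explicit handoff to a human, never unchecked model text.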
Even then: edge cases and learning
A hybrid still isn’t finished. AI handles the happy path, the clean 80% or so of inputs. The trouble is always the remaining 10–20% it has never seen: the multi-part email, the quote in another language, the input nobody trained on. It won’t tell you it’s struggling; it’ll just guess. So two more pieces are non-negotiable. Edge-case management: detect when an input is outside what the AI handles well and route it to a person rather than letting it guess. And a feedback loop: capture the misses, turn them into named patterns, and feed them back so the system sharpens over six months instead of quietly rotting. That’s the Group / Present / Feedback method: the part most builds skip, and the reason theirs drift while a proper one holds.
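One way to picture those two pieces is a guard in front of the AI step plus a log of named misses. The sketch below is illustrative only and does not claim to be the Group / Present / Feedback method itself; the heuristics, thresholds, and names (`edge_case_reason`, `FeedbackLog`, `handle`) are assumptions, and the language check in particular is deliberately crude.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FeedbackLog:
    """Collects misses under a named pattern so they can be reviewed later."""
    misses: list = field(default_factory=list)

    def record_miss(self, text: str, pattern: str) -> None:
        self.misses.append({"text": text, "pattern": pattern})

    def top_patterns(self, n: int = 3) -> list:
        return Counter(m["pattern"] for m in self.misses).most_common(n)


def edge_case_reason(text: str, max_chars: int = 2000,
                     max_questions: int = 2) -> Optional[str]:
    """Return a named reason if the input looks out of scope, else None."""
    if len(text) > max_chars:
        return "too_long"
    if text.count("?") > max_questions:
        return "multi_part"
    # Crude proxy for "another language": a high share of non-ASCII characters.
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    if text and non_ascii / len(text) > 0.3:
        return "possibly_other_language"
    return None


def handle(text: str, ai_step: Callable[[str], str], feedback: FeedbackLog) -> dict:
    """Route suspected edge cases to a person; let the AI take the rest."""
    reason = edge_case_reason(text)
    if reason is not None:
        feedback.record_miss(text, reason)
        return {"handled_by": "human", "reason": reason}
    return {"handled_by": "ai", "result": ai_step(text)}
```

Reviewing `top_patterns()` on a schedule is what closes the loop: the most frequent named misses become new rules, new prompt examples, or new routing branches.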
A rule of thumb
Before you reach for AI on any single step, ask:
- Is the rule fixed and the input clean? Use automation. Don’t pay AI to do an if/else.
- Does it need to read messy language or make a judgment? Use AI — wrapped in automation that checks its work.
- Would a wrong answer cost real money or trust? AI plus a human checkpoint and edge-case routing — never AI on its own.
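The three questions can also be read as a small decision function. This is only a sketch of the checklist above; the function name, its parameters, and the returned labels are made up for illustration.

```python
def choose_approach(needs_judgment: bool, high_stakes: bool) -> str:
    """Map the rule-of-thumb questions to an approach for one workflow step."""
    if not needs_judgment:
        # Fixed rule, clean input: an if/else, so keep it deterministic.
        return "automation"
    if high_stakes:
        # A wrong answer costs money or trust: never AI on its own.
        return "ai + automation checks + human checkpoint"
    return "ai wrapped in automation"
```

For example, `choose_approach(needs_judgment=False, high_stakes=True)` still returns `"automation"`, because a fixed rule stays deterministic however much is at stake.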