The most common errors when building apps with Claude
Seven mistakes we see repeatedly in production Claude apps. Some cost you tokens, some cost you money, one or two can cost you the company. Each one has a simple fix.
Claude is the best general-purpose model for building real applications in 2026. But most of the Claude apps we audit repeat the same seven mistakes. Each one is simple to fix. None of them are obvious until you are already in production, the bill has tripled, or a customer has extracted your system prompt.
Here is the list we share with every team we onboard.
1. Treating the context window as a bucket
A 200K context window is not free real estate. As you fill it, accuracy and recall degrade. Anthropic calls this context rot. The mistake is to pour everything into context on the assumption that more data equals better answers.
Treat context as a resource you curate, not a bin you dump into. Keep only what the model needs for the current step. For agents running multi-step tasks, use context editing or summary compaction before the window fills.
2. Assuming prompt caching works
Prompt caching can cut costs by up to 90% on stable prompts. But it fails silently. If your cache breakpoint lands on a block that changes between requests (timestamps, per-user variables, randomly ordered JSON keys in tool schemas), you get zero cache hits and no warning. The request succeeds. The bill goes up.
Log cache_creation_input_tokens and cache_read_input_tokens from every response. If both are zero for a week, the cache is not working. Stabilize the order of everything before the breakpoint. Move dynamic content after it.
3. Tool schema ambiguity
Tool use fails the most on wrong tool selection and incorrect parameters. The usual cause is tool names and descriptions that are too similar. notification-send-user and notification-send-channel both look right to the model when the user says "send a message".
Invest in tool descriptions. Write them for a smart intern, not for yourself. Test with edge prompts. If two tools can be confused, consolidate them or rename them.
4. Treating prompt injection as a theoretical risk
Prompt injection is the top AI agent security risk in 2026. It is not theoretical. Attackers embed instructions in content the model processes: a GitHub issue, a PDF, an email. The model cannot reliably tell system instructions from user content from injected content.
Defenses: keep agent permissions minimum, sandbox execution, treat every untrusted input as hostile, filter outputs, never echo raw untrusted content back into the prompt without escaping. The OWASP LLM Prompt Injection Prevention cheat sheet is a good place to start.
5. Running agents with too many permissions
Do not give an agent blanket Bash access, filesystem access, or network access. Production agents should run in sandboxed containers with the smallest allowed_tools set that gets the job done. Use permission callbacks. Use PreToolUse hooks to block sensitive paths. Inject API keys through a proxy so the agent never sees them.
The flag --dangerously-skip-permissions exists for a reason. It is dangerous. Never run it in production. Even with it off, approval fatigue is real: developers rubber-stamp dozens of prompts per session and injected actions slip through. Make approvals rare and meaningful.
6. No audit trail, no kill switch
Every agent action in production should be logged: the tool called, the arguments, the result, the user who triggered it, the correlation ID. Audit trails are how you find out what went wrong after it goes wrong. Kill switches are how you stop it while it is going wrong.
If you cannot answer "what was agent X doing at 2:14pm?" in under thirty seconds, you do not have observability.
7. No model versioning strategy
Anthropic releases new model versions regularly. If you pinned to claude-sonnet-4-5 and built your prompts around its quirks, a switch to the latest Sonnet changes behavior. If you use the latest alias, the model under your app changes without your knowledge.
Pick one explicit strategy. Pin to a dated model and set a schedule to test new versions, or use the latest alias and run a regression test suite on every model change. Both work. No strategy does not.
The short list
Curate context. Verify your cache. Disambiguate tools. Defend against prompt injection. Give agents the minimum permissions they need. Log everything and build a kill switch. Pick a versioning strategy.
None of these are Claude-specific quirks. They are mistakes any team building with a large language model makes the first time. We repeat them on new projects until we codify the rules. This post is our current codified list.
If you are building something with Claude and want a second pair of eyes on your setup, get in touch.
Studio
Start a project.
We build digital products that ship faster, cost less, and don’t fall apart after launch. One team, one invoice, modern stack.