10 Things I Assumed Before Writing a Line of Code
I'm building a platform that runs 23 services on a single machine — API proxies, Telegram bots, webhook processors, task managers, and AI pipelines — all behind one gateway. Before writing any code, I wrote down 10 assumptions about the environment these services would operate in. Every design decision flows from these.
A1: Fragility — External APIs fail without warning. Every outbound call handles timeouts, 5xx errors, and the provider simply being gone.
A2: Latency — Network calls take anywhere from 100ms to 30 seconds. LLM calls get 20s+ timeouts. Everything else gets 5s minimum. No optimistic assumptions about speed.
A3: Cost — Every API call costs money. Every billable operation gets recorded. The gateway tracks per-request cost. If you can't tell me what last Tuesday cost, the system is broken.
A4: Reversibility — Mutations must be undoable. Every mutating endpoint has a documented rollback path or is idempotent. If it can't be reversed, it requires human approval first.
A5: Trust — No input is trustworthy. All external input validated at the system boundary. Secrets never appear in logs. A service that trusts its caller is a service waiting to be exploited.
A6: Observability — If it's not logged, it didn't happen. Every capability call emits a structured log with the service name, request ID, and why it was called. Not just what happened — why.
A7: Uncertainty — LLM outputs are probabilistic. Every LLM response gets validated and parsed. Fallback paths exist for when the model returns garbage. Treating LLM output as reliable is the fastest way to build a fragile system.
A8: Resource Limits — Quotas, rate limits, and budgets are real. Rate limiting is enforced. Budget caps halt execution. When a quota is exhausted, the system degrades gracefully instead of running up a bill.
A9: Explainability — Every action must be traceable. Full audit trail: who requested it, what happened, why, what it cost, and the outcome. If you can't reconstruct what happened after the fact, you don't have a production system.
A10: Coordination — Multi-service workflows fail at boundaries. Schema contracts are enforced. Cross-service calls use typed payloads. The service works fine in isolation — it's the handoff that kills you.
These aren't aspirational. Each one maps to a specific stress test that tries to break it. More on that in a future post.