Should You Self-Host Your AI? A Framework for the Edge vs. Cloud Decision
I ran the experiment on a Raspberry Pi 5 — and what it revealed about AI infrastructure strategy will change how you think about where intelligence should live.
Last month I deployed an AI agent system on a Raspberry Pi 5 running on my home network. It tracks my investment portfolio. It backs up my code to GitHub every night. It takes instructions over Telegram. It costs me exactly nothing to run.
That's not a hobbyist flex. That's a proof of concept that forces a question every enterprise architect should be asking right now: when does it make sense to run AI at the edge, and when should you keep it in the cloud?
The cloud vendors have made their answer obvious. But hardware, open models, and lightweight agent frameworks have quietly shifted the calculus. Here's the framework I developed by actually building something — not by reading a Gartner report.
The Decision Isn't Binary
The first mistake most teams make is framing this as edge or cloud. The real question is two-dimensional: data sensitivity on one axis, latency and cost tolerance on the other.
Map your AI workload onto these axes and the right deployment model becomes obvious.
→ Self-host: Private data that cannot leave the building. Inference happens locally; no cloud egress.
→ Cloud: Non-sensitive workloads where scale and model capability matter more than sovereignty.
→ Hybrid: Sensitive data, but tolerant of latency. Route through a private gateway; use the cloud for heavy lifting post-anonymisation.
→ Challenge: Low sensitivity AND high cost tolerance with no latency pressure — is AI the right tool here? Question the premise first.

The OpenClaw Pi deployment sits firmly in the top-left quadrant. My portfolio data is personal. The agent runs scheduled jobs — latency of a few seconds is irrelevant. Cloud egress is an unnecessary cost and an unnecessary risk. Self-hosting is the obvious answer.
What the Raspberry Pi 5 Actually Delivers
Here's where I want to challenge a comfortable assumption: that the Pi is a toy. It isn't — not anymore.
| Capability | Raspberry Pi 5 | Enterprise Relevance |
|---|---|---|
| Compute | Quad-core Cortex-A76 @ 2.4GHz | Sufficient for agent orchestration, scheduling, and lightweight inference coordination |
| Memory | 8GB LPDDR4X | Runs Home Assistant OS + OpenClaw + Python agents simultaneously without swapping |
| Storage | NVMe via PCIe 2.0 | Production-grade I/O — no SD card fragility at scale |
| Power Draw | ~5–10W under load | 24/7 always-on at a fraction of a server's energy footprint |
| Network | Gigabit Ethernet + WiFi 5 | Full network citizenship — SSH, API calls, webhooks, cron, all work natively |
| OS | Home Assistant OS (HAOS) | Containerised services, supervised add-ons, resilient restart behaviour |
The Pi 5 is not running the LLM locally — that's an important distinction. It's running the agent: the orchestration logic, the scheduling, the memory management, the Telegram interface, the skill execution. The heavy inference is offloaded to Gemini via API. This is a hybrid architecture, not pure edge.
That separation is the key architectural insight. And it's directly applicable at enterprise scale.
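That split can be sketched in a few lines: orchestration and state live on the device, and the cloud model sits behind an injected callable so the local logic can be exercised offline. Every name here is illustrative — this is not OpenClaw's actual API, and in production the injected callable would be a real client such as Gemini's:

```python
# Sketch of the edge/cloud split: orchestration runs locally, inference is
# a pluggable callable (in production, a cloud LLM API client).
# All names are illustrative assumptions, not OpenClaw's real API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EdgeAgent:
    infer: Callable[[str], str]                      # cloud LLM call, injected
    memory: list = field(default_factory=list)       # state stays on-device

    def run_job(self, task: str) -> str:
        # Local orchestration: build a minimal prompt, call out only for
        # the heavy inference step, keep all state at the edge.
        prompt = f"Task: {task}\nRecent notes: {self.memory[-3:]}"
        result = self.infer(prompt)
        self.memory.append(f"{task} -> {result}")
        return result

# Offline test double standing in for the cloud model:
agent = EdgeAgent(infer=lambda prompt: "ok")
print(agent.run_job("summarise portfolio"))  # prints "ok"
```

The design choice that matters is the injection point: because inference is just a parameter, you can swap providers, mock the model in tests, or route sensitive jobs to a local model later without touching the orchestration code.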
Why the Free Tier Is a Feature, Not a Constraint
I made a deliberate decision to stay on Gemini's free tier for the first phase of this project. Not because I couldn't pay — but because budget constraints are the best forcing function for good architecture.
When every token costs something, you design prompts carefully. When you have a rate limit, you batch intelligently. When you can't brute-force your way through a problem with compute, you actually think about what the agent needs to know versus what it's convenient to pass it.
The --light-context flag in OpenClaw deliberately limits the context window passed to scheduled jobs. This isn't a workaround — it's a design discipline. An agent that works well with minimal context is a more robust agent.
Enterprise teams running AI at scale should adopt the same discipline. The move to unlimited cloud compute has made teams lazy about context hygiene. You're paying for tokens you don't need, sending data you shouldn't be, and getting noisier outputs as a result.
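Context hygiene of this kind is easy to enforce mechanically. Here is a minimal sketch of a light-context trim, using a crude character count as a proxy for tokens; how OpenClaw's actual --light-context flag implements this may differ:

```python
# Sketch of a "light context" trim for scheduled jobs: keep only the most
# recent history that fits a fixed budget. Characters stand in for tokens
# here as a crude proxy; OpenClaw's real --light-context may work differently.

def light_context(history: list, budget: int = 2000) -> list:
    kept = []
    used = 0
    # Walk newest-first so recent messages win, then restore original order.
    for msg in reversed(history):
        if used + len(msg) > budget:
            break
        kept.append(msg)
        used += len(msg)
    return list(reversed(kept))
```

A hard budget like this is the programmatic version of the discipline above: the agent gets what fits, not what's convenient to pass it.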
The Enterprise Translation
Let me be direct about what this means beyond the home lab:
For CIOs: The pattern I'm running — lightweight orchestration at the edge, LLM inference in the cloud, structured data staying local — is viable for department-level AI deployments today. You don't need a centralised AI platform for every use case. A well-configured edge agent can deliver 80% of the value at 10% of the cost and complexity.
For Enterprise Architects: Start mapping your AI workloads to the two-axis framework. You'll find that more workloads than you expect belong in the top-left quadrant — high sensitivity, low latency tolerance, better served by local orchestration than cloud pipelines. These are often the most valuable use cases precisely because they involve your most sensitive data.
For Technology Specialists: The tooling is ready. OpenClaw, Home Assistant OS, containerised agent frameworks — these are production-ready today. The gap is not capability. The gap is architectural clarity about where each workload belongs.
Edge AI is not a silver bullet. Multi-tenant workloads, high-concurrency inference, and use cases requiring the latest frontier models still belong in the cloud. The framework helps you find the boundary — don't force workloads into the self-host quadrant if they genuinely belong in the cloud.
What's Next in This Series
Article 1 established what I built. This article gave you a framework for why this architecture makes sense. Article 3 will go deeper into the nuts and bolts: how OpenClaw's skill system works, and how you'd replicate — or adapt — this pattern for a real organisational context.
If you're building something similar, or thinking through edge AI for your organisation, I'd genuinely like to hear about it. The interesting problems don't live in conference keynotes.
Follow the Series
This is Article 2 of a 12-part series documenting the OpenClaw project — from zero to a production AI agent system running on an $80 single-board computer.
Full technical write-ups live on the blog. LinkedIn articles cover the architecture and strategy. Both tracks are running in parallel — follow to catch both.
