My Agentic AI Setup at Home


🤖
Three years ago I started this blog and promised to keep AI content out of it. Things are changing, but I don't want to give up on that promise.
I'm introducing an "ai-augmented" tag that will be attached to every upcoming post that was heavily processed by AI. You don't have to guess; decide what matters to you.

I run two agentic environments in parallel: OpenClaw and Hermes. Each lives in its own VM on top of a Proxmox host sitting in my office. Why two? I shared more of the context in this LinkedIn post about Stew and Bob. Short version: I want to compare approaches, and I don't want either one to be a single point of failure for the work I do every day.

The model layer

I pay for the MiniMax Token plan, twenty dollars a month for generous limits. Yesterday I migrated both setups to the freshly released M3. The jump is real: better at following skills, better at executing multi-step instructions, and surprisingly good at building helper scripts on the fly when it needs them. The "agentic" part of agentic AI is finally starting to feel like more than marketing.

My daily driver for non-agentic work is GPT-5.5, with Codex for code-heavy tasks. I gave up on Claude: too many forks, too many model choices to second-guess. Decision fatigue is a real productivity tax.

For local work I run OpenCode with a Zen Go subscription, ten dollars a month for access to a bunch of Chinese models. It's a cheap way to tinker.

Why two, in practice

Hermes feels more future-proof to me. Its self-education approach, where the agent improves by reflecting on what it did rather than by me hand-tuning prompts, is the direction I want my stack to go. OpenClaw recently got a "dreaming" feature, a kind of cron job that reviews what happened and tweaks its own memory. That's interesting, but it's still bolted on. Hermes treats the same idea as a first-class concern.

On the previous model, M2.7, Hermes gave me better results when it came to preparing and publishing content. More importantly: when I reset a session, Hermes doesn't make me feel like I'm starting from day one. OpenClaw sometimes does. Continuity of context is a quiet feature until you lose it.

In my observation, OpenClaw is the slower and more resource-hungry of the two. The trade-off is that it now has Codex plugged in, so when I'm bored I open Telegram and have it spin up side-coding tasks. That's been genuinely fun, and it changed how I think about idle time.

Hardware, the 128GB chapter

I first tried self-hosting serious local models on 64GB of RAM, but I was running short on memory for Gemma4 31B: close, but not comfortable. So I bumped to 128GB, and now I run Qwen3.6 27B / 35B. The 27B consumes around 80–85GB. If you're on Apple Silicon, make sure to bring MLX into the pipeline for serious performance; the difference is not subtle.

Twenty-seven billion parameters is a sweet spot. I've seen a lot of people run it happily on smaller machines, and it should be one of the first models you try if you're getting started with local inference.
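Why can the same 27B model need 80–85GB on one machine and still run on smaller ones? Mostly precision and context length. The weights alone take roughly parameters × bits per weight ÷ 8 bytes; the KV cache for long contexts and runtime overhead come on top. A back-of-the-envelope sketch, assuming the bit widths below (the post doesn't say which quantization each setup uses):

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts come from this post; the bit widths are illustrative only.
    models = [("Qwen 27B", 27e9), ("Gemma 4 31B", 31e9), ("Qwen 9B", 9e9)]
    for name, params in models:
        estimates = ", ".join(
            f"{bits}-bit ~{weight_memory_gb(params, bits):.1f} GB" for bits in (16, 8, 4)
        )
        print(f"{name}: {estimates}")
```

At 4 bits a 9B model needs about 4.5 GB for its weights, which is roughly why an 8GB card is plausible for a small routing model; at 16 bits (about 18 GB) it isn't.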

The smaller-model conversation is even more interesting. A guy on Threads recently pointed me at Qwen 3.5 9B as a routing model: the cheap, fast model that decides which bigger model to call. He says it does well on 8GB of RAM. I'm going to build a second home server with a cheap 8GB GPU to test this, and I'll probably point OpenClaw at it for the less critical parts of the workflow where it has fallbacks (coding locally, for example). It's an experiment, not a commitment.
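The "fallbacks" part is what makes a cheap local box safe to experiment with. A minimal sketch, assuming each model is wrapped as a plain function that either returns text or raises: `ProviderError`, `local_small_model`, and `hosted_model` are hypothetical stand-ins, not OpenClaw APIs. The only idea shown is the ordering: try the cheap local model first, fall through to a hosted one when it fails.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider wrapper when it cannot serve a request."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, answer) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider failed
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for a local 8GB box and a hosted API.
def local_small_model(prompt: str) -> str:
    raise ProviderError("local server unreachable")


def hosted_model(prompt: str) -> str:
    return f"hosted answer to: {prompt}"


if __name__ == "__main__":
    used, answer = complete_with_fallback(
        "refactor this function",
        [("local-9b", local_small_model), ("hosted", hosted_model)],
    )
    print(used, "->", answer)
```

Only `ProviderError` is caught, so genuine bugs in a wrapper still surface instead of silently routing to the next model.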

What can a small model actually do?

Yes, you can use 48GB of shared memory productively. No, it cannot replace frontier models. The honest answer is a daily-driver stack: GPT-5.5 for thinking, Codex for code, and the local models for the cheap, fast, replaceable parts. Agents? Cheap MiniMax. Coding locally? OpenCode with the cheap Chinese models. Heavy reasoning? Pay for the frontier.
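The stack above is essentially a routing table. A minimal sketch, assuming task kinds are already known (in practice a rule or a small routing model would assign them). The model labels are descriptive strings taken from this post, not real API identifiers, and sending unclassified tasks to the frontier model is an assumption about a safe default.

```python
# Task kind -> model. Labels mirror the stack described in this post; they are
# descriptive strings, not real API model identifiers.
ROUTES: dict[str, str] = {
    "thinking": "GPT-5.5",
    "code": "Codex",
    "agent": "MiniMax M3",
    "local-coding": "OpenCode + Zen Go model",
}

# Assumption: anything unclassified goes to the frontier model rather than the cheap one.
DEFAULT_MODEL = "GPT-5.5"


def pick_model(task_kind: str) -> str:
    """Return the model label for a task kind, falling back to the frontier model."""
    return ROUTES.get(task_kind, DEFAULT_MODEL)


if __name__ == "__main__":
    for kind in ("agent", "code", "summarize-inbox"):
        print(kind, "->", pick_model(kind))
```

Swapping a model then means editing one entry, while everything that calls `pick_model` stays the same.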

The most useful pipeline I have

Here's the part that actually pays for the rest of the setup.

I drop an idea I want to post into Hermes. It runs a pipeline:

  • Topic assessment per social network
  • Fix typos and styling
  • Run a copywriting pass, consulting my Tone of Voice doc in Notion
  • Validate the length of the content and split it if necessary
  • Prepare the payload and push it to Postiz, my self-hosted posting tool
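The steps above map naturally onto a chain of small functions, each taking a draft and returning an updated one. A minimal sketch, assuming every step is a plain function: the model calls, the Notion lookup, and the Postiz request are replaced with hypothetical placeholders, and the payload shape is invented for illustration rather than taken from Postiz's API.

```python
import textwrap
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Draft:
    idea: str
    network: str
    text: str = ""
    parts: tuple[str, ...] = ()
    notes: tuple[str, ...] = ()


Step = Callable[[Draft], Draft]


def assess_topic(d: Draft) -> Draft:
    # Placeholder: a real step would ask the model whether the idea suits this network.
    return replace(d, text=d.idea.strip(), notes=d.notes + (f"assessed for {d.network}",))


def fix_typos(d: Draft) -> Draft:
    # Placeholder: only collapses repeated whitespace; a real step would proofread.
    return replace(d, text=" ".join(d.text.split()))


def copywrite(d: Draft) -> Draft:
    # Placeholder for the Tone of Voice pass; the real step would consult the Notion doc.
    return replace(d, notes=d.notes + ("tone-of-voice pass",))


def validate_length(d: Draft, limit: int = 500) -> Draft:
    # Naive whitespace-based split; the 500-character limit is an arbitrary example.
    parts = (d.text,) if len(d.text) <= limit else tuple(textwrap.wrap(d.text, width=limit))
    return replace(d, parts=parts)


def build_payload(d: Draft) -> dict:
    # Hypothetical payload shape, not Postiz's actual request schema.
    return {"network": d.network, "posts": list(d.parts), "notes": list(d.notes)}


def run_pipeline(draft: Draft, steps: list[Step]) -> Draft:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        draft = step(draft)
    return draft


if __name__ == "__main__":
    steps: list[Step] = [assess_topic, fix_typos, copywrite, validate_length]
    final = run_pipeline(Draft(idea="  Two agents,   one   Proxmox box  ", network="threads"), steps)
    print(build_payload(final))
```

Keeping each step pure (draft in, draft out) makes it easy to rerun a single stage, or swap the model behind it, without touching the others.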

The whole flow takes a couple of minutes. The hardest piece used to be the splitting: counting characters per platform, finding the right paragraph boundaries, keeping the prose coherent across fragments. It broke often enough that I dreaded the last mile. M3 fixed it. The model now follows the splitting rules as written, and that's been the single biggest quality-of-life improvement in this whole stack.
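For comparison, the splitting rules can also be written down deterministically. A minimal paragraph-aware splitter, assuming the limit is a plain character count: it packs whole paragraphs into fragments, falls back to sentence boundaries when a paragraph is too long on its own, and cuts on spaces only as a last resort. The function names are hypothetical. `len()` counts code points, while some platforms count URLs or emoji differently, so treat the limit as approximate.

```python
import re


def _pieces(paragraph: str, limit: int) -> list[str]:
    """Break one paragraph into pieces of at most `limit` characters."""
    if len(paragraph) <= limit:
        return [paragraph]
    pieces: list[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
        while len(sentence) > limit:
            cut = sentence.rfind(" ", 0, limit + 1)  # last space that keeps us within limit
            if cut <= 0:
                cut = limit  # no usable space: hard cut mid-word
            pieces.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        if sentence:
            pieces.append(sentence)
    return pieces


def split_for_platform(text: str, limit: int) -> list[str]:
    """Greedily pack paragraphs (or their pieces) into fragments of at most `limit` chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    fragments: list[str] = []
    current = ""
    for paragraph in paragraphs:
        for i, piece in enumerate(_pieces(paragraph, limit)):
            sep = "\n\n" if i == 0 else " "  # keep paragraph breaks, rejoin sentences with a space
            candidate = f"{current}{sep}{piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                fragments.append(current)
                current = piece
    if current:
        fragments.append(current)
    return fragments
```

Greedy packing keeps each fragment as full as possible; a stricter rule such as "never end a fragment mid-paragraph" would need extra handling.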

NotebookLM is on my radar. I added an unofficial integration recently but haven't had a real use case for it yet. I used to get that "dump a session summary to Notion" behavior from Claude. ChatGPT doesn't have an equivalent. Hermes and OpenClaw both fill that gap for me, and I'm planning to push Hermes further on the NotebookLM side: research-grounded work where I want a source, not a hallucination.

The part I'm still not happy with

Both setups are designed to be fully autonomous, unrestricted agents. That's why they're VMs in the first place: I can wipe them, snapshot them, and run them with different model layers. The problem is credentials. They still live in the same environment as the agent, so if the agent is compromised, the credentials are compromised. That's the next thing I want to fix, and I'm thinking HashiCorp Vault or something similar: a real secrets boundary, not just a .env file in the same VM.
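A minimal sketch of reading a secret from Vault's KV version 2 engine over its HTTP API, using only the Python standard library. It assumes a KV v2 engine mounted at `secret` (the dev server's default) and a hypothetical path such as `agents/hermes`; `VAULT_ADDR` and `VAULT_TOKEN` are the environment variable names the Vault CLI conventionally uses. On its own this only moves secrets out of the `.env` file: the agent still holds a token, so the boundary is only as strong as that token's policy and TTL.

```python
import json
import os
import urllib.request


def kv2_read_request(addr: str, token: str, mount: str, path: str) -> urllib.request.Request:
    """Build a GET request for the KV v2 read endpoint: /v1/<mount>/data/<path>."""
    url = f"{addr.rstrip('/')}/v1/{mount}/data/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def parse_kv2_response(body: bytes) -> dict:
    """KV v2 nests the secret's key/value pairs under data.data, next to data.metadata."""
    return json.loads(body)["data"]["data"]


def read_secret(mount: str, path: str) -> dict:
    """Fetch one secret at runtime instead of keeping it in a .env file on disk."""
    req = kv2_read_request(os.environ["VAULT_ADDR"], os.environ["VAULT_TOKEN"], mount, path)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return parse_kv2_response(resp.read())
```

Usage would look like `read_secret("secret", "agents/hermes")["api_key"]`, where the path and key name are hypothetical; issuing each VM its own narrowly scoped, short-lived token is what turns this into a real boundary.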

What I'd tell someone starting from zero

Run two. Make them disposable. Pick the cheap plan for agents, the frontier model for the things that matter, and a local model for the routing and the boring glue. Build the skills. They are the actual product. The model is replaceable. The pipeline is not.

And put the credentials somewhere that isn't the same VM as the agent. I wish I'd done that on day one.