Hermes Gemma 4: The Cheapest AI Agent Stack On Earth

Hermes Gemma 4 is the cheapest AI agent stack on the planet right now, and nobody's talking about it loud enough.

I want to change that.

Because if you're running an AI-powered business — or even just messing with agents for personal projects — this setup can save you thousands a year.

Here's the plot:

Gemma 4 just dropped as Google's latest open-source lightweight efficient model.

Hermes is the agent I orchestrate all my workflows with.

Ollama is the local runtime that hosts Gemma 4.

Stitch them together and you get a fully-fledged AI agent that costs you nothing beyond your electricity bill.

Let me walk you through it properly.

The Real Reason I Built This Hermes Gemma 4 Stack

A year ago I was paying hundreds per month in API bills.

Then thousands.

Because that's what happens when you start running agents at scale.

Each additional sub-agent costs more. Each tool call costs more. Each extra context token costs more.

It was working — but it wasn't sustainable.

I needed a layer of "good enough and free" for the bulk work.

That's where Hermes Gemma 4 comes in.

For the 80% of tasks that don't need frontier intelligence, Gemma 4 is plenty.

And it runs on my laptop for free.

If you've never seen Hermes before, check out my Ollama + Hermes breakdown for the foundation — this article picks up where that one leaves off.

What Makes Gemma 4 Different

Gemma 4 is Google's latest open-source lightweight efficient model.

Three things make it worth your attention:

1. It's properly lightweight. Runs on a normal laptop. Not "runs if you have an RTX 4090 and 64GB RAM". Just a laptop.

2. The big variants have 256K context. The 18GB and 20GB Gemma 4 variants ship with a 256K context window. That's bigger than MiniMax M2.7 and a lot of frontier cloud models. On your local machine.

3. It's built for agents. Handles tool calls. Plays nicely with sub-agents. Follows instructions without the weird refusals some models throw.

For an open-source free model, that's a stacked feature list.

The Full Hermes Gemma 4 Setup

Step 1 — Download Ollama

Go to ollama.com.

They show you a command like curl -fsSL https://ollama.com/install.sh | sh (or similar depending on your OS).

Copy it.

Paste into terminal.

Run it.

Ollama installs as a background service.

Confirm with ollama list in a new terminal window.

Step 2 — Install Gemma 4

Navigate to Models on ollama.com.

Find Gemma 4.

Pick your variant:

Smaller Gemma 4 → ~128K context, runs on most machines
Larger Gemma 4 (18GB, 20GB) → 256K context, needs more RAM

For beginners I'd say start with a smaller variant. You can always pull a bigger one later.

Ollama shows you the install command — something like ollama pull gemma:4-latest.

Run it. Wait for the download.

Gemma 4 is now on your machine.

Step 3 — Run `hermes model`

In your terminal, start a new Hermes chat.

Run:

hermes model

You'll see the list of available endpoints.

Step 4 — Select Custom Endpoint

Scroll through the options.

Find Custom Endpoint — where you manually enter a URL.

Select it.

Step 5 — Paste The Ollama URL

Hermes prompts you for a URL.

Paste http://localhost:11434 (Ollama's default local URL).

Important: Ollama must be running in the background. If it's not, Hermes gets nothing back.

If you're unsure, open a second terminal and run ollama list. If it works, Ollama is alive.

Step 6 — API Key

Hermes asks for an API key.

Two options:

Leave blank
Type "Ollama"

Either works fine.

I usually type "Ollama" — it feels cleaner.

Step 7 — Pick Gemma 4 And Run

Hermes pings Ollama, pulls the models you've installed, and shows them in a list.

Select Gemma 4 latest.

Leave the next prompt blank.

Run Hermes.

Boom. You're live on Hermes Gemma 4.

🔥 Stuck somewhere in the setup? Inside the AI Profit Boardroom, I've got a full Hermes + Ollama section with video tutorials covering this exact flow — including screenshot walkthroughs, troubleshooting common errors, and the sub-agent workflows I use daily. Plus weekly coaching calls where you can share your screen and get help with YOUR setup. 3,000+ members already inside. → Jump in here

The Whole Command Flow At A Glance

Install Ollama (one-time, from ollama.com)
Install Gemma 4 (one-time, from ollama.com/models/gemma)
hermes model
→ Custom Endpoint
→ http://localhost:11434
→ API key: Ollama
→ Gemma 4 latest
→ Leave blank
→ Run Hermes

That's the entire setup.

Nobody else on the internet explains it this simply. I had to figure it out by trial and error because every other guide skipped half the steps.

Why Sub-Agents Are The Real Win

If you take nothing else from this article, take this.

The real value of Hermes Gemma 4 is running sub-agents.

Here's the thinking:

Your orchestrator session (the one doing strategic thinking) runs on Claude Opus 4.7 or MiniMax M2.7. You pay for that because it's the brain.
Your sub-agents (the ones doing the grunt work — summarising, classifying, filling templates, running tools) run on local Gemma 4. You don't pay a penny for them.

If you've got 10 sub-agents, you've just saved yourself 10x the API cost.

If you've got 50, you've saved 50x.

And because Gemma 4 is local, there's no rate limit — you can run as many sub-agents in parallel as your hardware can handle.

This is how you scale AI operations without a five-figure monthly bill.

I went deeper into the multi-agent orchestration idea in Hermes agent mission control.

Bonus — MiniMax 2.7 Cloud With Hermes

The custom endpoint flow in Hermes isn't only for Gemma 4.

Want to run MiniMax M2.7 cloud through Hermes?

hermes model
Select custom endpoint
Type "2" for MiniMax M2.7 cloud
Leave blank
Run Hermes

MiniMax 2.7 is agentic, self-improving, and built from the ground up to run tools.

You can even run OpenClaw with MiniMax 2.7 on top of Ollama.

For the agent comparison, I'd peek at Hermes vs OpenClaw to pick which one suits your workflow.

The combo move: Hermes orchestrator (Claude Opus or MiniMax 2.7) + Gemma 4 sub-agents (free, local).

That's the stack.

Real Talk — When Cloud Still Wins

I'd be lying if I said Hermes Gemma 4 replaces cloud models entirely.

It doesn't. Cloud is still better at:

The absolute hardest reasoning tasks
Complex multi-step code generation
Deep research synthesis
High-polish client-facing content

For those, I still use Claude Opus 4.7 or MiniMax M2.7.

But for everything else — which is honestly 70-80% of my agent workload — Gemma 4 is perfectly adequate.

Free and good enough beats paid and perfect, every time, when volume matters.

Troubleshooting

Ollama not running. The #1 issue. Check with ollama list. Restart Ollama if needed.

Gemma 4 not in the Hermes model list. Install didn't finish — re-run the install command from the Gemma 4 page on ollama.com.

"Connection refused" error. Wrong URL. Use http://localhost:11434.

Slow responses / overheating. Your machine can't handle the variant you chose. Drop to a smaller Gemma 4 variant.

Outputs feel generic. Local models benefit from more prompt structure. Be explicit. Give examples. Specify tone and format.

FAQ — Hermes Gemma 4

What is Hermes Gemma 4?

The pairing of the Hermes AI agent with Google's open-source Gemma 4 model, running locally via Ollama. A fully free-to-run AI agent stack.

Is Hermes Gemma 4 actually free?

Yes. Gemma 4 is open-source, Ollama is open-source, Hermes doesn't charge you to point at a local model. Only cost is electricity.

What's the context window on Hermes Gemma 4?

128K on the smaller Gemma 4 variants. 256K on the 18GB and 20GB variants — bigger than MiniMax M2.7 and most frontier cloud models.

How do I install Hermes Gemma 4?

Install Ollama from ollama.com. Pull Gemma 4 via ollama.com Models. Then in Hermes: hermes model → Custom Endpoint → paste http://localhost:11434 → API key "Ollama" (or blank) → select Gemma 4 latest → leave blank → run.

Can Hermes Gemma 4 replace paid AI APIs?

For 70-80% of agent work, yes — especially sub-agent workflows, bulk classification, template filling, and privacy-sensitive work. For the hardest reasoning tasks, keep a cloud model like Claude Opus 4.7 or MiniMax M2.7 in the mix.

Does Hermes Gemma 4 support tool calling?

Yes. Gemma 4 is designed to run agentically — it handles tool calls and works cleanly inside Hermes' agent loop.

Wrapping Up

Hermes Gemma 4 is the cheapest serious AI agent stack you can build today.

Install Ollama. Pull Gemma 4. Wire Hermes to the custom endpoint. Run.

Five minutes of setup.

Zero ongoing cost.

256K context on the big variants.

Sub-agents galore.

🚀 Ready to turn this into real money-saving workflows? Inside the AI Profit Boardroom, I've got a 2-hour course on how to use Hermes to save time and grow. 6-hour OpenClaw course. Daily SOP trainings (yesterday's was the new Hermes 0.7 update). Weekly coaching calls. 145 pages of member wins. The classroom has the best AI automation trainings anywhere. → Join us here

Video notes + links to the tools 👉 https://www.skool.com/ai-profit-lab-7462/about

Learn how I make these videos 👉 https://aiprofitboardroom.com/

Get a FREE AI Course + Community + 1,000 AI Agents 👉 https://www.skool.com/ai-seo-with-julian-goldie-1553/about

Run Hermes Gemma 4 tonight. It's the clearest "free money" move in AI right now — Hermes Gemma 4.