Unplugged AI
Field Notes
SLM · fine-tuning · RAG · LLM · AI · offline-AI · on-device-AI

Fine-Tuning vs. RAG: The One Distinction That Matters

Unplugged AI Team·Founding Team·June 22, 2026·5 min read

We Built This Because We Had No Signal

A few years ago, my co-founder and I were backcountry snowboarding at Stowe. Above the tree line, no trails, no patrol. At some point one of us said: what do we actually do if someone gets hurt up here?

He pulled out his phone and said "ask ChatGPT."

We both looked at our screens. No service.

That was the moment Unplugged AI started. The AI tools everyone was building assumed you'd always have a connection, and that assumption breaks exactly when you need the tool most. So we started building AI that runs completely offline, on normal hardware, when there's no signal at all.

Building it taught us one thing the hard way, and it's the whole point of this post:

Fine-tuning shapes how a model responds. RAG shapes what it knows.

Get that backwards and you'll spend weeks solving the wrong problem. Almost every mistake we made early on came from reaching for one when we needed the other.


What That Looks Like in Practice

Here's the clearest demonstration we know of. It's a small fine-tuning demo we call the Steel City Workshop, and it runs end-to-end in a free Google Colab session on a 270M-parameter model.

The job: turn a generic model into something that produces a structured survival field card, the same JSON shape every time.

We gave the base model a prompt before any training: "I need to start a fire and everything is damp."

It replied with this:

I understand you're experiencing a damp situation. It's important to take
care of yourself and seek professional help if you're feeling overwhelmed...
* Stay hydrated: Drink plenty of water throughout the day.
* Take a warm bath or shower: This can help to soothe your body...

It read "damp" as emotional distress and gave wellness advice. No structure, no JSON, wrong intent entirely.

Then we fine-tuned it. Eight examples, 40 training steps, about two and a half minutes on a free Colab GPU. Same prompt afterward:

{
  "scenario": "start a fire from damp materials",
  "priority": "make the fire safe and prevent spreading wet smoke",
  "steps": [
    "Choose a large, dry pile of dry wood.",
    "Build a small fire ring or create a central fire ring.",
    "Cover the fire ring with large leaves, bark, or clean grass.",
    "Keep the fire small and contained."
  ],
  "safety_notes": ["..."],
  "common_mistakes": ["..."],
  "confidence": "medium"
}

Clean JSON, correct structure, every field present.
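
"Every field present" is the kind of claim you can check mechanically. Below is a minimal sketch of such a check, not the workshop's own code: the function name validate_field_card is hypothetical, the field names are copied from the card above, and the expected types are our assumption based on that one example.

```python
import json

# Field names copied from the example card; the expected types are assumptions.
REQUIRED_FIELDS = {
    "scenario": str,
    "priority": str,
    "steps": list,
    "safety_notes": list,
    "common_mistakes": list,
    "confidence": str,
}

def validate_field_card(raw):
    """Return a list of problems; an empty list means the card has the expected shape."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in card:
            problems.append(f"missing field: {name}")
        elif not isinstance(card[name], expected_type):
            problems.append(f"wrong type for {name}")
    return problems
```

The base model's reply would fail at the first step, since it isn't JSON at all. A check like this inspects shape only. It has no opinion on whether the steps are sound.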

But here's the part that proves the thesis. The advice itself still wasn't great. Look at step one above: the prompt said everything is damp, and the card tells you to pick a pile of dry wood. On other prompts the model drifted further: it suggested catching rainwater with a head net and produced a few nonsensical cooking steps. The format was locked. The knowledge was not.

That's the whole lesson in one experiment. Fine-tuning taught the model how to respond. It did not teach it what's true. If you want the answers to actually be reliable, fine-tuning alone won't get you there. That's RAG's job.


So Which Do You Use?

Reach for fine-tuning when you care about behavior. Consistent output format, a specific tone, a response that follows the same shape every time. It bakes a pattern into the model so it holds up even when the wording of the question changes. The model becomes self-contained, which is why it's our default for anything running on a device with no connection.

Reach for RAG when you care about knowledge. Facts that need to be current, answers that need a source you can point to, a knowledge base that changes without you wanting to retrain anything. You update a document, re-index, and the model can use it on the next question.

We run RAG fully offline in our own stack, on a local vector store, with retrieval in around 20 ms on an iPhone. Offline does not mean fine-tuning only. It means no cloud dependency, which RAG handles fine with a local index.
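
To make "local index" concrete, here is a toy sketch of offline retrieval in plain Python. It is a stand-in, not the stack described above: a real system would use learned embeddings and a proper vector store, while this one scores documents with bag-of-words cosine similarity. The class name LocalIndex and the two sample notes are made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class LocalIndex:
    """Hypothetical in-memory index; nothing here touches the network."""

    def __init__(self):
        self.docs = []  # list of (text, token counts) pairs

    def add(self, text):
        # "Re-indexing" a new or updated note is just this call.
        self.docs.append((text, Counter(tokenize(text))))

    def search(self, query, k=1):
        q = Counter(tokenize(query))
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

# Illustrative placeholder notes, not a vetted knowledge base.
index = LocalIndex()
index.add("Damp wood: split logs to reach the dry core and shave thin feather sticks for kindling.")
index.add("Rainwater: angle a clean tarp so it drains into a container.")

top = index.search("how do I start a fire when the wood is damp", k=1)
print(top[0])
```

The point of the sketch is the shape of the loop: add a note, and the very next search can return it, with no retraining and no connection.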

For our most demanding work, we use both. Fine-tuning controls how the system answers. Retrieval supplies what it answers with. The research backs this up: a UC Irvine study found retrieval consistently beats fine-tuning for getting new facts into a model, while fine-tuning wins on shaping reliable behavior. Different tools, different jobs.
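
Here is one way that division of labor can be wired together, as a sketch under stated assumptions. The prompt template, build_prompt, and answer are all hypothetical names, and the retrieve and generate arguments are placeholders for whatever retriever and fine-tuned model you actually run.

```python
def build_prompt(question, passages):
    """Combine retrieved passages with the user's question (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Use only the reference notes below.\n"
        f"Reference notes:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer as a field card in JSON."
    )

def answer(question, retrieve, generate):
    """retrieve: question -> list of passages; generate: prompt -> model output."""
    passages = retrieve(question)      # retrieval supplies what to answer with
    prompt = build_prompt(question, passages)
    return generate(prompt)            # the fine-tuned model controls how it answers
```

Passing the retriever and the model in as plain functions keeps the two concerns separate, so either one can be swapped or tested without touching the other.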


The Takeaway

If you remember one thing: fine-tuning is for behavior, RAG is for knowledge. Decide which problem you actually have before you pick a tool, and you'll skip the weeks we spent learning it the hard way.

Want to see the fine-tuning side yourself? The workshop runs free in Colab. Same prompt, base model vs. fine-tuned, side by side.

Open the notebook · Browse the repo