Before You Start: Prerequisites & Learning Path
- Identify the technical prerequisites for this course and assess your own readiness
- Describe the five-module arc of the course and what you will be able to build by the end
What This Course Is About
This course teaches you to build reliable, production-grade agentic AI systems using open-source models running locally: no API keys, no per-token billing, no black-box model lock-in.
The central thesis is simple but counterintuitive: the harness matters more than the model. Swapping from a 7B to a 32B producer shifts quality by 10–15%. Fixing a fundamental harness flaw — missing verification, no memory, naive search — can shift quality by 50–80%. The four randomized experiments that underpin this course demonstrate this empirically.
By the end of this course you will be able to:
- Design and run a rigorous experiment to evaluate harness changes
- Build a saturation-gated research loop that stops searching when new results stop adding information
- Implement an evaluate → revise → verify loop with a decimalized rubric
- Wire a custom skill into the pipeline at any of four hook points
- Instrument any pipeline stage with structured tracing
- Build a DPO preference dataset from your own run logs and fine-tune a domain-specific model
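To make the saturation-gated loop in the list above concrete, here is a minimal sketch of the idea. It is not the harness's implementation: the function name `saturation_gated_search`, the `min_novelty` and `patience` parameters, and the novelty measure (the share of returned results that have not been seen before) are all assumptions chosen for illustration.

```python
from typing import Callable, Iterable


def saturation_gated_search(
    queries: Iterable[str],
    search: Callable[[str], list[str]],
    min_novelty: float = 0.2,  # hypothetical threshold, not a harness default
    patience: int = 2,         # hypothetical: stale rounds tolerated before stopping
) -> list[str]:
    """Run queries in order and stop once results stop adding new information."""
    seen: set[str] = set()
    collected: list[str] = []
    stale_rounds = 0

    for query in queries:
        results = search(query)
        # Keep only results not seen in earlier rounds, preserving order.
        new = [r for r in dict.fromkeys(results) if r not in seen]
        novelty = len(new) / len(results) if results else 0.0

        seen.update(new)
        collected.extend(new)

        # Count consecutive low-novelty rounds; any informative round resets the count.
        stale_rounds = stale_rounds + 1 if novelty < min_novelty else 0
        if stale_rounds >= patience:
            break  # saturated: further queries are unlikely to help

    return collected
```

The `search` argument can be any callable that maps a query to a list of result identifiers (URLs, for instance), so the gate can be exercised with a stub before it is wired to a real search backend.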
Prerequisites
Required:
- Python fluency — you should be comfortable reading and modifying 500–1,000 line Python scripts. The harness is not a library you install; it is code you read, understand, and adapt.
- Basic ML literacy — you should know what a language model is, what fine-tuning means at a high level, and what a context window is. You do not need deep ML theory.
- Command-line comfort — installing Ollama models, running Python scripts, reading log output. The course runs everything locally; there is no managed infrastructure.
Helpful but not required:
- Experience with any LLM API (OpenAI, Anthropic, etc.) — the mental model transfers even though the infrastructure differs
- Familiarity with SQLite and/or vector databases — memory.py uses both, and knowing what they are makes the Memory Systems reading click faster
- Basic familiarity with PyTorch or transformers — useful for the QLoRA fine-tuning reading in Module 5
Not required:
- Frontier model API access (the course runs on Ollama with local models)
- A GPU (though one speeds up fine-tuning in Module 5 significantly)
- Deep knowledge of transformer architecture
Hardware
The harness is designed to run on commodity hardware. Practical minimums:
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 16 GB | 32 GB |
| VRAM | 0 GB (CPU fallback) | 8 GB+ |
| Storage | 30 GB free | 60 GB+ |
| OS | Windows/macOS/Linux | Any |
The default producer model (pi-qwen-32b, Qwen2.5-32B Q4_K_M) requires ~20 GB of RAM+VRAM. On a 16 GB machine you can use the 7B fallback (pi-qwen) — quality is lower but the pipeline logic is identical.
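As a rough sanity check on the memory figure above, the weight footprint of a quantized model is approximately the parameter count times the bits per weight, divided by eight. The sketch below is illustrative only: the 32.5B parameter count and the 4.85 bits-per-weight average for Q4_K_M are approximate assumptions, the helper names are hypothetical, and the estimate ignores context (KV cache) memory.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (decimal), excluding KV cache."""
    return params_billions * bits_per_weight / 8


def pick_producer(available_gb: float, required_gb: float = 20.0) -> str:
    """Choose the 32B producer when combined RAM+VRAM allows it, else the 7B fallback."""
    return "pi-qwen-32b" if available_gb >= required_gb else "pi-qwen"


# Assumed figures: ~32.5B parameters, ~4.85 bits per weight for Q4_K_M.
approx_32b = estimate_weights_gb(32.5, 4.85)
print(f"32B Q4_K_M weights: roughly {approx_32b:.1f} GB")

print(pick_producer(16))  # 16 GB machine: falls back to the 7B producer
print(pick_producer(32))  # 32 GB machine: can run the 32B producer
```

The 20 GB threshold comes from the requirement stated above; treat it as a floor, since long contexts need additional memory.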
Software Setup
Clone the harness repository:
```bash
git clone https://github.com/nickmccarty/ollama-pi-harness
cd ollama-pi-harness
```
Install Ollama and pull the required models:
```bash
# Install Ollama first (see ollama.ai), then pull the base models
ollama pull qwen2.5:7b         # base producer (fallback)
ollama pull glm4:9b            # planner + memory compression
ollama pull qwen3-coder:30b    # evaluator
ollama pull llama3.2-vision    # vision preprocessing

# Create the custom producer models from their Modelfiles
ollama create pi-qwen -f Modelfile
ollama create pi-qwen-32b -f Modelfile.32b    # only if you can run the 32B model
```
Python environment:
```bash
conda create -n ollama-pi python=3.11
conda activate ollama-pi
pip install ollama ddgs "markitdown[all]" chromadb sentence-transformers datasets
```
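Once the models are pulled, it can help to confirm that nothing is missing before the first run. The sketch below is not part of the harness: the helper names are hypothetical, and it assumes only that `ollama list` prints a header row followed by one model per line with the model name in the first column. Names are compared case-insensitively, and a name without a tag is treated as `:latest`.

```python
import subprocess

# Models from the setup steps above (the optional 32B producer is left out).
REQUIRED = ["qwen2.5:7b", "glm4:9b", "qwen3-coder:30b", "llama3.2-vision", "pi-qwen"]


def normalize(name: str) -> str:
    """Lowercase the name and add the implicit ':latest' tag when none is given."""
    name = name.lower()
    return name if ":" in name else f"{name}:latest"


def installed_models(listing: str) -> set[str]:
    """Parse the text output of `ollama list` into a set of normalized model names."""
    lines = listing.strip().splitlines()
    # Skip the header row; the model name is the first whitespace-separated field.
    return {normalize(line.split()[0]) for line in lines[1:] if line.strip()}


def missing_models(listing: str, required: list[str] = REQUIRED) -> list[str]:
    """Return the required models that do not appear in the listing."""
    have = installed_models(listing)
    return [m for m in required if normalize(m) not in have]


def check_local_install() -> list[str]:
    """Run `ollama list` and report missing models (requires Ollama on PATH)."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return missing_models(result.stdout)
```

Calling `check_local_install()` in the activated environment returns an empty list when every required model is present; otherwise it names the models that still need `ollama pull` or `ollama create`.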
Module Arc
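The harness has seven subsystems whose initials spell HARNESS: Hooks, Agent, Research loop, Notes, Evaluation loop, Security, and Signals. The course covers all of them; the last column of the table lists, by initial, the subsystems each module addresses:
| Module | Title | Core Question | Subsystems (HARNESS initials) |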
|---|---|---|---|
| M1 | The Harness Thesis | What is harness engineering, and why does it dominate quality? | A |
| M2 | Context Engineering & Memory | What information reaches the model, and how is it selected? | R, N |
| M3 | Verification & Failure Modes | How do we know the output is good, and what goes wrong? | E |
| M4 | Production Systems | How do we extend, orchestrate, secure, and observe the harness? | H, S, S |
| M5 | Self-Improvement | How does the harness improve itself over time? | all |
Modules build on each other — M3 assumes you understand M2's research pipeline, and M5's autoresearch loop only makes sense once you can read a wiggum score. Work through them in order.