Pipeline Architecture
- Trace every stage of the single-focus run lifecycle from parse_skills() through compress_and_store() and name the responsible file for each
- Distinguish between single-focus tasks (agent.py) and compound tasks (orchestrator.py) and explain when to use each
- Explain the role of each component file in the harness and the data it consumes and produces
Two Task Types
The harness handles two fundamentally different kinds of work:
Single-focus tasks are routed through agent.py. A single-focus task has one clear deliverable: research a topic, produce a document, annotate an abstract. The run lifecycle below describes single-focus tasks.
Compound tasks are routed through orchestrator.py. A compound task — "research agent failure modes and context engineering, synthesize into a unified guide" — is decomposed into subtasks, each run through the single-focus pipeline in parallel, and then assembled into a final document. The orchestrator is a coordination layer on top of the agent, not a separate system.
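The decompose, fan out, assemble pattern described above can be sketched in a few lines. This is a minimal illustration, not the code in orchestrator.py: `run_compound`, `run_single_focus`, and `fake_agent` are hypothetical names, and the subtask list is passed in by hand where the real orchestrator presumably derives it from the task.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_compound(
    subtasks: list[str],
    run_single_focus: Callable[[str], str],
    max_workers: int = 4,
) -> str:
    """Run each subtask through the single-focus pipeline in parallel, then assemble."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so sections line up with subtasks.
        sections = list(pool.map(run_single_focus, subtasks))
    return "\n\n".join(sections)


def fake_agent(task: str) -> str:
    # Hypothetical stand-in for a full single-focus run.
    return f"## {task}\n(draft for: {task})"
```

Calling `run_compound(["agent failure modes", "context engineering"], fake_agent)` returns one document with a section per subtask, in the order the subtasks were given.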
The Single-Focus Run Lifecycle
    parse_skills()                     skills.py
      → memory.get_context()           memory.py
      → make_plan()                    planner.py
      → auto_activate()                skills.py
      → gather_research()              agent.py
          web_search_raw()             search_cache → DDGS
          compress_knowledge()         rolling LLM compression
          read_file_context()          chunker + MarkItDown
          enrich_with_page_content()   URL enrichment
      → synthesize()                   agent.py (producer model)
      → count check + retry
      → write output
      → wiggum_loop()                  wiggum.py
      → run_panel()                    panel.py (post_wiggum skills)
      → post_synthesis skills          skills.py
      → compress_and_store()           memory.py
Each arrow is a function call. Each indentation level is a nested call or sub-stage. Let's walk through what happens at each stage.
Stage 1: parse_skills()
File: skills.py
The task string may begin with /skill tokens, for example: /annotate /deep Research RAG techniques and save to output.md. parse_skills() strips these tokens from the task string and returns both the clean task and the set of explicitly activated skills. This happens before any model call, because skills affect every subsequent stage.
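A minimal sketch of this token stripping, assuming skill tokens are whitespace-separated words of the form `/name` at the start of the string. The function name `parse_skills_sketch` and the token pattern are illustrative assumptions, not the implementation in skills.py.

```python
import re

# Assumed token shape: a slash followed by a simple identifier (no further slashes),
# so a leading file path such as /tmp/notes.md is not mistaken for a skill.
SKILL_TOKEN = re.compile(r"/[A-Za-z_][\w-]*")


def parse_skills_sketch(task: str) -> tuple[str, set[str]]:
    """Strip leading /skill tokens; return (clean_task, activated_skills)."""
    words = task.split()
    skills: set[str] = set()
    i = 0
    while i < len(words) and SKILL_TOKEN.fullmatch(words[i]):
        skills.add(words[i][1:])  # drop the leading slash
        i += 1
    return " ".join(words[i:]), skills
```

With the example above, `parse_skills_sketch("/annotate /deep Research RAG techniques and save to output.md")` yields the clean task `"Research RAG techniques and save to output.md"` and the skill set `{"annotate", "deep"}`.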
Stage 2: memory.get_context()
File: memory.py
Before planning, the harness retrieves relevant observations from prior runs. The memory system maintains two indices: a ChromaDB vector store for semantic similarity retrieval and a SQLite FTS5 index for keyword matching. The combined context — typically 3–5 relevant past observations — is injected into the planning prompt. This lets the planner avoid re-researching topics the agent has already covered.
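The text does not say how hits from the two indices are combined, so the following is only one plausible approach: interleave the two ranked lists, drop duplicates, and cap the result. `merge_context` and its parameters are hypothetical; the actual queries against ChromaDB and SQLite are left out and represented by plain lists of observation strings.

```python
from itertools import zip_longest


def merge_context(
    semantic_hits: list[str],
    keyword_hits: list[str],
    limit: int = 5,
) -> str:
    """Interleave two ranked hit lists, drop duplicates, keep the top `limit`."""
    merged: list[str] = []
    seen: set[str] = set()
    # Alternate between the two rankings so neither index dominates.
    for pair in zip_longest(semantic_hits, keyword_hits):
        for hit in pair:
            if hit is not None and hit not in seen:
                seen.add(hit)
                merged.append(hit)
    # Format as a bullet list ready to paste into the planning prompt.
    return "\n".join(f"- {obs}" for obs in merged[:limit])
```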
Stage 3: make_plan()
File: planner.py
The planner model (glm4:9b by default) analyzes the task and produces a Plan dataclass:
    from dataclasses import dataclass

    @dataclass
    class Plan:
        task_type: str           # "enumerated", "best_practices", "research"
        complexity: str          # "low", "medium", "high"
        search_queries: list     # 2+ targeted search strings
        prior_work: str          # what memory already covers
        expected_sections: list  # expected output structure
The task_type field matters downstream: the Wiggum evaluator applies different criteria to enumerated tasks (must hit the specified count) versus best-practices tasks (must cover practical implementation) versus research tasks (must integrate multiple sources).
Stage 4: auto_activate()
File: skills.py
Some skills activate automatically based on task content or plan properties. /deep fires when the task mentions "comprehensive", "exhaustive", or "deep dive". /panel fires when plan.complexity == "high". /annotate fires when the task mentions "paper", "abstract", or "survey". Auto-activation happens after planning so that the plan's complexity assessment can trigger skills.
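The three rules above can be written as a small function. This is a sketch under the assumption that triggers are case-insensitive substring matches. `auto_activate_sketch` and its signature are illustrative, not the code in skills.py.

```python
DEEP_TRIGGERS = ("comprehensive", "exhaustive", "deep dive")
ANNOTATE_TRIGGERS = ("paper", "abstract", "survey")


def auto_activate_sketch(task: str, complexity: str, explicit: set[str]) -> set[str]:
    """Return explicitly requested skills plus any that fire automatically."""
    active = set(explicit)
    text = task.lower()
    # Plain substring matching (an assumption): "newspaper" would also match "paper".
    if any(trigger in text for trigger in DEEP_TRIGGERS):
        active.add("deep")
    if complexity == "high":  # needs the plan, hence the post-planning position
        active.add("panel")
    if any(trigger in text for trigger in ANNOTATE_TRIGGERS):
        active.add("annotate")
    return active
```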
Stage 5: gather_research()
File: agent.py
This is the most complex stage. The research loop:
- Checks if a cached research context exists (RESEARCH_CACHE=1 env flag)
- Runs web searches (DDGS) against the planner's queries, with a 24h SQLite TTL cache
- Assesses novelty of each search round and stops when new results add little new information
- Compresses results into a rolling knowledge state after each accepted round
- Reads any file paths detected in the task string (using the chunker for large files)
- Enriches the top novel URLs with full page content via MarkItDown
The output is a single string of formatted research context passed to synthesis.
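The novelty-based stopping rule is the least obvious part of the loop above. The sketch below uses the share of previously unseen URLs in a round as a stand-in for novelty. The real assessment may be model-based, and `gather_rounds`, `min_novelty`, and the result shape (`{"url": ...}` dicts) are all assumptions made for illustration.

```python
from typing import Callable


def gather_rounds(
    queries: list[str],
    search: Callable[[str], list[dict]],
    min_novelty: float = 0.3,
) -> list[dict]:
    """Run one search round per query; stop once a round adds too little that is new."""
    seen_urls: set[str] = set()
    accepted: list[dict] = []
    for query in queries:
        results = search(query)
        new = []
        for result in results:
            if result["url"] not in seen_urls:
                seen_urls.add(result["url"])
                new.append(result)
        # Novelty = fraction of this round's results not seen in earlier rounds.
        novelty = len(new) / len(results) if results else 0.0
        if novelty < min_novelty:
            break  # diminishing returns: skip the remaining queries
        accepted.extend(new)
    return accepted
```

In the harness, each accepted round would additionally be folded into the rolling knowledge state. That compression step needs a model call and is omitted here.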
Stage 6: synthesize()
File: agent.py (calls the producer model)
The synthesis call assembles:
- The task description
- Retrieved memory context (from stage 2)
- The research context (from stage 5)
- Skill-injected prompts (pre_synthesis hooks)
- The synthesis instruction (the target of autoresearch optimization in Module 5)
The producer model returns Markdown. For enumerated tasks, the harness checks whether the correct count was produced and retries once if not.
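A minimal sketch of the count check and single retry, assuming enumerated output uses top-level numbered Markdown items (`1. ...`). The helper names, the regular expression, and the wording of the retry instruction are hypothetical. `produce` stands in for the producer-model call.

```python
import re
from typing import Callable


def count_items(markdown: str) -> int:
    """Count numbered list items that start at the beginning of a line."""
    return len(re.findall(r"^\d+\.\s", markdown, flags=re.MULTILINE))


def synthesize_with_count_check(
    prompt: str,
    expected: int,
    produce: Callable[[str], str],
) -> str:
    """Call the producer; if the item count is wrong, retry exactly once."""
    draft = produce(prompt)
    if count_items(draft) != expected:
        # Single retry that states the required count explicitly.
        draft = produce(f"{prompt}\n\nReturn exactly {expected} numbered items.")
    return draft
```

The second draft is returned even if its count is still wrong, so the evaluation loop that follows remains the final check on quality.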
Stage 7: wiggum_loop()
File: wiggum.py
The output enters the evaluate → revise → verify loop. The evaluator model scores the output across five dimensions (covered in detail in Module 3). If the score is below threshold, the evaluator provides structured feedback, the producer revises, and the loop repeats. The loop runs for up to 3 rounds. A final PASS/FAIL determination is recorded.
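The control flow of the loop can be sketched as follows. The threshold value, the single numeric score, and the names `wiggum_loop_sketch`, `evaluate`, and `revise` are assumptions for illustration. The real evaluator scores five dimensions and is covered in Module 3.

```python
from typing import Callable


def wiggum_loop_sketch(
    draft: str,
    evaluate: Callable[[str], tuple[float, str]],
    revise: Callable[[str, str], str],
    threshold: float = 8.0,  # hypothetical pass mark
    max_rounds: int = 3,
) -> tuple[str, str, int]:
    """Evaluate, revise on failure, and re-evaluate, for at most `max_rounds` rounds."""
    for round_no in range(1, max_rounds + 1):
        score, feedback = evaluate(draft)
        if score >= threshold:
            return draft, "PASS", round_no
        if round_no < max_rounds:
            # Only revise when another evaluation round remains to verify the result.
            draft = revise(draft, feedback)
    return draft, "FAIL", max_rounds
```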
Stage 8: compress_and_store()
File: memory.py
After a successful run, the planner model compresses the run into a structured observation — a title, a narrative paragraph, and a list of key facts — and stores it in both the SQLite and ChromaDB indices. Future runs on related topics will retrieve this observation in stage 2.
Component Roles
| File | Role | Input | Output |
|---|---|---|---|
| agent.py | Main entry point; stage orchestration | Task string | Written file + run record |
| planner.py | Task analysis | Task + memory context | Plan dataclass |
| memory.py | Persistent observation store | Query / Observation | Context string / stored row |
| wiggum.py | Evaluate → revise → verify | Draft output + task | Scored, revised output |
| skills.py | Skill registry and injection | Task / pipeline stage | Modified prompts / behavior |
| chunker.py | Large-doc context extraction | File path | Chunked context string |
| logger.py | Structured run logging | Stage events | runs.jsonl + trace JSON |
| security.py | Code and injection scanning | Code / search results | Sanitized inputs |
| orchestrator.py | Compound task coordination | Compound task | Assembled document |
| inference.py | Backend shim | Model call | Ollama/vLLM response |
The Shared Log
Every run appends a record to runs.jsonl. Every stage span is appended to a Chrome Trace Event JSON file in traces/. These two files are the ground truth for all analysis: experiments compare runs.jsonl entries, and Perfetto visualizes traces/ files. Anything that is not logged is unavailable to later analysis.
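A sketch of what writing these two artifacts might look like. The record fields and the helper names `append_run_record` and `span_event` are hypothetical and do not reflect the actual logger.py schema. The span uses the Chrome Trace Event "complete" event shape (`ph: "X"`, with `ts` and `dur` in microseconds), which Perfetto can load.

```python
import json
from pathlib import Path


def append_run_record(path: str, record: dict) -> None:
    """Append one run as a single JSON line (the JSONL convention)."""
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def span_event(name: str, start_s: float, end_s: float) -> dict:
    """Build a Chrome Trace Event 'complete' event; times are in microseconds."""
    return {
        "name": name,
        "ph": "X",  # complete event: start timestamp plus duration
        "ts": round(start_s * 1e6),
        "dur": round((end_s - start_s) * 1e6),
        "pid": 1,  # placeholder process and thread ids
        "tid": 1,
    }
```

A trace file is then a JSON array of such events, or an object with a `traceEvents` array. Because each run is one line in runs.jsonl, two experiments can be compared by reading and filtering lines without loading a nested structure.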