Harness Engineering for AI Agents · Context Engineering & Memory

Building a Saturation Search Loop

Colab Notebook · ~60 min

Google Colab Notebook

Python · ~60 min

Lab Objectives

Implement a minimal dual-search loop with a SQLite TTL cache and verify that cache hits skip the network round-trip

Add heuristic novelty assessment (word-overlap) and verify that low-novelty rounds are rejected without calling compress_knowledge()

Implement rolling knowledge compression using a local Ollama model and verify the state stays within the KNOWLEDGE_MAX_CHARS budget

Add gap-filling query generation with plan_query() and verify that round 3+ queries differ meaningfully from rounds 1–2

Add URL enrichment with the 60% word-overlap novelty gate and verify that redundant URLs are skipped

Run the complete saturation loop against a real topic and compare its search coverage to a fixed dual-search baseline on the same topic

Setup

This lab requires Ollama running locally with at least one model available for compression and query generation. The DDGS library handles web search.

conda activate ollama-pi
pip install ddgs sqlite3 chromadb sentence-transformers

# Verify Ollama is running:
ollama list  # should show at least one model

If you do not have Ollama available, the lab provides mock functions for all model calls — you can complete exercises 1–3 without a running LLM.

Exercise 1: The TTL Search Cache

Implement a 24-hour SQLite TTL cache wrapping DDGS web search.

import sqlite3
import hashlib
import json
import time
from ddgs import DDGS

CACHE_DB = 'search_cache.db'
CACHE_TTL = 86400  # 24 hours

def init_cache():
    conn = sqlite3.connect(CACHE_DB)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS search_cache (
            key TEXT PRIMARY KEY,
            results TEXT,
            timestamp REAL
        )
    """)
    conn.commit()
    return conn

def web_search_cached(query: str, max_results: int = 10) -> list[dict]:
    conn = init_cache()
    key = hashlib.sha256(query.encode()).hexdigest()

    # YOUR CODE: check cache first; return cached results if within TTL
    # YOUR CODE: if cache miss, call DDGS, store results, return them
    pass

# Test:
results1 = web_search_cached("LLM context engineering techniques")
results2 = web_search_cached("LLM context engineering techniques")  # should hit cache

print(f"Results: {len(results1)} items")
print(f"Cache working: second call should be instant")

Verify: Add timing to both calls. The second call should be <10ms (cache hit).

Exercise 2: Heuristic Novelty Assessment

Implement assess_novelty_heuristic() and integrate it into a search loop.

def assess_novelty_heuristic(new_results: list[dict], knowledge_state: str) -> int:
    """Returns 0-10: fraction of new words not in knowledge_state, scaled to 10."""
    # YOUR CODE: extract all words from new_results bodies
    # YOUR CODE: compute word-level set difference vs knowledge_state
    # YOUR CODE: return round(novel_fraction * 10)
    pass

# Unit tests:
known = "retrieval augmented generation context window"
new_same = [{"body": "retrieval augmented generation context window management"}]
new_diff = [{"body": "saturation gating novelty threshold rolling compression"}]

assert assess_novelty_heuristic(new_same, known) < 4  # mostly known
assert assess_novelty_heuristic(new_diff, known) > 6  # mostly new
print("Novelty tests passed")

Integrate the novelty check into a search loop that:

Runs at least SEARCHES_PER_TASK=2 rounds
Assesses novelty after each round
Stops if novelty < NOVELTY_THRESHOLD=3 after the minimum rounds
Logs the novelty score for each round

Exercise 3: Rolling Knowledge Compression

Implement compress_knowledge() using a local Ollama model (or the provided mock).

import ollama

KNOWLEDGE_MAX_CHARS = 1500
COMPRESS_MODEL = "glm4:9b"  # or your fastest available model

COMPRESS_PROMPT = """\
Current knowledge summary:
{current_state}

New search results to incorporate:
{new_results}

Update the summary to include new information. Be concise: 5-8 bullet points,
each starting with a key fact. Do not exceed {max_chars} characters.
Output ONLY the bullet points."""

def compress_knowledge(current_state: str, new_results: list[dict],
                        model: str = COMPRESS_MODEL) -> str:
    # YOUR CODE: first round: build state from results directly (no model call)
    # YOUR CODE: subsequent rounds: call model with COMPRESS_PROMPT
    # YOUR CODE: ensure output stays within KNOWLEDGE_MAX_CHARS
    pass

# Test the state stays bounded:
state = ""
for i in range(5):
    results = web_search_cached(f"LLM agents query {i}")
    state = compress_knowledge(state, results)
    print(f"Round {i+1}: state length = {len(state)} chars")
    assert len(state) <= KNOWLEDGE_MAX_CHARS, "State exceeded budget!"

Exercise 4: Gap-Filling Query Generation

Implement plan_query() that generates a targeted query for round 3+.

PLAN_QUERY_PROMPT = """\
Task: {task}

What is already known:
{knowledge_state}

Generate ONE search query to find important information NOT yet covered above.
Output ONLY the query string."""

def plan_query(task: str, knowledge_state: str, round_num: int,
               model: str = COMPRESS_MODEL) -> str:
    if round_num <= 2 or not knowledge_state:
        # Derive query from task directly for early rounds
        return task.split("save to")[0].strip()

    # YOUR CODE: call model with PLAN_QUERY_PROMPT
    # YOUR CODE: return the generated query string
    pass

# Test:
task = "Research the top 5 context engineering techniques for production LLM agents"
state = "RAG and retrieval augmented generation are widely used techniques..."
q3 = plan_query(task, state, round_num=3)
print(f"Round 3 query: {q3}")
# Should NOT be about RAG — the state already covers it

Exercise 5: URL Novelty Gating

Implement enrich_novel_urls() that skips URLs whose snippet overlaps significantly with the knowledge state.

def fetch_url_content(url: str) -> str:
    """Fetch and convert URL to plain text. Returns '' on failure."""
    try:
        from markitdown import MarkItDown
        md = MarkItDown()
        return md.convert_url(url).text_content[:3000]
    except Exception:
        return ""

def enrich_novel_urls(results: list[dict], knowledge_state: str,
                       max_fetch: int = 2, overlap_threshold: float = 0.6) -> str:
    # YOUR CODE: for each result, compute word overlap between snippet and knowledge_state
    # YOUR CODE: skip if overlap > overlap_threshold (log the skip)
    # YOUR CODE: fetch and return content for up to max_fetch novel URLs
    pass

Test: Run on a set of results where 3 of 5 snippets overlap heavily with a pre-populated knowledge state. Verify that only 2 URLs are fetched.

Exercise 6: Complete Saturation Loop vs. Baseline

Assemble your components into a complete gather_research() function and compare it to a fixed dual-search baseline.

def gather_research_saturation(task: str, planned_queries: list[str]) -> str:
    """Complete saturation-gated research loop."""
    knowledge_state = ""
    all_results = []
    SEARCHES_PER_TASK = 2
    MAX_SEARCH_ROUNDS = 5
    NOVELTY_THRESHOLD = 3

    for round_num in range(1, MAX_SEARCH_ROUNDS + 1):
        query = (planned_queries[round_num - 1]
                 if round_num <= len(planned_queries)
                 else plan_query(task, knowledge_state, round_num))

        results = web_search_cached(query)
        novelty = assess_novelty_heuristic(results, knowledge_state)
        print(f"[round {round_num}] novelty={novelty} query='{query[:50]}'")

        if novelty < NOVELTY_THRESHOLD and round_num > SEARCHES_PER_TASK:
            print("  saturation reached — stopping")
            break

        all_results.extend(results)
        knowledge_state = compress_knowledge(knowledge_state, results)

    enriched = enrich_novel_urls(all_results, knowledge_state)
    return "\n\n".join(r.get("body", "") for r in all_results) + enriched

def gather_research_baseline(task: str, planned_queries: list[str]) -> str:
    """Fixed dual-search baseline."""
    all_results = []
    for query in planned_queries[:2]:
        all_results.extend(web_search_cached(query))
    return "\n\n".join(r.get("body", "") for r in all_results)

# Compare on a topic:
task = "Research cost management strategies for production LLM agents"
queries = ["LLM production cost management", "AI agent token budget optimization"]

baseline_context = gather_research_baseline(task, queries)
saturation_context = gather_research_saturation(task, queries)

print(f"\nBaseline context: {len(baseline_context)} chars")
print(f"Saturation context: {len(saturation_context)} chars")
print(f"Saturation coverage: {len(saturation_context) / len(baseline_context):.1f}x baseline")

Analysis: For a simple topic (well-covered in 2 queries), saturation should stop at round 2 and produce similar context to baseline. For a complex or niche topic, saturation should continue beyond 2 rounds and produce richer context.

Building a Saturation Search Loop

Setup

Exercise 1: The TTL Search Cache

Exercise 2: Heuristic Novelty Assessment

Exercise 3: Rolling Knowledge Compression

Exercise 4: Gap-Filling Query Generation

Exercise 5: URL Novelty Gating

Exercise 6: Complete Saturation Loop vs. Baseline

Privacy Policy

What we collect

What we don't collect

Your choices

Contact