Building a Saturation Search Loop
Setup
This lab requires Ollama running locally with at least one model available for compression and query generation. The DDGS library handles web search.
conda activate ollama-pi
pip install ddgs ollama markitdown chromadb sentence-transformers  # sqlite3 ships with Python's standard library
# Verify Ollama is running:
ollama list # should show at least one model
If you do not have Ollama available, the lab provides mock functions for all model calls, so you can complete Exercises 1–3 without a running LLM.
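The lab's own mocks are not reproduced here. As a hypothetical stand-in, the sketch below mimics the shape of `ollama.generate(model=..., prompt=...)`, whose result can be indexed with `["response"]`, and returns canned text. `mock_generate` and `call_llm` are illustrative names, not part of the lab or of the `ollama` package.

```python
def mock_generate(model: str, prompt: str) -> dict[str, str]:
    """Hypothetical offline stand-in for ollama.generate(model=..., prompt=...).

    Returns canned text in a dict with a "response" key, so callers can use
    result["response"] with either this mock or the real client.
    """
    if "Generate ONE search query" in prompt:
        # Prompts like the Exercise 4 PLAN_QUERY_PROMPT get a fixed query back
        return {"response": "context window compression techniques for LLM agents"}
    # Anything else (e.g. the Exercise 3 COMPRESS_PROMPT) gets a short bullet summary
    return {
        "response": (
            "- Mock fact: agents summarize search results between rounds\n"
            "- Mock fact: a character budget keeps the summary bounded"
        )
    }


def call_llm(prompt: str, model: str, use_mock: bool = True) -> str:
    """Route a prompt to the mock or to a local Ollama model."""
    if use_mock:
        return mock_generate(model, prompt)["response"]
    import ollama  # only needed when a real model is available
    return ollama.generate(model=model, prompt=prompt)["response"]
```

Switching `use_mock` to `False` is then the only change needed once a local model is running.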
Exercise 1: The TTL Search Cache
Implement a 24-hour SQLite TTL cache wrapping DDGS web search.
import sqlite3
import hashlib
import json
import time
from ddgs import DDGS

CACHE_DB = 'search_cache.db'
CACHE_TTL = 86400  # 24 hours

def init_cache():
    conn = sqlite3.connect(CACHE_DB)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS search_cache (
            key TEXT PRIMARY KEY,
            results TEXT,
            timestamp REAL
        )
    """)
    conn.commit()
    return conn

def web_search_cached(query: str, max_results: int = 10) -> list[dict]:
    conn = init_cache()
    key = hashlib.sha256(query.encode()).hexdigest()
    # YOUR CODE: check cache first; return cached results if within TTL
    # YOUR CODE: if cache miss, call DDGS, store results, return them
    pass

# Test:
results1 = web_search_cached("LLM context engineering techniques")
results2 = web_search_cached("LLM context engineering techniques")  # should hit cache
print(f"Results: {len(results1)} items")
print("Cache working: second call should be instant")
Verify: Add timing to both calls. The second call should be <10ms (cache hit).
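One possible shape for the cache, sketched under a few assumptions: the search call is injected as `search_fn` (a hypothetical hook) so the cache can be tested offline, `max_results` is folded into the key (the skeleton hashes the query alone, which would let different result counts collide), and `timed_call` is an illustrative helper for the timing check above.

```python
import hashlib
import json
import sqlite3
import time
from typing import Any, Callable, Optional

CACHE_TTL = 86400  # 24 hours, matching the exercise

# search_fn(query, max_results) -> list of result dicts (hypothetical hook)
SearchFn = Callable[[str, int], list[dict]]


def init_cache(db_path: str = "search_cache.db") -> sqlite3.Connection:
    """Open (or create) the cache database and ensure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_cache "
        "(key TEXT PRIMARY KEY, results TEXT, timestamp REAL)"
    )
    conn.commit()
    return conn


def cached_search(conn: sqlite3.Connection, query: str, search_fn: SearchFn,
                  max_results: int = 10, ttl: float = CACHE_TTL,
                  now: Optional[float] = None) -> list[dict]:
    """Return cached results younger than ttl seconds; otherwise search and store."""
    now = time.time() if now is None else now
    # Include max_results in the key so different result counts never collide
    key = hashlib.sha256(f"{query}|{max_results}".encode()).hexdigest()
    row = conn.execute(
        "SELECT results, timestamp FROM search_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and now - row[1] < ttl:
        return json.loads(row[0])  # cache hit
    results = search_fn(query, max_results)  # cache miss or expired entry
    conn.execute(
        "INSERT OR REPLACE INTO search_cache (key, results, timestamp) VALUES (?, ?, ?)",
        (key, json.dumps(results), now),
    )
    conn.commit()
    return results


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

With DDGS, `search_fn` could be `lambda q, n: DDGS().text(q, max_results=n)`. Opening one connection and passing it in also avoids the skeleton's pattern of opening a new connection on every call.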
Exercise 2: Heuristic Novelty Assessment
Implement assess_novelty_heuristic() and integrate it into a search loop.
def assess_novelty_heuristic(new_results: list[dict], knowledge_state: str) -> int:
    """Returns 0-10: fraction of new words not in knowledge_state, scaled to 10."""
    # YOUR CODE: extract all words from new_results bodies
    # YOUR CODE: compute word-level set difference vs knowledge_state
    # YOUR CODE: return round(novel_fraction * 10)
    pass

# Unit tests:
known = "retrieval augmented generation context window"
new_same = [{"body": "retrieval augmented generation context window management"}]
new_diff = [{"body": "saturation gating novelty threshold rolling compression"}]
assert assess_novelty_heuristic(new_same, known) < 4  # mostly known
assert assess_novelty_heuristic(new_diff, known) > 6  # mostly new
print("Novelty tests passed")
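One possible implementation, assuming "words" means lowercase alphanumeric tokens and that the score counts distinct words (a set difference, as the exercise comments suggest). Returning 0 for empty results is a choice of this sketch; the exercise does not specify it.

```python
import re


def _word_set(text: str) -> set[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def assess_novelty_heuristic(new_results: list[dict], knowledge_state: str) -> int:
    """Return 0-10: the share of distinct result words absent from knowledge_state."""
    new_words: set[str] = set()
    for result in new_results:
        new_words |= _word_set(result.get("body", ""))
    if not new_words:
        return 0  # nothing to learn from empty results
    novel = new_words - _word_set(knowledge_state)
    return round(len(novel) / len(new_words) * 10)
```

On the unit tests above, `new_same` has 1 novel word out of 6 and scores 2, while `new_diff` is entirely new and scores 10. Because sets are used, a word repeated across many snippets counts once, so a long page is not rated novel just for its size.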
Integrate the novelty check into a search loop that:
- Runs at least SEARCHES_PER_TASK=2 rounds
- Assesses novelty after each round
- Stops if novelty < NOVELTY_THRESHOLD=3 after the minimum rounds
- Logs the novelty score for each round
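A minimal sketch of such a loop, assuming a fixed list of queries and a `search_fn(query)` placeholder that would wrap `web_search_cached`. Since compression only arrives in Exercise 3, the knowledge state here is raw concatenated snippet text, and a compact copy of the novelty score is inlined so the block runs on its own.

```python
import re
from typing import Callable

SEARCHES_PER_TASK = 2
NOVELTY_THRESHOLD = 3


def word_novelty(results: list[dict], knowledge: str) -> int:
    """0-10 share of distinct result words not yet seen."""
    text = " ".join(r.get("body", "") for r in results)
    new = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not new:
        return 0
    seen = set(re.findall(r"[a-z0-9]+", knowledge.lower()))
    return round(len(new - seen) / len(new) * 10)


def novelty_gated_search(queries: list[str],
                         search_fn: Callable[[str], list[dict]],
                         min_rounds: int = SEARCHES_PER_TASK,
                         threshold: int = NOVELTY_THRESHOLD,
                         ) -> tuple[str, list[tuple[int, int]]]:
    """Search query by query; stop once novelty drops below threshold after min_rounds."""
    knowledge = ""
    log: list[tuple[int, int]] = []
    for round_num, query in enumerate(queries, start=1):
        results = search_fn(query)
        novelty = word_novelty(results, knowledge)
        log.append((round_num, novelty))
        print(f"[round {round_num}] novelty={novelty} query={query!r}")
        if round_num > min_rounds and novelty < threshold:
            print("  saturation reached, stopping")
            break
        # Raw concatenation for now; Exercise 3 replaces this with compression
        knowledge += " " + " ".join(r.get("body", "") for r in results)
    return knowledge, log
```

Note that with `min_rounds=2` the gate is first checked in round 3, so even a fully saturated topic runs three searches.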
Exercise 3: Rolling Knowledge Compression
Implement compress_knowledge() using a local Ollama model (or the provided mock).
import ollama
KNOWLEDGE_MAX_CHARS = 1500
COMPRESS_MODEL = "glm4:9b" # or your fastest available model
COMPRESS_PROMPT = """\
Current knowledge summary:
{current_state}
New search results to incorporate:
{new_results}
Update the summary to include new information. Be concise: 5-8 bullet points,
each starting with a key fact. Do not exceed {max_chars} characters.
Output ONLY the bullet points."""
def compress_knowledge(current_state: str, new_results: list[dict],
                       model: str = COMPRESS_MODEL) -> str:
    # YOUR CODE: first round: build state from results directly (no model call)
    # YOUR CODE: subsequent rounds: call model with COMPRESS_PROMPT
    # YOUR CODE: ensure output stays within KNOWLEDGE_MAX_CHARS
    pass

# Test the state stays bounded:
state = ""
for i in range(5):
    results = web_search_cached(f"LLM agents query {i}")
    state = compress_knowledge(state, results)
    print(f"Round {i+1}: state length = {len(state)} chars")
    assert len(state) <= KNOWLEDGE_MAX_CHARS, "State exceeded budget!"
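A sketch of one way to fill in `compress_knowledge()`. It assumes the model is passed as an `llm_fn(prompt) -> str` callable rather than a model name (with Ollama, something like `lambda p: ollama.generate(model=COMPRESS_MODEL, prompt=p)["response"]`). The 200-character first-round bullets, the 300-character snippets in the prompt, and the newline-aware truncation are choices of this sketch, not of the lab. The prompt is repeated so the block runs standalone.

```python
from typing import Callable

KNOWLEDGE_MAX_CHARS = 1500

COMPRESS_PROMPT = """\
Current knowledge summary:
{current_state}
New search results to incorporate:
{new_results}
Update the summary to include new information. Be concise: 5-8 bullet points,
each starting with a key fact. Do not exceed {max_chars} characters.
Output ONLY the bullet points."""


def truncate_to_budget(text: str, max_chars: int) -> str:
    """Hard cap on length, cutting at the last line break so no bullet is split."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    newline = cut.rfind("\n")
    return cut[:newline] if newline > 0 else cut


def format_results(results: list[dict], per_result_chars: int = 300) -> str:
    """Render search results as short '- title: body' lines for the prompt."""
    return "\n".join(
        f"- {r.get('title', '').strip()}: {r.get('body', '').strip()[:per_result_chars]}"
        for r in results
    )


def compress_knowledge(current_state: str, new_results: list[dict],
                       llm_fn: Callable[[str], str],
                       max_chars: int = KNOWLEDGE_MAX_CHARS) -> str:
    """Fold new_results into a bounded bullet summary."""
    if not current_state:
        # First round: build the state from the snippets directly, no model call
        bullets = "\n".join(
            f"- {r['body'].strip()[:200]}" for r in new_results if r.get("body")
        )
        return truncate_to_budget(bullets, max_chars)
    prompt = COMPRESS_PROMPT.format(
        current_state=current_state,
        new_results=format_results(new_results),
        max_chars=max_chars,
    )
    # The model may ignore the length instruction, so enforce the budget here too
    return truncate_to_budget(llm_fn(prompt).strip(), max_chars)
```

Enforcing the budget in code, rather than trusting the prompt, is what keeps the test loop's assertion from failing on a verbose model.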
Exercise 4: Gap-Filling Query Generation
Implement plan_query() that generates a targeted query for round 3+.
PLAN_QUERY_PROMPT = """\
Task: {task}
What is already known:
{knowledge_state}
Generate ONE search query to find important information NOT yet covered above.
Output ONLY the query string."""
def plan_query(task: str, knowledge_state: str, round_num: int,
               model: str = COMPRESS_MODEL) -> str:
    if round_num <= 2 or not knowledge_state:
        # Early rounds: derive the query from the task text directly,
        # dropping any "save to ..." output instruction
        return task.split("save to")[0].strip()
    # YOUR CODE: call model with PLAN_QUERY_PROMPT
    # YOUR CODE: return the generated query string
    pass

# Test:
task = "Research the top 5 context engineering techniques for production LLM agents"
state = "RAG and retrieval augmented generation are widely used techniques..."
q3 = plan_query(task, state, round_num=3)
print(f"Round 3 query: {q3}")
# Should NOT be about RAG: the state already covers it
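A sketch of the model-backed branch, assuming the model is passed as an `llm_fn(prompt) -> str` callable instead of a model name. Small models often wrap the answer in quotes, prefix it with "Query:", or add an explanation line, so the hypothetical `clean_query` helper keeps only the first non-empty line and strips those decorations; falling back to the task text on an empty answer is also an assumption.

```python
import re
from typing import Callable

PLAN_QUERY_PROMPT = """\
Task: {task}
What is already known:
{knowledge_state}
Generate ONE search query to find important information NOT yet covered above.
Output ONLY the query string."""


def clean_query(raw: str) -> str:
    """Keep the first non-empty line and strip a 'Query:' label and quotes."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    query = re.sub(r"^query:\s*", "", lines[0], flags=re.IGNORECASE)
    return query.strip(" \"'`")


def plan_query(task: str, knowledge_state: str, round_num: int,
               llm_fn: Callable[[str], str]) -> str:
    """Early rounds reuse the task text; later rounds ask the model for a gap-filling query."""
    base_query = task.split("save to")[0].strip()
    if round_num <= 2 or not knowledge_state:
        return base_query
    prompt = PLAN_QUERY_PROMPT.format(task=task, knowledge_state=knowledge_state)
    # Fall back to the task text if the model returns nothing usable
    return clean_query(llm_fn(prompt)) or base_query
```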
Exercise 5: URL Novelty Gating
Implement enrich_novel_urls() that skips URLs whose snippet's word overlap with the knowledge state exceeds overlap_threshold.
def fetch_url_content(url: str) -> str:
    """Fetch and convert URL to plain text. Returns '' on failure."""
    try:
        from markitdown import MarkItDown
        md = MarkItDown()
        return md.convert_url(url).text_content[:3000]
    except Exception:
        return ""

def enrich_novel_urls(results: list[dict], knowledge_state: str,
                      max_fetch: int = 2, overlap_threshold: float = 0.6) -> str:
    # YOUR CODE: for each result, compute word overlap between snippet and knowledge_state
    # YOUR CODE: skip if overlap > overlap_threshold (log the skip)
    # YOUR CODE: fetch and return content for up to max_fetch novel URLs
    pass
Test: Run it on 5 results where 3 snippets overlap heavily with a pre-populated knowledge state, passing max_fetch=5 so the fetch cap cannot mask the gating. Verify that only the 2 novel URLs are fetched.
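A sketch under stated assumptions: overlap is measured as the fraction of the snippet's distinct words already present in the knowledge state, the URL is read from the `href` key (the key DDGS text results use), and the fetcher is injected as `fetch_fn` so `fetch_url_content` can be swapped for a fake in tests. Only successful fetches count toward `max_fetch` here, which is a design choice of this sketch.

```python
import re
from typing import Callable


def _word_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def snippet_overlap(snippet: str, known_words: set[str]) -> float:
    """Fraction of the snippet's distinct words already present in known_words."""
    words = _word_set(snippet)
    if not words:
        return 1.0  # an empty snippet offers nothing new, so treat it as known
    return len(words & known_words) / len(words)


def enrich_novel_urls(results: list[dict], knowledge_state: str,
                      fetch_fn: Callable[[str], str],
                      max_fetch: int = 2, overlap_threshold: float = 0.6) -> str:
    """Fetch up to max_fetch URLs whose snippets are not mostly covered already."""
    known_words = _word_set(knowledge_state)
    sections: list[str] = []
    for result in results:
        if len(sections) >= max_fetch:
            break
        url = result.get("href", "")
        if not url:
            continue
        overlap = snippet_overlap(result.get("body", ""), known_words)
        if overlap > overlap_threshold:
            print(f"  skip (overlap={overlap:.2f}): {url}")
            continue
        content = fetch_fn(url)
        if content:  # fetch_url_content returns '' on failure
            sections.append(f"[Source: {url}]\n{content}")
    return "\n\n".join(sections)
```

In the lab, the call would be `enrich_novel_urls(results, state, fetch_url_content)`.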
Exercise 6: Complete Saturation Loop vs. Baseline
Assemble your components into a complete gather_research_saturation() function and compare it with the fixed two-query baseline, gather_research_baseline().
def gather_research_saturation(task: str, planned_queries: list[str]) -> str:
    """Complete saturation-gated research loop."""
    knowledge_state = ""
    all_results = []
    SEARCHES_PER_TASK = 2
    MAX_SEARCH_ROUNDS = 5
    NOVELTY_THRESHOLD = 3
    for round_num in range(1, MAX_SEARCH_ROUNDS + 1):
        # Use the planned queries first, then generate gap-filling queries
        query = (planned_queries[round_num - 1]
                 if round_num <= len(planned_queries)
                 else plan_query(task, knowledge_state, round_num))
        results = web_search_cached(query)
        novelty = assess_novelty_heuristic(results, knowledge_state)
        print(f"[round {round_num}] novelty={novelty} query='{query[:50]}'")
        # The gate can only fire once the minimum number of rounds has run
        if novelty < NOVELTY_THRESHOLD and round_num > SEARCHES_PER_TASK:
            print("  saturation reached, stopping")
            break
        all_results.extend(results)
        knowledge_state = compress_knowledge(knowledge_state, results)
    enriched = enrich_novel_urls(all_results, knowledge_state)
    context = "\n\n".join(r.get("body", "") for r in all_results)
    # Keep fetched page content separated from the last snippet
    return context + ("\n\n" + enriched if enriched else "")

def gather_research_baseline(task: str, planned_queries: list[str]) -> str:
    """Fixed dual-search baseline."""
    all_results = []
    for query in planned_queries[:2]:
        all_results.extend(web_search_cached(query))
    return "\n\n".join(r.get("body", "") for r in all_results)

# Compare on a topic:
task = "Research cost management strategies for production LLM agents"
queries = ["LLM production cost management", "AI agent token budget optimization"]
baseline_context = gather_research_baseline(task, queries)
saturation_context = gather_research_saturation(task, queries)
print(f"\nBaseline context: {len(baseline_context)} chars")
print(f"Saturation context: {len(saturation_context)} chars")
if baseline_context:
    print(f"Saturation/baseline length ratio: {len(saturation_context) / len(baseline_context):.1f}x")
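Analysis: For a simple topic (well covered by 2 queries), novelty should fall below the threshold at round 3, the first round where the gate can fire. The saturation loop then stops, and its snippet context matches the baseline's, plus any fetched page content. For a complex or niche topic, novelty should stay above the threshold past round 3, so the loop keeps searching and produces richer context.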
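Character length is a crude proxy for coverage: repeated snippets and fetched page text (which only the saturation run adds) both inflate it. One cheap complementary check, an assumption rather than part of the lab, is to also compare distinct word counts. `context_stats` and `compare_contexts` are hypothetical helpers.

```python
import re
from typing import Optional


def context_stats(context: str) -> dict[str, int]:
    """Character count plus number of distinct lowercase word tokens."""
    return {
        "chars": len(context),
        "distinct_words": len(set(re.findall(r"[a-z0-9]+", context.lower()))),
    }


def compare_contexts(baseline: str, saturation: str) -> dict[str, Optional[float]]:
    """Saturation/baseline ratio per metric; None when the baseline value is 0."""
    base, sat = context_stats(baseline), context_stats(saturation)
    return {
        metric: (sat[metric] / base[metric] if base[metric] else None)
        for metric in base
    }
```

Calling `compare_contexts(baseline_context, saturation_context)` after the comparison run gives both ratios; a large character ratio paired with a distinct-word ratio near 1.0 suggests the extra context is mostly repetition.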