Saturation Gating
- Implement a saturation-gated search loop using heuristic novelty assessment, a rolling knowledge state, and a configurable novelty threshold
- Compare heuristic (word-overlap) and model-based novelty assessment, explaining the latency and accuracy tradeoffs of each
- Explain how compress_knowledge() maintains a rolling summary and why it is only called for accepted search rounds
- Explain the purpose of NOVELTY_EPSILON in the saturation gate and describe the failure mode it prevents when all queries return below-threshold novelty scores
The Design Problem
The fixed dual-search loop runs exactly two searches regardless of topic complexity. Saturation gating replaces the fixed loop with a stopping criterion: keep searching while new results are genuinely new; stop when they start repeating.
This is the same principle that guides a competent human researcher: you stop when your search results start recapitulating what you already know.
Configuration
MAX_SEARCH_ROUNDS = 5 # hard cap regardless of novelty
NOVELTY_THRESHOLD = 3 # 0–10; stop if new results score below this
KNOWLEDGE_MAX_CHARS = 1500 # cap on rolling knowledge state fed to novelty prompt
NOVELTY_EPSILON = 0.15 # ε-greedy: pass sub-threshold results through 15% of the time
The existing SEARCHES_PER_TASK = 2 becomes a minimum — the loop always runs at least 2 rounds before novelty gating kicks in. This preserves backward compatibility: simple tasks run exactly 2 rounds, complex tasks run up to 5.
The Saturation Loop
def gather_research(task, planned_queries, ...):
    knowledge_state = ""  # rolling compressed summary
    all_results = []      # deduplicated raw results
    for round in range(1, MAX_SEARCH_ROUNDS + 1):
        # Generate query: use planned_queries for rounds 1-2, then plan_query()
        if round <= len(planned_queries):
            query = planned_queries[round - 1]
        else:
            query = plan_query(task, knowledge_state, round)
        results = web_search_cached(query)
        novelty = assess_novelty(results, knowledge_state)
        log(f"[search {round}] novelty={novelty} query={query}")
        # Gate: stop if saturation reached (after minimum rounds)
        if novelty < NOVELTY_THRESHOLD and round > SEARCHES_PER_TASK:
            if random.random() < NOVELTY_EPSILON:
                log(" [novelty] saturation but ε-greedy pass-through; continuing")
            else:
                log(" [novelty] saturation; stopping search")
                break
        # Accept this round
        all_results = merge_deduplicated(all_results, results)
        knowledge_state = compress_knowledge(knowledge_state, results)
    # URL enrichment: only fetch URLs not already covered by knowledge_state
    enriched = enrich_novel_urls(all_results, knowledge_state)
    return format_results(all_results) + enriched
The key invariant: compress_knowledge() is only called for accepted rounds, meaning rounds that pass the novelty gate (including ε-greedy pass-throughs). A rejected round triggers no compression call and no state update, so no compute is spent summarizing low-value results.
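The gate logic can be exercised in isolation with a scripted sequence of novelty scores. The following is a minimal sketch, not the harness itself: run_gate and its rng parameter are hypothetical names introduced here so the random draw can be pinned in a test, and the constants simply mirror the configuration above.

```python
import random

MAX_SEARCH_ROUNDS = 5
SEARCHES_PER_TASK = 2
NOVELTY_THRESHOLD = 3
NOVELTY_EPSILON = 0.15


def run_gate(novelty_scores, rng=random.random):
    """Return the accepted round numbers for a scripted score sequence.

    novelty_scores[i] is the (hypothetical) novelty score of round i + 1.
    rng is injectable so a test can force or suppress the pass-through.
    """
    accepted = []
    for round_num in range(1, MAX_SEARCH_ROUNDS + 1):
        if round_num > len(novelty_scores):
            break  # scripted scores exhausted
        novelty = novelty_scores[round_num - 1]
        gated = novelty < NOVELTY_THRESHOLD and round_num > SEARCHES_PER_TASK
        if gated and rng() >= NOVELTY_EPSILON:
            break  # saturated: reject this round and stop searching
        # Accepted round: this is the only place state would be compressed.
        accepted.append(round_num)
    return accepted


if __name__ == "__main__":
    # Round 2 scores low but is inside the minimum, so it is still accepted.
    print(run_gate([9, 1, 7, 2, 8], rng=lambda: 0.5))
```

With the draw pinned at 0.5 the fourth round is rejected and the loop stops after three accepted rounds; pinning it below 0.15 lets the fourth round through and the fifth is accepted on its own score.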
Epsilon-Greedy Pass-Through
The saturation gate has a failure mode: in sessions where every query happens to return below-threshold novelty — due to caching, query convergence, or a topic that is genuinely exhausted — the loop terminates at the minimum two rounds even for complex tasks.
NOVELTY_EPSILON = 0.15 adds a random escape valve. Fifteen percent of the time, a below-threshold round is accepted anyway and the loop continues:
if novelty < NOVELTY_THRESHOLD and round > SEARCHES_PER_TASK:
    if random.random() < NOVELTY_EPSILON:
        log(" [novelty] saturation but ε-greedy pass-through; continuing")
    else:
        log(" [novelty] saturation; stopping search")
        break
The pass-through does not reset the threshold — if the next round also scores below threshold, it is again subject to the 15% gate. The effect is that no single low-novelty result can terminate the loop with certainty; a sustained sequence of low-novelty results terminates it with high probability.
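To put a number on "high probability": each below-threshold round draws independently from random.random(), so the chance that k consecutive low-novelty rounds all pass the gate is NOVELTY_EPSILON raised to the power k. The short sketch below works through that arithmetic; survival_probability is a helper name introduced here for illustration only.

```python
NOVELTY_EPSILON = 0.15


def survival_probability(k, epsilon=NOVELTY_EPSILON):
    """Probability that k consecutive below-threshold rounds all pass the gate.

    Assumes independent draws, which holds for successive random.random() calls.
    """
    return epsilon ** k


if __name__ == "__main__":
    for k in range(1, 4):
        print(f"{k} low-novelty round(s): {survival_probability(k):.4%}")
```

One low-novelty round survives 15% of the time, two in a row about 2.25%, and three in a row about 0.34%. With a minimum of 2 rounds and a cap of 5, at most three rounds are ever gated, so a fully saturated session reaches the hard cap in roughly one run out of 300.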
The name comes from ε-greedy exploration in reinforcement learning: exploit the stopping condition 85% of the time (stop when results are stale), explore past it 15% of the time (continue in case the heuristic underestimated novelty). The analogy holds — the harness is balancing exploitation against exploration on every search round.
Novelty Assessment: Heuristic
def assess_novelty_heuristic(new_results: list[dict], knowledge_state: str) -> int:
    new_words = set(
        w for r in new_results
        for w in r.get("body", "").lower().split()
    )
    known_words = set(knowledge_state.lower().split())
    if not new_words:
        return 0
    novel_fraction = len(new_words - known_words) / len(new_words)
    return round(novel_fraction * 10)  # 0–10
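Word-level set difference: what fraction of the distinct words in the new results have not appeared in the knowledge state? A score of 3 means roughly 30% of those words are new and the rest are repetitions. Once the minimum rounds have run, a round scoring below the threshold of 3 is rejected unless the ε-greedy pass-through lets it continue.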
Pros: ~0ms, no model call, fully deterministic, no latency impact on autoresearch sessions.
Cons: Vocabulary overlap is a weak proxy for semantic novelty. A result that paraphrases everything in the knowledge state using different words will score high even though it adds no new information.
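The paraphrase weakness is easy to reproduce. The sketch below repeats the heuristic in compact form so it runs on its own, then scores two invented result sets against the same invented knowledge state: a verbatim repeat and a paraphrase that shares no vocabulary with it. The sample strings are illustrative assumptions, not harness data.

```python
def assess_novelty_heuristic(new_results, knowledge_state):
    """Score 0-10: share of distinct words in new_results absent from knowledge_state."""
    new_words = {
        w for r in new_results
        for w in r.get("body", "").lower().split()
    }
    known_words = set(knowledge_state.lower().split())
    if not new_words:
        return 0
    return round(len(new_words - known_words) / len(new_words) * 10)


# Invented sample data for illustration.
known = "the cache stores search results for one hour"
repeat = [{"body": "The cache stores search results for one hour"}]
paraphrase = [{"body": "Query responses are kept in memory sixty minutes"}]

repeat_score = assess_novelty_heuristic(repeat, known)
paraphrase_score = assess_novelty_heuristic(paraphrase, known)

if __name__ == "__main__":
    print("repeat:", repeat_score)
    print("paraphrase:", paraphrase_score)
```

The verbatim repeat scores 0, while the paraphrase scores 10 even though it states the same fact. That gap is what the model-based assessor is meant to close.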
Novelty Assessment: Model-Based
import re

import ollama

NOVELTY_PROMPT = """\
What is already known:
{knowledge_state}
New search results:
{new_results}
Do these results add genuinely new information not already covered above?
Score 0–10 where 0 = completely redundant, 10 = entirely new information.
Output ONLY the integer score, nothing else."""

def assess_novelty_model(new_results, knowledge_state, model) -> int:
    snippet = format_results(new_results)[:800]
    prompt = NOVELTY_PROMPT.format(
        knowledge_state=knowledge_state[:800],
        new_results=snippet,
    )
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0, "num_predict": 3},  # the score needs only 1-3 tokens
    )
    raw = response["message"]["content"].strip()
    match = re.search(r"\d+", raw)
    if not match:
        return 5  # default neutral on parse failure
    return max(0, min(10, int(match.group())))  # clamp stray values such as "15" to 0-10
Pros: Semantic understanding — catches paraphrase duplicates, topic drift, and structural repetition.
Cons: Adds ~10–15 seconds per round (prefill-dominated; only 1–3 output tokens needed). On autoresearch sessions with many consecutive runs, this latency compounds.
Recommendation: Start with heuristic novelty. Switch to model-based if heuristic is noisy (e.g. accepting rounds that produce structurally duplicate content).
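The loop calls a single assess_novelty() entry point, so switching strategies can be a matter of wiring. The following is one possible sketch of that wiring, under stated assumptions: make_assess_novelty and parse_score are hypothetical helpers, the model scorer is passed in as a plain callable that returns the raw reply text, and falling back to the heuristic when the model call raises is a design choice of this sketch rather than something the harness is known to do.

```python
import re


def parse_score(raw, default=5):
    """Extract the first integer from a model reply and clamp it to 0-10."""
    match = re.search(r"\d+", raw)
    if not match:
        return default  # neutral score when the reply has no digits
    return max(0, min(10, int(match.group())))


def make_assess_novelty(heuristic, model_scorer=None):
    """Build an assess_novelty(results, knowledge_state) callable.

    heuristic: callable returning an int score.
    model_scorer: optional callable returning the model's raw text reply.
    """
    def assess_novelty(results, knowledge_state):
        if model_scorer is None:
            return heuristic(results, knowledge_state)
        try:
            return parse_score(model_scorer(results, knowledge_state))
        except Exception:
            # Assumption of this sketch: degrade to the heuristic on any model error.
            return heuristic(results, knowledge_state)
    return assess_novelty


if __name__ == "__main__":
    assess = make_assess_novelty(lambda r, k: 4, model_scorer=lambda r, k: "Score: 8")
    print(assess([], ""))
```

Starting with model_scorer=None keeps the zero-latency path; passing a scorer later changes behavior without touching the search loop.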
Knowledge Compression
COMPRESS_PROMPT = """\
Current knowledge summary:
{current_state}
New search results to incorporate:
{new_results}
Update the summary to include the new information. Be concise — 5-8 bullet points,
each starting with a key fact. Do not exceed {max_chars} characters total.
Output ONLY the bullet points, nothing else."""
def compress_knowledge(current_state, new_results, model, max_chars=KNOWLEDGE_MAX_CHARS):
    if not current_state:
        # First round: build initial state directly from results (no model call)
        bodies = " ".join(r.get("body", "") for r in new_results)[:1200]
        return bodies[:max_chars]
    prompt = COMPRESS_PROMPT.format(
        current_state=current_state,
        new_results=format_results(new_results)[:800],
        max_chars=max_chars
    )
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}],
                           options={"temperature": 0.1, "num_predict": 400})
    return response["message"]["content"].strip()[:max_chars]
The knowledge state is a rolling 1,500-character bullet-point summary of everything gathered so far. It serves two purposes:
- Input to novelty assessment — new results are compared against it
- Input to adaptive query generation: plan_query() uses it to identify gaps
The first round skips the model call and builds the state directly from search results (fast path). Subsequent rounds compress incrementally: the prompt asks the model to fold in new facts and drop redundancies within the character cap, and the final [:max_chars] slice enforces that cap even if the model overshoots.
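The two paths can be checked without a model by injecting the summarizer. This is a simplified sketch rather than the harness function: update_knowledge and its summarize parameter are hypothetical, and the intermediate 1,200-character cut on the first round is omitted for brevity.

```python
KNOWLEDGE_MAX_CHARS = 1500


def update_knowledge(current_state, new_results, summarize,
                     max_chars=KNOWLEDGE_MAX_CHARS):
    """Return the next knowledge state, never longer than max_chars.

    summarize(current_state, new_text) stands in for the model call.
    """
    bodies = " ".join(r.get("body", "") for r in new_results)
    if not current_state:
        # Fast path: the first round seeds the state with raw text, no model call.
        return bodies[:max_chars]
    # Incremental path: the hard slice caps the state even if the summarizer overshoots.
    return summarize(current_state, bodies).strip()[:max_chars]


if __name__ == "__main__":
    verbose = lambda state, text: "- fact\n" * 1000  # deliberately too long
    state = update_knowledge("", [{"body": "first finding"}], verbose)
    state = update_knowledge(state, [{"body": "second finding"}], verbose)
    print(len(state))
```

Because the slice is the last operation on both paths, downstream prompts can rely on the cap regardless of how well the model follows the length instruction.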
Adaptive Query Generation
For rounds 3 and beyond, instead of using pre-planned queries, plan_query() generates a gap-filling query:
PLAN_QUERY_PROMPT = """\
Task: {task}
What is already known:
{knowledge_state}
Generate ONE search query to find important information about the task NOT yet covered
above. Output ONLY the query string, nothing else."""
def plan_query(task, knowledge_state, round, model, planned_queries=()):
    # planned_queries is passed explicitly here (an assumption); the original
    # body referenced it without defining it in this function's scope.
    if round <= len(planned_queries) or not knowledge_state:
        # Use pre-planned queries for early rounds
        return planned_queries[round - 1] if round <= len(planned_queries) else task
    response = ollama.chat(model=model,
                           messages=[{"role": "user", "content":
                                      PLAN_QUERY_PROMPT.format(task=task,
                                                               knowledge_state=knowledge_state)}],
                           options={"temperature": 0.3, "num_predict": 60})
    return response["message"]["content"].strip().strip('"')
By round 3, the model knows what the first two queries covered and can generate a query specifically targeting the uncovered territory. This is the mechanism that allows complex topics to get 4–5 rounds of genuinely diverse search coverage.
URL Enrichment with Novelty Gating
def enrich_novel_urls(results, knowledge_state, count=URL_ENRICH_COUNT) -> str:
    blocks = []
    fetched = 0
    # The knowledge state does not change inside the loop, so tokenize it once
    known_words = set(knowledge_state.lower().split())
    for r in results:
        if fetched >= count:
            break
        snippet = r.get("body", "")
        snippet_words = set(snippet.lower().split())
        overlap = len(snippet_words & known_words) / max(len(snippet_words), 1)
        if overlap > 0.6:
            log(f" [enrich] skipping {r['href'][:50]}: {overlap:.0%} overlap")
            continue
        content = fetch_url_content(r["href"])
        if content:
            blocks.append(f"**Full page: {r.get('title','')}**\n{r['href']}\n\n{content}")
            fetched += 1
    return "\n\n---\n\n".join(blocks)
URL enrichment is expensive — fetching and converting a full page takes 30–60 seconds. The novelty gate skips any URL whose snippet overlaps more than 60% with the knowledge state. Only genuinely new URLs get full-page treatment.
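The overlap test is the inverse of the novelty heuristic: it measures the share of snippet words already known, not the share that is new. The sketch below isolates that calculation so the 60% cutoff can be tried on sample text; snippet_overlap and should_fetch are names introduced here, and the sample strings are invented.

```python
def snippet_overlap(snippet, knowledge_state):
    """Fraction of the snippet's distinct words that already appear in knowledge_state."""
    snippet_words = set(snippet.lower().split())
    known_words = set(knowledge_state.lower().split())
    return len(snippet_words & known_words) / max(len(snippet_words), 1)


def should_fetch(snippet, knowledge_state, max_overlap=0.6):
    """True when the snippet is novel enough to justify a full-page fetch."""
    return snippet_overlap(snippet, knowledge_state) <= max_overlap


# Invented sample data for illustration.
known = "redis supports persistence through rdb snapshots and aof logs"
covered = "Redis supports persistence through snapshots"
fresh = "Benchmarks comparing cluster failover latency"

if __name__ == "__main__":
    print(should_fetch(covered, known))
    print(should_fetch(fresh, known))
```

Every word of the first snippet is already in the knowledge state, so its page is skipped; the second shares none and is fetched. An empty snippet yields an overlap of 0 and is therefore fetched as well, which matches the behavior of the original function.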