The Dual-Search Foundation
- Describe the fixed dual-search loop and the quality-floor fallback mechanism, including the conditions that trigger each
- Explain how the 24-hour SQLite TTL cache reduces search latency and what tradeoffs it introduces
- Identify the failure modes of a fixed N-search loop (over-search and under-search) and explain why they motivated saturation gating
The Original Research Loop
The harness began with a fixed dual-search architecture: run exactly two web searches per task, merge the results, pass them to synthesis. Simple, deterministic, reproducible.
This is the version validated in Experiment 1. The 9-run CRD across three task types used the dual-search loop as a controlled variable — by holding search count constant, the experiment isolated the effect of task type on output quality.
How the Dual-Search Loop Works
SEARCHES_PER_TASK = 2
SEARCH_QUALITY_FLOOR = 1800  # minimum characters of results to proceed

def gather_research(task, planned_queries):
    all_results = []
    for i, query in enumerate(planned_queries[:SEARCHES_PER_TASK]):
        results = web_search_cached(query)
        # Quality floor: if results are thin, try a fallback query
        if total_chars(results) < SEARCH_QUALITY_FLOOR and i == 0:
            fallback = simplify_query(query)
            results = web_search_cached(fallback)
            log(f"[quality floor] fallback triggered: {fallback}")
        all_results.extend(results)
    return format_results(deduplicate(all_results))
Two searches per task. The first query is the planner's search_queries[0] (passed to gather_research as planned_queries), the most targeted query for the task. The second is search_queries[1], a complementary angle.
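The loop above calls total_chars and deduplicate without showing them. A minimal sketch of what they might look like follows. Both bodies are assumptions rather than the harness's actual code: total_chars is assumed to count title plus snippet text, and deduplicate is assumed to key on the result URL, ignoring a trailing slash.

```python
def total_chars(results: list[dict]) -> int:
    """Count title and snippet characters across all results (assumed definition)."""
    return sum(len(r.get("title", "")) + len(r.get("body", "")) for r in results)


def deduplicate(results: list[dict]) -> list[dict]:
    """Keep the first result seen for each URL, preserving order (assumed behavior)."""
    seen: set[str] = set()
    unique = []
    for r in results:
        url = r.get("href", "").rstrip("/")
        if url and url in seen:
            continue  # same page already collected by an earlier query
        if url:
            seen.add(url)
        unique.append(r)  # results without a URL are always kept
    return unique


# Made-up results: a and b point at the same page.
a = {"title": "A", "href": "https://example.org/p", "body": "abc"}
b = {"title": "B", "href": "https://example.org/p/", "body": "defg"}
c = {"title": "C", "href": "https://example.net", "body": ""}

print(total_chars([a, b, c]))        # 10
print(len(deduplicate([a, b, c])))   # 2
```

Keying on the URL is the simplest choice. It will not catch the same article syndicated under two different URLs.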
The Quality Floor
The quality floor is a minimum-adequacy check on the first search. If the first query returns fewer than 1,800 characters of results — which happens when a query is too niche, misspelled, or uses terminology that doesn't match the web's vocabulary — a fallback query is generated automatically:
def simplify_query(query: str) -> str:
    # Intent: strip technical jargon and keep the core concept.
    # This crude version just keeps the first 4 words, e.g.
    # "saturation-gated novelty-assessed research loop LLM"
    #   → "saturation-gated novelty-assessed research loop"
    words = query.split()
    return " ".join(words[:4])
The floor prevents synthesis from running on near-empty context — a situation that reliably produces placeholder content and low Wiggum scores.
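To make the trigger condition concrete, here is a small self-contained sketch of the floor check running against a stubbed search backend. The names first_search_with_floor and stub_search are hypothetical and exist only for this illustration. The stub's behavior (thin results for long queries) is invented so that the fallback path fires, and the character count uses snippet bodies only as a simplification.

```python
SEARCH_QUALITY_FLOOR = 1800  # same threshold as in the reading


def simplify_query(query: str) -> str:
    return " ".join(query.split()[:4])


def first_search_with_floor(query, search_fn):
    """Run one search; if results are under the floor, retry once with a simpler query.

    Returns (results, query_actually_used). Hypothetical helper for illustration.
    """
    results = search_fn(query)
    size = sum(len(r.get("body", "")) for r in results)
    if size < SEARCH_QUALITY_FLOOR:
        fallback = simplify_query(query)
        return search_fn(fallback), fallback  # fallback results replace the thin ones
    return results, query


def stub_search(query):
    """Stand-in for the real backend: queries over 4 words return thin results."""
    if len(query.split()) > 4:
        return [{"body": "x" * 200}]   # 200 chars, below the floor
    return [{"body": "x" * 2500}]      # 2,500 chars, above the floor


results, used = first_search_with_floor(
    "saturation-gated novelty-assessed research loop LLM", stub_search
)
print(used)  # saturation-gated novelty-assessed research loop
```

As in the loop shown earlier, the fallback results replace the thin ones rather than being merged with them, and only one retry is attempted.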
The SQLite Search Cache
All DDGS (DuckDuckGo Search) calls are wrapped in a 24-hour TTL cache:
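import hashlib
import json
import sqlite3
import time

def web_search_cached(query: str) -> list[dict]:
    key = hashlib.sha256(query.encode()).hexdigest()
    conn = sqlite3.connect('search_cache.db')
    try:
        # Ensure the table exists. Column names come from the queries below;
        # the column types are an assumption.
        conn.execute(
            'CREATE TABLE IF NOT EXISTS search_cache '
            '(key TEXT PRIMARY KEY, results TEXT, timestamp REAL)'
        )
        # Check cache
        row = conn.execute(
            'SELECT results, timestamp FROM search_cache WHERE key=?', (key,)
        ).fetchone()
        if row and (time.time() - row[1]) < 86400:  # 24h TTL
            return json.loads(row[0])
        # Cache miss: fetch and store
        results = ddgs_search(query)
        conn.execute(
            'INSERT OR REPLACE INTO search_cache VALUES (?,?,?)',
            (key, json.dumps(results), time.time())
        )
        conn.commit()
        return results
    finally:
        conn.close()  # release the connection on every path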
The cache serves two purposes:
Speed: Repeated runs on the same task within 24 hours skip the network round-trip. Autoresearch sessions run many variants of similar tasks — without caching, this would mean hundreds of identical network calls.
Reproducibility: Within a single experiment, all runs with the same query get the same search results, provided the runs fall inside the same 24-hour TTL window. This is essential for controlled comparisons: if search results vary between runs, you can't isolate the effect of the harness change you're testing.
The tradeoff is staleness: cached results are up to 24 hours old. For time-sensitive topics this matters; for knowledge synthesis over established techniques, it doesn't.
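The hit, miss, and expiry behavior can be observed without touching the network. The sketch below is an illustration, not the harness's code: it uses an in-memory SQLite database, a hypothetical fake_fetch in place of the real search call, and an explicit now argument instead of time.time() so that the TTL boundary can be crossed deterministically.

```python
import hashlib
import json
import sqlite3

TTL_SECONDS = 86400  # 24 hours


def cached_search(conn, query, fetch, now):
    """Return cached results if younger than the TTL, else fetch and store."""
    key = hashlib.sha256(query.encode()).hexdigest()
    row = conn.execute(
        "SELECT results, timestamp FROM search_cache WHERE key=?", (key,)
    ).fetchone()
    if row and (now - row[1]) < TTL_SECONDS:
        return json.loads(row[0])
    results = fetch(query)
    conn.execute(
        "INSERT OR REPLACE INTO search_cache VALUES (?,?,?)",
        (key, json.dumps(results), now),
    )
    conn.commit()
    return results


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_cache (key TEXT PRIMARY KEY, results TEXT, timestamp REAL)"
)

calls = []  # records every simulated network fetch


def fake_fetch(query):
    calls.append(query)
    return [{"title": f"result #{len(calls)}", "href": "", "body": ""}]


first = cached_search(conn, "llm research loop", fake_fetch, now=0)       # miss
second = cached_search(conn, "llm research loop", fake_fetch, now=3600)   # hit, 1h old
third = cached_search(conn, "llm research loop", fake_fetch, now=90000)   # expired, refetched

print(len(calls))         # 2
print(first == second)    # True
print(third[0]["title"])  # result #2
```

The second call returns exactly what the first one stored, which is the reproducibility property. The third call shows the staleness boundary: once the entry is older than 24 hours, the same query can return different results.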
Result Format
DDGS returns a list of dicts. The harness formats these for synthesis context:
def format_results(results: list[dict]) -> str:
blocks = []
for r in results:
blocks.append(
f"**{r.get('title', 'Untitled')}**\n"
f"{r.get('href', '')}\n\n"
f"{r.get('body', '')}"
)
return "\n\n---\n\n".join(blocks)
The formatted string is what gets passed to synthesis. Each result has a title (bold), a URL, and the snippet body. The producer model can cite URLs in its output.
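As a quick illustration of what synthesis actually receives, the snippet below runs format_results on two made-up results. The example.com URLs and text are placeholders; the second result omits its title to show the 'Untitled' default.

```python
def format_results(results: list[dict]) -> str:
    blocks = []
    for r in results:
        blocks.append(
            f"**{r.get('title', 'Untitled')}**\n"
            f"{r.get('href', '')}\n\n"
            f"{r.get('body', '')}"
        )
    return "\n\n---\n\n".join(blocks)


sample = [
    {"title": "A", "href": "https://example.com/a", "body": "First."},
    {"href": "https://example.com/b", "body": "Second."},  # no title key
]

formatted = format_results(sample)
print(formatted)
# **A**
# https://example.com/a
#
# First.
#
# ---
#
# **Untitled**
# https://example.com/b
#
# Second.
```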
The Problem with Fixed Search
The dual-search loop is adequate for Experiment 1 but has two systematic failure modes:
Over-search — simple, well-covered topics saturate after one search round. The second query fetches results that overlap heavily with the first: same facts, same sources, different phrasing. This inflates the synthesis context with redundant information, sometimes causing the model to structure output around the search format rather than the task structure. It also wastes time and tokens.
Under-search — complex, cross-disciplinary topics have meaningful signal beyond the second query. Two searches covering "context engineering LLM" and "RAG retrieval augmented generation" miss adjacent material on context compression, tool-calling strategies, and recent papers on context window management. The agent synthesizes from an incomplete picture.
URL enrichment has the same problem. Fetching full page content for a URL whose substance is already covered by the collected snippets wastes 30–60 seconds per URL and dilutes the synthesis context.
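One simple way to see over-search in the data is to measure how much the second query's results overlap with the first's. The sketch below computes Jaccard similarity over result URLs. It is only an illustration of the redundancy described above, not the saturation-gating method itself; the function name url_overlap and the sample URLs are made up.

```python
def url_overlap(first: list[dict], second: list[dict]) -> float:
    """Jaccard similarity of the URL sets of two result lists, from 0.0 to 1.0."""
    a = {r["href"] for r in first if r.get("href")}
    b = {r["href"] for r in second if r.get("href")}
    if not a and not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)


first_round = [
    {"href": "https://example.com/a"},
    {"href": "https://example.com/b"},
    {"href": "https://example.com/c"},
]
second_round = [
    {"href": "https://example.com/b"},
    {"href": "https://example.com/c"},
    {"href": "https://example.com/d"},
]

print(url_overlap(first_round, second_round))  # 0.5
```

A value near 1.0 means the second search mostly returned pages already collected. URL overlap understates redundancy when different pages repeat the same facts, so it is a rough lower bound.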
These failure modes motivated the saturation-gating approach described in the next reading.