The Literature Review Pipeline
- Trace the literature review pipeline from arXiv fetch through Semantic Scholar enrichment, 5-persona curation, annotation, and Jinja2 rendering
- Explain how Semantic Scholar hub scores identify high-influence papers and how gap candidates are identified from citation graph analysis
- Distinguish the survey template from the gaps template and explain when each is appropriate
The Pipeline
The literature review pipeline is a /lit-review skill that automates the full lifecycle of a systematic literature review, from paper discovery to the final rendered document. It produces either a comprehensive survey or a gap analysis.
arxiv_fetch.py
→ semantic_scholar.py (citation enrichment)
→ curator.py (5-persona quality filter)
→ annotate_abstracts.py (Nanda 8-move annotation + Wiggum)
→ synthesize() (cluster → synthesize across papers)
→ Jinja2 render (survey or gaps template)
Stage 1: arXiv Fetch
# Fetch papers matching a query:
python arxiv_fetch.py "agentic LLM harness engineering" --max 300
# → arxiv_agentic_llm_harness.csv (300 papers)
# With date filter (for incremental updates):
python arxiv_fetch.py "prompt injection" --after 2024-06-01 --append existing.csv
# Inspect existing dataset:
python arxiv_fetch.py --stats arxiv_agentic_papers.csv
The CSV schema: arxiv_id, title, authors, published, abstract, url. Deduplication by arxiv_id prevents adding the same paper twice across incremental fetches.
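The deduplication step can be sketched as follows. This is a minimal illustration rather than the code in arxiv_fetch.py: the names merge_rows and make_row are hypothetical, and the second ID and both titles are placeholders. It keeps existing rows as they are and appends only fetched rows whose arxiv_id has not been seen.

```python
import csv
import io

FIELDS = ["arxiv_id", "title", "authors", "published", "abstract", "url"]

def merge_rows(existing, fetched):
    """Return existing rows plus any fetched rows whose arxiv_id is new."""
    seen = {row["arxiv_id"] for row in existing}
    merged = list(existing)
    for row in fetched:
        if row["arxiv_id"] not in seen:
            seen.add(row["arxiv_id"])
            merged.append(row)
    return merged

def make_row(arxiv_id, title):
    # Hypothetical helper: fills the remaining schema columns with blanks.
    row = dict.fromkeys(FIELDS, "")
    row.update(arxiv_id=arxiv_id, title=title)
    return row

existing = [make_row("2308.04079", "Placeholder title A")]
fetched = [
    make_row("2308.04079", "Placeholder title A"),       # duplicate, skipped
    make_row("placeholder-0001", "Placeholder title B"),  # new, appended
]
merged = merge_rows(existing, fetched)

# Write the merged rows in the schema's column order.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(merged)
```

Keying on arxiv_id alone treats a revised version of a paper as the same entry, which assumes the IDs are stored without a version suffix.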
Stage 2: Semantic Scholar Enrichment
arXiv metadata lacks citation data. semantic_scholar.py enriches each paper with Semantic Scholar API data:
python semantic_scholar.py arxiv_agentic_papers.csv
Added fields per paper:
- citation_count: total citations in the S2 graph
- influential_citation_count: citations from papers that themselves have high citation counts
- hub_score: eigenvector centrality in the local citation graph (identifies papers cited by many other important papers)
- references: list of arxiv_ids this paper cites
- citations: list of arxiv_ids that cite this paper
The hub score is the key signal for identifying foundational papers: a paper with a high hub score is not just popular — it is cited by other influential papers, making it structurally important to the field.
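To make the idea concrete, here is a small sketch of a centrality score in which a citation counts for more when the citing paper is itself highly scored. It is not the computation in semantic_scholar.py, whose details the text does not give. Plain eigenvector centrality collapses to zero on an acyclic citation graph, so the sketch swaps in a Katz-style variant that gives every paper a constant baseline. The function name, the alpha value, and the toy graph are all assumptions.

```python
def katz_scores(cites, alpha=0.5, iterations=50):
    """Score papers so that citations from high-scoring papers weigh more.

    cites maps each paper to the list of papers it cites.
    """
    papers = set(cites) | {p for refs in cites.values() for p in refs}
    scores = {p: 1.0 for p in papers}
    for _ in range(iterations):
        updated = {p: 1.0 for p in papers}  # constant baseline for every paper
        for citing, refs in cites.items():
            for cited in refs:
                # A citation passes on a share of the citing paper's score.
                updated[cited] += alpha * scores[citing]
        scores = updated
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()}

# Toy graph with placeholder names (assumption, not real papers).
cites = {
    "a": ["core"],
    "b": ["core"],
    "c": ["core"],
    "core": ["classic"],  # classic has one citation, from a well-cited paper
    "d": ["obscure"],     # obscure has one citation, from an uncited paper
}
scores = katz_scores(cites)
```

In this toy graph, classic and obscure each have exactly one citation, yet classic scores higher because its single citation comes from core, the most cited paper.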
# Gap candidates: papers with high hub score but low direct citation count
# These are referenced by important work but not widely read
python semantic_scholar.py arxiv_agentic_papers.csv --fetch-gaps 20
# → fetches 20 gap candidate papers not yet in the CSV and appends them
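The selection rule described in the comments above (high hub score, low citation count, not already in the CSV) might look like the sketch below. The thresholds min_hub and max_citations, the function name, and the sample records are assumptions for illustration; the text does not say how --fetch-gaps sets its cutoffs.

```python
def gap_candidates(papers, corpus_ids, limit=20, min_hub=0.1, max_citations=50):
    """Rank papers outside the corpus that have a high hub score but few citations."""
    candidates = [
        p for p in papers
        if p["arxiv_id"] not in corpus_ids
        and p["hub_score"] >= min_hub
        and p["citation_count"] <= max_citations
    ]
    # Strongest structural signal first.
    candidates.sort(key=lambda p: p["hub_score"], reverse=True)
    return candidates[:limit]

# Placeholder records, not real papers.
papers = [
    {"arxiv_id": "id-1", "hub_score": 0.40, "citation_count": 12},
    {"arxiv_id": "id-2", "hub_score": 0.35, "citation_count": 900},  # widely cited
    {"arxiv_id": "id-3", "hub_score": 0.02, "citation_count": 3},    # low hub score
    {"arxiv_id": "id-4", "hub_score": 0.25, "citation_count": 30},
    {"arxiv_id": "id-5", "hub_score": 0.50, "citation_count": 8},    # already in the CSV
]
gaps = gap_candidates(papers, corpus_ids={"id-5"}, limit=20)
gap_ids = [p["arxiv_id"] for p in gaps]
```

Fixed thresholds are the simplest choice here; a percentile-based cutoff would adapt better to corpora of different sizes.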
Stage 3: 5-Persona Curation
curator.py runs 5 simulated reviewer personas over each abstract, deciding whether to include the paper in the curated set:
import ollama

PERSONAS = [
    "ML practitioner building production agents",
    "Academic researcher studying multi-agent coordination",
    "Security engineer assessing prompt injection risks",
    "ML engineer focused on inference optimization",
    "Technical writer surveying the field",
]

def curate_paper(abstract, personas, model):
    """Return True when a majority of personas vote to include the paper."""
    votes = []
    for persona in personas:
        prompt = (
            f"You are a {persona} evaluating a paper abstract for relevance. "
            f"Abstract: {abstract}\n\n"
            f"Is this paper relevant to your work? Respond YES or NO with a one-sentence reason."
        )
        response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
        # Compare case-insensitively so replies such as "Yes, because ..." still count.
        votes.append(response["message"]["content"].strip().upper().startswith("YES"))
    # Keep the paper on a strict majority (3 of 5 with the default personas).
    return sum(votes) > len(personas) / 2
The curation log (curation_log.jsonl) records each vote with reason, enabling retrospective analysis of which papers were controversial and why.
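A retrospective pass over the log could look like the sketch below. The record layout (arxiv_id, persona, vote, reason) is an assumption, since the text does not specify the fields of curation_log.jsonl, and the sample records are made up. It tallies votes per paper and returns the papers decided by a single vote.

```python
import json
from collections import defaultdict

def split_decisions(log_lines):
    """Return {arxiv_id: (yes, no)} for papers decided by a one-vote margin."""
    tally = defaultdict(lambda: [0, 0])  # arxiv_id -> [yes votes, no votes]
    for line in log_lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        tally[record["arxiv_id"]][0 if record["vote"] else 1] += 1
    return {
        paper: (yes, no)
        for paper, (yes, no) in tally.items()
        if abs(yes - no) == 1
    }

def sample_log(arxiv_id, votes):
    # Hypothetical helper that builds log lines in the assumed layout.
    return [
        json.dumps({"arxiv_id": arxiv_id, "persona": f"persona-{i}",
                    "vote": vote, "reason": "placeholder"})
        for i, vote in enumerate(votes)
    ]

log_lines = (
    sample_log("id-1", [True, True, True, False, False])     # 3-2, kept
    + sample_log("id-2", [True, True, True, True, True])     # unanimous
    + sample_log("id-3", [False, True, False, True, False])  # 2-3, dropped
)
contested = split_decisions(log_lines)
```

With a real log, the lines could be read directly from an open file handle on curation_log.jsonl, since the function only needs an iterable of strings.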
Stage 4: Annotation
python annotate_abstracts.py arxiv_agentic_papers.csv \
    --model nanda-annotator \
    --out annotated/
Each abstract is annotated using the Nanda 8-move framework (from the previous reading). For high-priority papers (top hub score or >100 citations), the annotation additionally passes through the Wiggum loop:
python agent.py "/annotate /wiggum https://arxiv.org/abs/2308.04079 output.md"
The combined /annotate /wiggum invocation produces an annotation and then evaluates it for quality — catching cases where the annotator misidentifies moves or misses key claims.
Stage 5: Synthesis and Rendering
After curation and annotation, the pipeline clusters the curated papers by topic (using ChromaDB embeddings over annotation facts) and synthesizes a section for each cluster:
def synthesize_cluster(cluster_papers, template_type, producer_model):
    annotations = "\n\n".join(
        format_annotation(p) for p in cluster_papers
    )
    prompt = CLUSTER_SYNTHESIS_PROMPT.format(
        template_type=template_type,
        annotations=annotations
    )
    return call_producer(prompt, producer_model)
The final document is rendered through a Jinja2 template:
Survey template (templates/lit_review_survey.j2) — academic-style survey: introduction, methodology, thematic sections, comparison tables, conclusion.
Gaps template (templates/lit_review_gaps.j2) — gap analysis: what exists, what is missing, which papers to read next, open research questions.
# Run the full pipeline via /lit-review skill:
python agent.py "/lit-review agentic LLM harness engineering save to review.md"
# With gap focus:
python agent.py "/lit-review --template gaps prompt injection save to gaps.md"
# Using existing CSV (skip fetch):
python agent.py "/lit-review --csv arxiv_agentic_papers.csv --no-fetch \
--template survey agentic LLM save to survey.md"
What the Pipeline Produces
For a corpus of 300 papers on "agentic LLM harness engineering":
- After S2 enrichment: citation metadata + hub scores for all 300 papers
- After curation: ~200 papers pass (majority vote)
- After annotation: ~200 structured Nanda 8-move annotations
- After synthesis: 10–15 topical sections, ~8,000–12,000 words
- Final document: a peer-review-quality survey or gap analysis
The whole pipeline, running locally with cached search results, takes approximately 3–4 hours for 300 papers.