Manuscript Regeneration
2026-01-15
In 2005, my dad enlisted my help to label data for a study he was running. At the time, I was getting paid $10 per hour for my efforts.
Fast-forward 20 years: he reached out again for help, albeit this time completely unpaid.
He wanted to understand how AI agents could accelerate the medical manuscript generation process by assessing whether an agentic system could take his study and reproduce it across a different time horizon.
I was intrigued by the concept. Reproducing a medical manuscript such as the one he published in 2005 is no trivial task for humans—on a cursory analysis, at a high level a human or team would have to:
- Understand the paper itself (what were the objectives, methodologies, results, implications, etc.?)
- Understand the data sources (are they still available, how do you access them, are they behind institutional firewalls / license agreements?)
- Design the reproduction study itself and align it as closely as possible to the original study
- Execute the analysis—find the data, obtain it, work with it, make sure it's relevant to the study goal, analyze it
- Write up the findings and compare them to the original, being mindful of limitations and changes to the data landscape
Furthermore, I was particularly interested in the possibilities beyond this:
- Can an agent go as far as to produce novel study protocols?
- Can an agent both produce novel study protocols and then carry them out?
From an engineering perspective, I was particularly interested in just how thin we could make the agent harness—e.g. give the system some guidance, a few primitive tools, and really test the capabilities of the underlying model.
For this write-up, I'll focus specifically on the original task at hand: reproducing the paper he gave me. A few secondary bodies of work came out of this, which you can read about in the protocol ideation project note and in the literature search section of the same repository.
You can check out some sample studies and protocols here.
Repository and stack
The implementation lives in abuswami1996/manuscript-research: a small Python CLI built on LangChain Deep Agents (deepagents). The manuscript pipeline is one of several registered “apps”; each run gets an isolated directory under workspace/runs/<project>/<timestamp>/, with the reference paper copied in as paper.md when you start from the template at the repo root.
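As a rough sketch, per-run isolation of this kind can be done with a timestamped directory helper. The function below is hypothetical and simplified (the repo's actual helper is resolve_manuscript_run_workspace, covered later); it only illustrates the workspace/runs/<project>/<timestamp>/ layout:

```python
from datetime import datetime
from pathlib import Path


def resolve_run_workspace(project: str, root: str = "workspace/runs") -> Path:
    """Hypothetical sketch: give each run an isolated, timestamped
    directory under workspace/runs/<project>/<timestamp>/."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = Path(root) / project / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```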
At a high level, the stack is intentionally small:
- One principal agent created with `create_deep_agent`, backed by a virtual filesystem (`FilesystemBackend` with `virtual_mode=True`) rooted at the run directory.
- Three subagents (data wrangler, statistician, manuscript writer), each defined as a dict with a name, description, system prompt, and tool list, passed into `create_deep_agent` as `subagents=[...]`.
- Two shared tools reused across the repo: `internet_search` (Tavily) and `run_python` (a subprocess with `cwd` set to the active run workspace, so scripts read and write `data/`, `analysis/`, etc.).
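A tool like run_python can be little more than a subprocess wrapper. This is a hypothetical sketch of that behavior, not the repo's actual implementation:

```python
import subprocess
import sys


def run_python(code: str, workspace: str, timeout: int = 120) -> str:
    """Sketch of a run_python tool: execute a snippet in a subprocess
    with cwd set to the active run workspace, so relative paths like
    data/ and analysis/ resolve inside that run."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=workspace,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Surface stderr on failure so the agent can see what went wrong.
    return result.stdout if result.returncode == 0 else result.stderr
```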
The “thin harness” is mostly that: register the agent, seed folders, set environment for API keys and optional MANUSCRIPT_MODEL, and let prompts plus the filesystem carry the workflow.
Roles and how they coordinate
The principal agent is framed in code and prompts as a Principal Investigator. It does not reimplement statistics or data engineering inline; it delegates to specialists via the Deep Agent task() mechanism (documented in the PI system prompt as the way to invoke subagents). The three subagents are:
| Role | Purpose | Tools | Primary I/O |
|---|---|---|---|
| data-wrangler | Find, download, clean, and document datasets | `internet_search`, `run_python` | Reads scratchpad context; writes `/data/` plus `data_dictionary.md` and `acquisition_log.md` |
| statistician | Analysis, tables, figures, reproducible scripts | `run_python` | Reads `/data/` and the scratchpad; writes `/analysis/` (CSVs, 300 dpi PNGs, `analysis_summary.md`, `.py` scripts) |
| manuscript-writer | Final publication-style Markdown manuscript | (none; filesystem only) | Reads `paper.md`, the scratchpad, and `/analysis/`; writes `/output/manuscript.md` |
The principal itself keeps internet_search only (in create_manuscript_agent), matching the idea that it researches and plans while delegating heavy lifting. Subagents get the tools their prompts describe; the writer is deliberately tool-free so it synthesizes from artifacts already on disk.
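As plain data, the three subagent declarations might look roughly like this. The prompt text here is abridged and illustrative, not the repo's actual prompts, and field values are a sketch of the dict convention described above:

```python
# Hypothetical sketch of the three subagent dicts; the real prompts
# live in agents/manuscript/prompts.py.
data_wrangler = {
    "name": "data-wrangler",
    "description": "Finds, downloads, cleans, and documents datasets.",
    "prompt": "Prefer public data sources; document any synthetic fallback.",
    "tools": ["internet_search", "run_python"],
}

statistician = {
    "name": "statistician",
    "description": "Runs analyses; produces tables, figures, and scripts.",
    "prompt": "Write reproducible scripts; save figures at 300 dpi.",
    "tools": ["run_python"],
}

manuscript_writer = {
    "name": "manuscript-writer",
    "description": "Writes the final publication-style manuscript.",
    "prompt": "Synthesize from paper.md, the scratchpad, and /analysis/.",
    "tools": [],  # deliberately tool-free: it only reads artifacts on disk
}

subagents = [data_wrangler, statistician, manuscript_writer]
```

These dicts would then be passed as `subagents=[...]` to `create_deep_agent`.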
Orchestration
Delegation uses the Deep Agent task() tool: the PI invokes each subagent in turn, inspects the workspace between stages, and can re-delegate if something fails. The diagrams below mirror the happy path the system prompt encourages (read paper → plan → data → analysis → manuscript).
Filesystem as the integration layer
Rather than a custom message bus, the design uses paths and markdown notes as the contract between steps:
- `/scratchpad/` holds cross-team context: `study_design.md` (the PI's extraction from the paper), `data_wrangler_notes.md`, and `statistician_notes.md`.
- `/data/` holds analysis-ready CSVs plus a data dictionary and acquisition log.
- `/analysis/` holds results, figures, a short summary, and scripts for reproducibility.
- `/output/` holds the final `manuscript.md`.
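Seeding that directory contract is a one-liner per folder. A minimal sketch (hypothetical helper name; the repo does this inside create_manuscript_agent):

```python
from pathlib import Path

# The four directories that form the contract between pipeline stages.
RUN_LAYOUT = ["scratchpad", "data", "analysis", "output"]


def seed_run_dirs(run_dir: Path) -> None:
    """Sketch: create the contract directories up front so every
    subagent can rely on the paths existing."""
    for name in RUN_LAYOUT:
        (Path(run_dir) / name).mkdir(parents=True, exist_ok=True)
```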
The PI prompt spells out a linear workflow: read paper.md → write study design → plan with todos → delegate to the wrangler → review /data/ → delegate to the statistician → review /analysis/ → delegate to the writer → read /output/manuscript.md. That sequence mirrors the human checklist at the top of this article, but the implementation detail is just directory layout plus very explicit delegation instructions (variable names, time ranges, which scratchpad files to read and write).
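The review gates in that sequence reduce to a filesystem check. This is a hypothetical helper for illustration (the actual PI inspects the workspace by reading files through its tools):

```python
from pathlib import Path


def review_stage(run_dir: Path, expected: str) -> bool:
    """Sketch of a stage review: a delegation 'passes' only if its
    output directory exists and contains at least one artifact."""
    target = Path(run_dir) / expected
    return target.is_dir() and any(target.iterdir())
```

Between delegations, the PI would effectively check `review_stage(run, "data")`, then `review_stage(run, "analysis")`, re-delegating when a gate fails.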
Prompts encode policy, not code
Behavioral policy lives in agents/manuscript/prompts.py as large system strings—for example, the data wrangler is told to prefer public sources, to document synthetic data if real open data cannot be obtained, and to log API limitations; the statistician is given concrete conventions for run_python (imports at top, plt.savefig(..., dpi=300), report CIs and effect sizes). That keeps the Python surface area small: create_manuscript_agent only creates directories, wires prompts and tools, and returns the graph-ready agent.
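To make "prompts encode policy" concrete, here is an abridged, illustrative policy string in the spirit of agents/manuscript/prompts.py. The wording is mine, not the repo's:

```python
# Illustrative policy prompt (abridged, hypothetical wording).
DATA_WRANGLER_PROMPT = """\
You are the data wrangler for a manuscript reproduction study.
- Prefer publicly available data sources.
- If real open data cannot be obtained, generate synthetic data and say so
  explicitly in data_dictionary.md and acquisition_log.md.
- Log API limitations and access issues in scratchpad/data_wrangler_notes.md.
"""
```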
Runtime: one CLI, isolated runs
main.py selects a registered agent id (default manuscript), resolves the model from MANUSCRIPT_MODEL (defaulting to anthropic:claude-sonnet-4-6), computes the run workspace via resolve_manuscript_run_workspace (timestamped under workspace/runs/... unless overridden), seeds paper.md from the template, then streams or blocks on the run. Success is reported with a path to output/manuscript.md.
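The model-resolution step amounts to an environment-variable override. A simplified sketch of that behavior (not main.py's actual code):

```python
import os


def resolve_model(default: str = "anthropic:claude-sonnet-4-6") -> str:
    """Sketch: MANUSCRIPT_MODEL, if set, overrides the default model id."""
    return os.environ.get("MANUSCRIPT_MODEL", default)
```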
The default user prompt in the registry asks to replicate the analysis in paper.md using publicly available data, exclude retracted work where relevant, and return a finished manuscript—aligned with the original question about reproducing a study across a new time horizon under real-world data constraints.
What I took away
The interesting part for me was not a long list of bespoke tools but whether a small set of primitives—search, executable Python, and a shared tree of files—would be enough for the model to approximate the multi-role workflow a lab would use. The manuscript agent is the most direct test of that: same repo, same tools module, different prompts and subagent graphs for ideation and literature search, which I may write up separately in more detail.
If you want to run or extend it, start from the README in manuscript-research: place your reference paper at workspace/paper.md, configure .env, and run python main.py (or python main.py manuscript "your instructions").