Designing multi-agent pipelines with Claude
April 4, 2026 · Claude · 9 min read


The single-prompt approach to complex AI tasks has a ceiling. Once your prompt reaches a certain complexity — extract data, validate it, enrich it, format it, check for policy violations, and output structured JSON — the quality degrades. The model tries to do too many things at once and drops subtle requirements. Multi-agent pipelines solve this by giving each step to a focused agent with a specific role, a tight context window, and a defined output contract.

When to use multi-agent vs single-prompt

Single-prompt is simpler and cheaper. Use it when:

  • The task is one coherent operation (summarise this document)
  • The steps are not independently reviewable or testable
  • Latency matters and you cannot run agents in parallel

Multi-agent adds value when:

  • Different steps require different expertise or personas
  • You need to validate the output of one step before passing it to the next
  • Some steps can run in parallel
  • You want to swap out one step without rebuilding the whole pipeline
  • You need an audit trail of each step's output

The pipeline data structure

Each agent in the pipeline receives a typed input, produces a typed output, and can append to a shared trace log. Here is the base class:

python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, TypeVar, Generic
import anthropic

T_in = TypeVar("T_in")
T_out = TypeVar("T_out")


@dataclass
class PipelineContext:
    """Shared context passed through the pipeline."""
    trace: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def log(self, agent_name: str, input_data: Any, output_data: Any) -> None:
        self.trace.append({
            "agent": agent_name,
            "input_summary": str(input_data)[:200],
            "output_summary": str(output_data)[:200],
        })


class Agent(ABC, Generic[T_in, T_out]):
    def __init__(self, name: str, model: str = "claude-opus-4-5"):
        self.name = name
        self.client = anthropic.Anthropic()
        self.model = model

    def run(self, input_data: T_in, ctx: PipelineContext) -> T_out:
        output = self._run(input_data, ctx)
        ctx.log(self.name, input_data, output)
        return output

    @abstractmethod
    def _run(self, input_data: T_in, ctx: PipelineContext) -> T_out:
        pass

    def call_claude(self, system: str, user: str, max_tokens: int = 2048) -> str:
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
        return response.content[0].text
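Because run wraps _run and always writes to the trace, the contract is easy to exercise with a deterministic stand-in that never touches the API. A minimal sketch — the UppercaseAgent class is hypothetical, and PipelineContext is re-declared so the snippet runs on its own:

```python
from dataclasses import dataclass, field
from typing import Any


# Re-declared here so the snippet is self-contained; identical to the class above.
@dataclass
class PipelineContext:
    trace: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def log(self, agent_name: str, input_data: Any, output_data: Any) -> None:
        self.trace.append({
            "agent": agent_name,
            "input_summary": str(input_data)[:200],
            "output_summary": str(output_data)[:200],
        })


class UppercaseAgent:
    """Deterministic stand-in that honours the same run/log contract."""
    def __init__(self, name: str):
        self.name = name

    def run(self, input_data: str, ctx: PipelineContext) -> str:
        output = input_data.upper()
        ctx.log(self.name, input_data, output)
        return output


ctx = PipelineContext()
result = UppercaseAgent("upper").run("hello", ctx)
```

Swapping a stub like this into a pipeline test verifies the plumbing — typed handoffs and trace logging — without spending tokens.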

A real example: job application screener

Three agents: extract structured data from a CV, score it against requirements, and generate a recruiter summary. Each is testable in isolation.

python
import json
from dataclasses import dataclass


@dataclass
class CVData:
    raw_text: str


@dataclass
class ExtractedProfile:
    name: str
    years_experience: int
    skills: list[str]
    education: str
    previous_roles: list[str]


@dataclass
class ScoredProfile:
    profile: ExtractedProfile
    score: int          # 0-100
    strengths: list[str]
    gaps: list[str]


@dataclass
class RecruiterSummary:
    scored: ScoredProfile
    recommendation: str   # "advance" | "hold" | "reject"
    summary: str          # 2-3 sentences for the recruiter


class ExtractionAgent(Agent[CVData, ExtractedProfile]):
    def _run(self, input_data: CVData, ctx: PipelineContext) -> ExtractedProfile:
        raw = self.call_claude(
            system="Extract structured information from CVs. Always respond with valid JSON.",
            user=f"""Extract the following from this CV. Respond with JSON only.
Fields: name (string), years_experience (int), skills (list of strings),
education (string, highest degree), previous_roles (list of job titles).

CV:
{input_data.raw_text}""",
        )
        data = json.loads(raw)
        return ExtractedProfile(**data)


JOB_REQUIREMENTS = """
Required: 3+ years Python, REST API design, PostgreSQL, AWS
Preferred: FastAPI, Docker, experience with financial systems
Nice to have: Kafka, TypeScript
"""


class ScoringAgent(Agent[ExtractedProfile, ScoredProfile]):
    def _run(self, input_data: ExtractedProfile, ctx: PipelineContext) -> ScoredProfile:
        raw = self.call_claude(
            system="You are an objective technical recruiter. Respond with valid JSON.",
            user=f"""Score this candidate against our requirements.

Job Requirements:
{JOB_REQUIREMENTS}

Candidate Profile:
{json.dumps(input_data.__dict__, indent=2)}

Respond with JSON: {{ "score": 0-100, "strengths": [...], "gaps": [...] }}""",
        )
        data = json.loads(raw)
        return ScoredProfile(
            profile=input_data,
            score=data["score"],
            strengths=data["strengths"],
            gaps=data["gaps"],
        )


class SummaryAgent(Agent[ScoredProfile, RecruiterSummary]):
    def _run(self, input_data: ScoredProfile, ctx: PipelineContext) -> RecruiterSummary:
        raw = self.call_claude(
            system="Write clear, concise recruiter summaries. Respond with valid JSON.",
            user=f"""Write a recruiter summary for this candidate.

Score: {input_data.score}/100
Strengths: {input_data.strengths}
Gaps: {input_data.gaps}

Determine: recommendation ("advance" if score >= 70, "hold" if 50-69, "reject" if below 50)
Write a 2-3 sentence summary explaining the recommendation in plain language.

Respond with JSON: {{ "recommendation": "...", "summary": "..." }}""",
        )
        data = json.loads(raw)
        return RecruiterSummary(
            scored=input_data,
            recommendation=data["recommendation"],
            summary=data["summary"],
        )
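The json.loads calls above assume the model returns bare JSON. In practice a model can wrap its answer in a markdown code fence, so a defensive parse step between call_claude and json.loads is cheap insurance. A sketch — parse_agent_json is a hypothetical helper, not part of the pipeline above:

```python
import json

FENCE = "`" * 3  # a literal triple-backtick marker


def parse_agent_json(raw: str) -> dict:
    """Best-effort JSON parse: strip a markdown code fence if the model added one."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening ```json (or bare ```) line
        text = text[text.index("\n") + 1:]
        # Drop the closing ``` if present
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[:-3]
    return json.loads(text)
```

Each agent's _run can then call parse_agent_json(raw) instead of json.loads(raw) and keep the same behaviour on clean output.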

Wiring the pipeline

python
def screen_candidate(cv_text: str) -> RecruiterSummary:
    ctx = PipelineContext()

    extractor = ExtractionAgent("extractor")
    scorer = ScoringAgent("scorer")
    summariser = SummaryAgent("summariser")

    cv = CVData(raw_text=cv_text)
    profile = extractor.run(cv, ctx)
    scored = scorer.run(profile, ctx)
    summary = summariser.run(scored, ctx)

    # ctx.trace has a full audit log of every step
    return summary


# Usage
result = screen_candidate(open("cv.txt").read())
print(result.recommendation)   # "advance"
print(result.summary)          # "Alice has 6 years of Python experience..."
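The trace comment above is the audit-trail benefit in miniature: persisting it is one json.dump away. A sketch — save_trace and the file name are hypothetical, and the trace entry shown is illustrative:

```python
import json


def save_trace(trace: list, path: str) -> None:
    """Persist the pipeline audit trail as pretty-printed JSON."""
    with open(path, "w") as f:
        json.dump(trace, f, indent=2)


# Illustrative trace entry in the shape PipelineContext.log produces
trace = [{"agent": "extractor", "input_summary": "Alice...", "output_summary": "{...}"}]
save_trace(trace, "screening_trace.json")
```

One file per screening run gives you a replayable record of what each agent saw and produced.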

Parallel agents

When steps are independent, run them in parallel. Agent.run is synchronous, so wrap each call in asyncio.to_thread and fan out with asyncio.gather:

python
import asyncio

async def run_parallel_checks(cv_text: str) -> dict:
    async def run_agent(agent, input_data, ctx):
        return await asyncio.to_thread(agent.run, input_data, ctx)

    ctx = PipelineContext()
    cv = CVData(raw_text=cv_text)

    # Run technical and culture fit checks in parallel.
    # (TechnicalFitAgent and CultureFitAgent are illustrative Agent
    # subclasses, built the same way as the agents above.)
    tech_agent = TechnicalFitAgent("tech")
    culture_agent = CultureFitAgent("culture")

    tech_result, culture_result = await asyncio.gather(
        run_agent(tech_agent, cv, ctx),
        run_agent(culture_agent, cv, ctx),
    )

    return {"technical": tech_result, "culture": culture_result, "trace": ctx.trace}
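One caveat with asyncio.gather: by default the first exception propagates and the other agents' results are lost to the caller. When agents are independent, return_exceptions=True lets you inspect per-agent failures instead. A sketch with hypothetical stub agents (no API calls):

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for an Agent subclass; returns a fixed result or raises."""
    def __init__(self, name, result=None, error=None):
        self.name = name
        self.result = result
        self.error = error

    def run(self, input_data, ctx):
        if self.error is not None:
            raise self.error
        return self.result


async def run_all(agents, input_data, ctx=None):
    # return_exceptions=True: one failing agent doesn't discard the others' results
    results = await asyncio.gather(
        *(asyncio.to_thread(a.run, input_data, ctx) for a in agents),
        return_exceptions=True,
    )
    return {a.name: r for a, r in zip(agents, results)}


out = asyncio.run(run_all([
    StubAgent("tech", result={"score": 80}),
    StubAgent("culture", error=ValueError("model returned invalid JSON")),
], "cv text"))
```

The caller can then retry or degrade gracefully for the one agent that failed rather than rerunning the whole fan-out.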

What makes a good agent boundary

The single most important decision in pipeline design is where to draw agent boundaries. Good boundaries have:

  • A clear, testable output contract: The output is structured (JSON, dataclass) and can be validated before passing to the next step
  • A single responsibility: The agent does one thing — extract, score, summarise, validate, enrich
  • Minimal context requirements: The agent should not need to know what came before it in the pipeline

Bad agent boundaries: splitting a single coherent reasoning task in half, creating agents so narrow they need constant back-and-forth with neighbours, or putting validation logic in multiple agents.

The pipeline above screens hundreds of CVs daily in production. Each agent runs in 2-4 seconds. The extraction agent was the first one we improved — because it had a clear input (raw CV text), a clear output (structured JSON), and a clear failure mode (invalid JSON). That is the kind of problem you can iterate on quickly.
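That failure mode also suggests the cheapest reliability fix: retry the call when its output does not parse. A sketch — run_with_json_retry is a hypothetical wrapper around any zero-argument call that should return JSON:

```python
import json


def run_with_json_retry(call, max_attempts: int = 3) -> dict:
    """Invoke `call` until its string result parses as JSON, or raise the last error."""
    last_error = None
    for _ in range(max_attempts):
        raw = call()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = e
    raise last_error


# Usage with a flaky source: fails once, then returns valid JSON
attempts = []

def flaky_call():
    attempts.append(1)
    return "not json" if len(attempts) == 1 else '{"score": 72}'

result = run_with_json_retry(flaky_call)
```

In the real pipeline, `call` would be a lambda closing over call_claude with the agent's prompts; bounding the attempts keeps a persistently malformed response from looping forever.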
