Production Agent Patterns

Gap analysis of three production agentic system sources against our evaluation pipeline, with scope decisions for Sprint 2+.

Sources¶

Building Agentic Applications (PydanticAI)
The Logs I Never Read (Pydantic/Logfire)
Effective Harnesses for Long-Running Agents (Anthropic)

Gap Matrix¶

Principle	Source	Status	Decision
Framework-based approach	PydanticAI	✅ Done	PydanticAI stays
Type-safe structured outputs	PydanticAI	✅ Done	Enhance via plugin
Layered deployment	PydanticAI	⚠️ CLI only	Sprint 2: FastAPI+MCP
VCR-based testing	PydanticAI	❌ Missing	Deferred: @patch ok
Model settings for determinism	PydanticAI	⚠️ Partial	Sprint 2: expose
Structured queryable logs	Logfire	⚠️ loguru	Opik primary
AI-queryable observability	Logfire	❌ Missing	Sprint 3: MCP
Incremental boundaries	Anthropic	✅ Done	Ralph loop
State management	Anthropic	✅ Done	prd.json + git
Checkpointing	Anthropic	✅ Done	Git commits
Error recovery	Anthropic	✅ Done	git revert
Human-in-the-loop	Anthropic	✅ Done	Ralph approval

Scope Decisions¶

Sprint 2: FastAPI + MCP (Feature 10)¶

Multi-channel access prevents rearchitecture later:

CLI - Developer-facing (exists)
Streamlit UI - Interactive exploration (exists, no redesign)
FastAPI REST - CI/CD integration (new)
MCP Server - AI-to-AI workflows (new)

Opik Primary, Logfire Optional¶

Opik already covers agent tracing, LLM tracking, cost monitoring, evaluation metrics. Logfire adds incremental value (app-level logs, HTTP tracing) but creates hard dependency on Pydantic ecosystem. Keep optional/fallback.

Deferred: VCR + Browser E2E¶

VCR: @patch mocking works for current test suite. VCR adds dependency without proportional benefit.

Browser E2E: Streamlit UI is secondary interface. API E2E tests via pytest + httpx provide sufficient coverage. Playwright/Selenium deferred to Sprint 4+.

Sprint 3+ Candidates¶

Priority	Feature	Prerequisite
High	Container-based deployment	Feature 10 (FastAPI) stable
Medium	MCP observability server	Opik trace API access
Medium	Logfire integration	Optional, alongside Opik
Low	VCR testing	None
Low	Browser E2E tests	Streamlit UI importance increases

Key Findings¶

Ralph loop already matches Anthropic best practices - documented in ralph/README.md
Deployment flexibility is the primary gap - addressed by Feature 10 (FastAPI + MCP)
Observability is sufficient - Opik covers needs; Logfire is incremental
Testing is appropriate - E2E integration tests (not browser) added to Sprint 2