
LLM Systems — Overview

An overview of LLM system building blocks and architecture patterns for engineering teams.

Building Blocks

Each block is intentionally small and composable. You can wire them together in many ways, but the contracts below should remain stable.

Prompts
Prompts define the instructions and configuration that guide a Large Language Model (LLM) toward a specific task. A prompt is built from a few subcomponents that work together:
Templates: Reusable prompt structures with placeholders
Model (+ provider): LLM selection and API configuration
Parameters: Temperature, top-p, max tokens, stop sequences, tools, etc.
Context injection: Retrieved docs, memory, and tool results

Key practices
  • Versioned prompt templates for consistent behavior
  • Parameter packs per mode (drafting vs. finalizing)
  • A/B slots for testing different approaches
Gotchas
  • Silent prompt drift can cause unexpected behavior changes
  • Keep a prompt registry with IDs + changelog to track modifications
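
A minimal sketch of what a versioned prompt registry with per-mode parameter packs could look like in TypeScript. The names (PromptTemplate, renderPrompt, the registry map) and the placeholder syntax are illustrative, not a specific library:

```ts
// Illustrative sketch: versioned prompt templates with parameter packs per mode.

interface ParameterPack {
  temperature: number;
  topP: number;
  maxTokens: number;
  stop?: string[];
}

interface PromptTemplate {
  id: string;            // stable ID, e.g. "support.summarize"
  version: string;       // bump on every change; keep a changelog alongside
  template: string;      // placeholders like {{ticket}} are filled at render time
  params: Record<"drafting" | "finalizing", ParameterPack>; // parameter packs per mode
}

const registry = new Map<string, PromptTemplate>();

registry.set("support.summarize@3", {
  id: "support.summarize",
  version: "3",
  template: "Summarize the following ticket for an engineer:\n{{ticket}}",
  params: {
    drafting:   { temperature: 0.7, topP: 0.95, maxTokens: 512 },
    finalizing: { temperature: 0.2, topP: 0.9,  maxTokens: 512 },
  },
});

// Render by substituting placeholders; callers always reference id@version,
// so silent prompt drift shows up as an explicit version bump instead.
function renderPrompt(key: string, vars: Record<string, string>): string {
  const tpl = registry.get(key);
  if (!tpl) throw new Error(`Unknown prompt: ${key}`);
  return tpl.template.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? "");
}

const prompt = renderPrompt("support.summarize@3", { ticket: "Login fails on Safari 17." });
```
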
Agents
Model‑driven decision making across steps; the agent picks tools, plans, or next actions. The agent architecture is made up of interconnected subcomponents:
Tools: External service integration through standardized connections
Processors: Input/output shaping, slot filling, state projection, answer shaping, schema mapping, citation merge
Instructions: Agent behavior guidelines and task specifications
Models: LLM selection and configuration
Parameters: Generation settings and tool configurations
Context: State management, memory, and conversation history
Pre/post hooks: Telemetry, feature flags, side‑effects
Error hooks: Error handling and recovery mechanisms

Key practices
  • Explicit tool schemas for clear contracts
  • Retry + backoff for external service calls
  • Idempotency on external effects
  • Compose processors as pure functions
  • Keep hooks for observability/side‑effects
  • Deterministic ordering (pre → decide → post → hooks)
Gotchas
  • Tool hallucinations require schema validation + post‑checks
  • Avoid business logic in hooks (keep them non‑blocking)
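
A rough sketch of the deterministic ordering (pre → decide → post → hooks), assuming pure processors and fire-and-forget hooks; all type and function names here are hypothetical:

```ts
// Illustrative sketch of one agent step with pre → decide → post → hooks ordering.

interface AgentContext {
  history: string[];                 // conversation / step history
  state: Record<string, unknown>;    // typed key-value state in a real system
}

type Processor = (ctx: AgentContext) => AgentContext;             // pure: no side-effects
type Hook = (event: { step: string; ctx: AgentContext }) => void; // observability only

async function runAgentStep(
  ctx: AgentContext,
  pre: Processor[],
  post: Processor[],
  hooks: Hook[],
  decide: (ctx: AgentContext) => Promise<AgentContext>, // model-driven decision (tool call, plan, answer)
): Promise<AgentContext> {
  // 1. Pre-processors: input shaping, slot filling, state projection
  let shaped = pre.reduce((c, p) => p(c), ctx);

  // 2. Decide: the model picks a tool, plan, or next action
  shaped = await decide(shaped);

  // 3. Post-processors: answer shaping, schema mapping, citation merge
  shaped = post.reduce((c, p) => p(c), shaped);

  // 4. Hooks: telemetry / feature flags; fire-and-forget so they never block or fail the step
  for (const hook of hooks) {
    try { hook({ step: "agent-step", ctx: shaped }); } catch { /* swallow hook errors */ }
  }
  return shaped;
}
```
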
Tools
Tools extend agent capabilities with external functions and services, letting agents interact with external systems through standardized, validated interfaces.
Function calling: Direct function execution capabilities
API integration: External service connectivity through standardized connections
Schema validation: Input/output validation and type checking
Retry + backoff: Resilient error handling and retry mechanisms

Key practices
  • Explicit tool schemas for clear contracts
  • Retry + backoff for external service calls
  • Idempotency on external effects
  • Schema validation + post-checks
Gotchas
  • Hallucinated tool calls or arguments require schema validation + post‑checks
  • Retried calls can double‑fire external effects unless they are idempotent
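
As an illustration, a tool with an explicit schema, input validation, retry + backoff, and an idempotency key might look like the following; the create_ticket tool and its endpoint are hypothetical:

```ts
// Illustrative tool definition: schema validation plus resilient, idempotent execution.

interface ToolSchema<I, O> {
  name: string;
  description: string;
  validateInput: (raw: unknown) => I;   // throw on invalid input (guards against hallucinated args)
  execute: (input: I, idempotencyKey: string) => Promise<O>;
}

async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 250): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseMs * 2 ** i)); // exponential backoff
      }
    }
  }
  throw lastErr;
}

const createTicket: ToolSchema<{ title: string }, { id: string }> = {
  name: "create_ticket",
  description: "Create a support ticket",
  validateInput: (raw) => {
    const input = raw as { title?: unknown };
    if (typeof input?.title !== "string" || input.title.length === 0) {
      throw new Error("create_ticket: 'title' must be a non-empty string");
    }
    return { title: input.title };
  },
  // The idempotency key lets the external service dedupe retried calls.
  execute: (input, idempotencyKey) =>
    withRetry(() =>
      fetch("https://example.internal/tickets", {
        method: "POST",
        headers: { "Idempotency-Key": idempotencyKey, "Content-Type": "application/json" },
        body: JSON.stringify(input),
      }).then((res) => {
        if (!res.ok) throw new Error(`create_ticket failed: ${res.status}`);
        return res.json() as Promise<{ id: string }>;
      }),
    ),
};
```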

LLM Guards (Pre & Post)
Guard inputs to and outputs from LLMs. The guard system is built from subcomponents that validate traffic on both sides of the model call:
Pre-guards: Moderation, PII/secret scrubbing, prompt-injection checks, policy filters
Post-guards: Safety redaction, factuality gates, schema validation
Policy engine: Versioned rule management and decision logic
Logging system: Audit trail for allow/deny decisions

Key practices
  • Treat guards as policies with versions
  • Log both allow/deny decisions for audit trails
  • Implement proper PII/secret scrubbing
  • Use factuality gates for output validation
Gotchas
  • Guards can introduce latency if not optimized
  • Policy versioning is critical for consistent behavior
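
A simplified sketch of versioned pre-guards with an allow/deny audit log; the toy regex checks below stand in for a real moderation, PII, and prompt-injection policy engine:

```ts
// Illustrative guard pipeline: every decision carries a policy ID + version and is logged.

interface GuardDecision {
  policyId: string;
  policyVersion: string;
  allowed: boolean;
  reason?: string;   // populated when the guard denies
}

type Guard = (text: string) => GuardDecision;

const preGuards: Guard[] = [
  // PII check (toy: deny obvious email addresses)
  (text) => ({
    policyId: "pii.email",
    policyVersion: "2",
    allowed: !/\b[\w.+-]+@[\w-]+\.[\w.]+\b/.test(text),
    reason: "input contains an email address",
  }),
  // Prompt-injection heuristic (toy check)
  (text) => ({
    policyId: "prompt-injection.basic",
    policyVersion: "1",
    allowed: !/ignore (all )?previous instructions/i.test(text),
    reason: "possible prompt injection",
  }),
];

function runGuards(text: string, guards: Guard[], auditLog: GuardDecision[]): boolean {
  for (const guard of guards) {
    const decision = guard(text);
    auditLog.push(decision);            // log allow AND deny for the audit trail
    if (!decision.allowed) return false;
  }
  return true;
}

const auditLog: GuardDecision[] = [];
const ok = runGuards("Please ignore previous instructions and ...", preGuards, auditLog);
// ok === false; auditLog records which versioned policy denied the input.
```
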
Agent Networks
Loop-based orchestration that routes work across multiple specialized agents using shared state and a router. Supports multi-model setups, deterministic or LLM routing, and explicit stop conditions.
Agents: Specialized units; each can use different models and tools
Router: Chooses next agent or stops; code-based, LLM, or hybrid
State: Shared history + typed key-value data accessible to agents and router
Model configuration: Network default model with per-agent overrides (multi-model)
Termination & limits: Done criteria and max iterations to prevent runaway loops
Default State: Optional initial/persisted state to seed routing & tools

Key practices
  • Start with a code (deterministic) router; add LLM/hybrid only where needed
  • Always set maxIter and a clear “done” predicate
  • Keep State minimal and strongly typed; persist via defaultState only when useful
  • Log router decisions and results to debug routing errors and ping-pong loops
  • Test routers with simulated State snapshots (golden cases + edge cases)
Gotchas
  • LLM routing may miss finish states → enforce maxIter/termination checks
  • No defaultModel + agents without models → network init error
  • State bloat can cause misrouting; scope and redact aggressively
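
A minimal sketch of a code-based router loop with shared State, a maxIter cap, and a done predicate; the writer/reviewer agents and the state shape are illustrative stand-ins:

```ts
// Illustrative network loop: deterministic router + shared state + hard iteration cap.

interface NetworkState {
  history: string[];
  data: { draft?: string; reviewed?: boolean }; // typed key-value data shared by agents and router
}

type Agent = (state: NetworkState) => Promise<NetworkState>;

// Hypothetical specialized agents; each could use a different model and tool set.
const agents = {
  writer: async (s: NetworkState): Promise<NetworkState> =>
    ({ ...s, data: { ...s.data, draft: "..." }, history: [...s.history, "writer"] }),
  reviewer: async (s: NetworkState): Promise<NetworkState> =>
    ({ ...s, data: { ...s.data, reviewed: true }, history: [...s.history, "reviewer"] }),
};

// Deterministic router: inspects state, picks the next agent, or stops.
function route(state: NetworkState): keyof typeof agents | "done" {
  if (!state.data.draft) return "writer";
  if (!state.data.reviewed) return "reviewer";
  return "done";
}

async function runNetwork(defaultState: NetworkState, maxIter = 10): Promise<NetworkState> {
  let state = defaultState;
  for (let i = 0; i < maxIter; i++) {               // hard cap prevents runaway / ping-pong loops
    const next = route(state);
    console.log(`router decision #${i}:`, next);    // log decisions to debug misrouting
    if (next === "done") return state;
    state = await agents[next](state);
  }
  return state;                                     // maxIter reached; surface this in monitoring
}
```
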
Evals
Judge non-deterministic outputs for quality, safety, and usefulness. The evaluation system combines automated and manual checks across several dimensions:
Rubric-based graders: LLM-powered evaluation using predefined criteria
Code checks: Regex patterns, assertions, and automated validation
Human review: Manual assessment and quality control
Benchmark systems: Offline testing and regression detection
Shadow/guardrails: Online monitoring and canary deployments

Key practices
  • Use both offline and online evaluation strategies
  • Implement regression detection for quality control
  • Combine automated and human evaluation methods
  • Track scores and tags for fine-tuning data
Gotchas
  • Evaluation criteria must be well-defined and consistent
  • Human review can be expensive and slow
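
A small sketch of an offline eval run that combines deterministic code checks with a rubric-based grader; gradeWithRubric is a placeholder for an LLM-as-judge call, and the dataset shape is illustrative:

```ts
// Illustrative offline eval: cheap code checks plus a rubric grader, with tags kept for later use.

interface EvalCase {
  input: string;
  output: string;          // model output under test
  mustMention: string[];   // simple gold expectations
}

interface EvalResult {
  case: EvalCase;
  codeCheckPassed: boolean;
  rubricScore: number;     // e.g. a 1 to 5 score from the judge model
  tags: string[];          // kept for regression slicing and fine-tuning data
}

// Code check: deterministic assertions (keyword / regex presence)
function codeCheck(c: EvalCase): boolean {
  return c.mustMention.every((term) => new RegExp(term, "i").test(c.output));
}

// Rubric-based grader: in a real system this calls a judge model with a versioned rubric prompt.
async function gradeWithRubric(c: EvalCase): Promise<number> {
  return codeCheck(c) ? 4 : 2;   // placeholder score for the sketch
}

async function runOfflineEval(dataset: EvalCase[]): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const c of dataset) {
    results.push({
      case: c,
      codeCheckPassed: codeCheck(c),
      rubricScore: await gradeWithRubric(c),
      tags: ["offline", "v1"],
    });
  }
  // Regression detection: compare aggregate scores against the previous benchmark run.
  return results;
}
```
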
RAG (Retrieval-Augmented Generation)
Attach relevant knowledge to a prompt. The RAG pipeline is made up of interconnected subcomponents that retrieve and integrate relevant knowledge via semantic search:
Ingestion pipelines: Document chunking, parsing, and preprocessing
Embeddings: Vector representations for semantic search
Vector/graph stores: Knowledge storage and retrieval systems
Re-rankers: Result refinement and relevance scoring
Query processors: Query understanding and optimization

Key practices
  • Document processing recipes for consistent chunking
  • Use chunk/section IDs for proper referencing
  • Include provenance tags for source tracking
  • Set freshness windows for content validity
Gotchas
  • Over-chunking can reduce context quality
  • Missing metadata breaks provenance tracking
  • No evals on ingestion quality leads to poor results
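
To make the chunk-ID, provenance, and freshness ideas concrete, a chunk record and naive top-k retrieval might look like this; embed() is a toy placeholder for a real embedding model and the in-memory array stands in for a vector store:

```ts
// Illustrative chunk record with IDs + provenance, and naive freshness-filtered top-k retrieval.

interface Chunk {
  chunkId: string;        // stable chunk/section ID used for citations
  sectionId: string;
  text: string;
  provenance: { sourceUri: string; ingestedAt: string }; // freshness checks use ingestedAt
  embedding: number[];
}

// Placeholder embedding (hash-based toy); swap in a real embedding model in practice.
async function embed(text: string): Promise<number[]> {
  const vec = new Array(8).fill(0);
  for (let i = 0; i < text.length; i++) vec[i % 8] += text.charCodeAt(i) / 1000;
  return vec;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

async function retrieve(query: string, store: Chunk[], k = 5, maxAgeDays = 90): Promise<Chunk[]> {
  const q = await embed(query);
  const cutoff = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;   // freshness window
  return store
    .filter((c) => new Date(c.provenance.ingestedAt).getTime() >= cutoff)
    .map((c) => ({ chunk: c, score: cosine(q, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)                                                  // re-rankers would refine this list
    .map((r) => r.chunk);
}
```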

Memory
Persist context that improves future turns or runs. The memory system manages several kinds of persistent context, from short-term session state to long-term stores:
Conversation memory: Short-term context for the current session
User/profile memory: Long-term personalized context
Global/org memory: Shared organizational knowledge
Memory stores: Database, cache, and persistence layers
Memory management: TTL, cleanup, and lifecycle management

Key practices
  • Separate memory from knowledge corpora
  • Set appropriate TTLs for different memory types
  • Support subject-access/delete for compliance
  • Use proper scoping for memory isolation
Gotchas
  • Memory can grow indefinitely without proper TTL management
  • Compliance requirements need proper data deletion support
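
A minimal sketch of scoped memory entries with per-scope TTLs and a subject-delete path for compliance; the scope names and TTL values are illustrative defaults, not recommendations:

```ts
// Illustrative memory store: scoped entries, TTL enforced on read, subject-level delete.

type MemoryScope = "conversation" | "user" | "org";

interface MemoryEntry {
  scope: MemoryScope;
  subjectId: string;       // session ID, user ID, or org ID depending on scope
  key: string;
  value: string;
  expiresAt: number;       // epoch ms; enforced on read so stale memory never leaks back in
}

const TTL_MS: Record<MemoryScope, number> = {
  conversation: 1000 * 60 * 60 * 24,        // 1 day (example value)
  user: 1000 * 60 * 60 * 24 * 90,           // 90 days (example value)
  org: 1000 * 60 * 60 * 24 * 365,           // 1 year (example value)
};

const store: MemoryEntry[] = [];

function remember(scope: MemoryScope, subjectId: string, key: string, value: string): void {
  store.push({ scope, subjectId, key, value, expiresAt: Date.now() + TTL_MS[scope] });
}

function recall(scope: MemoryScope, subjectId: string): MemoryEntry[] {
  const now = Date.now();
  return store.filter((e) => e.scope === scope && e.subjectId === subjectId && e.expiresAt > now);
}

// Subject-access/delete: remove everything tied to one subject (e.g. a deletion request).
function forgetSubject(subjectId: string): void {
  for (let i = store.length - 1; i >= 0; i--) {
    if (store[i].subjectId === subjectId) store.splice(i, 1);
  }
}
```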

Operational Concerns

Purpose: Manage the production lifecycle, monitoring, and maintenance of LLM systems.

Operational concerns cover everything needed to run LLM systems reliably in production: prompt management handles versions and rollouts, observability provides monitoring and logging, dataset management maintains data quality, and CI/CD, model selection, and deployment pipelines handle testing and releases.

  • Prompt management: IDs, versions, diffs, rollout plans
  • Observability: Traces, tool spans, token/cost, eval events, safety decisions
  • Dataset management: Gold data, negative examples, drift monitoring
  • Synthetic data: Controlled generation with templates + filters; never replace real labels
  • CI/CD: Prompts/tests/evals in the pipeline; gated releases; canary rolls
  • Model selection: Multi‑provider adapters; latency/cost/quality routing
  • Monitoring systems: Performance tracking and alerting
  • Deployment pipelines: Automated testing and release management
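
As a rough illustration, the per-call trace event an observability layer might emit could carry fields like these; the field names and the console sink are placeholders:

```ts
// Illustrative trace event shape: ties each call back to prompt version, cost, and safety/eval outcomes.

interface LlmTraceEvent {
  traceId: string;
  span: "prompt" | "tool" | "guard" | "eval";
  promptId?: string;        // links back to the prompt registry (ID + version)
  promptVersion?: string;
  model: string;
  latencyMs: number;
  tokens: { input: number; output: number };
  costUsd: number;
  safetyDecision?: "allow" | "deny";
  evalScore?: number;
}

// In practice these events feed dashboards, alerting, and regression reports in CI/CD.
function emitTrace(event: LlmTraceEvent): void {
  console.log(JSON.stringify(event));   // replace with your telemetry sink
}
```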

Next Steps

1. Set up your building blocks following the patterns and best practices outlined above
2. Implement the core components (Prompts, Agents, RAG, Memory) based on your requirements
3. Follow the key practices and avoid the common gotchas for each component