LLM Systems — Overview
An overview of LLM system building blocks and architecture patterns for engineering teams.
Building Blocks
Each block is intentionally small and composable. You can wire them together in many ways, but the contracts below should remain stable.
Prompts
Prompts define the instructions and configuration that guide a Large Language Model (LLM) toward a specific task. The prompt system consists of a few subcomponents that work together:
- Templates: Reusable prompt structures with placeholders
- Model (+ provider): LLM selection and API configuration
- Parameters: Temperature, top-p, max tokens, stop sequences, tools, etc.
- Context injection: Retrieved docs, memory, and tool-result integration
Key practices
- Versioned prompt templates for consistent behavior
- Parameter packs per mode (drafting vs. finalizing)
- A/B slots for testing different approaches
Gotchas
- Silent prompt drift can cause unexpected behavior changes
- Mitigate drift with a prompt registry (IDs + changelog) that tracks every modification, as sketched below
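A minimal registry sketch in TypeScript; the names (`PromptRecord`, `register`, `render`) and the `{{name}}` placeholder syntax are illustrative, not any particular library's API:

```typescript
// A versioned prompt record: the (id, version) pair is immutable, so any
// behavior change is an explicit new version with a changelog entry.
interface PromptRecord {
  id: string;
  version: number;
  template: string;                 // placeholders in {{name}} form
  params: Record<string, unknown>;  // parameter pack (temperature, top-p, ...)
  changelog: string;
}

const registry = new Map<string, PromptRecord>();

function register(rec: PromptRecord): void {
  registry.set(`${rec.id}@${rec.version}`, rec);
}

// Resolve a prompt by explicit id@version; throwing on unknown versions
// (instead of silently falling back) is what keeps drift visible.
function render(
  id: string,
  version: number,
  vars: Record<string, string>,
): { prompt: string; params: Record<string, unknown> } {
  const rec = registry.get(`${id}@${version}`);
  if (!rec) throw new Error(`unknown prompt ${id}@${version}`);
  const prompt = rec.template.replace(/\{\{(\w+)\}\}/g, (m, k) => vars[k] ?? m);
  return { prompt, params: rec.params };
}

// Two parameter packs over the same template: drafting vs. finalizing.
register({ id: "summarize", version: 1, template: "Summarize:\n{{doc}}",
  params: { temperature: 0.7 }, changelog: "initial drafting pack" });
register({ id: "summarize", version: 2, template: "Summarize:\n{{doc}}",
  params: { temperature: 0.1 }, changelog: "finalizing pack: lower temperature" });
```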
Agents
Agents make model-driven decisions across steps: picking tools, plans, or next actions. The architecture combines several subcomponents:
- Tools: External service integration and standardized connections
- Processors: Input/output shaping, slot filling, state projection, answer shaping, schema mapping, citation merging
- Instructions: Agent behavior guidelines and task specifications
- Models: LLM selection and configuration
- Parameters: Generation settings and tool configurations
- Context: State management, memory, and conversation history
- Pre/post hooks: Telemetry, feature flags, side effects
- Error hooks: Error handling and recovery mechanisms
Key practices
- Explicit tool schemas for clear contracts
- Retry + backoff for external service calls
- Idempotency on external effects
- Compose processors as pure functions
- Keep hooks for observability/side‑effects
- Deterministic ordering (pre → decide → post → hooks), as sketched below
Gotchas
- Tool hallucinations require schema validation + post‑checks
- Avoid business logic in hooks (keep them non‑blocking)
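A minimal sketch of that ordering, assuming a generic `decide` function; `AgentState` and the processor/hook shapes are illustrative:

```typescript
type AgentState = { input: string; output?: string };

// Processors are pure functions: new state in, new state out, no side effects.
type Processor = (s: AgentState) => AgentState;

// Hooks carry telemetry and side effects only; they never shape the result.
type Hook = (s: AgentState) => void | Promise<void>;

async function runStep(
  state: AgentState,
  pre: Processor[],
  decide: (s: AgentState) => Promise<AgentState>, // model-driven: tool choice, plan, or answer
  post: Processor[],
  hooks: Hook[],
): Promise<AgentState> {
  // Deterministic ordering: pre → decide → post → hooks.
  let s = pre.reduce((acc, p) => p(acc), state);
  s = await decide(s);
  s = post.reduce((acc, p) => p(acc), s);
  for (const h of hooks) {
    // Fire-and-forget: a failing hook must not fail or block the step.
    Promise.resolve(h(s)).catch(() => {});
  }
  return s;
}
```

Running hooks fire-and-forget is what keeps them non-blocking: observability failures never affect the step result.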
Tools
Tools extend agent capabilities with external functions and services, giving agents standardized, validated interfaces to outside systems:
- Function calling: Direct function execution capabilities
- API integration: External service connectivity through standardized connections
- Schema validation: Input/output validation and type checking
- Retry + backoff: Resilient handling of transient failures
Key practices
- Explicit tool schemas for clear contracts
- Retry + backoff for external service calls
- Idempotency on external effects
- Schema validation + post-checks (see the wrapper sketch below)
Gotchas
- Hallucinated or malformed tool arguments can reach external systems unless inputs are validated against the schema before execution
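A sketch combining both practices, using zod for input validation (one common choice) and exponential backoff for retries; `fetchWeather` and its endpoint are hypothetical:

```typescript
import { z } from "zod";

// Explicit input schema: hallucinated or malformed arguments are rejected
// before they ever reach the external service.
const WeatherInput = z.object({ city: z.string().min(1) });

// Generic retry with exponential backoff (250ms, 500ms, 1000ms, ...).
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 250): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i + 1 >= attempts) throw err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
}

// Entry point the agent calls with model-produced arguments.
async function fetchWeather(rawArgs: unknown): Promise<string> {
  const parsed = WeatherInput.safeParse(rawArgs);
  if (!parsed.success) throw new Error(`invalid tool args: ${parsed.error.message}`);
  return withRetry(async () => {
    const res = await fetch(
      `https://api.example.com/weather?city=${encodeURIComponent(parsed.data.city)}`,
    );
    if (!res.ok) throw new Error(`upstream ${res.status}`); // retried by withRetry
    return res.text();
  });
}
```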
LLM Guards (Pre & Post)
Guards screen inputs to and outputs from LLMs, enforcing validation and safety measures on both sides of every call:
- Pre-guards: Moderation, PII/secret scrubbing, prompt-injection checks, policy filters
- Post-guards: Safety redaction, factuality gates, schema validation
- Policy engine: Versioned rule management and decision logic
- Logging system: Audit trail for allow/deny decisions
Key practices
- Treat guards as policies with versions
- Log both allow and deny decisions for audit trails, as illustrated below
- Implement proper PII/secret scrubbing
- Use factuality gates for output validation
Gotchas
- Guards can introduce latency if not optimized
- Policy versioning is critical for consistent behavior
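A minimal sketch of versioned guards with allow/deny logging; the regex check is a crude stand-in for a real secret scanner, and all names are illustrative:

```typescript
type Decision = { allow: boolean; reason?: string };

// A guard is a versioned policy: bumping `version` is how behavior
// changes stay auditable and reproducible.
type Guard = { name: string; version: string; check: (text: string) => Decision };

const noApiKeys: Guard = {
  name: "no-api-keys",
  version: "1.2.0",
  check: (text) =>
    /sk-[A-Za-z0-9]{20,}/.test(text)
      ? { allow: false, reason: "possible secret detected" }
      : { allow: true },
};

// Every decision, allow and deny alike, is logged for the audit trail.
function applyGuards(text: string, guards: Guard[], stage: "pre" | "post"): boolean {
  for (const g of guards) {
    const d = g.check(text);
    console.log(JSON.stringify({ stage, guard: g.name, version: g.version, ...d }));
    if (!d.allow) return false;
  }
  return true;
}

const userInput = "What's the weather in Berlin?";
if (applyGuards(userInput, [noApiKeys], "pre")) {
  // ...call the model, then run post-guards on its output
}
```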
Agent Networks
Loop-based orchestration that routes work across multiple specialized agents using shared state and a router. Supports multi-model setups, deterministic or LLM routing, and explicit stop conditions.
- Agents: Specialized units; each can use different models and tools
- Router: Chooses the next agent or stops; code-based, LLM-based, or hybrid
- State: Shared history + typed key-value data accessible to agents and the router
- Model configuration: Network default model with per-agent overrides (multi-model)
- Termination & limits: Done criteria and max iterations to prevent runaway loops
- Default state: Optional initial/persisted state to seed routing and tools
Key practices
- Start with a code-based (deterministic) router; add LLM/hybrid routing only where needed (see the loop sketch below)
- Always set maxIter and a clear “done” predicate
- Keep State minimal and strongly typed; persist via defaultState only when useful
- Log router decisions and results to debug routing errors and ping-pong loops
- Test routers with simulated State snapshots (golden cases + edge cases)
Gotchas
- LLM routing may miss finish states → enforce maxIter/termination checks
- No defaultModel + agents without models → network init error
- State bloat can cause misrouting; scope and redact aggressively
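A sketch of the loop with a code-based router, an explicit done predicate, and a hard iteration cap; the agent names (`researcher`, `writer`) and state shape are invented for illustration:

```typescript
interface NetState {
  history: string[];               // shared conversation/run history
  data: Record<string, unknown>;   // typed key-value data in a real system
}

type Agent = { name: string; run: (s: NetState) => Promise<NetState> };

// Code-based (deterministic) router: returns the next agent, or
// undefined to stop. An LLM or hybrid router can slot in later.
function route(s: NetState, agents: Record<string, Agent>): Agent | undefined {
  if (s.data.answer) return undefined;        // explicit "done" predicate
  if (!s.data.research) return agents.researcher;
  return agents.writer;
}

async function runNetwork(
  s: NetState,
  agents: Record<string, Agent>,
  maxIter = 10,                                // hard cap against runaway loops
): Promise<NetState> {
  for (let i = 0; i < maxIter; i++) {
    const next = route(s, agents);
    // Log every routing decision to debug misroutes and ping-pong loops.
    console.log(JSON.stringify({ iter: i, next: next?.name ?? "stop" }));
    if (!next) return s;
    s = await next.run(s);
  }
  throw new Error("maxIter reached before the done predicate was satisfied");
}
```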
Evals
Evals judge non-deterministic outputs for quality, safety, and usefulness, combining automated and manual validation:
- Rubric-based graders: LLM-powered evaluation against predefined criteria
- Code checks: Regex patterns, assertions, and automated validation
- Human review: Manual assessment and quality control
- Benchmark systems: Offline testing and regression detection
- Shadow/guardrails: Online monitoring and canary deployments
Key practices
- Use both offline and online evaluation strategies
- Implement regression detection for quality control
- Combine automated and human evaluation methods (see the harness sketch below)
- Track scores and tags for fine-tuning data
Gotchas
- Evaluation criteria must be well-defined and consistent
- Human review can be expensive and slow
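A sketch of a tiny harness where a code check and a rubric-grader stub share one result schema, so every grader writes the same scores and tags; both checks are illustrative:

```typescript
interface EvalCase { input: string; output: string }
interface EvalResult { name: string; pass: boolean; score: number; tags: string[] }

// Code check: cheap, deterministic assertion (here, at least one [n] citation).
function checkCitations(c: EvalCase): EvalResult {
  const pass = /\[\d+\]/.test(c.output);
  return { name: "has-citations", pass, score: pass ? 1 : 0, tags: ["format"] };
}

// Rubric-grader stub: in production this calls an LLM judge with a fixed
// rubric; the length heuristic is only a placeholder for the judge's score.
async function gradeHelpfulness(c: EvalCase): Promise<EvalResult> {
  const score = c.output.length > 40 ? 0.8 : 0.2;
  return { name: "helpfulness", pass: score >= 0.5, score, tags: ["rubric"] };
}

// One result schema for all graders, so offline benchmarks, online shadow
// checks, and human review can write scores + tags to the same store.
async function evaluate(c: EvalCase): Promise<EvalResult[]> {
  return [checkCitations(c), await gradeHelpfulness(c)];
}
```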
RAG (Retrieval-Augmented Generation)
RAG attaches relevant knowledge to a prompt, retrieving and integrating it through semantic search:
- Ingestion pipelines: Document chunking, parsing, and preprocessing
- Embeddings: Vector representations for semantic search
- Vector/graph stores: Knowledge storage and retrieval systems
- Re-rankers: Result refinement and relevance scoring
- Query processors: Query understanding and optimization
Key practices
- Document-processing recipes for consistent chunking (sketched below)
- Use chunk/section IDs for proper referencing
- Include provenance tags for source tracking
- Set freshness windows for content validity
Gotchas
- Over-chunking can reduce context quality
- Missing metadata breaks provenance tracking
- Skipping evals on ingestion quality leads to poor retrieval results
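One illustrative chunking recipe that bakes chunk IDs, a provenance tag, and an ingestion timestamp into every chunk:

```typescript
interface Chunk {
  id: string;         // stable chunk ID, e.g. "doc-42#3", used for citations
  text: string;
  source: string;     // provenance tag: where the text came from
  ingestedAt: string; // enables freshness windows at query time
}

// One simple recipe: fixed-size chunks with overlap. Real pipelines often
// chunk on structure (headings, paragraphs) instead; the point is that the
// recipe is explicit and every chunk carries its ID and provenance.
function chunkDocument(
  docId: string,
  source: string,
  text: string,
  size = 500,
  overlap = 50,
): Chunk[] {
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < text.length; start += size - overlap, i++) {
    chunks.push({
      id: `${docId}#${i}`,
      text: text.slice(start, start + size),
      source,
      ingestedAt: new Date().toISOString(),
    });
  }
  return chunks;
}
```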
Memory
Memory persists context that improves future turns or runs, spanning short-term and long-term storage:
- Conversation memory: Short-term context for the current session
- User/profile memory: Long-term personalized context
- Global/org memory: Shared organizational knowledge
- Memory stores: Database, cache, and persistence layers
- Memory management: TTLs, cleanup, and lifecycle management
Key practices
- Separate memory from knowledge corpora
- Set appropriate TTLs for different memory types, as in the sketch below
- Support subject-access/delete for compliance
- Use proper scoping for memory isolation
Gotchas
- Memory can grow indefinitely without proper TTL management
- Compliance requirements need proper data deletion support
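An in-memory sketch of scoped, TTL-bounded memory with subject-level deletion; a production store would sit behind a database or cache, and the key scheme is illustrative:

```typescript
interface MemoryEntry { value: string; expiresAt: number }

type Scope = "conv" | "user" | "org";

// Keys are scoped so one subject's memory never leaks into another's context.
const store = new Map<string, MemoryEntry>();

function remember(scope: Scope, subject: string, key: string, value: string, ttlMs: number): void {
  store.set(`${scope}:${subject}:${key}`, { value, expiresAt: Date.now() + ttlMs });
}

function recall(scope: Scope, subject: string, key: string): string | undefined {
  const id = `${scope}:${subject}:${key}`;
  const entry = store.get(id);
  if (!entry) return undefined;
  if (entry.expiresAt < Date.now()) {
    store.delete(id); // lazy TTL cleanup on read
    return undefined;
  }
  return entry.value;
}

// Subject-access deletion for compliance: drop everything for one subject.
function forgetSubject(subject: string): void {
  for (const k of store.keys()) {
    if (k.split(":")[1] === subject) store.delete(k);
  }
}

// Different TTLs per memory type: minutes for conversation, days for profile.
remember("conv", "u123", "topic", "billing issue", 30 * 60 * 1000);
remember("user", "u123", "preferred-tone", "concise", 30 * 24 * 60 * 60 * 1000);
```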
Operational Concerns
Operational concerns cover the production lifecycle, monitoring, and maintenance of LLM systems:
- Prompt management: IDs, versions, diffs, rollout plans
- Observability: Traces, tool spans, token/cost, eval events, safety decisions
- Dataset management: Gold data, negative examples, drift monitoring
- Synthetic data: Controlled generation with templates + filters; never replace real labels
- CI/CD: Prompt/tests/evals in the pipeline; gated releases; canary rolls
- Model selection: Multi‑provider adapters; latency/cost/quality routing (see the sketch below)
- Monitoring systems: Performance tracking and alerting
- Deployment pipelines: Automated testing and release management
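A sketch of latency/cost/quality routing across providers; the providers, numbers, and thresholds are invented for illustration:

```typescript
interface ModelRoute {
  provider: string;
  model: string;
  costPer1kTokens: number; // USD, illustrative
  p50LatencyMs: number;    // measured, fed back from observability
  qualityScore: number;    // from offline evals, 0..1
}

const routes: ModelRoute[] = [
  { provider: "provider-a", model: "small", costPer1kTokens: 0.1, p50LatencyMs: 300, qualityScore: 0.7 },
  { provider: "provider-b", model: "large", costPer1kTokens: 1.0, p50LatencyMs: 1200, qualityScore: 0.9 },
];

// Pick the cheapest route that clears the task's quality floor and
// latency ceiling; thresholds would come from per-task configuration.
function selectModel(minQuality: number, maxLatencyMs: number): ModelRoute {
  const eligible = routes
    .filter((r) => r.qualityScore >= minQuality && r.p50LatencyMs <= maxLatencyMs)
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens);
  if (eligible.length === 0) throw new Error("no route meets the constraints");
  return eligible[0];
}

selectModel(0.6, 500);   // → cheap, fast model for drafting
selectModel(0.85, 2000); // → higher-quality model for finalizing
```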
Next Steps
- Set up your building blocks following the patterns and best practices outlined above
- Implement the core components (Prompts, Agents, RAG, Memory) based on your requirements
- Follow the key practices and avoid the common gotchas for each component