
LLM Systems — Overview

An overview of LLM system building blocks and architecture patterns for engineering teams.

Building Blocks

Each block is intentionally small and composable. You can wire them together in many ways, but the contracts below should remain stable.

Prompts
Prompts define the instructions and configuration that guide a Large Language Model (LLM) toward a specific task. A prompt is built from a few subcomponents that work together:
Templates: Reusable prompt structures with placeholders
Model (+ provider): LLM selection and API configuration
Parameters: Temperature, top-p, max tokens, stop sequences, tools, etc.
Context injection: Retrieved docs, memory, and tool results

Key practices
  • Versioned prompt templates for consistent behavior
  • Parameter packs per mode (drafting vs. finalizing)
  • A/B slots for testing different approaches
Gotchas
  • Silent prompt drift can cause unexpected behavior changes
  • Keep a prompt registry with IDs + changelog to track modifications
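
A minimal sketch of what a versioned prompt registry with per-mode parameter packs could look like in TypeScript. The names (PromptTemplate, renderPrompt, the registry map) and the placeholder syntax are illustrative, not a specific library:

```ts
// Illustrative sketch: versioned prompt templates with parameter packs per mode.

interface ParameterPack {
  temperature: number;
  topP: number;
  maxTokens: number;
  stop?: string[];
}

interface PromptTemplate {
  id: string;            // stable ID, e.g. "support.summarize"
  version: string;       // bump on every change; keep a changelog alongside
  template: string;      // placeholders like {{ticket}} are filled at render time
  params: Record<"drafting" | "finalizing", ParameterPack>; // parameter packs per mode
}

const registry = new Map<string, PromptTemplate>();

registry.set("support.summarize@3", {
  id: "support.summarize",
  version: "3",
  template: "Summarize the following ticket for an engineer:\n{{ticket}}",
  params: {
    drafting:   { temperature: 0.7, topP: 0.95, maxTokens: 512 },
    finalizing: { temperature: 0.2, topP: 0.9,  maxTokens: 512 },
  },
});

// Render by substituting placeholders; callers always reference id@version,
// so silent prompt drift shows up as an explicit version bump instead.
function renderPrompt(key: string, vars: Record<string, string>): string {
  const tpl = registry.get(key);
  if (!tpl) throw new Error(`Unknown prompt: ${key}`);
  return tpl.template.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? "");
}

const prompt = renderPrompt("support.summarize@3", { ticket: "Login fails on Safari 17." });
```
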
Agents
Model‑driven decision making across steps; the agent picks tools, plans, or next actions. The agent architecture is made up of interconnected subcomponents:
Tools: External service integration through standardized connections
Processors: Input/output shaping, slot filling, state projection, answer shaping, schema mapping, citation merge
Instructions: Agent behavior guidelines and task specifications
Models: LLM selection and configuration
Parameters: Generation settings and tool configurations
Context: State management, memory, and conversation history
Pre/post hooks: Telemetry, feature flags, side‑effects
Error hooks: Error handling and recovery mechanisms

Key practices
  • Explicit tool schemas for clear contracts
  • Retry + backoff for external service calls
  • Idempotency on external effects
  • Compose processors as pure functions
  • Keep hooks for observability/side‑effects
  • Deterministic ordering (pre → decide → post → hooks)
Gotchas
  • Tool hallucinations require schema validation + post‑checks
  • Avoid business logic in hooks (keep them non‑blocking)
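
A rough sketch of the deterministic ordering (pre → decide → post → hooks), assuming pure processors and fire-and-forget hooks; all type and function names here are hypothetical:

```ts
// Illustrative sketch of one agent step with pre → decide → post → hooks ordering.

interface AgentContext {
  history: string[];                 // conversation / step history
  state: Record<string, unknown>;    // typed key-value state in a real system
}

type Processor = (ctx: AgentContext) => AgentContext;             // pure: no side-effects
type Hook = (event: { step: string; ctx: AgentContext }) => void; // observability only

async function runAgentStep(
  ctx: AgentContext,
  pre: Processor[],
  post: Processor[],
  hooks: Hook[],
  decide: (ctx: AgentContext) => Promise<AgentContext>, // model-driven decision (tool call, plan, answer)
): Promise<AgentContext> {
  // 1. Pre-processors: input shaping, slot filling, state projection
  let shaped = pre.reduce((c, p) => p(c), ctx);

  // 2. Decide: the model picks a tool, plan, or next action
  shaped = await decide(shaped);

  // 3. Post-processors: answer shaping, schema mapping, citation merge
  shaped = post.reduce((c, p) => p(c), shaped);

  // 4. Hooks: telemetry / feature flags; fire-and-forget so they never block or fail the step
  for (const hook of hooks) {
    try { hook({ step: "agent-step", ctx: shaped }); } catch { /* swallow hook errors */ }
  }
  return shaped;
}
```
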
Tools
Tools extend agent capabilities with external functions and services, letting agents interact with external systems through standardized, validated interfaces.
Function calling: Direct function execution capabilities
API integration: External service connectivity through standardized connections
Schema validation: Input/output validation and type checking
Retry + backoff: Resilient error handling and retry mechanisms

Key practices
  • Explicit tool schemas for clear contracts
  • Retry + backoff for external service calls
  • Idempotency on external effects
  • Schema validation + post-checks
Gotchas
  • Hallucinated tool calls or arguments require schema validation + post‑checks
  • Retried calls can double‑fire external effects unless they are idempotent
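
As an illustration, a tool with an explicit schema, input validation, retry + backoff, and an idempotency key might look like the following; the create_ticket tool and its endpoint are hypothetical:

```ts
// Illustrative tool definition: schema validation plus resilient, idempotent execution.

interface ToolSchema<I, O> {
  name: string;
  description: string;
  validateInput: (raw: unknown) => I;   // throw on invalid input (guards against hallucinated args)
  execute: (input: I, idempotencyKey: string) => Promise<O>;
}

async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 250): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseMs * 2 ** i)); // exponential backoff
      }
    }
  }
  throw lastErr;
}

const createTicket: ToolSchema<{ title: string }, { id: string }> = {
  name: "create_ticket",
  description: "Create a support ticket",
  validateInput: (raw) => {
    const input = raw as { title?: unknown };
    if (typeof input?.title !== "string" || input.title.length === 0) {
      throw new Error("create_ticket: 'title' must be a non-empty string");
    }
    return { title: input.title };
  },
  // The idempotency key lets the external service dedupe retried calls.
  execute: (input, idempotencyKey) =>
    withRetry(() =>
      fetch("https://example.internal/tickets", {
        method: "POST",
        headers: { "Idempotency-Key": idempotencyKey, "Content-Type": "application/json" },
        body: JSON.stringify(input),
      }).then((res) => {
        if (!res.ok) throw new Error(`create_ticket failed: ${res.status}`);
        return res.json() as Promise<{ id: string }>;
      }),
    ),
};
```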

LLM Guards (Pre & Post)
Guard inputs to and outputs from LLMs. The guard system is built from subcomponents that validate traffic on both sides of the model call:
Pre-guards: Moderation, PII/secret scrubbing, prompt-injection checks, policy filters
Post-guards: Safety redaction, factuality gates, schema validation
Policy engine: Versioned rule management and decision logic
Logging system: Audit trail for allow/deny decisions

Key practices
  • Treat guards as policies with versions
  • Log both allow/deny decisions for audit trails
  • Implement proper PII/secret scrubbing
  • Use factuality gates for output validation
Gotchas
  • Guards can introduce latency if not optimized
  • Policy versioning is critical for consistent behavior
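
A simplified sketch of versioned pre-guards with an allow/deny audit log; the toy regex checks below stand in for a real moderation, PII, and prompt-injection policy engine:

```ts
// Illustrative guard pipeline: every decision carries a policy ID + version and is logged.

interface GuardDecision {
  policyId: string;
  policyVersion: string;
  allowed: boolean;
  reason?: string;   // populated when the guard denies
}

type Guard = (text: string) => GuardDecision;

const preGuards: Guard[] = [
  // PII check (toy: deny obvious email addresses)
  (text) => ({
    policyId: "pii.email",
    policyVersion: "2",
    allowed: !/\b[\w.+-]+@[\w-]+\.[\w.]+\b/.test(text),
    reason: "input contains an email address",
  }),
  // Prompt-injection heuristic (toy check)
  (text) => ({
    policyId: "prompt-injection.basic",
    policyVersion: "1",
    allowed: !/ignore (all )?previous instructions/i.test(text),
    reason: "possible prompt injection",
  }),
];

function runGuards(text: string, guards: Guard[], auditLog: GuardDecision[]): boolean {
  for (const guard of guards) {
    const decision = guard(text);
    auditLog.push(decision);            // log allow AND deny for the audit trail
    if (!decision.allowed) return false;
  }
  return true;
}

const auditLog: GuardDecision[] = [];
const ok = runGuards("Please ignore previous instructions and ...", preGuards, auditLog);
// ok === false; auditLog records which versioned policy denied the input.
```
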
Agent Networks
Loop-based orchestration that routes work across multiple specialized agents using shared state and a router. Supports multi-model setups, deterministic or LLM routing, and explicit stop conditions.
Agents: Specialized units; each can use different models and tools
Router: Chooses next agent or stops; code-based, LLM, or hybrid
State: Shared history + typed key-value data accessible to agents and router
Model configuration: Network default model with per-agent overrides (multi-model)
Termination & limits: Done criteria and max iterations to prevent runaway loops
Default State: Optional initial/persisted state to seed routing & tools

Key practices
  • Start with a code (deterministic) router; add LLM/hybrid only where needed
  • Always set maxIter and a clear “done” predicate
  • Keep State minimal and strongly typed; persist via defaultState only when useful
  • Log router decisions and results to debug routing errors and ping-pong loops
  • Test routers with simulated State snapshots (golden cases + edge cases)
Gotchas
  • LLM routing may miss finish states → enforce maxIter/termination checks
  • No defaultModel + agents without models → network init error
  • State bloat can cause misrouting; scope and redact aggressively
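
A minimal sketch of a code-based router loop with shared State, a maxIter cap, and a done predicate; the writer/reviewer agents and the state shape are illustrative stand-ins:

```ts
// Illustrative network loop: deterministic router + shared state + hard iteration cap.

interface NetworkState {
  history: string[];
  data: { draft?: string; reviewed?: boolean }; // typed key-value data shared by agents and router
}

type Agent = (state: NetworkState) => Promise<NetworkState>;

// Hypothetical specialized agents; each could use a different model and tool set.
const agents = {
  writer: async (s: NetworkState): Promise<NetworkState> =>
    ({ ...s, data: { ...s.data, draft: "..." }, history: [...s.history, "writer"] }),
  reviewer: async (s: NetworkState): Promise<NetworkState> =>
    ({ ...s, data: { ...s.data, reviewed: true }, history: [...s.history, "reviewer"] }),
};

// Deterministic router: inspects state, picks the next agent, or stops.
function route(state: NetworkState): keyof typeof agents | "done" {
  if (!state.data.draft) return "writer";
  if (!state.data.reviewed) return "reviewer";
  return "done";
}

async function runNetwork(defaultState: NetworkState, maxIter = 10): Promise<NetworkState> {
  let state = defaultState;
  for (let i = 0; i < maxIter; i++) {               // hard cap prevents runaway / ping-pong loops
    const next = route(state);
    console.log(`router decision #${i}:`, next);    // log decisions to debug misrouting
    if (next === "done") return state;
    state = await agents[next](state);
  }
  return state;                                     // maxIter reached; surface this in monitoring
}
```
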
Evals
Judge non-deterministic outputs for quality, safety, and usefulness. The evaluation system combines automated and manual checks across several dimensions:
Rubric-based graders: LLM-powered evaluation using predefined criteria
Code checks: Regex patterns, assertions, and automated validation
Human review: Manual assessment and quality control
Benchmark systems: Offline testing and regression detection
Shadow/guardrails: Online monitoring and canary deployments

Key practices
  • Use both offline and online evaluation strategies
  • Implement regression detection for quality control
  • Combine automated and human evaluation methods
  • Track scores and tags for fine-tuning data
Gotchas
  • Evaluation criteria must be well-defined and consistent
  • Human review can be expensive and slow
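
A small sketch of an offline eval run that combines deterministic code checks with a rubric-based grader; gradeWithRubric is a placeholder for an LLM-as-judge call, and the dataset shape is illustrative:

```ts
// Illustrative offline eval: cheap code checks plus a rubric grader, with tags kept for later use.

interface EvalCase {
  input: string;
  output: string;          // model output under test
  mustMention: string[];   // simple gold expectations
}

interface EvalResult {
  case: EvalCase;
  codeCheckPassed: boolean;
  rubricScore: number;     // e.g. a 1 to 5 score from the judge model
  tags: string[];          // kept for regression slicing and fine-tuning data
}

// Code check: deterministic assertions (keyword / regex presence)
function codeCheck(c: EvalCase): boolean {
  return c.mustMention.every((term) => new RegExp(term, "i").test(c.output));
}

// Rubric-based grader: in a real system this calls a judge model with a versioned rubric prompt.
async function gradeWithRubric(c: EvalCase): Promise<number> {
  return codeCheck(c) ? 4 : 2;   // placeholder score for the sketch
}

async function runOfflineEval(dataset: EvalCase[]): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const c of dataset) {
    results.push({
      case: c,
      codeCheckPassed: codeCheck(c),
      rubricScore: await gradeWithRubric(c),
      tags: ["offline", "v1"],
    });
  }
  // Regression detection: compare aggregate scores against the previous benchmark run.
  return results;
}
```
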
RAG (Retrieval-Augmented Generation)
Attach relevant knowledge to a prompt. The RAG pipeline is made up of interconnected subcomponents that retrieve and integrate relevant knowledge via semantic search:
Ingestion pipelines: Document chunking, parsing, and preprocessing
Embeddings: Vector representations for semantic search
Vector/graph stores: Knowledge storage and retrieval systems
Re-rankers: Result refinement and relevance scoring
Query processors: Query understanding and optimization

Key practices
  • Document processing recipes for consistent chunking
  • Use chunk/section IDs for proper referencing
  • Include provenance tags for source tracking
  • Set freshness windows for content validity
Gotchas
  • Over-chunking can reduce context quality
  • Missing metadata breaks provenance tracking
  • No evals on ingestion quality leads to poor results
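
To make the chunk-ID, provenance, and freshness ideas concrete, a chunk record and naive top-k retrieval might look like this; embed() is a toy placeholder for a real embedding model and the in-memory array stands in for a vector store:

```ts
// Illustrative chunk record with IDs + provenance, and naive freshness-filtered top-k retrieval.

interface Chunk {
  chunkId: string;        // stable chunk/section ID used for citations
  sectionId: string;
  text: string;
  provenance: { sourceUri: string; ingestedAt: string }; // freshness checks use ingestedAt
  embedding: number[];
}

// Placeholder embedding (hash-based toy); swap in a real embedding model in practice.
async function embed(text: string): Promise<number[]> {
  const vec = new Array(8).fill(0);
  for (let i = 0; i < text.length; i++) vec[i % 8] += text.charCodeAt(i) / 1000;
  return vec;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

async function retrieve(query: string, store: Chunk[], k = 5, maxAgeDays = 90): Promise<Chunk[]> {
  const q = await embed(query);
  const cutoff = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;   // freshness window
  return store
    .filter((c) => new Date(c.provenance.ingestedAt).getTime() >= cutoff)
    .map((c) => ({ chunk: c, score: cosine(q, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)                                                  // re-rankers would refine this list
    .map((r) => r.chunk);
}
```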

Memory
Persist context that improves future turns or runs. The memory system manages several kinds of persistent context, from short-term session state to long-term stores:
Conversation memory: Short-term context for the current session
User/profile memory: Long-term personalized context
Global/org memory: Shared organizational knowledge
Memory stores: Database, cache, and persistence layers
Memory management: TTL, cleanup, and lifecycle management

Key practices
  • Separate memory from knowledge corpora
  • Set appropriate TTLs for different memory types
  • Support subject-access/delete for compliance
  • Use proper scoping for memory isolation
Gotchas
  • Memory can grow indefinitely without proper TTL management
  • Compliance requirements need proper data deletion support
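
A minimal sketch of scoped memory entries with per-scope TTLs and a subject-delete path for compliance; the scope names and TTL values are illustrative defaults, not recommendations:

```ts
// Illustrative memory store: scoped entries, TTL enforced on read, subject-level delete.

type MemoryScope = "conversation" | "user" | "org";

interface MemoryEntry {
  scope: MemoryScope;
  subjectId: string;       // session ID, user ID, or org ID depending on scope
  key: string;
  value: string;
  expiresAt: number;       // epoch ms; enforced on read so stale memory never leaks back in
}

const TTL_MS: Record<MemoryScope, number> = {
  conversation: 1000 * 60 * 60 * 24,        // 1 day (example value)
  user: 1000 * 60 * 60 * 24 * 90,           // 90 days (example value)
  org: 1000 * 60 * 60 * 24 * 365,           // 1 year (example value)
};

const store: MemoryEntry[] = [];

function remember(scope: MemoryScope, subjectId: string, key: string, value: string): void {
  store.push({ scope, subjectId, key, value, expiresAt: Date.now() + TTL_MS[scope] });
}

function recall(scope: MemoryScope, subjectId: string): MemoryEntry[] {
  const now = Date.now();
  return store.filter((e) => e.scope === scope && e.subjectId === subjectId && e.expiresAt > now);
}

// Subject-access/delete: remove everything tied to one subject (e.g. a deletion request).
function forgetSubject(subjectId: string): void {
  for (let i = store.length - 1; i >= 0; i--) {
    if (store[i].subjectId === subjectId) store.splice(i, 1);
  }
}
```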

Operational Concerns

Purpose: Manage the production lifecycle, monitoring, and maintenance of LLM systems.

Operational concerns cover everything needed to run LLM systems reliably in production: prompt management handles versions and rollouts, observability provides monitoring and logging, dataset management maintains data quality, and CI/CD, model selection, and deployment pipelines handle testing and releases.

  • Prompt management: IDs, versions, diffs, rollout plans
  • Observability: Traces, tool spans, token/cost, eval events, safety decisions
  • Dataset management: Gold data, negative examples, drift monitoring
  • Synthetic data: Controlled generation with templates + filters; never replace real labels
  • CI/CD: Prompts/tests/evals in the pipeline; gated releases; canary rolls
  • Model selection: Multi‑provider adapters; latency/cost/quality routing
  • Monitoring systems: Performance tracking and alerting
  • Deployment pipelines: Automated testing and release management
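
As a rough illustration, the per-call trace event an observability layer might emit could carry fields like these; the field names and the console sink are placeholders:

```ts
// Illustrative trace event shape: ties each call back to prompt version, cost, and safety/eval outcomes.

interface LlmTraceEvent {
  traceId: string;
  span: "prompt" | "tool" | "guard" | "eval";
  promptId?: string;        // links back to the prompt registry (ID + version)
  promptVersion?: string;
  model: string;
  latencyMs: number;
  tokens: { input: number; output: number };
  costUsd: number;
  safetyDecision?: "allow" | "deny";
  evalScore?: number;
}

// In practice these events feed dashboards, alerting, and regression reports in CI/CD.
function emitTrace(event: LlmTraceEvent): void {
  console.log(JSON.stringify(event));   // replace with your telemetry sink
}
```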

Next Steps

1. Set up your building blocks following the patterns and best practices outlined above
2. Implement the core components (Prompts, Agents, RAG, Memory) based on your requirements
3. Follow the key practices and avoid the common gotchas for each component