Project Summary
AutoFlow accepts structured inquiries through a versioned REST webhook, executes a deterministic LangGraph workflow backed by Ollama for classification and agent steps, persists outcomes to PostgreSQL, caches volatile run state in Redis, and exposes a Next.js control center for health, history, submission, and optional WebSocket observation. Inference is local-first: no hosted LLM APIs are required for core processing.
Technical deep dive
AutoFlow is a production-oriented, multi-agent inquiry automation platform designed for organizations that need fast intake, deterministic orchestration, and auditable outcomes rather than black-box prompt chains. The system accepts structured inbound inquiries over a versioned FastAPI webhook, acknowledges immediately with a durable `run_id`, and executes LangGraph orchestration asynchronously, using local-first Ollama inference for classification and specialist behavior. Redis provides low-latency run state for live status and operator timelines, while PostgreSQL remains the durable system of record for compliance, analytics, and post-incident reconstruction. This split architecture is deliberate: it cleanly separates responsiveness from durability, and experimentation from governance. AutoFlow therefore serves two goals at once: a strong real-time operator experience and long-horizon operational correctness. For teams evaluating enterprise AI workflow orchestration, FastAPI and LangGraph architectures, local LLM automation, or auditable multi-agent systems, AutoFlow offers a practical reference pattern that can scale from pilot to production without discarding its original contracts.
Architectural goals
- Enforce separation of concerns so API routing, orchestration logic, domain tools, state management, and UI concerns can evolve independently with focused tests.
- Guarantee deterministic intent transitions by encoding routing as explicit graph nodes and conditional edges, not hidden prompt-side branching.
- Prefer local-first inference through Ollama to improve data-residency posture, reduce external dependency concentration, and preserve cost predictability.
- Split operational state into hot and durable planes: Redis for low-latency visibility and PostgreSQL for canonical historical truth.
- Acknowledge webhook traffic immediately with stable `run_id` correlation so upstream systems never block on downstream model latency.
- Apply progressive ingress hardening through optional API keys, idempotency windows, rate limits, origin controls, and admin-scoped deletion.
- Persist confidence, escalation rationale, and step-level chronology so outcomes remain inspectable by engineering, security, and operations.
- Allow scale evolution from single process to worker-queue architectures without changing edge API contracts.
- Unify observability by linking request IDs, run IDs, graph-node events, Redis snapshots, and database records.
- Retain architecture legibility so external reviewers can reason about failure domains and control points without reverse-engineering prompts.
- Make extension safe by keeping integration concerns isolated from orchestration logic and state semantics.
System context
The context view describes a deliberately narrow blast radius between external demand and internal execution. Integrations and forms need only one stable contract (`POST /api/v1/webhook`) to submit business inquiries, while operators use a separate Next.js control center for run health, timeline inspection, and historical review. FastAPI acts as the control plane at ingress: it validates payloads, enforces optional API security and idempotency policy, records initial state, and starts asynchronous graph execution. Downstream concerns are separated by purpose. PostgreSQL is the durable truth layer for final status, agent decisions, and reporting. Redis is the hot path for low-latency run state and step streams. Ollama is the local-first model substrate used by classification and specialist nodes. The resulting boundary map speaks directly to enterprise buyers evaluating FastAPI microservice architecture, production LangGraph design, dual-state Redis and Postgres patterns, and private LLM inference deployments. AutoFlow works because each dependency has one clear reason to exist, and those reasons remain valid as traffic, policy, and compliance requirements evolve.
Logical architecture layers
| Layer | Responsibility | Primary artifacts |
|---|---|---|
| Presentation | Operator workflows, manual submissions, run visualization, historical inspection, and client interactions. | frontend/ (Next.js 14, React 18, TypeScript, Tailwind, Recharts) |
| API/edge | Routing, validation, CORS, request IDs, auth checks, error mapping, and ingress throttling. | app/main.py, app/routers/*, app/middleware/request_id.py, app/errors.py, app/limiter.py |
| Orchestration | Intent classification, deterministic branching, specialist execution, escalation routing, and final synthesis. | app/agents/orchestrator.py, app/agents/*_agent.py, app/services/graph_execution.py |
| Domain tools | Knowledge retrieval, CRM enrichment, policy-aware support helpers, and outbound communication adapters. | app/tools/*.py |
| Integration | Model transport abstraction, timeout/error normalization, and local LLM interoperability. | app/utils/ollama_client.py |
| Hot state | TTL-managed run snapshots, step streams, idempotency maps, and optional shared rate-limit counters. | app/memory/redis_memory.py |
| Durable state | Canonical run persistence, agent step audit records, metadata versioning, and historical query support. | app/db/* (SQLAlchemy async + asyncpg) |
This layered model is the maintainability backbone of AutoFlow. In many AI systems, architecture erosion starts when route handlers absorb workflow logic, tool modules write persistence side effects directly, and UI assumptions leak into backend state semantics. AutoFlow avoids that drift by making responsibility explicit. API/edge owns ingress and contract discipline. Orchestration owns decision flow. Domain tools expose bounded actions without controlling lifecycle. Integration adapters isolate transport details for model calls. Hot and durable state encode speed-versus-governance tradeoffs as first-class design decisions. Presentation stays a contract consumer instead of a hidden backend dependency. For technical leadership, this separation translates to lower blast radius, safer upgrades, and clearer ownership boundaries across platform, AI, and frontend teams.
Logical flow diagram
The flow emphasizes that synchronous user contracts and asynchronous compute are intentionally decoupled. Webhook submission initializes run state and returns quickly, while `graph_execution.process_inquiry_run` performs the expensive orchestration in background execution. Status and steps are served through Redis-first reads for speed, with durable fallback to PostgreSQL for historical truth. The WebSocket path is optional but valuable for operator ergonomics, because real-time completion notifications reduce manual polling pressure. This architecture aligns with design goals common in enterprise automation: deterministic lifecycle semantics, bounded latency at ingress, and transparent eventual consistency between hot and durable stores.
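For REST clients that skip the WebSocket path, the polling side of this contract can be sketched as a loop with exponential backoff. This is a minimal sketch, not AutoFlow's client: `poll_until_terminal`, the `TERMINAL_STATES` values, and the backoff parameters are hypothetical, and `fetch_status` stands in for a call to `GET /api/v1/status/{run_id}`.

```python
import time
from typing import Callable

# Hypothetical terminal states; AutoFlow's actual RunStatus values may differ.
TERMINAL_STATES = {"completed", "escalated", "failed"}


def poll_until_terminal(
    fetch_status: Callable[[str], dict],
    run_id: str,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the run reaches a terminal state or the timeout elapses.

    `fetch_status` is injected (instead of making HTTP calls directly) so the
    loop can be exercised without a running server.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status(run_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"run {run_id} not terminal after {timeout}s")
        sleep(delay)
        # Exponential backoff keeps polling pressure bounded during slow model calls.
        delay = min(delay * 2, max_delay)
```

Capping the delay at `max_delay` keeps completion detection reasonably prompt for long runs, while the doubling avoids hammering the status route during the first seconds of model latency.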
LangGraph orchestration
- `classify_intent`: structured intent inference maps incoming payloads to the routing taxonomy and records confidence signals.
- `route_by_intent`: deterministic edge logic maps intents to FAQ, lead, or support specialist paths.
- `faq_node`, `lead_node`, and `support_node`: specialist nodes produce domain-specific enrichment and recommended actions.
- `route_after_agent`: post-specialist gate checks escalation flags and forwards to handoff or synthesis.
- `handoff_node`: emits operator-ready escalation context for high-risk, low-confidence, or policy-sensitive cases.
- `synthesize_response_node`: composes final customer-facing responses when escalation is unnecessary.
- `log_audit`: terminal state bookkeeping persists node-level chronology and final lifecycle artifacts.
- `thread_id=run_id`: execution correlation guarantees run-level traceability across retries, logs, and state stores.
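The two routing gates can be expressed as pure functions of state, which is what keeps branching deterministic and unit-testable. In LangGraph, functions like these would typically be registered with `StateGraph.add_conditional_edges`; the intent labels, the `escalate` and `confidence` keys, the 0.6 threshold, and the fallback of unknown intents to handoff are assumptions for illustration, not AutoFlow's confirmed policy.

```python
# Hypothetical threshold; the real escalation policy may be configured differently.
CONFIDENCE_THRESHOLD = 0.6

# Hypothetical intent labels mapped to the specialist node names listed above.
SPECIALIST_ROUTES = {"faq": "faq_node", "lead": "lead_node", "support": "support_node"}


def route_by_intent(state: dict) -> str:
    """Map the classified intent to a specialist node name deterministically."""
    # Assumption: unrecognized intents go to human handoff instead of a guessed specialist.
    return SPECIALIST_ROUTES.get(state.get("intent"), "handoff_node")


def route_after_agent(state: dict) -> str:
    """Send escalated or low-confidence runs to handoff; otherwise synthesize a reply."""
    if state.get("escalate") or state.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return "handoff_node"
    return "synthesize_response_node"
```

Because neither function calls a model, the same state always produces the same edge, so routing regressions can be caught with plain unit tests.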
The orchestration layer remains stable because state is explicit and typed. `AgentState` is implemented as a `TypedDict` describing shared lifecycle fields such as message history, inferred intent, confidence, escalation flags, draft resolution text, and specialist step chronology. This is a reliability control, not a style preference. Because `TypedDict` is checked by static type checkers rather than enforced at runtime, typed state lets a type checker flag accidental key drift between nodes, keeps serialization behavior predictable across Redis and PostgreSQL writes, and protects downstream analytics against silent schema mutation. The `messages` field uses `Annotated[list, add_messages]`, aligning with LangGraph merge semantics so node deltas compose correctly instead of overwriting prior context. Operationally, this makes debugging reproducible: teams can localize failures to classification quality, specialist reasoning, escalation policy, or synthesis behavior without reverse-engineering opaque prompt traces.
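To make the merge semantics concrete, the sketch below pairs a `TypedDict` state with `Annotated` reducers and a small `apply_update` helper that merges a node's partial update. It assumes illustrative field names; `append_messages` is a simplified stand-in for LangGraph's `add_messages` (which also handles message IDs), and `apply_update` only approximates what the graph runtime does. It is not a LangGraph API.

```python
import operator
from typing import Annotated, Any, TypedDict, get_type_hints


def append_messages(existing: list, new: list) -> list:
    """Simplified stand-in for add_messages: append new messages, never overwrite."""
    return existing + new


class AgentState(TypedDict, total=False):
    # Illustrative field names; the real AgentState may use different keys.
    messages: Annotated[list, append_messages]
    intent: str
    confidence: float
    escalate: bool
    draft_response: str
    steps: Annotated[list, operator.add]


def apply_update(state: AgentState, update: dict[str, Any]) -> AgentState:
    """Merge a node's partial update, using each field's reducer when one is declared."""
    hints = get_type_hints(AgentState, include_extras=True)
    merged: dict[str, Any] = dict(state)
    for key, value in update.items():
        if key not in hints:
            # Reject key drift at runtime, which TypedDict alone does not do.
            raise KeyError(f"unknown state key: {key}")
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)  # reducer field: combine
        else:
            merged[key] = value  # plain field: last write wins
    return merged  # type: ignore[return-value]
```

Fields without a reducer behave as last-write-wins, while `messages` and `steps` accumulate across nodes, which is the distinction the `Annotated` declaration encodes.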
A critical caveat is checkpoint durability. The reference topology compiles LangGraph with `MemorySaver`, which is appropriate for local development, architecture demos, and single-process environments where restart boundaries are controlled. It is not a durable distributed checkpointer and should not be treated as one in multi-replica production systems. As throughput and reliability requirements increase, organizations should migrate checkpoint persistence to durable storage so replay semantics survive process crashes, autoscaling events, and cross-instance routing. AutoFlow intentionally preserves a clear seam for this migration, enabling teams to move from fast iteration to hardened operational replay guarantees without changing external API contracts.
Request lifecycle
1. Client submits a structured payload to `POST /api/v1/webhook`.
2. API applies schema validation plus optional API-key, idempotency, CORS, and rate-limit policy.
3. Ingress computes or reuses a stable `run_id` for traceability across every downstream artifact.
4. API writes initial `running` state into Redis for low-latency observability.
5. API persists initial run metadata to PostgreSQL for durable accountability.
6. API returns `200` immediately with `run_id` and `poll_url`, isolating callers from model latency.
7. `BackgroundTasks` dispatches `process_inquiry_run` for asynchronous graph execution.
8. LangGraph executes with `thread_id=run_id`, driving intent classification and specialist behavior through Ollama.
9. Terminal state, confidence, and step chronology are persisted back to PostgreSQL and Redis.
10. WebSocket subscribers receive completion notifications while REST clients poll status endpoints.
11. Historical queries and analytics consume durable records from PostgreSQL independent of Redis TTL behavior.
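Steps 3 through 7 can be sketched with `asyncio`, using in-memory dictionaries in place of Redis and PostgreSQL. This is an illustrative sketch under stated assumptions: `InMemoryPlanes` and `accept_inquiry` are hypothetical names, the idempotency fingerprint is assumed to be the SHA-256 of the `Idempotency-Key` header alone, and `asyncio.create_task` plays the role that `BackgroundTasks` plays in the real service.

```python
import asyncio
import hashlib
import time
import uuid
from typing import Awaitable, Callable, Optional


class InMemoryPlanes:
    """Hypothetical stand-ins for Redis (hot state) and PostgreSQL (durable state)."""

    def __init__(self) -> None:
        self.hot: dict[str, dict] = {}
        self.durable: dict[str, dict] = {}
        self.idempotency: dict[str, tuple[str, float]] = {}  # fingerprint -> (run_id, expires_at)
        self.tasks: set[asyncio.Task] = set()  # strong refs so background tasks are not GC'd


async def accept_inquiry(
    planes: InMemoryPlanes,
    payload: dict,
    process: Callable[[str, dict], Awaitable[None]],
    idempotency_key: Optional[str] = None,
    ttl_seconds: float = 3600.0,
) -> dict:
    now = time.monotonic()
    fingerprint = None
    if idempotency_key:
        # Assumption: the fingerprint hashes only the Idempotency-Key header value.
        digest = hashlib.sha256(idempotency_key.encode()).hexdigest()
        fingerprint = f"autoflow:idempotency:{digest}"
        cached = planes.idempotency.get(fingerprint)
        if cached and cached[1] > now:
            # Step 3 (reuse): a retry inside the TTL window maps to the existing run.
            return {"run_id": cached[0], "poll_url": f"/api/v1/status/{cached[0]}", "deduplicated": True}

    run_id = str(uuid.uuid4())  # step 3 (compute)
    if fingerprint:
        planes.idempotency[fingerprint] = (run_id, now + ttl_seconds)
    planes.hot[f"autoflow:run:{run_id}"] = {"status": "running"}  # step 4: hot state
    planes.durable[run_id] = {"status": "running", "payload": payload}  # step 5: durable record

    # Step 7: schedule graph execution; it cannot start until this handler yields.
    task = asyncio.create_task(process(run_id, payload))
    planes.tasks.add(task)
    task.add_done_callback(planes.tasks.discard)
    # Step 6: the caller receives run_id and poll_url before any model work runs.
    return {"run_id": run_id, "poll_url": f"/api/v1/status/{run_id}", "deduplicated": False}
```

Keeping a reference to each task matters in plain `asyncio`, because the event loop holds only weak references to tasks created with `create_task`.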
API surface
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Aggregate health with `status` (`ok` or `degraded`) and dependency booleans for Ollama, Redis, and database ping. |
| GET | / | Minimal service discovery payload. |
| POST | /api/v1/webhook | Ingest inquiry; optional `X-API-Key` and `Idempotency-Key`; rate-limited; returns `WebhookResponse`. |
| GET | /api/v1/status/{run_id} | Poll run status (`RunStatus`) using Redis-first reads with durable fallback. |
| GET | /api/v1/status/{run_id}/steps | Retrieve step chronology for the run, primarily from Redis. |
| GET | /api/v1/ws/{run_id} | Live updates over WebSocket; requires `?token=` when webhook key auth is configured. |
| GET | /api/v1/runs | Recent runs (`RunListItem[]`). |
| GET | /api/v1/runs/{run_id} | Full `RunStatus` from DB. |
| DELETE | /api/v1/runs/{run_id} | Delete DB row and Redis keys; optional `X-Admin-Key` when admin key policy is active. |
| GET | /docs | Interactive OpenAPI contract for integration and review workflows. |
A compact API matrix is a strategic control point. It reduces integration ambiguity, improves contract governance, and keeps versioning overhead manageable as teams add capabilities. For reviewers, explicit method-path-purpose mapping accelerates threat modeling, test-case generation, and release approvals because ingress behavior is easy to audit. For client developers, a clear endpoint taxonomy prevents accidental coupling and simplifies failure handling: intake routes differ from status routes, and historical retrieval is cleanly separated from live subscription channels. The `/docs` contract further supports generated clients and policy review in CI pipelines.
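The `/health` contract (an `ok` or `degraded` status plus per-dependency booleans) can be sketched as concurrent probes with a timeout. The probe names and the `aggregate_health` helper are hypothetical; real probes would ping Ollama, Redis, and the database.

```python
import asyncio
from typing import Awaitable, Callable

Probe = Callable[[], Awaitable[bool]]


async def _safe(probe: Probe, timeout: float) -> bool:
    try:
        return bool(await asyncio.wait_for(probe(), timeout))
    except Exception:
        # Timeouts and connection errors both count as an unhealthy dependency.
        return False


async def aggregate_health(probes: dict[str, Probe], timeout: float = 2.0) -> dict:
    """Run all probes concurrently and report `degraded` if any dependency fails."""
    names = list(probes)
    results = await asyncio.gather(*(_safe(probes[name], timeout) for name in names))
    checks = dict(zip(names, results))
    return {"status": "ok" if all(checks.values()) else "degraded", **checks}
```

Converting probe exceptions into `False` keeps the health route itself available when a dependency is down, which is what lets it report `degraded` instead of failing with a 500.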
Security model
| Mechanism | Configuration | Behavior |
|---|---|---|
| Webhook auth | WEBHOOK_API_KEY | When set, webhook ingestion requires matching `X-API-Key` on submission requests. |
| WebSocket auth | Same secret | Live subscribers pass `?token=` query parameter when webhook auth is enabled. |
| Admin delete | AUTOFLOW_ADMIN_API_KEY | When set, destructive delete operations require `X-Admin-Key`. |
| Rate limiting | WEBHOOK_RATE_LIMIT, RATE_LIMIT_STORAGE | Per-client-IP throttle at ingress; response headers enabled; optional Redis shared counters. |
| Idempotency | IDEMPOTENCY_TTL_SECONDS | Header `Idempotency-Key` deduplicates retries to the same `run_id` inside configured TTL. |
| CORS | CORS_ORIGINS | Explicit origin allowlist for browser clients; wildcard use should stay local-only. |
| Error disclosure | APP_ENV | Production 500 payloads expose generic detail plus `request_id` for safe support correlation. |
Security posture in AutoFlow is progressive by design. Teams can start with key-based ingress protection and idempotency, then evolve toward stronger identity, network segmentation, and data-governance controls as risk tolerance tightens. This sequencing is operationally realistic: it preserves delivery velocity while establishing enforceable checkpoints for compliance and external assurance reviews.
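The opt-in key checks in the table can be sketched as small predicates. This assumes header and query mappings with lowercase keys (Starlette's `Headers` is case-insensitive, so a plain dict here is a simplification), and the function names are hypothetical. `hmac.compare_digest` is used so that comparison time does not reveal how many leading characters of a guessed key are correct.

```python
import hmac
from typing import Mapping, Optional


def _matches(expected: str, provided: Optional[str]) -> bool:
    # Constant-time comparison avoids leaking key prefixes through response timing.
    return provided is not None and hmac.compare_digest(expected.encode(), provided.encode())


def webhook_authorized(headers: Mapping[str, str], webhook_api_key: Optional[str]) -> bool:
    """Auth is opt-in: with WEBHOOK_API_KEY unset, ingestion stays open."""
    if not webhook_api_key:
        return True
    return _matches(webhook_api_key, headers.get("x-api-key"))


def websocket_authorized(query: Mapping[str, str], webhook_api_key: Optional[str]) -> bool:
    """WebSocket subscribers reuse the webhook secret via the ?token= parameter."""
    if not webhook_api_key:
        return True
    return _matches(webhook_api_key, query.get("token"))


def delete_authorized(headers: Mapping[str, str], admin_api_key: Optional[str]) -> bool:
    """Destructive deletes require X-Admin-Key only when AUTOFLOW_ADMIN_API_KEY is set."""
    if not admin_api_key:
        return True
    return _matches(admin_api_key, headers.get("x-admin-key"))
```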
Data and Redis keys
| Redis key pattern | Purpose | Retention behavior | Primary consumers |
|---|---|---|---|
| autoflow:run:{run_id} | JSON snapshot of in-flight and terminal run state; messages normalized for JSON-safe serialization. | TTL-managed hot-state object refreshed at lifecycle transitions. | Status polling route and operator live-run views. |
| autoflow:steps:{run_id} | Append-only chronology of orchestration and specialist steps. | Ephemeral short-horizon stream for operational diagnostics. | Timeline UI and `/api/v1/status/{run_id}/steps` reads. |
| autoflow:idempotency:{sha256} | Maps idempotency fingerprint hashes to canonical `run_id` values. | TTL-bounded dedupe window preventing duplicate execution under retries. | Webhook retry control plane and client resiliency logic. |
AutoFlow intentionally uses an eventual-consistency model tuned for both operator responsiveness and governance durability. Intake writes initial state to Redis for immediate visibility, then writes baseline run metadata to PostgreSQL for canonical traceability. Completion paths update both planes with terminal status and step chronology. Status reads are Redis-first for speed but can fall back to PostgreSQL when keys expire or when durable truth is required for reporting and audits. This pattern delivers a responsive UX without sacrificing compliance posture, but it requires active management: teams should monitor cache eviction, DB write latency, and divergence windows between state planes, then codify reconciliation runbooks for degraded dependency scenarios.
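A Redis-first read with PostgreSQL fallback reduces to a few lines once the two stores are injected as callables. `read_status`, the `source` field, and the callable signatures are illustrative assumptions: `hot_get` mirrors the shape of a Redis `GET` (a JSON string or bytes, or `None` once the TTL expires), and `durable_get` stands in for a database lookup by `run_id`.

```python
import json
from typing import Callable, Optional, Union


def run_key(run_id: str) -> str:
    return f"autoflow:run:{run_id}"


def steps_key(run_id: str) -> str:
    return f"autoflow:steps:{run_id}"


def read_status(
    run_id: str,
    hot_get: Callable[[str], Optional[Union[str, bytes]]],
    durable_get: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Prefer the Redis snapshot; fall back to PostgreSQL when the key has expired."""
    raw = hot_get(run_key(run_id))
    if raw is not None:
        snapshot = json.loads(raw)  # json.loads accepts both str and bytes
        snapshot["source"] = "redis"  # hypothetical field, useful when debugging divergence
        return snapshot
    row = durable_get(run_id)
    if row is None:
        return None  # unknown run_id; an API layer would map this to 404
    return {**row, "source": "postgres"}
```

Tagging each response with the plane it came from is one inexpensive way to measure how often reads fall through to PostgreSQL, which is part of the divergence monitoring described above.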
Observability
- Propagate `X-Request-ID` through responses and logs for fast client-to-backend incident correlation.
- Log in structured UTC format with stable logger semantics to simplify centralized ingestion and alert routing.
- Emit dependency-aware health semantics so Redis and database failures can signal degraded status explicitly.
- Instrument graph-node timing and model-call latency via OpenTelemetry spans for bottleneck isolation.
- Track intent confidence and escalation rates over time to detect routing drift and threshold instability.
- Measure webhook p50/p95/p99 latency, end-to-end completion duration, and per-intent error profiles for SLO design.
- Run synthetic probes against health and representative webhook payloads to catch regressions early.
- Correlate lifecycle artifacts by `run_id` across Redis snapshots, Postgres rows, and WebSocket completion events.
- Capture model-version and workflow-version labels per run for release-aware performance and quality analysis.
- Create quality dashboards linking route class, escalation outcome, and operator intervention rates.
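Request-ID propagation and structured UTC logging can be combined using only the standard library: a `contextvars.ContextVar` carries the current request ID, and a custom formatter emits one JSON object per line. The field names, `UTCJsonFormatter`, and `bind_request_id` are hypothetical; AutoFlow's own middleware and log schema may differ.

```python
import contextvars
import json
import logging
import time
import uuid
from typing import Optional

# Holds the X-Request-ID for the current request; "-" when logging outside a request.
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


class UTCJsonFormatter(logging.Formatter):
    converter = time.gmtime  # formatTime renders timestamps in UTC instead of local time

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "request_id": request_id_var.get(),
            "msg": record.getMessage(),
        }
        run_id = getattr(record, "run_id", None)  # set via logger.info(..., extra={"run_id": ...})
        if run_id:
            payload["run_id"] = run_id
        return json.dumps(payload)


def bind_request_id(incoming: Optional[str]) -> str:
    """Reuse a caller-supplied X-Request-ID or mint one, then bind it to the context."""
    rid = incoming or uuid.uuid4().hex
    request_id_var.set(rid)
    return rid
```

Because context variables are copied into each `asyncio` task, a request ID bound in middleware stays visible to log calls made later in that request without being passed explicitly.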
Deployment topologies
| Topology | Fit | Caveats |
|---|---|---|
| Single container / single VM | Ideal for demos, architecture reviews, and low-throughput pilots where simplicity is prioritized. | In-memory WebSocket registry and `MemorySaver` checkpoints are process-local and non-durable. |
| API replicas + shared Redis + Postgres | Supports higher ingress throughput and improved edge availability for production traffic. | Needs sticky sessions or distributed socket fan-out and a durable checkpoint strategy for replay correctness. |
| Worker queue architecture | Best for burst absorption, controlled retries, and orchestration isolation under sustained load. | Adds worker lifecycle, DLQ handling, and operational complexity beyond in-process `BackgroundTasks`. |
| Compose stack for local teams | Fast onboarding path for reproducible local infrastructure and cross-team demos. | Host-network assumptions are Linux-oriented; on Windows or macOS, teams commonly run the API directly on the host against the Compose services. |
A mature rollout often follows a staged progression: begin with a single-node reference deployment to stabilize contracts and operator workflow, move to replicated API plus shared infrastructure when ingress volume increases, then isolate orchestration into queue-backed workers as completion latency and burst variance become dominant constraints. This progression preserves architectural continuity while introducing reliability controls only when they pay for themselves operationally.
Extension roadmap
- Implement signed webhook verification with key rotation and tenant-scoped credential governance.
- Move orchestration from `BackgroundTasks` to queue-backed workers with idempotent consumers and DLQ replay.
- Replace process-local `MemorySaver` with durable LangGraph checkpoint persistence and migration playbooks.
- Introduce distributed WebSocket fan-out using Redis Streams or pub/sub for multi-replica correctness.
- Deploy full OpenTelemetry tracing plus SLO dashboards for latency, availability, and quality outcomes.
- Establish offline evaluation harnesses for intent precision, escalation quality, and draft-response regression gates.
- Adopt OIDC and RBAC for control-center access and sensitive administrative API operations.
- Expand analytics to include conversion lift, escalation deflection, and time-to-resolution by confidence bands.
- Treat `MODEL_VERSION` and `WORKFLOW_VERSION` as first-class release artifacts in rollback policy.
- Add policy-driven PII controls such as selective redaction, retention windows, and field-level encryption.
- Introduce tenant-aware partitioning strategies for data, cache keys, and observability dimensions.
- Add contract tests and chaos drills that validate behavior under Redis or model-provider partial outages.
AutoFlow demonstrates that production-ready AI orchestration is less about clever prompts and more about disciplined systems design: explicit contracts, typed state, clear dependency boundaries, and measurable operational behavior. That is why the architecture holds up both under engineering scrutiny and under evaluation by enterprise buyers researching practical multi-agent automation patterns.
Key Features & Capabilities
- Intent classification and conditional routing to FAQ, sales, and support specialist agents
- Explicit LangGraph topology with typed AgentState, escalation handoffs, and audit logging
- Split persistence: durable run records in PostgreSQL, TTL snapshots and step streams in Redis
- Ingress hardening with per-IP rate limits, optional API keys, idempotency keys, and structured errors
- Next.js control center with overview, submit, live run timeline, and Postgres-backed history tabs
- Health aggregates database, Redis, and Ollama status with graceful degradation semantics
Tech Stack & Components
Getting Started
1. Install backend dependencies
Requires Python 3.11+, Ollama with the llama3 model, and Docker for Redis and Postgres.
pip install -r requirements.txt
ollama pull llama3
cp .env.example .env
docker compose up redis postgres -d
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
2. Start the control center frontend
Set NEXT_PUBLIC_API_BASE if the API is not on localhost:8000.
cd frontend
npm install
npm run dev
3. Run tests and lint
Requires Postgres and Redis matching your .env configuration.
pytest -q
ruff check app scripts tests
Frequently asked questions
- What is AutoFlow and what problem does it solve?
- AutoFlow is a multi-agent inquiry automation reference implementation. It classifies inbound business inquiries (FAQ, sales, support), routes them through a LangGraph workflow with specialist agents, and persists auditable run records to PostgreSQL, using local Ollama inference with no hosted LLM API dependency for core processing.
- Does AutoFlow require OpenAI or other cloud LLM APIs?
- No. Core processing uses local Ollama (default model llama3). Integrations POST structured inquiries to a versioned REST webhook; classification and agent steps run against your Ollama endpoint. You can swap OllamaClient for another backend if needed.
- How does AutoFlow handle security at the API boundary?
- AutoFlow layers optional controls at ingress: X-API-Key auth on webhooks with matching WebSocket token auth, per-IP SlowAPI rate limits with optional Redis-backed counters, Idempotency-Key deduplication, admin-gated DELETE via AUTOFLOW_ADMIN_API_KEY, structured validation errors, and production-safe 500 bodies with request_id correlation.
- What is the difference between Redis and PostgreSQL in AutoFlow?
- PostgreSQL stores durable run records and agent_steps JSON for disputes and reporting. Redis holds TTL'd run snapshots, step streams, and idempotency maps for low-latency polling via GET /api/v1/status/{run_id}. Status reads prefer Redis while a run is active, then fall back to Postgres.
- Can AutoFlow scale horizontally?
- Single-process demos use in-memory WebSocket registries and LangGraph MemorySaver checkpoints. Multi-replica deployments need sticky sessions or Redis pub/sub for WebSockets, a durable LangGraph checkpointer (e.g. Postgres), and ideally queue-backed graph execution (Celery/RQ) instead of FastAPI BackgroundTasks.
