Project Summary
SentinelAI is a reference implementation of a real-time fraud scoring platform: a versioned REST + WebSocket service backed by PostgreSQL, tree-based ML (XGBoost) with SHAP explanations, an optional Isolation Forest cold-start path, PSI drift monitoring, and optional local Ollama narratives. The design target is teams needing an auditable, self-hosted scoring tier without routing inference or PII through third-party LLM APIs.
Technical deep dive
Executive summary
SentinelAI is a fraud decisioning architecture designed for organizations that need low-latency scoring, reproducible model behavior, and audit-ready evidence trails in the same operating model. The system is intentionally built around strict boundaries: transport concerns are isolated from scoring logic, model-serving logic is isolated from persistence and monitoring, and optional explanation enhancement is isolated from the deterministic decision path. This prevents accidental coupling between concerns that change at different velocities and under different risk controls. In practice, that means product teams can move quickly on client interfaces, data science teams can improve models under disciplined governance, and platform teams can harden reliability without introducing decision drift.
The platform adopts a conservative production posture suited to environments where control failures are expensive. Feature contracts are immutable at inference time, transformation code runs in transform-only mode, and every prediction is stored with enough metadata to support replay and forensic analysis. Optional augmentation, including LLM narration for explanations, is treated as non-critical and cannot block the synchronous scoring response. This keeps decision behavior stable during partial outages and protects fraud operations from secondary-system failures. In environments where score outcomes influence payment authorization or account intervention, deterministic behavior under stress is more valuable than marginal feature convenience.
SentinelAI is also designed for multi-team ownership. Data scientists can publish model artifacts and calibration updates without rewriting API handlers. Backend engineers can evolve request contracts and service-level safeguards without touching model internals. SRE and security teams can enforce deployment, observability, and access controls around a stable scoring core. This architecture scales from a single process deployment to a distributed production topology with explicit evolution paths for alert distribution, metrics, artifact management, and governance workflows. The result is a production system that preserves scientific rigor while meeting enterprise uptime, security, and compliance requirements.
From a model risk perspective, SentinelAI operationalizes three principles: deterministic serving, explicit lineage, and measurable drift response. Deterministic serving ensures the same input semantics produce the same decision under a fixed model version. Explicit lineage ensures every score can be traced to the exact artifact bundle, schema version, and runtime policy that produced it. Measurable drift response ensures teams do not react to noisy signals or anecdotal incidents; they respond to structured evidence across PSI, decision-mix movement, latency patterns, and business outcomes. These principles reduce both false confidence and panic-driven interventions.
Executive summary design choice table
| Dimension | Design choice |
|---|---|
| Inference contract | Serve with immutable feature schema, ordered columns, and fixed transform semantics; no runtime refitting |
| Decision path isolation | Keep core score computation independent from optional explanation enrichment and external notification fan-out |
| Orchestration style | Use thin routers plus dependency-injected services to isolate protocol, business logic, and infrastructure concerns |
| Persistence policy | Store prediction, explanation, lineage identifiers, and correlation metadata as immutable audit records |
| Monitoring model | Treat latency, decision mix, and drift as first-class runtime signals with structured emission paths |
| Security model | Apply layered controls across auth, schema validation, rate limiting, transport protection, and error sanitization |
| Scale strategy | Maintain stateless scoring replicas and externalize shared state concerns to managed infrastructure components |
| Governance ergonomics | Version baselines, model bundles, and API contracts so investigations and rollbacks remain deterministic |
Layered architecture
The layered architecture is not an aesthetic preference; it is a control strategy for high-consequence decisions. Client-facing channels feed a transport layer that handles protocol concerns and request hygiene. The transport layer delegates to dependency-injected services that own deterministic scoring, explanation assembly, persistence writes, and telemetry emission. Data plane resources such as PostgreSQL and model artifacts remain behind explicit interfaces so runtime behavior is inspectable and testable. Optional capabilities, including local LLM narration, are attached at well-defined boundaries where timeouts and fallback behavior can be enforced without touching score correctness.
Dependency injection is the key in-process control mechanism. It turns implicit global state into explicit constructor dependencies and allows startup checks to fail fast before unsafe traffic is admitted. Artifact loading, schema compatibility checks, repository wiring, and telemetry sinks can all be validated during boot. This reduces partial-initialization risk and makes readiness semantics meaningful. In fraud systems, a process that is alive but not decision-safe should never appear healthy to an orchestrator. The layered architecture plus DI makes that distinction practical to enforce.
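The fail-fast wiring described above can be sketched in plain Python. This is a minimal illustration, not SentinelAI's actual wiring: `Container`, `build_container`, `require_model_loaded`, and `StartupCheckError` are hypothetical names, and the factories stand in for real artifact loaders and repositories.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class StartupCheckError(RuntimeError):
    """Raised when a scoring-critical dependency fails validation at boot."""


@dataclass(frozen=True)
class Container:
    """Explicit dependency bundle (hypothetical; field names are illustrative)."""
    model: object
    repository: object
    tracker: object


def build_container(
    factories: Dict[str, Callable[[], object]],
    checks: List[Callable[[Container], None]],
) -> Container:
    # Construct every dependency up front so nothing is created lazily mid-request.
    container = Container(
        model=factories["model"](),
        repository=factories["repository"](),
        tracker=factories["tracker"](),
    )
    # Run all checks before traffic is admitted; the first failure aborts startup.
    for check in checks:
        check(container)
    return container


def require_model_loaded(container: Container) -> None:
    # Example check: a missing artifact must stop the process from becoming ready.
    if container.model is None:
        raise StartupCheckError("model artifact not loaded")
```

A process would call `build_container(...)` once during boot and report ready only if it returns without raising.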
The data plane remains intentionally simple: PostgreSQL for durable decision history and model artifacts on disk (or mounted volume) for predictable startup behavior. This minimizes moving parts in the core path while preserving migration options. As traffic grows, artifact distribution can move to a registry-backed pull model and decision storage can scale behind pooling and replication, but the in-process contract remains stable. Because contract boundaries are explicit, evolutionary changes do not require retraining teams or rewriting the client ecosystem.
The architecture also encodes explicit degradation semantics. If database dependencies fail, readiness fails closed while liveness stays available for diagnosis. If optional explanation narration is delayed, deterministic explanation fallback still satisfies response contracts. If alert consumers lag, scoring and persistence continue while fan-out retries degrade independently. These rules prevent ambiguous behavior during incidents and make runbooks enforceable. Teams can reason about what the system guarantees under each failure mode, which is essential for business-critical fraud intervention workflows.
Component catalog
| Layer | Module | Responsibility |
|---|---|---|
| Client | API Gateway / BFF | Apply edge authentication, TLS termination, and coarse traffic shaping before forwarding API calls |
| Client | Streamlit Dashboard | Provide analyst-facing workflows for score review, transaction inspection, and case triage |
| Client | Alert consumers | Consume non-approved decision events for intervention, case generation, and workflow orchestration |
| Transport | FastAPI routers | Validate payloads, enforce response contracts, and map protocol semantics to service operations |
| Transport | Middleware stack | Handle CORS, rate limiting, request logging, and correlation context propagation |
| Application | Dependency injection wiring | Construct and bind repositories, services, trackers, and configuration for deterministic process startup |
| Application | PredictionService | Orchestrate feature generation, model inference, threshold mapping, and normalized decision objects |
| Application | FeatureService | Build model-ready feature vectors from transaction context under train-serve aligned semantics |
| Application | feature_engineering | Execute canonical transform-only feature engineering with no runtime fitting side effects |
| Application | ExplanationService | Generate deterministic rationale and optional narrative overlays under strict timeout and fallback policies |
| Application | AlertService | Broadcast REVIEW or BLOCKED events with retry-aware, failure-isolated delivery behavior |
| Persistence | TransactionRepository | Persist immutable decision records and expose retrieval APIs for operations and audit workflows |
| Monitoring | PerformanceTracker | Capture latency histograms, throughput counters, and decision-mix metrics for SLO tracking |
| Monitoring | Drift monitor | Compute PSI and related drift indicators against approved baseline feature distributions |
| Data plane | PostgreSQL | Store predictions, explanations, drift metadata, and lineage fields in durable relational storage |
| Data plane | Model artifacts on disk | Host model binaries, preprocessing state, and schema contracts loaded and validated at startup |
| Optional | Ollama LLM | Generate local explanation narration without requiring external LLM network egress |
| Operations | Runbook and incident policies | Define escalation, fallback, and rollback procedures tied to measurable service and model-health signals |
A cataloged architecture materially improves ownership clarity and incident response velocity. Engineers can map symptoms to modules quickly, define module-specific SLOs, and attach focused runbooks to each boundary. Governance and model risk teams can trace every control objective to concrete implementation components rather than abstract process documents. During post-incident reviews, this structure allows the team to separate root cause from propagation path and prioritize corrective actions that reduce future blast radius.
Scoring sequence
SentinelAI scoring follows a strict sequence of operations designed for reproducibility, bounded latency, and complete observability. The router validates payload structure and dispatches CPU-heavy work to a thread pool boundary so asynchronous request handling remains healthy under load. Feature construction and model inference produce a normalized prediction object that includes confidence and decision class metadata. Explanation generation follows with deterministic fallback behavior. Persistence and telemetry recording occur as first-class side effects, not optional hooks, ensuring every response corresponds to a durable operational record.
- Client submits `POST /api/v1/predict` with a typed JSON transaction payload.
- Predict router validates request and forwards scoring work to a thread pool to protect event-loop responsiveness.
- Prediction service calls `FeatureService` to build model input under train-serve aligned transformation rules.
- `feature_engineering` applies transform-only logic and returns a fixed-order engineered feature row.
- Feature row is converted to model-ready ndarray input and scored by the loaded artifact bundle.
- Prediction service returns a normalized `PredictionResult` to the router with decision and confidence fields.
- Router invokes `ExplanationService` in a thread pool and obtains deterministic explanation text with fallback.
- Router persists prediction and explanation through `TransactionRepository` with model and schema lineage metadata.
- Router records latency and decision metrics through `PerformanceTracker` for SLO and risk-performance analysis.
- If decision class is not `APPROVED`, router emits alert events through `AlertService`.
- Router returns `PredictionWithExplanation` to the client under a stable API v1 response contract.
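The steps above can be sketched as a single async handler. This is an illustration under stated assumptions: it uses only the standard library (Python 3.9+ for `asyncio.to_thread`), plain lists stand in for `TransactionRepository` and `AlertService`, and `handle_predict` and `demo_score` are hypothetical names, not the project's router code.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PredictionResult:
    decision: str       # "APPROVED", "REVIEW", or "BLOCKED"
    confidence: float


async def handle_predict(
    payload: Dict[str, float],
    score: Callable[[Dict[str, float]], PredictionResult],
    explain: Callable[[PredictionResult], str],
    records: List[dict],
    alerts: List[dict],
) -> dict:
    started = time.perf_counter()
    # CPU-bound scoring and explanation run in worker threads, off the event loop.
    result = await asyncio.to_thread(score, payload)
    explanation = await asyncio.to_thread(explain, result)
    record = {
        "decision": result.decision,
        "confidence": result.confidence,
        "explanation": explanation,
        "latency_ms": (time.perf_counter() - started) * 1000.0,
    }
    records.append(record)              # stand-in for the repository write
    if result.decision != "APPROVED":   # alert only on REVIEW or BLOCKED
        alerts.append(record)           # stand-in for alert fan-out
    return record


def demo_score(payload: Dict[str, float]) -> PredictionResult:
    # Toy rule used only to exercise the flow; not a real model.
    decision = "BLOCKED" if payload["amount"] > 1000 else "APPROVED"
    return PredictionResult(decision=decision, confidence=0.9)
```

Passing the scorer, explainer, and sinks as arguments keeps each stage replaceable in tests, which is what makes per-stage latency budgets measurable.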
This sequence enables robust validation at multiple levels. Contract tests can verify schema and response invariants at the router boundary. Service tests can verify deterministic feature mapping and threshold logic with fixture-driven assertions. Integration tests can confirm persistence durability, alert branching behavior, and telemetry completeness. Because each stage is explicitly represented, latency budgets can be assigned per stage and optimized with precision rather than broad guesswork. In production, that translates into faster remediation and more credible performance commitments.
Operationally, the sequence supports clear semantics for partial failure handling. If explanation generation times out, deterministic fallback explanation text is returned and the score path remains intact. If alert delivery to consumers fails, the client still receives the decision while delivery retry and failure tracking occur in isolated paths. If database persistence fails, the request can be failed explicitly according to policy, preserving consistency between user-visible outcomes and system-of-record evidence. Explicit semantics are critical in fraud systems because silent side-effect loss can create governance and legal risk.
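The explanation-timeout rule can be illustrated with `asyncio.wait_for`. A minimal sketch, assuming the narrator is an async callable; `explain_with_fallback`, the 0.5 second default, and the two demo narrators are hypothetical and chosen only for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def explain_with_fallback(
    narrate: Callable[[], Awaitable[str]],
    fallback_text: str,
    timeout_s: float = 0.5,  # illustrative budget, not a documented default
) -> str:
    try:
        # Bound the optional narration so it can never stall the response.
        return await asyncio.wait_for(narrate(), timeout=timeout_s)
    except Exception:
        # Timeouts and narrator errors both degrade to the deterministic text.
        return fallback_text


async def slow_narrator() -> str:
    await asyncio.sleep(1.0)  # simulates a stalled local LLM call
    return "narrative"


async def fast_narrator() -> str:
    return "narrative"
```

Task cancellation is not swallowed here, because `asyncio.CancelledError` derives from `BaseException` rather than `Exception`.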
ML train-serve contract
The train-serve contract is the strongest predictor of long-term score integrity in production ML systems. SentinelAI enforces a strict contract between training outputs and serving inputs: fixed feature names, fixed ordering, explicit dtypes, stable categorical treatment, and frozen transformation behavior. Artifact bundles include both model state and schema metadata so contract checks are machine-verifiable at startup. This protects against silent regressions caused by pipeline drift, accidental feature reorder, implicit type coercion, and mutable preprocessing behavior.
Startup validation blocks readiness if any required artifact or schema constraint is missing or inconsistent. Inference-time feature engineering remains pure and side-effect free: no adaptive encoders, no online fit operations, and no schema mutation by request traffic. Predictions are persisted with model version and schema version identifiers so historical events can be replayed exactly with the original contract context. This replayability supports threshold recalibration analysis, disputed-decision review, and model risk governance obligations.
A strong contract also improves experimentation quality. Candidate models can run in shadow mode against live traffic with confidence that observed deltas reflect model behavior rather than serve-time transformation mismatch. Teams can evaluate lift and stability by segment, run phased threshold migration, and execute controlled champion-challenger transitions with deterministic rollback points. In high-churn fraud environments, this reduces both missed-fraud exposure and unnecessary customer friction caused by unstable rule changes.
- Package model binaries, preprocessors, and schema metadata as a single versioned artifact contract.
- Fail readiness when model artifact, schema lineage, or compatibility checks do not pass.
- Enforce transform-only inference with fixed ordering, dtype discipline, and deterministic mapping logic.
- Persist model and schema identifiers with each decision to support replay and audit reconstruction.
- Run shadow scoring and segment-level evaluation before promoting candidate models to primary traffic.
- Use explicit rollback policy keyed by model-version lineage rather than ad hoc hotfix changes.
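One way to make the contract machine-verifiable is a small schema object that fixes column order and types, and rejects any request-time deviation. This is a sketch under assumptions: `FeatureSchema`, `ContractError`, `to_feature_row`, and the feature names `amount` and `hour` are hypothetical and do not describe the project's real schema.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple


class ContractError(ValueError):
    """Raised when serving input violates the trained feature contract."""


@dataclass(frozen=True)
class FeatureSchema:
    version: str
    columns: Tuple[Tuple[str, type], ...]  # ordered (name, expected type) pairs


def to_feature_row(schema: FeatureSchema, features: Mapping[str, object]) -> List[float]:
    expected = [name for name, _ in schema.columns]
    missing = [name for name in expected if name not in features]
    extra = sorted(set(features) - set(expected))
    if missing or extra:
        raise ContractError(f"missing={missing} extra={extra}")
    row: List[float] = []
    # Emit values in schema order, regardless of the order in the request.
    for name, expected_type in schema.columns:
        value = features[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ContractError(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        row.append(float(value))
    return row


# Hypothetical two-column schema used only for demonstration.
DEMO_SCHEMA = FeatureSchema(version="v1", columns=(("amount", float), ("hour", int)))
```

The check is deliberately strict: an `int` supplied for a `float` column is rejected rather than coerced, which surfaces implicit type drift instead of hiding it.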
Security table
Security in SentinelAI is implemented as layered controls mapped to concrete risks, not as an afterthought checklist. Controls are integrated into ingress, runtime, storage, and observability paths so secure behavior is the default operating mode. The objective is to reduce exploitability while preserving decision throughput and operational visibility.
| Control area | Implementation | Risk addressed |
|---|---|---|
| Authentication and authorization | Gateway token validation plus API key policy with route-level authorization constraints | Unauthorized scoring access and privilege abuse |
| Transport protection | TLS at ingress and optional mTLS for service-to-service paths | Traffic interception and request tampering |
| Abuse prevention | SlowAPI budgets with source-aware throttling and burst controls | DoS behavior, brute-force probing, and resource starvation |
| Payload integrity | Strict schema validation with constrained fields, enums, and type safety | Injection vectors and malformed payload exploitation |
| Error sanitization | Structured error envelopes with correlation IDs and hidden internal stack details | Reconnaissance via stack traces or sensitive diagnostics |
| Secrets discipline | External secret injection and scheduled rotation with no secrets in repository history | Credential leakage and long-lived secret exposure |
| Data minimization | Persist only decision-relevant fields and redact or hash sensitive identifiers when possible | PII over-retention and compliance exposure |
| Auditability | Immutable decision records with timestamping, lineage metadata, and trace correlation | Poor forensic traceability during incidents and disputes |
| Optional LLM isolation | Local Ollama runtime with optional disable flag and strict timeout boundaries | External data egress and residency-control violations |
| Supply chain hygiene | Pinned dependencies, vulnerability scanning, and signed release artifacts | Compromised package risk and unverified runtime binaries |
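The error sanitization row can be illustrated with a helper that logs full detail server-side and returns only a generic envelope. A minimal sketch: the envelope field names, the logger name, and `sanitized_error` itself are assumptions, not the project's documented response format.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("sentinel.errors")  # hypothetical logger name


def sanitized_error(exc: Exception, correlation_id: Optional[str] = None) -> dict:
    cid = correlation_id or uuid.uuid4().hex
    # Full exception detail, including the traceback, stays in server-side logs.
    logger.error("unhandled error cid=%s", cid, exc_info=exc)
    # The client sees a stable code plus an ID that support can match to the logs.
    return {
        "error": {
            "code": "internal_error",
            "message": "Request could not be processed.",
            "correlation_id": cid,
        }
    }
```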
Health endpoints table
Health signaling separates process vitality from scoring readiness so orchestration can make safe routing decisions. Liveness should remain stable during transient dependency incidents, while readiness must fail when scoring invariants cannot be guaranteed. This distinction avoids accidental traffic admission to partially initialized instances.
| Endpoint | Purpose | Expected behavior |
|---|---|---|
| `GET /api/v1/health/live` | Liveness probe for process viability | Returns success while process is alive, independent of model artifact and database availability |
| `GET /api/v1/health/ready` | Readiness probe for safe traffic admission | Returns success only when scoring-critical dependencies and contracts pass startup/runtime checks |
| `GET /` | Root service metadata and smoke-test endpoint | Returns lightweight service metadata for ingress and deploy verification |
| `GET /api/v1/health/model` | Model and schema integrity diagnostics | Reports artifact load status and fails when model-contract elements are inconsistent |
| `GET /api/v1/health/db` | Database path diagnostics | Reports connectivity and basic query viability for targeted remediation |
| `GET /api/v1/health/dependencies` | Aggregated dependency diagnostics | Provides summarized dependency status for diagnostics while preserving error sanitization policy |
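The liveness and readiness split can be sketched as two framework-independent functions that return a status code and a body. The function names, the check names, and the 503 status for a not-ready instance are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    # Liveness answers only "is the process running", with no dependency checks.
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as not ready
    ready = all(results.values())
    body = {"status": "ready" if ready else "not_ready", "checks": results}
    return (200 if ready else 503), body
```

Returning per-check booleans rather than exception text keeps the diagnostics useful while staying within the error sanitization policy.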
API v1 table
API v1 is the stable contract layer between SentinelAI and surrounding systems, including gateways, analyst interfaces, and operational tooling. The versioned surface includes scoring, retrieval, drift operations, and health diagnostics. REST endpoints provide durable request-response workflows, while WebSocket channels support near-real-time alert streaming for intervention processes that benefit from push semantics.
| Route | Method | Responsibility |
|---|---|---|
| `/api/v1/predict` | POST | Validate payload, execute deterministic scoring sequence, persist result, and conditionally emit alerts |
| `/api/v1/transactions` | GET | Return paginated decision history with filtering by time range, class, score, and metadata facets |
| `/api/v1/transactions/{id}` | GET | Return complete decision artifact including explanation text and lineage metadata |
| `/api/v1/transactions/drift` | GET | Expose current PSI and supporting drift summaries for monitoring consumers and dashboards |
| `/api/v1/transactions/drift/baseline` | POST | Create or update baseline distributions used in rolling drift comparison windows |
| `/api/v1/health/live` | GET | Expose liveness for orchestrators and synthetic monitoring |
| `/api/v1/health/ready` | GET | Expose readiness based on scoring-critical dependency checks |
| `/ws/alerts` | WebSocket | Stream REVIEW and BLOCKED events to subscribed clients for near-real-time intervention |
| `/api/v1/transactions/export` | GET | Export filtered decision records for governance review, retrospective analysis, and investigations |
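A client call against the scoring route might be assembled as follows. This sketch only builds the request object; the `X-API-Key` header name and the `amount` payload field are assumptions, since the request schema and auth header are not specified here.

```python
import json
import urllib.request
from typing import Dict


def build_predict_request(
    base_url: str, api_key: str, transaction: Dict[str, float]
) -> urllib.request.Request:
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/predict",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # header name is an assumption
        },
        method="POST",
    )


# Hypothetical payload; real field names come from the service's request schema.
demo_request = build_predict_request("http://127.0.0.1:8000", "dev-key", {"amount": 129.5})
```

Sending it would be a matter of passing `demo_request` to `urllib.request.urlopen` with a timeout against a running instance.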
Drift monitoring PSI section
SentinelAI uses Population Stability Index (PSI) as a practical first-line drift signal for production monitoring. Baseline distributions are defined over approved historical windows and compared to rolling production windows using the same engineered features consumed by the model. Fixed binning and stable feature semantics preserve comparability across time and model versions. This turns PSI into an interpretable operational indicator rather than an unstable metric artifact.
PSI alone is not sufficient for action. Elevated PSI may indicate fraud pattern changes, traffic-mix shifts, instrumentation drift, data quality defects, or benign seasonality. SentinelAI therefore correlates PSI with decision-mix movement, service latency, and downstream fraud-confirmation outcomes before escalating policy actions. Escalation follows tiered runbooks: informational drift triggers observation, warning drift triggers deep diagnostics and segment-level analysis, and critical drift triggers governance review with bounded mitigation options such as threshold tuning, feature gate controls, or expedited retraining.
Baseline governance is versioned, auditable, and decision-relevant. Each baseline update records data window rationale, quality diagnostics, sampling controls, and approver metadata. This allows teams to reconstruct drift alerts in historical context even after baseline refreshes. It also prevents reactive retraining loops that chase metric noise without business validation. In mature fraud programs, disciplined baseline governance is the difference between stable improvement and continuous operational churn.
- Define baseline windows from instrumentation-stable, quality-screened production data.
- Compute PSI with fixed bins and immutable feature semantics to maintain metric comparability.
- Evaluate PSI jointly with decision-mix shifts, latency movement, and realized case outcomes.
- Apply tiered thresholds with explicit runbooks for observation, diagnostics, and governance escalation.
- Persist baseline lineage and approvals so historical alerts remain reconstructable during audits.
- Use drift evidence to prioritize targeted mitigations before broad retraining campaigns.
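For reference, PSI over fixed bins is the sum over bins of (actual share minus expected share) times the natural log of their ratio. The sketch below computes it with equal-width bins derived from the baseline range, using only the standard library. The function names and the epsilon floor for empty bins are assumptions, and the 0.10 and 0.25 tier cut-offs are a widely used rule of thumb, not values confirmed for SentinelAI.

```python
import math
from typing import List, Sequence


def bin_edges(baseline: Sequence[float], bins: int = 10) -> List[float]:
    # Edges are fixed from the baseline so later windows stay comparable.
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    if not values:
        raise ValueError("cannot compute proportions of an empty sample")
    bins = len(edges) - 1
    lo, span = edges[0], edges[-1] - edges[0]
    counts = [0] * bins
    for v in values:
        idx = 0 if span == 0 else int((v - lo) / span * bins)
        idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    edges = bin_edges(baseline, bins)
    expected = proportions(baseline, edges)
    actual = proportions(current, edges)
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to keep the log finite
        total += (a - e) * math.log(a / e)
    return total


def drift_tier(value: float, warn: float = 0.10, critical: float = 0.25) -> str:
    # Cut-offs are a common rule of thumb, labeled here as assumptions.
    if value >= critical:
        return "critical"
    if value >= warn:
        return "warning"
    return "informational"
```

Every term in the sum is non-negative, so identical distributions give exactly zero and any shift in bin shares raises the score.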
Horizontal scaling table
Horizontal scaling in SentinelAI preserves deterministic request-time scoring while moving shared-state concerns to dedicated infrastructure. This keeps replica behavior consistent and reduces state-coupling failures under autoscaling events. The path from compact deployment to enterprise topology is incremental: evolve one capability at a time, validate with shadow traffic, and preserve API and model-serving contracts throughout.
| Topic | Current | Enterprise extension |
|---|---|---|
| API runtime | Single FastAPI instance with local dependency graph | Stateless multi-replica deployment behind managed load balancer and autoscaling |
| Alert fan-out | In-process WebSocket broadcast | Redis Pub/Sub or Kafka backbone for cross-replica distribution and durability |
| Metrics path | Local counters via `PerformanceTracker` | Prometheus or OpenTelemetry export with centralized aggregation and SLO alerting |
| Artifact distribution | Local disk artifact mount | Signed object storage or model registry promotion pipeline with rollback gates |
| Database connectivity | Direct Postgres connection from API process | Managed pooling proxy, failover-aware topology, and read replica strategy |
| Background workloads | Light auxiliary work in API process | Dedicated worker tier for heavy explanation generation, drift backfill, and enrichment jobs |
| Release management | Single deploy with manual smoke checks | Canary or blue-green rollout with automated rollback and health gate policies |
| Secrets posture | Environment variable injection per deployment | Centralized secret manager with rotation and runtime identity-based access controls |
| Observability governance | Service logs and local dashboards | Centralized traces, metrics, long-retention logs, and incident-driven observability standards |
| Multi-region resiliency | Single-region deployment | Active-passive or active-active region strategy with explicit data and alert replication policies |
The most common scaling anti-pattern is increasing replica count before defining cross-replica semantics for alerts, metrics, and artifact synchronization. SentinelAI avoids that by explicitly declaring which operations stay in request-time deterministic scope and which move to distributed infrastructure. This improves failover behavior, protects decision consistency under burst load, and reduces governance friction because control boundaries remain explicit and testable.
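The single-process alert path can be sketched as a small hub with bounded replay history. This illustrates why cross-replica semantics need defining first: subscribers on one replica never see events published on another. `AlertHub`, its method names, and the default sizes are hypothetical.

```python
import asyncio
from collections import deque
from typing import Deque, Set


class AlertHub:
    """In-process fan-out with bounded history (single replica only)."""

    def __init__(self, history_size: int = 100) -> None:
        self._history: Deque[dict] = deque(maxlen=history_size)
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self, queue_size: int = 100) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
        # Replay recent events so a reconnecting client can catch up.
        for event in self._history:
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                break
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, event: dict) -> int:
        self._history.append(event)  # oldest entry is evicted once full
        delivered = 0
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                pass  # slow consumer: drop the event rather than block scoring
        return delivered
```

Moving to multiple replicas would mean replacing the in-memory `_history` and `_subscribers` with a shared backbone such as the Redis Pub/Sub or Kafka options listed in the table.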
Enterprise adoption typically proceeds in phases: first harden observability and health semantics, then externalize alert fan-out, then strengthen artifact promotion and release controls, and finally adopt multi-region resilience where business continuity requirements demand it. This sequence minimizes risk by avoiding architecture overreach before operational maturity exists to support it.
Next.js ops console
A Next.js operations console provides a practical human control plane for SentinelAI without coupling UI behavior to backend internals. The console can consume API v1 contracts to present live throughput, class distribution, latency percentiles, drift summaries, readiness status, and searchable transaction history with lineage context. Because the UI is contract-bound and stateless, product teams can iterate quickly on analyst workflows while backend teams preserve strict decision and governance controls. In daily operations, this reduces ad hoc database access, shortens incident triage, and gives fraud analysts, data scientists, and SREs a shared operational view anchored to immutable decision records.
For mature teams, the console also becomes a policy-communication surface. It can expose baseline version status, current model release lineage, and runbook-triggered recommendations when drift or latency thresholds are breached. By presenting this information directly from contract-safe endpoints, organizations maintain a clean separation between observability and control while improving response coordination across technical and business stakeholders.
Key Features & Capabilities
- FastAPI async serving with ML inference in thread pool to avoid blocking the event loop
- XGBoost primary classifier with Isolation Forest cold-start and SHAP TreeExplainer explanations
- Three-way decision routing: APPROVED, REVIEW, and BLOCKED with configurable thresholds
- PSI drift monitoring over persisted SHAP explanations with explicit baseline capture
- WebSocket alert channel for BLOCKED/REVIEW decisions with bounded reconnect history
- Next.js ops console with KPIs, decision mix, WebSocket incident stream, and scoring sandbox
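The three-way routing can be illustrated as a pure function over the fraud probability. In this sketch the environment variable names `FRAUD_THRESHOLD_REVIEW` and `FRAUD_THRESHOLD_BLOCK` come from the FAQ below, while the default values, the inclusive boundaries, and the function names are assumptions.

```python
import os
from typing import Mapping, Tuple


def load_thresholds(env: Mapping[str, str] = os.environ) -> Tuple[float, float]:
    review = float(env.get("FRAUD_THRESHOLD_REVIEW", "0.5"))  # default is an assumption
    block = float(env.get("FRAUD_THRESHOLD_BLOCK", "0.8"))    # default is an assumption
    if not 0.0 <= review <= block <= 1.0:
        raise ValueError("expected 0 <= review <= block <= 1")
    return review, block


def route_decision(fraud_probability: float, review: float, block: float) -> str:
    # Boundaries are treated as inclusive here; the real policy may differ.
    if fraud_probability >= block:
        return "BLOCKED"
    if fraud_probability >= review:
        return "REVIEW"
    return "APPROVED"
```

Validating the threshold ordering at load time prevents a misconfiguration from silently removing the REVIEW band.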
Tech Stack & Components
Getting Started
1. Local development
Train models with creditcard.csv under ml/data/ before scoring.
python -m venv .venv
pip install -r requirements.txt
cp .env.example .env
python ml/train.py
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
2. Next.js catalog console
Production-style ops console on port 3010.
cd frontend
cp .env.example .env.local
npm install
npm run dev:catalog
3. Quality gates
Run lint and tests before deployment.
ruff check app ml tests
pytest -q
Frequently asked questions
- What is SentinelAI?
- SentinelAI is a self-hosted real-time fraud scoring platform: FastAPI REST and WebSocket service, PostgreSQL audit logs, XGBoost classification with SHAP TreeExplainer explanations, optional Isolation Forest cold-start path, PSI drift monitoring, and optional local Ollama narratives.
- How does SentinelAI explain fraud decisions?
- The supervised path uses SHAP TreeExplainer on XGBoost to produce top-k feature contributions stored as JSON in transaction_logs. ExplanationService optionally generates natural-language summaries via Ollama, with deterministic SHAP-based template fallback when the LLM is unavailable.
- What fraud decisions does SentinelAI return?
- Three-way routing: APPROVED, REVIEW, or BLOCKED based on configurable FRAUD_THRESHOLD_BLOCK and FRAUD_THRESHOLD_REVIEW boundaries applied to fraud probability from XGBoost predict_proba (or Isolation Forest score mapping in cold-start mode).
- How does SentinelAI monitor model drift?
- DriftDetector implements Population Stability Index (PSI) over ten equal-width bins on SHAP-impact aggregates. Operators capture an explicit baseline via POST /api/v1/transactions/drift/baseline; GET /api/v1/transactions/drift reports PSI against that baseline.
- Does SentinelAI send PII to third-party LLM APIs?
- No third-party LLM is required. Core scoring and SHAP run locally. Optional Ollama narratives call your local Ollama instance only. API key auth (AUTH_MODE=api_key), rate limiting, and controlled error disclosure support self-hosted deployment without routing inference through external SaaS.
