SentinelAI: Real-Time Fraud Scoring with XGBoost, SHAP, and Drift Monitoring

Project Summary

SentinelAI is a reference implementation of a real-time fraud scoring platform: a versioned REST + WebSocket service backed by PostgreSQL, tree-based ML (XGBoost) with SHAP explanations, an optional Isolation Forest cold-start path, PSI drift monitoring, and optional local Ollama narratives. The design target is teams needing an auditable, self-hosted scoring tier without routing inference or PII through third-party LLM APIs.

Executive summary

SentinelAI is a fraud decisioning architecture designed for organizations that need low-latency scoring, reproducible model behavior, and audit-ready evidence trails in the same operating model. The system is intentionally built around strict boundaries: transport concerns are isolated from scoring logic, model-serving logic is isolated from persistence and monitoring, and optional explanation enhancement is isolated from the deterministic decision path. This prevents accidental coupling between concerns that change at different velocities and under different risk controls. In practice, that means product teams can move quickly on client interfaces, data science teams can improve models under disciplined governance, and platform teams can harden reliability without introducing decision drift.

The platform adopts a conservative production posture suited to environments where control failures are expensive. Feature contracts are immutable at inference time, transformation code runs in transform-only mode, and every prediction is stored with enough metadata to support replay and forensic analysis. Optional augmentation, including LLM narration for explanations, is treated as non-critical and cannot block the synchronous scoring response. This keeps decision behavior stable during partial outages and protects fraud operations from secondary-system failures. In environments where score outcomes influence payment authorization or account intervention, deterministic behavior under stress is more valuable than marginal feature convenience.

SentinelAI is also designed for multi-team ownership. Data scientists can publish model artifacts and calibration updates without rewriting API handlers. Backend engineers can evolve request contracts and service-level safeguards without touching model internals. SRE and security teams can enforce deployment, observability, and access controls around a stable scoring core. This architecture scales from a single process deployment to a distributed production topology with explicit evolution paths for alert distribution, metrics, artifact management, and governance workflows. The result is a production system that preserves scientific rigor while meeting enterprise uptime, security, and compliance requirements.

From a model risk perspective, SentinelAI operationalizes three principles: deterministic serving, explicit lineage, and measurable drift response. Deterministic serving ensures the same input semantics produce the same decision under a fixed model version. Explicit lineage ensures every score can be traced to the exact artifact bundle, schema version, and runtime policy that produced it. Measurable drift response ensures teams do not react to noisy signals or anecdotal incidents; they respond to structured evidence across PSI, decision-mix movement, latency patterns, and business outcomes. These principles reduce both false confidence and panic-driven interventions.

Executive summary design choice table

Dimension-level choices that shape reliability, governance, and scale.

Dimension	Design choice
Inference contract	Serve with immutable feature schema, ordered columns, and fixed transform semantics; no runtime refitting
Decision path isolation	Keep core score computation independent from optional explanation enrichment and external notification fan-out
Orchestration style	Use thin routers plus dependency-injected services to isolate protocol, business logic, and infrastructure concerns
Persistence policy	Store prediction, explanation, lineage identifiers, and correlation metadata as immutable audit records
Monitoring model	Treat latency, decision mix, and drift as first-class runtime signals with structured emission paths
Security model	Apply layered controls across auth, schema validation, rate limiting, transport protection, and error sanitization
Scale strategy	Maintain stateless scoring replicas and externalize shared state concerns to managed infrastructure components
Governance ergonomics	Version baselines, model bundles, and API contracts so investigations and rollbacks remain deterministic

Layered architecture

The layered architecture is not an aesthetic preference; it is a control strategy for high-consequence decisions. Client-facing channels feed a transport layer that handles protocol concerns and request hygiene. The transport layer delegates to dependency-injected services that own deterministic scoring, explanation assembly, persistence writes, and telemetry emission. Data plane resources such as PostgreSQL and model artifacts remain behind explicit interfaces so runtime behavior is inspectable and testable. Optional capabilities, including local LLM narration, are attached at well-defined boundaries where timeouts and fallback behavior can be enforced without touching score correctness.

Dependency injection is the key in-process control mechanism. It turns implicit global state into explicit constructor dependencies and allows startup checks to fail fast before unsafe traffic is admitted. Artifact loading, schema compatibility checks, repository wiring, and telemetry sinks can all be validated during boot. This reduces partial-initialization risk and makes readiness semantics meaningful. In fraud systems, a process that is alive but not decision-safe should never appear healthy to an orchestrator. The layered architecture plus DI makes that distinction practical to enforce.
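The fail-fast wiring described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual wiring: `AppContainer`, `build_container`, `StartupError`, and the three factory parameters are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


class StartupError(RuntimeError):
    """Raised when a scoring-critical dependency fails its boot check."""


@dataclass(frozen=True)
class AppContainer:
    # Hypothetical container: every dependency is an explicit field,
    # so nothing is pulled from implicit global state at request time.
    model: object
    repository: object
    tracker: object


def build_container(
    load_model: Callable[[], object],
    make_repository: Callable[[], object],
    make_tracker: Callable[[], object],
) -> AppContainer:
    """Construct every dependency up front; any failure aborts startup."""
    factories = {
        "model": load_model,
        "repository": make_repository,
        "tracker": make_tracker,
    }
    built = {}
    for name, factory in factories.items():
        try:
            built[name] = factory()
        except Exception as exc:
            # Fail fast: the process should never report ready half-built.
            raise StartupError(f"{name} failed startup check: {exc}") from exc
    return AppContainer(**built)
```

Because the container is frozen and built in one step, a process either holds a complete dependency graph or never finishes booting, which is what makes a readiness signal trustworthy.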

The data plane remains intentionally simple: PostgreSQL for durable decision history and model artifacts on disk (or mounted volume) for predictable startup behavior. This minimizes moving parts in the core path while preserving migration options. As traffic grows, artifact distribution can move to a registry-backed pull model and decision storage can scale behind pooling and replication, but the in-process contract remains stable. Because contract boundaries are explicit, evolutionary changes do not require retraining teams or rewriting the client ecosystem.

High-level architecture

The architecture also encodes explicit degradation semantics. If database dependencies fail, readiness fails closed while liveness stays available for diagnosis. If optional explanation narration is delayed, deterministic explanation fallback still satisfies response contracts. If alert consumers lag, scoring and persistence continue while fan-out retries degrade independently. These rules prevent ambiguous behavior during incidents and make runbooks enforceable. Teams can reason about what the system guarantees under each failure mode, which is essential for business-critical fraud intervention workflows.

Component catalog

Module inventory across layers and operational responsibilities.

Layer	Module	Responsibility
Client	API Gateway / BFF	Apply edge authentication, TLS termination, and coarse traffic shaping before forwarding API calls
Client	Streamlit Dashboard	Provide analyst-facing workflows for score review, transaction inspection, and case triage
Client	Alert consumers	Consume non-approved decision events for intervention, case generation, and workflow orchestration
Transport	FastAPI routers	Validate payloads, enforce response contracts, and map protocol semantics to service operations
Transport	Middleware stack	Handle CORS, rate limiting, request logging, and correlation context propagation
Application	Dependency injection wiring	Construct and bind repositories, services, trackers, and configuration for deterministic process startup
Application	PredictionService	Orchestrate feature generation, model inference, threshold mapping, and normalized decision objects
Application	FeatureService	Build model-ready feature vectors from transaction context under train-serve aligned semantics
Application	feature_engineering	Execute canonical transform-only feature engineering with no runtime fitting side effects
Application	ExplanationService	Generate deterministic rationale and optional narrative overlays under strict timeout and fallback policies
Application	AlertService	Broadcast REVIEW or BLOCKED events with retry-aware, failure-isolated delivery behavior
Persistence	TransactionRepository	Persist immutable decision records and expose retrieval APIs for operations and audit workflows
Monitoring	PerformanceTracker	Capture latency histograms, throughput counters, and decision-mix metrics for SLO tracking
Monitoring	Drift monitor	Compute PSI and related drift indicators against approved baseline feature distributions
Data plane	PostgreSQL	Store predictions, explanations, drift metadata, and lineage fields in durable relational storage
Data plane	Model artifacts on disk	Host model binaries, preprocessing state, and schema contracts loaded and validated at startup
Optional	Ollama LLM	Generate local explanation narration without requiring external LLM network egress
Operations	Runbook and incident policies	Define escalation, fallback, and rollback procedures tied to measurable service and model-health signals

A cataloged architecture materially improves ownership clarity and incident response velocity. Engineers can map symptoms to modules quickly, define module-specific SLOs, and attach focused runbooks to each boundary. Governance and model risk teams can trace every control objective to concrete implementation components rather than abstract process documents. During post-incident reviews, this structure allows the team to separate root cause from propagation path and prioritize corrective actions that reduce future blast radius.

Scoring sequence

SentinelAI scoring follows a strict sequence of operations designed for reproducibility, bounded latency, and complete observability. The router validates payload structure and dispatches CPU-heavy work to a thread pool so asynchronous request handling remains responsive under load. Feature construction and model inference produce a normalized prediction object that includes confidence and decision class metadata. Explanation generation follows with deterministic fallback behavior. Persistence and telemetry recording occur as first-class side effects, not optional hooks, ensuring every response corresponds to a durable operational record.

1. Client submits `POST /api/v1/predict` with a typed JSON transaction payload.
2. Predict router validates the request and forwards scoring work to a thread pool to protect event-loop responsiveness.
3. Prediction service calls `FeatureService` to build model input under train-serve aligned transformation rules.
4. `feature_engineering` applies transform-only logic and returns a fixed-order engineered feature row.
5. Feature row is converted to model-ready ndarray input and scored by the loaded artifact bundle.
6. Prediction service returns a normalized `PredictionResult` to the router with decision and confidence fields.
7. Router invokes `ExplanationService` in a thread pool and obtains deterministic explanation text with fallback.
8. Router persists prediction and explanation through `TransactionRepository` with model and schema lineage metadata.
9. Router records latency and decision metrics through `PerformanceTracker` for SLO and risk-performance analysis.
10. If decision class is not `APPROVED`, router emits alert events through `AlertService`.
11. Router returns `PredictionWithExplanation` to the client under a stable API v1 response contract.
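The thread-pool hand-off in the steps above can be illustrated with the standard library alone. This is a sketch under assumptions: `score_sync` is a hypothetical stand-in for CPU-bound inference, and the real router is a FastAPI handler rather than the bare coroutine shown here.

```python
import asyncio


def score_sync(features: list[float]) -> float:
    """Hypothetical stand-in for blocking, CPU-bound model inference."""
    return min(1.0, max(0.0, sum(features) / len(features)))


async def predict(features: list[float]) -> dict:
    # asyncio.to_thread runs the blocking call in the default thread pool,
    # so the event loop stays free to accept other requests meanwhile.
    probability = await asyncio.to_thread(score_sync, features)
    return {"fraud_probability": probability}


async def main() -> list[dict]:
    # Several requests are scored concurrently without blocking the loop.
    return await asyncio.gather(
        predict([0.1, 0.3]),
        predict([0.9, 0.7]),
        predict([0.5, 0.5]),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`asyncio.gather` preserves argument order, so results line up with the submitted requests even when the worker threads finish out of order.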

This sequence enables robust validation at multiple levels. Contract tests can verify schema and response invariants at the router boundary. Service tests can verify deterministic feature mapping and threshold logic with fixture-driven assertions. Integration tests can confirm persistence durability, alert branching behavior, and telemetry completeness. Because each stage is explicitly represented, latency budgets can be assigned per stage and optimized with precision rather than broad guesswork. In production, that translates into faster remediation and more credible performance commitments.
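As one example of the threshold logic that such service tests would pin down, here is a minimal sketch of three-way routing. The function name, the default threshold values, and the inclusive boundary handling are assumptions for illustration; in the project the boundaries come from the configurable `FRAUD_THRESHOLD_REVIEW` and `FRAUD_THRESHOLD_BLOCK` settings.

```python
from enum import Enum


class Decision(str, Enum):
    APPROVED = "APPROVED"
    REVIEW = "REVIEW"
    BLOCKED = "BLOCKED"


def route_decision(
    fraud_probability: float,
    review_threshold: float = 0.5,  # illustrative default, not from the project
    block_threshold: float = 0.8,   # illustrative default, not from the project
) -> Decision:
    """Map a fraud probability to one of three decision classes."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be in [0, 1]")
    if review_threshold > block_threshold:
        raise ValueError("review threshold must not exceed block threshold")
    # Assumption: a score exactly on a boundary takes the stricter class.
    if fraud_probability >= block_threshold:
        return Decision.BLOCKED
    if fraud_probability >= review_threshold:
        return Decision.REVIEW
    return Decision.APPROVED
```

Keeping the mapping a pure function of the score and two thresholds makes boundary cases easy to cover with fixtures such as `route_decision(0.5)` and `route_decision(0.8)`.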

Operationally, the sequence supports clear semantics for partial failure handling. If explanation generation times out, deterministic fallback explanation text is returned and the score path remains intact. If alert delivery to consumers fails, the client still receives the decision while delivery retry and failure tracking occur in isolated paths. If database persistence fails, the request can be failed explicitly according to policy, preserving consistency between user-visible outcomes and system-of-record evidence. Explicit semantics are critical in fraud systems because silent side-effect loss can create governance and legal risk.
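The explanation fallback described above might look like the following sketch. It assumes the narrator is an async callable and that contributions arrive as a name-to-value mapping; `template_explanation`, `explain`, the timeout default, and the feature names used in the test data are hypothetical.

```python
import asyncio


def template_explanation(contributions: dict[str, float], top_k: int = 3) -> str:
    """Deterministic fallback: rank features by absolute contribution."""
    # Ties are broken by feature name so the output is fully reproducible.
    ranked = sorted(contributions.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    parts = [f"{name} ({value:+.2f})" for name, value in ranked[:top_k]]
    return "Top contributing features: " + ", ".join(parts)


async def explain(contributions: dict[str, float], narrate, timeout_s: float = 0.5) -> str:
    """Try the optional narrator; fall back to the template on timeout or error."""
    try:
        return await asyncio.wait_for(narrate(contributions), timeout=timeout_s)
    except Exception:
        # Timeouts and narrator failures both land here; the score path
        # is unaffected because the fallback needs no external service.
        return template_explanation(contributions)
```

The design point is that the fallback depends only on data already produced by the scoring step, so an unavailable narrator can delay a response by at most the timeout.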

ML train-serve contract

The train-serve contract is one of the strongest safeguards of long-term score integrity in production ML systems. SentinelAI enforces a strict contract between training outputs and serving inputs: fixed feature names, fixed ordering, explicit dtypes, stable categorical treatment, and frozen transformation behavior. Artifact bundles include both model state and schema metadata so contract checks are machine-verifiable at startup. This protects against silent regressions caused by pipeline drift, accidental feature reorder, implicit type coercion, and mutable preprocessing behavior.

Startup validation blocks readiness if any required artifact or schema constraint is missing or inconsistent. Inference-time feature engineering remains pure and side-effect free: no adaptive encoders, no online fit operations, and no schema mutation by request traffic. Predictions are persisted with model version and schema version identifiers so historical events can be replayed exactly with the original contract context. This replayability supports threshold recalibration analysis, disputed-decision review, and model risk governance obligations.
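A startup contract check of this kind could be sketched as below. The `FeatureSchema` shape and `check_contract` function are assumptions made for illustration; the project's artifact bundle format is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSchema:
    # Hypothetical schema record bundled with a model artifact.
    version: str
    columns: tuple[tuple[str, str], ...]  # ordered (name, dtype) pairs


def check_contract(trained: FeatureSchema, serving: FeatureSchema) -> list[str]:
    """Return human-readable violations; an empty list means compatible."""
    problems = []
    if trained.version != serving.version:
        problems.append(f"schema version {serving.version} != {trained.version}")
    t_names = [name for name, _ in trained.columns]
    s_names = [name for name, _ in serving.columns]
    if set(t_names) != set(s_names):
        missing = sorted(set(t_names) - set(s_names))
        extra = sorted(set(s_names) - set(t_names))
        problems.append(f"column set differs: missing={missing}, extra={extra}")
    elif t_names != s_names:
        # Same columns in a different order would silently misalign features.
        problems.append("column order differs from training order")
    else:
        for (name, t_dtype), (_, s_dtype) in zip(trained.columns, serving.columns):
            if t_dtype != s_dtype:
                problems.append(f"dtype mismatch for {name}: {s_dtype} != {t_dtype}")
    return problems
```

A readiness check can then refuse traffic whenever the returned list is non-empty, and log the violations for the operator.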

A strong contract also improves experimentation quality. Candidate models can run in shadow mode against live traffic with confidence that observed deltas reflect model behavior rather than serve-time transformation mismatch. Teams can evaluate lift and stability by segment, run phased threshold migration, and execute controlled champion-challenger transitions with deterministic rollback points. In high-churn fraud environments, this reduces both missed-fraud exposure and unnecessary customer friction caused by unstable rule changes.

Package model binaries, preprocessors, and schema metadata as a single versioned artifact contract.
Fail readiness when model artifact, schema lineage, or compatibility checks do not pass.
Enforce transform-only inference with fixed ordering, dtype discipline, and deterministic mapping logic.
Persist model and schema identifiers with each decision to support replay and audit reconstruction.
Run shadow scoring and segment-level evaluation before promoting candidate models to primary traffic.
Use explicit rollback policy keyed by model-version lineage rather than ad hoc hotfix changes.

Security table

Security in SentinelAI is implemented as layered controls mapped to concrete risks, not as an afterthought checklist. Controls are integrated into ingress, runtime, storage, and observability paths so secure behavior is the default operating mode. The objective is to reduce exploitability while preserving decision throughput and operational visibility.

Defense-in-depth controls across ingress, runtime, storage, and monitoring.

Control area	Implementation	Risk addressed
Authentication and authorization	Gateway token validation plus API key policy with route-level authorization constraints	Unauthorized scoring access and privilege abuse
Transport protection	TLS at ingress and optional mTLS for service-to-service paths	Traffic interception and request tampering
Abuse prevention	SlowAPI budgets with source-aware throttling and burst controls	DoS behavior, brute-force probing, and resource starvation
Payload integrity	Strict schema validation with constrained fields, enums, and type safety	Injection vectors and malformed payload exploitation
Error sanitization	Structured error envelopes with correlation IDs and hidden internal stack details	Reconnaissance via stack traces or sensitive diagnostics
Secrets discipline	External secret injection and scheduled rotation with no secrets in repository history	Credential leakage and long-lived secret exposure
Data minimization	Persist only decision-relevant fields and redact or hash sensitive identifiers when possible	PII over-retention and compliance exposure
Auditability	Immutable decision records with timestamping, lineage metadata, and trace correlation	Poor forensic traceability during incidents and disputes
Optional LLM isolation	Local Ollama runtime with optional disable flag and strict timeout boundaries	External data egress and residency-control violations
Supply chain hygiene	Pinned dependencies, vulnerability scanning, and signed release artifacts	Compromised package risk and unverified runtime binaries
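For the data minimization row above, one common way to hash sensitive identifiers is a keyed hash (HMAC), which keeps records joinable without storing the raw value. This is a sketch under assumptions: the function names, field names, and the choice of HMAC-SHA256 are illustrative and not confirmed by the project.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Keyed hash so raw identifiers never reach the audit table."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def minimize_record(record: dict, sensitive_fields: set[str], secret_key: bytes) -> dict:
    """Copy a record, replacing sensitive values with keyed hashes."""
    return {
        key: pseudonymize(str(value), secret_key) if key in sensitive_fields else value
        for key, value in record.items()
    }
```

The same identifier always maps to the same digest under one key, so analysts can still group decisions by account, while rotating the key breaks linkage to older records.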

Health endpoints table

Health signaling separates process vitality from scoring readiness so orchestration can make safe routing decisions. Liveness should remain stable during transient dependency incidents, while readiness must fail when scoring invariants cannot be guaranteed. This distinction avoids accidental traffic admission to partially initialized instances.

Operational probes that separate process vitality from scoring readiness.

Endpoint	Purpose	Expected behavior
`GET /api/v1/health/live`	Liveness probe for process viability	Returns success while process is alive, independent of model artifact and database availability
`GET /api/v1/health/ready`	Readiness probe for safe traffic admission	Returns success only when scoring-critical dependencies and contracts pass startup/runtime checks
`GET /`	Root service metadata and smoke-test endpoint	Returns lightweight service metadata for ingress and deploy verification
`GET /api/v1/health/model`	Model and schema integrity diagnostics	Reports artifact load status and fails when model-contract elements are inconsistent
`GET /api/v1/health/db`	Database path diagnostics	Reports connectivity and basic query viability for targeted remediation
`GET /api/v1/health/dependencies`	Aggregated dependency diagnostics	Provides summarized dependency status for diagnostics while preserving error sanitization policy
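The split between the liveness and readiness rows above can be sketched framework-free. The function names, the check registry, and the use of status 503 for "not ready" are assumptions for illustration; the actual handlers are FastAPI routes.

```python
from typing import Callable


def liveness() -> dict:
    # Liveness only proves the process can answer; no dependency is touched.
    return {"status": "alive"}


def readiness(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict]:
    """Run every scoring-critical check; any failure yields HTTP 503."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # Report the failure without leaking exception details.
            results[name] = False
    ready = all(results.values())
    return (200 if ready else 503), {"ready": ready, "checks": results}
```

Returning per-check booleans rather than exception text keeps the probe useful for diagnosis while staying consistent with the error sanitization policy.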

API v1 table

API v1 is the stable contract layer between SentinelAI and surrounding systems, including gateways, analyst interfaces, and operational tooling. The versioned surface includes scoring, retrieval, drift operations, and health diagnostics. REST endpoints provide durable request-response workflows, while WebSocket channels support near-real-time alert streaming for intervention processes that benefit from push semantics.

Versioned API surface for scoring, retrieval, monitoring, and alert streaming.

Route	Method	Responsibility
`/api/v1/predict`	POST	Validate payload, execute deterministic scoring sequence, persist result, and conditionally emit alerts
`/api/v1/transactions`	GET	Return paginated decision history with filtering by time range, class, score, and metadata facets
`/api/v1/transactions/{id}`	GET	Return complete decision artifact including explanation text and lineage metadata
`/api/v1/transactions/drift`	GET	Expose current PSI and supporting drift summaries for monitoring consumers and dashboards
`/api/v1/transactions/drift/baseline`	POST	Create or update baseline distributions used in rolling drift comparison windows
`/api/v1/health/live`	GET	Expose liveness for orchestrators and synthetic monitoring
`/api/v1/health/ready`	GET	Expose readiness based on scoring-critical dependency checks
`/ws/alerts`	WebSocket	Stream REVIEW and BLOCKED events to subscribed clients for near-real-time intervention
`/api/v1/transactions/export`	GET	Export filtered decision records for governance review, retrospective analysis, and investigations
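The in-process fan-out behind `/ws/alerts` can be approximated with a small broadcaster that keeps a bounded replay history for reconnecting clients. This is a sketch, not the project's `AlertService`: the class name, the `history_size` default, and the event shape are hypothetical.

```python
import asyncio
from collections import deque


class AlertBroadcaster:
    """In-process fan-out of non-approved decisions with bounded replay."""

    def __init__(self, history_size: int = 50) -> None:
        # deque(maxlen=...) silently drops the oldest event when full.
        self._history: deque = deque(maxlen=history_size)
        self._subscribers: set = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        # Replay recent events so a reconnecting client can catch up.
        for event in self._history:
            queue.put_nowait(event)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, event: dict) -> bool:
        """Broadcast REVIEW/BLOCKED events; APPROVED events are skipped."""
        if event.get("decision") == "APPROVED":
            return False
        self._history.append(event)
        for queue in self._subscribers:
            queue.put_nowait(event)
        return True
```

Each WebSocket connection would own one queue and forward its items to the client. Because the state lives in one process, this approach only reaches clients connected to the same replica.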

Drift monitoring PSI section

SentinelAI uses Population Stability Index (PSI) as a practical first-line drift signal for production monitoring. Baseline distributions are defined over approved historical windows and compared to rolling production windows using the same engineered features consumed by the model. Fixed binning and stable feature semantics preserve comparability across time and model versions. This turns PSI into an interpretable operational indicator rather than an unstable metric artifact.
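For reference, PSI sums (actual_i - expected_i) * ln(actual_i / expected_i) over bins, where expected_i and actual_i are the baseline and current proportions in bin i. The sketch below uses ten equal-width bins fixed by the baseline range; the function name, the epsilon floor for empty bins, and the clamping of out-of-range values are assumptions rather than the project's `DriftDetector` implementation.

```python
import math


def psi(baseline: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins fixed by the baseline."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range production values land in the edge bins.
            idx = min(bins - 1, max(0, int((v - lo) / width)))
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Deriving bin edges from the baseline alone is what keeps values comparable across windows: the current sample never moves the bins it is measured against.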

PSI alone is not sufficient for action. Elevated PSI may indicate fraud pattern changes, traffic-mix shifts, instrumentation drift, data quality defects, or benign seasonality. SentinelAI therefore correlates PSI with decision-mix movement, service latency, and downstream fraud-confirmation outcomes before escalating policy actions. Escalation follows tiered runbooks: informational drift triggers observation, warning drift triggers deep diagnostics and segment-level analysis, and critical drift triggers governance review with bounded mitigation options such as threshold tuning, feature gate controls, or expedited retraining.

Baseline governance is versioned, auditable, and decision-relevant. Each baseline update records data window rationale, quality diagnostics, sampling controls, and approver metadata. This allows teams to reconstruct drift alerts in historical context even after baseline refreshes. It also prevents reactive retraining loops that chase metric noise without business validation. In mature fraud programs, disciplined baseline governance is the difference between stable improvement and continuous operational churn.

Define baseline windows from instrumentation-stable, quality-screened production data.
Compute PSI with fixed bins and immutable feature semantics to maintain metric comparability.
Evaluate PSI jointly with decision-mix shifts, latency movement, and realized case outcomes.
Apply tiered thresholds with explicit runbooks for observation, diagnostics, and governance escalation.
Persist baseline lineage and approvals so historical alerts remain reconstructable during audits.
Use drift evidence to prioritize targeted mitigations before broad retraining campaigns.
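The tiered thresholds in the list above could be encoded as a small lookup. The cut-offs 0.1 and 0.25 are commonly quoted rule-of-thumb values for PSI and are used here purely as illustrative defaults; the project's actual thresholds, and the names `drift_tier` and `RUNBOOK_ACTION`, are assumptions.

```python
# Hypothetical mapping from drift tier to the runbook step it triggers.
RUNBOOK_ACTION = {
    "informational": "observation",
    "warning": "diagnostics and segment-level analysis",
    "critical": "governance review",
}


def drift_tier(psi_value: float, warning: float = 0.1, critical: float = 0.25) -> str:
    """Classify a PSI value into an escalation tier (illustrative cut-offs)."""
    if psi_value < 0:
        raise ValueError("PSI cannot be negative")
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if psi_value >= critical:
        return "critical"
    if psi_value >= warning:
        return "warning"
    return "informational"
```

A tier on its own is only a prompt to look closer; as the list notes, it should be read together with decision-mix shifts, latency movement, and case outcomes before any mitigation is chosen.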

Horizontal scaling table

Horizontal scaling in SentinelAI preserves deterministic request-time scoring while moving shared-state concerns to dedicated infrastructure. This keeps replica behavior consistent and reduces state-coupling failures under autoscaling events. The path from compact deployment to enterprise topology is incremental: evolve one capability at a time, validate with shadow traffic, and preserve API and model-serving contracts throughout.

Current posture and enterprise-scale extension path by operational topic.

Topic	Current	Enterprise extension
API runtime	Single FastAPI instance with local dependency graph	Stateless multi-replica deployment behind managed load balancer and autoscaling
Alert fan-out	In-process WebSocket broadcast	Redis Pub/Sub or Kafka backbone for cross-replica distribution and durability
Metrics path	Local counters via `PerformanceTracker`	Prometheus or OpenTelemetry export with centralized aggregation and SLO alerting
Artifact distribution	Local disk artifact mount	Signed object storage or model registry promotion pipeline with rollback gates
Database connectivity	Direct Postgres connection from API process	Managed pooling proxy, failover-aware topology, and read replica strategy
Background workloads	Light auxiliary work in API process	Dedicated worker tier for heavy explanation generation, drift backfill, and enrichment jobs
Release management	Single deploy with manual smoke checks	Canary or blue-green rollout with automated rollback and health gate policies
Secrets posture	Environment variable injection per deployment	Centralized secret manager with rotation and runtime identity-based access controls
Observability governance	Service logs and local dashboards	Centralized traces, metrics, long-retention logs, and incident-driven observability standards
Multi-region resiliency	Single-region deployment	Active-passive or active-active region strategy with explicit data and alert replication policies

The most common scaling anti-pattern is increasing replica count before defining cross-replica semantics for alerts, metrics, and artifact synchronization. SentinelAI avoids that by explicitly declaring which operations stay in request-time deterministic scope and which move to distributed infrastructure. This improves failover behavior, protects decision consistency under burst load, and reduces governance friction because control boundaries remain explicit and testable.

Enterprise adoption typically proceeds in phases: first harden observability and health semantics, then externalize alert fan-out, then strengthen artifact promotion and release controls, and finally adopt multi-region resilience where business continuity requirements demand it. This sequence minimizes risk by avoiding architecture overreach before operational maturity exists to support it.

Next.js ops console

A Next.js operations console provides a practical human control plane for SentinelAI without coupling UI behavior to backend internals. The console can consume API v1 contracts to present live throughput, class distribution, latency percentiles, drift summaries, readiness status, and searchable transaction history with lineage context. Because the UI is contract-bound and stateless, product teams can iterate quickly on analyst workflows while backend teams preserve strict decision and governance controls. In daily operations, this reduces ad hoc database access, shortens incident triage, and gives fraud analysts, data scientists, and SREs a shared operational view anchored to immutable decision records.

For mature teams, the console also becomes a policy-communication surface. It can expose baseline version status, current model release lineage, and runbook-triggered recommendations when drift or latency thresholds are breached. By presenting this information directly from contract-safe endpoints, organizations maintain a clean separation between observability and control while improving response coordination across technical and business stakeholders.

Key Features & Capabilities

FastAPI async serving with ML inference in thread pool to avoid blocking the event loop
XGBoost primary classifier with Isolation Forest cold-start and SHAP TreeExplainer explanations
Three-way decision routing: APPROVED, REVIEW, and BLOCKED with configurable thresholds
PSI drift monitoring over persisted SHAP explanations with explicit baseline capture
WebSocket alert channel for BLOCKED/REVIEW decisions with bounded reconnect history
Next.js ops console with KPIs, decision mix, WebSocket incident stream, and scoring sandbox

Tech Stack & Components

Python 3.11, FastAPI, Uvicorn, XGBoost, SHAP, PostgreSQL, SQLAlchemy 2.x async, Ollama (optional), Next.js, Streamlit, Docker Compose

Getting Started

1. Local development

Place creditcard.csv under ml/data/ and train the models before scoring.

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
python ml/train.py          # expects ml/data/creditcard.csv
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

2. Next.js catalog console

Production-style ops console on port 3010.

cd frontend
cp .env.example .env.local
npm install
npm run dev:catalog

3. Quality gates

Run lint and tests before deployment.

ruff check app ml tests
pytest -q

Frequently asked questions

What is SentinelAI?
SentinelAI is a self-hosted real-time fraud scoring platform: FastAPI REST and WebSocket service, PostgreSQL audit logs, XGBoost classification with SHAP TreeExplainer explanations, optional Isolation Forest cold-start path, PSI drift monitoring, and optional local Ollama narratives.

How does SentinelAI explain fraud decisions?
The supervised path uses SHAP TreeExplainer on XGBoost to produce top-k feature contributions stored as JSON in transaction_logs. ExplanationService optionally generates natural-language summaries via Ollama, with deterministic SHAP-based template fallback when the LLM is unavailable.

What fraud decisions does SentinelAI return?
Three-way routing: APPROVED, REVIEW, or BLOCKED based on configurable FRAUD_THRESHOLD_BLOCK and FRAUD_THRESHOLD_REVIEW boundaries applied to fraud probability from XGBoost predict_proba (or Isolation Forest score mapping in cold-start mode).

How does SentinelAI monitor model drift?
DriftDetector implements Population Stability Index (PSI) over ten equal-width bins on SHAP-impact aggregates. Operators capture an explicit baseline via POST /api/v1/transactions/drift/baseline; GET /api/v1/transactions/drift reports PSI against that baseline.

Does SentinelAI send PII to third-party LLM APIs?
No third-party LLM is required. Core scoring and SHAP run locally. Optional Ollama narratives call your local Ollama instance only. API key auth (AUTH_MODE=api_key), rate limiting, and controlled error disclosure support self-hosted deployment without routing inference through external SaaS.
