A Specialized and Secure AI Orchestrator for Swiss Financial Compliance
The AI Orchestrator is designed as a modular, on-premise system that processes compliance documents through multiple stages: digitization, extraction, mapping, and pre-filling.
+------------------------------------------------------------------+
| AI Orchestrator |
+------------------------------------------------------------------+
| |
| +-------------+ +-------------+ +-------------+ +--------+ |
| | Document | | Field | | Schema | | Pre- | |
| | Digitization|-->| Extraction |-->| Mapping |-->| Filling| |
| +-------------+ +-------------+ +-------------+ +--------+ |
| | | | | |
| v v v v |
| +-------------+ +-------------+ +-------------+ +--------+ |
| | OCR Engine | | LLM Engine | | Embedding | | Form | |
| | | | | | Engine | | Filler | |
| +-------------+ +-------------+ +-------------+ +--------+ |
| |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| Wecan Comply |
+------------------------------------------------------------------+
| Component | Purpose | Technology |
|---|---|---|
| Document Ingestion | Upload and preprocessing | FastAPI, PyMuPDF |
| OCR Engine | Text extraction from scans | Tesseract, PaddleOCR |
| LLM Engine | Field extraction and understanding | Llama 3, vLLM |
| Embedding Engine | Semantic matching | Sentence-BERT, E5 |
| Schema Mapper | Zero-shot field mapping | Custom ML |
| Form Filler | Document pre-population | PyPDF2, openpyxl |
| Review Interface | Human-in-the-loop | React, FastAPI |
| API Gateway | External integration | FastAPI |
Document Upload
|
v
+------------------------------------------------------------------+
| DOCUMENT INGESTION |
| - Format detection (PDF/Image/Excel) |
| - Page extraction |
| - Quality assessment |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| OCR PROCESSING |
| - Text extraction (native or scanned) |
| - Layout analysis |
| - Table detection |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| LLM EXTRACTION |
| - Field identification |
| - Value extraction |
| - Hallucination detection |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| SCHEMA MAPPING |
| - Target schema analysis |
| - Zero-shot field matching |
| - Confidence scoring |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| PRE-FILLING |
| - Template detection |
| - Field population |
| - Validation |
+------------------------------------------------------------------+
|
v
Review & Export
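The five stages above can be sketched as a sequential pipeline of functions, each enriching a shared document record. Everything below (stage bodies, field names, the sample document) is illustrative, not the project's actual code:

```python
# Minimal sketch of the processing pipeline: each stage takes the working
# document dict and returns it enriched. Stage bodies are placeholders for
# the real OCR, LLM, embedding, and form-filling engines.

def ingest(doc):
    doc["format"] = "pdf" if doc["filename"].endswith(".pdf") else "unknown"
    return doc

def ocr(doc):
    doc["text"] = doc.get("raw_text", "")  # native text layer or OCR output
    return doc

def extract_fields(doc):
    # Placeholder for LLM-based field extraction.
    doc["fields"] = {"client_name": "ACME AG"} if "ACME" in doc["text"] else {}
    return doc

def map_schema(doc):
    # Placeholder for zero-shot schema mapping.
    doc["mapped"] = {k.upper(): v for k, v in doc["fields"].items()}
    return doc

def prefill(doc):
    doc["status"] = "ready_for_review"
    return doc

PIPELINE = [ingest, ocr, extract_fields, map_schema, prefill]

def run_pipeline(doc):
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run_pipeline({"filename": "kyc.pdf", "raw_text": "Client: ACME AG"})
print(result["status"])  # ready_for_review
```

In the real system each stage would delegate to its engine and attach per-stage confidence scores for the review step.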
# API Endpoint Structure
POST /api/v1/documents
- Upload document for processing
- Returns: document_id, status
GET /api/v1/documents/{id}
- Get document status and results
- Returns: status, extracted_fields, confidence
POST /api/v1/documents/{id}/prefill
- Generate pre-filled document
- Returns: filled_document_url
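A typical client flow is upload, then poll `GET /api/v1/documents/{id}` until the status leaves `processing`. The sketch below shows that polling loop with a stubbed fetch function standing in for the HTTP call; the response fields match the endpoint list above, but the exact status values are assumptions:

```python
import time

def poll_until_done(fetch_status, doc_id, interval=0.0, max_attempts=10):
    """Poll GET /api/v1/documents/{id} until status leaves 'processing'.

    fetch_status is any callable returning the parsed JSON response;
    in production it would wrap an authenticated HTTP GET.
    """
    for _ in range(max_attempts):
        resp = fetch_status(doc_id)
        if resp["status"] != "processing":
            return resp
        time.sleep(interval)
    raise TimeoutError(f"document {doc_id} still processing")

# Stub standing in for the real HTTP call: two in-flight responses,
# then a completed one.
_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed",
     "extracted_fields": {"iban": "CH93..."},
     "confidence": 0.97},
])

def fake_fetch(doc_id):
    return next(_responses)

final = poll_until_done(fake_fetch, "doc-123")
print(final["confidence"])  # 0.97
```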
Supported Formats:
| Format | Handler | Notes |
|---|---|---|
| PDF (native) | PyMuPDF | Direct text extraction |
| PDF (scanned) | Tesseract + PaddleOCR | OCR processing |
| Images | PIL + OCR | JPEG, PNG, TIFF |
| Excel | openpyxl | Cell-by-cell extraction |
| Word | python-docx | Content control extraction |
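Format detection during ingestion is more reliable on file signatures (magic bytes) than on extensions. A minimal dispatcher for the formats in the table, using their well-known signatures:

```python
def detect_format(data: bytes) -> str:
    """Route an upload by file signature rather than filename extension."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"II*\x00") or data.startswith(b"MM\x00*"):
        return "tiff"
    if data.startswith(b"PK\x03\x04"):
        # XLSX and DOCX are both ZIP containers, so they share the 'PK'
        # signature; telling them apart requires inspecting the archive.
        return "office_zip"
    return "unknown"

print(detect_format(b"%PDF-1.7 ..."))  # pdf
```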
OCR Pipeline:
+------------------+
|   OCR Manager    |
+------------------+
      |
      +---> PyMuPDF (native PDFs)
      |
      +---> Tesseract (scanned, general)
      |
      +---> PaddleOCR (tables, complex layouts)
      |
      +---> Docling (document understanding)
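The OCR Manager's routing can be written as a small dispatcher. The decision inputs (native text layer, table count, layout complexity) would come from a quick probe of the page, e.g. with PyMuPDF; the heuristics here are illustrative assumptions, not tuned project logic:

```python
def pick_ocr_engine(has_text_layer: bool, table_count: int,
                    complex_layout: bool) -> str:
    """Choose the extraction backend per the OCR Manager routing above.

    Thresholds and inputs are illustrative; a real probe would also
    consider scan quality and language.
    """
    if has_text_layer:
        return "pymupdf"        # native PDFs: extract text directly, no OCR
    if table_count > 0 or complex_layout:
        return "paddleocr"      # tables and complex layouts
    return "tesseract"          # general scanned pages

print(pick_ocr_engine(False, 2, False))  # paddleocr
```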
Model Architecture:
Base Model (7-13B parameters)
|
v
+------------------+
| LoRA Adapters | <-- Domain-specific
+------------------+
|
v
+------------------+
| vLLM Inference | <-- Optimized serving
+------------------+
|
v
+------------------+
| Hallucination | <-- Output validation
| Detector |
+------------------+
Supported Models:
| Model | Parameters | Memory | Speed |
|---|---|---|---|
| Llama 3 8B | 8B | 16GB | Fast |
| Mistral 7B | 7B | 14GB | Fast |
| Qwen2 14B | 14B | 28GB | Medium |
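The memory column is roughly weights-only arithmetic: at fp16 each parameter costs 2 bytes, so an 8B model needs about 16 GB before KV cache and activation overhead. A quick check:

```python
def weight_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory at a given precision (fp16 = 2 bytes/param)."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(8))     # 16.0  (matches the Llama 3 8B row)
print(weight_memory_gb(14))    # 28.0  (matches the Qwen2 14B row)
print(weight_memory_gb(8, 1))  # 8.0   (int8 halves the footprint)
```

Int8 quantization (1 byte per parameter) halves the weight footprint, which is why the inference configuration that follows enables it.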
Inference Configuration:
```yaml
model:
  name: "meta-llama/Llama-3.1-8B"
  quantization: "int8"        # 4-bit or 8-bit
  max_length: 8192            # context window

serving:
  engine: "vllm"
  tensor_parallel: 1
  gpu_memory_utilization: 0.9
```
Embedding Pipeline:
Text Input
|
v
+------------------+
| Tokenization |
+------------------+
|
v
+------------------+
| Sentence-BERT | <-- Semantic embeddings
| or mE5 | (multilingual)
+------------------+
|
v
+------------------+
| Vector Store | <-- Similarity search
| (FAISS/Milvus) |
+------------------+
Models:
| Model | Languages | Dimensions | Use Case |
|---|---|---|---|
| all-MiniLM-L6-v2 | EN | 384 | General |
| multilingual-e5-large | 100+ | 1024 | Multilingual |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 50+ | 384 | Balanced |
Mapping Algorithm:
Source Fields Target Schema
| |
v v
+----------+ +----------+
| Embed | | Embed |
| Fields | | Schema |
+----------+ +----------+
| |
+----------+-----------+
|
v
+---------------+
| Similarity |
| Matrix |
+---------------+
|
v
+---------------+
| Hungarian | <-- Optimal assignment
| Algorithm |
+---------------+
|
v
+---------------+
| Confidence |
| Filtering |
+---------------+
|
v
Field Mapping
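A minimal version of this matching, with cosine similarity over toy vectors and greedy assignment standing in for the Hungarian algorithm (the optimal version would use e.g. `scipy.optimize.linear_sum_assignment`). The field names and 3-dimensional "embeddings" are illustrative only:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def map_fields(source_vecs, target_vecs, threshold=0.8):
    """Greedy one-to-one matching on a similarity matrix.

    source_vecs/target_vecs: dicts of field name -> embedding vector.
    Pairs scoring below `threshold` are dropped (confidence filtering).
    """
    pairs = sorted(
        ((cosine(sv, tv), s, t)
         for s, sv in source_vecs.items()
         for t, tv in target_vecs.items()),
        reverse=True,
    )
    mapping, used_s, used_t = {}, set(), set()
    for score, s, t in pairs:
        if score >= threshold and s not in used_s and t not in used_t:
            mapping[s] = (t, round(score, 3))
            used_s.add(s)
            used_t.add(t)
    return mapping

source = {"client_name": [1.0, 0.1, 0.0], "birth_date": [0.0, 1.0, 0.2]}
target = {"full_name": [0.9, 0.2, 0.0], "date_of_birth": [0.1, 1.0, 0.1]}
print(map_fields(source, target))
```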
Supported Field Types:
| Type | Detection | Filling |
|---|---|---|
| Text fields | Form field detection | Direct insertion |
| Checkboxes | Visual ML detection | State toggle |
| Radio buttons | Form field enumeration | Selection |
| Dropdowns | Form field + options | Value matching |
| Tables | Structure detection | Cell-by-cell |
| Date fields | Pattern recognition | Format normalization |
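Format normalization from the date-field row can be done by trying a list of known input patterns and emitting one canonical format. The pattern list is an assumption (Swiss forms commonly use DD.MM.YYYY):

```python
from datetime import datetime

# Input patterns to try, in order; extend as new form layouts appear.
PATTERNS = ["%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]

def normalize_date(raw: str, out_fmt: str = "%Y-%m-%d") -> str:
    """Parse a date written in any known pattern; emit ISO 8601 by default."""
    for pattern in PATTERNS:
        try:
            return datetime.strptime(raw.strip(), pattern).strftime(out_fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")

print(normalize_date("31.01.2024"))  # 2024-01-31
```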
+------------------------------------------------------------------+
| Customer Environment |
+------------------------------------------------------------------+
| |
| +------------------+ +------------------+ |
| | Load Balancer | | Monitoring | |
| | (nginx) | | (Prometheus) | |
| +------------------+ +------------------+ |
| | | |
| v v |
| +------------------+ +------------------+ |
| | API Server | | Grafana | |
| | (FastAPI) | | Dashboard | |
| +------------------+ +------------------+ |
| | |
| v |
| +------------------+ +------------------+ |
| | LLM Server | | Embedding | |
| | (vLLM) | | Server | |
| +------------------+ +------------------+ |
| | | |
| v v |
| +------------------+ +------------------+ |
| | GPU Node | | Vector Store | |
| | (A100/RTX4090) | | (FAISS) | |
| +------------------+ +------------------+ |
| | | |
| v v |
| +------------------+ +------------------+ |
| | Document | | Result | |
| | Storage | | Storage | |
| +------------------+ +------------------+ |
| |
+------------------------------------------------------------------+
| Configuration | GPU | RAM | Storage | Throughput |
|---|---|---|---|---|
| Minimum | RTX 4090 24GB | 64GB | 500GB SSD | 5 docs/hour |
| Recommended | A100 40GB | 128GB | 1TB NVMe | 20 docs/hour |
| Enterprise | 2x A100 80GB | 256GB | 2TB NVMe | 50+ docs/hour |
```yaml
version: '3.8'

services:
  api:
    image: ai-orchestrator/api:latest
    ports:
      - "8000:8000"
    environment:
      - LLM_ENDPOINT=http://llm:8080

  llm:
    image: ai-orchestrator/llm:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  embedding:
    image: ai-orchestrator/embedding:latest
    ports:
      - "8001:8001"

  vector-store:
    image: milvusdb/milvus:latest
    ports:
      - "19530:19530"
```
+------------------------------------------------------------------+
| Security Layers |
+------------------------------------------------------------------+
| |
| [Transport] TLS 1.3 encryption for all connections |
| | |
| [Storage] AES-256 encryption at rest |
| | |
| [Access] RBAC with OAuth 2.0 / OIDC |
| | |
| [Audit] Complete audit logging |
| | |
| [Network] Network isolation, firewall rules |
| |
+------------------------------------------------------------------+
| Requirement | Implementation |
|---|---|
| FINMA data residency | On-premise only, Swiss hosting |
| FADP consent | Configurable consent workflows |
| GDPR rights | Data export, deletion APIs |
| Audit logging | Immutable audit trail |
| Access control | Role-based, multi-tenant |
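The immutable audit trail in the table is commonly implemented with hash chaining: each entry stores the hash of its predecessor, so any retroactive edit invalidates every later hash. This sketch shows the idea; the document does not specify the project's actual mechanism:

```python
import hashlib
import json

def append_entry(log, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({"event": event, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify_chain(log) -> bool:
    """Recompute every hash; any in-place edit breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps({"event": entry["event"], "prev": prev_hash},
                             sort_keys=True)
        if (entry["prev"] != prev_hash or
                entry["hash"] != hashlib.sha256(payload.encode()).hexdigest()):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"user": "analyst1", "action": "view", "doc": "kyc-42"})
append_entry(log, {"user": "analyst1", "action": "export", "doc": "kyc-42"})
print(verify_chain(log))              # True
log[0]["event"]["action"] = "delete"  # tamper with history
print(verify_chain(log))              # False
```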
GitHub Issue: #440 - Security Audit Checklist
AI Orchestrator              Wecan Comply
| |
| 1. POST /api/v1/documents |
|-------------------------->|
| |
| 2. Webhook: processing |
|<--------------------------|
| |
| 3. GET /api/v1/results |
|-------------------------->|
| |
| 4. Results + mappings |
|<--------------------------|
| |
| 5. POST /api/v1/prefill |
|-------------------------->|
| |
| 6. Pre-filled document |
|<--------------------------|
```yaml
openapi: 3.0.0
info:
  title: AI Orchestrator API
  version: 1.0.0
paths:
  /documents:
    post:
      summary: Upload document for processing
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                schema_id:
                  type: string
      responses:
        '202':
          description: Processing started
  /documents/{id}/results:
    get:
      summary: Get extraction results
      responses:
        '200':
          description: Extraction complete
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractionResult'
```
GitHub Issue: #444 - API Documentation
| Layer | Strategy | Impact |
|---|---|---|
| OCR | Parallel page processing | 3x speedup |
| LLM | Batched inference | 2x throughput |
| Embedding | Caching | 10x for repeated queries |
| Storage | SSD + caching | Low latency |
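The 10x embedding speedup for repeated queries is plain memoization: compliance forms reuse the same field names, so their embeddings can be cached. An in-process sketch with `functools.lru_cache`, with a deterministic stub in place of the real Sentence-BERT/E5 call:

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts actual model invocations

@lru_cache(maxsize=4096)
def embed(text: str) -> tuple:
    """Stub for the embedding model; repeated inputs hit the cache."""
    CALLS["count"] += 1
    # Fake deterministic embedding standing in for the real model.
    return tuple(ord(c) / 255 for c in text[:8])

for field in ["client_name", "iban", "client_name", "iban", "client_name"]:
    embed(field)

print(CALLS["count"])  # 2  (five lookups, only two model calls)
```

A production version would key on normalized text and persist the cache in the vector store so it survives restarts.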
Load Balancer
|
+---------------+---------------+
| | |
API Server 1 API Server 2 API Server 3
| | |
+---------------+---------------+
|
GPU Cluster
|
+---------------+---------------+
| | |
GPU 1 GPU 2 GPU 3
(LLM 1) (LLM 2) (LLM 3)
| Metric | Tool | Alert Threshold |
|---|---|---|
| Latency | Prometheus | >5s per page |
| GPU utilization | NVIDIA DCGM | <50% (underutilized) |
| Memory | Prometheus | >90% |
| Queue depth | Custom | >100 documents |
| Error rate | Prometheus | >1% |
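The latency row could be wired up as a Prometheus alerting rule like the one below; the metric name `ocr_page_seconds` is a placeholder, not a metric the project defines:

```yaml
groups:
  - name: orchestrator-alerts
    rules:
      - alert: SlowPageProcessing
        # Fires when mean per-page latency stays above 5s for 10 minutes.
        expr: rate(ocr_page_seconds_sum[5m]) / rate(ocr_page_seconds_count[5m]) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Per-page processing latency above 5s"
```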
| Layer | Technology | Purpose |
|---|---|---|
| API | FastAPI | REST API, async processing |
| LLM Serving | vLLM | High-performance inference |
| OCR | Tesseract, PaddleOCR | Text extraction |
| Embeddings | Sentence-BERT | Semantic matching |
| Vector Store | FAISS/Milvus | Similarity search |
| Document Processing | PyMuPDF, python-docx | Format handling |
| Form Filling | PyPDF2, openpyxl | Output generation |
| Frontend | React | Review interface |
| Container | Docker | Deployment |
| Orchestration | Docker Compose/K8s | Scaling |
| Monitoring | Prometheus + Grafana | Observability |