AI Orchestrator

A Specialized and Secure AI Orchestrator for Swiss Financial Compliance

View the Project on GitHub Digital-AI-Finance/wecan-innosuisse-ai-draft

System Architecture

Overview

The AI Orchestrator is designed as a modular, on-premise system that processes compliance documents in four successive stages: digitization, field extraction, schema mapping, and pre-filling.

+-------------------------------------------------------------------+
|                        AI Orchestrator                            |
+-------------------------------------------------------------------+
|                                                                   |
|  +-------------+   +-------------+   +-------------+   +--------+ |
|  | Document    |   | Field       |   | Schema      |   | Pre-   | |
|  | Digitization|-->| Extraction  |-->| Mapping     |-->| Filling| |
|  +-------------+   +-------------+   +-------------+   +--------+ |
|       |                 |                 |               |       |
|       v                 v                 v               v       |
|  +-------------+   +-------------+   +-------------+   +--------+ |
|  | OCR Engine  |   | LLM Engine  |   | Embedding   |   | Form   | |
|  |             |   |             |   | Engine      |   | Filler | |
|  +-------------+   +-------------+   +-------------+   +--------+ |
|                                                                   |
+-------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|                       Wecan Comply                                |
+------------------------------------------------------------------+

High-Level Architecture

System Components

Component           Purpose                             Technology
Document Ingestion  Upload and preprocessing            FastAPI, PyMuPDF
OCR Engine          Text extraction from scans          Tesseract, PaddleOCR
LLM Engine          Field extraction and understanding  Llama 3, vLLM
Embedding Engine    Semantic matching                   Sentence-BERT, E5
Schema Mapper       Zero-shot field mapping             Custom ML
Form Filler         Document pre-population             PyPDF2, openpyxl
Review Interface    Human-in-the-loop review            React, FastAPI
API Gateway         External integration                FastAPI

Data Flow

                    Document Upload
                          |
                          v
+------------------------------------------------------------------+
|                     DOCUMENT INGESTION                            |
|  - Format detection (PDF/Image/Excel)                            |
|  - Page extraction                                                |
|  - Quality assessment                                             |
+------------------------------------------------------------------+
                          |
                          v
+------------------------------------------------------------------+
|                     OCR PROCESSING                                |
|  - Text extraction (native or scanned)                           |
|  - Layout analysis                                                |
|  - Table detection                                                |
+------------------------------------------------------------------+
                          |
                          v
+------------------------------------------------------------------+
|                     LLM EXTRACTION                                |
|  - Field identification                                          |
|  - Value extraction                                              |
|  - Hallucination detection                                       |
+------------------------------------------------------------------+
                          |
                          v
+------------------------------------------------------------------+
|                     SCHEMA MAPPING                                |
|  - Target schema analysis                                        |
|  - Zero-shot field matching                                      |
|  - Confidence scoring                                            |
+------------------------------------------------------------------+
                          |
                          v
+------------------------------------------------------------------+
|                     PRE-FILLING                                   |
|  - Template detection                                            |
|  - Field population                                              |
|  - Validation                                                    |
+------------------------------------------------------------------+
                          |
                          v
                    Review & Export
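The staged flow above can be sketched as a simple sequential pipeline. The stage functions below are hypothetical placeholders for the real components, shown only to illustrate the hand-off pattern:

```python
from typing import Callable

# A stage takes the working document (a dict of accumulated results)
# and returns it enriched; real stages would call OCR, the LLM, etc.
Stage = Callable[[dict], dict]

def run_pipeline(document: dict, stages: list[Stage]) -> dict:
    """Pass the document through each stage in order, recording which
    stages completed so the review step can display provenance."""
    for stage in stages:
        document = stage(document)
        document.setdefault("completed", []).append(stage.__name__)
    return document
```

In the real system each stage would also attach confidence scores, so the review interface can prioritize uncertain fields.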

Component Details

Document Ingestion Layer

# API Endpoint Structure
POST /api/v1/documents
  - Upload document for processing
  - Returns: document_id, status

GET /api/v1/documents/{id}
  - Get document status and results
  - Returns: status, extracted_fields, confidence

POST /api/v1/documents/{id}/prefill
  - Generate pre-filled document
  - Returns: filled_document_url
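A minimal Python client for these endpoints might look as follows. The base URL, the exact response fields, and the use of `requests` are assumptions for illustration, not a published SDK:

```python
import requests

class OrchestratorClient:
    """Thin wrapper over the three endpoints sketched above."""

    def __init__(self, base_url: str = "http://localhost:8000"):
        self.base = base_url.rstrip("/")

    def _url(self, path: str) -> str:
        return f"{self.base}/api/v1{path}"

    def upload(self, file_path: str, schema_id: str) -> dict:
        # POST /api/v1/documents -> {"document_id": ..., "status": ...}
        with open(file_path, "rb") as f:
            resp = requests.post(self._url("/documents"),
                                 files={"file": f},
                                 data={"schema_id": schema_id})
        resp.raise_for_status()
        return resp.json()

    def status(self, document_id: str) -> dict:
        # GET /api/v1/documents/{id} -> status, extracted_fields, confidence
        resp = requests.get(self._url(f"/documents/{document_id}"))
        resp.raise_for_status()
        return resp.json()

    def prefill(self, document_id: str) -> str:
        # POST /api/v1/documents/{id}/prefill -> {"filled_document_url": ...}
        resp = requests.post(self._url(f"/documents/{document_id}/prefill"))
        resp.raise_for_status()
        return resp.json()["filled_document_url"]
```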

Supported Formats:

Format         Handler                Notes
PDF (native)   PyMuPDF                Direct text extraction
PDF (scanned)  Tesseract + PaddleOCR  OCR processing
Images         PIL + OCR              JPEG, PNG, TIFF
Excel          openpyxl               Cell-by-cell extraction
Word           python-docx            Content control extraction
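The format-detection step in the ingestion layer can be illustrated with file signatures ("magic bytes"). This is a sketch, not the shipped detector; note that .xlsx and .docx files are ZIP containers, so distinguishing them requires a second look inside the archive:

```python
# File signatures mapped to a coarse format label.
_SIGNATURES = {
    b"%PDF": "pdf",
    b"\x89PNG": "png",
    b"\xff\xd8\xff": "jpeg",
    b"PK\x03\x04": "zip-container",  # xlsx/docx are ZIP archives
}

def detect_format(data: bytes) -> str:
    """Return a format label from the leading bytes of an upload."""
    for magic, fmt in _SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    return "unknown"
```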

OCR Engine

+------------------+
|   OCR Manager    |
+------------------+
        |
        +---> PyMuPDF (native PDFs)
        |
        +---> Tesseract (scanned, general)
        |
        +---> PaddleOCR (tables, complex layouts)
        |
        +---> Docling (document understanding)

OCR Pipeline:

  1. Preprocessing
    • Deskew correction
    • Noise reduction
    • Contrast enhancement
  2. Text Extraction
    • Run multiple OCR engines
    • Confidence-weighted fusion
    • Layout preservation
  3. Post-processing
    • Spell correction
    • Language detection
    • Structure analysis
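The confidence-weighted fusion step above can be sketched as a per-position vote across engines. This assumes the engines' token streams are already position-aligned, which the real pipeline would have to arrange upstream:

```python
from collections import defaultdict

def fuse_ocr_results(engine_outputs: list[list[tuple[str, float]]]) -> list[str]:
    """Each engine contributes (token, confidence) pairs; for every
    position, the candidate with the highest summed confidence wins."""
    fused = []
    n_tokens = min(len(out) for out in engine_outputs)
    for i in range(n_tokens):
        votes: dict[str, float] = defaultdict(float)
        for out in engine_outputs:
            token, confidence = out[i]
            votes[token] += confidence
        fused.append(max(votes, key=votes.get))
    return fused
```

With Tesseract reading "lnvoice" at 0.62 and PaddleOCR reading "Invoice" at 0.81 for the same position, the fused output keeps "Invoice".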

LLM Engine

Model Architecture:

Base Model (7-13B parameters)
         |
         v
+------------------+
| LoRA Adapters    |  <-- Domain-specific
+------------------+
         |
         v
+------------------+
| vLLM Inference   |  <-- Optimized serving
+------------------+
         |
         v
+------------------+
| Hallucination    |  <-- Output validation
| Detector         |
+------------------+
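The output-validation step can be illustrated with a grounding check: an extracted value that never occurs in the OCR'd source text is flagged as a potential hallucination. An illustrative sketch, not the project's actual detector:

```python
import re

def grounding_check(extracted: dict[str, str], source_text: str) -> dict[str, bool]:
    """Return, per field, whether the extracted value appears verbatim
    in the source text (case- and whitespace-insensitive)."""
    normalized = re.sub(r"\s+", " ", source_text).lower()
    return {
        field: re.sub(r"\s+", " ", value).lower() in normalized
        for field, value in extracted.items()
    }
```

Fields that fail the check would be routed to human review instead of being pre-filled.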

Supported Models:

Model       Parameters  Memory  Speed
Llama 3 8B  8B          16GB    Fast
Mistral 7B  7B          14GB    Fast
Qwen2 14B   14B         28GB    Medium

Inference Configuration:

model:
  name: "meta-llama/Llama-3.1-8B"
  quantization: "int8"  # 4-bit or 8-bit
  max_length: 8192      # Context window

serving:
  engine: "vllm"
  tensor_parallel: 1
  gpu_memory_utilization: 0.9

Embedding Engine

Embedding Pipeline:

Text Input
    |
    v
+------------------+
| Tokenization     |
+------------------+
    |
    v
+------------------+
| Sentence-BERT    |  <-- Semantic embeddings
| or mE5           |      (multilingual)
+------------------+
    |
    v
+------------------+
| Vector Store     |  <-- Similarity search
| (FAISS/Milvus)   |
+------------------+
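The similarity-search step can be sketched in plain NumPy as a stand-in for FAISS/Milvus: after L2 normalization, cosine similarity reduces to a dot product:

```python
import numpy as np

def top_k(query_vec: np.ndarray, index_vecs: np.ndarray, k: int = 3):
    """Return the k most similar index rows as (row, cosine score) pairs."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

A dedicated vector store earns its keep at scale; for a few thousand schema fields, brute-force search like this is usually fast enough.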

Models:

Model                                                        Languages  Dimensions  Use Case
all-MiniLM-L6-v2                                             EN         384         General
multilingual-e5-large                                        100+       1024        Multilingual
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2  50+        384         Balanced

Schema Mapper

Mapping Algorithm:

Source Fields          Target Schema
     |                      |
     v                      v
+----------+          +----------+
| Embed    |          | Embed    |
| Fields   |          | Schema   |
+----------+          +----------+
     |                      |
     +----------+-----------+
                |
                v
        +---------------+
        | Similarity    |
        | Matrix        |
        +---------------+
                |
                v
        +---------------+
        | Hungarian     |  <-- Optimal assignment
        | Algorithm     |
        +---------------+
                |
                v
        +---------------+
        | Confidence    |
        | Filtering     |
        +---------------+
                |
                v
           Field Mapping
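The similarity-matrix and assignment steps above map directly onto SciPy's `linear_sum_assignment`. The 0.5 confidence threshold below is an illustrative assumption, not a project constant:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def map_fields(sim: np.ndarray, source: list[str], target: list[str],
               threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Given a similarity matrix (rows = source fields, cols = target
    fields), solve the optimal one-to-one assignment and keep only
    pairs whose similarity clears the confidence threshold."""
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return [
        (source[r], target[c], float(sim[r, c]))
        for r, c in zip(rows, cols)
        if sim[r, c] >= threshold
    ]
```

The Hungarian step enforces a one-to-one mapping, which prevents two source fields from both landing in the same target field, something greedy nearest-neighbor matching cannot guarantee.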

Form Filler

Supported Field Types:

Type           Detection               Filling
Text fields    Form field detection    Direct insertion
Checkboxes     Visual ML detection     State toggle
Radio buttons  Form field enumeration  Selection
Dropdowns      Form field + options    Value matching
Tables         Structure detection     Cell-by-cell
Date fields    Pattern recognition     Format normalization
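The format-normalization step for date fields can be sketched as a list of candidate patterns tried in order. The pattern list here is an assumption and would be extended for the documents at hand:

```python
from datetime import datetime
from typing import Optional

# Common Swiss/European input formats, tried in order.
_DATE_PATTERNS = ["%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]

def normalize_date(raw: str) -> Optional[str]:
    """Normalize a raw date string to ISO 8601, or None if unparseable."""
    for pattern in _DATE_PATTERNS:
        try:
            return datetime.strptime(raw.strip(), pattern).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning None rather than guessing keeps ambiguous dates out of the pre-filled document and sends them to review instead.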

Deployment Architecture

On-Premise Deployment

+------------------------------------------------------------------+
|                       Customer Environment                       |
+------------------------------------------------------------------+
|                                                                  |
|  +------------------+     +------------------+                   |
|  | Load Balancer    |     | Monitoring       |                   |
|  | (nginx)          |     | (Prometheus)     |                   |
|  +------------------+     +------------------+                   |
|           |                        |                             |
|           v                        v                             |
|  +------------------+     +------------------+                   |
|  | API Server       |     | Grafana          |                   |
|  | (FastAPI)        |     | Dashboard        |                   |
|  +------------------+     +------------------+                   |
|           |                                                      |
|           v                                                      |
|  +------------------+     +------------------+                   |
|  | LLM Server       |     | Embedding        |                   |
|  | (vLLM)           |     | Server           |                   |
|  +------------------+     +------------------+                   |
|           |                        |                             |
|           v                        v                             |
|  +------------------+     +------------------+                   |
|  | GPU Node         |     | Vector Store     |                   |
|  | (A100/RTX4090)   |     | (FAISS)          |                   |
|  +------------------+     +------------------+                   |
|           |                        |                             |
|           v                        v                             |
|  +------------------+     +------------------+                   |
|  | Document         |     | Result           |                   |
|  | Storage          |     | Storage          |                   |
|  +------------------+     +------------------+                   |
|                                                                  |
+------------------------------------------------------------------+

Hardware Requirements

Configuration  GPU            RAM    Storage    Throughput
Minimum        RTX 4090 24GB  64GB   500GB SSD  5 docs/hour
Recommended    A100 40GB      128GB  1TB NVMe   20 docs/hour
Enterprise     2x A100 80GB   256GB  2TB NVMe   50+ docs/hour

Docker Compose Structure

version: '3.8'
services:
  api:
    image: ai-orchestrator/api:latest
    ports:
      - "8000:8000"
    environment:
      - LLM_ENDPOINT=http://llm:8080

  llm:
    image: ai-orchestrator/llm:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  embedding:
    image: ai-orchestrator/embedding:latest
    ports:
      - "8001:8001"

  vector-store:
    image: milvusdb/milvus:latest
    ports:
      - "19530:19530"

Security Architecture

Data Protection

+------------------------------------------------------------------+
|                     Security Layers                               |
+------------------------------------------------------------------+
|                                                                   |
|  [Transport]  TLS 1.3 encryption for all connections             |
|                           |                                       |
|  [Storage]    AES-256 encryption at rest                         |
|                           |                                       |
|  [Access]     RBAC with OAuth 2.0 / OIDC                         |
|                           |                                       |
|  [Audit]      Complete audit logging                             |
|                           |                                       |
|  [Network]    Network isolation, firewall rules                  |
|                                                                   |
+------------------------------------------------------------------+

Compliance Features

Requirement           Implementation
FINMA data residency  On-premise only, Swiss hosting
FADP consent          Configurable consent workflows
GDPR rights           Data export, deletion APIs
Audit logging         Immutable audit trail
Access control        Role-based, multi-tenant
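The "immutable audit trail" requirement can be illustrated with a hash-chained log: each entry commits to the hash of its predecessor, so any retroactive edit breaks the chain. A sketch of the idea, not the shipped implementation:

```python
import hashlib
import json

class AuditLog:
    """Append-only, tamper-evident audit trail sketch."""

    def __init__(self):
        self.entries: list[dict] = []

    def append(self, event: dict) -> None:
        # Each entry's hash covers the previous hash plus its own payload.
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        # Recompute the chain; any edited or reordered entry breaks it.
        prev = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

In production the chain head would additionally be anchored to write-once storage so the whole log cannot be silently rewritten.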

GitHub Issue: #440 - Security Audit Checklist


Integration Architecture

Wecan Comply Integration

AI Orchestrator              Wecan Comply
      |                           |
      | 1. POST /api/v1/documents |
      |-------------------------->|
      |                           |
      | 2. Webhook: processing    |
      |<--------------------------|
      |                           |
      | 3. GET /api/v1/results    |
      |-------------------------->|
      |                           |
      | 4. Results + mappings     |
      |<--------------------------|
      |                           |
      | 5. POST /api/v1/prefill   |
      |-------------------------->|
      |                           |
      | 6. Pre-filled document    |
      |<--------------------------|

API Specification

openapi: 3.0.0
info:
  title: AI Orchestrator API
  version: 1.0.0

paths:
  /documents:
    post:
      summary: Upload document for processing
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                schema_id:
                  type: string
      responses:
        '202':
          description: Processing started

  /documents/{id}/results:
    get:
      summary: Get extraction results
      responses:
        '200':
          description: Extraction complete
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractionResult'

GitHub Issue: #444 - API Documentation


Performance Architecture

Optimization Strategies

Layer      Strategy                  Impact
OCR        Parallel page processing  3x speedup
LLM        Batched inference         2x throughput
Embedding  Caching                   10x for repeated queries
Storage    SSD + caching             Low latency

Scaling

                    Load Balancer
                         |
         +---------------+---------------+
         |               |               |
    API Server 1    API Server 2    API Server 3
         |               |               |
         +---------------+---------------+
                         |
                    GPU Cluster
                         |
         +---------------+---------------+
         |               |               |
      GPU 1           GPU 2           GPU 3
    (LLM 1)         (LLM 2)         (LLM 3)

Monitoring

Metric           Tool         Alert Threshold
Latency          Prometheus   >5s per page
GPU utilization  NVIDIA DCGM  <50% (underutilized)
Memory           Prometheus   >90%
Queue depth      Custom       >100 documents
Error rate       Prometheus   >1%

Technology Stack Summary

Layer                Technology            Purpose
API                  FastAPI               REST API, async processing
LLM Serving          vLLM                  High-performance inference
OCR                  Tesseract, PaddleOCR  Text extraction
Embeddings           Sentence-BERT         Semantic matching
Vector Store         FAISS/Milvus          Similarity search
Document Processing  PyMuPDF, python-docx  Format handling
Form Filling         PyPDF2, openpyxl      Output generation
Frontend             React                 Review interface
Container            Docker                Deployment
Orchestration        Docker Compose/K8s    Scaling
Monitoring           Prometheus + Grafana  Observability
