A Specialized and Secure AI Orchestrator for Swiss Financial Compliance
The AI Orchestrator is designed as a modular, on-premise system that processes compliance documents through multiple stages: digitization, extraction, mapping, and pre-filling.
+------------------------------------------------------------------+
| AI Orchestrator |
+------------------------------------------------------------------+
| |
| +-------------+ +-------------+ +-------------+ +--------+ |
| | Document | | Field | | Schema | | Pre- | |
| | Digitization|-->| Extraction |-->| Mapping |-->| Filling| |
| +-------------+ +-------------+ +-------------+ +--------+ |
| | | | | |
| v v v v |
| +-------------+ +-------------+ +-------------+ +--------+ |
| | OCR Engine | | LLM Engine | | Embedding | | Form | |
| | | | | | Engine | | Filler | |
| +-------------+ +-------------+ +-------------+ +--------+ |
| |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| Wecan Comply |
+------------------------------------------------------------------+
| Component | Purpose | Technology |
|---|---|---|
| Document Ingestion | Upload and preprocessing | FastAPI, PyMuPDF |
| OCR Engine | Text extraction from scans | Tesseract, PaddleOCR |
| LLM Engine | Field extraction and understanding | Llama 3, vLLM |
| Embedding Engine | Semantic matching | Sentence-BERT, E5 |
| Schema Mapper | Zero-shot field mapping | Custom ML |
| Form Filler | Document pre-population | PyPDF2, openpyxl |
| Review Interface | Human-in-the-loop | React, FastAPI |
| API Gateway | External integration | FastAPI |
Document Upload
|
v
+------------------------------------------------------------------+
| DOCUMENT INGESTION |
| - Format detection (PDF/Image/Excel) |
| - Page extraction |
| - Quality assessment |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| OCR PROCESSING |
| - Text extraction (native or scanned) |
| - Layout analysis |
| - Table detection |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| LLM EXTRACTION |
| - Field identification |
| - Value extraction |
| - Hallucination detection |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| SCHEMA MAPPING |
| - Target schema analysis |
| - Zero-shot field matching |
| - Confidence scoring |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| PRE-FILLING |
| - Template detection |
| - Field population |
| - Validation |
+------------------------------------------------------------------+
|
v
Review & Export
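The five stages above can be sketched as a sequential pipeline of functions, each enriching a shared document record. Everything below (stage bodies, field names, the sample document) is illustrative, not the project's actual code:

```python
# Minimal sketch of the processing pipeline: each stage takes the working
# document dict and returns it enriched. Stage bodies are placeholders for
# the real OCR, LLM, embedding, and form-filling engines.

def ingest(doc):
    doc["format"] = "pdf" if doc["filename"].endswith(".pdf") else "unknown"
    return doc

def ocr(doc):
    doc["text"] = doc.get("raw_text", "")  # native text layer or OCR output
    return doc

def extract_fields(doc):
    # Placeholder for LLM-based field extraction.
    doc["fields"] = {"client_name": "ACME AG"} if "ACME" in doc["text"] else {}
    return doc

def map_schema(doc):
    # Placeholder for zero-shot schema mapping.
    doc["mapped"] = {k.upper(): v for k, v in doc["fields"].items()}
    return doc

def prefill(doc):
    doc["status"] = "ready_for_review"
    return doc

PIPELINE = [ingest, ocr, extract_fields, map_schema, prefill]

def run_pipeline(doc):
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run_pipeline({"filename": "kyc.pdf", "raw_text": "Client: ACME AG"})
print(result["status"])  # ready_for_review
```

In the real system each stage would delegate to its engine and attach per-stage confidence scores for the review step.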
# API Endpoint Structure
POST /api/v1/documents
- Upload document for processing
- Returns: document_id, status
GET /api/v1/documents/{id}
- Get document status and results
- Returns: status, extracted_fields, confidence
POST /api/v1/documents/{id}/prefill
- Generate pre-filled document
- Returns: filled_document_url
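A typical client flow is upload, then poll `GET /api/v1/documents/{id}` until the status leaves `processing`. The sketch below shows that polling loop with a stubbed fetch function standing in for the HTTP call; the response fields match the endpoint list above, but the exact status values are assumptions:

```python
import time

def poll_until_done(fetch_status, doc_id, interval=0.0, max_attempts=10):
    """Poll GET /api/v1/documents/{id} until status leaves 'processing'.

    fetch_status is any callable returning the parsed JSON response;
    in production it would wrap an authenticated HTTP GET.
    """
    for _ in range(max_attempts):
        resp = fetch_status(doc_id)
        if resp["status"] != "processing":
            return resp
        time.sleep(interval)
    raise TimeoutError(f"document {doc_id} still processing")

# Stub standing in for the real HTTP call: two in-flight responses,
# then a completed one.
_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed",
     "extracted_fields": {"iban": "CH93..."},
     "confidence": 0.97},
])

def fake_fetch(doc_id):
    return next(_responses)

final = poll_until_done(fake_fetch, "doc-123")
print(final["confidence"])  # 0.97
```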
Supported Formats:
| Format | Handler | Notes |
|---|---|---|
| PDF (native) | PyMuPDF | Direct text extraction |
| PDF (scanned) | Tesseract + PaddleOCR | OCR processing |
| Images | PIL + OCR | JPEG, PNG, TIFF |
| Excel | openpyxl | Cell-by-cell extraction |
| Word | python-docx | Content control extraction |
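Format detection during ingestion is more reliable on file signatures (magic bytes) than on extensions. A minimal dispatcher for the formats in the table, using their well-known signatures:

```python
def detect_format(data: bytes) -> str:
    """Route an upload by file signature rather than filename extension."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"II*\x00") or data.startswith(b"MM\x00*"):
        return "tiff"
    if data.startswith(b"PK\x03\x04"):
        # XLSX and DOCX are both ZIP containers, so they share the 'PK'
        # signature; telling them apart requires inspecting the archive.
        return "office_zip"
    return "unknown"

print(detect_format(b"%PDF-1.7 ..."))  # pdf
```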
OCR Pipeline:
+------------------+
|   OCR Manager    |
+------------------+
      |
      +---> PyMuPDF (native PDFs)
      |
      +---> Tesseract (scanned, general)
      |
      +---> PaddleOCR (tables, complex layouts)
      |
      +---> Docling (document understanding)
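The OCR Manager's routing can be written as a small dispatcher. The decision inputs (native text layer, table count, layout complexity) would come from a quick probe of the page, e.g. with PyMuPDF; the heuristics here are illustrative assumptions, not tuned project logic:

```python
def pick_ocr_engine(has_text_layer: bool, table_count: int,
                    complex_layout: bool) -> str:
    """Choose the extraction backend per the OCR Manager routing above.

    Thresholds and inputs are illustrative; a real probe would also
    consider scan quality and language.
    """
    if has_text_layer:
        return "pymupdf"        # native PDFs: extract text directly, no OCR
    if table_count > 0 or complex_layout:
        return "paddleocr"      # tables and complex layouts
    return "tesseract"          # general scanned pages

print(pick_ocr_engine(False, 2, False))  # paddleocr
```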
Model Architecture:
Base Model (7-13B parameters)
|
v
+------------------+
| LoRA Adapters | <-- Domain-specific
+------------------+
|
v
+------------------+
| vLLM Inference | <-- Optimized serving
+------------------+
|
v
+------------------+
| Hallucination | <-- Output validation
| Detector |
+------------------+
Supported Models:
| Model | Parameters | Memory | Speed |
|---|---|---|---|
| Llama 3 8B | 8B | 16GB | Fast |
| Mistral 7B | 7B | 14GB | Fast |
| Qwen2 14B | 14B | 28GB | Medium |
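The memory column is roughly weights-only arithmetic: at fp16 each parameter costs 2 bytes, so an 8B model needs about 16 GB before KV cache and activation overhead. A quick check:

```python
def weight_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory at a given precision (fp16 = 2 bytes/param)."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(8))     # 16.0  (matches the Llama 3 8B row)
print(weight_memory_gb(14))    # 28.0  (matches the Qwen2 14B row)
print(weight_memory_gb(8, 1))  # 8.0   (int8 halves the footprint)
```

Int8 quantization (1 byte per parameter) halves the weight footprint, which is why the inference configuration that follows enables it.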
Inference Configuration:
```yaml
model:
  name: "meta-llama/Llama-3.1-8B"
  quantization: "int8"        # 4-bit or 8-bit
  max_length: 8192            # context window

serving:
  engine: "vllm"
  tensor_parallel: 1
  gpu_memory_utilization: 0.9
```
Embedding Pipeline:
Text Input
|
v
+------------------+
| Tokenization |
+------------------+
|
v
+------------------+
| Sentence-BERT | <-- Semantic embeddings
| or mE5 | (multilingual)
+------------------+
|
v
+------------------+
| Vector Store | <-- Similarity search
| (FAISS/Milvus) |
+------------------+
Models:
| Model | Languages | Dimensions | Use Case |
|---|---|---|---|
| all-MiniLM-L6-v2 | EN | 384 | General |
| multilingual-e5-large | 100+ | 1024 | Multilingual |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 50+ | 384 | Balanced |
Mapping Algorithm:
Source Fields Target Schema
| |
v v
+----------+ +----------+
| Embed | | Embed |
| Fields | | Schema |
+----------+ +----------+
| |
+----------+-----------+
|
v
+---------------+
| Similarity |
| Matrix |
+---------------+
|
v
+---------------+
| Hungarian | <-- Optimal assignment
| Algorithm |
+---------------+
|
v
+---------------+
| Confidence |
| Filtering |
+---------------+
|
v
Field Mapping
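A minimal version of this matching, with cosine similarity over toy vectors and greedy assignment standing in for the Hungarian algorithm (the optimal version would use e.g. `scipy.optimize.linear_sum_assignment`). The field names and 3-dimensional "embeddings" are illustrative only:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def map_fields(source_vecs, target_vecs, threshold=0.8):
    """Greedy one-to-one matching on a similarity matrix.

    source_vecs/target_vecs: dicts of field name -> embedding vector.
    Pairs scoring below `threshold` are dropped (confidence filtering).
    """
    pairs = sorted(
        ((cosine(sv, tv), s, t)
         for s, sv in source_vecs.items()
         for t, tv in target_vecs.items()),
        reverse=True,
    )
    mapping, used_s, used_t = {}, set(), set()
    for score, s, t in pairs:
        if score >= threshold and s not in used_s and t not in used_t:
            mapping[s] = (t, round(score, 3))
            used_s.add(s)
            used_t.add(t)
    return mapping

source = {"client_name": [1.0, 0.1, 0.0], "birth_date": [0.0, 1.0, 0.2]}
target = {"full_name": [0.9, 0.2, 0.0], "date_of_birth": [0.1, 1.0, 0.1]}
print(map_fields(source, target))
```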
Supported Field Types:
| Type | Detection | Filling |
|---|---|---|
| Text fields | Form field detection | Direct insertion |
| Checkboxes | Visual ML detection | State toggle |
| Radio buttons | Form field enumeration | Selection |
| Dropdowns | Form field + options | Value matching |
| Tables | Structure detection | Cell-by-cell |
| Date fields | Pattern recognition | Format normalization |
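Format normalization from the date-field row can be done by trying a list of known input patterns and emitting one canonical format. The pattern list is an assumption (Swiss forms commonly use DD.MM.YYYY):

```python
from datetime import datetime

# Input patterns to try, in order; extend as new form layouts appear.
PATTERNS = ["%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]

def normalize_date(raw: str, out_fmt: str = "%Y-%m-%d") -> str:
    """Parse a date written in any known pattern; emit ISO 8601 by default."""
    for pattern in PATTERNS:
        try:
            return datetime.strptime(raw.strip(), pattern).strftime(out_fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")

print(normalize_date("31.01.2024"))  # 2024-01-31
```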
+------------------------------------------------------------------+
| Customer Environment |
+------------------------------------------------------------------+
| |
| +------------------+ +------------------+ |
| | Load Balancer | | Monitoring | |
| | (nginx) | | (Prometheus) | |
| +------------------+ +------------------+ |
| | | |
| v v |
| +------------------+ +------------------+ |
| | API Server | | Grafana | |
| | (FastAPI) | | Dashboard | |
| +------------------+ +------------------+ |
| | |
| v |
| +------------------+ +------------------+ |
| | LLM Server | | Embedding | |
| | (vLLM) | | Server | |
| +------------------+ +------------------+ |
| | | |
| v v |
| +------------------+ +------------------+ |
| | GPU Node | | Vector Store | |
| | (A100/RTX4090) | | (FAISS) | |
| +------------------+ +------------------+ |
| | | |
| v v |
| +------------------+ +------------------+ |
| | Document | | Result | |
| | Storage | | Storage | |
| +------------------+ +------------------+ |
| |
+------------------------------------------------------------------+
| Configuration | GPU | RAM | Storage | Throughput |
|---|---|---|---|---|
| Minimum | RTX 4090 24GB | 64GB | 500GB SSD | 5 docs/hour |
| Recommended | A100 40GB | 128GB | 1TB NVMe | 20 docs/hour |
| Enterprise | 2x A100 80GB | 256GB | 2TB NVMe | 50+ docs/hour |
```yaml
version: '3.8'

services:
  api:
    image: ai-orchestrator/api:latest
    ports:
      - "8000:8000"
    environment:
      - LLM_ENDPOINT=http://llm:8080

  llm:
    image: ai-orchestrator/llm:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  embedding:
    image: ai-orchestrator/embedding:latest
    ports:
      - "8001:8001"

  vector-store:
    image: milvusdb/milvus:latest
    ports:
      - "19530:19530"
```
+------------------------------------------------------------------+
| Security Layers |
+------------------------------------------------------------------+
| |
| [Transport] TLS 1.3 encryption for all connections |
| | |
| [Storage] AES-256 encryption at rest |
| | |
| [Access] RBAC with OAuth 2.0 / OIDC |
| | |
| [Audit] Complete audit logging |
| | |
| [Network] Network isolation, firewall rules |
| |
+------------------------------------------------------------------+
| Requirement | Implementation |
|---|---|
| FINMA data residency | On-premise only, Swiss hosting |
| FADP consent | Configurable consent workflows |
| GDPR rights | Data export, deletion APIs |
| Audit logging | Immutable audit trail |
| Access control | Role-based, multi-tenant |
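The immutable audit trail in the table is commonly implemented with hash chaining: each entry stores the hash of its predecessor, so any retroactive edit invalidates every later hash. This sketch shows the idea; the document does not specify the project's actual mechanism:

```python
import hashlib
import json

def append_entry(log, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({"event": event, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify_chain(log) -> bool:
    """Recompute every hash; any in-place edit breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps({"event": entry["event"], "prev": prev_hash},
                             sort_keys=True)
        if (entry["prev"] != prev_hash or
                entry["hash"] != hashlib.sha256(payload.encode()).hexdigest()):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"user": "analyst1", "action": "view", "doc": "kyc-42"})
append_entry(log, {"user": "analyst1", "action": "export", "doc": "kyc-42"})
print(verify_chain(log))              # True
log[0]["event"]["action"] = "delete"  # tamper with history
print(verify_chain(log))              # False
```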
GitHub Issue: #440 - Security Audit Checklist
AI Orchestrator              Wecan Comply
| |
| 1. POST /api/v1/documents |
|-------------------------->|
| |
| 2. Webhook: processing |
|<--------------------------|
| |
| 3. GET /api/v1/results |
|-------------------------->|
| |
| 4. Results + mappings |
|<--------------------------|
| |
| 5. POST /api/v1/prefill |
|-------------------------->|
| |
| 6. Pre-filled document |
|<--------------------------|
```yaml
openapi: 3.0.0
info:
  title: AI Orchestrator API
  version: 1.0.0
paths:
  /documents:
    post:
      summary: Upload document for processing
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                schema_id:
                  type: string
      responses:
        '202':
          description: Processing started
  /documents/{id}/results:
    get:
      summary: Get extraction results
      responses:
        '200':
          description: Extraction complete
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractionResult'
```
GitHub Issue: #444 - API Documentation
| Layer | Strategy | Impact |
|---|---|---|
| OCR | Parallel page processing | 3x speedup |
| LLM | Batched inference | 2x throughput |
| Embedding | Caching | 10x for repeated queries |
| Storage | SSD + caching | Low latency |
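The 10x embedding speedup for repeated queries is plain memoization: compliance forms reuse the same field names, so their embeddings can be cached. An in-process sketch with `functools.lru_cache`, with a deterministic stub in place of the real Sentence-BERT/E5 call:

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts actual model invocations

@lru_cache(maxsize=4096)
def embed(text: str) -> tuple:
    """Stub for the embedding model; repeated inputs hit the cache."""
    CALLS["count"] += 1
    # Fake deterministic embedding standing in for the real model.
    return tuple(ord(c) / 255 for c in text[:8])

for field in ["client_name", "iban", "client_name", "iban", "client_name"]:
    embed(field)

print(CALLS["count"])  # 2  (five lookups, only two model calls)
```

A production version would key on normalized text and persist the cache in the vector store so it survives restarts.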
Load Balancer
|
+---------------+---------------+
| | |
API Server 1 API Server 2 API Server 3
| | |
+---------------+---------------+
|
GPU Cluster
|
+---------------+---------------+
| | |
GPU 1 GPU 2 GPU 3
(LLM 1) (LLM 2) (LLM 3)
| Metric | Tool | Alert Threshold |
|---|---|---|
| Latency | Prometheus | >5s per page |
| GPU utilization | NVIDIA DCGM | <50% (underutilized) |
| Memory | Prometheus | >90% |
| Queue depth | Custom | >100 documents |
| Error rate | Prometheus | >1% |
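The latency row could be wired up as a Prometheus alerting rule like the one below; the metric name `ocr_page_seconds` is a placeholder, not a metric the project defines:

```yaml
groups:
  - name: orchestrator-alerts
    rules:
      - alert: SlowPageProcessing
        # Fires when mean per-page latency stays above 5s for 10 minutes.
        expr: rate(ocr_page_seconds_sum[5m]) / rate(ocr_page_seconds_count[5m]) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Per-page processing latency above 5s"
```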
| Layer | Technology | Purpose |
|---|---|---|
| API | FastAPI | REST API, async processing |
| LLM Serving | vLLM | High-performance inference |
| OCR | Tesseract, PaddleOCR | Text extraction |
| Embeddings | Sentence-BERT | Semantic matching |
| Vector Store | FAISS/Milvus | Similarity search |
| Document Processing | PyMuPDF, python-docx | Format handling |
| Form Filling | PyPDF2, openpyxl | Output generation |
| Frontend | React | Review interface |
| Container | Docker | Deployment |
| Orchestration | Docker Compose/K8s | Scaling |
| Monitoring | Prometheus + Grafana | Observability |