AI Orchestrator

A Specialized and Secure AI Orchestrator for Swiss Financial Compliance

View the Project on GitHub Digital-AI-Finance/wecan-innosuisse-ai-draft

WP5: Intelligent Document Pre-Filling



Overview

Attribute    Value
Duration     M14-M24
FHGR Hours   600h
Wecan Hours  800h
Total Hours  1,400h
Lead         Wecan Tech Lead

Objectives

  1. Develop complete document pre-filling system for PDF and Excel
  2. Handle variable document structures (text, checkboxes, tables, dropdowns)
  3. Achieve >90% accuracy on 300+ forms
  4. Deploy at 3-5 Swiss financial institutions
  5. Create open benchmark dataset for reproducibility

Technical Approach

Pre-Filling Pipeline

Mapped Fields            Document Template
     |                          |
     v                          v
+----------+              +----------+
| Field    |              | Template |
| Values   |              | Analysis |
+----------+              +----------+
     |                          |
     +------------+-------------+
                  |
                  v
         +----------------+
         | Field Location |
         | Detection      |
         +----------------+
                  |
                  v
         +----------------+
         | Value          |
         | Insertion      |
         +----------------+
                  |
                  v
         +----------------+
         | Validation     |
         | & Formatting   |
         +----------------+
                  |
                  v
           Pre-Filled Document
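
The stages in the diagram above can be sketched as plain functions. This is a minimal illustration only, not the project's implementation: `TemplateField`, `FilledDocument`, `prefill`, and the matching-by-name rule are all hypothetical, and real template analysis would work on PDF or Excel structures rather than a list of named slots.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateField:
    """A fillable slot found during template analysis (hypothetical structure)."""
    name: str
    kind: str  # e.g. "text", "checkbox", "date"


@dataclass
class FilledDocument:
    values: dict = field(default_factory=dict)
    issues: list = field(default_factory=list)


def detect_locations(mapped_values, template_fields):
    """Field Location Detection: pair each mapped value with a template slot.

    Assumption: slots are matched by exact field name.
    """
    slots = {f.name: f for f in template_fields}
    located, missing = [], []
    for name, value in mapped_values.items():
        if name in slots:
            located.append((slots[name], value))
        else:
            missing.append(name)
    return located, missing


def insert_values(located):
    """Value Insertion: write each value into its slot."""
    return {slot.name: value for slot, value in located}


def validate(values, template_fields, missing):
    """Validation & Formatting: trim text, report unfilled and unmatched fields."""
    doc = FilledDocument()
    for f in template_fields:
        if f.name in values:
            v = values[f.name]
            doc.values[f.name] = v.strip() if isinstance(v, str) else v
        else:
            doc.issues.append(f"unfilled: {f.name}")
    doc.issues.extend(f"no slot for: {name}" for name in missing)
    return doc


def prefill(mapped_values, template_fields):
    """Run the three stages in the order shown in the diagram."""
    located, missing = detect_locations(mapped_values, template_fields)
    return validate(insert_values(located), template_fields, missing)
```

Keeping each stage as a separate function mirrors the boxes in the diagram, so a stage such as location detection could be swapped for a PDF-specific or Excel-specific version without touching the others.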

Supported Field Types

Type         Detection Method                 Filling Method
Text fields  PDF form fields, bounding boxes  Direct insertion
Checkboxes   Visual detection                 State toggle
Dropdowns    Form field enumeration           Value selection
Tables       Structure detection              Cell-by-cell
Date fields  Format detection                 Normalized insertion
Signatures   Placeholder detection            Image placement
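
As a rough illustration of how the filling methods in this table might be dispatched by field type, the sketch below covers four of the six types. Everything here is an assumption for illustration: the function names, the accepted checkbox spellings, the list of input date formats, and the `DD.MM.YYYY` target format are not taken from the project.

```python
from datetime import datetime


def fill_text(value):
    # Direct insertion: the value is written as-is.
    return str(value)


def fill_checkbox(value):
    # State toggle: map a few assumed "checked" spellings to a boolean state.
    return str(value).strip().lower() in {"yes", "true", "1", "x", "ja", "oui"}


def fill_dropdown(value, options):
    # Value selection: choose the enumerated option, ignoring case.
    wanted = str(value).strip().lower()
    for option in options:
        if option.lower() == wanted:
            return option
    raise ValueError(f"{value!r} is not one of {options}")


def fill_date(value, target_format="%d.%m.%Y"):
    # Normalized insertion: try a few assumed input formats, emit one format.
    for fmt in ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).strftime(target_format)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


# Hypothetical registry mapping a field type to its filling method.
FILLERS = {
    "text": fill_text,
    "checkbox": fill_checkbox,
    "dropdown": fill_dropdown,
    "date": fill_date,
}


def fill(kind, value, **options):
    """Look up the filling method for a field type and apply it."""
    return FILLERS[kind](value, **options)
```

For example, `fill("date", "2024-03-05")` returns `"05.03.2024"`, and `fill("dropdown", "ch", options=["CH", "DE"])` returns `"CH"`.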

Activities

M14-M16: Foundation

Activity                         Owner  Output
Design pre-filling architecture  Wecan  Architecture doc
Implement PDF form detection     Wecan  PDF module
Implement Excel cell detection   Wecan  Excel module
Integrate with WP4 mappings      Both   Integration layer
Build review interface           Wecan  Review UI

M17-M20: Development

Activity                    Owner  Output
Handle variable structures  Wecan  Adaptive filling
Implement validation rules  Wecan  Validation engine
Build confidence display    Wecan  UI component
Validate on 150 forms       FHGR   Validation results
Deliver complete system     Wecan  D5.1

M21-M24: Validation & Release

Activity                       Owner  Output
Create open benchmark dataset  FHGR   D5.2
Validate on 300+ forms         Both   D5.3
Deploy at pilot institutions   Wecan  Production systems
Document results               FHGR   Final reports

Deliverables

ID    Deliverable                           Due  Owner  Status
D5.1  Complete document pre-filling system  M20  Wecan  Complete
D5.2  Open benchmark dataset                M22  FHGR   Complete
D5.3  Accuracy validation (300+ forms)      M24  FHGR   Complete

All deliverable templates complete. See deliverables/ for detailed templates.


Document Format Support

PDF Documents

Feature          Support Level  Notes
AcroForm fields  Full           Native form filling
XFA forms        Partial        Legacy format
Flat PDFs        Full           OCR field detection
Scanned PDFs     Full           Visual field detection
Multi-page       Full           Page navigation

Excel Documents

Feature           Support Level  Notes
Cell values       Full           Direct cell access
Named ranges      Full           Semantic mapping
Data validation   Full           Dropdown lists
Formulas          Preserve       Don’t overwrite
Protected sheets  Partial        Unprotect if possible
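
The "Formulas: Preserve" rule above can be illustrated with a small sketch. It is an assumption-laden model rather than the Excel module itself: the sheet is represented as a plain dictionary from cell address to value, a formula is taken to be any string starting with `=`, and `fill_cells` is a hypothetical helper name.

```python
def is_formula(cell_value):
    """Treat any string beginning with '=' as a formula (assumed convention)."""
    return isinstance(cell_value, str) and cell_value.startswith("=")


def fill_cells(sheet, updates):
    """Apply updates to a copy of the sheet, leaving formula cells untouched.

    Returns the filled copy and the list of addresses that were skipped
    because they hold a formula.
    """
    filled = dict(sheet)
    skipped = []
    for address, value in updates.items():
        if is_formula(filled.get(address)):
            skipped.append(address)
            continue
        filled[address] = value
    return filled, skipped
```

Returning the skipped addresses, rather than silently ignoring them, would let a review interface flag mapped values that could not be written.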

Word Documents

Feature           Support Level  Notes
Content controls  Full           Native form fields
Form fields       Full           Legacy fields
Tables            Full           Cell-by-cell
Plain text        Full           Placeholder replacement
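
Placeholder replacement for plain text could look roughly like the sketch below. The `{{name}}` placeholder syntax is an assumption chosen for illustration, since the page does not specify one, and `replace_placeholders` is a hypothetical helper.

```python
import re

# Assumed placeholder syntax: {{field_name}}, with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def replace_placeholders(text, values):
    """Substitute known placeholders; leave unknown ones in place and report them."""
    unresolved = []

    def substitute(match):
        key = match.group(1)
        if key in values:
            return str(values[key])
        unresolved.append(key)
        return match.group(0)

    return PLACEHOLDER.sub(substitute, text), unresolved
```

Leaving unresolved placeholders visible in the output, instead of replacing them with empty text, makes missing values easy to spot during review.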

Review Interface

User Workflow

1. Upload Source Document(s)
          |
          v
2. View Extracted Fields
   [Field 1: Value] [Confidence: 95%]
   [Field 2: Value] [Confidence: 72%] <-- Highlighted
          |
          v
3. Review & Correct
   - Edit low-confidence values
   - Confirm high-confidence values
          |
          v
4. Select Target Template
          |
          v
5. Preview Pre-Filled Document
          |
          v
6. Export / Save to CRM

Confidence Display

Confidence  Display  Action
>90%        Green    Auto-accept
70-90%      Yellow   Review suggested
<70%        Red      Manual entry required
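
The banding above translates directly into a small lookup. One boundary choice is an assumption: because the table gives ">90%" for Green and "70-90%" for Yellow, this sketch places exactly 90% and exactly 70% in the Yellow band. The function name `confidence_band` is hypothetical.

```python
def confidence_band(confidence):
    """Map a confidence percentage (0-100) to its display colour and action."""
    if not 0.0 <= confidence <= 100.0:
        raise ValueError(f"confidence must be between 0 and 100, got {confidence}")
    if confidence > 90:
        return ("Green", "Auto-accept")
    if confidence >= 70:  # assumption: 70 and 90 both count as Yellow
        return ("Yellow", "Review suggested")
    return ("Red", "Manual entry required")
```

With the values from the workflow diagram, `confidence_band(95)` gives `("Green", "Auto-accept")` and `confidence_band(72)` gives `("Yellow", "Review suggested")`.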

Objective Alignment

Objective                          WP5 Contribution
OBJ1: 90% Document Accuracy        Final system accuracy
OBJ4: < 2 Hours Processing         End-to-end time
OBJ5: 3-5 Institution Deployments  Primary owner
OBJ6: TRL 7                        Production deployment



Benchmark Dataset (D5.2)

Dataset Composition

Category                   Documents  Fields  Notes
KYC Forms                  50         ~2,500  Anonymized
Regulatory Filings         40         ~2,000  Anonymized
Compliance Questionnaires  30         ~1,500  Anonymized
Annual Reports             30         ~1,000  Anonymized
Total                      150        ~7,000

Anonymization Process

  1. Remove personally identifiable information (PII)
  2. Replace company names with synthetic names
  3. Generalize dates (year only)
  4. Randomize numeric values within realistic ranges
  5. Preserve document structure and field relationships
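
The five steps above can be sketched for a single record as follows. This is an illustrative assumption, not the D5.2 procedure: the redaction marker, the `Company NNN` alias scheme, the ±20% randomization spread, and the function and parameter names are all hypothetical. PII values are replaced by a marker rather than dropped so that the field layout (step 5) stays intact.

```python
import random
from datetime import date


def anonymize_record(record, pii_fields, company_fields, aliases, rng, spread=0.2):
    """Return an anonymized copy of a record with the same keys in the same order.

    aliases is a shared dict so the same company always gets the same
    synthetic name across records, which keeps field relationships intact.
    """
    out = {}
    for key, value in record.items():
        if key in pii_fields:
            out[key] = "[REDACTED]"                      # step 1
        elif key in company_fields:
            default = f"Company {len(aliases) + 1:03d}"
            out[key] = aliases.setdefault(value, default)  # step 2
        elif isinstance(value, date):
            out[key] = value.year                        # step 3: year only
        elif isinstance(value, bool):
            out[key] = value                             # booleans are not amounts
        elif isinstance(value, (int, float)):
            factor = rng.uniform(1 - spread, 1 + spread)
            out[key] = round(value * factor, 2)          # step 4
        else:
            out[key] = value
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the randomized values reproducible, which matters if the released dataset needs to be regenerated.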

Release Format


Pilot Deployment

Institution Pipeline

Phase       Institutions         Duration  Activities
Alpha       1 (Wecan internal)   M16-M18   Internal testing
Beta        2 (design partners)  M18-M20   Limited production
Production  3-5 (customers)      M20-M24   Full deployment

Onboarding Process

  1. Schema Integration (Week 1-2)
    • Collect CRM schema
    • Configure field mappings
    • Test with sample documents
  2. System Setup (Week 2-3)
    • Deploy on-premises or connect to the hosted instance
    • Configure authentication
    • Set up monitoring
  3. Training (Week 3-4)
    • User training sessions
    • Admin configuration
    • Documentation handover
  4. Go-Live (Week 4+)
    • Supervised production use
    • Feedback collection
    • Iterative improvement

Success Metrics per Institution

Metric               Target
Documents processed  50+
Accuracy             >90%
Time savings         >80%
User satisfaction    >4/5
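
A per-institution check against these targets might be written as below. The metric keys, the use of fractions for percentages, and the `unmet_targets` helper are assumptions for illustration; "50+" is read as at least 50, and the other targets as strict inequalities.

```python
# Targets from the table above, expressed as predicates (assumed encoding).
TARGETS = {
    "documents_processed": lambda v: v >= 50,   # 50+
    "accuracy": lambda v: v > 0.90,             # >90%
    "time_savings": lambda v: v > 0.80,         # >80%
    "user_satisfaction": lambda v: v > 4.0,     # >4/5
}


def unmet_targets(metrics):
    """Return the names of targets that the given metrics do not meet."""
    return [name for name, meets in TARGETS.items() if not meets(metrics[name])]
```

An empty list means the institution meets all four targets.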

Milestone Checkpoints

MS4 (M16)

MS5 (M20)

Project End (M24)


Integration Points

From WP4

Input              Description                Timeline
Field mappings     Source -> Target mappings  M16
CRM integration    API connection             M16
Confidence scores  Per-field certainty        M16

External Systems

System            Integration  Purpose
Wecan Comply      Primary      CRM platform
Partner CRMs      API          Field mapping
Document storage  S3/Azure     Input/output

Quality Assurance

Testing Levels

Level              Scope                  Frequency
Unit tests         Individual components  Every commit
Integration tests  Component interaction  Daily
End-to-end tests   Full pipeline          Weekly
User acceptance    Real users             Monthly

Performance Targets

Metric             Target           Measurement
Form filling time  <30 seconds      Automated timing
UI responsiveness  <1 second        User perception
Batch processing   10+ docs/minute  Throughput test
Availability       99.5%            Uptime monitoring
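
The form filling and batch processing targets interact: a document that takes the full 30 seconds allows only 2 documents per minute when processed one at a time. The back-of-the-envelope sketch below estimates how many parallel workers the batch target would need. It assumes every document takes the same time and that workers scale linearly, neither of which is stated on this page; `workers_needed` is a hypothetical helper.

```python
import math


def workers_needed(docs_per_minute, seconds_per_doc):
    """Minimum parallel workers to sustain a throughput at a given per-document time."""
    return math.ceil(docs_per_minute * seconds_per_doc / 60)
```

Under these assumptions, `workers_needed(10, 30)` is 5, while a per-document time of 6 seconds or less would meet the batch target with a single worker.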
