AI Orchestrator

A Specialized and Secure AI Orchestrator for Swiss Financial Compliance

View the Project on GitHub Digital-AI-Finance/wecan-innosuisse-ai-draft

WP4: Multi-Source Information Fusion



Overview

Attribute | Value
Duration | M10-M21
FHGR Hours | 1,100h
Wecan Hours | 500h
Total Hours | 1,600h
Lead | FHGR Research Lead + Wecan Tech Lead

Objectives

  1. Develop intelligent field mapping between extracted data and CRM schemas
  2. Achieve zero-shot schema mapping with F1 > 85%
  3. Integrate with Wecan Comply platform
  4. Validate on 300+ field matching cases
  5. Support 50+ enterprise CRM schemas without manual configuration

Technical Approach

Zero-Shot Schema Mapping

Extracted Fields              Target CRM Schema
     |                              |
     v                              v
+------------+                +------------+
| Field      |                | Schema     |
| Embeddings |                | Embeddings |
+------------+                +------------+
     |                              |
     +------------+-----------------+
                  |
                  v
         +----------------+
         | Similarity     |
         | Matching       |
         +----------------+
                  |
                  v
         +----------------+
         | Confidence     |
         | Scoring        |
         +----------------+
                  |
                  v
           Field Mapping
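
The matching step in the diagram above can be sketched as follows. This is a minimal illustration, not the project's implementation: `trigram_vector` is a toy stand-in for a real embedding model, and the function names and sample field names are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of character trigrams."""
    padded = f"  {text.lower().replace('_', ' ')} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_fields(extracted: list[str], schema: list[str]) -> dict[str, tuple[str, float]]:
    """For each extracted field, pick the schema field with the highest similarity."""
    schema_vecs = {name: trigram_vector(name) for name in schema}
    mapping = {}
    for field in extracted:
        vec = trigram_vector(field)
        best = max(schema_vecs, key=lambda name: cosine(vec, schema_vecs[name]))
        mapping[field] = (best, cosine(vec, schema_vecs[best]))
    return mapping


# Hypothetical field names, for illustration only.
mapping = map_fields(
    ["Date of birth", "Last name"],
    ["birth_date", "last_name", "account_number"],
)
# mapping["Last name"][0] is "last_name"; mapping["Date of birth"][0] is "birth_date"
```

Swapping `trigram_vector` for a multilingual sentence-embedding model would leave the rest of the sketch unchanged, since only the vector representation differs.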

Multi-Source Fusion

Source 1: Scanned PDF        Source 2: Email        Source 3: Database
       |                           |                       |
       v                           v                       v
+------------+              +------------+           +------------+
| Extracted  |              | Extracted  |           | Retrieved  |
| Fields     |              | Fields     |           | Records    |
+------------+              +------------+           +------------+
       |                           |                       |
       +-------------+-------------+                       |
                     |                                     |
                     v                                     |
              +-------------+                              |
              | Conflict    |<-----------------------------+
              | Resolution  |
              +-------------+
                     |
                     v
              Fused Record

Activities

M10-M12: Foundation

Activity | Owner | Output
Design schema mapping algorithm | FHGR | Algorithm spec
Create schema embedding pipeline | FHGR | Embedding module
Collect 20 sample CRM schemas | Wecan | Schema samples
Demonstrate field matching on 50 forms | FHGR | Demo results
Document initial accuracy metrics | FHGR | Metrics report

M13-M16: Development

Activity | Owner | Output
Implement zero-shot mapping | FHGR | Mapping system
Integrate with Wecan Comply API | Wecan | API integration
Collect 30 additional schemas (total: 50) | Wecan | Schema library
Build confidence scoring | FHGR | Scoring module
Validate on 150 cases | FHGR | Validation results
Deliver CRM pre-filling system | Both | D4.1

M17-M21: Validation

Activity | Owner | Output
Extend validation to 300 cases | FHGR | D4.2
Optimize mapping performance | FHGR | Optimized system
Document scientific results | FHGR | D4.3

Deliverables

ID | Deliverable | Due | Owner | Status
D4.1 | Intelligent CRM form pre-filling system | M16 | Wecan | Complete
D4.2 | Field matching validation (300 cases) | M20 | FHGR | Complete
D4.3 | Scientific validation report | M21 | FHGR | Complete

All deliverable templates are complete. See deliverables/ for the detailed templates.


Schema Mapping Approach

Field Matching Algorithm

  1. Schema Analysis
    • Parse target CRM schema structure
    • Identify field types (text, number, date, boolean)
    • Extract field descriptions and constraints
  2. Semantic Embedding
    • Embed extracted field names and values
    • Embed target field names and descriptions
    • Use multilingual embeddings from the E5 family (e.g., mE5)
  3. Similarity Matching
    • Cosine similarity between embeddings
    • Type-aware matching (numeric -> numeric)
    • Context-aware disambiguation
  4. Confidence Scoring
    • Score based on similarity + type match
    • Flag low-confidence mappings for review
    • Learn from corrections
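
The type-aware matching and confidence scoring steps above could be combined along the following lines. This is a sketch under stated assumptions: the type heuristics, the accepted date formats, and the `type_weight` of 0.3 are illustrative choices, not values taken from the project.

```python
from datetime import datetime


def infer_type(value: str) -> str:
    """Guess a coarse field type (boolean, number, date, text) from a raw value."""
    v = value.strip()
    if v.lower() in {"true", "false", "yes", "no"}:
        return "boolean"
    try:
        # Strip common thousands separators (Swiss apostrophe, comma) before parsing.
        float(v.replace("'", "").replace(",", ""))
        return "number"
    except ValueError:
        pass
    for fmt in ("%Y-%m-%d", "%d.%m.%Y"):  # assumed date formats
        try:
            datetime.strptime(v, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


def confidence(similarity: float, source_value: str, target_type: str,
               type_weight: float = 0.3) -> float:
    """Blend embedding similarity with a type-match term into one score in [0, 1].

    type_weight is a hypothetical tuning parameter.
    """
    type_match = 1.0 if infer_type(source_value) == target_type else 0.0
    return (1.0 - type_weight) * similarity + type_weight * type_match


score_match = confidence(0.8, "1990-05-17", "date")   # types agree
score_mismatch = confidence(0.8, "abc", "number")     # types disagree
```

A linear blend keeps the score easy to explain to reviewers; a learned combination would be the natural replacement once correction data is available.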

Supported CRM Types

Category | Systems | Schema Complexity
Banking | Core banking, private banking | High (500+ fields)
Insurance | Life, property, liability | High (400+ fields)
Trustees | Wealth management | Medium (200+ fields)
Asset Mgmt | Fund admin, portfolio | Medium (300+ fields)
Fintech | Crypto, payments | Variable

CRM Integration Scope

Target CRM Systems

# | CRM System | Vendor | Integration Level | Schema Count | Status
1 | Wecan Comply | WeCanGroup | Full (read/write) | 15+ schemas | Primary
2 | Salesforce Financial Services Cloud | Salesforce | Read + limited write | 10+ schemas | Optional
3 | Microsoft Dynamics 365 | Microsoft | Read-only | 8+ schemas | Optional
4 | SAP CRM (Banking) | SAP | API adapter | 5+ schemas | POC only
5 | Custom Banking Platform | Per-client | Custom adapter | Variable | Pilot only

Integration Level Definitions

Level | Description | Capabilities
Full | Complete bidirectional integration | Read schemas, write fields, sync updates, audit trail
Read + Limited Write | Query and selective updates | Read schemas, write core fields only
Read-only | Query-only access | Read schemas, export mappings, no write
API Adapter | Custom integration layer | Client-specific adapter development
Custom | Per-deployment configuration | Full customization per institution

Wecan Comply (Primary Integration)

Priority: Required - Core platform integration

Capability | Support | Notes
Schema discovery | Yes | Dynamic schema enumeration
Field mapping | Yes | Automatic field matching
Pre-fill generation | Yes | PDF/Excel output
Real-time sync | Yes | Webhook notifications
Audit trail | Yes | 7-year retention
Multi-tenant | Yes | Per-institution isolation

Salesforce FSC (Optional)

Priority: High - Major enterprise platform

Capability | Support | Notes
Schema discovery | Yes | Via Metadata API
Field mapping | Yes | Standard + custom objects
Pre-fill generation | Limited | Export to Salesforce records
Real-time sync | Partial | Scheduled sync preferred
Audit trail | Via Salesforce | Field History Tracking

Microsoft Dynamics 365 (Optional)

Priority: Medium - Enterprise market coverage

Capability | Support | Notes
Schema discovery | Yes | Via Web API
Field mapping | Yes | Entities and attributes
Pre-fill generation | Limited | Dataverse record creation
Real-time sync | No | Batch export only
Audit trail | Via Dynamics | Audit History

Schema Distribution

CRM System | KYC | Financial | Regulatory | Total
Wecan Comply | 6 | 5 | 4 | 15
Salesforce FSC | 4 | 4 | 2 | 10
Dynamics 365 | 3 | 3 | 2 | 8
SAP CRM | 2 | 2 | 1 | 5
Custom | Variable | Variable | Variable | ~12
Total | 15+ | 14+ | 9+ | 50+

API Authentication

CRM System | Auth Method | Token Refresh
Wecan Comply | OAuth 2.0 + API Key | 1 hour
Salesforce | OAuth 2.0 (JWT Bearer) | 2 hours
Dynamics 365 | OAuth 2.0 (Client Credentials) | 1 hour
SAP CRM | OAuth 2.0 + X.509 | 12 hours
Custom | Per-client | Per-client
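
One way to honor the refresh intervals above is a small per-system token cache. The sketch below assumes the "Token Refresh" column gives the token lifetime; `TokenCache`, its `fetch` callback, and the 60-second safety margin are hypothetical, and the actual OAuth exchange is left to the caller.

```python
import time
from typing import Callable

# Token lifetimes in seconds, read from the "Token Refresh" column
# (assumption: the column states how long a token stays valid).
TOKEN_TTL = {
    "Wecan Comply": 1 * 3600,
    "Salesforce": 2 * 3600,
    "Dynamics 365": 1 * 3600,
    "SAP CRM": 12 * 3600,
}


class TokenCache:
    """Caches one access token per CRM system and refetches shortly before expiry."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic,
                 safety_margin: float = 60.0):
        self._fetch = fetch            # performs the real OAuth exchange (not shown)
        self._clock = clock            # injectable so the cache can be tested
        self._margin = safety_margin   # refresh this many seconds early
        self._tokens: dict[str, tuple[str, float]] = {}

    def get(self, system: str) -> str:
        now = self._clock()
        cached = self._tokens.get(system)
        if cached and now < cached[1] - self._margin:
            return cached[0]
        token = self._fetch(system)
        self._tokens[system] = (token, now + TOKEN_TTL[system])
        return token
```

Refreshing slightly before expiry avoids sending a token that lapses mid-request; per-client systems would need their lifetimes added to `TOKEN_TTL` at deployment time.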

Sandbox/Test Environments

CRM System | Sandbox Available | Data | Notes
Wecan Comply | Yes | Synthetic | Full feature parity
Salesforce | Yes | Synthetic | Developer/Sandbox orgs
Dynamics 365 | Yes | Synthetic | Trial/Sandbox instances
SAP CRM | Limited | Synthetic | Requires client provision
Custom | Per-client | Per-client | Client responsibility

Wecan Comply Integration

API Contract

Endpoint | Method | Description
/schemas | GET | List available CRM schemas
/schemas/{id} | GET | Get schema definition
/mappings | POST | Submit field mapping
/mappings/{id} | GET | Get mapping results
/prefill | POST | Generate pre-filled document

Integration Architecture

AI Orchestrator              Wecan Comply
      |                           |
      |   1. Get Schema           |
      |-------------------------->|
      |<--------------------------|
      |   Schema Definition       |
      |                           |
      |   2. Submit Mapping       |
      |-------------------------->|
      |<--------------------------|
      |   Mapping ID              |
      |                           |
      |   3. Get Results          |
      |-------------------------->|
      |<--------------------------|
      |   Mapped Fields           |
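
The three-step exchange above might look like this from the orchestrator side. Only the endpoint paths and methods come from the API contract; the request and response shapes (`schema_id`, `fields`, `id`, `mapped_fields`), the class name, and the in-memory transport are assumptions made for illustration.

```python
from typing import Any, Callable

# Transport signature: (method, path, body) -> decoded JSON response.
# A real transport would add the base URL, authentication headers, and error handling.
Transport = Callable[[str, str, Any], dict]


class ComplyClient:
    """Thin wrapper around the schema and mapping endpoints of the API contract."""

    def __init__(self, transport: Transport):
        self._send = transport

    def get_schema(self, schema_id: str) -> dict:
        return self._send("GET", f"/schemas/{schema_id}", None)

    def submit_mapping(self, schema_id: str, fields: dict) -> str:
        # Payload and response keys are hypothetical.
        response = self._send("POST", "/mappings",
                              {"schema_id": schema_id, "fields": fields})
        return response["id"]

    def get_results(self, mapping_id: str) -> dict:
        return self._send("GET", f"/mappings/{mapping_id}", None)


def fake_transport(method: str, path: str, body: Any) -> dict:
    """In-memory stand-in for the real API, for illustration only."""
    if (method, path) == ("GET", "/schemas/kyc-basic"):
        return {"id": "kyc-basic", "fields": [{"name": "last_name", "type": "text"}]}
    if (method, path) == ("POST", "/mappings"):
        return {"id": "m-1"}
    if (method, path) == ("GET", "/mappings/m-1"):
        return {"id": "m-1", "mapped_fields": {"Last name": "last_name"}}
    raise ValueError(f"unexpected request: {method} {path}")


client = ComplyClient(fake_transport)
schema = client.get_schema("kyc-basic")                                    # 1. Get Schema
mapping_id = client.submit_mapping(schema["id"], {"Last name": "Muster"})  # 2. Submit Mapping
results = client.get_results(mapping_id)                                   # 3. Get Results
```

Injecting the transport keeps the client testable against the synthetic sandbox data without network access.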

GitHub Issue: #446 - Wecan Comply API Specification


Objective Alignment

Objective | WP4 Contribution
OBJ2: Zero-Shot Schema Mapping (F1 > 85%) | Primary owner
OBJ1: 90% Document Accuracy | Field mapping accuracy
OBJ5: 3-5 Institution Deployments | CRM integration



Validation Methodology

Test Set Design

Category | Schemas | Cases per Schema | Total Cases
Banking | 15 | 5 | 75
Insurance | 12 | 5 | 60
Trustees | 10 | 5 | 50
Asset Mgmt | 8 | 5 | 40
Other | 5 | 5 | 25
Total | 50 | 5 | 250
+ Edge cases | - | - | 50
Grand Total | - | - | 300
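
The totals in the test set design follow directly from the per-category schema counts; the short check below reproduces the arithmetic. Variable names are illustrative.

```python
# Schema counts per category, as listed in the test set design.
schemas_per_category = {
    "Banking": 15,
    "Insurance": 12,
    "Trustees": 10,
    "Asset Mgmt": 8,
    "Other": 5,
}
CASES_PER_SCHEMA = 5
EDGE_CASES = 50

cases_per_category = {
    name: count * CASES_PER_SCHEMA for name, count in schemas_per_category.items()
}
total_schemas = sum(schemas_per_category.values())   # 50
standard_cases = sum(cases_per_category.values())    # 250
grand_total = standard_cases + EDGE_CASES            # 300
```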

Metrics

Metric | Definition | Target
Precision | Correct / Predicted | > 85%
Recall | Correct / Actual | > 85%
F1 Score | Harmonic mean | > 85%
Coverage | Fields mapped / Total | > 95%
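
The four metrics can be computed from a predicted mapping and a gold-standard mapping as sketched below. The function name and the dictionary representation (source field to target field) are assumptions; coverage is interpreted here as the share of gold-standard source fields that received any mapping.

```python
def mapping_metrics(predicted: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Precision, recall, F1, and coverage for one schema-mapping run.

    predicted: source field -> target field proposed by the system
    gold:      source field -> correct target field (every source field present)
    """
    correct = sum(1 for src, tgt in predicted.items() if gold.get(src) == tgt)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    coverage = len(predicted.keys() & gold.keys()) / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "coverage": coverage}


# Hypothetical example: four source fields, three mapped, two mapped correctly.
gold = {"Last name": "last_name", "DOB": "birth_date",
        "IBAN": "account_number", "City": "city"}
predicted = {"Last name": "last_name", "DOB": "birth_date", "IBAN": "city"}
metrics = mapping_metrics(predicted, gold)
```

In this example precision is 2/3, recall is 2/4, and coverage is 3/4, which shows why coverage can be high while F1 stays below target.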

Milestone Checkpoints

MS3 (M12)

MS4 (M16)

MS5 (M20)


Integration Points

From WP2/WP3

Input | Description | Timeline
Extracted fields | Structured data from documents | M12
Confidence scores | Per-field certainty | M12
Domain-adapted models | Fine-tuned embeddings | M12

To WP5

Output | Description | Timeline
Field mappings | Source -> Target mappings | M16
CRM integration | API connection | M16
Validation results | Accuracy metrics | M20

Conflict Resolution

Multi-Source Scenarios

Scenario | Resolution
Same value from multiple sources | Use highest confidence
Different values from sources | Flag for review
Missing in one source | Use available source
Conflicting types | Prefer structured source
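
The four resolution rules above can be expressed as a single fusion function. This is a sketch, not the project's implementation: the `Candidate` and `Fused` types are hypothetical, values are assumed to be hashable scalars, and returning the highest-confidence value as a provisional result when sources disagree is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    source: str               # e.g. "pdf", "email", "database"
    value: object             # None when the source has no value for the field
    confidence: float
    structured: bool = False  # True for sources such as database records


@dataclass
class Fused:
    value: object
    confidence: float
    needs_review: bool = False


def fuse(candidates: list[Candidate]) -> Optional[Fused]:
    """Apply the resolution rules to all candidates for one field."""
    # Missing in one source: use whichever sources do have a value.
    present = [c for c in candidates if c.value is not None]
    if not present:
        return None
    # Conflicting types: prefer structured sources when any exist.
    if len({type(c.value) for c in present}) > 1:
        structured = [c for c in present if c.structured]
        if structured:
            present = structured
    best = max(present, key=lambda c: c.confidence)
    # Same value from all remaining sources: keep it with the highest confidence.
    if len({c.value for c in present}) == 1:
        return Fused(best.value, best.confidence)
    # Different values: flag for review (best candidate kept as provisional value).
    return Fused(best.value, best.confidence, needs_review=True)
```

For example, `fuse([Candidate("pdf", "Muster", 0.8), Candidate("email", "Muster", 0.95)])` keeps "Muster" with confidence 0.95 and does not request review.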

Review Workflow

  1. High confidence (>90%): Auto-accept
  2. Medium confidence (70-90%): Highlight for review
  3. Low confidence (<70%): Require manual decision
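
The three review tiers translate into a simple routing function. The thresholds come from the list above; the function name, the action labels, and the choice to treat exactly 90% and exactly 70% as medium confidence are assumptions.

```python
def review_action(confidence: float) -> str:
    """Route a mapping to a review tier based on its confidence in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence > 0.90:
        return "auto_accept"
    if confidence >= 0.70:
        return "highlight_for_review"
    return "manual_decision"
```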
