Lecture 8: Modern Networks and Future Directions

Duration: ~45 minutes | Slides: 17 | Prerequisites: Lecture 7

Learning Objectives

After completing this lecture, you should be able to:

  1. Describe the evolution from MLPs to modern architectures
  2. Explain the basic concepts behind CNNs, RNNs, and Transformers
  3. Understand the advantages and limitations of each architecture
  4. Identify ethical considerations in financial AI
  5. Recognize emerging trends in deep learning for finance
  6. Connect course concepts to current research

Key Concepts

1. The Deep Learning Timeline

The field has evolved dramatically since the perceptron:

| Year | Milestone | Impact |
|------|-----------|--------|
| 1943 | McCulloch-Pitts | First neural model |
| 1958 | Perceptron | First learning algorithm |
| 1969 | Minsky-Papert | XOR limitation revealed |
| 1986 | Backpropagation | Training deep networks |
| 1998 | LeNet (CNN) | Handwriting recognition |
| 2012 | AlexNet | ImageNet breakthrough |
| 2014 | GAN | Generative models |
| 2017 | Transformer | "Attention is all you need" |
| 2022-24 | GPT-4, Claude | Large language models |

2. Convolutional Neural Networks (CNNs)

Designed for: Grid-like data (images, spatial data)

Key innovation: Local connectivity and weight sharing

How it works:

  • Convolutional layers: Small filters slide across input
  • Pooling layers: Reduce spatial dimensions
  • Fully connected layers: Final classification
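
To make this stack concrete, here is a minimal sketch, assuming PyTorch; the layer sizes, input resolution, and two-class output are illustrative choices, not values from the lecture:

```python
# A minimal PyTorch sketch of the conv -> pool -> fully connected stack.
# Layer sizes, input resolution, and the two-class head are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 small filters slide across the input
    nn.ReLU(),
    nn.MaxPool2d(2),                             # halve spatial dimensions: 32x32 -> 16x16
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 2),                    # final classification (e.g. pattern / no pattern)
)

x = torch.randn(1, 1, 32, 32)   # one 32x32 single-channel "chart" image
print(model(x).shape)           # torch.Size([1, 2])
```

The same 3x3 filter weights are reused at every spatial position, which is where the parameter efficiency noted below comes from.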

Finance applications:

  • Chart pattern recognition
  • Satellite imagery analysis (counting cars in parking lots)
  • Document processing (financial statements)

Advantages:

  • Parameter efficient (weight sharing)
  • Approximately translation invariant (pooling lets patterns be recognized anywhere in the input)
  • Hierarchical feature learning

3. Recurrent Neural Networks (RNNs)

Designed for: Sequential data (time series, text)

Key innovation: Hidden state that persists across time steps

Basic RNN equation:

h_t = f(W_h * h_{t-1} + W_x * x_t + b)

Where h_t is the hidden state at time t.
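
To see the recurrence in action, here is a minimal NumPy sketch; the dimensions and the choice of tanh for f are illustrative assumptions:

```python
# A minimal NumPy sketch of the RNN recurrence h_t = f(W_h h_{t-1} + W_x x_t + b).
# Sizes and the tanh nonlinearity are illustrative, not lecture-specified values.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, T = 4, 8, 10            # input size, hidden size, sequence length

W_h = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
W_x = rng.normal(scale=0.1, size=(d_hidden, d_in))
b = np.zeros(d_hidden)

x_seq = rng.normal(size=(T, d_in))      # one input vector per time step
h = np.zeros(d_hidden)                  # initial hidden state h_0

for x_t in x_seq:
    h = np.tanh(W_h @ h + W_x @ x_t + b)   # the same weights are reused at every step

print(h.shape)   # (8,) -- the final hidden state summarizes the whole sequence
```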

Variants:

  • LSTM: Long Short-Term Memory (handles long sequences)
  • GRU: Gated Recurrent Unit (simpler than LSTM)

Finance applications:

  • Stock price prediction
  • Volatility forecasting
  • Sentiment analysis of news

Limitations:

  • Sequential processing (slow)
  • Even gated variants (LSTM, GRU) struggle with very long sequences
  • Difficult to parallelize

4. Transformers and Attention

Designed for: Any sequential data (text, time series)

Key innovation: Self-attention mechanism

The attention idea: “When processing one element, look at all other elements and decide which are relevant.”

Self-attention:

Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V

Where:

  • Q = Query (what am I looking for?)
  • K = Key (what do I contain?)
  • V = Value (what do I return?)
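
Here is a minimal NumPy sketch of this formula: a single attention head with no masking or learned projections, using illustrative shapes:

```python
# A minimal NumPy sketch of scaled dot-product attention.
# Single head, no masking, no learned Q/K/V projections; shapes are illustrative.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (n_q, n_k) relevance of each key to each query
    weights = softmax(scores, axis=-1)         # each query's weights sum to 1
    return weights @ V                         # weighted average of the values

rng = np.random.default_rng(0)
n, d_k, d_v = 5, 16, 16                        # sequence length, key dim, value dim
Q = rng.normal(size=(n, d_k))
K = rng.normal(size=(n, d_k))
V = rng.normal(size=(n, d_v))

print(attention(Q, K, V).shape)   # (5, 16) -- one context vector per position
```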

Advantages:

  • Parallelizable (unlike RNNs)
  • Captures long-range dependencies
  • State-of-the-art in many domains

Finance applications:

  • Market regime detection
  • Multi-asset modeling
  • News and report analysis
  • Large language models for financial analysis

5. Architecture Family Tree

Neural Networks
    |
    +-- Feedforward (MLP)
    |       |
    |       +-- Autoencoders
    |
    +-- Convolutional (CNN)
    |       |
    |       +-- ResNet, VGG, etc.
    |
    +-- Recurrent (RNN)
    |       |
    |       +-- LSTM
    |       +-- GRU
    |
    +-- Attention-based
            |
            +-- Transformer
                    |
                    +-- GPT (decoder)
                    +-- BERT (encoder)
                    +-- Claude, GPT-4 (large language models)

6. Choosing an Architecture

Guidelines for financial applications:

| Data Type | Recommended Architecture |
|-----------|--------------------------|
| Tabular (features) | MLP, Gradient Boosting |
| Time series (prices) | LSTM, Transformer |
| Images (charts) | CNN |
| Text (news, reports) | Transformer, BERT |
| Multiple modalities | Multi-head networks |

When in doubt:

  • Start with simpler models (MLP, gradient boosting)
  • Add complexity only if needed
  • Ensemble methods often beat single networks
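
As one concrete way to follow this advice, here is a hedged sketch using scikit-learn; the synthetic dataset is a stand-in for real engineered features:

```python
# A minimal "start simple" sketch, assuming scikit-learn.
# The synthetic data stands in for real tabular financial features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Establish the simple baseline first; add complexity only if it clearly wins.
for name, model in [
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
    ("small MLP", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>17}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

For time series, a time-ordered split should replace the shuffled cross-validation used here.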

7. Ethical Considerations

Bias in financial models:

  • Historical data reflects historical biases
  • Credit scoring can perpetuate discrimination
  • Algorithmic trading can amplify market instability

Transparency and explainability:

  • Regulators increasingly require model explanations
  • “Black box” models face legal challenges
  • Need to balance performance and interpretability

Market manipulation concerns:

  • Coordinated algorithmic behavior
  • Flash crashes
  • Market microstructure effects

Responsible AI principles:

  1. Fairness: Models should not discriminate
  2. Transparency: Decisions should be explainable
  3. Accountability: Clear ownership of model outcomes
  4. Privacy: Protect sensitive financial data

8. Emerging Trends

Large Language Models (LLMs):

  • Analyzing earnings calls
  • Processing financial news
  • Generating investment summaries
  • Answering financial questions

Reinforcement Learning:

  • Portfolio optimization
  • Order execution
  • Market making

Graph Neural Networks:

  • Corporate relationship modeling
  • Systemic risk analysis
  • Fraud detection in transaction networks

Alternative Data:

  • Satellite imagery
  • Social media sentiment
  • Web scraping
  • Credit card transactions

9. Limitations of Current AI

What neural networks struggle with:

| Challenge | Description |
|-----------|-------------|
| Reasoning | Poor at multi-step logical deduction |
| Causality | Learn correlation, not causation |
| Uncertainty | Often overconfident in predictions |
| Generalization | Struggle with distribution shift |
| Sample efficiency | Need lots of data |

Implications for finance:

  • Don’t expect AI to understand market fundamentals
  • Be skeptical of extreme predictions
  • Always validate with domain expertise
  • Prepare for model degradation

10. The Future of Neural Networks in Finance

Near-term (2-5 years):

  • Better integration of alternative data
  • More sophisticated ensemble methods
  • Improved risk management with AI
  • Regulatory frameworks developing

Medium-term (5-10 years):

  • Agents that can reason about markets
  • Real-time adaptive trading systems
  • AI-human collaboration tools
  • Standardized model validation

Long-term questions:

  • Will markets become more or less efficient?
  • How will AI change the role of financial professionals?
  • What new risks will emerge?

Key Concepts Summary

Architecture Comparison

| Architecture | Best For | Finance Use Case |
|--------------|----------|------------------|
| MLP | Tabular data | Factor models |
| CNN | Spatial patterns | Chart analysis |
| RNN/LSTM | Sequences | Time series |
| Transformer | Long sequences, text | News analysis, LLMs |

Ethical Framework

  1. Data: Is training data representative and unbiased?
  2. Model: Can decisions be explained?
  3. Deployment: Are risks managed?
  4. Monitoring: Is ongoing bias detection in place?

Finance Application: Where We Are Today

Current state of AI in finance:

| Application | Maturity | Typical Performance |
|-------------|----------|---------------------|
| Fraud detection | High | 95%+ accuracy |
| Credit scoring | High | Better than traditional |
| Sentiment analysis | Medium | Useful signals |
| Price prediction | Low | Marginal edge |
| Automated trading | Medium | Depends on strategy |

Key insight: AI works best where:

  • Signal is strong (fraud has clear patterns)
  • Data is plentiful (credit scoring)
  • Competition is limited (alternative data)

AI struggles where:

  • Signal is weak (short-term price prediction)
  • Markets are efficient (liquid stocks)
  • Regime changes are frequent (macro shifts)

Practice Questions

Conceptual Understanding

Q1: Why are Transformers better than RNNs for long sequences?

**Answer:** Two main reasons:

  1. **Attention mechanism:** Transformers can directly attend to any position in the sequence, regardless of distance. RNNs must pass information through all intermediate time steps, losing information along the way.
  2. **Parallelization:** RNNs process sequences one step at a time (sequentially); Transformers process all positions in parallel, making training much faster on modern hardware.

This matters for finance when analyzing long documents (annual reports) or long price histories.

Q2: When would you use a CNN vs an MLP for financial data?

**Answer:** Use a CNN when:

  • Data has spatial/local structure (candlestick charts, heatmaps)
  • Patterns can appear anywhere in the input
  • Translation invariance is desired

Use an MLP when:

  • Data is tabular with no spatial structure
  • Features are hand-engineered
  • A simpler model is preferred

For most structured financial data (company features, technical indicators), MLPs or gradient boosting often outperform CNNs because the data lacks spatial locality.

Q3: Give an example of how bias in training data could lead to unfair financial outcomes.

**Answer:** Example: a credit scoring model trained on historical lending data.

Problem: historical data reflects past discrimination. If certain demographics were historically denied credit (even when creditworthy), the model learns these biased patterns.

Result: the model may:

  • Reject qualified applicants from underrepresented groups
  • Perpetuate historical inequalities
  • Create legal liability under fair lending laws

Solution: careful feature selection, bias auditing, disparate impact analysis, and potentially removing protected characteristics (and their close proxies) from model inputs.

Q4: Why might a reinforcement learning approach to trading be difficult to implement in practice?

**Answer:** Key challenges:

  1. **Non-stationary environment:** Markets change, so the "optimal" policy changes
  2. **Sparse rewards:** Profits and losses are only realized when a trade closes, making them hard to attribute to specific actions
  3. **Limited exploration:** You can't afford to make random trades just to explore
  4. **Simulation gap:** An RL agent trained in simulation faces different conditions live
  5. **Sample efficiency:** RL typically needs millions of episodes; market data is limited
  6. **Multi-agent dynamics:** Other traders adapt to your strategy
  7. **Transaction costs:** Small RL improvements may be eaten by costs

This is why most production trading systems still use supervised learning.

Application

Q5: You’re tasked with building a system to analyze earnings call transcripts. What architecture would you recommend and why?

**Answer:** Recommended: a Transformer-based model (e.g., fine-tuned BERT or FinBERT).

Reasoning:

  1. **Text data:** Transformers are state-of-the-art for NLP
  2. **Long documents:** Attention handles long-range dependencies
  3. **Pre-training available:** You can fine-tune existing financial language models
  4. **Multiple tasks:** The same architecture serves sentiment, topic extraction, and Q&A

Architecture:

  • Input: tokenized transcript
  • Model: FinBERT or a similar finance-domain model
  • Output: sentiment score, key topic probabilities

Alternative: for simpler applications, even bag-of-words with an MLP can work surprisingly well as a baseline.
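
As a rough illustration of the recommended approach, here is a minimal sketch, assuming the Hugging Face transformers library and the publicly released ProsusAI/finbert checkpoint (the example sentences are hypothetical):

```python
# A minimal sketch, assuming the Hugging Face `transformers` package and the
# ProsusAI/finbert checkpoint; not course-provided code.
from transformers import pipeline

# Load a finance-domain sentiment model (downloads weights on first use).
sentiment = pipeline("text-classification", model="ProsusAI/finbert")

# Score a few sentences from a hypothetical earnings call transcript.
sentences = [
    "Revenue grew 12% year over year, beating guidance.",
    "We expect margin pressure to continue into next quarter.",
]
for s in sentences:
    result = sentiment(s)[0]    # e.g. {'label': 'positive', 'score': 0.95}
    print(f"{result['label']:>9}  {result['score']:.2f}  {s}")
```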

Q6: How would you explain to a regulator why your credit model made a specific decision?

**Answer:** Explanation strategy:

  1. **Feature importance:** Show which features most influenced this decision, e.g. "income-to-debt ratio contributed +15% to the approval score"
  2. **SHAP values:** Quantify each feature's contribution, e.g. "credit history: -5%, employment: +8%, assets: +12%"
  3. **Counterfactuals:** Show what would change the decision, e.g. "the applicant would be approved if income increased by $5,000"
  4. **Reference to similar cases:** Compare to approved/rejected neighbors, e.g. "among 100 similar applicants, 75% were approved"
  5. **Model documentation:** Provide the training methodology and validation results

Key: have explainability tools integrated from the start, not added as an afterthought.
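
To illustrate the SHAP step (point 2), here is a minimal sketch, assuming the shap and scikit-learn packages; the model, feature names, and data are synthetic stand-ins, not a real credit model:

```python
# A minimal SHAP sketch, assuming the `shap` package; synthetic stand-in data.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["income_to_debt", "credit_history_yrs", "employment_yrs"]  # hypothetical
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)

# Per-feature contributions (in log-odds) to one applicant's decision.
contributions = explainer.shap_values(X[:1])
for name, c in zip(feature_names, contributions[0]):
    print(f"{name:>20}: {c:+.3f}")
```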

Reading List

Modern Architectures

  • Vaswani et al. (2017) - “Attention Is All You Need” - The Transformer paper
  • Goodfellow, Bengio & Courville (2016) - Deep Learning - Chapters 9-10 (CNN and RNN chapters)

Financial Applications

  • Gu, Kelly & Xiu (2020) - “Empirical Asset Pricing via Machine Learning”
  • Lopez de Prado (2020) - “Machine Learning for Asset Managers”

Ethics and Fairness

  • Mehrabi et al. (2021) - “A Survey on Bias and Fairness in Machine Learning”
  • EU AI Act - Regulatory framework (2024)

Large Language Models in Finance

  • Wu et al. (2023) - “BloombergGPT: A Large Language Model for Finance”
  • Yang et al. (2023) - “FinGPT: Open-Source Financial Large Language Models”

Future Directions

  • Sutton (2019) - “The Bitter Lesson” (scale is key)
  • Bengio et al. (2021) - “Deep Learning for AI”

Summary

This lecture covered:

  1. Evolution of architectures - From perceptrons to Transformers
  2. CNNs - For spatial/image data
  3. RNNs/LSTMs - For sequential data
  4. Transformers - Attention-based, state-of-the-art for many tasks
  5. Architecture selection - Match model to data type
  6. Ethics - Bias, transparency, accountability
  7. Future directions - LLMs, RL, alternative data

Key Takeaway: Modern architectures offer powerful tools, but the fundamentals from this course (training, regularization, validation) apply universally.


Course Conclusion

You’ve now completed all 8 lectures covering:

  1. History and biological inspiration
  2. Perceptron fundamentals
  3. Multi-layer perceptron architecture
  4. Activation and loss functions
  5. Gradient descent and backpropagation
  6. Training dynamics and regularization
  7. Financial applications
  8. Modern networks and future directions

The journey continues:

  • Implement models on real data
  • Read primary literature
  • Stay current with rapid developments
  • Apply with appropriate skepticism and rigor

Good luck in your neural network journey!