Lecture 8: Modern Networks and Future Directions
| Duration: ~45 minutes | Slides: 17 | Prerequisites: Lecture 7 |
Learning Objectives
After completing this lecture, you should be able to:
- Describe the evolution from MLPs to modern architectures
- Explain the basic concepts behind CNNs, RNNs, and Transformers
- Understand the advantages and limitations of each architecture
- Identify ethical considerations in financial AI
- Recognize emerging trends in deep learning for finance
- Connect course concepts to current research
Key Concepts
1. The Deep Learning Timeline
The field has evolved dramatically since the perceptron:
| Year | Milestone | Impact |
|---|---|---|
| 1943 | McCulloch-Pitts | First neural model |
| 1958 | Perceptron | First learning algorithm |
| 1969 | Minsky-Papert | XOR limitation revealed |
| 1986 | Backpropagation | Training deep networks |
| 1998 | LeNet (CNN) | Handwriting recognition |
| 2012 | AlexNet | ImageNet breakthrough |
| 2014 | GAN | Generative models |
| 2017 | Transformer | Attention is all you need |
| 2022-24 | GPT-4, Claude | Large language models |
2. Convolutional Neural Networks (CNNs)
Designed for: Grid-like data (images, spatial data)
Key innovation: Local connectivity and weight sharing
How it works:
- Convolutional layers: Small filters slide across input
- Pooling layers: Reduce spatial dimensions
- Fully connected layers: Final classification
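To make this layer sequence concrete, here is a minimal PyTorch sketch; the filter counts, the 64x64 grayscale input, and the two-class head are illustrative assumptions, not specifics from the lecture.

```python
# Minimal CNN sketch: conv -> pool -> conv -> pool -> fully connected.
# All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ChartCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # small filters slide across the input
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling halves the spatial dims
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)  # final classification

    def forward(self, x):  # x: (batch, 1, 64, 64) grayscale chart images
        return self.classifier(self.features(x).flatten(1))

model = ChartCNN()
logits = model(torch.randn(8, 1, 64, 64))  # batch of 8 fake 64x64 charts -> (8, 2)
```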
Finance applications:
- Chart pattern recognition
- Satellite imagery analysis (counting cars in parking lots)
- Document processing (financial statements)
Advantages:
- Parameter efficient (weight sharing)
- Translation invariant (patterns recognized anywhere)
- Hierarchical feature learning
3. Recurrent Neural Networks (RNNs)
Designed for: Sequential data (time series, text)
Key innovation: Hidden state that persists across time steps
Basic RNN equation:
h_t = f(W_h * h_{t-1} + W_x * x_t + b)
Where h_t is the hidden state at time t.
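As a concrete illustration, here is a minimal NumPy sketch of this update, taking f = tanh; the dimensions are arbitrary.

```python
# One RNN step: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t + b)
import numpy as np

d_h, d_x = 4, 3                        # hidden and input sizes (illustrative)
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(d_h, d_h))
W_x = rng.normal(scale=0.1, size=(d_h, d_x))
b = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

h = np.zeros(d_h)                      # initial hidden state
for x_t in rng.normal(size=(5, d_x)):  # a toy 5-step sequence
    h = rnn_step(h, x_t)               # the state persists across time steps
```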
Variants:
- LSTM: Long Short-Term Memory (handles long sequences)
- GRU: Gated Recurrent Unit (simpler than LSTM)
Finance applications:
- Stock price prediction
- Volatility forecasting
- Sentiment analysis of news
Limitations:
- Sequential processing (slow)
- Still struggles with very long sequences
- Difficult to parallelize
4. Transformers and Attention
Designed for: Any sequential data (text, time series)
Key innovation: Self-attention mechanism
The attention idea: “When processing one element, look at all other elements and decide which are relevant.”
Self-attention:
Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V
Where:
- Q = Query (what am I looking for?)
- K = Key (what do I contain?)
- V = Value (what do I return?)
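A minimal NumPy sketch of the formula above; the sequence length and d_k are arbitrary illustrative choices.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_k = 6, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out = attention(Q, K, V)                            # shape: (6, 8)
```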
Advantages:
- Parallelizable (unlike RNNs)
- Captures long-range dependencies
- State-of-the-art in many domains
Finance applications:
- Market regime detection
- Multi-asset modeling
- News and report analysis
- Large language models for financial analysis
5. Architecture Family Tree
Neural Networks
|
+-- Feedforward (MLP)
| |
| +-- Autoencoders
|
+-- Convolutional (CNN)
| |
| +-- ResNet, VGG, etc.
|
+-- Recurrent (RNN)
| |
| +-- LSTM
| +-- GRU
|
+-- Attention-based
|
+-- Transformer
|
+-- GPT (decoder)
+-- BERT (encoder)
+-- Claude, GPT-4 (large language models)
6. Choosing an Architecture
Guidelines for financial applications:
| Data Type | Recommended Architecture |
|---|---|
| Tabular (features) | MLP, Gradient Boosting |
| Time series (prices) | LSTM, Transformer |
| Images (charts) | CNN |
| Text (news, reports) | Transformer, BERT |
| Multiple modalities | Multi-head networks |
When in doubt:
- Start with simpler models (MLP, gradient boosting); see the sketch after this list
- Add complexity only if needed
- Ensemble methods often beat single networks
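A minimal sketch of that baseline-first workflow in scikit-learn; the synthetic data and model settings are illustrative assumptions.

```python
# Compare a simple MLP against gradient boosting before reaching for anything deeper.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))               # toy tabular features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy binary target

for name, model in [
    ("gradient boosting", GradientBoostingClassifier()),
    ("MLP", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")            # add complexity only if it wins here
```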
7. Ethical Considerations
Bias in financial models:
- Historical data reflects historical biases
- Credit scoring can perpetuate discrimination
- Algorithmic trading can amplify market instability
Transparency and explainability:
- Regulators increasingly require model explanations
- “Black box” models face legal challenges
- Need to balance performance and interpretability
Market manipulation concerns:
- Coordinated algorithmic behavior
- Flash crashes
- Market microstructure effects
Responsible AI principles:
- Fairness: Models should not discriminate
- Transparency: Decisions should be explainable
- Accountability: Clear ownership of model outcomes
- Privacy: Protect sensitive financial data
8. Current Trends in Financial ML
Large Language Models (LLMs):
- Analyzing earnings calls
- Processing financial news
- Generating investment summaries
- Answering financial questions
Reinforcement Learning:
- Portfolio optimization
- Order execution
- Market making
Graph Neural Networks:
- Corporate relationship modeling
- Systemic risk analysis
- Fraud detection in transaction networks
Alternative Data:
- Satellite imagery
- Social media sentiment
- Web scraping
- Credit card transactions
9. Limitations of Current AI
What neural networks struggle with:
| Challenge | Description |
|---|---|
| Reasoning | Poor at multi-step logical deduction |
| Causality | Learn correlation, not causation |
| Uncertainty | Often overconfident in predictions |
| Generalization | Struggle with distribution shift |
| Sample efficiency | Need lots of data |
Implications for finance:
- Don’t expect AI to understand market fundamentals
- Be skeptical of extreme predictions
- Always validate with domain expertise
- Prepare for model degradation
10. The Future of Neural Networks in Finance
Near-term (2-5 years):
- Better integration of alternative data
- More sophisticated ensemble methods
- Improved risk management with AI
- Regulatory frameworks developing
Medium-term (5-10 years):
- Agents that can reason about markets
- Real-time adaptive trading systems
- AI-human collaboration tools
- Standardized model validation
Long-term questions:
- Will markets become more or less efficient?
- How will AI change the role of financial professionals?
- What new risks will emerge?
Key Concepts Summary
Architecture Comparison
| Architecture | Best For | Finance Use Case |
|---|---|---|
| MLP | Tabular data | Factor models |
| CNN | Spatial patterns | Chart analysis |
| RNN/LSTM | Sequences | Time series |
| Transformer | Long sequences, text | News analysis, LLMs |
Ethical Framework
- Data: Is training data representative and unbiased?
- Model: Can decisions be explained?
- Deployment: Are risks managed?
- Monitoring: Is ongoing bias detection in place?
Finance Application: Where We Are Today
Current state of AI in finance:
| Application | Maturity | Typical Performance |
|---|---|---|
| Fraud detection | High | 95%+ accuracy |
| Credit scoring | High | Better than traditional |
| Sentiment analysis | Medium | Useful signals |
| Price prediction | Low | Marginal edge |
| Automated trading | Medium | Depends on strategy |
Key insight: AI works best where:
- Signal is strong (fraud has clear patterns)
- Data is plentiful (credit scoring)
- Competition is limited (alternative data)
AI struggles where:
- Signal is weak (short-term price prediction)
- Markets are efficient (liquid stocks)
- Regime changes are frequent (macro shifts)
Practice Questions
Conceptual Understanding
Q1: Why are Transformers better than RNNs for long sequences?
Answer
Two main reasons:
1. **Attention mechanism:** Transformers can directly attend to any position in the sequence, regardless of distance. RNNs must pass information through all intermediate time steps, losing information along the way.
2. **Parallelization:** RNNs process sequences one step at a time (sequential). Transformers can process all positions in parallel, making training much faster on modern hardware.
This matters for finance when analyzing long documents (annual reports) or long price histories.
Q2: When would you use a CNN vs an MLP for financial data?
Answer
Use a CNN when:
- Data has spatial/local structure (candlestick charts, heatmaps)
- Patterns can appear anywhere in the input
- You want translation invariance
Use an MLP when:
- Data is tabular with no spatial structure
- Features are hand-engineered
- A simpler model is preferred
For most structured financial data (company features, technical indicators), MLPs or gradient boosting often outperform CNNs because the data lacks spatial locality.
Q3: Give an example of how bias in training data could lead to unfair financial outcomes.
Answer
Example: A credit scoring model trained on historical lending data.
Problem: Historical data reflects past discrimination. If certain demographics were historically denied credit (even when creditworthy), the model learns these biased patterns.
Result: The model may:
- Reject qualified applicants from underrepresented groups
- Perpetuate historical inequalities
- Create legal liability under fair lending laws
Solution: Careful feature selection, bias auditing, disparate impact analysis (see the sketch below), and potentially removing protected characteristics from model inputs.
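As a concrete illustration of the disparate impact analysis mentioned above, here is a minimal sketch of the common four-fifths-rule check; the decisions and group labels are synthetic.

```python
# Four-fifths rule: flag if the protected group's approval rate falls below
# 80% of the reference group's rate. Data here is synthetic.
import numpy as np

approved = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])  # model decisions
group    = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 0 = reference, 1 = protected

rate_ref  = approved[group == 0].mean()
rate_prot = approved[group == 1].mean()
ratio = rate_prot / rate_ref                          # disparate impact ratio

print(f"approval rates {rate_ref:.2f} vs {rate_prot:.2f}, ratio {ratio:.2f}")
if ratio < 0.8:                                       # common regulatory threshold
    print("potential disparate impact -- investigate further")
```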
Q4: Why might a reinforcement learning approach to trading be difficult to implement in practice?
Answer
Challenges:
1. **Non-stationary environment:** Markets change, so the "optimal" policy changes.
2. **Sparse rewards:** Profits and losses are only realized at trade close, making them hard to attribute to specific actions.
3. **Limited exploration:** You can't afford to make random trades to explore.
4. **Simulation gap:** An RL agent trained in simulation faces different conditions live.
5. **Sample efficiency:** RL typically needs millions of episodes; market data is limited.
6. **Multi-agent dynamics:** Other traders adapt to your strategy.
7. **Transaction costs:** Small RL improvements may be eaten by costs.
This is why most production trading systems still use supervised learning.
Application
Q5: You’re tasked with building a system to analyze earnings call transcripts. What architecture would you recommend and why?
Answer
Recommended: A Transformer-based model (e.g., fine-tuned BERT or FinBERT).
Reasoning:
1. **Text data:** Transformers are state-of-the-art for NLP.
2. **Long documents:** Attention handles long-range dependencies.
3. **Pre-training available:** You can fine-tune existing financial language models.
4. **Multiple tasks:** The same architecture serves sentiment, topic extraction, and Q&A.
Architecture:
- Input: Tokenized transcript
- Model: FinBERT or a similar finance-domain model
- Output: Sentiment score, key topic probabilities
Alternative: For simpler applications, even bag-of-words with an MLP can work surprisingly well as a baseline.
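A minimal sketch of transcript sentiment scoring with the Hugging Face pipeline API; the ProsusAI/finbert checkpoint is an assumed example, so substitute whichever finance-domain model you actually use.

```python
# Score sentences from an earnings call transcript with a finance-domain model.
# The checkpoint name is an assumption, not a course-endorsed choice.
from transformers import pipeline

sentiment = pipeline("text-classification", model="ProsusAI/finbert")
sentences = [
    "We raised full-year guidance on strong demand.",
    "Margins compressed due to elevated input costs.",
]
for s in sentences:
    print(s, "->", sentiment(s)[0])  # e.g. {'label': 'positive', 'score': 0.9...}
```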
Q6: How would you explain to a regulator why your credit model made a specific decision?
Answer
Explanation strategy:
1. **Feature importance:** Show which features most influenced this decision, e.g., "Income-to-debt ratio contributed +15% to the approval score."
2. **SHAP values:** Quantify each feature's contribution, e.g., "Credit history: -5%, Employment: +8%, Assets: +12%."
3. **Counterfactuals:** Show what would change the decision, e.g., "The applicant would be approved if income increased by $5,000."
4. **Reference to similar cases:** Compare to approved/rejected neighbors, e.g., "Among 100 similar applicants, 75% were approved."
5. **Model documentation:** Provide the training methodology and validation results.
Key: Have explainability tools integrated from the start, not added as an afterthought.
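A minimal sketch of per-decision attributions with the shap library; the model, feature names, and data are synthetic stand-ins for a real credit model.

```python
# Per-applicant feature contributions via SHAP for a toy credit model.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy applicant features
y = (X[:, 0] - X[:, 1] > 0).astype(int)       # toy approval labels

model = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)
contribs = explainer.shap_values(X[:1])[0]    # contributions for one applicant
for name, c in zip(["income_to_debt", "credit_history", "employment"], contribs):
    print(f"{name}: {c:+.3f}")                # signed contribution to the score
```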
Reading List
Modern Architectures
- Vaswani et al. (2017) - “Attention Is All You Need” - The Transformer paper
- Goodfellow et al. (2016) - Deep Learning, Chapters 9-10 - CNN and RNN chapters
Financial Applications
- Gu, Kelly & Xiu (2020) - “Empirical Asset Pricing via Machine Learning”
- Lopez de Prado (2020) - “Machine Learning for Asset Managers”
Ethics and Fairness
- Mehrabi et al. (2021) - “A Survey on Bias and Fairness in Machine Learning”
- EU AI Act - Regulatory framework (2024)
Large Language Models in Finance
- Wu et al. (2023) - “BloombergGPT: A Large Language Model for Finance”
- Yang et al. (2023) - “FinGPT: Open-Source Financial Large Language Models”
Future Directions
- Sutton (2019) - “The Bitter Lesson” (scale is key)
- Bengio et al. (2021) - “Deep Learning for AI”
Summary
This lecture covered:
- Evolution of architectures - From perceptrons to Transformers
- CNNs - For spatial/image data
- RNNs/LSTMs - For sequential data
- Transformers - Attention-based, state-of-the-art for many tasks
- Architecture selection - Match model to data type
- Ethics - Bias, transparency, accountability
- Future directions - LLMs, RL, alternative data
Key Takeaway: Modern architectures offer powerful tools, but the fundamentals from this course (training, regularization, validation) apply universally.
Course Conclusion
You’ve now completed all 8 lectures covering:
- History and biological inspiration
- Perceptron fundamentals
- Multi-layer perceptron architecture
- Activation and loss functions
- Gradient descent and backpropagation
- Training dynamics and regularization
- Financial applications
- Modern networks and future directions
The journey continues:
- Implement models on real data
- Read primary literature
- Stay current with rapid developments
- Apply with appropriate skepticism and rigor
Good luck in your neural network journey!