Lecture 8: Modern Networks and Future Directions
| Duration: ~45 minutes | Slides: 17 | Prerequisites: Lecture 7 |
Learning Objectives
After completing this lecture, you should be able to:
- Describe the evolution from MLPs to modern architectures
- Explain the basic concepts behind CNNs, RNNs, and Transformers
- Understand the advantages and limitations of each architecture
- Identify ethical considerations in financial AI
- Recognize emerging trends in deep learning for finance
- Connect course concepts to current research
Key Concepts
1. The Deep Learning Timeline
The field has evolved dramatically since the perceptron:
| Year | Milestone | Impact |
|---|---|---|
| 1943 | McCulloch-Pitts | First neural model |
| 1958 | Perceptron | First learning algorithm |
| 1969 | Minsky-Papert | XOR limitation revealed |
| 1986 | Backpropagation | Training deep networks |
| 1998 | LeNet (CNN) | Handwriting recognition |
| 2012 | AlexNet | ImageNet breakthrough |
| 2014 | GAN | Generative models |
| 2017 | Transformer | Attention is all you need |
| 2022-24 | GPT-4, Claude | Large language models |
2. Convolutional Neural Networks (CNNs)
Designed for: Grid-like data (images, spatial data)
Key innovation: Local connectivity and weight sharing
How it works:
- Convolutional layers: Small filters slide across input
- Pooling layers: Reduce spatial dimensions
- Fully connected layers: Final classification
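To make this layer sequence concrete, here is a minimal PyTorch sketch; the filter counts, the 64x64 grayscale input, and the two-class head are illustrative assumptions, not specifics from the lecture.

```python
# Minimal CNN sketch: conv -> pool -> conv -> pool -> fully connected.
# All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ChartCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # small filters slide across the input
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling halves the spatial dims
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)  # final classification

    def forward(self, x):  # x: (batch, 1, 64, 64) grayscale chart images
        return self.classifier(self.features(x).flatten(1))

model = ChartCNN()
logits = model(torch.randn(8, 1, 64, 64))  # batch of 8 fake 64x64 charts -> (8, 2)
```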
Finance applications:
- Chart pattern recognition
- Satellite imagery analysis (counting cars in parking lots)
- Document processing (financial statements)
Advantages:
- Parameter efficient (weight sharing)
- Translation invariant (patterns recognized anywhere)
- Hierarchical feature learning
3. Recurrent Neural Networks (RNNs)
Designed for: Sequential data (time series, text)
Key innovation: Hidden state that persists across time steps
Basic RNN equation:
h_t = f(W_h * h_{t-1} + W_x * x_t + b)
Where h_t is the hidden state at time t.
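As a concrete illustration, here is a minimal NumPy sketch of this update, taking f = tanh; the dimensions are arbitrary.

```python
# One RNN step: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t + b)
import numpy as np

d_h, d_x = 4, 3                        # hidden and input sizes (illustrative)
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(d_h, d_h))
W_x = rng.normal(scale=0.1, size=(d_h, d_x))
b = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

h = np.zeros(d_h)                      # initial hidden state
for x_t in rng.normal(size=(5, d_x)):  # a toy 5-step sequence
    h = rnn_step(h, x_t)               # the state persists across time steps
```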
Variants:
- LSTM: Long Short-Term Memory (handles long sequences)
- GRU: Gated Recurrent Unit (simpler than LSTM)
Finance applications:
- Stock price prediction
- Volatility forecasting
- Sentiment analysis of news
Limitations:
- Sequential processing (slow)
- Still struggles with very long sequences
- Difficult to parallelize
4. Transformers and Attention
Designed for: Any sequential data (text, time series)
Key innovation: Self-attention mechanism
The attention idea: “When processing one element, look at all other elements and decide which are relevant.”
Self-attention:
Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V
Where:
- Q = Query (what am I looking for?)
- K = Key (what do I contain?)
- V = Value (what do I return?)
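A minimal NumPy sketch of the formula above; the sequence length and d_k are arbitrary illustrative choices.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_k = 6, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out = attention(Q, K, V)                            # shape: (6, 8)
```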
Advantages:
- Parallelizable (unlike RNNs)
- Captures long-range dependencies
- State-of-the-art in many domains
Finance applications:
- Market regime detection
- Multi-asset modeling
- News and report analysis
- Large language models for financial analysis
5. Architecture Family Tree
Neural Networks
|
+-- Feedforward (MLP)
| |
| +-- Autoencoders
|
+-- Convolutional (CNN)
| |
| +-- ResNet, VGG, etc.
|
+-- Recurrent (RNN)
| |
| +-- LSTM
| +-- GRU
|
+-- Attention-based
|
+-- Transformer
|
+-- GPT (decoder)
+-- BERT (encoder)
+-- Claude, GPT-4 (large language models)
6. Choosing an Architecture
Guidelines for financial applications:
| Data Type | Recommended Architecture |
|---|---|
| Tabular (features) | MLP, Gradient Boosting |
| Time series (prices) | LSTM, Transformer |
| Images (charts) | CNN |
| Text (news, reports) | Transformer, BERT |
| Multiple modalities | Multi-head networks |
When in doubt:
- Start with simpler models (MLP, gradient boosting); see the sketch after this list
- Add complexity only if needed
- Ensemble methods often beat single networks
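A minimal sketch of that baseline-first workflow in scikit-learn; the synthetic data and model settings are illustrative assumptions.

```python
# Compare a simple MLP against gradient boosting before reaching for anything deeper.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))               # toy tabular features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy binary target

for name, model in [
    ("gradient boosting", GradientBoostingClassifier()),
    ("MLP", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")            # add complexity only if it wins here
```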
7. Ethical Considerations
Bias in financial models:
- Historical data reflects historical biases
- Credit scoring can perpetuate discrimination
- Algorithmic trading can amplify market instability
Transparency and explainability:
- Regulators increasingly require model explanations
- “Black box” models face legal challenges
- Need to balance performance and interpretability
Market manipulation concerns:
- Coordinated algorithmic behavior
- Flash crashes
- Market microstructure effects
Responsible AI principles:
- Fairness: Models should not discriminate
- Transparency: Decisions should be explainable
- Accountability: Clear ownership of model outcomes
- Privacy: Protect sensitive financial data
8. Current Trends in Financial ML
Large Language Models (LLMs):
- Analyzing earnings calls
- Processing financial news
- Generating investment summaries
- Answering financial questions
Reinforcement Learning:
- Portfolio optimization
- Order execution
- Market making
Graph Neural Networks:
- Corporate relationship modeling
- Systemic risk analysis
- Fraud detection in transaction networks
Alternative Data:
- Satellite imagery
- Social media sentiment
- Web scraping
- Credit card transactions
9. Limitations of Current AI
What neural networks struggle with:
| Challenge | Description |
|---|---|
| Reasoning | Poor at multi-step logical deduction |
| Causality | Learn correlation, not causation |
| Uncertainty | Often overconfident in predictions |
| Generalization | Struggle with distribution shift |
| Sample efficiency | Need lots of data |
Implications for finance:
- Don’t expect AI to understand market fundamentals
- Be skeptical of extreme predictions
- Always validate with domain expertise
- Prepare for model degradation
10. The Future of Neural Networks in Finance
Near-term (2-5 years):
- Better integration of alternative data
- More sophisticated ensemble methods
- Improved risk management with AI
- Regulatory frameworks developing
Medium-term (5-10 years):
- Agents that can reason about markets
- Real-time adaptive trading systems
- AI-human collaboration tools
- Standardized model validation
Long-term questions:
- Will markets become more or less efficient?
- How will AI change the role of financial professionals?
- What new risks will emerge?
Key Concepts Summary
Architecture Comparison
| Architecture | Best For | Finance Use Case |
|---|---|---|
| MLP | Tabular data | Factor models |
| CNN | Spatial patterns | Chart analysis |
| RNN/LSTM | Sequences | Time series |
| Transformer | Long sequences, text | News analysis, LLMs |
Ethical Framework
- Data: Is training data representative and unbiased?
- Model: Can decisions be explained?
- Deployment: Are risks managed?
- Monitoring: Is ongoing bias detection in place?
Finance Application: Where We Are Today
Current state of AI in finance:
| Application | Maturity | Typical Performance |
|---|---|---|
| Fraud detection | High | 95%+ accuracy |
| Credit scoring | High | Better than traditional |
| Sentiment analysis | Medium | Useful signals |
| Price prediction | Low | Marginal edge |
| Automated trading | Medium | Depends on strategy |
Key insight: AI works best where:
- Signal is strong (fraud has clear patterns)
- Data is plentiful (credit scoring)
- Competition is limited (alternative data)
AI struggles where:
- Signal is weak (short-term price prediction)
- Markets are efficient (liquid stocks)
- Regime changes are frequent (macro shifts)
Practice Questions
Conceptual Understanding
Q1: Why are Transformers better than RNNs for long sequences?
Answer
Two main reasons:
1. **Attention mechanism:** Transformers can directly attend to any position in the sequence, regardless of distance. RNNs must pass information through all intermediate time steps, losing information along the way.
2. **Parallelization:** RNNs process sequences one step at a time (sequential). Transformers can process all positions in parallel, making training much faster on modern hardware.
This matters for finance when analyzing long documents (annual reports) or long price histories.
Q2: When would you use a CNN vs an MLP for financial data?
Answer
Use a CNN when:
- Data has spatial/local structure (candlestick charts, heatmaps)
- Patterns can appear anywhere in the input
- You want translation invariance
Use an MLP when:
- Data is tabular with no spatial structure
- Features are hand-engineered
- A simpler model is preferred
For most structured financial data (company features, technical indicators), MLPs or gradient boosting often outperform CNNs because the data lacks spatial locality.
Q3: Give an example of how bias in training data could lead to unfair financial outcomes.
Answer
Example: A credit scoring model trained on historical lending data.
Problem: Historical data reflects past discrimination. If certain demographics were historically denied credit (even when creditworthy), the model learns these biased patterns.
Result: The model may:
- Reject qualified applicants from underrepresented groups
- Perpetuate historical inequalities
- Create legal liability under fair lending laws
Solution: Careful feature selection, bias auditing, disparate impact analysis (see the sketch below), and potentially removing protected characteristics from model inputs.
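As a concrete illustration of the disparate impact analysis mentioned above, here is a minimal sketch of the common four-fifths-rule check; the decisions and group labels are synthetic.

```python
# Four-fifths rule: flag if the protected group's approval rate falls below
# 80% of the reference group's rate. Data here is synthetic.
import numpy as np

approved = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])  # model decisions
group    = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 0 = reference, 1 = protected

rate_ref  = approved[group == 0].mean()
rate_prot = approved[group == 1].mean()
ratio = rate_prot / rate_ref                          # disparate impact ratio

print(f"approval rates {rate_ref:.2f} vs {rate_prot:.2f}, ratio {ratio:.2f}")
if ratio < 0.8:                                       # common regulatory threshold
    print("potential disparate impact -- investigate further")
```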
Q4: Why might a reinforcement learning approach to trading be difficult to implement in practice?
Answer
Challenges:
1. **Non-stationary environment:** Markets change, so the "optimal" policy changes.
2. **Sparse rewards:** Profits and losses are only realized at trade close, making them hard to attribute to specific actions.
3. **Limited exploration:** You can't afford to make random trades to explore.
4. **Simulation gap:** An RL agent trained in simulation faces different conditions live.
5. **Sample efficiency:** RL typically needs millions of episodes; market data is limited.
6. **Multi-agent dynamics:** Other traders adapt to your strategy.
7. **Transaction costs:** Small RL improvements may be eaten by costs.
This is why most production trading systems still use supervised learning.
Application
Q5: You’re tasked with building a system to analyze earnings call transcripts. What architecture would you recommend and why?
Answer
Recommended: A Transformer-based model (e.g., fine-tuned BERT or FinBERT).
Reasoning:
1. **Text data:** Transformers are state-of-the-art for NLP.
2. **Long documents:** Attention handles long-range dependencies.
3. **Pre-training available:** You can fine-tune existing financial language models.
4. **Multiple tasks:** The same architecture serves sentiment, topic extraction, and Q&A.
Architecture:
- Input: Tokenized transcript
- Model: FinBERT or a similar finance-domain model
- Output: Sentiment score, key topic probabilities
Alternative: For simpler applications, even bag-of-words with an MLP can work surprisingly well as a baseline.
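A minimal sketch of transcript sentiment scoring with the Hugging Face pipeline API; the ProsusAI/finbert checkpoint is an assumed example, so substitute whichever finance-domain model you actually use.

```python
# Score sentences from an earnings call transcript with a finance-domain model.
# The checkpoint name is an assumption, not a course-endorsed choice.
from transformers import pipeline

sentiment = pipeline("text-classification", model="ProsusAI/finbert")
sentences = [
    "We raised full-year guidance on strong demand.",
    "Margins compressed due to elevated input costs.",
]
for s in sentences:
    print(s, "->", sentiment(s)[0])  # e.g. {'label': 'positive', 'score': 0.9...}
```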
Q6: How would you explain to a regulator why your credit model made a specific decision?
Answer
Explanation strategy:
1. **Feature importance:** Show which features most influenced this decision, e.g., "Income-to-debt ratio contributed +15% to the approval score."
2. **SHAP values:** Quantify each feature's contribution, e.g., "Credit history: -5%, Employment: +8%, Assets: +12%."
3. **Counterfactuals:** Show what would change the decision, e.g., "The applicant would be approved if income increased by $5,000."
4. **Reference to similar cases:** Compare to approved/rejected neighbors, e.g., "Among 100 similar applicants, 75% were approved."
5. **Model documentation:** Provide the training methodology and validation results.
Key: Have explainability tools integrated from the start, not added as an afterthought.
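A minimal sketch of per-decision attributions with the shap library; the model, feature names, and data are synthetic stand-ins for a real credit model.

```python
# Per-applicant feature contributions via SHAP for a toy credit model.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy applicant features
y = (X[:, 0] - X[:, 1] > 0).astype(int)       # toy approval labels

model = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)
contribs = explainer.shap_values(X[:1])[0]    # contributions for one applicant
for name, c in zip(["income_to_debt", "credit_history", "employment"], contribs):
    print(f"{name}: {c:+.3f}")                # signed contribution to the score
```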
Reading List
Modern Architectures
- Vaswani et al. (2017) - “Attention Is All You Need” - The Transformer paper
- Goodfellow et al. (2016) - Deep Learning, Chapters 9-10 - CNN and RNN chapters
Financial Applications
- Gu, Kelly & Xiu (2020) - “Empirical Asset Pricing via Machine Learning”
- Lopez de Prado (2020) - “Machine Learning for Asset Managers”
Ethics and Fairness
- Mehrabi et al. (2021) - “A Survey on Bias and Fairness in Machine Learning”
- EU AI Act - Regulatory framework (2024)
Large Language Models in Finance
- Wu et al. (2023) - “BloombergGPT: A Large Language Model for Finance”
- Yang et al. (2023) - “FinGPT: Open-Source Financial Large Language Models”
Future Directions
- Sutton (2019) - “The Bitter Lesson” (scale is key)
- Bengio et al. (2021) - “Deep Learning for AI”
Summary
This lecture covered:
- Evolution of architectures - From perceptrons to Transformers
- CNNs - For spatial/image data
- RNNs/LSTMs - For sequential data
- Transformers - Attention-based, state-of-the-art for many tasks
- Architecture selection - Match model to data type
- Ethics - Bias, transparency, accountability
- Future directions - LLMs, RL, alternative data
Key Takeaway: Modern architectures offer powerful tools, but the fundamentals from this course (training, regularization, validation) apply universally.
Course Conclusion
You’ve now completed all 8 lectures covering:
- History and biological inspiration
- Perceptron fundamentals
- Multi-layer perceptron architecture
- Activation and loss functions
- Gradient descent and backpropagation
- Training dynamics and regularization
- Financial applications
- Modern networks and future directions
The journey continues:
- Implement models on real data
- Read primary literature
- Stay current with rapid developments
- Apply with appropriate skepticism and rigor
Good luck in your neural network journey!