Week 11: Domain Applications

Code, finance, and healthcare agents with domain-specific constraints

Week 11 of 12

Learning Objectives

  • Define SWE-bench, code agent, FinAgent, and clinical decision support
  • Explain domain-specific requirements for agent deployment
  • Implement a code agent using flow engineering
  • Compare agent architectures across different domains
  • Assess regulatory and safety requirements for each domain
  • Design a domain-specific agent with appropriate safeguards

Topics Covered

  • Domain maturity landscape (code, finance, healthcare)
  • Code agents and SWE-bench performance
  • AlphaCodium flow engineering methodology
  • FinAgent multimodal trading architecture
  • Healthcare agent regulatory constraints (FDA, HIPAA)

Resources

Jupyter Notebooks

Code Generation Agent (open in Colab)

Required Readings

Paper | Authors | Year | Link
SWE-bench: Real-World GitHub Issues | Jimenez et al. | 2024 | arXiv
AlphaCodium: Flow Engineering for Code | Ridnik et al. | 2024 | arXiv
FinAgent: Multimodal Trading Agent | Li et al. | 2024 | arXiv

Reading Guide: Domain Applications

Estimated time: 3-4 hours | Topics: Code agents, Financial agents, Healthcare agents

Study of domain-specific agents for code, finance, and healthcare

Primary Paper

Devin: An Autonomous Software Engineer
Cognition AI (2024)
Technical report

Secondary Papers

  • AlphaCodium: From Prompt Engineering to Flow Engineering - Ridnik, T., et al. (2024) arXiv
  • MDAgents: Adaptive Collaboration of LLMs for Medical Decision-Making - Kim, Y., et al. (2024) arXiv
  • LLM Agents for Financial Applications Survey - Li, S., et al. (2024) arXiv

Exercise: Domain Agent

100 points | 6-8 hours | Difficulty: Expert

Build a domain-specific agent with specialized tools

Learning Objectives

  • Create: Build domain-specific agents
  • Apply: Handle regulatory constraints
  • Integrate: Combine specialized tools into the agent

Tasks

Task | Points | Description
Domain Analysis | 25 | Analyze domain requirements
Agent Implementation | 45 | Build specialized agent
Evaluation | 30 | Evaluate on domain tasks

Key Concepts

Domain Maturity Landscape:

  • Code (High): Clear success criteria (tests pass), sandboxed execution, active deployment
  • Finance (Medium): Regulatory constraints, compliance requirements, emerging deployments
  • Healthcare (Emerging): High stakes, human oversight required, FDA/HIPAA compliance

Flow Engineering: AlphaCodium’s structured, multi-stage pipeline approach. Complex tasks are broken into stages, and the agent iteratively generates code, runs it against tests, and repairs the failures.
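
The generate, test, and repair loop at the core of flow engineering can be sketched in a few lines of Python. This is a minimal illustration, not AlphaCodium's actual pipeline, which has more stages than shown here. The names run_tests, flow_loop, and stub_generate are hypothetical, and stub_generate stands in for an LLM call so the example runs offline.

```python
from typing import Callable, List, Optional, Tuple

# A test case is (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def run_tests(source: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Load candidate source and return one message per failing test."""
    namespace: dict = {}
    try:
        # Assumption: a real code agent would execute this inside a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return [f"candidate failed to load: {exc!r}"]

    failures: List[str] = []
    for args, expected in tests:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return failures


def flow_loop(
    generate: Callable[[List[str]], str],
    func_name: str,
    tests: List[TestCase],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for a candidate, test it, and feed failures back until tests pass."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(feedback)
        feedback = run_tests(candidate, func_name, tests)
        if not feedback:
            return candidate  # every test passed
    return None  # budget exhausted; escalate to a human or another stage


def stub_generate(feedback: List[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


solution = flow_loop(stub_generate, "add", [((1, 2), 3), ((0, 0), 0)])
print(solution)
```

Returning None when the round budget runs out keeps the failure explicit, so a caller can decide whether to retry with a different stage or hand the task to a person.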

SWE-bench Performance: The best agents resolve roughly 50% of the benchmark's real GitHub issues, which is significant but still far from human-level.
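
To make the "percent resolved" figure concrete, here is a small sketch of how a resolve rate can be computed. In SWE-bench, an issue counts as resolved when the tests that failed before the fix now pass and the previously passing tests still pass. The InstanceResult class, the function names, and the sample data are hypothetical illustrations, not the benchmark's own harness.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstanceResult:
    instance_id: str
    # Tests that failed before the fix; True means they pass after the patch.
    fail_to_pass: Dict[str, bool]
    # Tests that passed before the fix; True means they still pass.
    pass_to_pass: Dict[str, bool]


def is_resolved(result: InstanceResult) -> bool:
    """Resolved = the target tests now pass and nothing else regressed."""
    return (
        bool(result.fail_to_pass)
        and all(result.fail_to_pass.values())
        and all(result.pass_to_pass.values())
    )


def resolve_rate(results: List[InstanceResult]) -> float:
    """Fraction of instances resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Made-up sample data for illustration only.
results = [
    InstanceResult("demo-1", {"test_fix": True}, {"test_old": True}),
    InstanceResult("demo-2", {"test_fix": True}, {"test_old": False}),  # regression
]
print(f"resolved {resolve_rate(results):.0%} of {len(results)} instances")
```

The second instance shows why a patch that fixes the reported bug can still count as a failure: it broke a test that used to pass.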

Cross-Domain Patterns: Verification intensity should match the domain's risk level: the higher the cost of an error, the more checks and human oversight the agent needs.
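
One way to express this pattern is a table that maps each domain to a risk level and each risk level to a verification policy. The sketch below is illustrative only: the risk assignments, the policy fields, and the name policy_for are assumptions made for this example, not requirements taken from any regulation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class VerificationPolicy:
    automated_checks: bool  # e.g. unit tests or schema validation
    audit_log: bool         # record every action for later review
    human_approval: bool    # a person must sign off before acting


# Assumed mapping for illustration; a real deployment would derive this
# from its own risk assessment.
DOMAIN_RISK = {
    "code": Risk.LOW,
    "finance": Risk.MEDIUM,
    "healthcare": Risk.HIGH,
}

POLICY_BY_RISK = {
    Risk.LOW: VerificationPolicy(True, False, False),
    Risk.MEDIUM: VerificationPolicy(True, True, False),
    Risk.HIGH: VerificationPolicy(True, True, True),
}


def policy_for(domain: str) -> VerificationPolicy:
    """Look up the policy for a domain; unknown domains get the strictest one."""
    risk = DOMAIN_RISK.get(domain, Risk.HIGH)
    return POLICY_BY_RISK[risk]


for name in ("code", "finance", "healthcare"):
    print(name, policy_for(name))
```

Defaulting unknown domains to the strictest policy is a fail-closed choice: an unclassified domain gets more oversight, not less.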

Exercise

Build a domain-specific agent for one of:

  1. Code Generation: Flow engineering approach with test-driven iteration
  2. Research Assistant: Literature search with citation verification
  3. Financial Analysis: Market data analysis with compliance guardrails
  4. Clinical Decision Support: Evidence-based recommendations with human oversight
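
Whichever option you choose, a common starting point for safeguards is a guardrail that checks each proposed action before it runs. The sketch below uses the financial analysis option as an example. All names (ProposedAction, GuardrailDecision, check_action), the rule set, and the ticker symbols are hypothetical and do not represent actual compliance rules.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class ProposedAction:
    kind: str                  # e.g. "analyze" or "place_order"
    symbol: str
    notional_usd: float = 0.0  # order value; 0 for read-only actions


@dataclass(frozen=True)
class GuardrailDecision:
    allowed: bool
    reasons: List[str]


def check_action(
    action: ProposedAction,
    allowed_symbols: Set[str],
    max_notional_usd: float,
    read_only_kinds: Tuple[str, ...] = ("analyze",),
) -> GuardrailDecision:
    """Collect every rule violation so the agent can explain a refusal."""
    reasons: List[str] = []
    if action.symbol not in allowed_symbols:
        reasons.append(f"symbol {action.symbol!r} is not on the allowlist")
    if action.kind not in read_only_kinds and action.notional_usd > max_notional_usd:
        reasons.append(
            f"notional {action.notional_usd:.2f} exceeds limit {max_notional_usd:.2f}"
        )
    return GuardrailDecision(allowed=not reasons, reasons=reasons)


decision = check_action(
    ProposedAction("place_order", "DEMO2", 5000.0),
    allowed_symbols={"DEMO1"},
    max_notional_usd=1000.0,
)
print(decision.allowed, decision.reasons)
```

Returning the full list of reasons, rather than stopping at the first violation, gives both the agent and a human reviewer a complete picture of why an action was blocked.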

Discussion Questions

  1. How do domain constraints affect agent architecture design?
  2. What safety measures are essential for high-stakes domains like healthcare?
  3. When should agents defer to humans rather than act autonomously?
  4. How does regulatory compliance (SEC, FDA, HIPAA) constrain agent capabilities?
  5. What is the appropriate verification intensity for each domain?



