Research Papers

A curated collection of foundational and cutting-edge papers in agentic AI with summaries.

Key Papers with Summaries

ReAct: Synergizing Reasoning and Acting (Yao et al., 2023)

Core Idea: Interleave reasoning traces and actions in LLMs, allowing models to reason about tasks (Thought) and interact with external environments (Action) to gather information (Observation).

Key Contribution: Shows that combining reasoning and acting outperforms either approach alone on question-answering and decision-making tasks.

Why It Matters: Foundational paradigm for most modern LLM agents. The Thought-Action-Observation loop is now standard in agent frameworks.

arXiv | Week 1
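
The Thought-Action-Observation loop can be sketched in a few lines of Python. Everything below is a stand-in: `fake_llm` mimics the model and `lookup` mimics a search tool; the paper's actual prompts and Wikipedia API are not reproduced.

```python
# Minimal sketch of the ReAct Thought-Action-Observation loop.
# `fake_llm` and `lookup` are illustrative stubs, not part of the paper.

def fake_llm(transcript: str) -> str:
    """Stub model: asks for one lookup, then finishes."""
    if "Observation:" not in transcript:
        return "Thought: I need the capital.\nAction: lookup[France]"
    return "Thought: I have the answer.\nAction: finish[Paris]"

def lookup(entity: str) -> str:
    """Stub tool standing in for a search/wiki API."""
    return {"France": "The capital of France is Paris."}.get(entity, "No result.")

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_llm(transcript)           # Thought + Action
        transcript += step + "\n"
        action = step.split("Action: ")[-1]
        if action.startswith("finish["):
            return action[len("finish["):-1]  # final answer
        arg = action[len("lookup["):-1]
        transcript += f"Observation: {lookup(arg)}\n"  # feed tool result back

print(react("What is the capital of France?"))  # -> Paris
```

The key design point is that the tool's output is appended to the transcript as an Observation, so the next Thought can condition on it.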

Chain-of-Thought Prompting (Wei et al., 2022)

Core Idea: Providing few-shot exemplars that spell out intermediate reasoning steps, or simply appending "Let's think step by step" in the zero-shot variant (Kojima et al., 2022), dramatically improves LLM performance on complex reasoning tasks.

Key Contribution: Demonstrates emergent reasoning abilities in large models when prompted to show intermediate steps.

Why It Matters: Enables agents to break down complex problems and explain their reasoning, improving both accuracy and interpretability.

arXiv | Week 2
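
The two prompting styles can be sketched as simple string construction. The hypothetical helper `cot_prompt` is ours; the tennis-ball exemplar follows the well-known example from the paper, and the zero-shot trigger phrase is from Kojima et al. (2022).

```python
# Sketch of few-shot vs. zero-shot chain-of-thought prompt construction.

EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str, zero_shot: bool = False) -> str:
    if zero_shot:
        # Zero-shot variant: just append the trigger phrase.
        return f"Q: {question}\nA: Let's think step by step."
    # Few-shot variant: prepend worked exemplars with visible reasoning.
    return f"{EXEMPLAR}\nQ: {question}\nA:"
```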

Toolformer (Schick et al., 2023)

Core Idea: Train LLMs to decide when and how to use external tools (calculators, search, etc.) by self-supervised learning on API calls.

Key Contribution: Shows LLMs can learn tool use without explicit supervision by generating candidate API calls and keeping only those whose results reduce the model's loss on the following tokens.

Why It Matters: Foundational work for function calling and tool-augmented LLMs used in modern APIs.

arXiv | Week 3
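
The self-supervised filter can be illustrated with a toy: a candidate API call is kept only if inserting the call and its result lowers the model's loss on the continuation. The `loss` function below is a contrived stub, not a real language-model loss.

```python
# Toy sketch of Toolformer's filtering criterion.

def loss(text: str) -> float:
    """Stub LM loss: pretend the model is less surprised by the
    continuation when the calculator result appears before it."""
    return 0.5 if "[Calculator(3 + 4) -> 7]" in text else 2.0

def keep_call(prefix: str, call: str, continuation: str, margin: float = 0.1) -> bool:
    """Keep the candidate call only if it reduces loss by at least `margin`."""
    with_call = f"{prefix} {call} {continuation}"
    without = f"{prefix} {continuation}"
    return loss(with_call) + margin < loss(without)

print(keep_call("3 + 4 =", "[Calculator(3 + 4) -> 7]", "7"))  # True
```

Calls that pass the filter become training data, which is how the model bootstraps tool use without human labels.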

Reflexion (Shinn et al., 2023)

Core Idea: Agents learn from verbal self-reflection on failures, storing insights in episodic memory to avoid repeating mistakes.

Key Contribution: Introduces a framework for agents to improve through natural language feedback rather than gradient updates.

Why It Matters: Enables agents to learn from experience within a session, critical for complex multi-step tasks.

arXiv | Week 4
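
The trial-reflect-retry loop can be sketched as follows, with stub `attempt` and `reflect` functions standing in for real LLM calls.

```python
# Minimal sketch of the Reflexion loop: on failure, a verbal reflection
# is stored in episodic memory and made available to the next attempt.

def attempt(task: str, memory: list[str]) -> bool:
    """Stub actor: succeeds only once it has a relevant lesson in memory."""
    return any("check units" in m for m in memory)

def reflect(task: str) -> str:
    """Stub self-reflection generated after a failed episode."""
    return "Last attempt failed: next time, check units before answering."

def reflexion(task: str, max_trials: int = 3) -> int:
    memory: list[str] = []            # episodic memory of verbal lessons
    for trial in range(1, max_trials + 1):
        if attempt(task, memory):
            return trial              # success on this trial
        memory.append(reflect(task))  # learn from failure, no gradient updates
    return -1

print(reflexion("convert 5 km to miles"))  # succeeds on trial 2
```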

AutoGen (Wu et al., 2023)

Core Idea: Framework for building multi-agent systems through conversational interactions between specialized agents.

Key Contribution: Demonstrates that complex tasks can be solved by having agents with different roles collaborate through natural conversation.

Why It Matters: Pioneered the conversational multi-agent paradigm used in many modern agent frameworks.

arXiv | Week 5
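
The conversational pattern can be sketched as two reply functions taking turns until a termination marker appears. `assistant_reply` and `user_proxy_reply` are stubs of our own invention, not AutoGen's actual API, where each agent would wrap an LLM call.

```python
# Sketch of the two-agent conversation loop behind frameworks like AutoGen.

def assistant_reply(msg: str) -> str:
    if "write a haiku" in msg:
        return "Here is a draft haiku about autumn."
    return "TERMINATE"

def user_proxy_reply(msg: str) -> str:
    if "draft" in msg:
        return "Looks good, we are done."
    return "Please write a haiku."

def chat(opening: str, max_turns: int = 6) -> list[str]:
    transcript = [opening]
    speakers = [assistant_reply, user_proxy_reply]  # alternate turns
    for turn in range(max_turns):
        reply = speakers[turn % 2](transcript[-1])
        transcript.append(reply)
        if "TERMINATE" in reply:     # conversation ends on the marker
            break
    return transcript
```

The design choice worth noticing is that each agent only sees the last message, so all coordination happens through natural language rather than shared state.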

Self-RAG (Asai et al., 2023)

Core Idea: Train LLMs to adaptively retrieve information and self-reflect on generated content using special tokens.

Key Contribution: Shows models can learn when to retrieve, what to retrieve, and how to critique their outputs for factuality.

Why It Matters: Improves RAG accuracy by making retrieval decisions dynamic rather than always-on.

arXiv | Week 7
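
The adaptive-retrieval gate can be sketched as below; `decide_retrieve` is a stub standing in for the model emitting its [Retrieve]/[No Retrieve] control token, and `retrieve` is a canned passage rather than a real retriever.

```python
# Sketch of Self-RAG's adaptive retrieval: retrieve only when the
# model's control token says the query needs external evidence.

def decide_retrieve(question: str) -> str:
    """Stub for the model emitting a [Retrieve]/[No Retrieve] token."""
    factual = any(w in question.lower() for w in ("who", "when", "where"))
    return "[Retrieve]" if factual else "[No Retrieve]"

def retrieve(question: str) -> str:
    return "Passage: Marie Curie won the Nobel Prize in 1903 and 1911."

def answer(question: str) -> tuple[str, bool]:
    token = decide_retrieve(question)
    if token == "[Retrieve]":
        context = retrieve(question)
        return f"{context} -> grounded answer", True
    return "direct answer, no retrieval", False
```

Gating retrieval this way is what makes it dynamic rather than always-on: creative or conversational queries skip the retriever entirely.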

GraphRAG (Edge et al., 2024)

Core Idea: Use knowledge graphs to structure document relationships, enabling community-based summarization and multi-hop reasoning.

Key Contribution: Outperforms naive RAG on global sensemaking queries that require synthesizing information across documents.

Why It Matters: Addresses RAG limitations for complex queries requiring broad understanding of a corpus.

Microsoft | Week 8

Chain-of-Verification (Dhuliawala et al., 2023)

Core Idea: Reduce hallucinations by having the model generate verification questions, answer them independently, and revise based on inconsistencies.

Key Contribution: Provides a systematic approach to fact-checking generated content without external knowledge bases.

Why It Matters: Critical technique for building trustworthy agents that can verify their own claims.

arXiv | Week 9
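
The four-step pipeline (draft, plan verification questions, answer them independently, revise) can be sketched with stubbed steps; in the paper each step is a separate LLM call, and the Eiffel Tower example here is our own.

```python
# Sketch of the Chain-of-Verification pipeline with stubbed LLM steps.

def draft(question: str) -> str:
    return "The Eiffel Tower is in Paris and was completed in 1890."

def plan_verifications(answer: str) -> list[str]:
    return ["Where is the Eiffel Tower?",
            "When was the Eiffel Tower completed?"]

def answer_independently(q: str) -> str:
    # Answered without seeing the draft, so its errors are not copied over.
    return {"Where is the Eiffel Tower?": "Paris",
            "When was the Eiffel Tower completed?": "1889"}[q]

def revise(initial: str, facts: list[str]) -> str:
    if "1889" in facts and "1890" in initial:
        return initial.replace("1890", "1889")  # fix the inconsistency
    return initial

def cove(question: str) -> str:
    initial = draft(question)
    facts = [answer_independently(q) for q in plan_verifications(initial)]
    return revise(initial, facts)

print(cove("Tell me about the Eiffel Tower."))  # date corrected to 1889
```

The crucial step is answering the verification questions without showing the model its own draft, which prevents it from simply repeating the original hallucination.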

AgentBench (Liu et al., 2023)

Core Idea: Comprehensive benchmark for evaluating LLMs as agents across 8 distinct environments (web, DB, OS, games, etc.).

Key Contribution: First systematic evaluation framework for agent capabilities, revealing significant gaps between top models.

Why It Matters: Enables standardized comparison of agent capabilities and identifies areas for improvement.

arXiv | Week 10

Generative Agents (Park et al., 2023)

Core Idea: Simulate believable human behavior in a sandbox environment using memory, reflection, and planning architectures.

Key Contribution: Demonstrates emergent social behaviors from simple agent architectures, including information spreading and relationship formation.

Why It Matters: Opens possibilities for agent-based simulations of complex social systems.

arXiv | Week 12
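
The memory-retrieval score (a weighted sum of recency, importance, and relevance) can be sketched as follows. The relevance stub uses word overlap in place of embedding similarity, importance is normalized to [0, 1] here, and the decay factor is illustrative rather than the paper's exact value.

```python
# Sketch of generative-agent memory retrieval: rank memories by
# recency + importance + relevance, return the highest-scoring one.

def recency(age_hours: float, decay: float = 0.995) -> float:
    return decay ** age_hours  # exponential decay with age

def relevance(memory: str, query: str) -> float:
    """Stub for embedding cosine similarity: crude word overlap."""
    m, q = set(memory.lower().split()), set(query.lower().split())
    return len(m & q) / max(len(q), 1)

def score(memory: str, importance: float, age_hours: float, query: str) -> float:
    return recency(age_hours) + importance + relevance(memory, query)

memories = [("ate breakfast", 0.1, 2.0), ("planning a party", 0.8, 5.0)]
query = "what party plans exist"
best = max(memories, key=lambda m: score(m[0], m[1], m[2], query))
print(best[0])  # -> planning a party
```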


Complete Paper List

Agent Architectures

Paper | Authors | Year | Link | Week
ReAct: Synergizing Reasoning and Acting | Yao et al. | 2023 | arXiv | 1
A Survey on LLM-based Autonomous Agents | Wang et al. | 2024 | arXiv | 1
The Rise of LLM-Based Agents | Xi et al. | 2023 | arXiv | 1

Reasoning and Prompting

Paper | Authors | Year | Link | Week
Chain-of-Thought Prompting | Wei et al. | 2022 | arXiv | 2
Tree of Thoughts | Yao et al. | 2023 | arXiv | 2
Self-Consistency Improves CoT | Wang et al. | 2023 | arXiv | 2

Tool Use

Paper | Authors | Year | Link | Week
Toolformer | Schick et al. | 2023 | arXiv | 3
Gorilla: LLM Connected with APIs | Patil et al. | 2023 | arXiv | 3
ToolLLM | Qin et al. | 2024 | arXiv | 3

Planning and Reflection

Paper | Authors | Year | Link | Week
Reflexion | Shinn et al. | 2023 | arXiv | 4
LATS: Language Agent Tree Search | Zhou et al. | 2024 | arXiv | 4
Plan-and-Solve Prompting | Wang et al. | 2023 | arXiv | 4

Multi-Agent Systems

Paper | Authors | Year | Link | Week
AutoGen | Wu et al. | 2023 | arXiv | 5
MetaGPT | Hong et al. | 2023 | arXiv | 5
ChatDev | Qian et al. | 2024 | arXiv | 5
Multi-Agent Collaboration Survey | Tran et al. | 2025 | arXiv | 5

Retrieval-Augmented Generation

Paper | Authors | Year | Link | Week
Self-RAG | Asai et al. | 2023 | arXiv | 7
Corrective RAG | Yan et al. | 2024 | arXiv | 7
RAPTOR | Sarthi et al. | 2024 | arXiv | 7
RAG Survey | Gao et al. | 2024 | arXiv | 7

Knowledge Graphs

Paper | Authors | Year | Link | Week
GraphRAG | Edge et al. | 2024 | Microsoft | 8
Graph of Thoughts | Besta et al. | 2024 | arXiv | 8
HippoRAG | Gutierrez et al. | 2024 | arXiv | 8

Hallucination and Safety

Paper | Authors | Year | Link | Week
Chain-of-Verification | Dhuliawala et al. | 2023 | arXiv | 9
FActScore | Min et al. | 2023 | arXiv | 9
Self-Refine | Madaan et al. | 2023 | arXiv | 9
Hallucination Survey | Ji et al. | 2023 | arXiv | 9

Evaluation and Benchmarks

Paper | Authors | Year | Link | Week
AgentBench | Liu et al. | 2023 | arXiv | 10
WebArena | Zhou et al. | 2024 | arXiv | 10
GAIA Benchmark | Mialon et al. | 2024 | arXiv | 10
SWE-bench | Jimenez et al. | 2024 | arXiv | 10

Domain Applications

Paper | Authors | Year | Link | Week
AlphaCodium | Ridnik et al. | 2024 | arXiv | 11
MDAgents | Kim et al. | 2024 | arXiv | 11
FinAgent Survey | Li et al. | 2024 | arXiv | 11

Research Frontiers

Paper | Authors | Year | Link | Week
Generative Agents | Park et al. | 2023 | arXiv | 12
Voyager | Wang et al. | 2023 | arXiv | 12
Constitutional AI | Bai et al. | 2022 | arXiv | 12

Reference Management

Zotero Collection

Import our curated paper collection directly into Zotero:

Agentic AI Course Papers

A shared Zotero collection with all course readings, organized by week.

View Zotero Collection

Join the group to sync papers to your library and add notes.

BibTeX Export

Download all citations in BibTeX format for your papers:

Coming soon - bibliography.bib file with all course papers

Reading Tips

  1. Start with abstracts - Get the main idea before deep diving
  2. Focus on methods - Understanding the approach is more valuable than memorizing results
  3. Take notes - Write summaries in your own words
  4. Discuss with peers - Different perspectives help understanding
  5. Implement key ideas - The best way to learn is by doing

Citation Format

When citing papers in your work, use the following format:

@article{yao2023react,
  title={ReAct: Synergizing Reasoning and Acting in Language Models},
  author={Yao, Shunyu and others},
  journal={arXiv preprint arXiv:2210.03629},
  year={2023}
}
