Research Papers

A curated collection of foundational and cutting-edge papers in agentic AI with summaries.

Key Papers with Summaries

ReAct: Synergizing Reasoning and Acting (Yao et al., 2023)

Core Idea: Interleave reasoning traces and actions in LLMs, allowing models to reason about tasks (Thought) and interact with external environments (Action) to gather information (Observation).

Key Contribution: Shows that combining reasoning and acting outperforms either approach alone on question-answering and decision-making tasks.

Why It Matters: Foundational paradigm for most modern LLM agents. The Thought-Action-Observation loop is now standard in agent frameworks.

arXiv | Week 1
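
The Thought-Action-Observation loop can be sketched in a few lines of Python. Everything below is a stand-in: `fake_llm` mimics the model and `lookup` mimics a search tool; the paper's actual prompts and Wikipedia API are not reproduced.

```python
# Minimal sketch of the ReAct Thought-Action-Observation loop.
# `fake_llm` and `lookup` are illustrative stubs, not part of the paper.

def fake_llm(transcript: str) -> str:
    """Stub model: asks for one lookup, then finishes."""
    if "Observation:" not in transcript:
        return "Thought: I need the capital.\nAction: lookup[France]"
    return "Thought: I have the answer.\nAction: finish[Paris]"

def lookup(entity: str) -> str:
    """Stub tool standing in for a search/wiki API."""
    return {"France": "The capital of France is Paris."}.get(entity, "No result.")

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_llm(transcript)           # Thought + Action
        transcript += step + "\n"
        action = step.split("Action: ")[-1]
        if action.startswith("finish["):
            return action[len("finish["):-1]  # final answer
        arg = action[len("lookup["):-1]
        transcript += f"Observation: {lookup(arg)}\n"  # feed tool result back

print(react("What is the capital of France?"))  # -> Paris
```

The key design point is that the tool's output is appended to the transcript as an Observation, so the next Thought can condition on it.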

Chain-of-Thought Prompting (Wei et al., 2022)

Core Idea: Providing few-shot exemplars that spell out intermediate reasoning steps, or simply appending "Let's think step by step" in the zero-shot variant (Kojima et al., 2022), dramatically improves LLM performance on complex reasoning tasks.

Key Contribution: Demonstrates emergent reasoning abilities in large models when prompted to show intermediate steps.

Why It Matters: Enables agents to break down complex problems and explain their reasoning, improving both accuracy and interpretability.

arXiv | Week 2
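
The two prompting styles can be sketched as simple string construction. The hypothetical helper `cot_prompt` is ours; the tennis-ball exemplar follows the well-known example from the paper, and the zero-shot trigger phrase is from Kojima et al. (2022).

```python
# Sketch of few-shot vs. zero-shot chain-of-thought prompt construction.

EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str, zero_shot: bool = False) -> str:
    if zero_shot:
        # Zero-shot variant: just append the trigger phrase.
        return f"Q: {question}\nA: Let's think step by step."
    # Few-shot variant: prepend worked exemplars with visible reasoning.
    return f"{EXEMPLAR}\nQ: {question}\nA:"
```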

Toolformer (Schick et al., 2023)

Core Idea: Train LLMs to decide when and how to use external tools (calculators, search, etc.) by self-supervised learning on API calls.

Key Contribution: Shows LLMs can learn tool use without explicit supervision by generating candidate API calls and keeping only those whose results reduce the model's loss on the following tokens.

Why It Matters: Foundational work for function calling and tool-augmented LLMs used in modern APIs.

arXiv | Week 3
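
The self-supervised filter can be illustrated with a toy: a candidate API call is kept only if inserting the call and its result lowers the model's loss on the continuation. The `loss` function below is a contrived stub, not a real language-model loss.

```python
# Toy sketch of Toolformer's filtering criterion.

def loss(text: str) -> float:
    """Stub LM loss: pretend the model is less surprised by the
    continuation when the calculator result appears before it."""
    return 0.5 if "[Calculator(3 + 4) -> 7]" in text else 2.0

def keep_call(prefix: str, call: str, continuation: str, margin: float = 0.1) -> bool:
    """Keep the candidate call only if it reduces loss by at least `margin`."""
    with_call = f"{prefix} {call} {continuation}"
    without = f"{prefix} {continuation}"
    return loss(with_call) + margin < loss(without)

print(keep_call("3 + 4 =", "[Calculator(3 + 4) -> 7]", "7"))  # True
```

Calls that pass the filter become training data, which is how the model bootstraps tool use without human labels.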

Reflexion (Shinn et al., 2023)

Core Idea: Agents learn from verbal self-reflection on failures, storing insights in episodic memory to avoid repeating mistakes.

Key Contribution: Introduces a framework for agents to improve through natural language feedback rather than gradient updates.

Why It Matters: Enables agents to learn from experience within a session, critical for complex multi-step tasks.

arXiv | Week 4
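
The trial-reflect-retry loop can be sketched as follows, with stub `attempt` and `reflect` functions standing in for real LLM calls.

```python
# Minimal sketch of the Reflexion loop: on failure, a verbal reflection
# is stored in episodic memory and made available to the next attempt.

def attempt(task: str, memory: list[str]) -> bool:
    """Stub actor: succeeds only once it has a relevant lesson in memory."""
    return any("check units" in m for m in memory)

def reflect(task: str) -> str:
    """Stub self-reflection generated after a failed episode."""
    return "Last attempt failed: next time, check units before answering."

def reflexion(task: str, max_trials: int = 3) -> int:
    memory: list[str] = []            # episodic memory of verbal lessons
    for trial in range(1, max_trials + 1):
        if attempt(task, memory):
            return trial              # success on this trial
        memory.append(reflect(task))  # learn from failure, no gradient updates
    return -1

print(reflexion("convert 5 km to miles"))  # succeeds on trial 2
```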

AutoGen (Wu et al., 2023)

Core Idea: Framework for building multi-agent systems through conversational interactions between specialized agents.

Key Contribution: Demonstrates that complex tasks can be solved by having agents with different roles collaborate through natural conversation.

Why It Matters: Pioneered the conversational multi-agent paradigm used in many modern agent frameworks.

arXiv | Week 5
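
The conversational pattern can be sketched as two reply functions taking turns until a termination marker appears. `assistant_reply` and `user_proxy_reply` are stubs of our own invention, not AutoGen's actual API, where each agent would wrap an LLM call.

```python
# Sketch of the two-agent conversation loop behind frameworks like AutoGen.

def assistant_reply(msg: str) -> str:
    if "write a haiku" in msg:
        return "Here is a draft haiku about autumn."
    return "TERMINATE"

def user_proxy_reply(msg: str) -> str:
    if "draft" in msg:
        return "Looks good, we are done."
    return "Please write a haiku."

def chat(opening: str, max_turns: int = 6) -> list[str]:
    transcript = [opening]
    speakers = [assistant_reply, user_proxy_reply]  # alternate turns
    for turn in range(max_turns):
        reply = speakers[turn % 2](transcript[-1])
        transcript.append(reply)
        if "TERMINATE" in reply:     # conversation ends on the marker
            break
    return transcript
```

The design choice worth noticing is that each agent only sees the last message, so all coordination happens through natural language rather than shared state.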

Self-RAG (Asai et al., 2023)

Core Idea: Train LLMs to adaptively retrieve information and self-reflect on generated content using special tokens.

Key Contribution: Shows models can learn when to retrieve, what to retrieve, and how to critique their outputs for factuality.

Why It Matters: Improves RAG accuracy by making retrieval decisions dynamic rather than always-on.

arXiv | Week 7
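
The adaptive-retrieval gate can be sketched as below; `decide_retrieve` is a stub standing in for the model emitting its [Retrieve]/[No Retrieve] control token, and `retrieve` is a canned passage rather than a real retriever.

```python
# Sketch of Self-RAG's adaptive retrieval: retrieve only when the
# model's control token says the query needs external evidence.

def decide_retrieve(question: str) -> str:
    """Stub for the model emitting a [Retrieve]/[No Retrieve] token."""
    factual = any(w in question.lower() for w in ("who", "when", "where"))
    return "[Retrieve]" if factual else "[No Retrieve]"

def retrieve(question: str) -> str:
    return "Passage: Marie Curie won the Nobel Prize in 1903 and 1911."

def answer(question: str) -> tuple[str, bool]:
    token = decide_retrieve(question)
    if token == "[Retrieve]":
        context = retrieve(question)
        return f"{context} -> grounded answer", True
    return "direct answer, no retrieval", False
```

Gating retrieval this way is what makes it dynamic rather than always-on: creative or conversational queries skip the retriever entirely.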

GraphRAG (Edge et al., 2024)

Core Idea: Use knowledge graphs to structure document relationships, enabling community-based summarization and multi-hop reasoning.

Key Contribution: Outperforms naive RAG on global sensemaking queries that require synthesizing information across documents.

Why It Matters: Addresses RAG limitations for complex queries requiring broad understanding of a corpus.

Microsoft | Week 8

Chain-of-Verification (Dhuliawala et al., 2023)

Core Idea: Reduce hallucinations by having the model generate verification questions, answer them independently, and revise based on inconsistencies.

Key Contribution: Provides a systematic approach to fact-checking generated content without external knowledge bases.

Why It Matters: Critical technique for building trustworthy agents that can verify their own claims.

arXiv | Week 9
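
The four-step pipeline (draft, plan verification questions, answer them independently, revise) can be sketched with stubbed steps; in the paper each step is a separate LLM call, and the Eiffel Tower example here is our own.

```python
# Sketch of the Chain-of-Verification pipeline with stubbed LLM steps.

def draft(question: str) -> str:
    return "The Eiffel Tower is in Paris and was completed in 1890."

def plan_verifications(answer: str) -> list[str]:
    return ["Where is the Eiffel Tower?",
            "When was the Eiffel Tower completed?"]

def answer_independently(q: str) -> str:
    # Answered without seeing the draft, so its errors are not copied over.
    return {"Where is the Eiffel Tower?": "Paris",
            "When was the Eiffel Tower completed?": "1889"}[q]

def revise(initial: str, facts: list[str]) -> str:
    if "1889" in facts and "1890" in initial:
        return initial.replace("1890", "1889")  # fix the inconsistency
    return initial

def cove(question: str) -> str:
    initial = draft(question)
    facts = [answer_independently(q) for q in plan_verifications(initial)]
    return revise(initial, facts)

print(cove("Tell me about the Eiffel Tower."))  # date corrected to 1889
```

The crucial step is answering the verification questions without showing the model its own draft, which prevents it from simply repeating the original hallucination.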

AgentBench (Liu et al., 2023)

Core Idea: Comprehensive benchmark for evaluating LLMs as agents across 8 distinct environments (web, DB, OS, games, etc.).

Key Contribution: First systematic evaluation framework for agent capabilities, revealing significant gaps between top models.

Why It Matters: Enables standardized comparison of agent capabilities and identifies areas for improvement.

arXiv | Week 10

Generative Agents (Park et al., 2023)

Core Idea: Simulate believable human behavior in a sandbox environment using memory, reflection, and planning architectures.

Key Contribution: Demonstrates emergent social behaviors from simple agent architectures, including information spreading and relationship formation.

Why It Matters: Opens possibilities for agent-based simulations of complex social systems.

arXiv | Week 12
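
The memory-retrieval score (a weighted sum of recency, importance, and relevance) can be sketched as follows. The relevance stub uses word overlap in place of embedding similarity, importance is normalized to [0, 1] here, and the decay factor is illustrative rather than the paper's exact value.

```python
# Sketch of generative-agent memory retrieval: rank memories by
# recency + importance + relevance, return the highest-scoring one.

def recency(age_hours: float, decay: float = 0.995) -> float:
    return decay ** age_hours  # exponential decay with age

def relevance(memory: str, query: str) -> float:
    """Stub for embedding cosine similarity: crude word overlap."""
    m, q = set(memory.lower().split()), set(query.lower().split())
    return len(m & q) / max(len(q), 1)

def score(memory: str, importance: float, age_hours: float, query: str) -> float:
    return recency(age_hours) + importance + relevance(memory, query)

memories = [("ate breakfast", 0.1, 2.0), ("planning a party", 0.8, 5.0)]
query = "what party plans exist"
best = max(memories, key=lambda m: score(m[0], m[1], m[2], query))
print(best[0])  # -> planning a party
```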


Complete Paper List

Agent Architectures

Paper | Authors | Year | Link | Week
ReAct: Synergizing Reasoning and Acting | Yao et al. | 2023 | arXiv | 1
A Survey on LLM-based Autonomous Agents | Wang et al. | 2024 | arXiv | 1
The Rise of LLM-Based Agents | Xi et al. | 2023 | arXiv | 1

Reasoning and Prompting

Paper | Authors | Year | Link | Week
Chain-of-Thought Prompting | Wei et al. | 2022 | arXiv | 2
Tree of Thoughts | Yao et al. | 2023 | arXiv | 2
Self-Consistency Improves CoT | Wang et al. | 2023 | arXiv | 2

Tool Use

Paper | Authors | Year | Link | Week
Toolformer | Schick et al. | 2023 | arXiv | 3
Gorilla: LLM Connected with APIs | Patil et al. | 2023 | arXiv | 3
ToolLLM | Qin et al. | 2024 | arXiv | 3

Planning and Reflection

Paper | Authors | Year | Link | Week
Reflexion | Shinn et al. | 2023 | arXiv | 4
LATS: Language Agent Tree Search | Zhou et al. | 2024 | arXiv | 4
Plan-and-Solve Prompting | Wang et al. | 2023 | arXiv | 4

Multi-Agent Systems

Paper | Authors | Year | Link | Week
AutoGen | Wu et al. | 2023 | arXiv | 5
MetaGPT | Hong et al. | 2023 | arXiv | 5
ChatDev | Qian et al. | 2024 | arXiv | 5
Multi-Agent Collaboration Survey | Tran et al. | 2025 | arXiv | 5

Retrieval-Augmented Generation

Paper | Authors | Year | Link | Week
Self-RAG | Asai et al. | 2023 | arXiv | 7
Corrective RAG | Yan et al. | 2024 | arXiv | 7
RAPTOR | Sarthi et al. | 2024 | arXiv | 7
RAG Survey | Gao et al. | 2024 | arXiv | 7

Knowledge Graphs

Paper | Authors | Year | Link | Week
GraphRAG | Edge et al. | 2024 | Microsoft | 8
Graph of Thoughts | Besta et al. | 2024 | arXiv | 8
HippoRAG | Gutierrez et al. | 2024 | arXiv | 8

Hallucination and Safety

Paper | Authors | Year | Link | Week
Chain-of-Verification | Dhuliawala et al. | 2023 | arXiv | 9
FActScore | Min et al. | 2023 | arXiv | 9
Self-Refine | Madaan et al. | 2023 | arXiv | 9
Hallucination Survey | Ji et al. | 2023 | arXiv | 9

Evaluation and Benchmarks

Paper | Authors | Year | Link | Week
AgentBench | Liu et al. | 2023 | arXiv | 10
WebArena | Zhou et al. | 2024 | arXiv | 10
GAIA Benchmark | Mialon et al. | 2024 | arXiv | 10
SWE-bench | Jimenez et al. | 2024 | arXiv | 10

Domain Applications

Paper | Authors | Year | Link | Week
AlphaCodium | Ridnik et al. | 2024 | arXiv | 11
MDAgents | Kim et al. | 2024 | arXiv | 11
FinAgent Survey | Li et al. | 2024 | arXiv | 11

Research Frontiers

Paper | Authors | Year | Link | Week
Generative Agents | Park et al. | 2023 | arXiv | 12
Voyager | Wang et al. | 2023 | arXiv | 12
Constitutional AI | Bai et al. | 2022 | arXiv | 12

Reference Management

Zotero Collection

Import our curated paper collection directly into Zotero:

Agentic AI Course Papers

A shared Zotero collection with all course readings, organized by week.

View Zotero Collection

Join the group to sync papers to your library and add notes.

BibTeX Export

Download all citations in BibTeX format for your papers:

Coming soon - bibliography.bib file with all course papers

Reading Tips

  1. Start with abstracts - Get the main idea before deep diving
  2. Focus on methods - Understanding the approach is more valuable than memorizing results
  3. Take notes - Write summaries in your own words
  4. Discuss with peers - Different perspectives help understanding
  5. Implement key ideas - The best way to learn is by doing

Citation Format

When citing papers in your work, use the following format:

@article{yao2023react,
  title={ReAct: Synergizing Reasoning and Acting in Language Models},
  author={Yao, Shunyu and others},
  journal={arXiv preprint arXiv:2210.03629},
  year={2023}
}
