Week 8: GraphRAG and Knowledge Graphs

From vector search to knowledge graphs for structured retrieval

Week 8 of 12

Learning Objectives

  • Define knowledge graphs, entities, relations, and communities
  • Explain how GraphRAG enhances retrieval with structure
  • Build a knowledge graph from unstructured text using LLMs
  • Compare vector-only vs graph-enhanced retrieval strategies
  • Assess when GraphRAG provides value over standard RAG
  • Design a hybrid retrieval system combining vectors and graphs

Topics Covered

  • Limitations of vector-only RAG
  • Knowledge graph fundamentals (entities, relations, communities)
  • GraphRAG architecture and community detection
  • Query routing (local vs global search)
  • Entity extraction with LLMs

Resources

Jupyter Notebooks

GraphRAG Implementation (open in Colab)

Required Readings

Paper                                         | Authors      | Year | Link
From Local to Global: A GraphRAG Approach     | Edge et al.  | 2024 | arXiv
Unifying LLMs and Knowledge Graphs: A Roadmap | Pan et al.   | 2024 | arXiv
Graph of Thoughts                             | Besta et al. | 2024 | arXiv

Reading Guide: GraphRAG and Knowledge Graphs

3-4 hours | GraphRAG | Knowledge graphs | Entity extraction

Exploration of graph-based retrieval and knowledge graph construction

Primary Paper

GraphRAG: Unlocking LLM Discovery on Narrative Private Data
Edge, D., Trinh, H., Cheng, N., et al. (2024)
Microsoft Research

Secondary Papers

  • HippoRAG: Neurobiologically Inspired Long-Term Memory - Gutierrez, B. J., et al. (2024) arXiv
  • Graph of Thoughts - Besta, M., et al. (2024) arXiv

Exercise: GraphRAG

100 Points | 6-8 hours | Advanced

Build knowledge graphs and graph-based retrieval

Learning Objectives

  • Create: Build knowledge graphs from text
  • Apply: Implement graph-based retrieval
  • Analyze: Compare graph vs vector retrieval

Tasks

Task                 | Points | Description
Entity Extraction    | 30     | Extract entities and relationships
Graph Construction   | 35     | Build and index knowledge graph
Query Implementation | 35     | Implement local and global queries

Key Concepts

Knowledge Graphs: Entity-relationship structures that make implicit relationships explicit and queryable. Represented as triples: (subject, predicate, object).
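
To make the triple representation concrete, here is a minimal sketch in plain Python. The facts, the `TRIPLES` list, and the `match` helper are illustrative assumptions, not part of the course notebook; a real system would more likely store the triples in NetworkX or Neo4j.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Illustrative facts chosen for this sketch, not taken from the course materials.
TRIPLES: list[Triple] = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
    ("Warsaw", "located_in", "Poland"),
]


def match(
    triples: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Who won the Nobel Prize in Physics?" becomes a pattern over explicit edges.
winners = [s for s, _, _ in match(TRIPLES, predicate="won", obj="Nobel Prize in Physics")]
print(winners)
```

Because each relationship is stored as an explicit edge, the question is answered by pattern matching rather than by similarity between text chunks.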

GraphRAG: Microsoft’s approach combining knowledge graphs with retrieval-augmented generation. Enables both local (entity-specific) and global (thematic) queries.

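Community Detection: Clustering densely connected entities into communities, typically with the Leiden algorithm. Summarizing each community at several levels of granularity gives the hierarchy of summaries that global queries draw on.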
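
Leiden-style algorithms search for a partition that scores well on a quality function such as modularity. The sketch below is not the Leiden algorithm itself; it only computes modularity for two hand-picked partitions of a small hypothetical graph, to show what "a good community structure" means numerically. The graph, node names, and partitions are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints inside community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (fraction of internal edges - expected fraction)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles of hypothetical entities joined by a single bridge edge (C, D).
EDGES = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(EDGES, two_groups)   # each triangle is its own community
q_merged = modularity(EDGES, one_group)   # everything in a single community
print(round(q_split, 3), round(q_merged, 3))
```

Splitting the graph along the bridge scores higher than lumping everything together, which is the kind of partition a community detection algorithm is expected to prefer.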

Query Routing: Classifying each query as local (about specific entities) or global (about corpus-wide themes) and sending it to the retrieval strategy suited to that type.
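
As a rough illustration of routing, the sketch below uses a keyword heuristic. The cue list and the `route_query` function are hypothetical and deliberately simplistic; a classifier or an LLM prompt could fill the same role.

```python
# Hypothetical cue phrases that suggest a corpus-wide (global) question.
GLOBAL_CUES = ("overall", "main themes", "summarize", "across", "trends", "in general")


def route_query(query: str) -> str:
    """Return "global" for thematic questions, otherwise "local"."""
    q = query.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return "global"  # answer from community summaries
    return "local"       # answer from an entity's neighborhood


print(route_query("What are the main themes in the corpus?"))
print(route_query("Who founded Acme Corp?"))
```

A keyword list like this misroutes easily (for example, an entity question that happens to contain "across"), so it is best treated as a baseline to compare against a learned or LLM-based router.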

Exercise

Build a domain knowledge graph system:

  1. Extract entities and relationships from documents using an LLM
  2. Construct a queryable knowledge graph with Neo4j or NetworkX
  3. Implement community detection with the Leiden algorithm
  4. Build both local search (entity lookup) and global search (community summaries)
  5. Compare performance with vector-based retrieval on multi-hop queries
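
Steps 1, 2, and 5 above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: `fake_llm_extract` stands in for a real LLM call, the pipe-delimited output format is one possible prompt convention, and the adjacency dictionary stands in for Neo4j or NetworkX.

```python
from collections import defaultdict, deque


def fake_llm_extract(text: str) -> str:
    """Stand-in for an LLM call; returns one 'subject|predicate|object' per line."""
    return (
        "Alice|works_at|Acme\n"
        "Acme|headquartered_in|Berlin\n"
        "Berlin|located_in|Germany\n"
        "malformed line"
    )


def parse_triples(raw: str):
    """Keep only well-formed lines; LLM output often contains stray text."""
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples


def build_graph(triples):
    """Directed adjacency list: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def find_path(graph, start, goal, max_hops=3):
    """Breadth-first search for the shortest chain of triples from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for pred, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, pred, nxt)]))
    return None


triples = parse_triples(fake_llm_extract("...document text..."))
graph = build_graph(triples)
# Multi-hop question: "In which country does Alice work?"
path = find_path(graph, "Alice", "Germany")
print(path)
```

The three-hop chain is recovered by following edges, whereas a vector-only retriever would need a single chunk (or several retrieved together) that happens to mention all three facts. Edges are followed in one direction only in this sketch; adding reverse edges is a design choice that depends on the queries you expect.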

Discussion Questions

  1. When is graph-based retrieval better than vector search? What query patterns benefit most?
  2. How do you handle entity disambiguation when the same entity appears with different names?
  3. What are the scaling challenges of knowledge graphs for large corpora?
  4. How does the upfront indexing cost of GraphRAG compare to the query benefits?
  5. When would a hybrid vector + graph approach be preferred over pure GraphRAG?
