# Systematic Literature Reviews with AI

## Overview
Systematic reviews constitute a critical foundation for evidence-based decision-making across disciplines. However, the labor-intensive nature of traditional systematic literature reviews (SLRs), which typically require weeks to months of manual work, has driven significant interest in AI-assisted automation.
## Key Statistics
| Metric | Value |
|---|---|
| Workload reduction with AI screening | 40-95% |
| otto-SR: Cochrane reviews completed | 12 reviews in 2 days (vs ~12 work-years manually) |
| GPT-4 PICO-element extraction accuracy (population, intervention, comparison, outcome) | >85% median |
## AI Tools for Systematic Reviews

### Open Source Tools
| Tool | Description | Features | Link |
|---|---|---|---|
| ASReview | Active learning for systematic reviews | Open source, up to 95% screening-workload reduction, Python-based | Visit GitHub |
| RobotReviewer | ML system for RCT assessment | Free, web-based, bias assessment | Visit |
| Colandr | Open-source screening tool | Free, collaborative | Visit |
| FAST2 | Active learning screening | Open source, Python 3 | Visit GitHub |
### Commercial & Freemium Tools
| Tool | Description | Pricing | Link |
|---|---|---|---|
| Rayyan | AI-powered review management | Free tier available | Visit |
| Elicit | AI research assistant | Free: basic / Pro: $42/mo | Visit |
| Covidence | Cochrane-recommended tool | Free for Cochrane reviews | Visit |
| DistillerSR | Enterprise review software | Subscription-based | Visit |
| Laser AI | Living systematic reviews | Commercial | Visit |
| otto-SR | End-to-end LLM workflow | Web platform | Visit |
| EPPI-Reviewer | Comprehensive review tool | Subscription | Visit |
### Specialized LLM Applications
| Tool/Method | Application | Model |
|---|---|---|
| Systematic Review Extractor Pro | Data extraction | Custom GPT |
| otto-SR Screening Agent | Abstract/full-text screening | GPT-4.1 |
| otto-SR Extraction Agent | Data extraction | o3-mini-high |
## Key Research Papers

### Foundational Papers

- **ASReview framework**: "An open source machine learning framework for efficient and transparent systematic reviews", *Nature Machine Intelligence* 3, 125-133 (2021)
- **Rayyan original paper**: "Rayyan - a web and mobile app for systematic reviews", *Systematic Reviews* 5, 210 (2016)
### Recent LLM Research (2024-2025)

- **otto-SR**: automation of systematic reviews with LLMs; demonstrated 96.7% sensitivity and 97.9% specificity in screening (*medRxiv* preprint)
- **Scoping review**: "Large language models for conducting systematic reviews: on the rise, but not yet ready for use", *Journal of Clinical Epidemiology*
- **GPT-4 evaluation**: "Can large language models replace humans in systematic reviews?", *Research Synthesis Methods*
- **LLM-assisted SLR system**: "Enhancing systematic literature reviews with generative AI", *JAMIA* 32, 616
### Methodology & Guidelines

- **PRISMA-AI guidelines**: "PRISMA AI reporting guidelines for systematic reviews and meta-analyses on AI in healthcare", *Nature Medicine*
- **Practical guide to ML in research synthesis**: "Toward systematic review automation: a practical guide", *Systematic Reviews*
## Methodological Guidelines

### PRISMA-AI Framework
The PRISMA-AI extension provides standardized reporting for AI-related systematic reviews:
- Search strategy documentation
- Quality assessment with AI-specific criteria
- Transparent result reporting
- Technical reproducibility requirements
### LLM Integration Guidelines

When integrating LLMs into systematic reviews:

#### 1. Screening Phase
- Use zero-shot or few-shot classification
- Define clear inclusion/exclusion criteria in prompts
- Maintain human oversight for borderline cases
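The screening guidelines above can be sketched as a small prompt-building and decision-parsing pair. The template wording, the example criteria, and the INCLUDE/EXCLUDE/UNSURE labels are illustrative assumptions, not the prompt of any particular tool:

```python
# Hypothetical zero-shot screening prompt with explicit criteria, and a
# parser that routes anything unclear to a human (guideline 3).

SCREENING_TEMPLATE = """You are screening abstracts for a systematic review.

Inclusion criteria:
{inclusion}

Exclusion criteria:
{exclusion}

Abstract:
{abstract}

Answer with exactly one word: INCLUDE, EXCLUDE, or UNSURE."""


def build_screening_prompt(abstract: str, inclusion: list[str], exclusion: list[str]) -> str:
    """Fill the zero-shot template with explicit inclusion/exclusion criteria."""
    return SCREENING_TEMPLATE.format(
        inclusion="\n".join(f"- {c}" for c in inclusion),
        exclusion="\n".join(f"- {c}" for c in exclusion),
        abstract=abstract.strip(),
    )


def parse_decision(model_output: str) -> str:
    """Map the model's reply to a decision; unclear replies go to a human reviewer."""
    reply = model_output.strip().upper()
    if reply.startswith("INCLUDE"):
        return "include"
    if reply.startswith("EXCLUDE"):
        return "exclude"
    return "human_review"
```

Defaulting unparseable output to `human_review` keeps the human in the loop for exactly the borderline cases the guideline flags.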
#### 2. Data Extraction
- Use structured prompts (RISEN framework)
- Validate extracted data against source documents
- Document prompt versions for reproducibility
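A structured extraction prompt in the RISEN style (Role, Instructions, Steps, End goal, Narrowing) could be assembled like this; the section wording, field names, and example content are hypothetical:

```python
# Sketch of a RISEN-style prompt builder. Keeping the five sections as a
# fixed tuple makes prompt versions easy to document and reproduce.

RISEN_FIELDS = ("role", "instructions", "steps", "end_goal", "narrowing")


def build_risen_prompt(**sections: str) -> str:
    """Join the five RISEN sections in a fixed order; fail loudly if one is missing."""
    missing = [f for f in RISEN_FIELDS if f not in sections]
    if missing:
        raise ValueError(f"missing RISEN sections: {missing}")
    return "\n\n".join(
        f"{field.replace('_', ' ').title()}:\n{sections[field]}" for field in RISEN_FIELDS
    )


prompt = build_risen_prompt(
    role="You are a data-extraction assistant for a systematic review.",
    instructions="Extract the study design, sample size, and primary outcome.",
    steps="1. Read the full text. 2. Locate each field. 3. Quote the source sentence.",
    end_goal="Return a JSON object with keys: design, n, primary_outcome, quotes.",
    narrowing="If a field is not reported, use null. Do not guess values.",
)
```

Asking the model to quote the source sentence for each field supports the validation step above: each extracted value can be checked against the quoted passage.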
#### 3. Quality Assurance
- Dual verification (AI + human) recommended
- Report sensitivity and specificity metrics
- Document AI model versions and parameters
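Reporting sensitivity and specificity only takes four counts from a comparison against a human gold standard; a minimal sketch (the example counts are invented for illustration):

```python
# Sensitivity and specificity of an AI screener, measured against a
# human-adjudicated gold standard.

def sensitivity(tp: int, fn: int) -> float:
    """Share of truly relevant studies the screener kept (recall)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly irrelevant studies the screener excluded."""
    return tn / (tn + fp)


# Example: 30 relevant studies with 1 missed; 950 irrelevant with 20 wrongly kept.
print(f"sensitivity = {sensitivity(29, 1):.1%}")   # 96.7%
print(f"specificity = {specificity(930, 20):.1%}")  # 97.9%
```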
## Performance Benchmarks

### Screening Accuracy
| Tool/Method | Sensitivity | Specificity | Notes |
|---|---|---|---|
| otto-SR | 96.7% | 97.9% | GPT-4.1 based |
| Human dual review | 81.7% | 98.1% | Traditional approach |
| Rayyan AI | 97-99% | 19-58% | At a relevance-rating threshold of <2.5 |
| ASReview | Variable | Variable | Depends on dataset |
### Data Extraction
| Model | Precision | Recall | Notes |
|---|---|---|---|
| GPT-based (pooled) | 83.0% | 86.0% | Mean across studies |
| BERT-based | Lower | Lower | Compared to GPT |
| otto-SR extraction | - | - | 93.1% overall accuracy (o3-mini-high) |
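The pooled precision and recall figures above are ratios of simple counts against a gold standard; a minimal sketch with invented example numbers:

```python
# Precision and recall for LLM data extraction. "Correct" means the
# extracted value matches the gold-standard value from the source document.

def precision(correct: int, extracted: int) -> float:
    """Of all values the model extracted, the share that are correct."""
    return correct / extracted


def recall(correct: int, relevant: int) -> float:
    """Of all values actually present in the sources, the share extracted correctly."""
    return correct / relevant


# Example: 100 fields extracted, 90 correct, out of 120 fields in the sources.
print(f"precision = {precision(90, 100):.0%}, recall = {recall(90, 120):.0%}")
```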
### Time Savings
| Stage | Traditional | AI-Assisted | Reduction |
|---|---|---|---|
| Screening | 8-12 weeks | 2-3 weeks | ~75% |
| Data extraction | 10-16 weeks | 3-5 weeks | ~70% |
| Per-paper extraction | 36 min | 27 sec + 13 min review | ~60% |
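The reduction column follows from the quoted figures (using range midpoints); a sketch of the arithmetic:

```python
# How the "Reduction" column is derived: 1 - assisted/traditional,
# using the midpoints of the quoted ranges.

def reduction(traditional: float, assisted: float) -> float:
    """Fractional time saved when moving from traditional to AI-assisted work."""
    return 1 - assisted / traditional


# Screening: midpoint 10 weeks -> 2.5 weeks
print(f"{reduction(10, 2.5):.0%}")            # 75%
# Per-paper extraction: 36 min -> 27 s + 13 min of review
print(f"{reduction(36, 27 / 60 + 13):.0%}")   # 63%, i.e. roughly 60%
```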
## Additional Resources

### Getting Started

**For beginners:**
- Start with **Rayyan**: free tier, user-friendly interface
- Try **ASReview**: open source, well documented
- Read the **PRISMA guidelines** to understand the methodological requirements

**For advanced users:**
- Explore **otto-SR**: state-of-the-art LLM automation
- Build custom GPT extractors using the RISEN framework
- Combine tools: ASReview for screening plus ChatGPT for extraction
### Key GitHub Repositories

- `asreview/asreview`: active learning for systematic reviews
- `asreview/synergy-dataset`: ML dataset for study selection
- The `systematic-reviews` GitHub topic: a catalog of SLR tools
### Library Guides
- King’s College London - AI in Evidence Synthesis
- Purdue University - AI Tools for Systematic Review
- Harvard Library - Systematic Reviews Software
- Lancaster University - Systematic Reviews Tools
### Python Quick Start

```shell
# Install ASReview
pip install asreview
```

```python
# Basic usage: project API
from asreview import ASReviewProject

# See documentation: https://asreview.readthedocs.io/
```
(c) Joerg Osterrieder 2025