Systematic Literature Reviews with AI

Overview

Systematic literature reviews (SLRs) are a critical foundation for evidence-based decision-making across disciplines. However, traditional SLRs are labor-intensive, typically requiring weeks to months of manual work, which has driven significant interest in AI-assisted automation.

Key Statistics

| Metric | Value |
|---|---|
| Workload reduction with AI screening | 40-95% |
| otto-SR: Cochrane reviews completed | 12 reviews in 2 days (vs. ~12 work-years manually) |
| GPT-4 PICO extraction accuracy | >85% median |

AI Tools for Systematic Reviews

Open Source Tools

| Tool | Description | Features |
|---|---|---|
| ASReview | Active learning for systematic reviews | Open source, up to 95% workload reduction, Python-based |
| RobotReviewer | ML system for RCT assessment | Free, web-based, bias assessment |
| Colandr | Open-source screening tool | Free, collaborative |
| FAST2 | Active learning screening | Open source, Python 3 |

Commercial & Freemium Tools

| Tool | Description | Pricing |
|---|---|---|
| Rayyan | AI-powered review management | Free tier available |
| Elicit | AI research assistant | Free: basic / Pro: $42/mo |
| Covidence | Cochrane-recommended tool | Free for Cochrane reviews |
| DistillerSR | Enterprise review software | Subscription-based |
| Laser AI | Living systematic reviews | Commercial |
| otto-SR | End-to-end LLM workflow | Web platform |
| EPPI-Reviewer | Comprehensive review tool | Subscription |

Specialized LLM Applications

| Tool/Method | Application | Model |
|---|---|---|
| Systematic Review Extractor Pro | Data extraction | Custom GPT |
| otto-SR Screening Agent | Abstract/full-text screening | GPT-4.1 |
| otto-SR Extraction Agent | Data extraction | o3-mini-high |

Key Research Papers

Foundational Papers

  • van de Schoot, R. et al. (2021). An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence 3, 125-133.
  • Ouzzani, M. et al. (2016). Rayyan - a web and mobile app for systematic reviews. Systematic Reviews 5, 210.

Recent LLM Research (2024-2025)

  • otto-SR Team (2025). otto-SR: Automation of Systematic Reviews with LLMs. medRxiv preprint. Reported 96.7% sensitivity and 97.9% specificity in screening.
  • Various (2025). Large language models for conducting systematic reviews: on the rise, but not yet ready for use. Journal of Clinical Epidemiology.
  • Khraisha, Q. et al. (2024). Can large language models replace humans in systematic reviews? Research Synthesis Methods.
  • Various (2025). Enhancing systematic literature reviews with generative AI. JAMIA 32, 616.

Methodology & Guidelines

  • Various (2023). PRISMA AI reporting guidelines for systematic reviews and meta-analyses on AI in healthcare. Nature Medicine.
  • Various (2019). Toward systematic review automation: a practical guide. Systematic Reviews.


Methodological Guidelines

PRISMA-AI Framework

The PRISMA-AI extension provides standardized reporting for AI-related systematic reviews:

  • Search strategy documentation
  • Quality assessment with AI-specific criteria
  • Transparent result reporting
  • Technical reproducibility requirements

LLM Integration Guidelines

When integrating LLMs into systematic reviews:

1. Screening Phase

  • Use zero-shot or few-shot classification
  • Define clear inclusion/exclusion criteria in prompts
  • Maintain human oversight for borderline cases
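The screening steps above can be sketched in a few lines. The criteria, the prompt wording, and the `call_llm()` placeholder are all illustrative assumptions; substitute any chat-completion API and your own protocol's criteria.

```python
# Minimal zero-shot screening sketch. INCLUSION/EXCLUSION lists and the
# call_llm() placeholder are hypothetical; adapt to your review protocol.

INCLUSION = ["randomized controlled trial", "adult participants"]
EXCLUSION = ["animal study", "conference abstract only"]

def build_screening_prompt(title: str, abstract: str) -> str:
    """Embed explicit inclusion/exclusion criteria in the prompt (zero-shot)."""
    criteria = "\n".join(
        [f"INCLUDE if: {c}" for c in INCLUSION]
        + [f"EXCLUDE if: {c}" for c in EXCLUSION]
    )
    return (
        "You are screening records for a systematic review.\n"
        f"{criteria}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer with exactly one word: INCLUDE, EXCLUDE, or UNSURE."
    )

def needs_human_review(model_answer: str) -> bool:
    """Route anything that is not a clear INCLUDE/EXCLUDE to a human screener."""
    return model_answer.strip().upper() not in {"INCLUDE", "EXCLUDE"}

prompt = build_screening_prompt("Example trial", "We randomized 200 adults...")
# response = call_llm(prompt)  # hypothetical API call; not implemented here
```

Allowing an explicit UNSURE answer, and routing it to a human, is one simple way to keep borderline cases under human oversight.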

2. Data Extraction

  • Use structured prompts (RISEN framework)
  • Validate extracted data against source documents
  • Document prompt versions for reproducibility
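A structured extraction prompt along these lines might look as follows. This is a sketch of one common reading of RISEN (Role, Instructions, Steps, End goal, Narrowing); the field names, the version string, and the quote requirement are illustrative choices, not part of any specific tool.

```python
# Sketch of a RISEN-style extraction prompt (Role, Instructions, Steps,
# End goal, Narrowing). Field names and version string are illustrative.

PROMPT_VERSION = "extract-v1.0"  # recorded in the prompt for reproducibility

def build_extraction_prompt(fulltext: str) -> str:
    return "\n".join([
        "Role: You are a data-extraction assistant for a systematic review.",
        "Instructions: Extract the PICO elements from the article below.",
        "Steps: 1) Read the methods section. 2) Identify population,",
        "  intervention, comparator, and outcomes. 3) Quote supporting text.",
        "End goal: A JSON object with keys population, intervention,",
        "  comparator, outcomes; each holds a 'value' and a 'quote' field.",
        "Narrowing: If an element is not reported, use null; never guess.",
        f"[prompt version: {PROMPT_VERSION}]",
        "",
        f"Article:\n{fulltext}",
    ])
```

Requiring a verbatim supporting quote for each field makes the validation step easier: extracted values can be checked against the quoted source text, and the embedded version string documents which prompt produced a given extraction.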

3. Quality Assurance

  • Dual verification (AI + human) recommended
  • Report sensitivity and specificity metrics
  • Document AI model versions and parameters
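For reporting, sensitivity and specificity come straight from the screening confusion matrix. The counts below are illustrative only, chosen so the output matches otto-SR-like figures; they are not the actual study counts.

```python
# Computing the reported screening metrics from a confusion matrix.
# The counts below are illustrative, not taken from any real study.

def sensitivity(tp: int, fn: int) -> float:
    """Share of truly relevant records the screener kept (recall)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Share of truly irrelevant records the screener discarded."""
    return tn / (tn + fp)

# 61 relevant and 97 irrelevant records (illustrative)
print(f"{sensitivity(tp=59, fn=2):.1%}")   # 96.7%
print(f"{specificity(tn=95, fp=2):.1%}")   # 97.9%
```

Reporting both numbers matters: a screener that discards almost nothing can score near-perfect sensitivity while saving no work, which only the specificity reveals.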

Performance Benchmarks

Screening Accuracy

| Tool/Method | Sensitivity | Specificity | Notes |
|---|---|---|---|
| otto-SR | 96.7% | 97.9% | GPT-4.1 based |
| Human dual review | 81.7% | 98.1% | Traditional approach |
| Rayyan AI | 97-99% | 19-58% | At <2.5 threshold |
| ASReview | Variable | Variable | Depends on dataset |

Data Extraction

| Model | Precision | Recall | Notes |
|---|---|---|---|
| GPT-based (pooled) | 83.0% | 86.0% | Mean across studies |
| BERT-based | Lower | Lower | Compared to GPT |
| otto-SR extraction | 93.1% (accuracy) | - | o3-mini-high |

Time Savings

| Stage | Traditional | AI-Assisted | Reduction |
|---|---|---|---|
| Screening | 8-12 weeks | 2-3 weeks | ~75% |
| Data extraction | 10-16 weeks | 3-5 weeks | ~70% |
| Per-paper extraction | 36 min | 27 sec + 13 min review | ~60% |

Additional Resources

Getting Started

For Beginners:

  1. Start with Rayyan - Free tier, user-friendly interface
  2. Try ASReview - Open source, well-documented
  3. Read the PRISMA guidelines - Understand methodological requirements

For Advanced Users:

  1. Explore otto-SR - State-of-the-art LLM automation
  2. Build custom GPT extractors - Use RISEN framework
  3. Combine tools - ASReview for screening + ChatGPT for extraction
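Combining tools usually means passing one tool's screening export to the next stage. A minimal sketch, assuming a CSV export with `title` and `included` columns (check your tool's actual export format, which will differ):

```python
# Glue sketch between screening and extraction. The column names
# ("title", "included") are assumptions about the export format.
import csv
import io

# Stand-in for a real exported file such as open("screening_export.csv")
export = io.StringIO(
    "title,included\n"
    "Trial A,1\n"
    "Review B,0\n"
    "Trial C,1\n"
)

# Keep only records the screening step marked as included
to_extract = [row["title"] for row in csv.DictReader(export)
              if row["included"] == "1"]
print(to_extract)  # ['Trial A', 'Trial C']
```

The included records would then feed the extraction step, e.g. via a structured extraction prompt, with the export file kept as the audit trail between stages.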

Key GitHub Repositories

Library Guides

Python Quick Start

```shell
# Install ASReview
pip install asreview
```

```python
# Basic usage; the import below may differ between ASReview versions.
# See the documentation for the current API: https://asreview.readthedocs.io/
from asreview import ASReviewProject
```


(c) Joerg Osterrieder 2025