AI Peer Review in 2026: The ICLR Problem Isn't Going Away
A Pangram Labs study found that 21% of ICLR reviews were fully AI-generated, not merely AI-assisted: written start to finish by an LLM. In 2026, the structural incentives driving this are stronger, not weaker. Here's the problem and what researchers can do about it.
Founder, Manusights
Author context
Founder of Manusights. Writes on the pre-submission review landscape — what services actually deliver, how they compare, and where each one fits in a realistic manuscript workflow.
Readiness scan
Find out if this manuscript is ready to submit.
Run the Free Readiness Scan before you submit. Catch the issues editors reject on first read.
How to use this page well
These pages work best when they behave like tools, not essays. Use the quick structure first, then apply it to the exact journal and manuscript situation.
| Question | What to do |
|---|---|
| Use this page for | Getting the structure, tone, and decision logic right before you send anything out. |
| Most important move | Make the reviewer-facing or editor-facing ask obvious early rather than burying it in prose. |
| Common mistake | Turning a practical page into a long explanation instead of a working template or checklist. |
| Next step | Use the page as a tool, then adjust it to the exact manuscript and journal situation. |
Quick answer: AI peer review in 2026 is no longer a fringe problem. More reviewers are using LLMs while policy guardrails still rely heavily on disclosure and human responsibility. The most defensible public estimate still comes from Pangram Labs' ICLR analysis, which flagged roughly 21% of reviews as fully AI-generated. For authors, the practical lesson is simple: use AI for structure and language help, but do not treat AI-only review as a substitute for human scientific judgment.
Researchers analyzed all 70,000 reviews submitted to ICLR 2025 and found that roughly 21% were fully AI-generated. Not polished with AI. Not outlined with AI. Fully written by an LLM, start to finish, for one of the most important machine learning conferences in the world. The analysis was conducted by Pangram Labs, which specializes in detecting AI-generated content.
That number came out in early 2025. In 2026, the structural conditions that produced it are worse, not better, and current conference and journal policies still put most of the burden on the human reviewer to disclose and take responsibility.
In our pre-submission review work
In our pre-submission review work, this trend matters because authors increasingly assume the formal peer-review system will catch the strategic scientific issues they missed. We do not think that assumption is strong enough anymore. When official review quality becomes less reliable, the cost of arriving at submission with unresolved scope-fit mismatch, citation-gap exposure, or figure-trust erosion goes up.
Our review of the current public policy landscape also points in the same direction: conferences and journals are trying to restrict or channel AI use, but the actual incentives still push reviewers toward AI assistance. That is why pre-submission expert judgment matters more, not less.
AI-assisted vs AI-generated review
| Review mode | What it can do | What it gets wrong |
|---|---|---|
| Human review | Field-specific judgment, accountability, methodological nuance | Slow, uneven availability |
| AI-assisted human review | Faster drafting and clearer prose when a human still makes the judgment | Disclosure and enforcement remain messy |
| AI-generated review | Generic structure, plausible summaries, cheap scale | Weak accountability, shallow field judgment, hallucinated specifics |
Why 21% Is Just the Start
The ICLR number was shocking because the scale was visible. Conference peer review is more auditable than journal peer review: you can analyze 70,000 reviews from a single venue in one study. Journals are fragmented across thousands of publications, each with its own editorial system. Nobody has done the equivalent analysis for journal reviews, but there's no reason to think journals are cleaner.
GPTZero found over 100 hallucinated citations in papers already accepted at NeurIPS: fake references invented by an LLM and passed by reviewers who apparently didn't check. These aren't edge cases. They're symptoms of a system under pressure it wasn't built to handle.
In 2025, global scientific output crossed 5 million papers per year. The pool of qualified expert reviewers hasn't grown at anything close to that rate. Editors are writing to the same researchers repeatedly. Reviewer fatigue is real and widely documented. When a researcher gets their 40th review request of the year, the temptation to paste the abstract into ChatGPT and clean up the output is obvious, and in most cases undetectable.
What Makes AI Reviews Fail
The problem isn't that AI can't summarize a paper. It can. The problem is that it can't do the things peer review actually exists to do.
It can't catch what it doesn't know to look for. A reviewer who has spent five years doing single-cell RNA-seq knows that certain clustering algorithms oversplit populations under specific conditions. That knowledge doesn't live in papers: it lives in researchers' hands and heads. An LLM reviewing your scRNA-seq paper doesn't know this. It will produce a competent-sounding summary and miss the methodological issue that anyone in the field would flag.
It can't tell you whether the interpretation is actually right. Your statistics might be technically correct and your conclusions still wrong. A human reviewer who's done the same experiments in a different system can say: "I've seen this artifact before. Here's the simpler explanation you haven't considered." An AI reviewer says: "The authors may wish to consider whether alternative explanations exist." That's not the same thing.
It can't hold itself accountable. When a reviewer publishes in the same field they review, they have skin in the game. Getting something wrong has reputational consequences. An AI has no reputation to protect and no accountability to the scientific community it's supposedly serving.
The result is reviews that look like peer review (structured, formatted, grammatically correct) but function like a summary of the abstract with generic concerns appended. They're the peer-review equivalent of a participation trophy.
What policy responses look like in 2026
| Surface | What is publicly visible |
|---|---|
| ICLR 2026 | Reviewer guidance says reviewers remain responsible for any LLM-generated content under their name |
| Nature Portfolio | Editors have publicly emphasized that reviewers should ask the editor before using AI to assist with review writing |
| PLOS / similar journal guidance | Public reviewer guidance remains focused on human accountability, disclosure, and review quality rather than blanket technical enforcement |
The Accountability Gap
Here's what makes the structural problem sticky: there's no mechanism to fix it.
Editors can't pay reviewers (or can't pay enough to matter). They can't compel participation. They can't easily verify whether a review was written by a human. Detection tools like GPTZero and Turnitin's AI detector are imperfect and are getting outpaced by increasingly fluent LLMs.
Some journals have added AI disclosure requirements. A handful have banned AI-generated reviews explicitly. These are good policies. They're also largely unenforceable: a researcher who uses ChatGPT and doesn't disclose it faces no meaningful consequence unless caught, and being caught is rare.
The incentive structure hasn't changed. The volume of submissions hasn't dropped. The number of qualified human reviewers hasn't grown. In 2026, those conditions are more pronounced than they were when the ICLR data first surfaced.
What This Means for Your Paper
If your manuscript goes out to peer review, there's a meaningful chance that at least one of your reviewers will use AI to generate some or all of their evaluation. At high-volume journals, the probability is higher. At conferences in AI and computer science specifically, it's highest.
This has a practical implication: the official peer review your paper receives may not give you the substantive, expert-level feedback that would actually strengthen the work. You might get generic criticism that doesn't engage with your specific methods, hallucinated references in reviewer comments, or structural feedback that misses the real issues in your design.
There is one place in the publication pipeline where you can still be confident the feedback is human: before you submit.
Pre-submission review isn't just about improving your acceptance odds. In a system where official peer review is increasingly unreliable, getting real expert feedback on your manuscript before it enters that system is the only way to be sure the feedback reflects genuine scientific judgment.
Manusights reviewers are human scientists who've published in your field and reviewed for the journals you're targeting. Every review is written by a person, not generated by an LLM. For a full breakdown of how to spot AI-generated reviews and protect your work in this environment, see AI-Generated Peer Reviews: How Common Are They and What Researchers Can Do.
How journals are responding
The major journals have adopted different stances. Nature Portfolio journals require reviewers to disclose AI use and prohibit AI-generated review text. PLOS journals ask reviewers to confirm that reviews were not generated by AI. eLife's editorial model (consulting with reviewers before full review) provides some structural protection: the consulted review phase involves direct editor-reviewer dialogue.
Many journals now use AI-detection tools on submitted reviews. These tools are imperfect (they produce probabilistic flags, not certainties), but they create accountability pressure. Reviewers who use AI extensively now risk being removed from reviewer pools.
The practical effect: reviews that show no familiarity with the specific methodological details of your paper are now a recognized red flag. If a review reads like it could apply to any paper in your field rather than specifically to yours, that's the signal.
What to do if you suspect an AI-generated review
You have options, and most journals have formal processes:
- Flag specific concerns to the editor. Not accusations: observations. "Reviewer 2's comments appear to lack engagement with the specific experimental details in Figure 3, and several points seem to address a different study design than the one we submitted." Concrete and professional.
- Request editorial assessment. Ask the editor whether the review meets their standards for specificity and engagement. Framing it as a quality question, not an AI accusation, is more effective.
- Write a thorough response anyway. Even if you suspect an AI-generated review, a complete point-by-point response is the right move. It documents your engagement with the concerns and gives the editor a basis for a favorable decision even when the review itself is weak.
- Don't accuse directly in the response letter. Accusations of AI use in a formal response rarely go well. Flag concerns through editor communication, separately from the response letter.
What AI-assisted review actually looks like
There's an important distinction between AI-generated reviews and AI-assisted reviews. AI-assisted means a researcher read your paper, formed their own judgments, and used AI to help draft the text. AI-generated means the AI did the reading and evaluation too.
AI-assisted reviews are increasingly common and increasingly permitted. Most journals allow AI assistance in drafting text as long as the scientific evaluation is the reviewer's own.
AI-generated reviews are what journals are trying to prevent. They typically show:
- Generic praise or concern that doesn't reference specific figures, tables, or data points
- Balanced language that hedges every point without committing to a position
- Unusual uniformity of sentence structure across the review
- Suggestions that contradict specific content in the manuscript
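The first of those signs, generic comments with no concrete anchors, is the easiest to check mechanically. The sketch below is a minimal illustrative heuristic, not a validated detector: it simply counts references to specific figures, tables, equations, and sections in a review. The pattern list and function names are my own assumptions, not anything the tools mentioned in this article actually ship.

```python
import re

# Hypothetical heuristic, not a detector: a review with zero concrete
# references is not proof of AI generation, but it matches the
# "generic review" pattern described above.
SPECIFICITY_PATTERNS = [
    r"\bfig(?:ure)?\.?\s*\d+[a-z]?\b",   # "Figure 3B", "fig. 2"
    r"\btable\s*\d+\b",                  # "Table 1"
    r"\beq(?:uation)?\.?\s*\d+\b",       # "Eq. 4"
    r"\b(?:line|page|section)\s*\d+\b",  # "Section 4", "line 212"
]

def specificity_score(review_text: str) -> int:
    """Count concrete references to figures, tables, equations, etc."""
    text = review_text.lower()
    return sum(len(re.findall(p, text)) for p in SPECIFICITY_PATTERNS)

generic = "The authors may wish to consider alternative explanations."
concrete = "The clustering in Figure 3B contradicts the claim in Section 4."
print(specificity_score(generic))   # 0
print(specificity_score(concrete))  # 2
```

A score of zero across a full review is exactly the "could apply to any paper" signal the checklist later in this article asks you to watch for.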
What this means for how you write your paper
As AI-generated reviews become more common, papers that resist superficial review fare better. Practically:
- Make your key claims figure-specific. "Our main finding is in Figure 3B" forces reviewers to engage with the actual data.
- Reference your own results in the discussion. Link claims to specific data points by figure number. This is good practice regardless, but it makes your paper harder to review superficially.
- Anticipate the reviewers who won't read carefully. Your methods and statistics should be defensible without requiring the reviewer to read the entire paper.
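You can audit your own draft for the figure-anchoring advice above. This is a minimal self-check sketch under my own assumptions (the regex and function name are illustrative): it flags discussion paragraphs that never cite a figure or table, i.e., the claims easiest for a superficial reviewer to wave through or dismiss generically.

```python
import re

# Illustrative self-audit: which discussion paragraphs make claims
# without anchoring them to a specific figure or table?
FIGURE_REF = re.compile(r"\b(?:fig(?:ure)?\.?|table)\s*\d+", re.IGNORECASE)

def unanchored_paragraphs(discussion_text: str) -> list[int]:
    """Return indices of paragraphs with no figure/table reference."""
    paragraphs = [p for p in discussion_text.split("\n\n") if p.strip()]
    return [i for i, p in enumerate(paragraphs) if not FIGURE_REF.search(p)]

discussion = (
    "Our main finding is shown in Figure 3B, where treatment doubles yield.\n\n"
    "These results have broad implications for the field.\n\n"
    "Table 2 summarizes the comparison against prior methods."
)
print(unanchored_paragraphs(discussion))  # [1]
```

Paragraph 1 here is the kind of free-floating claim worth either tying to data or cutting before submission.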
Submit If / Think Twice If
Submit if:
- you want a realistic 2026 framing of how AI use is affecting peer review quality
- you are deciding how much to trust conference or journal review to catch scientific issues
- you want a practical rule for where AI review tools fit before submission
Think twice if:
- you are using this page as a substitute for journal-specific policy checking
- the manuscript is close to submission and still has unresolved scientific-risk questions
- you are treating AI-assisted review and AI-generated review as the same thing
Readiness check
Run the scan to see how your manuscript scores on these criteria.
See score, top issues, and what to fix before you submit.
Practical checklist
Before you trust a review that materially changes your submission plan, ask:
- Does the review engage with specific figures, tables, or experiments?
- Does it identify a concrete scientific risk rather than generic prose concerns?
- Are the comments specific enough that they would not make sense pasted onto a different paper?
- Does the review show the kind of field-specific judgment a real reviewer would bring?
If those answers look weak, treat the review as structural feedback rather than substantive scientific judgment.
Context links
- AI peer review: how common is it? (detailed analysis of the 21% finding and what it means)
- How to respond to reviewer comments: responding when reviews are thin or off-base
- Desk rejection red flags: what happens before peer review
- The grant success rate crisis
- NIH funding freeze 2025
- How peer review actually works
- How to find a manuscript reviewer
- What pre-submission peer review includes
The Bottom Line
AI in peer review is expanding fast but isn't replacing expert judgment on the questions that matter for publication decisions. The practical takeaway for authors: use AI tools for what they're good at (language, structure, common errors) and human expert review for what they're not (scope fit, methodology validity, novelty assessment).
Key takeaway
Act on this if:
- You use AI tools in manuscript preparation or review
- Your target journal has specific AI disclosure policies
- You want to understand the current landscape before choosing tools
Less urgent if:
- You do not use AI tools in your research workflow
- Your institution has not yet implemented AI use policies
When to use AI review tools vs. human expert review
AI tools and human reviewers aren't interchangeable; they're good at different things. Here's a practical framework:
Use AI tools when:
- You need a fast structural check (section flow, missing methods details, formatting compliance)
- You want language polish and grammar cleanup across the full manuscript
- You're screening for common statistical reporting errors or reference formatting issues
- You need a first pass before sending to human reviewers, so their time is spent on substance
Use human expert review when:
- You need someone to evaluate whether your conclusions actually follow from your data
- The journal-fit question matters: an AI tool can't tell you whether your paper reads like a PNAS paper or a PLOS ONE paper
- Your methodology involves judgment calls that require hands-on experience with the technique
- You're unsure whether the novelty claim holds up against recent work in the field
- The stakes are high enough that a generic "looks good" isn't useful
The gap to watch: AI tools are getting better at surface-level feedback, which makes it tempting to skip human review entirely. That's a mistake. The 21% AI-generated review finding from ICLR shows what happens when surface-level evaluation replaces expert judgment. Don't replicate that problem in your own prep workflow.
If you want AI-assisted structural feedback combined with human expert scientific review, that's what the manuscript readiness check is built for.
Last verified: April 20, 2026. ICLR statistics from Pangram Labs and current public policy language from ICLR and Nature Portfolio materials.
Frequently asked questions
What did the Pangram Labs study actually find?
Pangram Labs analyzed 70,000 reviews submitted to ICLR 2025 and found roughly 21% were fully AI-generated: meaning the entire review was written by a large language model, not just proofread or edited with AI assistance.
Will AI-generated reviews become more common?
The structural incentives driving AI peer review (too many papers, too few experts, no accountability for review quality) have not improved. More capable LLMs, lower detection rates, and growing submission volumes all point toward more AI-generated reviews over time, not fewer.
Are journals themselves using AI in peer review?
Some are using AI tools for screening (plagiarism checks, methodology flagging, and language quality) but not for substantive peer review decisions. The scientific judgment of whether findings are credible and whether conclusions follow from data still comes from human reviewers at most high-IF journals.
Will AI replace human peer review?
Not for substantive scientific review at high-IF journals in the near term. AI tools will continue to expand into pre-screening and language checking, but domain-expert evaluation of scientific claims, methodology appropriateness, and contribution significance is difficult to automate reliably.
How should authors use AI tools before submission?
Use AI tools for language polish, structure review, and common error detection. Don't use them for the scientific judgment calls: journal fit, scope framing, or whether your conclusions are supported by your data. Get expert human feedback on those before submission.
Sources
Reference library
Use the core publishing datasets alongside this guide
This article answers one part of the publishing decision. The reference library covers the recurring questions that usually come next: whether the package is ready, what drives desk rejection, how journals compare, and what the submission requirements look like across journals.
Checklist system / operational asset
Elite Submission Checklist
A flagship pre-submission checklist that turns journal-fit, desk-reject, and package-quality lessons into one operational final-pass audit.
Flagship report / decision support
Desk Rejection Report
A canonical desk-rejection report that organizes the most common editorial failure modes, what they look like, and how to prevent them.
Dataset / reference hub
Journal Intelligence Dataset
A canonical journal dataset that combines selectivity posture, review timing, submission requirements, and Manusights fit signals in one citeable reference asset.
Dataset / reference guide
Peer Review Timelines by Journal
Reference-grade journal timeline data that authors, labs, and writing centers can cite when discussing realistic review timing.
Best next step
Use this page to interpret the status and choose the next sensible move.
The better next step is guidance on timing, follow-up, and what to do while the manuscript is still in the system. Save the Free Readiness Scan for the next paper you have not submitted yet.
Guidance first. Use the scan for the next manuscript.