AI Peer Review in 2026: The ICLR Problem Isn't Going Away
A Pangram Labs study found that 21% of ICLR reviews were fully AI-generated, not merely AI-assisted: written start to finish by an LLM. In 2026, the structural incentives driving this are stronger, not weaker. Here's the problem and what researchers can do about it.
Founder, Manusights
Author context
Founder of Manusights. Writes on the pre-submission review landscape — what services actually deliver, how they compare, and where each one fits in a realistic manuscript workflow.
Readiness scan
Find out if this manuscript is ready to submit.
Run the Free Readiness Scan before you submit. Catch the issues editors reject on first read.
How to use this page well
These pages work best when they behave like tools, not essays. Use the quick structure first, then apply it to the exact journal and manuscript situation.
| Question | What to do |
|---|---|
| Use this page for | Getting the structure, tone, and decision logic right before you send anything out. |
| Most important move | Make the reviewer-facing or editor-facing ask obvious early rather than burying it in prose. |
| Common mistake | Turning a practical page into a long explanation instead of a working template or checklist. |
| Next step | Use the page as a tool, then adjust it to the exact manuscript and journal situation. |
Quick answer: AI peer review in 2026 is no longer a fringe problem. More reviewers are using LLMs while policy guardrails still rely heavily on disclosure and human responsibility. The most defensible public estimate still comes from Pangram Labs' ICLR analysis, which flagged roughly 21% of reviews as fully AI-generated. For authors, the practical lesson is simple: use AI for structure and language help, but do not treat AI-only review as a substitute for human scientific judgment.
Researchers analyzed all 70,000 reviews submitted to ICLR 2025 and found that roughly 21% were fully AI-generated. Not polished with AI. Not outlined with AI. Fully written by an LLM, start to finish, for one of the most important machine learning conferences in the world. The analysis was conducted by Pangram Labs, which specializes in detecting AI-generated content.
That number came out in early 2025. In 2026, the structural conditions that produced it are worse, not better, and current conference and journal policies still put most of the burden on the human reviewer to disclose and take responsibility.
In our pre-submission review work
In our pre-submission review work, this trend matters because authors increasingly assume the formal peer-review system will catch the strategic scientific issues they missed. We do not think that assumption is strong enough anymore. When official review quality becomes less reliable, the cost of arriving at submission with unresolved scope-fit mismatch, citation-gap exposure, or figure-trust erosion goes up.
Our review of the current public policy landscape also points in the same direction: conferences and journals are trying to restrict or channel AI use, but the actual incentives still push reviewers toward AI assistance. That is why pre-submission expert judgment matters more, not less.
AI-assisted vs AI-generated review
| Review mode | What it can do | What it gets wrong |
|---|---|---|
| Human review | Field-specific judgment, accountability, methodological nuance | Slow, uneven availability |
| AI-assisted human review | Faster drafting and clearer prose when a human still makes the judgment | Disclosure and enforcement remain messy |
| AI-generated review | Generic structure, plausible summaries, cheap scale | Weak accountability, shallow field judgment, hallucinated specifics |
Why 21% Is Just the Start
The ICLR number was shocking because the scale was visible. Conference peer review is more auditable than journal peer review: you can analyze 70,000 reviews from a single venue in one study. Journals are fragmented across thousands of publications, each with its own editorial system. Nobody has done the equivalent analysis for journal reviews, but there's no reason to think journals are cleaner.
GPTZero found over 100 hallucinated citations in papers already accepted at NeurIPS: fake references invented by an LLM and passed by reviewers who apparently didn't check. These aren't edge cases. They're symptoms of a system under pressure it wasn't built to handle.
In 2025, global scientific output crossed 5 million papers per year. The pool of qualified expert reviewers hasn't grown at anything close to that rate. Editors are writing to the same researchers repeatedly. Reviewer fatigue is real and widely documented. When a researcher gets their 40th review request of the year, the temptation to paste the abstract into ChatGPT and clean up the output is obvious, and in most cases undetectable.
What Makes AI Reviews Fail
The problem isn't that AI can't summarize a paper. It can. The problem is that it can't do the things peer review actually exists to do.
It can't catch what it doesn't know to look for. A reviewer who has spent five years doing single-cell RNA-seq knows that certain clustering algorithms oversplit populations under specific conditions. That knowledge doesn't live in papers: it lives in researchers' hands and heads. An LLM reviewing your scRNA-seq paper doesn't know this. It will produce a competent-sounding summary and miss the methodological issue that anyone in the field would flag.
It can't tell you whether the interpretation is actually right. Your statistics might be technically correct and your conclusions still wrong. A human reviewer who's done the same experiments in a different system can say: "I've seen this artifact before. Here's the simpler explanation you haven't considered." An AI reviewer says: "The authors may wish to consider whether alternative explanations exist." That's not the same thing.
It can't hold itself accountable. When a reviewer publishes in the same field they review, they have skin in the game. Getting something wrong has reputational consequences. An AI has no reputation to protect and no accountability to the scientific community it's supposedly serving.
The result is reviews that look like peer review (structured, formatted, grammatically correct) but function like a summary of the abstract with generic concerns appended. They're the peer-review equivalent of a participation trophy.
What policy responses look like in 2026
| Surface | What is publicly visible |
|---|---|
| ICLR 2026 | Reviewer guidance says reviewers remain responsible for any LLM-generated content under their name |
| Nature Portfolio | Editors have publicly emphasized that reviewers should ask the editor before using AI to assist with review writing |
| PLOS / similar journal guidance | Public reviewer guidance remains focused on human accountability, disclosure, and review quality rather than blanket technical enforcement |
The Accountability Gap
Here's what makes the structural problem sticky: there's no mechanism to fix it.
Editors can't pay reviewers (or can't pay enough to matter). They can't compel participation. They can't easily verify whether a review was written by a human. Detection tools like GPTZero and Turnitin's AI detector are imperfect and are getting outpaced by increasingly fluent LLMs.
Some journals have added AI disclosure requirements. A handful have banned AI-generated reviews explicitly. These are good policies. They're also largely unenforceable: a researcher who uses ChatGPT and doesn't disclose it faces no meaningful consequence unless caught, and being caught is rare.
The incentive structure hasn't changed. The volume of submissions hasn't dropped. The number of qualified human reviewers hasn't grown. In 2026, those conditions are more pronounced than they were when the ICLR data first surfaced.
What This Means for Your Paper
If your manuscript goes out to peer review, there's a meaningful chance that at least one of your reviewers will use AI to generate some or all of their evaluation. At high-volume journals, the probability is higher. At conferences in AI and computer science specifically, it's highest.
This has a practical implication: the official peer review your paper receives may not give you the substantive, expert-level feedback that would actually strengthen the work. You might get generic criticism that doesn't engage with your specific methods, hallucinated references in reviewer comments, or structural feedback that misses the real issues in your design.
There is one place in the publication pipeline where you can still be confident the feedback is human: before you submit.
Pre-submission review isn't just about improving your acceptance odds. In a system where official peer review is increasingly unreliable, getting real expert feedback on your manuscript before it enters that system is the only way to be sure the feedback reflects genuine scientific judgment.
Manusights reviewers are human scientists who've published in your field and reviewed for the journals you're targeting. Every review is written by a person, not generated by an LLM. For a full breakdown of how to spot AI-generated reviews and protect your work in this environment, see AI-Generated Peer Reviews: How Common Are They and What Researchers Can Do.
How journals are responding
The major journals have adopted different stances. Nature Portfolio journals require reviewers to disclose AI use and prohibit AI-generated review text. PLOS journals ask reviewers to confirm that reviews were not generated by AI. eLife's editorial model (consulting with reviewers before full review) provides some structural protection: the consulted review phase involves direct editor-reviewer dialogue.
Many journals now use AI-detection tools on submitted reviews. These tools are imperfect (they produce probabilistic flags, not certainties), but they create accountability pressure. Reviewers who use AI extensively now risk being removed from reviewer pools.
The practical effect: reviews that show no familiarity with the specific methodological details of your paper are now a recognized red flag. If a review reads like it could apply to any paper in your field rather than specifically to yours, that's the signal.
What to do if you suspect an AI-generated review
You have options, and most journals have formal processes:
- Flag specific concerns to the editor. Not accusations: observations. "Reviewer 2's comments appear to lack engagement with the specific experimental details in Figure 3, and several points seem to address a different study design than the one we submitted." Concrete and professional.
- Request editorial assessment. Ask the editor whether the review meets their standards for specificity and engagement. Framing it as a quality question, not an AI accusation, is more effective.
- Write a thorough response anyway. Even if you suspect an AI-generated review, a complete point-by-point response is the right move. It documents your engagement with the concerns and gives the editor a basis for a favorable decision even when the review itself is weak.
- Don't accuse directly in the response letter. Accusations of AI use in a formal response rarely go well. Flag concerns through editor communication, separately from the response letter.
What AI-assisted review actually looks like
There's an important distinction between AI-generated reviews and AI-assisted reviews. AI-assisted means a researcher read your paper, formed their own judgments, and used AI to help draft the text. AI-generated means the AI did the reading and evaluation too.
AI-assisted reviews are increasingly common and increasingly permitted. Most journals allow AI assistance in drafting text as long as the scientific evaluation is the reviewer's own.
AI-generated reviews are what journals are trying to prevent. They typically show:
- Generic praise or concern that doesn't reference specific figures, tables, or data points
- Balanced language that hedges every point without committing to a position
- Unusual uniformity of sentence structure across the review
- Suggestions that contradict specific content in the manuscript
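The first of those signs, generic comments with no concrete anchors, is the easiest to check mechanically. The sketch below is a minimal illustrative heuristic, not a validated detector: it simply counts references to specific figures, tables, equations, and sections in a review. The pattern list and function names are my own assumptions, not anything the tools mentioned in this article actually ship.

```python
import re

# Hypothetical heuristic, not a detector: a review with zero concrete
# references is not proof of AI generation, but it matches the
# "generic review" pattern described above.
SPECIFICITY_PATTERNS = [
    r"\bfig(?:ure)?\.?\s*\d+[a-z]?\b",   # "Figure 3B", "fig. 2"
    r"\btable\s*\d+\b",                  # "Table 1"
    r"\beq(?:uation)?\.?\s*\d+\b",       # "Eq. 4"
    r"\b(?:line|page|section)\s*\d+\b",  # "Section 4", "line 212"
]

def specificity_score(review_text: str) -> int:
    """Count concrete references to figures, tables, equations, etc."""
    text = review_text.lower()
    return sum(len(re.findall(p, text)) for p in SPECIFICITY_PATTERNS)

generic = "The authors may wish to consider alternative explanations."
concrete = "The clustering in Figure 3B contradicts the claim in Section 4."
print(specificity_score(generic))   # 0
print(specificity_score(concrete))  # 2
```

A score of zero across a full review is exactly the "could apply to any paper" signal the checklist later in this article asks you to watch for.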
What this means for how you write your paper
As AI-generated reviews become more common, papers that resist superficial review fare better. Practically:
- Make your key claims figure-specific. "Our main finding is in Figure 3B" forces reviewers to engage with the actual data.
- Reference your own results in the discussion. Link claims to specific data points by figure number. This is good practice regardless, but it makes your paper harder to review superficially.
- Anticipate the reviewers who won't read carefully. Your methods and statistics should be defensible without requiring the reviewer to read the entire paper.
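You can audit your own draft for the figure-anchoring advice above. This is a minimal self-check sketch under my own assumptions (the regex and function name are illustrative): it flags discussion paragraphs that never cite a figure or table, i.e., the claims easiest for a superficial reviewer to wave through or dismiss generically.

```python
import re

# Illustrative self-audit: which discussion paragraphs make claims
# without anchoring them to a specific figure or table?
FIGURE_REF = re.compile(r"\b(?:fig(?:ure)?\.?|table)\s*\d+", re.IGNORECASE)

def unanchored_paragraphs(discussion_text: str) -> list[int]:
    """Return indices of paragraphs with no figure/table reference."""
    paragraphs = [p for p in discussion_text.split("\n\n") if p.strip()]
    return [i for i, p in enumerate(paragraphs) if not FIGURE_REF.search(p)]

discussion = (
    "Our main finding is shown in Figure 3B, where treatment doubles yield.\n\n"
    "These results have broad implications for the field.\n\n"
    "Table 2 summarizes the comparison against prior methods."
)
print(unanchored_paragraphs(discussion))  # [1]
```

Paragraph 1 here is the kind of free-floating claim worth either tying to data or cutting before submission.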
Submit If / Think Twice If
Submit if:
- you want a realistic 2026 framing of how AI use is affecting peer review quality
- you are deciding how much to trust conference or journal review to catch scientific issues
- you want a practical rule for where AI review tools fit before submission
Think twice if:
- you are using this page as a substitute for journal-specific policy checking
- the manuscript is close to submission and still has unresolved scientific-risk questions
- you are treating AI-assisted review and AI-generated review as the same thing
Readiness check
Run the scan to see how your manuscript scores on these criteria.
See score, top issues, and what to fix before you submit.
Practical checklist
Before you trust a review that materially changes your submission plan, ask:
- Does the review engage with specific figures, tables, or experiments?
- Does it identify a concrete scientific risk rather than generic prose concerns?
- Are the comments specific enough that they would not make sense pasted onto a different paper?
- Does the review show the kind of field-specific judgment a real reviewer would bring?
If those answers look weak, treat the review as structural feedback rather than substantive scientific judgment.
Context links
- AI peer review: how common is it? (detailed analysis of the 21% finding and what it means)
- How to respond to reviewer comments: responding when reviews are thin or off-base
- Desk rejection red flags: what happens before peer review
- The grant success rate crisis
- NIH funding freeze 2025
- How peer review actually works
- How to find a manuscript reviewer
- What pre-submission peer review includes
The Bottom Line
AI in peer review is expanding fast but isn't replacing expert judgment on the questions that matter for publication decisions. The practical takeaway for authors: use AI tools for what they're good at (language, structure, common errors) and human expert review for what they're not (scope fit, methodology validity, novelty assessment).
Key takeaway
Act on this if:
- You use AI tools in manuscript preparation or review
- Your target journal has specific AI disclosure policies
- You want to understand the current landscape before choosing tools
Less urgent if:
- You do not use AI tools in your research workflow
- Your institution has not yet implemented AI use policies
When to use AI review tools vs. human expert review
AI tools and human reviewers aren't interchangeable; they're good at different things. Here's a practical framework:
Use AI tools when:
- You need a fast structural check (section flow, missing methods details, formatting compliance)
- You want language polish and grammar cleanup across the full manuscript
- You're screening for common statistical reporting errors or reference formatting issues
- You need a first pass before sending to human reviewers, so their time is spent on substance
Use human expert review when:
- You need someone to evaluate whether your conclusions actually follow from your data
- The journal-fit question matters: an AI tool can't tell you whether your paper reads like a PNAS paper or a PLOS ONE paper
- Your methodology involves judgment calls that require hands-on experience with the technique
- You're unsure whether the novelty claim holds up against recent work in the field
- The stakes are high enough that a generic "looks good" isn't useful
The gap to watch: AI tools are getting better at surface-level feedback, which makes it tempting to skip human review entirely. That's a mistake. The 21% AI-generated review finding from ICLR shows what happens when surface-level evaluation replaces expert judgment. Don't replicate that problem in your own prep workflow.
If you want AI-assisted structural feedback combined with human expert scientific review, that's what the manuscript readiness check is built for.
Last verified: April 20, 2026. ICLR statistics from Pangram Labs and current public policy language from ICLR and Nature Portfolio materials.
Frequently asked questions
What did the Pangram Labs study actually find?
Pangram Labs analyzed 70,000 reviews submitted to ICLR 2025 and found roughly 21% were fully AI-generated: meaning the entire review was written by a large language model, not just proofread or edited with AI assistance.
Will AI-generated reviews become more common?
The structural incentives driving AI peer review (too many papers, too few experts, no accountability for review quality) have not improved. More capable LLMs, lower detection rates, and growing submission volumes all point toward more AI-generated reviews over time, not fewer.
Are journals themselves using AI in peer review?
Some are using AI tools for screening (plagiarism checks, methodology flagging, and language quality) but not for substantive peer review decisions. The scientific judgment of whether findings are credible and whether conclusions follow from data still comes from human reviewers at most high-IF journals.
Will AI replace human peer review?
Not for substantive scientific review at high-IF journals in the near term. AI tools will continue to expand into pre-screening and language checking, but domain-expert evaluation of scientific claims, methodology appropriateness, and contribution significance is difficult to automate reliably.
How should authors use AI tools before submission?
Use AI tools for language polish, structure review, and common error detection. Don't use them for the scientific judgment calls: journal fit, scope framing, or whether your conclusions are supported by your data. Get expert human feedback on those before submission.
Sources
Reference library
Use the core publishing datasets alongside this guide
This article answers one part of the publishing decision. The reference library covers the recurring questions that usually come next: whether the package is ready, what drives desk rejection, how journals compare, and what the submission requirements look like across journals.
Checklist system / operational asset
Elite Submission Checklist
A flagship pre-submission checklist that turns journal-fit, desk-reject, and package-quality lessons into one operational final-pass audit.
Flagship report / decision support
Desk Rejection Report
A canonical desk-rejection report that organizes the most common editorial failure modes, what they look like, and how to prevent them.
Dataset / reference hub
Journal Intelligence Dataset
A canonical journal dataset that combines selectivity posture, review timing, submission requirements, and Manusights fit signals in one citeable reference asset.
Dataset / reference guide
Peer Review Timelines by Journal
Reference-grade journal timeline data that authors, labs, and writing centers can cite when discussing realistic review timing.
Best next step
Use this page to interpret the status and choose the next sensible move.
The better next step is guidance on timing, follow-up, and what to do while the manuscript is still in the system. Save the Free Readiness Scan for the next paper you have not submitted yet.
Guidance first. Use the scan for the next manuscript.