Peer Review · 10 min read · Updated Jan 1, 2026

Statistical Review Red Flags: What Reviewers Notice Fast

Most papers do not get in trouble because a statistician loves complexity. They get in trouble because the design, analysis, and reporting do not support the strength of the claims being made.

Senior Researcher, Oncology & Cell Biology

Author context

Specializes in manuscript preparation and peer review strategy for oncology and cell biology, with deep experience evaluating submissions to Nature Medicine, JCO, Cancer Cell, and Cell-family journals.

Working map

How to use this page well

These pages work best when they behave like tools, not essays. Use the quick structure first, then apply it to the exact journal and manuscript situation.

  • Use this page for: building a point-by-point response that is easy for reviewers and editors to trust.
  • Start with: stating the reviewer concern clearly, then pairing each response with the exact evidence or revision.
  • Common mistake: sounding defensive or abstract instead of specific about what changed.
  • Best next step: turning the response into a visible checklist or matrix before you finalize the letter.

Most authors are afraid of statistical review for the wrong reason.

They imagine a reviewer hunting for obscure mathematical imperfections. In practice, statistical reviewers usually notice much simpler, more consequential problems: weak design, thin power, unclear analysis logic, selective reporting, and claims that outrun what the data can support.

That is why statistical review is less about elegance than about credibility.

Short answer

The most common statistical red flags are:

  • No sample-size logic: the study may be unable to support any strong conclusion.
  • Bias not addressed: apparent effects may reflect selection or design artifacts.
  • Multiple testing without discipline: positive-looking findings may be noise.
  • Missing-data handling not explained: results may depend on who disappeared from the analysis.
  • Subgroup or risk-factor fishing: exploratory patterns are being sold as robust findings.
  • Correlation treated like agreement or causation: interpretation exceeds what the analysis can show.

If you want the compressed lesson: statistical reviewers are testing whether the evidence architecture matches the manuscript's confidence.

What official reviewer guidance emphasizes

Elsevier's quick guide to common statistical errors is unusually blunt and practical.

It warns that:

  1. many studies are too small to detect even large effects
  2. clinical trials should always report sample-size calculations
  3. authors with negative results should not claim equivalence unless sufficiently powered
  4. post-hoc subgroup and risk-factor analyses should be treated as speculative
  5. comparing groups at multiple time points with repeated simple tests should be avoided

That already covers most of the failure modes I see in manuscripts sent for statistical review.

The same guide also flags design-level bias issues such as:

  • treatment-by-indication bias
  • historical controls
  • retrospective data-collection bias
  • selection bias from low response or high refusal
  • informative dropout
  • inadequate randomization or lack of concealment

This is the right frame. The biggest statistical problems often start before any model is fit.

Red flag 1: no credible sample-size logic

Underpowered studies are dangerous for two opposite reasons.

They can miss real effects, and they can also produce unstable positive-looking findings that the paper then overinterprets.

Elsevier's guide is especially useful here because it says directly: authors with negative results should not report equivalence unless the study was sufficiently powered. "Absence of evidence is not evidence of absence" is not a cliché in this context. It is a review criterion.

What reviewers want:

  • sample-size calculation or design justification
  • realistic effect assumptions
  • clarity on primary endpoint
  • acknowledgement of precision limits when the study is small
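As a rough sanity check before claiming adequate power, the standard normal-approximation formula for a two-arm comparison of means can be sketched in a few lines. The effect sizes below are illustrative, not drawn from any particular study:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sample comparison of means,
    via the normal approximation: n = 2 * ((z_{1-a/2} + z_power) / d)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A "medium" standardized effect (d = 0.5) already needs ~63 patients
# per arm; halving the effect roughly quadruples the requirement.
print(n_per_group(0.5))   # 63
print(n_per_group(0.25))  # 252
```

The point of the sketch is the asymmetry reviewers worry about: small studies are only adequately powered for effects large enough to be implausible in most clinical settings.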

Red flag 2: bias is treated like a background nuisance instead of an analytic threat

Many papers discuss bias in the Discussion only after the central analyses are already locked in. Statistical reviewers usually want to see bias handled upstream in design and analysis.

Elsevier's reviewer sheet highlights several recurring bias problems:

  • historical controls often exaggerate treatment effect
  • retrospective data collection can vary systematically by group
  • informative dropout can skew outcome comparisons
  • poor randomization methods are not acceptable substitutes for allocation concealment

If a paper seems to assume that later adjustment can fully rescue a biased design, reviewers become skeptical quickly.

Red flag 3: subgroup analyses are sold like core findings

This is one of the oldest and still one of the most effective ways to weaken a paper.

Post-hoc subgroup analyses can generate seductive stories because they often look specific and mechanistic. But if enough slices are tested, something will look significant by chance alone.

Elsevier's guidance says conclusions should be drawn from a small number of clear, predefined hypotheses, and that post-hoc subgroup or risk-factor analyses should be treated as speculative. That sentence alone could rescue a lot of discussion sections.

If you ran subgroup analyses after seeing the data:

  • call them exploratory
  • reduce the rhetoric
  • do not promote them to headline results
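The arithmetic behind the fishing warning is simple. A short sketch of the family-wise false-positive rate under the null, assuming independent subgroup tests at p < 0.05:

```python
# Probability that at least one of k independent subgroup tests comes
# out "significant" at p < 0.05 when every true effect is zero.
for k in (1, 5, 10, 20):
    p_any = 1 - (1 - 0.05) ** k
    print(f"{k:2d} subgroups -> {p_any:.0%} chance of a false positive")
# prints roughly 5%, 23%, 40%, and 64%
```

Twenty slices of the data give nearly two-in-three odds of at least one spurious "finding," which is why reviewers discount post-hoc subgroup stories by default.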

Red flag 4: multiple testing is ignored or hand-waved

Many manuscripts do not fail because multiple testing occurred. They fail because authors behave as if it did not matter.

Reviewers want to know:

  • how many related tests were run
  • which analyses were primary
  • whether multiplicity adjustment was used
  • if not, how the exploratory nature is being framed

The bigger the analysis garden, the more discipline the reporting needs.
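One way to impose that discipline is a step-down adjustment. Below is a minimal, dependency-free sketch of Holm's method; the p-values are invented for illustration:

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values. Controls the family-wise
    error rate without assuming the tests are independent."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])   # multiplier shrinks step by step
        running_max = max(running_max, adj)     # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

raw = [0.004, 0.020, 0.030, 0.450]
# The smallest p is scaled by 4; monotonicity ties the middle two together.
print(holm_adjust(raw))
```

In practice most authors would use a library routine, but the logic above is what the reviewer is checking for: some explicit accounting for how many related tests were run.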

Red flag 5: missing data are invisible

Missing data are often where optimistic manuscripts quietly break.

Reviewers want answers to:

  • how much data were missing
  • whether missingness differed by group
  • whether dropout might be outcome-related
  • what imputation or sensitivity methods were used, if any

Elsevier's guide explicitly flags informative dropout because it can bias comparisons when follow-up ends for reasons connected to the primary outcome.

If your Methods and Results do not show the missing-data story, reviewers may assume you do not understand its importance.
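A toy simulation makes the danger concrete. Here the dropout mechanism is assumed, purely for illustration, to depend on the outcome itself; the complete-case mean then drifts away from the truth:

```python
import random

random.seed(1)
# True outcome: mean 50 across everyone enrolled.
outcomes = [random.gauss(50, 10) for _ in range(10_000)]

# Informative dropout (assumed mechanism): patients with worse outcomes
# are likelier to leave before the endpoint is measured.
observed = [y for y in outcomes if random.random() < min(1.0, y / 60)]

true_mean = sum(outcomes) / len(outcomes)
complete_case_mean = sum(observed) / len(observed)
print(f"true mean          {true_mean:.1f}")
print(f"complete-case mean {complete_case_mean:.1f}")  # biased upward
```

No model fit on the observed rows can detect this on its own, which is why reviewers ask for the missing-data story in the Methods rather than inferring it from the tables.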

Red flag 6: the unit of analysis is muddled

This shows up in papers that treat repeated measurements, clustered data, or grouped observations as if they were independent datapoints.

Even when the statistical code technically runs, the inference can still be wrong if the unit of analysis does not match the design.

Statistical reviewers often catch this by asking simple questions:

  • what exactly counts as one observation
  • were repeated measures summarized or modeled appropriately
  • were patients, samples, or sites clustered

If the answer is unclear, the model probably is too.
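The cost of ignoring clustering can be approximated with the standard design-effect formula; the cluster size and intraclass correlation below are hypothetical:

```python
def effective_n(n_total, cluster_size, icc):
    """Effective sample size under the standard design-effect formula
    for equal-sized clusters: n / (1 + (m - 1) * ICC)."""
    return n_total / (1 + (cluster_size - 1) * icc)

# 600 measurements from 60 patients (10 repeats each) with modest
# within-patient correlation behave like far fewer independent points.
print(round(effective_n(600, 10, 0.3)))  # 162
```

A paper that reports n = 600 while the design supports an effective n closer to 160 will have confidence intervals that are visibly too narrow, which is exactly the kind of mismatch a statistical reviewer spots from the design description alone.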

Red flag 7: correlation is used where agreement or prediction performance is the real question

Elsevier's guide says this plainly: correlation is not agreement.

That warning matters because manuscripts often use correlation coefficients to imply interchangeability between methods when the real question is whether two methods agree closely enough for practical use. These are different judgments.

The same category error appears when authors use association to imply prediction or causation without the necessary design support.

Red flag 8: figures and summary statistics hide the data

Poor statistical reporting is not only about formulas. It is also about what the reader can inspect.

Common reviewer discomfort points:

  • only bar charts with means and whiskers
  • no sense of sample distribution
  • no confidence intervals where effect estimation matters
  • no denominators or attrition counts

If the figures make it hard to see the data's structure, reviewers trust the analysis less.
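Reporting an interval alongside the effect estimate is usually cheap to add. A sketch using the normal approximation, with invented group summaries:

```python
from math import sqrt

# Hypothetical group summaries (illustrative numbers, not real data).
n1, mean1, sd1 = 40, 12.0, 4.0
n2, mean2, sd2 = 40, 10.0, 4.5

diff = mean1 - mean2
se = sqrt(sd1**2 / n1 + sd2**2 / n2)          # SE of the difference in means
lo, hi = diff - 1.96 * se, diff + 1.96 * se   # normal-approximation 95% CI
print(f"difference {diff:.1f}, 95% CI ({lo:.1f}, {hi:.1f})")  # (0.1, 3.9)
```

An interval of 0.1 to 3.9 tells the reviewer something a bare p-value cannot: the effect could plausibly be trivial or substantial, and the manuscript's language should reflect that width.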

Red flag 9: the manuscript overclaims what the analysis can show

This is often where a statistically decent paper turns into a statistically weak one.

Examples:

  • observational data written up with causal language
  • non-significant findings reframed as proof of no difference
  • exploratory models presented as confirmatory
  • adjusted analyses described as if confounding is solved

Statistical reviewers care about language because language reveals whether the authors understand the inferential boundaries of their own work.

A pre-submission statistical review checklist

Use this checklist before sending the manuscript out.

  • Sample size: justification exists and matches the primary question.
  • Bias controls: randomization, blinding, selection, and attrition are described.
  • Outcomes: primary and secondary outcomes are defined cleanly.
  • Multiplicity: primary versus exploratory analyses are separated clearly.
  • Missing data: amount, reasons, and handling are explicit.
  • Reporting: effect sizes, uncertainty, and denominators are visible.

Why these issues often trigger revision

Statistical red flags make reviewers distrust everything else.

Once a reviewer believes the design or analysis logic is shaky, even strong domain expertise in the rest of the paper stops carrying the same weight. That is why statistical concerns so often trigger major revision rather than a few minor comments.

For many manuscripts, the most efficient workflow is:

  1. fix the methods and statistics section
  2. recast the headline claims to the true evidentiary limit
  3. then polish the prose

That sequence is more rational than the reverse.

If you want connected guidance, pair this with how to write a methods section, how to respond to reviewer comments, and a final Manusights AI Review.

Verdict

Statistical review is not about catching clever mistakes. It is about catching ordinary ways that manuscripts overstate what the data can support.

If you want to survive it, make the design defensible, the analysis explicit, the exploratory work clearly labeled, and the claims smaller than the data can bear rather than larger.

References

  1. Elsevier quick guide to common statistical errors
  2. CHAMP statement for statistical assessment of medical papers
  3. Nature reproducibility checklist
  4. JAMA: Manuscript review from a statistician's perspective


