Statistical Review Red Flags: What Reviewers Notice Fast
Most papers do not get in trouble because a statistician loves complexity. They get in trouble because the design, analysis, and reporting do not support the strength of the claims being made.
Senior Researcher, Oncology & Cell Biology
Author context
Specializes in manuscript preparation and peer review strategy for oncology and cell biology, with deep experience evaluating submissions to Nature Medicine, JCO, Cancer Cell, and Cell-family journals.
How to use this page well
These pages work best when they behave like tools, not essays. Use the quick structure first, then apply it to the exact journal and manuscript situation.
| Question | What to do |
|---|---|
| Use this page for | Auditing a manuscript's design, analysis, and reporting before it faces statistical review. |
| Start with | The short-answer table, then test each red flag against your own Methods and Results. |
| Common mistake | Treating statistical review as a hunt for obscure mathematical imperfections instead of a credibility check. |
| Best next step | Work through the pre-submission checklist table before sending the manuscript out. |
Most authors are afraid of statistical review for the wrong reason.
They imagine a reviewer hunting for obscure mathematical imperfections. In practice, statistical reviewers usually notice much simpler, more consequential problems: weak design, thin power, unclear analysis logic, selective reporting, and claims that outrun what the data can support.
That is why statistical review is less about elegance than about credibility.
Short answer
The most common statistical red flags are:
| Red flag | Why reviewers worry |
|---|---|
| No sample-size logic | The study may be unable to support any strong conclusion |
| Bias not addressed | Apparent effects may reflect selection or design artifacts |
| Multiple testing without discipline | Positive-looking findings may be noise |
| Missing-data handling not explained | Results may depend on who disappeared from analysis |
| Subgroup or risk-factor fishing | Exploratory patterns are being sold as robust findings |
| Correlation treated like agreement or causation | Interpretation exceeds what the analysis can show |
If you want the compressed lesson: statistical reviewers are testing whether the evidence architecture matches the manuscript's confidence.
What official reviewer guidance emphasizes
Elsevier's quick guide to common statistical errors is unusually blunt and practical.
It warns that:
- many studies are too small to detect even large effects
- clinical trials should always report sample-size calculations
- authors with negative results should not claim equivalence unless sufficiently powered
- post-hoc subgroup and risk-factor analyses should be treated as speculative
- comparing groups at multiple time points with repeated simple tests should be avoided
That already covers most of the failure modes I see in manuscripts sent for statistical review.
The same guide also flags design-level bias issues such as:
- treatment-by-indication bias
- historical controls
- retrospective data-collection bias
- selection bias from low response or high refusal
- informative dropout
- inadequate randomization or lack of concealment
This is the right frame. The biggest statistical problems often start before any model is fit.
Red flag 1: no credible sample-size logic
Underpowered studies are dangerous for two opposite reasons.
They can miss real effects, and they can also produce unstable positive-looking findings that the paper then overinterprets.
Elsevier's guide is especially useful here because it says directly: authors with negative results should not report equivalence unless the study was sufficiently powered. "Absence of evidence is not evidence of absence" is not a cliché in this context. It is a review criterion.
What reviewers want:
- sample-size calculation or design justification
- realistic effect assumptions
- clarity on primary endpoint
- acknowledgement of precision limits when the study is small
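The calculation itself is rarely the hard part. As a minimal sketch with statsmodels, assuming an illustrative effect size, power, and alpha (placeholders, not recommendations for any particular study):

```python
# Minimal power-calculation sketch with statsmodels.
# The effect size, power, and alpha are illustrative placeholders.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Per-group n needed to detect a standardized effect of d = 0.5
# with 80% power at a two-sided alpha of 0.05.
n_per_group = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05,
                                   alternative='two-sided')
print(f"Required n per group: {n_per_group:.1f}")  # roughly 64 per group
```

Writing this logic down, in whatever tool you use, is what makes the sample-size story reviewable.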
Red flag 2: bias is treated like a background nuisance instead of an analytic threat
Many papers discuss bias in the Discussion only after the central analyses are already locked in. Statistical reviewers usually want to see bias handled upstream in design and analysis.
Elsevier's reviewer sheet highlights several recurring bias problems:
- historical controls often exaggerate treatment effect
- retrospective data collection can vary systematically by group
- informative dropout can skew outcome comparisons
- inadequate randomization, or randomization without allocation concealment, undermines group comparability
If a paper seems to assume that later statistical adjustment can fully rescue a biased design, reviewers become skeptical quickly.
Red flag 3: subgroup analyses are sold like core findings
This is one of the oldest and still one of the most effective ways to weaken a paper.
Post-hoc subgroup analyses can generate seductive stories because they often look specific and mechanistic. But if enough slices are tested, something will look significant by chance alone.
Elsevier's guidance says conclusions should be drawn from a small number of clear, predefined hypotheses, and that post-hoc subgroup or risk-factor analyses should be treated as speculative. That sentence alone could rescue a lot of discussion sections.
If you ran subgroup analyses after seeing the data:
- call them exploratory
- reduce the rhetoric
- do not promote them to headline results
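A tiny simulation makes the danger concrete. In the sketch below (Python, with made-up data and no true effect in any subgroup), testing twenty subgroups still produces at least one "significant" result in roughly two thirds of simulated studies:

```python
# Simulation: 20 subgroup tests per study, no true effect anywhere.
# All numbers are illustrative; the point is the false-positive rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_subgroups, hits = 1000, 20, 0
for _ in range(n_sims):
    # Each subgroup: two arms of 30 drawn from the same distribution.
    pvals = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(n_subgroups)]
    hits += min(pvals) < 0.05
# Expect roughly 1 - 0.95**20, about 64%, to show a spurious hit.
print(f"Studies with at least one 'significant' subgroup: {hits / n_sims:.0%}")
```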
Red flag 4: multiple testing is ignored or hand-waved
Many manuscripts do not fail because multiple testing occurred. They fail because authors behave as if it did not matter.
Reviewers want to know:
- how many related tests were run
- which analyses were primary
- whether multiplicity adjustment was used
- if not, how the exploratory nature is being framed
The bigger the analysis garden, the more discipline the reporting needs.
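If you did adjust, say how. As one hedged illustration, here is what a Benjamini-Hochberg correction looks like with statsmodels; the p-values are placeholders:

```python
# Benjamini-Hochberg (FDR) adjustment sketch; p-values are placeholders.
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.012, 0.034, 0.049, 0.210, 0.630]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
for raw, adj, keep in zip(pvals, p_adj, reject):
    print(f"raw p = {raw:.3f} -> adjusted p = {adj:.3f} "
          f"({'significant' if keep else 'not significant'})")
```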
Red flag 5: missing data are invisible
Missing data are often where optimistic manuscripts quietly break.
Reviewers want answers to:
- how much data were missing
- whether missingness differed by group
- whether dropout might be outcome-related
- what imputation or sensitivity methods were used, if any
Elsevier's guide explicitly flags informative dropout because it can bias comparisons when follow-up ends for reasons connected to the primary outcome.
If your Methods and Results do not show the missing-data story, reviewers may assume you do not understand its importance.
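A missing-data audit does not require exotic tooling. A sketch like the following, in pandas with hypothetical file and column names, answers the first two reviewer questions directly:

```python
# Missing-data audit sketch; the file name and column names ('group',
# 'outcome') are hypothetical placeholders for your own data.
import pandas as pd

df = pd.read_csv("trial_data.csv")
print(df["outcome"].isna().mean())      # overall fraction of missing outcomes
print(df.groupby("group")["outcome"]    # does missingness differ by arm?
        .apply(lambda s: s.isna().mean()))
```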
Red flag 6: the unit of analysis is muddled
This shows up in papers that treat repeated measurements, clustered data, or grouped observations as if they were independent datapoints.
Even when the statistical code technically runs, the inference can still be wrong if the unit of analysis does not match the design.
Statistical reviewers often catch this by asking simple questions:
- what exactly counts as one observation
- were repeated measures summarized or modeled appropriately
- were patients, samples, or sites clustered
If the answer is unclear, the model probably is too.
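When observations are clustered, the fix is usually to model the clustering rather than ignore it. Below is a minimal random-intercept sketch with statsmodels; the long-format data layout and the column names ('outcome', 'treatment', 'time', 'patient_id') are assumptions for illustration:

```python
# Random-intercept model sketch for repeated measures; the file and
# column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("repeated_measures.csv")
# One random intercept per patient, so repeated measurements on the same
# patient are not treated as independent observations.
model = smf.mixedlm("outcome ~ treatment + time", data=df,
                    groups=df["patient_id"])
print(model.fit().summary())
```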
Red flag 7: correlation is used where agreement or prediction performance is the real question
Elsevier's guide says this plainly: correlation is not agreement.
That warning matters because manuscripts often use correlation coefficients to imply interchangeability between methods when the real question is whether two methods agree closely enough for practical use. These are different judgments.
The same category error appears when authors use association to imply prediction or causation without the necessary design support.
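The distinction is easy to demonstrate: two methods can correlate almost perfectly while one reads systematically higher, which is exactly what a Bland-Altman analysis exposes and a correlation coefficient hides. A minimal sketch with numpy, using made-up paired measurements:

```python
# Bland-Altman agreement sketch; the paired measurements are made up and
# include a deliberate systematic offset between methods.
import numpy as np

method_a = np.array([10.1, 12.3, 9.8, 11.5, 10.9, 13.2])
method_b = method_a + 0.8 + np.random.default_rng(1).normal(0, 0.2, 6)

diff = method_a - method_b
bias = diff.mean()                     # systematic difference between methods
half_width = 1.96 * diff.std(ddof=1)   # 95% limits of agreement half-width
print(f"correlation: {np.corrcoef(method_a, method_b)[0, 1]:.3f}")  # near 1.0
print(f"bias: {bias:.2f}, 95% limits of agreement: "
      f"[{bias - half_width:.2f}, {bias + half_width:.2f}]")
```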
Red flag 8: figures and summary statistics hide the data
Poor statistical reporting is not only about formulas. It is also about what the reader can inspect.
Common reviewer discomfort points:
- only bar charts with means and whiskers
- no sense of sample distribution
- no confidence intervals where effect estimation matters
- no denominators or attrition counts
If the figures make it hard to see the data's structure, reviewers trust the analysis less.
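One low-effort fix is to plot every observation instead of a bar. A matplotlib sketch, with placeholder measurements:

```python
# Dot plot sketch: every observation visible, plus the group mean.
# The measurements are illustrative placeholders.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
groups = {"control": np.array([3.1, 4.2, 2.8, 5.0, 3.6, 4.4]),
          "treated": np.array([4.0, 5.1, 3.7, 6.2, 4.8, 5.5])}

fig, ax = plt.subplots()
for i, (label, vals) in enumerate(groups.items()):
    x = i + rng.normal(0, 0.04, vals.size)      # slight jitter for visibility
    ax.scatter(x, vals, alpha=0.7)              # every datapoint shown
    ax.hlines(vals.mean(), i - 0.15, i + 0.15)  # group mean
ax.set_xticks(range(len(groups)), list(groups))
ax.set_ylabel("outcome (arbitrary units)")
plt.show()
```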
Red flag 9: the manuscript overclaims what the analysis can show
This is often where a statistically decent paper turns into a statistically weak one.
Examples:
- observational data written up with causal language
- non-significant findings reframed as proof of no difference
- exploratory models presented as confirmatory
- adjusted analyses described as if confounding is solved
Statistical reviewers care about language because language reveals whether the authors understand the inferential boundaries of their own work.
A pre-submission statistical review checklist
Use this table before sending the manuscript out.
| Check | What to confirm |
|---|---|
| Sample size | Justification exists and matches the primary question |
| Bias controls | Randomization, blinding, selection, and attrition are described |
| Outcomes | Primary and secondary outcomes are defined cleanly |
| Multiplicity | Primary versus exploratory analyses are separated clearly |
| Missing data | Amount, reasons, and handling are explicit |
| Reporting | Effect sizes, uncertainty, and denominators are visible |
Why these issues often trigger revision
Statistical red flags make reviewers distrust everything else.
Once a reviewer believes the design or analysis logic is shaky, even strong domain expertise in the rest of the paper stops carrying the same weight. That is why statistical concerns so often trigger major revision rather than a few minor comments.
For many manuscripts, the most efficient workflow is:
- fix the methods and statistics section
- recast the headline claims to match the true evidentiary limit
- then polish the prose
That sequence is more rational than the reverse.
If you want connected guidance, pair this with how to write a methods section, how to respond to reviewer comments, and a final Manusights AI Review.
Verdict
Statistical review is not about catching clever mistakes. It is about catching ordinary ways that manuscripts overstate what the data can support.
If you want to survive it, make the design defensible, the analysis explicit, the exploratory work clearly labeled, and the claims smaller than the data can bear rather than larger.