How Manusights Audit works

Full algorithm disclosure for the four consistency checks Audit runs against a paste of a Results section: statcheck-equivalent p-recompute, GRIM, GRIMMER, and DEBIT. Every check ties back to a citable published algorithm. Recompute math is closed-form and runs in pure code — no LLM in the verdict path.

Last reviewed: April 2026 · Audit v1.0

Architecture: LLM extraction + deterministic recompute

Audit splits the pipeline into two stages by design:

  1. Extraction (LLM): Claude Haiku 4.5 reads the pasted Results section and emits structured JSON via a tool-use schema — one record per reported NHST claim, descriptive triple, and binary proportion. Haiku is used because regex-only extraction (the original statcheck approach) has ~60% recall on real psychology papers per Nuijten et al. (2016). Modern manuscripts use LaTeX, Markdown, mixed prose conventions, and copy-pasted Word tables that break regex.
  2. Recompute (pure TypeScript): all consistency math runs in deterministic code. No LLM in the verdict path. Mathematical certainty is the point.
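The extraction stage's output can be sketched as a TypeScript type. This is an illustrative shape only — the field names (`testType`, `df1`, `decimalsReported`, etc.) are hypothetical, not the production tool-use schema:

```typescript
// Hypothetical shape of one extracted NHST record (illustrative field names,
// not the actual production schema).
type TestType = "t" | "F" | "chi2" | "r" | "z";

interface NhstClaim {
  testType: TestType;
  statistic: number;          // reported test statistic value
  df1?: number;               // t/chi2/r df, or F numerator df
  df2?: number;               // F denominator df
  reportedP: number;
  decimalsReported: number;   // decimals of the reported p; drives the tolerance
  oneTailed: boolean;         // paper explicitly states directionality
  correctionApplied: boolean; // Bonferroni/FDR/Holm noted near the reported p
}

// Example extraction for a claim like "t(28) = 2.20, p = .036"
const claim: NhstClaim = {
  testType: "t",
  statistic: 2.2,
  df1: 28,
  reportedP: 0.036,
  decimalsReported: 3,
  oneTailed: false,
  correctionApplied: false,
};
```

Keeping the record this explicit is what lets the second stage stay deterministic: every field the recompute needs is pinned down before any math runs.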

The split exists because reviewers and meta-analysts who use these checks specifically want closed-form math. “Claude says your p is wrong” is not defensible in peer review; “the recomputed p from the reported t and df is 0.078, not 0.048” is.

p-value recompute (statcheck-equivalent)

Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts (2016)

What it checks

For every reported NHST test (t, F, χ², r, z) with degrees of freedom and a reported p-value, Audit recomputes the two-tailed p from the test-statistic CDF and compares it to the reported value. Reported and recomputed are considered consistent when they agree within a rounding tolerance of 0.5 × 10^(-decimalsReported).
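The tolerance rule above reduces to a one-line comparison. A minimal sketch (illustrative, not the production code):

```typescript
// Statcheck-style consistency: reported and recomputed p agree when the
// recomputed value falls within half a unit of the last reported decimal.
function pConsistent(
  reportedP: number,
  recomputedP: number,
  decimalsReported: number,
): boolean {
  const tol = 0.5 * Math.pow(10, -decimalsReported);
  return Math.abs(recomputedP - reportedP) <= tol + 1e-12; // epsilon guards float noise
}
```

For example, a reported p = .048 against a recomputed 0.078 fails at three decimals (tolerance 0.0005), while a reported p = .05 against a recomputed 0.0512 passes at two decimals (tolerance 0.005).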

How the math is computed

  • t with df: two-tailed p = I_x(df/2, 1/2) where x = df / (df + t²) and I_x is the regularized incomplete beta function (Numerical Recipes 6.4 continued fraction).
  • F with df1, df2: p = I_x(df2/2, df1/2) where x = df2 / (df2 + df1 · F).
  • χ² with df: p = 1 − P(df/2, χ²/2) where P is the regularized lower incomplete gamma function (Numerical Recipes 6.2).
  • r with df: converted to t = r · sqrt(df / (1 − r²)), then evaluated as a t test with the same df.
  • z: two-tailed p = 2 · (1 − Φ(|z|)), where Φ uses the Abramowitz & Stegun 7.1.26 erf approximation.
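The t case above can be sketched end to end. This is an illustrative reimplementation of the Numerical Recipes–style regularized incomplete beta named in the text, not the production code:

```typescript
// Lanczos approximation to ln Gamma(x) (Numerical Recipes 6.1).
function gammln(xx: number): number {
  const cof = [
    76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5,
  ];
  let y = xx;
  let tmp = xx + 5.5;
  tmp -= (xx + 0.5) * Math.log(tmp);
  let ser = 1.000000000190015;
  for (let j = 0; j < 6; j++) ser += cof[j] / ++y;
  return -tmp + Math.log((2.5066282746310005 * ser) / xx);
}

// Continued-fraction core of the incomplete beta (NR 6.4, modified Lentz).
function betacf(a: number, b: number, x: number): number {
  const MAXIT = 200, EPS = 3e-12, FPMIN = 1e-30;
  const qab = a + b, qap = a + 1, qam = a - 1;
  let c = 1;
  let d = 1 - (qab * x) / qap;
  if (Math.abs(d) < FPMIN) d = FPMIN;
  d = 1 / d;
  let h = d;
  for (let m = 1; m <= MAXIT; m++) {
    const m2 = 2 * m;
    let aa = (m * (b - m) * x) / ((qam + m2) * (a + m2));
    d = 1 + aa * d; if (Math.abs(d) < FPMIN) d = FPMIN;
    c = 1 + aa / c; if (Math.abs(c) < FPMIN) c = FPMIN;
    d = 1 / d; h *= d * c;
    aa = (-(a + m) * (qab + m) * x) / ((a + m2) * (qap + m2));
    d = 1 + aa * d; if (Math.abs(d) < FPMIN) d = FPMIN;
    c = 1 + aa / c; if (Math.abs(c) < FPMIN) c = FPMIN;
    d = 1 / d;
    const del = d * c;
    h *= del;
    if (Math.abs(del - 1) < EPS) break;
  }
  return h;
}

// Regularized incomplete beta I_x(a, b), with the symmetry switch for stability.
function betai(a: number, b: number, x: number): number {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const bt = Math.exp(
    gammln(a + b) - gammln(a) - gammln(b) + a * Math.log(x) + b * Math.log(1 - x),
  );
  if (x < (a + 1) / (a + b + 2)) return (bt * betacf(a, b, x)) / a;
  return 1 - (bt * betacf(b, a, 1 - x)) / b;
}

// Two-tailed p for a t statistic: p = I_x(df/2, 1/2) with x = df / (df + t^2).
function twoTailedPFromT(t: number, df: number): number {
  const x = df / (df + t * t);
  return betai(df / 2, 0.5, x);
}
```

As a sanity check, `twoTailedPFromT(2.228, 10)` lands at ≈ 0.050, matching the textbook critical value t(10) = 2.228 at α = 0.05.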

The CDFs are accurate to within 1e−9 across the test-statistic ranges that occur in real manuscripts. The math is verified against textbook values in apps/web/scripts/test_audit_math.ts (13 deterministic test cases, all passing).

Severity classification

  • Decision-flipping. The recomputed p crosses the α = 0.05 boundary in the opposite direction from the reported value: the paper claims significance when the recompute says non-significant, or vice versa. This is the highest-priority bucket because it changes the substantive interpretation. Nuijten et al. (2016) found a decision-affecting (gross) inconsistency in roughly one in eight papers in a sample of 30,000+ psychology papers.
  • Inconsistency. Math doesn’t add up but significance call doesn’t flip.
  • Benign-likely. Two-tailed math disagrees but a one-tailed alternative or correction-applied alternative would resolve. Surfaces a “recompute as one-tailed” affordance instead of a hard flag.
  • Unverifiable. Inputs missing (e.g. df not reported), correction explicitly applied, or extraction ambiguous. Surfaced rather than silently dropped.
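The first two buckets above follow directly from the tolerance rule plus a significance-flip test. A minimal sketch (illustrative; α fixed at 0.05 as in the text, and the benign-likely/unverifiable branches omitted because they depend on extraction context):

```typescript
// Sketch of the decision-flip vs. plain-inconsistency split described above.
type Severity = "decision-flipping" | "inconsistency" | "consistent";

function classify(
  reportedP: number,
  recomputedP: number,
  decimalsReported: number,
): Severity {
  const tol = 0.5 * Math.pow(10, -decimalsReported);
  if (Math.abs(recomputedP - reportedP) <= tol + 1e-12) return "consistent";
  const alpha = 0.05;
  const flips = (reportedP < alpha) !== (recomputedP < alpha);
  return flips ? "decision-flipping" : "inconsistency";
}
```

For example, reported .048 vs. recomputed 0.078 is decision-flipping (the significance call reverses), while reported .020 vs. recomputed 0.035 is a plain inconsistency (both sides of the comparison stay significant).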

Suppression cases

  • Multiple-comparisons correction (Bonferroni / FDR / Holm). When the paper explicitly applies a correction near the reported p, recompute is suppressed and a “correction applied” note is surfaced. Correction math is family-specific and depends on what was corrected for; we don’t guess.
  • One-tailed test. Recomputed as one-tailed instead of suppressed, when the paper explicitly states directionality.

Reference

Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226. https://doi.org/10.3758/s13428-015-0664-2

Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

GRIM (Granularity-Related Inconsistency of Means)

Brown & Heathers (2017)

What it checks

GRIM tests whether a reported mean of integer-bounded data is mathematically possible. For an integer-data sample of size N, the mean must equal k / N for some integer k in the legal scale range. Audit checks whether the reported mean is within rounding distance of any such legal value. When inconsistent, we surface the three nearest legal values for the “did you mean ___?” UX.
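The core of the check fits in a few lines. A minimal sketch (illustrative reimplementation, without the scale-range and nearest-legal-values handling):

```typescript
// GRIM sketch: a mean of N integers must equal k/N for some integer k,
// so the nearest achievable mean is round(mean * N) / N.
function grimConsistent(
  mean: number,
  n: number,
  decimalsReported: number,
): boolean {
  const tol = 0.5 * Math.pow(10, -decimalsReported);
  const k = Math.round(mean * n); // nearest achievable integer sum
  return Math.abs(k / n - mean) <= tol + 1e-12;
}
```

For example, with N = 25 a reported mean of 3.48 is achievable (87/25 exactly), while 3.47 is not — no integer sum over 25 values rounds to it at two decimals.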

When it applies

GRIM is informative for N ≤ 200 with integer-bounded scales — Likert items, count outcomes (correct trials, error rates as integer counts), and similar. Above N ≈ 200, the rounding band swallows the integer scale and almost any decimal value passes; in that range Audit surfaces “skipped” rather than risking a false positive.

GRIM does not apply to:

  • real-valued continuous measures (reaction time, weight, concentration)
  • percentages or rates expressed as decimals
  • composite scores (e.g. a sum of Likert items divided by 7), which Audit currently treats conservatively as integer-scale and may therefore flag false positives

The LLM extractor is instructed to set integerScale: false when in doubt, because false-positive integer-scale flags produce wrong GRIM verdicts and damage tool credibility.

Reference

Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876

GRIMMER (Granularity-Related Inconsistency of Means Mapped to Error Repeats)

Anaya (2017); Heathers & Brown (2019)

What it checks

GRIMMER extends GRIM to the standard deviation. Given a GRIM-passing mean, an integer scale, and N, only certain SD values are mathematically achievable. The tightest bound comes from the case where all values cluster at the scale extremes; SDs above this maximum are impossible.
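The extreme-clustering bound above has a closed form: for values confined to [lo, hi] with mean m, the sum of squared deviations is maximized by putting every value at an endpoint, giving maxSD = sqrt((m − lo)(hi − m) · N / (N − 1)). A minimal sketch of the v1.0 upper-bound check (illustrative; `lo`/`hi` are the integer-scale endpoints):

```typescript
// Largest sample SD achievable for data in [lo, hi] with the given mean:
// all mass at the scale extremes.
function maxSampleSD(mean: number, n: number, lo: number, hi: number): number {
  return Math.sqrt(((mean - lo) * (hi - mean) * n) / (n - 1));
}

// Upper-bound GRIMMER check: flag SDs that exceed the theoretical maximum.
function sdWithinUpperBound(
  sd: number,
  mean: number,
  n: number,
  lo: number,
  hi: number,
  decimalsReported: number,
): boolean {
  const tol = 0.5 * Math.pow(10, -decimalsReported);
  return sd <= maxSampleSD(mean, n, lo, hi) + tol;
}
```

For example, on a 1–5 scale with mean 3 and N = 10, the maximum sample SD is sqrt(2 · 2 · 10/9) ≈ 2.11, so a reported SD of 2.50 is impossible while 1.05 is within bounds.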

v1.0 limitation: upper-bound check only

Audit v1.0 implements an upper-bound check: SDs that exceed the theoretical maximum given the integer scale and N are flagged. Anaya (2017)’s full integer-partition enumeration ships in a future update — it would additionally catch sub-maximal SDs that are nonetheless impossible (e.g. SDs achievable only with fractional counts of integer values).

Practical implication: flagged cases in v1.0 are correct (the math is one-sided), but some genuinely-impossible sub-maximal SDs may currently pass. We surface this limitation prominently in the result UI rather than silently under-flagging.

References

Anaya, J. (2017). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 5:e2400v1. https://doi.org/10.7287/peerj.preprints.2400v1

Heathers, J. A. J., & Brown, N. J. L. (2019). DEBIT: A simple consistency test for binary data. OSF Preprints. https://osf.io/5vb3u/ (covers GRIMMER context as part of the consistency-test family)

DEBIT (Descriptive Binary data Inconsistency Test)

Heathers, van der Zee, & Jung (2018); Heathers & Brown (2019)

What it checks

For binary data (0/1 outcomes), the proportion p and the standard deviation are linked: SD = sqrt(p × (1 − p) × N / (N − 1)). DEBIT tests whether a reported (proportion, SD, N) triple is mathematically consistent within rounding tolerance. When inconsistent, we surface the nearest legal proportion values.
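Since the SD is fully determined by p and N, the check is a direct comparison. A minimal sketch (illustrative, without the nearest-legal-proportion output):

```typescript
// For 0/1 data the sample SD follows from the proportion and N alone.
function debitSD(p: number, n: number): number {
  return Math.sqrt((p * (1 - p) * n) / (n - 1));
}

// DEBIT sketch: does the reported (p, SD, N) triple cohere within rounding?
function debitConsistent(
  p: number,
  sd: number,
  n: number,
  decimalsReported: number,
): boolean {
  const tol = 0.5 * Math.pow(10, -decimalsReported);
  return Math.abs(debitSD(p, n) - sd) <= tol + 1e-12;
}
```

For example, p = 0.5 with N = 100 implies SD ≈ 0.5025, so a reported SD of 0.50 at two decimals is consistent while 0.40 is not.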

When it applies

DEBIT applies only to genuinely binary outcomes — proportion correct, fraction male, fraction yes, etc. The LLM extractor is instructed to set isBinary: true only when the outcome is unambiguously 0/1. Continuous proportions (e.g. percentage of variance explained, fraction of a continuous quantity) are not subject to DEBIT.

References

Heathers, J. A. J. (2018). The DEBIT method: A tool for spotting reporting errors in studies of binary data. Medium (@jamesheathers); later formalized in Heathers & Brown (2019).

Heathers, J. A. J., van der Zee, T., & Jung, A. (2018). Method development for DEBIT (binary-data consistency test). OSF.

Heathers, J. A. J., & Brown, N. J. L. (2019). DEBIT: A simple consistency test for binary data. OSF Preprints. https://osf.io/5vb3u/

Data handling

  • Pasted text is never used to train any model. Anthropic’s API runs in zero-retention mode under our contract. Audit does not call OpenAI.
  • Cache TTL: 7 days. Results are cached server-side keyed by content hash so repeat pastes return instantly. The original Results paste is not stored alongside the cache key.
  • No account, no email gate. The tool works without sign-up. Rate limits are per-IP, not per-account.
  • Rate limits. 30 audits per hour per IP and 3 audits per browser per day (localStorage). Share URLs are not indexed (per-result snapshots, no SEO duplication).
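Content-hash keying means the cache can be consulted without retaining the paste. A hypothetical sketch of the idea (assuming Node's `crypto`; `cacheKey` is an illustrative name, not the production implementation):

```typescript
import { createHash } from "node:crypto";

// Derive a cache key from the pasted Results text: only this SHA-256 digest
// is stored alongside cached results, never the paste itself.
function cacheKey(pastedResults: string): string {
  return createHash("sha256").update(pastedResults, "utf8").digest("hex");
}
```

Identical pastes map to the same 64-hex-character key, so repeat lookups hit the cache, while the key reveals nothing about the original text.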

How to cite Manusights Audit

If you reference Audit results in a manuscript, methods section, or supplementary materials, please cite as:

Manusights. (2026). Audit v1.0: Stats Sanity Checker (statcheck-equivalent
  p-recompute + GRIM, GRIMMER, and DEBIT consistency checks) [Free academic tool].
  https://manusights.com/tools/stats-audit
  Methodology: https://manusights.com/tools/stats-audit/methodology
  (Accessed: YYYY-MM-DD)

When citing the underlying algorithms directly (which we recommend for methods-section disclosures), please credit the original authors of statcheck (Nuijten et al. 2016), GRIM (Brown & Heathers 2017), GRIMMER (Anaya 2017), and DEBIT (Heathers, van der Zee, & Jung 2018; Heathers & Brown 2019). See the About page for full credits.

Want manuscript-level statistical rigor, not paste-level? The full Manusights Readiness Scan reads your entire manuscript: it runs the same statcheck/GRIM/GRIMMER/DEBIT consistency suite and, alongside those arithmetic checks, flags missing power analyses, multiple-comparison gaps, methodology issues, and reviewer-flag patterns. Free preview, $29 only if you want the full report.

Run the full readiness scan