Methodology · 2026 Manuscript Findings Report · v1.0

Methodology

Sample frame, classification scheme, severity definitions, named limitations, and validation plan for the 2026 Manuscript Findings Report.

By Erik Jia, Founder, Manusights
Companion to the 2026 Manuscript Findings Report.

1. Sample frame

The unit of analysis is one manuscript submitted to the Manusights pre-submission review service. Inclusion criteria: the manuscript completed the parsing and risk-extraction pipeline successfully (preview_status = preview_ready) and produced a top-risks payload with at least one issue. Exclusion criteria: parse failures, abandoned uploads, and jobs that never reached the risk-extraction stage.

The window is 2026-03-23 to 2026-05-14 inclusive (53 days). The starting boundary corresponds to the launch of the v4 issue-classification engine. The ending boundary is the day of report publication.
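
As a minimal sketch of the sample-frame filter, in Python: the file name and every column except preview_status are illustrative stand-ins for the internal job table, not the production schema.

    import pandas as pd

    # Hypothetical export of the job table; preview_status is the only real field name here.
    jobs = pd.read_csv("jobs.csv", parse_dates=["uploaded_at"])

    in_window  = jobs["uploaded_at"].between("2026-03-23", "2026-05-14")  # inclusive window
    parsed_ok  = jobs["preview_status"] == "preview_ready"                # parsing + risk extraction succeeded
    has_issues = jobs["n_top_risks"] >= 1                                 # top-risks payload with >= 1 issue

    # Parse failures, abandoned uploads, and jobs that never reached
    # risk extraction drop out of the sample here.
    sample = jobs[in_window & parsed_ok & has_issues]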

Total manuscripts in the sample: 5,495. Manuscripts evaluated under the v4 classification engine with at least one classified issue: 835. Per-journal analyses use the intersection of the v4 subset and manuscripts with a self-reported target journal.

2. Data collection

Manuscripts are uploaded directly by researchers through manusights.com (no scraping, no third-party intake). At upload time, the author optionally provides a target journal as free text. The system parses the manuscript (PDF or DOCX), extracts content into a section model, and routes it to the review engine.

The engine produces a structured top-risks output: a list of issues, each tagged with title, summary, evidence quote, location reference, severity, and issue class. The risks list is persisted alongside the canonical job record. This report uses the persisted risks records, queried in aggregate and de-identified before tabulation.
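
The shape of one persisted issue record, sketched as a Python dataclass; the field names are illustrative, and only the attributes themselves are taken from the description above.

    from dataclasses import dataclass

    @dataclass
    class RiskIssue:
        # One flagged issue from the top-risks output (field names are illustrative).
        title: str           # short label for the issue
        summary: str         # brief explanation of the problem
        evidence_quote: str  # excerpt the flag is grounded in
        location: str        # section or page reference in the manuscript
        severity: str        # "critical" | "major" | "minor"
        issue_class: str     # "claim_support" | "science_core" | "journal_fit"
                             #   | "structural_hygiene" | "confidence"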

We publish only aggregate counts and percentages. No manuscript-level data, no excerpt text, no author identifiers, no abstract content. The published CSV at /research/data/manuscript-findings-2026.csv contains the same aggregated counts presented in the report.
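
A minimal sketch of the aggregation step, assuming a hypothetical per-issue export; the file and column names are not the production schema, and the output file here stands in for the published CSV.

    import pandas as pd

    # One row per classified issue; file and column names are illustrative.
    issues = pd.read_csv("classified_issues.csv")
    total = issues["manuscript_id"].nunique()

    # Per-manuscript rate: share of manuscripts with at least one issue in each class.
    per_class = issues.groupby("issue_class")["manuscript_id"].nunique()
    summary = pd.DataFrame({
        "n_manuscripts": per_class,
        "pct_of_manuscripts": (per_class / total * 100).round(1),
    })
    summary.to_csv("aggregated-counts.csv")  # only class-level counts and rates leave this step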

3. Issue classification scheme

The v4 engine classifies each flagged issue into one of five categories. Definitions, with concrete examples:

  • claim_support. The evidence shown does not fully back the claim made. Examples: an abstract sentence that overstates a finding compared with the supporting figure; a discussion conclusion asserted without an in-text figure or table reference; a benchmark that tests one regime while the claim covers another.
  • science_core. Methodology, validation, or core-argument issues. Examples: missing or weak control conditions; sampling issues that limit the generalizability the manuscript claims; a benchmark that does not measure what the claim says it measures.
  • journal_fit. Scope mismatch between manuscript and target venue. Examples: a methods paper submitted to a venue that publishes findings; a regional case study submitted to a venue requiring global generalizability; an incremental finding submitted to a broad-scope flagship.
  • structural_hygiene. Formatting, section ordering, missing required statements, or stylistic non-compliance with the target venue's instructions for authors.
  • confidence. The manuscript hedges past the point of asserting a finding, obscuring an otherwise valid result. Examples: a clean effect reported as “may suggest” or “could potentially indicate”; a result framed as exploratory when the data supports a stronger claim.

The categories are designed to be mutually exclusive at the per-issue level. Where an issue could fit two categories, the engine assigns the category that best matches the issue's primary remedy: a re-claim or evidence change is claim_support; a methods change is science_core; a venue change is journal_fit; a re-format is structural_hygiene; a re-phrasing of a hedge is confidence.
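
The tie-break can be read as a small lookup from primary remedy to class. A sketch in Python, with shorthand remedy labels that are not engine identifiers:

    # Tie-break rule: an ambiguous issue takes the class of its primary remedy.
    REMEDY_TO_CLASS = {
        "reclaim_or_evidence_change": "claim_support",
        "methods_change":             "science_core",
        "venue_change":               "journal_fit",
        "reformat":                   "structural_hygiene",
        "rephrase_hedge":             "confidence",
    }

    def classify(primary_remedy: str) -> str:
        # Anything outside the five classes falls into the residual described below.
        return REMEDY_TO_CLASS.get(primary_remedy, "residual")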

Three additional categories (validation, claim_framing, benchmarking) are present in the schema but appear in fewer than 0.2% of classified issues combined. The report folds them into a single residual category.

4. Severity definitions

Each issue is also tagged with a severity. The three levels and their working definitions:

  • critical. An issue that, in the engine's judgment, plausibly triggers a desk-screen rejection or a first-round revision. The engine applies the critical tag conservatively; not every critical-tagged issue is a guaranteed rejection, and the engine has no outcome data to ground that judgment.
  • major. An issue that warrants attention before submission but does not on its own determine outcome. Most major issues are repairable in a focused revision pass.
  • minor. Smaller issues that are worth fixing but rarely affect outcome on their own.

Severity reflects the engine's internal scale; it does not predict editor or reviewer decisions. We use the word “critical” on the same scale Manusights uses internally for triage. Readers should not infer that an issue tagged critical will produce a rejection, nor that an issue tagged minor will not.

5. Target journal handling

Target journal is captured at upload as free text. We normalize obvious case and punctuation variants (“Nature comms” vs “Nature Communications”) but do not infer the journal from manuscript content. Manuscripts without a stated target journal contribute to corpus-wide statistics but not to per-journal subsections.
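
A minimal sketch of that normalization, assuming a hypothetical alias table; only obvious case and punctuation variants are collapsed, and nothing is inferred from manuscript content.

    import re
    from typing import Optional

    # Hypothetical alias table covering case/punctuation variants.
    ALIASES = {
        "nature comms": "Nature Communications",
        "nature communications": "Nature Communications",
    }

    def normalize_target_journal(raw: Optional[str]) -> Optional[str]:
        if not raw or not raw.strip():
            return None                              # no stated target: corpus-wide stats only
        key = re.sub(r"[^\w\s]", " ", raw).lower()   # drop punctuation, case-fold
        key = re.sub(r"\s+", " ", key).strip()       # collapse whitespace
        return ALIASES.get(key, raw.strip())         # no content-based inference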

Of the 835 v4-classified manuscripts, approximately 30% include a stated target journal that maps cleanly to a peer-reviewed journal. The figure is approximately 35% across the full 5,495 sample (which includes pre-v4 records).

Per-journal tables include only journals with at least 10 manuscripts in the classified-subset-with-target intersection. Long-tail and non-journal targets are handled individually: conferences such as NeurIPS appear in the table because their N exceeded 10; memoir-style and thesis-stage targets are excluded because they fall outside the peer-reviewed-journal scope of this report.
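
The minimum-cell-size filter, sketched under the same kind of column-name assumptions as the earlier snippets:

    import pandas as pd

    # One row per v4-classified manuscript; file and column names are illustrative.
    classified = pd.read_csv("v4_classified_manuscripts.csv")
    with_target = classified.dropna(subset=["target_journal"])

    counts = with_target["target_journal"].value_counts()
    eligible = counts[counts >= 10].index                      # journals with N >= 10
    per_journal = with_target[with_target["target_journal"].isin(eligible)]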

6. Known limitations

  1. Self-selected sample. The corpus comes from authors who chose to run a pre-submission review. Compared with all manuscripts in flight in 2026, these authors are more likely to have suspected an issue. The 82.9% claim_support rate should therefore be read as an upper bound on the rate in the broader manuscript population; we do not estimate the rate among un-reviewed manuscripts.
  2. No outcome data. The report does not include data on whether the manuscripts were ultimately accepted, rejected, or revised. We make no causal claim that the flags in our data predict editor or reviewer decisions.
  3. Engine version mix. The full 5,495-manuscript corpus includes records from earlier engine versions that did not classify issues by class. To avoid mixing schemas, all class-by-class analyses use the 835-manuscript v4-classified subset.
  4. Engine validity not yet published. The v4 engine is the measurement instrument and we have not yet published a separate validation study (false-positive rate, inter-rater reliability against human peer reviewers, time-stability of classifications). A methodology paper covering those properties is planned.
  5. Small per-journal N. Six of the nine per-journal cells have N<20, which produces wide confidence intervals on any rate. We flag those rows as directional and present the narrative sections (Science, Nature Communications) explicitly as hypotheses, not estimates.
  6. Self-reported target journal. Target journal is what the author typed at upload, not where the manuscript was ultimately submitted. Some manuscripts may have shifted target after review; we cannot track that.
  7. Internal classification taxonomy. The five-class scheme is Manusights'. It is not identical to the categories used by journal editors or peer reviewers in their own decisions, nor to public rejection-reason taxonomies. Comparisons with editor-perspective taxonomies are out of scope for this report.

7. Validation roadmap

A v1.1 of this report and a separate engine-validation paper are planned for the August 2026 refresh window. The validation work targets three open questions:

  • False-positive rate. A sample of engine-flagged issues will be re-classified by a human reviewer panel, and the disagreement rate will be reported per issue class (a minimal sketch of that tabulation follows this list).
  • Selection-bias bounding. A hold-out comparison group of un-reviewed manuscripts from public open-access archives will be scored by the same engine. The class-level rate differential will bound the self-selection inflation factor for the 82.9% headline.
  • Outcome linkage (where possible). A subset of customers will be asked to share post-submission outcomes (accepted, revised, rejected, withdrawn). Aggregate accept-vs-reject rates conditional on flag class and severity will be reported only with explicit consent and only at aggregate granularity.
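
A minimal sketch of how the per-class disagreement rate could be tabulated, assuming a hypothetical paired-label file; no such dataset exists yet.

    import pandas as pd

    # One row per audited issue, pairing the engine's class with the panel verdict.
    # The file and both column names are hypothetical.
    audit = pd.read_csv("validation_sample.csv")

    audit["disagree"] = audit["engine_class"] != audit["human_class"]
    disagreement_rate = audit.groupby("engine_class")["disagree"].mean().round(3)
    print(disagreement_rate)  # one disagreement rate per issue class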

8. Changelog

v1.0 (2026-05-14). Initial release. Sample window 2026-03-23 to 2026-05-14. Classified subset N=835. Headline finding: 82.9% claim_support per-manuscript rate.

Planned v1.1 (August 2026). 4-month sample window with full v4 engine coverage, plus a paired engine-validation paper.