Manuscript Preparation | 6 min read | Updated Jun 12, 2026

Pre-Submission Review for Computational Biology Papers: Reproducibility, Code, and What Reviewers Check

Computational biology manuscripts face unusually close reproducibility scrutiny. By published estimates, about half of computational models cannot be reproduced because information is incorrect or missing. Here is what to verify before submission to avoid being part of that statistic.

Author context: Senior Researcher, Oncology & Cell Biology. Experience with Nature Medicine, Cancer Cell, Journal of Clinical Oncology.

Working map

How to use this page well

These pages work best when they behave like tools, not essays. Use the quick structure first, then apply it to the exact journal and manuscript situation.

| Question | What to do |
| --- | --- |
| Use this page for | Building a point-by-point response that is easy for reviewers and editors to trust. |
| Start with | State the reviewer concern clearly, then pair each response with the exact evidence or revision. |
| Common mistake | Sounding defensive or abstract instead of specific about what changed. |
| Best next step | Turn the response into a visible checklist or matrix before you finalize the letter. |

Quick answer: Pre-submission review for computational biology should test whether another lab could rerun the paper from the manuscript, repository, environment, and data package you plan to submit. Journals in this area now expect code access, versioned dependencies, benchmark discipline, and enough documentation for reviewers to trust the result before they decide how exciting it is.

The right pre-submit check goes beyond prose quality. It asks whether the code, environment, data, parameter choices, figures, statistics, and biological claims are publication-ready as one package.

Check your computational biology manuscript readiness in 1-2 minutes with the free scan.

What This Page Owns

This page owns computational-biology-specific pre-submission review. It applies to bioinformatics, computational genomics, systems biology, single-cell analysis, omics integration, computational modeling, benchmarked biological pipelines, biological machine learning, and data-driven biological discovery where code, data, and reproducibility are central to the scientific claim.

| Intent | Best owner |
| --- | --- |
| Computational biology manuscript needs reproducibility critique | This page |
| Data-science workflow dominates outside biology | Data science review |
| Genome-scale biological interpretation dominates | Genomics review |
| AI model novelty dominates | Artificial intelligence review |
| Wet-lab mechanism dominates | Molecular biology review |

The boundary is reproducible biological computation. The page answers whether reviewers can inspect the computational evidence enough to trust the biological claim.

Pre-submission review for computational biology: the real editorial screen

The problem is not theoretical. Multiple studies have demonstrated that computational biology results frequently cannot be reproduced:

  • about half of published computational models were not reproducible due to incorrect or missing information
  • key details in bioinformatics data processing are often omitted, including software versions, parameter settings, and configuration files
  • changes in reference data, software versions, and missing code make replication impossible even when the original analysis was correct

This means reviewers at top computational biology journals look specifically for reproducibility gaps, and they treat them as a primary evaluation criterion rather than a secondary concern.

What Computational Biology Reviewers Check First

Computational biology reviewers often ask:

  • can the analysis be rerun from the repository, environment, and data record?
  • are code, data, accession numbers, software versions, parameters, and random seeds explicit?
  • does the benchmark compare against current and fairly tuned methods?
  • are preprocessing, filtering, normalization, and batch-correction choices auditable?
  • does the statistical analysis match the biological question and multiple-testing burden?
  • are figures, tables, and supplementary files traceable back to the pipeline?
  • does the validation package support the biological claim rather than only a computational performance claim?
  • does the paper fit PLOS Computational Biology, Bioinformatics, Genome Biology, Nature Computational Science, Cell Systems, or a domain biology journal?

If those points are weak, a technically impressive result can look unreviewable.

In Our Pre-Submission Review Work

In our pre-submission review work on computational biology manuscripts, papers usually break in one of three places. The repository exists but does not recreate the figures. The methods name the tools but not the exact versions and non-default parameters. Or the benchmark looks better than it should because the comparison baselines are outdated, poorly tuned, or easier than the real problem.

Our analysis of current reproducibility policies points to the same bottleneck: packaging, not ambition. We see a repeated problem where the science is strong, but the manuscript still reads like internal lab infrastructure instead of a reproducible external artifact. That is the difference between "the code worked in our hands" and "a reviewer can trust this enough to keep reading."

Repository-does-not-reproduce: code is public, but a reviewer cannot reproduce the main figures because data paths, environment details, seeds, model weights, or run order are missing.

Version-and-parameter gap: the manuscript names STAR, Seurat, DESeq2, Scanpy, AlphaFold, GATK, or another tool without the database release, package version, non-default parameters, and preprocessing choices needed to reconstruct the analysis.

Benchmark advantage: the comparison uses outdated baselines, untuned alternatives, uneven input data, or test conditions that flatter the new method.

Biological-claim overreach: the computational output is real, but the biological conclusion outruns validation, external data, or orthogonal evidence.

Supplement fragmentation: figures, methods, code, tables, and data availability statements are scattered so the reviewer cannot follow input-to-output logic.

Target-journal mismatch: computational biology papers aimed at PLOS Computational Biology, Nature Computational Science, Bioinformatics, Cell Systems, Genome Biology, or a domain biology journal fail for different reasons. We see authors treat those journals as interchangeable even though the abstract, methods, figures, code repository, data availability statement, and cover letter need different emphasis for a methods audience versus a biological-discovery audience.

A useful review should identify the first reproducibility, benchmark, or biological-validation objection a computational biology reviewer would raise.

Public Field Signals

PLOS Computational Biology requires authors to make the data underlying manuscript findings available and also requires author-generated code directly related to the findings to be available without access restriction unless a legal or ethical exception applies. Nature Computational Science says custom computer code or algorithms central to the paper's claims must be available to editors and reviewers on request, and Springer Nature policy requires code-availability statements when new code is necessary to interpret or replicate the conclusions.

Method note: official policies define the submission materials, but they do not decide whether a specific repository is usable. Manusights interpretation adds the reviewer-facing test: can a competent reader inspect the code, data, methods, figures, and biological claim without reconstructing the lab's private workflow?


The five pillars of reproducible computational research

A 2023 framework published in Briefings in Bioinformatics identified five pillars that reviewers increasingly expect:

| Pillar | What it requires | Common failure |
| --- | --- | --- |
| Literate programming | Analysis documented in notebooks (Jupyter, R Markdown) that interleave code, results, and interpretation | Code exists but is not documented or explained |
| Code version control | Code in a version-controlled repository (GitHub, GitLab) with tagged releases | Code shared as a zip file or "available upon request" |
| Compute environment control | Containerized environments (Docker, Singularity) or explicit dependency specifications | Software versions not recorded, conda/pip environments not exported |
| Persistent data sharing | Data in FAIR-compliant repositories with persistent identifiers (DOI, accession numbers) | Data "available upon request" or on a lab website that may disappear |
| Documentation | README files, parameter descriptions, example inputs and outputs | Code exists without instructions on how to run it |

Not every journal requires all five, but the direction is clear. Reviewers increasingly check for these and flag their absence.

Code availability

Is the code in a public repository? Not "available upon request" but actually accessible right now. GitHub with a Zenodo DOI is the standard. The repository should include:

  • all custom scripts and pipelines used in the analysis
  • a README explaining how to run the code
  • version tags matching the submitted manuscript
  • example input data or test cases
  • dependency specifications (requirements.txt, environment.yml, or Dockerfile)
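The repository checklist above can be turned into a quick self-check before tagging a release. The following is a minimal sketch, not a standard tool: the function name `missing_repo_items` and the assumption that example inputs live in a directory called `examples` or `tests` are illustrative choices, while the dependency file names come from the list above.

```python
from pathlib import Path

# File names taken from the checklist above.
DEPENDENCY_SPECS = ("requirements.txt", "environment.yml", "Dockerfile")

# Hypothetical convention: example inputs or test cases live in one of these directories.
EXAMPLE_DIRS = ("examples", "tests")


def missing_repo_items(repo_root):
    """Return a list of human-readable gaps found in a repository checkout."""
    root = Path(repo_root)
    gaps = []

    # Any top-level README variant counts (README, README.md, README.rst).
    has_readme = any(
        p.is_file() and p.name.lower().startswith("readme") for p in root.iterdir()
    )
    if not has_readme:
        gaps.append("no README at the repository root")

    # At least one dependency specification should be present.
    if not any((root / name).is_file() for name in DEPENDENCY_SPECS):
        gaps.append("no dependency specification")

    # Example inputs or test cases, under the assumed directory names.
    if not any((root / name).is_dir() for name in EXAMPLE_DIRS):
        gaps.append("no example inputs or test cases directory")

    return gaps
```

Calling `missing_repo_items(".")` from the repository root returns an empty list when all three items are present. It does not check that the code runs or that a version tag matches the submitted manuscript; those still need a manual pass.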

Software versions

Every piece of software used in the analysis must be specified with its version number. "We used STAR for alignment" is not reproducible. "We used STAR v2.7.10b with default parameters except --outFilterMismatchNmax 5" is reproducible.

This applies to every step: alignment, variant calling, differential expression, pathway analysis, visualization. If you used R, the R version and every package version matter. If you used Python, the Python version and library versions matter.
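For the Python side of a pipeline, version capture can be scripted rather than typed by hand. This is a minimal sketch using only the standard library; `collect_versions` is a hypothetical helper name, and the package names in the usage line are placeholders for whatever the analysis actually imports.

```python
import json
import platform
from importlib import metadata


def collect_versions(packages):
    """Map 'python' and each named distribution to its installed version string."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of silently skipping it.
            versions[name] = "not installed"
    return versions


def versions_as_json(packages):
    """Serialize the version record so it can be committed next to the code."""
    return json.dumps(collect_versions(packages), indent=2, sort_keys=True)
```

For example, `print(versions_as_json(["numpy", "pandas", "scanpy"]))` prints a record that can be pasted into a supplementary table or committed to the repository. It covers Python packages only; command-line tools such as aligners and R packages need their own version capture.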

Data availability

Raw data should be deposited in appropriate repositories:

  • sequencing data: GEO, SRA, ENA
  • proteomics: PRIDE, ProteomeXchange
  • metabolomics: MetaboLights
  • structural data: PDB, EMDB
  • general: Figshare, Dryad, Zenodo

Processed data (count matrices, normalized expression values, variant calls) should also be available, either in the repository or as supplementary material. Reviewers need to be able to start from the raw data and arrive at the same processed data using your documented pipeline.
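One way to let a reviewer confirm that they arrived at the same processed data is to publish checksums alongside the files. The sketch below assumes processed outputs sit in a single directory; the function names are illustrative, and SHA-256 is one reasonable choice of hash rather than a journal requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large matrices do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 checksum."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(expected, data_dir):
    """List files from the published manifest that are missing or differ on disk."""
    current = build_manifest(data_dir)
    return sorted(name for name in expected if current.get(name) != expected[name])
```

A reviewer who reruns the pipeline can call `changed_files(published_manifest, "processed/")` and expect an empty list. Byte-identical output is a strict test; pipelines with nondeterministic steps or floating-point differences across platforms may need a tolerance-based comparison instead.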

Statistical methods

Computational biology papers often involve multiple testing across thousands of genes, proteins, or genomic regions. Reviewers check:

  • multiple testing correction method (Bonferroni, Benjamini-Hochberg, or permutation-based)
  • significance thresholds justified, not arbitrary
  • effect size reported alongside statistical significance
  • batch effects addressed in multi-sample analyses
  • validation approach (cross-validation, independent cohort, or orthogonal method)
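To make the multiple-testing item concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. It is for illustration only; in a real analysis an established statistics library would normally be used, and the p-values in the usage note are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at(pvalues, fdr):
    """Indices of tests whose adjusted p-value is at or below the chosen FDR."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= fdr]
```

With the made-up input `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`, so all four tests pass a 5% false discovery rate while only two pass 2.5%. Stating the threshold and the adjustment method together is what lets a reviewer check the count of reported hits.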

Benchmarking against existing methods

If the paper introduces a new method or pipeline, reviewers expect comparison against established alternatives using standard benchmark datasets. A new tool that has only been tested on the authors' own data is not convincing.
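One ingredient of a fair comparison is that every method sees exactly the same train/test splits. The sketch below shows that idea with standard-library Python only. The function names, the accuracy metric, and the two toy methods are assumptions for illustration; a real benchmark would also tune each baseline and use standard datasets.

```python
import random
from statistics import mean


def make_folds(n_samples, n_folds, seed):
    """Shuffle indices once with a fixed seed and split them into n_folds test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def benchmark(methods, features, labels, folds):
    """Score every method on identical splits.

    `methods` maps a name to a function (train_x, train_y, test_x) -> predictions.
    Assumes every fold is non-empty.
    """
    results = {name: [] for name in methods}
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        test_x = [features[i] for i in test_idx]
        test_y = [labels[i] for i in test_idx]
        for name, method in methods.items():
            predicted = method(train_x, train_y, test_x)
            accuracy = mean(1.0 if p == t else 0.0 for p, t in zip(predicted, test_y))
            results[name].append(accuracy)
    return results


def majority_baseline(train_x, train_y, test_x):
    """Toy baseline: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)


def threshold_rule(train_x, train_y, test_x):
    """Toy 'new method': predict 1 when the feature exceeds the training mean."""
    cutoff = mean(train_x)
    return [1 if x > cutoff else 0 for x in test_x]
```

Because the folds are built once and passed to every method, a per-fold score table can be reported in the supplement together with the seed, which makes the comparison auditable rather than a single headline number.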

Computational Biology Review Matrix

| Review layer | What it checks | Early failure signal |
| --- | --- | --- |
| Code package | Repository, README, tagged release, run order | Main figures cannot be reproduced |
| Environment | Versions, dependencies, containers, seeds | Reviewer cannot recreate setup |
| Data | Raw data, processed data, accessions, restrictions | Data availability is vague |
| Pipeline | Preprocessing, filtering, normalization, parameters | Methods omit non-default choices |
| Benchmark | Baselines, datasets, tuning, fairness | Comparison favors the new method |
| Statistics | Multiple testing, uncertainty, validation | Significance outruns design |
| Biology claim | Interpretation, validation, target journal | Output is stronger than the biological conclusion |

This matrix keeps the page distinct from generic data science, AI, and genomics pages.

What To Send

Send the manuscript, target journal, code repository, README, environment file or container, dependency versions, random seeds, data accessions, processed data, benchmark list, parameter table, figure-generation scripts, statistical analysis plan, validation evidence, data availability statement, code availability statement, supplement, and prior reviewer comments if available.

If data or code cannot be public, send the access constraint and the reviewer-facing substitute: controlled-access repository, synthetic data, toy example, executable workflow, or clear exception language.

What A Useful Review Should Deliver

A useful computational biology pre-submission review should include:

  • reproducibility verdict
  • code, data, and environment packaging critique
  • benchmark and baseline fairness review
  • statistical and validation check
  • biological-claim discipline note
  • figure and supplement traceability review
  • journal-lane recommendation
  • submit, revise, retarget, or diagnose deeper call

The review should not only say "make code available." It should state which missing file, parameter, data path, or validation step would make the paper untrustworthy to a reviewer.


How To Avoid Cannibalizing Data Science Or AI Pages

Use this page when the manuscript's risk depends on biological computation, reproducibility, code, data, pipelines, benchmark fairness, and whether a computational result supports a biological claim. Use data science review when the paper is mostly a general data workflow, benchmark, or statistical analysis outside biology. Use AI review when model novelty, training method, architecture, or AI-system contribution is the main reviewer question.

That boundary keeps this page focused on computational biology authors who need to know whether their manuscript is rerunnable, biologically interpretable, and target-journal ready before upload.

Code and reproducibility

  • all code is in a public repository with a DOI (GitHub + Zenodo)
  • the repository has a README with instructions for running the analysis
  • software versions are specified for every tool in the pipeline
  • dependency specifications are included (requirements.txt, environment.yml, Dockerfile)
  • example data or test cases are provided
  • the analysis can be run from start to finish by someone outside your lab

Data

  • raw data deposited in appropriate domain-specific repositories with accession numbers
  • processed data available as supplementary material or in a general repository
  • data availability statement includes specific repository names and accession numbers
  • any access restrictions are explained and justified

Methodology

  • every step of the computational pipeline is described in enough detail for reproduction
  • parameter choices are stated and justified (not just "default parameters")
  • statistical methods are appropriate for the data type and multiple testing burden
  • batch effects are addressed where applicable
  • validation is performed using an independent approach or dataset
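The point about stating parameter choices can be supported by generating the methods parameter table from the configuration the pipeline actually used. This is a minimal sketch: the tool name, version, and parameter names in the usage note are hypothetical, and the defaults dictionary has to be filled in from each tool's own documentation.

```python
def non_default_parameters(used, defaults):
    """Return {parameter: (default, used_value)} for every setting that differs."""
    changed = {}
    for name, value in used.items():
        default = defaults.get(name, "<no documented default>")
        if value != default:
            changed[name] = (default, value)
    return changed


def format_parameter_table(tool, version, changed):
    """Render the differences as plain text for a methods section or supplement."""
    lines = [f"{tool} {version}"]
    if not changed:
        lines.append("  all parameters at defaults")
    for name in sorted(changed):
        default, value = changed[name]
        lines.append(f"  {name}: {value} (default: {default})")
    return "\n".join(lines)
```

For a hypothetical tool, `non_default_parameters({"--max-mismatches": 5, "--threads": 8}, {"--max-mismatches": 10, "--threads": 8})` returns only the mismatch setting, and `format_parameter_table("example_aligner", "1.2.3", ...)` turns it into a line a reviewer can compare against the methods text. The justification for each choice still has to be written by the authors.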

For new methods papers

  • benchmarking against existing alternatives using standard datasets
  • runtime and memory requirements documented
  • scalability discussed (does it work on larger datasets?)
  • limitations acknowledged (what the method cannot do)

Where pre-submission review helps most in computational biology

Computational biology manuscripts are uniquely well-suited for automated review because many of the reproducibility requirements are systematic and checkable:

  • Citation verification catches references to tools that have been superseded or papers that have been retracted. The field moves fast, and citing an outdated version of a widely-used tool signals that the pipeline may not be current.
  • Methodology evaluation checks whether the computational approach is described in enough detail and whether the statistical methods are appropriate.
  • Journal-specific calibration evaluates whether the paper meets the specific requirements of your target journal (Genome Biology has different standards than Bioinformatics).

The manuscript readiness check evaluates these in about 1-2 minutes and provides a full report with 15+ verified citations from 500M+ live papers, figure-level feedback, and a prioritized revision checklist calibrated to your target journal.

For manuscripts targeting Genome Biology, Nature Methods, or Cell Systems, Manusights Expert Review ($1,000 to $1,800) connects you with a reviewer experienced in computational biology methodology at your target journal.

Common failure patterns in computational biology

In computational biology, reviewers often decide whether the manuscript is trustworthy before they decide whether it is interesting.

| Weak point | What reviewers assume | Stronger pre-submit standard |
| --- | --- | --- |
| Code is mentioned but not runnable | The result may not be reproducible | Public repository, tagged release, instructions, and test inputs |
| Software versions and parameters are incomplete | The analysis cannot be reconstructed | Version numbers and non-default settings documented step by step |
| Data are "available on request" | Reproducibility is being deferred | Stable repositories with accession numbers or DOIs |
| Pipeline logic is split across main text and supplements without order | Reviewers cannot follow the workflow | One coherent analytical narrative from input to output |

A short checklist before you call the paper reproducible

Before submission, confirm:

  • a new lab member could rerun the main analysis from your repository without emailing you
  • every major tool, package, database release, and parameter choice is recorded
  • processed outputs can be traced back to the raw data source clearly
  • the README explains what to run first, what to expect, and where outputs should appear
  • the manuscript claims exactly what the shared code and data can support

If any of those answers is still no, the manuscript is closer to "promising analysis" than to a review-ready computational paper.
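Seeds and run order are two of the details that most often live only in someone's head. The sketch below shows one way to record both as the pipeline runs. The step names are hypothetical, `run_step` is a stand-in for real analysis code, and deriving each step's seed from a base seed is an illustrative convention rather than a requirement.

```python
import json
import random


def run_step(name, seed, n_draws=3):
    """Stand-in for one pipeline step: its random draws depend only on the seed."""
    rng = random.Random(seed)
    return {"step": name, "seed": seed, "draws": [rng.random() for _ in range(n_draws)]}


def run_pipeline(steps, base_seed):
    """Run steps in a fixed order, give each its own derived seed, and log what ran."""
    record = []
    for position, name in enumerate(steps):
        record.append(run_step(name, seed=base_seed + position))
    return record


def record_as_json(record):
    """Serialize the run record so it can be shipped with the repository."""
    return json.dumps(record, indent=2)
```

Running `run_pipeline(["filter_cells", "normalize", "cluster"], base_seed=42)` twice gives identical records, and the JSON output tells a reader what ran first and with which seed. Libraries with their own random number generators need to be seeded separately, which this sketch does not cover.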

Ready To Submit / Pause First

Ready to submit if

  • the repository reproduces the core tables or figures from a clean setup
  • software versions, parameters, and data access details are explicit in the manuscript or repository
  • the benchmark is fair enough that a skeptical reviewer would not call it engineered to flatter the new method

Pause first if

  • a reviewer would need to email you to discover the right environment, seed, or preprocessing step
  • the strongest performance claim depends on weak or outdated comparison baselines
  • the biological claim is broader than what the validation package can currently support

Pros And Cons Of Pre-Submission Review In Computational Biology

Pros: a field-specific review can catch reproducibility failures, unfair baselines, missing version details, and biological claims that outrun the validation package before reviewers spend their first report on those issues.

Cons: it is less useful if the code is not yet runnable, the data cannot be shared even under controlled access, or the paper still lacks a clear biological question beyond method performance.

Official Source Detail Snapshot

These details are not the point of the page, but they explain why a computational biology review has to check reproducibility before prose polish. PLOS Computational Biology maintains a current editors page, so authors should verify the current editorial leadership there before quoting any name in submission materials. Its submission guidance says research manuscripts have no fixed word limit, while some article types such as Perspectives are capped at 2,500 words.

DOAJ lists PLOS Computational Biology publication fees of up to 3,165 USD, along with waiver-policy information. The practical review lesson is that code, data, and article-type fit need to be ready before the author spends time on a prestige-driven journal pitch.

Frequently asked questions

Reviewers usually check first whether the work is reproducible enough to trust. That means code availability, versioned dependencies, data access, parameter reporting, and a validation design that another group could actually rerun.

A weak validation package is the most common problem. Models that are overfit, benchmarked unfairly, or impossible to rerun from the shared repository tend to lose credibility quickly.

At minimum, a reproducible submission should use a versioned repository, clear setup instructions, exact software and package versions, fixed seeds where relevant, accessible data or accession numbers, and enough parameter detail for another lab to regenerate the main outputs.

Pre-submission review is most useful when the paper makes strong biological claims from computational outputs, when the benchmark design is complex, or when the target journal has a strict reproducibility culture and a rejection would cost months.

References

Sources

  1. Improving reproducibility in computational biology (PLOS Computational Biology)
  2. The five pillars of computational reproducibility (Briefings in Bioinformatics)
  3. Genomic reproducibility in the bioinformatics era (PMC)
  4. PLOS Computational Biology code availability policy
  5. PLOS Computational Biology submission guidelines
  6. PLOS Computational Biology Editors-in-Chief
  7. DOAJ PLOS Computational Biology record
