Pre-Submission Review for Information Retrieval Papers
Information retrieval papers need pre-submission review that checks task definition, metrics, baselines, leakage, artifacts, and venue fit.
Readiness scan
Find out if this manuscript is ready to submit.
Run the Free Readiness Scan before you submit. Catch the issues editors reject on first read.
How to use this page well
These pages work best when they behave like tools, not essays. Use the quick structure first, then apply it to the exact journal and manuscript situation.
| Question | What to do |
|---|---|
| Use this page for | Getting the structure, tone, and decision logic right before you send anything out. |
| Most important move | Make the reviewer-facing or editor-facing ask obvious early rather than burying it in prose. |
| Common mistake | Turning a practical page into a long explanation instead of a working template or checklist. |
| Next step | Use the page as a tool, then adjust it to the exact manuscript and journal situation. |
Quick answer: Pre-submission review for information retrieval papers should test whether the retrieval task, collection, relevance judgments, metrics, baselines, train-test separation, artifacts, user context, and venue fit support the manuscript's claim. IR reviewers are quick to reject papers where the ranking result looks strong but the evaluation setup, collection, labels, or baseline comparison cannot be trusted.
If you need a manuscript-specific readiness diagnosis, start with the AI manuscript review. If the paper is mainly a general ML method, see pre-submission review for machine learning.
Method note: this page uses ACM SIGIR artifact-badging materials, ACM reproducibility guidance, IR reproducibility research, CHIIR/SIGIR-style artifact expectations, and Manusights computational review patterns reviewed in April 2026.
What This Page Owns
This page owns information-retrieval-specific pre-submission review. It applies to papers about search, ranking, retrieval evaluation, recommender systems, query understanding, document collections, relevance judgments, conversational search, neural retrieval, retrieval-augmented generation, indexing, user intent, learning to rank, and IR test collections.
| Intent | Best owner |
|---|---|
| IR manuscript needs retrieval-evaluation critique | This page |
| General ML model contribution dominates | Machine learning review |
| Data pipeline or analytics dominates | Data science review |
| User interaction dominates | HCI review |
| Statistics-only issue | Statistical review |
The boundary is retrieval evaluation and search relevance.
What IR Reviewers Check First
IR reviewers often ask:
- what is the retrieval task?
- are queries, documents, users, sessions, or recommendations defined clearly?
- are relevance judgments valid, consistent, and appropriate for the task?
- do metrics match the use case?
- are baselines current, tuned, and fair?
- is there leakage between train, validation, test, judgments, prompts, or collection construction?
- are artifacts, code, data, indexes, and runs available enough for reproduction?
- does the paper fit SIGIR, TOIS, ICTIR, CHIIR, RecSys, CIKM, WSDM, or an applied venue?
The paper has to make the evaluation credible before the result can matter.
In Our Pre-Submission Review Work
In our pre-submission review work, IR papers most often fail when the experimental design makes the ranking improvement hard to believe.
- Task blur: search, recommendation, question answering, conversational retrieval, and RAG are mixed without a precise evaluation target.
- Metric mismatch: nDCG, MAP, MRR, recall, precision, success, satisfaction, or latency is used without matching user intent or system goal.
- Baseline weakness: the comparison set omits a strong lexical, neural, hybrid, tuned, or simple baseline.
- Judgment fragility: relevance labels, pooling, annotation, assessor agreement, or gold-standard construction is underexplained.
- Leakage risk: training data, query logs, prompt examples, document collections, or candidate sets leak test information (a minimal overlap check is sketched after this list).
A useful review should identify the first retrieval-evaluation objection.
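To make the leakage point concrete, a minimal overlap check between training and test material can surface the most common problems before a reviewer does. This is a sketch under simplifying assumptions: queries and document IDs are available as plain Python collections and only exact matches are checked, whereas a real audit also needs near-duplicate detection, prompt examples, shared sessions, and candidate-pool construction.

```python
def leakage_report(train_queries, test_queries, train_doc_ids, test_doc_ids):
    """Minimal train/test leakage check: exact overlap of queries and documents.

    Sketch only: exact string/ID matches. Real audits should also look for
    near-duplicate queries, shared sessions, prompt examples, and how
    candidate pools were constructed.
    """
    train_q = {q.strip().lower() for q in train_queries}
    test_q = {q.strip().lower() for q in test_queries}
    shared_queries = train_q & test_q
    shared_docs = set(train_doc_ids) & set(test_doc_ids)
    return {
        "shared_query_count": len(shared_queries),
        "shared_query_fraction_of_test": len(shared_queries) / max(len(test_q), 1),
        "shared_doc_count": len(shared_docs),
        "example_shared_queries": sorted(shared_queries)[:5],
    }
```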
Public Field Signals
ACM SIGIR artifact badging emphasizes reproducibility, replicability, and artifact sharing as part of IR research culture. ACM reproducibility guidance describes artifact review and badging as a way to prepare and review research artifacts. Recent IR reproducibility work on recommender systems reports problems such as data split errors, train-test leakage, artifact-paper inconsistency, and weak baselines.
Those signals make IR readiness more than model novelty. Evaluation design and artifact consistency are central.
Information Retrieval Review Matrix
| Review layer | What it checks | Early failure signal |
|---|---|---|
| Task | search, ranking, recommendation, RAG, conversational retrieval | Task definition is unstable |
| Collection | queries, documents, sessions, users, candidates | Dataset construction is opaque |
| Judgments | labels, pooling, assessors, agreement, gold standard | Relevance is hard to trust |
| Metrics | nDCG, MAP, MRR, recall, latency, satisfaction | Metric does not match use |
| Baselines | lexical, neural, hybrid, tuned, simple, recent | Weak comparison set |
| Artifacts | code, data, runs, indexes, environment, seeds | Results cannot be reproduced |
| Venue fit | SIGIR, TOIS, ICTIR, CHIIR, RecSys, CIKM, WSDM | Audience mismatch |
This matrix keeps the page distinct from broad ML review.
What To Send
Send the manuscript, target venue, code repository or archive, dataset or collection description, query and document construction details, relevance judgment protocol, run files, baseline settings, evaluation scripts, metric definitions, train-validation-test split logic, prompt or RAG construction if relevant, and prior reviews if available.
If the paper uses proprietary logs or private collections, include the reproducibility compromise and what can be shared.
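If the venue expects run files, a TREC-style plain-text layout (one line per retrieved document: query ID, the literal "Q0", document ID, rank, score, run tag) is a common convention. The helper below is a hypothetical sketch of writing that layout; the function name, run tag, and the assumption that rankings fit in memory are placeholders, and the target venue's exact expectations take precedence.

```python
def write_trec_run(path, ranked_results, run_tag="my_system"):
    """Write rankings in TREC-style run format: 'qid Q0 doc_id rank score tag'.

    ranked_results: {query_id: [(doc_id, score), ...]} with the best document first.
    Sketch only; adapt field formatting to the target venue or evaluation tool.
    """
    with open(path, "w") as f:
        for qid, docs in ranked_results.items():
            for rank, (doc_id, score) in enumerate(docs, start=1):
                f.write(f"{qid} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")
```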
What A Useful Review Should Deliver
A useful IR pre-submission review should include:
- retrieval-contribution verdict
- task and collection critique
- relevance-judgment and metric review
- baseline and leakage-risk check
- artifact and reproducibility readiness note
- user-context and venue-fit recommendation
- submit, revise, retarget, or diagnose deeper call
The review should not only say "add baselines." It should identify the baseline, metric, or judgment problem that will decide reviewer trust.
Common Fixes Before Submission
Before submission, authors often need to:
- define the retrieval task more narrowly
- justify metrics against the user or system goal
- add lexical, neural, hybrid, or simple baselines
- document relevance judgments and pooling
- check for leakage in splits, prompts, or collection construction
- package code, run files, and evaluation scripts (a minimal, rerunnable evaluation script is sketched after this list)
- narrow claims from general search improvement to a tested retrieval setting
- retarget from SIGIR to CHIIR, ICTIR, RecSys, CIKM, WSDM, TOIS, or a domain venue
These fixes make the paper easier to trust and reproduce.
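For the "package code, run files, and evaluation scripts" fix, a small evaluation script that anyone can rerun against the released run files makes the packaging credible. The sketch below assumes the pytrec_eval package and in-memory qrels and run dictionaries; the measure set and the example values are illustrative, not a prescription.

```python
# Minimal, rerunnable evaluation script (assumes the pytrec_eval package).
# qrels: {query_id: {doc_id: graded_relevance}}; run: {query_id: {doc_id: score}}.
import pytrec_eval

def evaluate(qrels, run, measures=frozenset({"map", "ndcg", "recip_rank"})):
    evaluator = pytrec_eval.RelevanceEvaluator(qrels, set(measures))
    per_query = evaluator.evaluate(run)
    # Average each measure over queries so the paper's numbers can be re-derived.
    measure_names = sorted(next(iter(per_query.values())))
    averaged = {m: sum(q[m] for q in per_query.values()) / len(per_query)
                for m in measure_names}
    return per_query, averaged

if __name__ == "__main__":
    # Tiny illustrative input; replace with the released qrels and run files.
    qrels = {"q1": {"d1": 2, "d2": 0, "d3": 1}}
    run = {"q1": {"d1": 13.2, "d2": 11.7, "d3": 9.4}}
    print(evaluate(qrels, run)[1])
```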
Reviewer Lens By Paper Type
A ranking paper needs strong baselines, metric justification, and leakage control. A recommender paper needs split discipline, candidate set clarity, user or item cold-start context, and comparison to simple baselines. A RAG paper needs retrieval-grounding evidence separate from generation quality. A conversational search paper needs session, interaction, and user intent clarity. A collection paper needs construction, annotation, licensing, and reuse value. An evaluation paper needs metric validity and failure-mode analysis.
The AI manuscript review can flag whether the blocking risk is task definition, metrics, baselines, leakage, artifacts, or venue fit.
How To Avoid Cannibalizing ML Pages
Use this page when the manuscript's submission risk depends on search, ranking, relevance judgments, retrieval metrics, collections, recommender evaluation, RAG retrieval evidence, or IR venue fit. Use ML review when the main claim is a general learning method, architecture, optimization, or benchmark outside retrieval-specific evaluation.
That distinction keeps the page focused on the IR buyer's actual problem.
What Not To Submit Yet
Do not submit an IR paper if the evaluation task cannot be stated in one sentence. If reviewers cannot tell what kind of retrieval success the paper optimizes, they will disagree about whether the metrics, baselines, and collection are appropriate.
Also pause if the strongest result depends on a baseline that may be undertuned. IR reviewers are used to seeing new methods beat weak comparisons. A simple, well-tuned lexical or hybrid baseline can be more damaging to a paper than a complex competing model.
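To make the "well-tuned lexical baseline" point concrete, the sketch below grid-searches BM25's k1 and b on a validation split before any comparison is reported. It assumes the rank_bm25 package, whitespace tokenization, and small in-memory collections; all of these are placeholders for a real indexing pipeline, and the grid values are illustrative.

```python
# Hypothetical baseline-tuning sketch: grid-search BM25 hyperparameters on a
# validation split before comparing against the proposed model.
import math
from rank_bm25 import BM25Okapi

def ndcg_at_10(ranked_doc_ids, qrels_for_query):
    """qrels_for_query: {doc_id: graded relevance}."""
    gains = [qrels_for_query.get(d, 0) for d in ranked_doc_ids[:10]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(qrels_for_query.values(), reverse=True)[:10]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def tune_bm25(docs, doc_ids, val_queries, val_qrels):
    """docs: raw document strings; val_queries: {qid: text}; val_qrels: {qid: {doc_id: rel}}."""
    tokenized = [d.lower().split() for d in docs]
    best_params, best_ndcg = None, -1.0
    for k1 in (0.6, 0.9, 1.2, 1.5, 2.0):
        for b in (0.3, 0.5, 0.75, 0.9):
            bm25 = BM25Okapi(tokenized, k1=k1, b=b)
            per_query = []
            for qid, query in val_queries.items():
                scores = bm25.get_scores(query.lower().split())
                ranked = [doc_ids[i] for i in sorted(range(len(docs)), key=lambda i: -scores[i])]
                per_query.append(ndcg_at_10(ranked, val_qrels.get(qid, {})))
            mean_ndcg = sum(per_query) / max(len(per_query), 1)
            if mean_ndcg > best_ndcg:
                best_params, best_ndcg = (k1, b), mean_ndcg
    return best_params, best_ndcg  # report the tuned setting, not the library default
```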
For RAG or conversational search papers, pause again if generation quality is masking retrieval weakness. The manuscript should separate retrieval relevance from downstream answer fluency.
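One way to keep retrieval weakness visible is to report retrieval grounding on its own, per question, before any generation metric is computed. The sketch below assumes gold passage IDs exist for each question and that answer quality is scored separately elsewhere; both are assumptions about the specific RAG setup rather than a fixed protocol.

```python
def retrieval_grounding_report(retrieved, gold_passages, k=5):
    """retrieved: {question_id: [passage_id, ...]} ranked best-first.
    gold_passages: {question_id: set of relevant passage IDs}.

    Reports recall@k of the retriever on its own, so a fluent generator cannot
    hide an irrelevant context window. Score answer quality separately.
    """
    per_question = {}
    for qid, ranked in retrieved.items():
        gold = gold_passages.get(qid, set())
        hits = sum(1 for pid in ranked[:k] if pid in gold)
        per_question[qid] = hits / max(len(gold), 1)
    macro_recall = sum(per_question.values()) / max(len(per_question), 1)
    return per_question, macro_recall
```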
For recommender papers, pause if the candidate set and negative-sampling logic are not explicit. A model can look strong when the evaluation excludes realistic alternatives or samples negatives too easily. Reviewers need to know whether the task reflects the choice set users or systems actually face, not just a convenient offline benchmark.
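The candidate-set concern can be demonstrated directly: the same scores produce very different hit@10 numbers when the positive item is ranked against the full catalogue versus 100 sampled negatives. The sketch below uses a random scorer purely to show the gap; the item count, sample size, and scoring function are placeholder assumptions.

```python
import random

def hit_at_k(scores, positive_item, k=10):
    ranked = sorted(scores, key=scores.get, reverse=True)
    return positive_item in ranked[:k]

def compare_protocols(n_items=10_000, n_users=500, n_sampled_negatives=100, k=10, seed=7):
    """Shows how sampled-negative evaluation inflates hit@k relative to full ranking.

    Uses random scores as a placeholder model: full-ranking hit@10 lands near
    k / n_items, while sampled evaluation lands near k / (n_sampled_negatives + 1).
    """
    rng = random.Random(seed)
    full_hits = sampled_hits = 0
    for _ in range(n_users):
        positive = 0  # one held-out positive item per user
        scores = {item: rng.random() for item in range(n_items)}  # placeholder scorer
        full_hits += hit_at_k(scores, positive, k)
        negatives = rng.sample(range(1, n_items), n_sampled_negatives)
        sampled = {item: scores[item] for item in [positive] + negatives}
        sampled_hits += hit_at_k(sampled, positive, k)
    return full_hits / n_users, sampled_hits / n_users
```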
If the paper uses click logs, make the bias story explicit. Position bias, popularity bias, bot traffic, and changing product surfaces can turn a clean-looking signal into a misleading relevance proxy.
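A standard mitigation for position bias is to weight clicks by an estimated examination propensity per rank before treating them as relevance labels. The sketch below assumes propensities were estimated elsewhere (for example via result randomization) and uses a plain inverse-propensity estimate; the log format and the propensity values are illustrative assumptions, and popularity bias and bot filtering still need separate handling.

```python
def ips_weighted_relevance(click_log, propensity_by_rank):
    """click_log: iterable of (query_id, doc_id, rank, clicked) tuples.
    propensity_by_rank: {rank: estimated probability that the position was examined}.

    Returns a debiased relevance estimate per (query, doc): clicks are up-weighted
    by 1/propensity so low-ranked documents are not mistaken for irrelevant ones.
    Assumes propensities were estimated elsewhere (e.g. via result randomization).
    """
    estimates, counts = {}, {}
    for qid, doc_id, rank, clicked in click_log:
        p = propensity_by_rank.get(rank)
        if not p:  # skip ranks without a trustworthy propensity estimate
            continue
        key = (qid, doc_id)
        estimates[key] = estimates.get(key, 0.0) + (clicked / p)
        counts[key] = counts.get(key, 0) + 1
    return {key: estimates[key] / counts[key] for key in estimates}
```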
Submit If / Think Twice If
Submit if:
- retrieval task and collection are clear
- relevance judgments are credible
- metrics match the use case
- baselines are strong and fair
- leakage checks are explicit
- artifacts support reproduction
Think twice if:
- task definition shifts across sections
- labels or pooling are underexplained
- a weak baseline carries the result
- RAG claims blur retrieval and generation
Readiness check
Run the scan to see how your manuscript scores on these criteria.
See score, top issues, and what to fix before you submit.
Bottom Line
Pre-submission review for information retrieval papers should protect the link between retrieval evaluation and retrieval claim. The manuscript needs task clarity, credible judgments, fair baselines, leakage control, reproducible artifacts, and a venue target that fits the contribution.
Use the AI manuscript review if you need a fast readiness diagnosis before submitting an IR paper.
- https://sigir.org/general-information/acm-sigir-artifact-badging/
- https://www.acm.org/publications/reproducibility
- https://arxiv.org/abs/2503.07823
- https://reviewers.acm.org/training-course/review-criteria
Frequently asked questions
What is a pre-submission review for information retrieval papers?
It is a field-specific review that checks whether an IR manuscript is ready for SIGIR-style or journal submission, including task definition, collection, relevance judgments, metrics, baselines, leakage, reproducibility artifacts, user context, and venue fit.
What do IR reviewers criticize most often?
They most often target weak baselines, an unclear retrieval task, inconsistent relevance labels, metric mismatch, train-test leakage, poor collection construction, irreproducible artifacts, and claims that do not match the tested retrieval setting.
How does information retrieval review differ from machine learning review?
Machine learning review focuses broadly on model contribution and benchmark evidence. Information retrieval review focuses on search tasks, ranking, collections, relevance judgments, evaluation metrics, indexing, user intent, and retrieval artifact reproducibility.
When should I use an information retrieval pre-submission review?
Use it before submitting search, ranking, recommender, retrieval-augmented generation, conversational search, evaluation, or collection papers to SIGIR- or TOIS-style venues where evaluation design could decide the review.
Final step
Find out if this manuscript is ready to submit.
Run the Free Readiness Scan. See score, top issues, and journal-fit signals before you submit.
Anthropic Privacy Partner. Zero-retention manuscript processing.