How Manusights Sentry works
Full data-source and algorithm disclosure for Sentry, the Reference Integrity Checker. Every per-reference verdict traces back to a public, citable dataset (Crossref, Retraction Watch, the Retraction Watch Hijacked Journal Checker). No black boxes; no hand-wavy “AI says.”
Last reviewed: April 2026 · Sentry v1.0
Architecture: parse → resolve → enrich → classify
- Parse: auto-detect input format (DOIs / BibTeX / RIS / plain text) using citation-js for structured formats and doi-regex for plaintext sweeps. Output: normalized ParsedReference[].
- Resolve: single Redis round-trip against the nightly Retraction Watch mirror to fast-path retractions; concurrency-3 batch fetch against Crossref REST API for paper metadata + update-to[] arrays.
- Enrich: for non-retracted references, match the reference’s cited URL/domain against Retraction Watch’s Hijacked Journal Checker clone-domain set (never ISSN or journal name; see the Hijacked section below for why).
- Classify: severity precedence: retracted > expression-of-concern > hijacked-journal > correction > clean.
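To make the plaintext branch of the Parse step concrete, here is a minimal Python sketch of a DOI sweep. It is an illustration only: Sentry itself uses citation-js and doi-regex, and the regular expression, the `normalize_doi` helper, and the two-field `ParsedReference` shape below are hypothetical stand-ins rather than the production definitions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern (not the doi-regex package's): "10.", a 4-9 digit
# registrant prefix, "/", then a suffix that runs until whitespace or a quote.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)


@dataclass
class ParsedReference:
    raw: str            # the reference line as pasted
    doi: Optional[str]  # normalized DOI, or None if the line has no DOI


def normalize_doi(doi: str) -> str:
    """Heuristic: drop trailing sentence punctuation, then lowercase."""
    return doi.rstrip(".,;)]").lower()


def parse_plaintext(bibliography: str) -> List[ParsedReference]:
    """Treat each non-empty line as one reference and sweep it for a DOI."""
    refs = []
    for line in bibliography.splitlines():
        line = line.strip()
        if not line:
            continue
        match = DOI_PATTERN.search(line)
        doi = normalize_doi(match.group(0)) if match else None
        refs.append(ParsedReference(raw=line, doi=doi))
    return refs
```

A reference with no DOI keeps `doi=None`, which corresponds to the Unverifiable outcome described under Severity precedence.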
Crossref REST API
Primary source for paper metadata and update-to[] retraction notices
Sentry queries https://api.crossref.org/works/{DOI} with a mailto-identified User-Agent, which places its requests in Crossref’s polite pool (50 req/s rate limit, effective Dec 1 2025). We run at concurrency 3 to stay well under that limit with a safety margin. The response includes:
- title, author, container-title, ISSN, publisher, issued.date-parts — for the per-reference card display
- update-to[] — the critical field. Array of update notices linked TO this work. Non-empty for retracted / EoC / corrected papers. Each entry carries a type (“retraction” / “expression-of-concern” / “correction” / etc.), the notice DOI, the notice label, and the registration timestamp.
Per-DOI responses are cached in Upstash Redis for 24 hours. Crossref data updates daily; retraction notices typically propagate within 24-72 hours of publication.
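The request and response handling described above can be sketched roughly as follows, using only the Python standard library. This is not Sentry's code: the function names, the `reference-checker-sketch/0.1` User-Agent string, and the entry keys read from each `update-to` item (`type`, `DOI`, `label`) are assumptions that follow this page's description of the field.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

CROSSREF_WORKS = "https://api.crossref.org/works/"


def build_request(doi: str, mailto: str) -> urllib.request.Request:
    """Build a request whose User-Agent carries the mailto identifier."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    user_agent = f"reference-checker-sketch/0.1 (mailto:{mailto})"  # hypothetical name
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def extract_updates(message: dict) -> List[Dict[str, str]]:
    """Pull update notices out of a work record; key names are assumptions."""
    updates = []
    for entry in message.get("update-to", []):
        updates.append({
            "type": entry.get("type", ""),
            "notice_doi": entry.get("DOI", "").lower(),
            "label": entry.get("label", ""),
        })
    return updates


def fetch_work(doi: str, mailto: str) -> dict:
    """Fetch one work; the record sits under the "message" key of the JSON body."""
    with urllib.request.urlopen(build_request(doi, mailto), timeout=10) as resp:
        return json.load(resp)["message"]


def fetch_all(dois: List[str], mailto: str) -> List[dict]:
    """Fetch works with at most three requests in flight at a time."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(lambda d: fetch_work(d, mailto), dois))
```

Capping the pool at three workers is one simple way to express the concurrency-3 limit; a production client would also want retries, error handling, and the 24-hour cache in front of `fetch_work`.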
Reference: Crossref REST API documentation.
Retraction Watch dataset
ISSN: 2692-4579 · Fast-path retraction lookup with full reason metadata
Crossref Labs hosts the Retraction Watch dataset as a downloadable CSV with per-DOI retraction metadata. We mirror this nightly via a cron job into a Redis hash keyed by DOI, so retraction lookups are O(1) and don’t require an extra Crossref call. Fields used:
- OriginalPaperDOI — primary key, lowercase normalized
- RetractionNature — controlled vocabulary: “Retraction”, “Expression of Concern”, “Correction”, “Reinstatement”
- RetractionDate — ISO YYYY-MM-DD
- Reason — semicolon-separated controlled-vocabulary categories from the Retraction Watch user guide (e.g., “Falsification/Fabrication of Data”, “Investigation by Company/Institution”, “Plagiarism of Article”)
- RetractionDOI — linkable DOI of the retraction notice itself
When a DOI appears in both Crossref’s update-to[] and the RW CSV, the RW entry takes precedence because it carries richer reason metadata. Source attribution (“via Retraction Watch” or “via Crossref”) is shown per-flag in the result UI.
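The mirror-and-precedence logic can be illustrated with a small Python sketch. A plain dict stands in for the Redis hash, the column names are the ones listed above, and `build_rw_index` and `pick_flag` are hypothetical helper names, not Sentry's own.

```python
import csv
import io
from typing import Dict, List, Optional


def build_rw_index(csv_text: str) -> Dict[str, dict]:
    """Index Retraction Watch rows by lowercase DOI (a dict standing in for Redis).

    Simplification: if a DOI has several rows, the last one wins.
    """
    index: Dict[str, dict] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        doi = row.get("OriginalPaperDOI", "").strip().lower()
        if not doi:
            continue
        index[doi] = {
            "nature": row["RetractionNature"],
            "date": row["RetractionDate"],
            # Reason is semicolon-separated; drop empty trailing pieces.
            "reasons": [r.strip() for r in row["Reason"].split(";") if r.strip()],
            "notice_doi": row["RetractionDOI"],
        }
    return index


def pick_flag(doi: str, rw_index: Dict[str, dict],
              crossref_updates: List[dict]) -> Optional[dict]:
    """Prefer the Retraction Watch entry; fall back to the first Crossref notice."""
    rw = rw_index.get(doi.lower())
    if rw is not None:
        return {"source": "via Retraction Watch", **rw}
    if crossref_updates:
        first = crossref_updates[0]
        return {"source": "via Crossref", "nature": first["type"],
                "reasons": [], "notice_doi": first["notice_doi"]}
    return None
```

The `source` string mirrors the per-flag attribution shown in the result UI; the Crossref fallback carries an empty `reasons` list because, as noted above, only the Retraction Watch entry includes reason metadata.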
Reference: The Retraction Watch Database. Retraction Watch is operated by The Center for Scientific Integrity, a 501(c)(3) nonprofit, edited by Ivan Oransky and Adam Marcus.
DOAJ withdrawn-journals log
Directory of Open Access Journals · Misconduct-removal flag
DOAJ maintains a public record of journals removed from the directory for reasons including editorial misconduct, predatory practices, and ceased publication. DOAJ-withdrawn screening is not currently active in Sentry: DOAJ does not publish a reliable machine-readable withdrawn-journals feed (the historical change log is being retired), so rather than ship an unreliable check we have held it back. When a dependable source is available, Sentry will surface these as “DOAJ-withdrawn” with the reason category.
Reference: Directory of Open Access Journals (DOAJ).
Retraction Watch Hijacked Journal Checker
Tracker of titles being impersonated by predatory websites
Retraction Watch maintains a Hijacked Journal Checker (public Google Sheet) tracking journals being impersonated by predatory clone websites that mimic established venues. Sentry mirrors this list weekly into a clone-domain set, layered on top of a bundled baseline list that ships with the app, so the check keeps working between syncs.
We match by clone URL/domain only — never by ISSN or journal name. This is deliberate, and it is the safety-critical design choice of this check: a hijacked clone steals the legitimate journal’s exact ISSN and title, so matching on either would falsely flag legitimately-cited papers in the real journal. The clone’s web domain is unique to the clone, so it is the only false-positive-free signal. As a safety belt we also subtract every legitimate-journal domain from the clone set. A reference is flagged only when its text cites a known clone domain; a paper that resolves cleanly via Crossref is never flagged.
When flagged, Sentry names the clone domain and notes that hijacked sites copy a real journal’s name and ISSN, so the reader should confirm the cited content is actually from the legitimate journal rather than the clone.
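A domain-only match of this kind might look like the following Python sketch. The URL pattern, the helper names, and the `.example` domains are illustrative assumptions; the point is that ISSN and journal title never enter the comparison, and that legitimate domains are subtracted from the clone set before matching.

```python
import re
from typing import Iterable, Optional, Set
from urllib.parse import urlparse

# Hypothetical URL sweep: anything starting with http(s):// up to whitespace.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+', re.IGNORECASE)


def domain_of(url: str) -> str:
    """Lowercased host without a leading "www." or trailing punctuation."""
    host = (urlparse(url).hostname or "").lower().rstrip(".,;)")
    return host[4:] if host.startswith("www.") else host


def effective_clone_set(clones: Iterable[str], legit: Iterable[str]) -> Set[str]:
    """Safety belt: remove every legitimate-journal domain from the clone set."""
    return {d.lower() for d in clones} - {d.lower() for d in legit}


def find_clone_domain(reference_text: str, clone_domains: Set[str]) -> Optional[str]:
    """Return the first cited clone domain, or None. ISSN and title are ignored."""
    for url in URL_PATTERN.findall(reference_text):
        domain = domain_of(url)
        if domain in clone_domains:
            return domain
    return None
```

A reference that mentions only a journal name or ISSN returns `None` here by construction, which is the behavior the design above calls for.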
Reference: Retraction Watch Hijacked Journal Checker.
Severity precedence
When multiple checks return positives for the same reference, Sentry collapses to a single highest-severity verdict for the card UI:
- Retracted — trumps everything; remove or replace the citation
- Expression of Concern — publisher-level concern, paper not retracted
- Hijacked Journal — reference cites a known clone domain; verify the citation source
- Correction — corrected/erratum/addendum; verify any specific numerical claims
- Clean — DOI resolves, no integrity flags
- Unverifiable — no DOI in the entry, or DOI did not resolve via Crossref
Hijacked-journal outranks correction because it’s a venue-level integrity category: a correction in a normal journal is a footnote, but a reference that cites a hijacked-journal clone is one reviewers may flag regardless of the paper’s own merit.
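The collapse to a single verdict can be expressed as an ordered enum, as in this Python sketch. The enum and function names are hypothetical, and one detail is an assumption not stated on this page: any positive flag is taken to outrank Unverifiable, so a reference without a DOI that cites a clone domain would still be reported as Hijacked Journal.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    """Higher value means higher precedence."""
    CLEAN = 0
    CORRECTION = 1
    HIJACKED_JOURNAL = 2
    EXPRESSION_OF_CONCERN = 3
    RETRACTED = 4


def classify(flags: Iterable[Severity], doi_resolved: bool) -> str:
    """Collapse all positive checks for one reference into a single verdict."""
    flags = list(flags)
    if flags:
        # Assumption: a positive flag wins even when the DOI did not resolve.
        return max(flags).name
    return "CLEAN" if doi_resolved else "UNVERIFIABLE"
```

For example, `classify([Severity.CORRECTION, Severity.HIJACKED_JOURNAL], True)` yields `"HIJACKED_JOURNAL"`, matching the ordering explained above.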
Data handling
- Pasted bibliographies are not stored. Submitted text is sent to the parser and discarded. Per-DOI Crossref responses are cached for 24 hours; per-bibliography results are cached server-side for 24 hours keyed by content hash, with no original text stored alongside the cache key.
- No account, no email gate. The tool works without sign-up. Rate limits are per-IP, not per-account: 20 checks per hour, 80 per day.
- Share URLs are not indexed. Per-result share URLs (/share/{token}) carry noindex so they don’t dilute the canonical tool page in search results.
- Crossref polite-pool identification. All Crossref API calls include the mailto:erik@manusights.com identifier in the User-Agent, per Crossref’s requested practice.
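As an illustration of caching per-bibliography results by content hash without keeping the pasted text, consider this Python sketch. SHA-256, the whitespace normalization, the `bib:` key prefix, and the in-memory `ResultCache` class are all assumptions made for the example; the page above only states that results are keyed by a content hash with a 24-hour lifetime.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # 24-hour lifetime, as described above


def cache_key(bibliography: str) -> str:
    """Hash lightly normalized text; the key reveals nothing readable."""
    normalized = "\n".join(line.strip() for line in bibliography.strip().splitlines())
    return "bib:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory stand-in for a server-side cache; stores results, never the text."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, dict]] = {}

    def put(self, bibliography: str, result: dict, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[cache_key(bibliography)] = (now + TTL_SECONDS, result)

    def get(self, bibliography: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        entry = self._store.get(cache_key(bibliography))
        if entry is None or entry[0] <= now:
            return None  # missing or expired
        return entry[1]
```

Only the digest and the result are held, so an identical resubmission within 24 hours hits the cache while the original text is never written alongside the key.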
How to cite Manusights Sentry
If you reference Sentry in a manuscript methods section or supplementary materials, please cite as:
Manusights. (2026). Sentry v1.0: Reference Integrity Checker (Crossref + Retraction Watch + Hijacked Journal Checker) [Free academic tool]. https://manusights.com/tools/reference-integrity Methodology: https://manusights.com/tools/reference-integrity/methodology (Accessed: YYYY-MM-DD)
When citing the underlying data sources directly (which we recommend for methods-section disclosures), please credit Crossref, Retraction Watch (ISSN 2692-4579), and the Retraction Watch Hijacked Journal Checker. See the About page for full attribution.
Want every citation claim verified, not just every reference? The full Manusights Readiness Scan reads your entire manuscript: it runs the same Crossref + Retraction Watch + hijacked-journal checks AND verifies whether each citation actually supports the claim it’s attached to, alongside methods, statistics, and journal fit. Free preview, $39 only if you want the full report.