Immunology · eLife 2025

AI Diagnostic vs. eLife Open Peer Review

A skin resident memory T cell paper with two public reviewer assessments. We ran our diagnostic blind, then compared every finding against eLife’s published evaluations.

5/10

Reviewer concerns matched

+7

Issues reviewers missed

15 min

Analysis time

$49

Diagnostic cost

The paper

“Epidermal resident memory T cell fitness requires antigen encounter in the skin”

Weiss et al. investigate how local antigen encounter during tissue seeding programs long-term fitness of skin-resident memory T cells (TRM). Using a dual-flank vaccinia/DNFB model, they identify TGFβRIII as a TCR-inducible co-receptor that lowers the TGFβ threshold for epidermal TRM persistence. The paper includes scRNA-seq, APL dose-response, and inducible CD8-specific Tgfbr3 knockout experiments.

DOI: 10.7554/eLife.107096 · 2 public reviewers · eLife Assessment: “important” finding, “convincing” evidence

Issue-by-issue comparison

Tgfbr3 knockout only phenocopies bystander TRM under pharmacological TGFβ stress, not at steady state

Matched

Both AI and Reviewer 1 flagged that the KO phenotype is conditional on CWHM12 treatment.

In vitro TCR stimulation data (Fig 6A) is mechanistically underdetermined

Matched

AI identified need for calcineurin inhibitor controls; Reviewer 2 questioned whether anti-CD3/CD28 cleanly attributes upregulation to TCR signaling.

scRNA-seq cluster 3 needs independent validation

Matched

AI recommended flow cytometry/IF validation; Reviewer 1 requested independent protein-level confirmation.

TGFβRIII measurement needed in high/low avidity peptide model

Matched

AI Section 5 recommended adding TGFβRIII readout to APL experiment; Reviewer 2 made the same request.

Need physiological TGFβ-limiting conditions to test model (not just pharmacological)

Matched

AI Weakness #1 and Reviewer 1 both noted the reliance on CWHM12 rather than physiological competition.

FTY720 half-life concern — drug may wear off before recall challenge

Reviewer only

Reviewer 1 raised concerns about FTY720 pharmacokinetics affecting recall experiment interpretation.

TGFβRIII regulation at RNA vs protein level discrepancy

Reviewer only

Reviewer 2 noted potential disconnect between mRNA and surface protein levels of TGFβRIII.

Quantify skin draining lymph node CD8+ T cells

Reviewer only

Reviewer 2 requested LN quantification to rule out differential drainage effects.

Reliance on prior published models without repeating key controls

Reviewer only

Reviewer 2 noted that some conclusions depend on previously published data without independent replication.

Single experimental model — antigen-independent factors could contribute

Reviewer only

Reviewer 1 questioned whether VV-specific inflammatory signals, not just antigen, drive the fitness advantage.

Missing Kurd et al. (2020) on TRM transcriptional heterogeneity

AI only

AI identified that scRNA-seq cluster 3 is analogous to differentiated TRM states in Kurd et al. intestinal data.

Missing Bromley et al. (2020) on CD49a-mediated TRM persistence

AI only

CD49a expression could confound the persistence advantage attributed to TGFβRIII.

Discrepancy with Solouki et al. (2020) on TCR signal strength and memory formation

AI only

Solouki et al. showed that reduced TCR signal strength enhanced circulating memory formation; this paper shows the opposite for TRM. The authors do not address the discrepancy.

In vitro TCR stimulation needs calcineurin inhibitor and cytokine-only controls

AI only

Without pathway-specific controls, Fig 6A cannot distinguish TCR-specific from general activation.

Sample sizes inconsistently reported across experiments

AI only

N numbers absent or unclear in Figs. 3C-G, 4C-D, and 6E-J.

Figure presentation issues — missing error bars, colorblind inaccessibility

AI only

Panels 1C, 1F, 3C, 3F, 3G, 4C, 4D lack error bars; UMAP color scheme not colorblind-accessible.

HTODemux batch effects could generate spurious scRNA-seq clustering

AI only

Cluster 3 could reflect hashtag capture efficiency differences rather than biological state.

“The AI diagnostic matched half of the reviewer concerns raised by two eLife experts, and identified seven additional issues none of them caught — including missing engagement with three directly relevant published studies.”

Manusights validation analysis, March 2026

Why this case study matters

eLife publishes its reviewer assessments alongside each paper. That makes it a strong benchmark for validating an AI review system: the output can be compared directly against what the expert reviewers wrote, with no cherry-picking.

Our diagnostic caught 5 of the 10 major concerns the two reviewers raised. But it also found 7 issues they didn’t flag: missing citations to three directly relevant papers (Kurd et al., Bromley et al., Solouki et al.), the need for pathway-specific TCR controls, inconsistent sample size reporting, figure presentation problems, and potential batch effects in the scRNA-seq clustering.

The 5 reviewer concerns we missed were more nuanced experimental design questions: FTY720 pharmacokinetics, the RNA-versus-protein discrepancy in TGFβRIII regulation, quantification of CD8+ T cells in the skin-draining lymph node, reliance on prior published models without repeating key controls, and reliance on a single experimental model. These are the kinds of domain-specific judgment calls where experienced human reviewers still have an edge.
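The headline numbers follow directly from the labels in the issue-by-issue comparison. As a minimal sketch (the `summarize` helper and the label list are illustrative and not part of the diagnostic itself), the tally can be reproduced like this:

```python
from collections import Counter

# Labels copied from the issue-by-issue comparison above:
# 5 "Matched", 5 "Reviewer only", 7 "AI only".
labels = ["Matched"] * 5 + ["Reviewer only"] * 5 + ["AI only"] * 7


def summarize(labels):
    """Tally comparison labels into the headline statistics (hypothetical helper)."""
    counts = Counter(labels)
    matched = counts["Matched"]
    # Every reviewer concern is either matched by the AI or reviewer-only.
    reviewer_total = matched + counts["Reviewer only"]
    return {
        "matched": matched,
        "reviewer_total": reviewer_total,
        "match_rate": matched / reviewer_total,
        "ai_only": counts["AI only"],
    }


summary = summarize(labels)
print(
    f"{summary['matched']}/{summary['reviewer_total']} reviewer concerns matched "
    f"({summary['match_rate']:.0%}), +{summary['ai_only']} AI-only issues"
)
```

Note that the denominator counts only concerns raised by the reviewers; the 7 AI-only issues are reported separately and do not affect the match rate.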

The takeaway: AI and human review are complementary. A $49 diagnostic in 15 minutes catches different things than months of expert evaluation. Used together, they provide coverage neither achieves alone.