Manuscript

Biomedical

Accounting for contact network uncertainty in epidemic inferences

A simulation-based-inference method for epidemic models when the underlying contact network is only partially observed, applied to a real Shark Bay dolphin tattoo-skin-disease dataset. The method is genuinely interesting; the manuscript currently has model-definition contradictions and a missing reproducibility package that would draw a desk reject from PLOS Computational Biology.

Abstract

Inferring the parameters of an epidemic process when the underlying contact network is only partially observed is a fundamental open problem in network epidemiology. We present a simulation-based-inference method that combines a Mixture Density Network compressor with Approximate Bayesian Computation (MDN-ABC) to jointly infer epidemic parameters and the latent contact structure from periodic observations of node disease status and noisy network samples. We validate the approach on simulated epidemics over Erdős–Rényi and log-normal-degree networks, and apply it to tattoo-skin-disease (TSD) surveillance data on Shark Bay bottlenose dolphins. The method recovers transmission parameters with calibrated uncertainty under both correctly specified and mildly misspecified network priors, and produces age-resolved susceptibility estimates for the dolphin application.

1. Introduction

A central challenge in network epidemiology is that the contact structure A on which an outbreak unfolds is rarely directly observed. Self-reported contact diaries, automated proximity sensors, and genomic-based reconstruction each give noisy partial views. Standard approaches either ignore network uncertainty (treating A as known) or marginalize over a prior class (e.g. assuming Erdős–Rényi structure). Both can substantially mis-state uncertainty in the inferred transmission parameters when the true contact distribution differs from the assumed prior.

In this work we develop a simulation-based-inference framework that jointly accounts for epidemic parameter uncertainty and contact network uncertainty. The core idea is to parameterize the network observation process explicitly, then use Approximate Bayesian Computation with a Mixture Density Network compression of the joint epidemic-network summary statistics. The result is a calibrated posterior over both the latent network A and the disease parameters θ=(β,γ), requiring only forward simulation of the joint process.

2. Methods: MDN-ABC for epidemic-network joint inference

Let X = {X_t}_{t=0}^{T} denote the periodic node-status observations, Y = {Y_w}_{w=1}^{W} the noisy yearly network samples, and A the latent true contact network. We target the joint posterior

P(θ, A, ϕ | X, Y) ∝ P(X | θ, A) P(Y | A, ϕ) P(A) P(θ) P(ϕ),

where ϕ parameterizes the network observation process. The likelihood is intractable so we approximate the posterior via simulation: sample (θ′,A′,ϕ′) from the prior, simulate (X′,Y′), compute a learned summary-statistic distance d(⋅,⋅), and accept the proposed parameters within an ABC tolerance band. The MDN compressor is trained offline to minimize the Bayes-optimal summary loss for θ.
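The accept/reject loop described above can be sketched in a few lines. This is an illustrative rejection-ABC skeleton, not the authors' implementation: the names (`abc_rejection`, `summary`) are ours, a plain function stands in for the trained MDN compressor, and a toy Gaussian-mean problem stands in for the joint epidemic-network simulator.

```python
import numpy as np

def abc_rejection(prior_sampler, simulator, summary, x_obs, n_draws=10_000, tol=0.1):
    """Rejection ABC: keep parameter draws whose simulated summaries
    fall within `tol` of the observed summary.

    `summary` stands in for the trained MDN compressor; here it is any
    function mapping raw data to a low-dimensional statistic vector.
    """
    s_obs = summary(x_obs)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler()          # sample parameters from the prior
        x_sim = simulator(theta)         # forward-simulate data
        d = np.linalg.norm(summary(x_sim) - s_obs)  # learned-summary distance
        if d < tol:                      # ABC tolerance band
            accepted.append(theta)
    return np.array(accepted)

# Toy check: infer the mean of a Gaussian with known sd = 1.
rng = np.random.default_rng(0)
x_obs = rng.normal(2.0, 1.0, size=100)
post = abc_rejection(
    prior_sampler=lambda: rng.uniform(-5, 5),
    simulator=lambda th: rng.normal(th, 1.0, size=100),
    summary=lambda x: np.array([x.mean()]),
    x_obs=x_obs,
    tol=0.2,
)
```

In the paper's setting, `theta` would be the triple (θ′, A′, ϕ′) and `simulator` the joint epidemic-plus-network-observation process; the structure of the loop is unchanged.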

3. Simulation study: Erdős–Rényi and log-normal degree networks

We simulate continuous-time SIR epidemics on networks of N=200 nodes. Transmission rate β and recovery rate γ are defined as continuous-time-Markov rates,

β = lim_{Δt→0} (1/Δt) · P(W_{t+Δt}^i = I | W_t^i = S, Σ_j C_t^{ij} W_t^j > 0).

We periodically observe binary disease status of each node every 7 timesteps over 50 timesteps total. The status of the node is considered to be 1 if the node is infected and 0 if susceptible or recovered. We re-simulate the original epidemic 10 times for each scenario to produce posterior bands.
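A continuous-time Markov SIR epidemic on a fixed network, observed at discrete times, can be simulated with the Doob–Gillespie algorithm. The sketch below is our own minimal implementation of that standard algorithm (function name and return format are assumptions, not the authors' code); observed statuses at t ∈ {0, 7, 14, …} would be read off by taking the last event before each observation time.

```python
import numpy as np

def gillespie_sir(adj, beta, gamma, t_max, seed_node=0, rng=None):
    """Continuous-time Markov SIR on a fixed contact network `adj`
    (N x N 0/1 adjacency matrix), simulated with the Doob-Gillespie
    algorithm. Returns (time, state) pairs; states are 0=S, 1=I, 2=R.
    """
    rng = rng or np.random.default_rng()
    n = adj.shape[0]
    state = np.zeros(n, dtype=int)
    state[seed_node] = 1
    t, events = 0.0, [(0.0, state.copy())]
    while t < t_max:
        infected = np.flatnonzero(state == 1)
        if infected.size == 0:
            break                                  # epidemic died out
        # per-node infection pressure: beta * (number of infected neighbours)
        inf_rate = beta * adj[:, infected].sum(axis=1) * (state == 0)
        rec_rate = gamma * (state == 1)
        total = inf_rate.sum() + rec_rate.sum()
        t += rng.exponential(1.0 / total)          # time to next event
        if rng.random() < inf_rate.sum() / total:  # infection event
            node = rng.choice(n, p=inf_rate / inf_rate.sum())
            state[node] = 1
        else:                                      # recovery event
            node = rng.choice(n, p=rec_rate / rec_rate.sum())
            state[node] = 2
        events.append((t, state.copy()))
    return events

# Demo on a complete graph of 5 nodes.
rng = np.random.default_rng(1)
adj = np.ones((5, 5), dtype=int) - np.eye(5, dtype=int)
events = gillespie_sir(adj, beta=1.0, gamma=0.5, t_max=50.0, rng=rng)
final_state = events[-1][1]
```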

Two ground-truth network classes are used: Erdős–Rényi with mean degree 4, and log-normal-degree with mean degree 4. The observation model assumes Bernoulli edge sampling, A_ij ∼ Bernoulli(ρ), which matches the ER prior but is a misspecified prior for the log-normal case — we use this contrast to characterize how robust posterior coverage is to network-prior misspecification.

4. Application: Tattoo-skin-disease in Shark Bay dolphins

We apply MDN-ABC to a real surveillance dataset on Shark Bay bottlenose dolphins, where individuals are followed across years and skin-lesion presence is recorded as a binary observation. Yearly inferred contact networks Y_w for w = 1, …, 5 come from focal-follow association indices.

In Section 4.1 we model TSD as a discrete-time SIR process on the latent contact graph, with age-class-specific susceptibility for calves and adults. Disease status is W_t^i ∈ {S, E, I, R} (with the role of E specified later in Sec. 4.1.1) and transmission probabilities are

β_c = P(W_{t+Δt}^i = E via j | W_t^i = S, W_t^j = I, C_t^i = calf).

In Section 4.2 we model the noisy yearly network observations. Observed edge counts are negative-binomially distributed conditional on whether a true edge is present in that year:

X_w^{ij} | A_w^{ij} = 0 ∼ NegBin(n_{0w}, p_{0w}),    X_w^{ij} | A_w^{ij} = 1 ∼ NegBin(n_{1w}, p_{1w}).

Priors are n_{0w}, n_{1w}, p_{0w} ∼ Gamma(2, 4) and ρ_w ∼ Beta(1, 20). The full vector of network observation parameters is ϕ = (n_{01}, …, n_{05}, p_{01}, …, p_{05}, n_{11}, …, n_{15}, p_{11}, …, p_{15}, ρ_1, …, ρ_5). We fix p_{0w} = 1 for all w to address weak identifiability.
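One year of this observation model can be sampled directly. The sketch below is our illustration under numpy's (n, p) negative-binomial convention (mean n(1−p)/p); the function name and the symmetrisation step are assumptions, not the manuscript's code.

```python
import numpy as np

def sample_observed_counts(a_w, n0, p0, n1, p1, rng=None):
    """Draw one year of observed edge counts X_w given the latent
    network A_w: NegBin(n0, p0) where no edge exists, NegBin(n1, p1)
    where an edge exists. Uses numpy's (n, p) convention, where the
    mean is n * (1 - p) / p.
    """
    rng = rng or np.random.default_rng()
    n_nodes = a_w.shape[0]
    x = np.where(
        a_w == 1,
        rng.negative_binomial(n1, p1, size=(n_nodes, n_nodes)),
        rng.negative_binomial(n0, p0, size=(n_nodes, n_nodes)),
    )
    return np.triu(x, k=1) + np.triu(x, k=1).T  # symmetrise: undirected network

# Demo: a sparse symmetric latent network, one year of noisy counts.
rng = np.random.default_rng(2)
a = (rng.random((20, 20)) < 0.2).astype(int)
a = np.triu(a, 1) + np.triu(a, 1).T
x = sample_observed_counts(a, n0=2, p0=0.95, n1=5, p1=0.3, rng=rng)
```

Note that as p_{0w} → 1 the no-edge distribution collapses toward a point mass at zero counts, which is one way to read the identifiability fix above.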

5. Results: posterior coverage and age-resolved susceptibility

For the simulation studies, the MDN-ABC posterior recovers the true (β,γ) to within nominal coverage when the network prior matches the data-generating process (ER case), and shows mild but well-calibrated coverage degradation under network-prior misspecification (log-normal case). Network-uncertainty marginalization is essential — fixing A to a point estimate produces severely overconfident posteriors on (β,γ).

For the dolphin TSD application, the joint posterior shows that calf-class susceptibility is materially higher than adult-class susceptibility, with separability driven by the differential transmission-probability priors specified in Section 4.1. We discuss the sensitivity of this conclusion to those priors in Section 6.

6. Discussion

Limitations: in the dolphin application we modeled TSD as a simple SIR disease, which assumes that the incubation period for TSD is short relative to our simulation time-steps. The age-class susceptibility result is partly driven by the choice of unequal priors over calf vs. adult transmission probabilities; sensitivity analyses with weaker priors are reported in the appendix.

Future extensions include: (i) explicit SEIR-with-latent-period dynamics; (ii) joint inference over the network observation process parameters ϕ rather than fixing per-year quantities; and (iii) extension to heterogeneous infectiousness, with the duration of infectiousness distributed as Weibull with shape γ_a and scale γ_b.

References

[1] M. E. J. Newman, SIAM Rev. 45, 167 (2003).

[2] L. A. Meyers, B. Pourbohloul, M. E. J. Newman, et al., J. Theor. Biol. 232, 71 (2005).

[3] A. Vazquez et al., Proc. Natl. Acad. Sci. USA 101, 17940 (2004).

[4] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, A. Vespignani, Rev. Mod. Phys. 87, 925 (2015).

[5] M. J. Keeling and K. T. D. Eames, J. R. Soc. Interface 2, 295 (2005).

[6] M. Salathé and J. H. Jones, PLOS Comput. Biol. 6, e1000736 (2010).

[7] M. Salathé et al., Proc. Natl. Acad. Sci. USA 107, 22020 (2010).

[8] L. Stack et al., Theor. Popul. Biol. 137, 1 (2021).

[9] J. Sherborne et al., PLOS Comput. Biol. 14, e1006003 (2018).

[10] M. A. Beaumont, W. Zhang, D. J. Balding, Genetics 162, 2025 (2002).

[11] J.-M. Marin et al., Stat. Comput. 22, 1167 (2012).

[12] D. Greenberg, M. Nonnenmacher, J. Macke, ICML (2019).

[13] G. Papamakarios, I. Murray, NeurIPS (2016).

[14] T. McKinley et al., Bayesian Anal. 13, 1 (2018).

[15] K. Csilléry et al., Trends Ecol. Evol. 25, 410 (2010).

[16] K. R. Bisset et al., J. Comput. Sci. 5, 28 (2014).

[17] L. Mancastroppa et al., Phys. Rev. Res. 4, 043196 (2022).

[18] M. Smith et al., Methods Ecol. Evol. 11, 836 (2020).

[19] J. Mann, R. C. Connor, P. L. Tyack, H. Whitehead (eds.), Cetacean Societies (2000).

[37] B. Carpenter et al., J. Stat. Softw. 76, 1 (2017).

[52] M. D. Hoffman and A. Gelman, J. Mach. Learn. Res. 15, 1593 (2014).

Distilled from the public preprint at arXiv:2404.02924 · q-bio.PE / Statistics — applications

This is a Manusights review of a publicly-available preprint. The preprint authors are not affiliated with Manusights and have not endorsed this review. Reproduced for illustrative purposes; full text remains at the source.

Overall Feedback

Do not submit yet — strong premise, model-definition gaps, missing reproducibility package

Likely outcome if submitted today: desk reject — chiefly because of missing journal-required elements (Author Summary for PLOS Comput. Biol., ethics/permit statement for the dolphin field data, public availability for analysis data and code, complete references), but also because the dolphin disease model is internally contradictory (described as SIR but using an exposed state E with no E→I transition specified) and the network observation model has a parameter that is simultaneously fixed and assigned a Gamma prior on a [0,1]-bounded probability.

The statistical premise — joint posterior over epidemic parameters and a latent contact network using MDN-ABC — is genuinely strong and the simulation calibration is the right scaffolding. The single most important fix this week is to rewrite the methods around a single explicit generative model for each study (one for the simulation, one for the dolphin application), with a compact table defining every latent variable, observed variable, parameter, prior, time scale, transition rule, and sampling step. Once that table exists, most of the contradictions below collapse into one or two lines of clarifying text.

Recommended target: PLOS Computational Biology remains plausible after substantial revision, especially if framed as a simulation-based inference method for epidemics on uncertain networks. Cascade fallback: Epidemics or Journal of Theoretical Biology if method validation is strengthened; PLOS ONE only if the paper remains primarily a demonstration. Time to submission readiness: 6–10 weeks of focused revision, assuming code/data deposition and additional validation can be completed promptly.

Detailed Feedback

  • High severity.
    #4 · Statistical audit · Sec. 4.2

    Negative-binomial probability parameter has both a fixed value and a Gamma prior — at least one must be wrong

    Section 4.2 defines the network observation likelihood as X_w^{ij} | A_w^{ij} = 0 ∼ NegBin(n_{0w}, p_{0w}) and X_w^{ij} | A_w^{ij} = 1 ∼ NegBin(n_{1w}, p_{1w}). It then says "we will fix p_{0w} = 1 for all w" but the priors block declares p_{0w} ∼ Gamma(2, 4). p_{0w} is therefore both a fixed constant and a sampled parameter — a contradiction that any HMC implementation will silently resolve in favor of one or the other.

    Two further problems compound this. First, p_{1w} has no prior at all. Second, a Gamma prior is generally invalid for a probability parameter constrained to [0,1] — Gamma puts mass on values larger than 1. The standard parametric choice for a probability parameter is a Beta. If the negative-binomial is parameterized in mean-dispersion form instead (μ, θ), positive priors on those parameters are fine but p_{0w} should not appear in the prior block.

    Suggested fix

    Choose one parameterization. If keeping the (n, p) form: declare p_{0w} as fixed in the model spec, remove it from the prior block, add a Beta(α, β) prior on p_{1w}, and verify your HMC sampler is treating each parameter consistently. If switching to mean-dispersion: rewrite Eq. (4.2) in terms of (μ, θ), specify positive priors on both, and remove all p references. Add a one-line "we use the (n, p) parameterization with p as a probability parameter" or equivalent note.
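The correspondence between the two parameterizations is easy to check numerically. This is our illustration using scipy's `nbinom` convention (mean n(1−p)/p), not code from the manuscript:

```python
from scipy import stats

# NegBin(n, p) has mean mu = n * (1 - p) / p, so given (mu, n) the
# probability parameter is recovered as p = n / (n + mu).
n, p = 5, 0.3
mu = n * (1 - p) / p
d_np = stats.nbinom(n, p)             # (n, p) parameterization
d_mu = stats.nbinom(n, n / (n + mu))  # same law, rebuilt from (mu, n)
```

Either form is fine for the fix above; what matters is that the prior block matches whichever form the likelihood equation uses.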

  • Low severity.
    #12 · Figure critique · Figs. 3, 5

    Figure 3 legend lists 11 instance labels (0–10) but text says 10 instances; Figure 5 axis labels duplicated

    Section 3 says "we re-simulate the original epidemic 10 times for each scenario" but Figure 3's legend appears to list 0 through 10 (11 labels). Figure 5 uses inconsistent time indexing — Section 4.2 defines yearly networks for w = 1, ..., 5, but the caption refers to "for time w = 0." Both will be flagged in copy-edit and are minor in isolation, but together they signal that the figure pass was rushed.

    Suggested fix

    Decide whether 10 or 11 instances are shown, relabel Fig. 3 accordingly. Pick one indexing convention for Fig. 5 and apply consistently.

  • High severity.
    #1 · Methodology · Sec. 4.1; Discussion

    The dolphin disease model is described as SIR but uses an exposed state E with no E→I transition specified

    Section 4.1 says: "we will model TSD as a discrete-time SIR process," but then defines node state W_t^i ∈ {S, E, I, R} and writes transmission probabilities into E (e.g. β_c = P(W_{t+Δt}^i = E via j | …)). No transition rule from E → I is given anywhere. The Discussion then states "we modeled TSD as a simple SIR disease, which assumes that the incubation period for TSD is short relative to our simulation time-steps" — but a "short incubation" is not a substitute for actually defining whether the model is SIR or SEIR.

    This is the load-bearing model for the entire empirical application. A reviewer reading Section 4.1 cannot tell whether the dolphin posteriors are conditioned on an SIR likelihood (no latent compartment) or an SEIR likelihood (with a latent compartment of unspecified duration). The age-class susceptibility result — the headline empirical claim — depends on which one. Pick one model, write the transition rules cleanly, and remove the unused state from the other.

    Suggested fix

    Decide whether the TSD model is SIR or SEIR. If SIR, replace E with I throughout the infection transition and remove "exposed" language. If SEIR, add latent-period parameters (rate or distribution), state the E→I transition rule explicitly, and report a sensitivity analysis to the latent-period prior. The Discussion claim about "short incubation" then becomes an explicit modeling assumption rather than a hedge.

  • High severity.
    #2 · Methodology · Sec. 4.1; Sec. 5; Discussion

    Age-class susceptibility conclusion is partly pre-imposed by the priors, not learned from data

    The headline empirical finding for the dolphin application — that calves are materially more susceptible than adults — depends on the choice of unequal priors over calf-class and adult-class transmission probabilities. The Discussion notes this in passing ("the age-class susceptibility result is partly driven by the choice of unequal priors") but the main text does not quantify how much of the posterior separability is data-driven versus prior-driven.

    For PLOS Comput. Biol., this is the kind of conclusion a reviewer will isolate. The standard fix is to (a) report the posterior under matched priors over calf and adult susceptibility, (b) show the prior-to-posterior shift on a single panel for both classes, and (c) make a clean statement: "with matched priors, the calf-vs-adult posterior separation is X; with the differential priors used here, it is Y; the data contribute Z to the gap." That is the analysis a careful reader needs to assess the claim.

    Suggested fix

    Add a sensitivity-analysis subsection to Sec. 5 reporting the posterior under matched calf/adult susceptibility priors. Report the prior-to-posterior shift visually and quantify the data-driven vs. prior-driven contribution to the age-class result. If the result attenuates substantially under matched priors, soften the abstract claim.

  • High severity.
    #3 · Methodology · Sec. 3

    Simulation log-normal scenario uses an Erdős–Rényi observation model — interpret as misspecification, not as a network prior

    Section 3 says the true network is either Erdős–Rényi with mean degree 4 or log-normal-degree with mean degree 4. The observation/sampling model assumes A_ij ∼ Bernoulli(ρ), which is an Erdős–Rényi prior. Two interpretations are possible: (a) the log-normal scenario is a misspecified-prior experiment and the manuscript should say so explicitly; (b) the network prior should match the data-generating process for the log-normal case. The current text does neither.

    This matters because the simulation studies are the manuscript's evidence that posterior coverage is well-calibrated. If the log-normal case is a misspecification experiment, the calibration claim becomes "calibrated under correctly specified network prior, gracefully degraded under misspecification" — which is a different and more interesting result than "calibrated under both" but also has different implications for the dolphin application (where the true contact distribution is unknown).

    Suggested fix

    Reframe Sec. 3 explicitly: "We use an Erdős–Rényi network prior throughout. The log-normal scenario tests robustness to network-prior misspecification; we do not claim calibration when the prior is wrong." Then either add a matching log-normal-prior comparison run, or note that doing so is straightforward but out of scope and that the dolphin application uses the more flexible prior described in Sec. 4.2.

  • High severity.
    #6 · Internal consistency · Sec. 3

    Binary disease-status labels are reversed in the simulation description

    Section 3 states: "We periodically observe the binary disease status of each node: 'infected' for nodes in the susceptible or recovered state, and 'not infected' for nodes in the infected state." This contradicts the next sentence: "The status of the node is considered to be 1 if the node is infected, and 0 if the node is susceptible or recovered." A reviewer reading either definition will lose confidence in the simulation pipeline until the contradiction is resolved.

    Suggested fix

    Change the first sentence to say "not infected" for susceptible/recovered and "infected" for infected. Both statements should match the natural convention.

  • High severity.
    #13 · Reproducibility · Methods; end matter

    No code repository, archived version, or software-version statement for an SBI methods paper

    The manuscript names STAN as the implementation software ("Similar to [37], we will implement this model in STAN, which utilizes a Hamiltonian Monte Carlo algorithm") but does not provide a code repository, archived version, release tag, DOI, or software version. For a computational methods paper at PLOS Computational Biology, code availability is an explicit submission requirement; for a Mixture-Density-Network-compressor + ABC pipeline specifically, reviewers will not assess calibration claims without being able to re-run the analysis.

    Suggested fix

    Deposit the analysis code on GitHub with a Zenodo DOI for the version submitted. Add a Code Availability statement: "Analysis code, including the MDN training pipeline, ABC sampler, and simulation harness, is available at github.com/<repo> and archived at Zenodo doi:10.5281/zenodo.<id>. STAN version 2.35; Python version 3.11; full dependency list in environment.yml."

  • High severity.
    #14 · Reporting guideline · End matter

    PLOS Comput. Biol. requires an Author Summary; ethics/permit statement for animal field data is missing

    PLOS Computational Biology requires every research article to include an Author Summary (a 150–200-word non-technical summary written for a broader audience). The manuscript does not include one. The Shark Bay dolphin work is observational field research on wild marine mammals; PLOS journals require an ethics statement covering field permits, animal-handling protocol approval, and the corresponding national/institutional regulatory authority (in Australia: typically a state-level Department of Biodiversity wildlife permit and university Animal Ethics Committee approval). Neither is present.

    Suggested fix

    Add an Author Summary section after the Abstract, 150–200 words, written for a non-specialist scientific audience. Add an Ethics Statement: "Field observations on wild Tursiops aduncus in Shark Bay, Western Australia, were conducted under [Permit / DBCA license number] and approved by the [University] Animal Ethics Committee under protocol [number]." If the permits/approvals are not yet obtained, this is a hard blocker for the submission.

  • Medium severity.
    #5 · Novelty assessment · Sec. 1; Sec. 7

    Positioning vs Sherborne et al., Stack et al., and the SBI literature is left implicit

    The Introduction motivates contact-network uncertainty as an open problem but does not separate this work's contribution from three near neighbors: (a) Sherborne et al. 2018 (PLOS Comput. Biol.), which fits epidemic models on partially observed networks; (b) Stack et al. 2021 (Theor. Popul. Biol.), which infers networks from epidemic data; and (c) the broader simulation-based-inference literature (Greenberg-Nonnenmacher-Macke ICML 2019, Papamakarios-Murray NeurIPS 2016) which has produced several MDN-style approaches in the last 5 years.

    A reviewer at PLOS Comput. Biol. who knows any of these three lines will look for a one-paragraph statement of what this paper adds: e.g. "(a) is method development without a real-world application; (b) infers networks from epidemic data alone, not jointly; (c) is general SBI machinery not specialized to epidemic-network joint inference. The present work specializes the MDN-ABC framework to the joint epidemic-network setting and demonstrates calibration on simulated data plus a real Shark Bay dolphin application." That paragraph does not exist in the current draft.

    Suggested fix

    Add a "Related work" paragraph at the end of Sec. 1 (or a short Sec. 1.1) explicitly distinguishing this work from Sherborne, Stack, Greenberg-Nonnenmacher-Macke, and Papamakarios-Murray. Three sentences each, with the contrast spelled out. This converts an asserted contribution ("we develop") into a positioned contribution ("we develop X, which extends Y by Z").

  • Medium severity.
    #7 · Internal consistency · Sec. 3

    Continuous-time SIR rate definitions conflict with discrete-time simulation language

    Section 3 defines transmission and recovery as continuous-time-Markov rates, β = lim_{Δt→0} (…)/Δt and similarly for γ. But the same section says "We continue our simulated epidemic for 50 timesteps" and observes every 7 timesteps, language that fits a discrete-time approximation rather than a continuous-time Markov simulator. The reader cannot tell which one was implemented.

    Suggested fix

    State explicitly: "We simulate the continuous-time Markov SIR process via the Doob–Gillespie algorithm and record observations at discrete times t ∈ {0, 7, 14, ...}." If a discrete-time approximation was actually used, define the per-step transition probabilities directly and remove the continuous-time-rate definitions.
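If the discrete-time route is taken, the per-step transition probabilities follow from the continuous-time rates by the standard constant-hazard conversion. A minimal sketch (names ours, illustrating the conversion rather than the manuscript's code):

```python
import math

def per_step_prob(rate, dt=1.0):
    """P(event within a step of length dt) for a constant-hazard
    (exponential waiting time) process: 1 - exp(-rate * dt)."""
    return 1.0 - math.exp(-rate * dt)

# e.g. a recovery rate gamma = 0.2 per unit time, observed every 7 steps
p_step = per_step_prob(0.2)        # per single timestep
p_obs_gap = per_step_prob(0.2, 7)  # across a 7-step observation gap
```

Defining the per-step probabilities this way keeps them consistent with the rate definitions in Sec. 3, so either presentation can be used without changing the model.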

  • Medium severity.
    #15 · Submission readiness · Submission package

    References are cited but the bibliography appears truncated; data availability statement is missing

    The manuscript cites Sherborne, Stack, Beaumont, Greenberg, Papamakarios, Carpenter, Hoffman-Gelman, and several Mann/Connor dolphin-society works, but the supplied file appears to truncate the reference list (no entries past [55]). The data availability statement is missing: simulated data should be reproducible from code; the dolphin TSD surveillance data needs an explicit access statement (institutional repository, controlled-access pointer, or "available from the corresponding author on reasonable request, subject to data-use agreement with [institution]"). Both are PLOS submission requirements.

    Suggested fix

    Verify the reference list is complete in the submission file. Add a Data Availability Statement covering: (a) simulated data — link to the code repository with seed lists; (b) Shark Bay TSD data — institutional repository or DUA pointer; (c) any derived intermediate data files. PLOS requires these as a separate end-matter section, not buried in Methods.

  • Low severity.
    #8 · Internal consistency · Sec. 4.1

    Weibull infectious-period parameters use two different notations in the same paragraph

    Section 4.1 defines: "The duration of infectiousness is distributed as a Weibull distribution with shape γ_a and scale γ_b." The priors are then written as γ_k ∼ Uniform(0.2, 5) and γ_λ ∼ Uniform(10, 160). The reader has to infer that γ_k ≡ γ_a (shape) and γ_λ ≡ γ_b (scale).

    Suggested fix

    Use one notation throughout — γ_shape and γ_scale, or γ_a and γ_b — and update the prior block to match. Half a line of edit, but the kind of inconsistency that gets flagged in copy-edit anyway.

  • Low severity.
    #9 · Internal consistency · Sec. 4.2

    Network parameter vector ϕ includes p_{0w}, which is fixed; either remove it or label it as a constant

    Section 4.2 defines the network-observation parameter vector ϕ = (n_{01}, …, n_{05}, p_{01}, …, p_{05}, n_{11}, …, n_{15}, p_{11}, …, p_{15}, ρ_1, …, ρ_5) but then fixes p_{0w} = 1 for all w. The vector ϕ is presented as the object being inferred (P(A, ϕ | X)), so a reader will assume every component is sampled. Remove fixed components from ϕ, or state explicitly that the p_{0w} entries are constants.

    Suggested fix

    Redefine ϕ without the p_{0w} entries. State separately: "the constants p_{0w} = 1 are fixed by identifiability and not included in ϕ." Update Eq. (2.3) and Sec. 4.2 to be consistent.

  • Low severity.
    #10 · Internal consistency · Methods (Sec. 2.3); Appendix

    A vs A_t / A_w: static-network and time-indexed-network notation are used inconsistently

    The Methods define the true contact network as A (static), but Section 2.3 and the Appendix use A_t (time-indexed) — e.g. P(A_t, ϕ | X) and "we will treat A_t as a nuisance parameter." This is mostly cosmetic but it does create one substantive ambiguity: in the dolphin application the network is time-indexed by year (A_w), and the reader cannot tell from notation alone whether the framework treats per-year networks as the same object as a static A.

    Suggested fix

    Use A for the static-network case (Sec. 3 simulation), A_w for the yearly dolphin networks, and never A_t — there is no per-timestep network object in this paper.

  • Low severity.
    #11 · Internal consistency · Sec. 2.3; Sec. 4.2

    Algorithm in Sec. 2.3 has duplicated step numbering and is referenced as "Algorithm 1" without a label

    Section 2.3 lists Step 3 twice — once as "Using A′ and θ′..." and once as "Given distance function d...". Section 4.2 then says "In Step 2 of Algorithm 1..." but no algorithm in the manuscript is formally labeled Algorithm 1.

    Suggested fix

    Wrap the Sec. 2.3 algorithm in a labeled \begin{algorithm} environment titled "Algorithm 1: MDN-ABC posterior sampling." Renumber the steps so each appears once.
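A minimal skeleton of the kind of labeled environment this fix calls for, using the common `algorithm`/`algpseudocode` packages; the step wording is ours, reconstructed from the loop in Sec. 2, not copied from the manuscript:

```latex
\usepackage{algorithm}
\usepackage{algpseudocode}
% ...
\begin{algorithm}
  \caption{MDN-ABC posterior sampling}\label{alg:mdn-abc}
  \begin{algorithmic}[1]
    \State Sample $(\theta', A', \phi')$ from the prior
    \State Simulate $(X', Y')$ given $(\theta', A', \phi')$
    \State Compute the learned-summary distance $d\big(s(X', Y'), s(X, Y)\big)$
    \State Accept $(\theta', A', \phi')$ if $d < \epsilon$
  \end{algorithmic}
\end{algorithm}
```

Cross-references such as "In Step 2 of Algorithm~\ref{alg:mdn-abc}" then resolve unambiguously, and the numbered steps cannot collide.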
