Statistical Sciences & Operations Research Seminar
Organized by Indranil Sahoo.
| Date | Time | Location | Speaker | Affiliation | Title |
|---|---|---|---|---|---|
| Aug. 28 | 11:00 A.M. | Harris Hall 4119 | Samuel Anyaso-Samuel | National Cancer Institute | Inference for current status observations in a multi-state setting |
| Sep. 11 | 11:00 A.M. | Harris Hall 4119 | Enakshi Saha | University of South Carolina Arnold School of Public Health | Harnessing the Power of Empirical Bayes for Network-informed Precision Medicine |
| Sep. 25 | 11:00 A.M. | Harris Hall 4119 | Grace S. Chiu | Batten School of Coastal and Marine Sciences, William & Mary / Virginia Institute of Marine Science | Model-based calibration of fishery resource survey data as a change-of-support problem |
| Nov. 13 | 11:00 A.M. | Harris Hall 4119 | Madison Griffin | Department of Natural Resources, William & Mary’s Batten School of Coastal & Marine Sciences and Virginia Institute of Marine Science | Big shells, bigger data: cohort analysis of Chesapeake Bay *Crassostrea virginica* reefs |
| Nov. 13 | 11:00 A.M. | Harris Hall 4119 | Stephen Tivenan | Virginia Commonwealth University | A Statistical Framework for Detecting Climate Boundary Shifts |
Multistate models are widely used to characterize transitions between discrete states in progressive disease processes or biological systems. In many practical settings, exact transition times are not observed; instead, current status data are collected, in which each subject is observed only once, at a random inspection time. This leads to a severely interval-censored (case I) multivariate survival problem. In this talk, I will present nonparametric methods for both marginal and conditional estimation of transition and occupation probabilities in multistate systems under current status observation. I will also discuss regression models for evaluating the effects of covariates on these temporal functions. Special attention will be given to estimation in the presence of cluster-correlated data and to strategies for adjusting for bias due to informative clustering. Applications will include analyses of periodontal disease progression in an understudied population and breast cancer progression in a European cohort.
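For readers new to current status data, the simplest case (a single transition, so each subject has one event time T and one inspection time C) may help. There, the nonparametric maximum likelihood estimate of the distribution function F(t) = P(T ≤ t) is the isotonic regression of the 0/1 status indicators ordered by inspection time, which the pool-adjacent-violators algorithm computes. The following is a minimal sketch of that single-state case only; the function name `current_status_npmle` is hypothetical, and the multistate, regression, and clustered-data methods described in the talk are not shown.

```python
def current_status_npmle(times, statuses):
    """Hypothetical helper: NPMLE of F(t) = P(T <= t) from current status data.

    A sketch of the single-transition case only.
    times    -- inspection times C_i
    statuses -- 1 if the event had occurred by C_i (T_i <= C_i), else 0
    Returns a list of (time, F_hat) pairs sorted by time.
    """
    # Aggregate tied inspection times: (number of events, number of subjects)
    agg = {}
    for t, d in zip(times, statuses):
        events, n = agg.get(t, (0, 0))
        agg[t] = (events + d, n + 1)

    # Pool adjacent violators: each block is [events, subjects, times in block]
    blocks = []
    for t in sorted(agg):
        events, n = agg[t]
        blocks.append([events, n, [t]])
        # Merge while the previous block's event proportion exceeds the current
        # block's (compared by cross-multiplication to avoid division)
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            events2, n2, times2 = blocks.pop()
            blocks[-1][0] += events2
            blocks[-1][1] += n2
            blocks[-1][2].extend(times2)

    # Every inspection time in a block gets the block's pooled proportion
    return [(t, events / n) for events, n, block_times in blocks for t in block_times]


print(current_status_npmle([1, 2, 3, 4], [0, 1, 0, 1]))
# [(1, 0.0), (2, 0.5), (3, 0.5), (4, 1.0)]
```

The raw proportions at times 2 and 3 (1, then 0) would make the estimate decrease, so the algorithm pools them into 0.5. That pooling is what keeps the estimated distribution function non-decreasing.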
Molecular mechanisms of complex diseases such as cancer often vary across patients, leading to differences in risk, progression, and treatment response. The central premise of precision medicine is to tailor preventative and therapeutic strategies to individual patients by accounting for this heterogeneity in disease mechanisms. Recently, gene regulatory networks, which capture interactions among genes and their molecular regulators, have proven to be valuable tools for uncovering disease mechanisms, paving the way for network medicine. However, conventional approaches to biological network inference typically estimate population-level networks that average out individual heterogeneity. This limitation arises because individual-level data are scarce, and most statistical methods require reasonably sized samples to extract meaningful signals. In this talk, I will introduce an Empirical Bayes framework for estimating individual-specific networks that reflect person-specific disease biology. By integrating population-level prior information with individual-specific omics data, our framework recovers both shared and unique regulatory patterns across individuals. Our methods are highly scalable to large-scale multi-omics datasets. I will describe two such methods: one for individual-specific co-expression networks, and another for individual-specific multi-omic Gaussian graphical models. Applications to simulated and human cancer datasets demonstrate that these methods not only recover accurate interactions between omics data types but also reveal patient-level network differences linked to clinical outcomes. This work highlights the potential of Empirical Bayes as a principled and practical strategy for individualized network inference, with broad implications for precision medicine.
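The core Empirical Bayes idea of borrowing strength (shrinking noisy individual-level estimates toward a population-level prior whose parameters are estimated from the data) can be illustrated for a single network edge with the textbook normal-normal model. This is a generic sketch, not the speaker's co-expression or multi-omic Gaussian graphical model method; the function name `eb_shrink`, the normal-normal model, and the known sampling variance `noise_var` are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def eb_shrink(estimates, noise_var):
    """Hypothetical helper: normal-normal empirical Bayes shrinkage for one edge.

    estimates -- noisy individual-specific estimates y_i of one edge weight
    noise_var -- assumed known sampling variance of each y_i
    Assumed model: y_i ~ N(theta_i, noise_var), theta_i ~ N(mu, tau2).
    Returns posterior means of the individual-specific edge weights theta_i.
    """
    mu = fmean(estimates)  # population-level prior mean, estimated from everyone
    # Method-of-moments prior variance: total spread minus sampling noise, floored at 0
    tau2 = max(0.0, pvariance(estimates, mu) - noise_var)
    # Fraction of each individual's deviation from mu that survives shrinkage
    keep = tau2 / (tau2 + noise_var) if tau2 + noise_var > 0 else 0.0
    return [mu + keep * (y - mu) for y in estimates]


print(eb_shrink([0.0, 2.0, 4.0, 6.0], noise_var=1.0))
```

In this example the estimated prior variance is 4, so each individual keeps 80% of its deviation from the population mean of 3. When the spread across individuals does not exceed the sampling noise, every estimate collapses to the population mean, which is the "average out individual heterogeneity" behavior the abstract contrasts with.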
In fish species abundance studies, a major challenge in deriving an absolute abundance estimate across a continental-scale spatial domain is that regional survey teams deploy different gear types, each with its own field of view, producing gear-specific relative abundance observations. In a continental-scale study, the dataset from each regional survey must be converted from its gear-specific relative abundance scale to an absolute abundance scale, so that regional observations can be combined to estimate continental-scale absolute abundance. In this paper, we develop an operational conversion tool that takes regional gear-based data as input and produces the required conversion, with associated uncertainty, as output. Methodologically, the conversion tool is operationalized from a Bayesian hierarchical model that we develop in an inferential context akin to the change-of-support problem often encountered in large-scale spatial studies; here, the goal is to reconcile abundance data observed at various gear-specific scales, some relative and others absolute. To this end, we consider data from a small-scale calibration experiment in which 2 to 4 different underwater video camera types were simultaneously deployed on each of 21 boat trips. Alongside each suite of deployed cameras, an acoustic echosounder recorded fish signals along a set of surrounding transects. While the echosounder records data on the absolute scale, it is subject to confounding from acoustically similar species and thus requires an externally derived correction factor. Conversely, a camera allows visual distinction between species but records data on a relative scale specific to the camera type.
Our statistical modeling framework captures the relationships among all five gear types across the 21 boat trips, and the resulting model is used to derive calibration formulae that translate camera-specific relative abundance data to the corrected absolute abundance scale whenever a camera is deployed alone. Cross-validation is conducted using mark-recapture abundance estimates, which are available for only 10 trips, all in the same habitat type. We also briefly discuss the case in which one camera type is deployed alongside the echosounder.
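To make the calibration idea concrete, here is a deliberately simplified, non-Bayesian sketch. It assumes, as a simplification and not as the model described above, that on each trip a camera's relative count equals the corrected absolute abundance times a camera-specific multiplicative factor, that the echosounder correction is a single known multiplier, and that all counts are positive. The camera's log-scale offset is then the mean log ratio over trips where camera and echosounder were deployed together, and a lone-camera count is converted by dividing out that factor. The names `fit_camera_offset` and `camera_to_absolute` are hypothetical, and the hierarchical structure, change-of-support treatment, and full uncertainty propagation are omitted.

```python
import math
from statistics import fmean, stdev


def fit_camera_offset(camera_counts, echo_abundance, correction_factor):
    """Hypothetical helper: estimate a camera-specific log-scale offset beta.

    Assumed model (a simplification, not the talk's hierarchical model):
        log(camera count) = log(correction_factor * echosounder abundance) + beta + noise
    Uses only trips where the camera and the echosounder were deployed together.
    Returns (beta, standard error of beta).
    """
    log_ratios = [
        math.log(c) - math.log(correction_factor * e)
        for c, e in zip(camera_counts, echo_abundance)
    ]
    beta = fmean(log_ratios)  # least-squares estimate of a constant offset
    se = stdev(log_ratios) / math.sqrt(len(log_ratios)) if len(log_ratios) > 1 else math.nan
    return beta, se


def camera_to_absolute(count, beta):
    """Hypothetical helper: convert a lone-camera count to the corrected absolute scale."""
    return count / math.exp(beta)


# Made-up paired trips in which the camera sees 4% of the corrected abundance
beta, se = fit_camera_offset([2, 4, 8], [100, 200, 400], correction_factor=0.5)
print(camera_to_absolute(3, beta))  # approximately 75
```

The standard error here reflects only trip-to-trip scatter in the log ratios; it ignores uncertainty in the correction factor, which a fuller model would need to carry through.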
Oysters on Virginia's Chesapeake Bay reefs are “age-truncated”, possibly due to a combination of historical overfishing, disease epizootics, environmental degradation, and climate change. Research has suggested that oysters exhibit resilience to environmental stressors; however, that evidence is based on the current limited understanding of oyster lifespan. Until this paper, the Virginia Oyster Stock Assessment and Replenishment Archive (VOSARA), a spatially and temporally expansive dataset of shell lengths (SL, mm) covering 222 reefs from 2003 to 2023, had not been examined comprehensively in the context of resilience. We develop a novel method that uses Gaussian mixture modeling (GMM) to identify the age groups on each reef from yearly SL data and then links those age groups over time to identify cohorts and estimate their lifespan. Sixty-four reefs (29%) are deemed to have sufficient data (at least 300 oysters sampled for a minimum of 8 consecutive years) for this analysis. We fit univariate GMMs for each year (t) and reef (r) within each of the seven river strata (R) to estimate 1) the mean and standard deviation of SL for each a_{Rrt}-th age group and 2) the mixture percentage of each a_{Rrt}-th age group. We link age groups across time to infer age cohorts by developing a mechanistic algorithm that prevents shell length from shrinking when an a_{Rrt}-th group becomes an a_{R,r,t+1}-th group. Our method shows promise for identifying oyster cohorts and estimating lifespan solely from SL data. Our results show signals of resilience in almost all river systems: oyster cohorts lived longer and grew larger in the mid-to-late 2010s than in the early 2000s.
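The per-reef, per-year step can be sketched as a textbook expectation-maximization (EM) fit of a univariate Gaussian mixture to shell lengths. This is a generic sketch, assuming the number of age groups and the starting values are supplied by the user; how the talk selects the number of groups, and the cohort-linking algorithm itself, are not reproduced. The abstract's no-shrinkage rule would act downstream of this step, allowing a group in year t to link only to a group in year t+1 whose mean shell length is not smaller. The function name `fit_gmm_1d` and the example shell lengths are hypothetical.

```python
import math


def fit_gmm_1d(x, means, sds, weights, n_iter=200):
    """Hypothetical helper: fit a univariate Gaussian mixture by EM.

    x                   -- shell lengths (mm) for one reef-year
    means, sds, weights -- starting values; the number of components is fixed
    Returns (means, sds, weights) after n_iter EM iterations.
    """
    means, sds, weights = list(means), list(sds), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: responsibilities P(component j | x_i), computed on the log scale
        # (the shared -0.5*log(2*pi) constant cancels after normalization)
        resp = []
        for xi in x:
            logd = [
                math.log(w) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
                for m, s, w in zip(means, sds, weights)
            ]
            top = max(logd)
            dens = [math.exp(v - top) for v in logd]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted mean, standard deviation, mixing proportion
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:  # empty component: keep its previous parameters
                continue
            means[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var = sum(r[j] * (xi - means[j]) ** 2 for r, xi in zip(resp, x)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))  # floor avoids a degenerate component
            weights[j] = nj / len(x)
    return means, sds, weights


# Made-up shell lengths with two clearly separated age groups
lengths = [10, 11, 12, 9, 10.5, 50, 51, 49, 52, 50.5]
print(fit_gmm_1d(lengths, means=[5, 60], sds=[5, 5], weights=[0.5, 0.5]))
```

With well-separated groups like these, the fit should recover means near 10.5 mm and 50.5 mm with equal mixing proportions. Real reef-year data with overlapping age groups are much more sensitive to starting values and to the chosen number of components.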
The African Sahel is the transitional region between a hyper-arid climate and a subtropical humid climate. The regional climate is shaped by year-to-year variation in rainfall. Because of natural cyclical fluctuations in weather, scientists have asked how the boundaries of the Sahara Desert fluctuate and how these shifts affect the region's economies and ecosystems. Using the Köppen-Trewartha (KT) arid classification, we apply Canny edge detection to extract the boundary points between the Sahel and the Saharan region. With the extracted points, we use a heteroskedastic Gaussian process model to estimate and predict the demarcated boundary line at different time periods during 1960-1989. We then apply a scaled global envelope test to assess whether the boundary line has changed between time periods. With these methods, we are able to model a spatial boundary and apply a hypothesis-testing framework to determine whether two boundaries differ.
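The boundary-estimation step can be illustrated with the posterior mean of Gaussian process regression when each extracted boundary point carries its own noise variance. In this sketch we assume, hypothetically, that boundary latitude is modeled as a function of longitude, that the per-point noise variances are known, and that a squared-exponential kernel with fixed hyperparameters is used; full heteroskedastic GP models also estimate the noise variances from the data. Canny edge detection and the scaled global envelope test are not shown, and all function names (`het_gp_mean` and its helpers) are hypothetical.

```python
import math


def sq_exp_kernel(a, b, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between two longitudes (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def het_gp_mean(lon, lat, noise_vars, lon_new, variance=1.0, length_scale=1.0):
    """Hypothetical helper: GP posterior mean of boundary latitude at lon_new.

    noise_vars holds one (assumed known) noise variance per boundary point,
    which is what makes the noise heteroskedastic.
    """
    center = sum(lat) / len(lat)  # simple constant prior mean
    n = len(lon)
    # Prior covariance plus a per-point noise variance on the diagonal
    K = [
        [sq_exp_kernel(lon[i], lon[j], variance, length_scale) + (noise_vars[i] if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = cholesky_solve(cholesky(K), [y - center for y in lat])
    return [
        center + sum(sq_exp_kernel(x, xi, variance, length_scale) * a for xi, a in zip(lon, alpha))
        for x in lon_new
    ]


lon = [0.0, 1.0, 2.0, 3.0]
lat = [10.0, 10.0, 30.0, 10.0]   # made-up values; the third point is an outlier
noise = [1e-4, 1e-4, 1e6, 1e-4]  # the outlier is given a very large noise variance
print(het_gp_mean(lon, lat, noise, [2.0]))
```

Because the outlying point's noise variance is huge, it has almost no influence, and the predicted boundary at longitude 2.0 is governed by its neighboring points rather than pulled toward 30. Points with small noise variances, by contrast, are nearly interpolated.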