Below is a list of current research projects. For information on previously completed research projects, click here.
Current Research Projects
PI: Tyler McCormick
Sponsor: National Institutes of Health (NIH)
Project Period: May 1, 2015 - January 31, 2020
Title: "Estimating vital rates in the developing world: A Bayesian process modeling approach"
Abstract: Though quality data on basic population indicators such as births and deaths are vital for forming and evaluating policy and public health, only a small subset of the world's countries maintain ongoing, full-coverage civil registration systems. This proposal contributes statistical methods to combine demographic data that arise from multiple sources, use differing sampling frames, and are of highly variable quality. The proposed methods leverage lessons learned from surveys and demographic surveillance systems to develop new strategies for estimating temporal trends in fertility and to quantify non-sampling errors that present major obstacles to obtaining reliable fertility rates from surveys.
PI: Elena Erosheva
Project Period: April 1, 2013 - March 31, 2016
Title: "Respondent-driven Sampling for Highly Structured Populations"
Abstract: A network-based type of sampling technique and the corresponding
set of estimates, known as Respondent-Driven Sampling (RDS), is the
current method of choice for many researchers studying hard-to-reach or
hidden populations. RDS exploits social networks by starting with a small
set of individuals and allowing the respondents at each wave to recruit
the next wave of the sample from their contacts. However, it is often
unclear whether important assumptions of RDS estimators about the
population-specific network structure and the chain-referral recruitment
process are satisfied. In this project, focusing on population clustering
structures, we will (1) infer relational structures from egocentric data
that are important for RDS feasibility; (2) develop a comprehensive
simulation study framework for assessing RDS feasibility; and (3) extend
the model-assisted approach to inference from RDS data to account for
population clustering. We will apply these new methods to unique
observational data on the size and structure of social networks of older
GLBT adults from the study Caring and Aging with Pride to inform computer
simulations of both social networks and RDS chain-referral processes in
order to systematically study the quality of potential RDS estimators in
this hard-to-reach population. We will make these methods available in the
R-package RDSAnalyst so they can be used by applied RDS researchers to
decide whether RDS is warranted in a fashion similar to the sample size
computation prior to a funding request for traditional survey research.
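The wave-by-wave recruitment described above can be illustrated with a small simulation. This is a minimal sketch, not the project's method or the RDSAnalyst software: the two-group random network, the function names `make_clustered_network` and `simulate_rds`, and all parameter values (group size, tie probabilities, coupons per respondent) are assumptions chosen for illustration.

```python
import random


def make_clustered_network(n_per_group=50, p_within=0.10, p_between=0.005, seed=0):
    """Build a hypothetical two-group network with dense within-group ties."""
    rng = random.Random(seed)
    n = 2 * n_per_group
    group = [0 if i < n_per_group else 1 for i in range(n)]
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            p = p_within if group[i] == group[j] else p_between
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors, group


def simulate_rds(neighbors, seeds, coupons=2, max_waves=5, seed=0):
    """Chain-referral sampling: each respondent recruits up to `coupons`
    not-yet-sampled contacts, who then form the next wave."""
    rng = random.Random(seed)
    sampled = set(seeds)
    waves = [list(seeds)]
    for _ in range(max_waves):
        next_wave = []
        for person in waves[-1]:
            # Contacts already in the sample cannot be recruited again.
            eligible = sorted(neighbors[person] - sampled)
            recruits = rng.sample(eligible, min(coupons, len(eligible)))
            sampled.update(recruits)
            next_wave.extend(recruits)
        if not next_wave:
            break  # recruitment chains died out
        waves.append(next_wave)
    return waves


if __name__ == "__main__":
    neighbors, group = make_clustered_network()
    # All seeds come from group 0, so early waves over-represent that group.
    waves = simulate_rds(neighbors, seeds=[0, 1, 2])
    for k, wave in enumerate(waves):
        share = sum(group[i] == 0 for i in wave) / len(wave)
        print(f"wave {k}: n={len(wave)}, share in group 0 = {share:.2f}")
```

With strongly clustered ties, the composition of the sample can stay close to that of the seeds for many waves, which is one reason population clustering matters when judging whether RDS is feasible.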
PI: Tyler McCormick
Sponsor: The Reagan-Udall Foundation
Project Period: September 16, 2012 - September 25, 2015
Title: "Evaluating Bayesian Methods for Predictive Models"
Abstract: The self-controlled case-series method (Farrington, 1995) compares rates of outcome events during times when a person is exposed to a drug versus outcome event rates during unexposed periods. In essence, each person serves as his or her own control. This feature naturally accounts for covariates that do not vary with time and means that only cases where a given event occurred are used in the analysis, greatly reducing computation. The proposed work extends the current Bayesian multiple self-controlled case series in two ways. First, the current implementation uses the maximum of the posterior distribution as an estimate of the drug effects. Though computationally efficient, this approach provides results that are only single-number summaries of the parameters. Using recent computational developments, however, we will implement a fully Bayesian approach where inference is done by sampling from the posterior distribution, thus generating uncertainty estimates for the parameters. Second, we will introduce hierarchical structure in the model based on associations between drugs and events. Multiple drugs of the same class can be modeled as having the same prior mean, for example, which encourages borrowing strength across similar drugs. Similarly, we can encourage sharing information across events or classes of events (musculoskeletal events, for example).
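The contrast between a single-number posterior maximum and a full posterior with uncertainty can be sketched for the simplest case of one drug with a binary exposure. The code below is an illustrative assumption, not the project's implementation: it uses the standard conditional likelihood in which each event of a case falls in the exposed period with probability e^beta * t_exposed / (e^beta * t_exposed + t_unexposed), a Normal(0, prior_sd) prior on beta, and a grid approximation in place of posterior sampling. The function names and the example data are hypothetical.

```python
import math


def sccs_log_likelihood(beta, cases):
    """Conditional log-likelihood for one binary exposure.

    cases: tuples (events_exposed, events_unexposed, time_exposed, time_unexposed),
    with both time values strictly positive.
    """
    rho = math.exp(beta)  # relative incidence while exposed
    total = 0.0
    for n_exp, n_unexp, t_exp, t_unexp in cases:
        p_exp = rho * t_exp / (rho * t_exp + t_unexp)
        total += n_exp * math.log(p_exp) + n_unexp * math.log(1.0 - p_exp)
    return total


def grid_posterior(cases, prior_sd=2.0, lo=-5.0, hi=5.0, steps=2001):
    """Normalized posterior over a grid of beta values (assumed Normal prior)."""
    betas = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    log_post = [sccs_log_likelihood(b, cases) - 0.5 * (b / prior_sd) ** 2
                for b in betas]
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]  # subtract peak for stability
    total = sum(weights)
    return betas, [w / total for w in weights]


def summarize(betas, probs, level=0.95):
    """Return the posterior maximum and an equal-tailed credible interval."""
    map_beta = betas[max(range(len(betas)), key=probs.__getitem__)]
    tail = (1.0 - level) / 2.0
    lower, upper = None, betas[-1]
    cumulative = 0.0
    for b, p in zip(betas, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = b
        if cumulative >= 1.0 - tail:
            upper = b
            break
    return map_beta, (lower, upper)


if __name__ == "__main__":
    # Hypothetical cases: (events exposed, events unexposed, days exposed, days unexposed)
    cases = [(2, 1, 30.0, 335.0), (1, 0, 14.0, 351.0),
             (0, 1, 30.0, 335.0), (1, 2, 60.0, 305.0)]
    betas, probs = grid_posterior(cases)
    map_beta, interval = summarize(betas, probs)
    print("posterior maximum of beta:", round(map_beta, 2))
    print("95% credible interval:", tuple(round(x, 2) for x in interval))
```

The posterior maximum alone corresponds to the single-number summary mentioned above; the interval is the kind of uncertainty estimate that a fully Bayesian analysis adds. A grid is workable only for one or two parameters, which is why a model with many drugs and events requires sampling instead.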
PI: Tyler McCormick
Sponsor: US Army Research Office (ARO)
Project Period: August 3, 2012 - August 2, 2015
Title: "Taming Twitter: Using Social Media Networks to Identify Deviant Behavior"
Abstract: Our goal is to identify actors in social media networks who are likely to engage in non-normative or deviant behavior (such as being arrested or drunk driving). Our research will be informed by sociological theories on stigma and deviance. More specifically, we hope to use these theoretical paradigms to understand why people choose to disclose deviant behavior and the characteristics of the social networks of these individuals.
PI: Adrian Raftery
Project Period: March 1, 2012 - February 28, 2017
Title: "Probabilistic Population Projections for All Countries"
Abstract: The United Nations publishes updated estimates and projections of the populations of all the world's countries, broken down by age and sex. These are widely used by international organizations, governments, the private sector and researchers, for example for climate modeling and for assessing progress towards the Millennium Development Goals. The UN's current projections are deterministic, but assessing uncertainty about population estimates and projections is important for policy-making and other purposes. We propose to develop a fully probabilistic population projection methodology.
We will develop methods for probabilistic projection of fertility and mortality, taking account of within-country and between-country correlations. We will develop methods for probabilistic projection of international migration. We will develop methods for probabilistic population projections in countries with generalized sexually transmitted infectious disease epidemics, which require special methods because the demographic impact of such diseases is massive and different from that of most other diseases, being concentrated among the otherwise least vulnerable parts of the population, namely young sexually active adults. We will develop methods for reconstructing past populations with uncertainty from fragmentary data.
We will produce publicly available software for implementing the new methods.
PI: Adrian Dobra
Project Period: August 1, 2011 - July 31, 2015
Title: "ATD Collaborative Research: Statistical Ensembles for the Identification of Bacterial Genomes"
Abstract: As defined by the Centers for Disease Control and Prevention, a bioterrorism attack is
the deliberate release of viruses, bacteria, or other germs used to cause illness or death
in people, animals, or plants. The use of micro-organisms to cause disease is a growing
concern for public health officials and national defense agencies, in light of the terrorist
attacks of September 11, 2001, and the subsequent releases of anthrax to individuals in
Congress and the media. There exist biological agents that, if used effectively as biological
weapons, could pose a substantial public health challenge in terms of our ability to limit the
damage to both our citizens and our nation. One of the scientific initiatives to reduce the
threat of bioterrorism is the development of mathematical and statistical methods for the
rapid identification of genome differences and the accurate classification of bacterial genomes
as harmless or potentially pathogenic. The main objective of this proposal is the development
of high dimensional classification and clustering tools for this purpose. We consider three
statistical approaches to the identification of bacterial genomes in a given bacterial "soup":
(1) classification by overlap enrichment; (2) comparison of empirical clusterings and
consensus genomes; and (3) shrinkage estimation and model selection in hierarchical log-
PI: Adrian Raftery
Project Period: January 15, 2012 - December 31, 2016
Title: "Bayesian Estimation of Prevalence and At-Risk Group Size in Sexually
Transmitted Infection Epidemics"
Abstract: The goal of this proposal is to develop new statistical methods for estimating
prevalence and the size of at-risk groups in sexually transmitted infection epidemics. We also aim to estimate other policy-relevant quantities such as the number of orphans and children impacted, and treatment needs. We consider two types of epidemic: generalized epidemics, in which the disease is spread throughout the general population, and concentrated epidemics, in which the disease is largely confined to at-risk groups such as intravenous drug users, sex workers and men who have sex with men. Our goal is to develop methods appropriate for countries with sparse data, most of which are developing countries. For generalized epidemics, we propose a susceptible-infected model with a stochastic infection rate. We will develop a Bayesian approach to estimating the model from clinic data over time and sparse household surveys. We will extend the model to take account of changes in treatment availability, and to produce provincial as well as national estimates. For concentrated epidemics, we will first develop new integrated Bayesian methods for estimating the sizes of the main at-risk groups from fragmentary data, including mapping or hotspot data, behavioral surveillance data, program enrollment data and the overlaps between them. Much recent data comes from two relatively new network-based data collection methods, respondent-driven sampling (RDS) and the network scale-up method. We will develop methods for estimating unknown population size from multiple data sources, including RDS and network scale-up. We will then develop methods for estimating at-risk group size and prevalence over time, using a dynamic Bayesian model. We will produce publicly available software to implement our new methods and make them available to the research community and policy-makers.
PI: Peter Hoff
Project Period: September 1, 2011 - August 31, 2015
Title: "Analyzing Social Networks and Behavior"
Abstract: The goal of this grant is to develop statistical methods and software for the joint analysis of networks and nodal attribute data. The methods will be based on extensions of well-studied and familiar data analysis methods such as factor analysis, linear regression and probit models. The project will provide:
- statistical tests and descriptions of the relationship between a network and nodal attributes.
- prediction and imputation of network information based on nodal attribute data.
- prediction and imputation of nodal attributes based on network data.
- estimation and inference in the presence of missing network and nodal data.
- a class of dynamic network models that can be extended into the time domain.
- open source statistical software that will be accessible to researchers.