Below is a list of current research projects. For information on previously completed research projects, click here.
Current Research Projects
PI: Elena Erosheva
Project Period: April 1, 2013 - March 31, 2015
Title: "Respondent-driven Sampling for Highly Structured Populations"
Abstract: A network-based type of sampling technique and the corresponding
set of estimates, known as Respondent-Driven Sampling (RDS), is the
current method of choice for many researchers studying hard-to-reach or
hidden populations. RDS exploits social networks by starting with a small
set of individuals and allowing the respondents at each wave to recruit
the next wave of the sample from their contacts. However, it is often
unclear whether important assumptions of RDS estimators about the
population-specific network structure and the chain-referral recruitment
process are satisfied. In this project, focusing on population clustering
structures, we will (1) infer relational structures from egocentric data
that are important for RDS feasibility; (2) develop a comprehensive
simulation study framework for assessing RDS feasibility; and (3) extend
the model-assisted approach to inference from RDS data to account for
population clustering. We will apply these new methods to unique
observational data on the size and structure of social networks of older
GLBT adults from the study Caring and Aging with Pride to inform computer
simulations of both social networks and RDS chain-referral processes in
order to systematically study the quality of potential RDS estimators in
this hard-to-reach population. We will make these methods available in the
R-package RDSAnalyst so they can be used by applied RDS researchers to
decide whether RDS is warranted in a fashion similar to the sample size
computation prior to a funding request for traditional survey research.
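As a rough illustration of the kind of computer simulation described above, the following minimal Python sketch generates a synthetic two-cluster network and runs a chain-referral recruitment process on it. It is not the project's method and does not use RDSAnalyst. The function names, the two-cluster random graph, and all parameter values (cluster size, tie probabilities, number of coupons) are assumptions chosen only for illustration.

```python
import random


def make_clustered_network(n_per_cluster, p_within, p_between, rng):
    """Build an undirected random graph with two equal-sized clusters.

    Nodes 0..n_per_cluster-1 form cluster 0; the rest form cluster 1.
    Returns a dict mapping each node to the set of its neighbors.
    """
    n = 2 * n_per_cluster
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            same_cluster = (i < n_per_cluster) == (j < n_per_cluster)
            p = p_within if same_cluster else p_between
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def simulate_rds(neighbors, seeds, coupons, target_size, rng):
    """Simulate wave-by-wave recruitment without replacement.

    Each sampled person recruits up to `coupons` contacts who are not
    yet in the sample, chosen uniformly at random among those contacts.
    Returns the sampled nodes in recruitment order (seeds first).
    """
    sampled = list(seeds)
    in_sample = set(seeds)
    wave = list(seeds)
    while wave and len(sampled) < target_size:
        next_wave = []
        for recruiter in wave:
            eligible = sorted(v for v in neighbors[recruiter] if v not in in_sample)
            rng.shuffle(eligible)
            for recruit in eligible[:coupons]:
                if len(sampled) >= target_size:
                    break
                in_sample.add(recruit)
                sampled.append(recruit)
                next_wave.append(recruit)
        wave = next_wave
    return sampled


# Hypothetical settings: strong clustering, a single seed in cluster 0.
rng = random.Random(2013)
network = make_clustered_network(n_per_cluster=100, p_within=0.08,
                                 p_between=0.002, rng=rng)
sample = simulate_rds(network, seeds=[0], coupons=3, target_size=60, rng=rng)
share_cluster_0 = sum(1 for v in sample if v < 100) / len(sample)
print(f"sample size: {len(sample)}, share from cluster 0: {share_cluster_0:.2f}")
```

In this synthetic population each cluster holds half of the members, so comparing the printed share with 0.5 over many repeated runs gives a crude sense of how strongly clustering can distort a naive sample proportion.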
PI: Tyler McCormick
Sponsor: US Army Research Office (ARO)
Project Period: August 3, 2012 - December 2, 2014
Title: "Taming Twitter: Using Social Media Networks to Identify Deviant Behavior"
Abstract: Our goal is to identify actors in social media networks who are likely to engage in non-normative or deviant behavior (such as being arrested or drunk driving). Our research will be informed by sociological theories on stigma and deviance. More specifically, we hope to use these theoretical paradigms to understand why people choose to disclose deviant behavior and the characteristics of the social networks of these individuals.
PI: Adrian Raftery
Project Period: March 1, 2012 – February 28, 2017
Title: "Probabilistic Population Projections for All Countries"
The United Nations publishes updated estimates and projections of the populations of all the world's countries, broken down by age and sex. These are widely used by international organizations, governments, the private sector and researchers, for example for climate modeling and for assessing progress towards the Millennium Development Goals. The UN's current projections are deterministic, but assessing uncertainty about population estimates and projections is important for policy-making and other purposes. We propose to develop a fully probabilistic population projection methodology.
We will develop methods for probabilistic projection of fertility and mortality, taking account of within-country and between-country correlations. We will develop methods for probabilistic projection of international migration. We will develop methods for probabilistic population projections in countries with generalized sexually transmitted infectious disease epidemics, which require special methods because the demographic impact of such diseases is massive and different from most other diseases, being concentrated among the least vulnerable parts of the population, namely young sexually active adults. We will develop methods for reconstructing past populations with uncertainty from fragmentary data.
We will produce publicly available software for implementing the new methods.
PI: Adrian Dobra
Project Period: August 1, 2011 – July 31, 2015
Title: "ATD Collaborative Research: Statistical Ensembles for the Identification
of Bacterial Genomes"
As defined by the Centers for Disease Control and Prevention, a bioterrorism attack is
the deliberate release of viruses, bacteria, or other germs used to cause illness or death
in people, animals, or plants. The use of micro-organisms to cause disease is a growing
concern for public health officials and national defense agencies, in light of the terrorist
attacks of September 11, 2001, and the subsequent releases of anthrax to individuals in
Congress and the media. There exist biological agents that, if used effectively as biological
weapons, could pose a substantial public health challenge in terms of our ability to limit the
damage to both our citizens and our nations. One of the scientific initiatives to reduce the
threat of bioterrorism is the development of mathematical and statistical methods for the
rapid identification of genome differences and the accurate classification of bacterial genomes
as harmless or potentially pathogenic. The main objective of this proposal is the development
of high dimensional classification and clustering tools for this purpose. We consider three
statistical approaches to the identification of bacterial genomes in a given bacterial “soup”:
(1) classification by overlap enrichment; (2) comparison of empirical clusterings and
consensus genomes; and (3) shrinkage estimation and model selection in hierarchical log-
PI: Adrian Raftery
Project Period: January 15, 2012 – December 31, 2016
Title: "Bayesian Estimation of Prevalence and At-Risk Group Size in Sexually
Transmitted Infection Epidemics"
The goal of this proposal is to develop new statistical methods for estimating
prevalence and the size of at-risk groups in sexually transmitted infection epidemics. We also aim to estimate other policy-relevant quantities such as the number of orphans and children impacted, and treatment needs. We consider two types of epidemic: generalized epidemics, in which the disease is spread throughout the general population, and concentrated epidemics, in which the disease is largely confined to at-risk groups such as intravenous drug users, sex workers and men who have sex with men. Our goal is to develop methods appropriate for countries with sparse data, most of which are developing countries. For generalized epidemics, we propose a susceptible-infected model with a stochastic infection rate. We will develop a Bayesian approach to estimating the model from clinic data over time and sparse household surveys. We will extend the model to take account of changes in treatment availability, and to produce provincial as well as national estimates. For concentrated epidemics, we will first develop new integrated Bayesian methods for estimating the sizes of the main at-risk groups from fragmentary data, including mapping or hotspot data, behavioral surveillance data, program enrollment data and the overlaps between them. Much recent data comes from two relatively new network-based data collection methods, respondent-driven sampling (RDS) and the network scale-up method. We will develop methods for estimating unknown population size from multiple data sources, including RDS and network scale-up. We will then develop methods for estimating at-risk group size and prevalence over time, using a dynamic Bayesian model. We will produce publicly available software to implement our new methods and make them available to the research community and policy-makers.
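For readers unfamiliar with the network scale-up method mentioned above, the sketch below shows one commonly used basic form of the estimator: the share of hidden-population members among all of the respondents' reported contacts, scaled up to the total population. It is not the integrated Bayesian approach proposed in this project. The function name and the survey numbers are hypothetical and serve only as an illustration.

```python
def scale_up_estimate(known_hidden_counts, network_sizes, total_population):
    """Basic network scale-up estimate of a hidden population's size.

    known_hidden_counts[i]: number of hidden-population members that
        respondent i reports knowing.
    network_sizes[i]: respondent i's personal network size (assumed known
        or estimated separately).
    total_population: size of the general population.
    """
    if len(known_hidden_counts) != len(network_sizes):
        raise ValueError("one count and one network size per respondent")
    total_contacts = sum(network_sizes)
    if total_contacts <= 0:
        raise ValueError("network sizes must sum to a positive number")
    return total_population * sum(known_hidden_counts) / total_contacts


# Hypothetical survey of three respondents in a population of one million.
estimate = scale_up_estimate(
    known_hidden_counts=[1, 0, 2],
    network_sizes=[100, 200, 300],
    total_population=1_000_000,
)
print(estimate)  # 3 of 600 reported contacts, scaled up: 5000.0
```

This basic form ignores reporting errors and differences in how visible hidden-population membership is to contacts, which is one reason more elaborate model-based approaches are of interest.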
PI: Peter Hoff
Project Period: September 1, 2011 - August 31, 2015
Title: "Analyzing Social Networks and Behavior"
The goal of this grant is to develop statistical methods and software for the joint analysis of networks and nodal attribute data. The methods will be based on extensions of well-studied and familiar data analysis methods such as factor analysis, linear regression and probit models. The project will provide:
- statistical tests and descriptions of the relationship between a network and nodal attributes.
- prediction and imputation of network information based on nodal attribute data.
- prediction and imputation of nodal attributes based on network data.
- estimation and inference in the presence of missing network and nodal data.
- a class of dynamic network models that can be extended into the time domain.
- open source statistical software that will be accessible to researchers.