CENTER FOR STATISTICS AND THE SOCIAL SCIENCES,
TALK ABSTRACTS


Elena Erosheva, Stephen Fienberg, and John Lafferty

"Exploring Internal Structure of PNAS Publications: A Hierarchical Model for Text and References"

Presented at Case Studies in Bayesian Statistics 7, Carnegie Mellon University, Pittsburgh, PA, September 2003.

The Proceedings of the National Academy of Sciences (PNAS) is one of the world's most cited multidisciplinary scientific journals. It publishes research reports in the Physical, Biological, and Social Sciences. The journal's official classification structure is reflected in topic labels submitted by the authors of manuscripts; these labels largely follow traditionally established disciplines within those three areas. Focusing on articles in the Biological Sciences, we explore their internal soft classification structure based only on semantic decompositions of abstracts and bibliographies, and compare it with the formal discipline classifications.

Our hierarchical model assumes that there is a fixed number of internal categories, each characterized by multinomial distributions over words (in abstracts) and references (in bibliographies). Soft classification for each article is based on proportions of the article's content coming from each category. Using eight internal categories in the model, we find that most articles have major soft classification components in more than one internal category.
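
The generative idea described above can be illustrated with a minimal simulation sketch. It is not the authors' implementation. The abstract does not state how membership proportions are distributed or how individual words and references are assigned to categories, so the Dirichlet draws and the per-token category assignment below are assumptions, as are the vocabulary sizes, article lengths, and all function and variable names.

```python
import numpy as np

def simulate_article(rng, theta, word_dists, ref_dists, n_words, n_refs):
    """Draw one article's words and references given its membership vector theta.

    Assumption: each word and each reference independently picks a category
    according to theta, then is drawn from that category's multinomial.
    """
    n_categories = len(theta)
    word_cats = rng.choice(n_categories, size=n_words, p=theta)
    words = np.array([rng.choice(word_dists.shape[1], p=word_dists[k])
                      for k in word_cats])
    ref_cats = rng.choice(n_categories, size=n_refs, p=theta)
    refs = np.array([rng.choice(ref_dists.shape[1], p=ref_dists[k])
                     for k in ref_cats])
    return words, refs

rng = np.random.default_rng(0)

K = 8    # number of internal categories (from the abstract)
V = 50   # hypothetical vocabulary size for abstracts
R = 30   # hypothetical number of distinct references

# One multinomial over words and one over references per category.
# Drawing them from a flat Dirichlet is an assumption for illustration only.
word_dists = rng.dirichlet(np.ones(V), size=K)   # shape (K, V)
ref_dists = rng.dirichlet(np.ones(R), size=K)    # shape (K, R)

# Soft classification of one article: proportions over the K categories.
# The Dirichlet(0.5) prior is an assumption, not stated in the abstract.
theta = rng.dirichlet(np.full(K, 0.5))

words, refs = simulate_article(rng, theta, word_dists, ref_dists,
                               n_words=120, n_refs=25)
```

In this sketch, theta plays the role of the article's soft classification: an article whose theta has sizable entries for several categories corresponds to the finding that most articles have major components in more than one internal category.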



UW - CSSS: Wednesday, 17-Sep-2003 12:43:21 PDT Contact: Webmaster or CSSS