



The openVA Team recently completed work on pyOpenVA, with Jason Thomas responsible for most of the coding. pyOpenVA is a re-implementation of openVA, InterVA5, and InSilicoVA that addresses the fact that many users could not easily use the R versions of the software: maintaining installations of R and Java was not practically feasible. Additionally, crossVA and InSilicoVA were painfully slow on many users' computers. So, pyOpenVA

For more information or help using the openVA tools, contact us using these email addresses:


Developing an Agenda for Population Aging and Social Research in Low- and Middle-Income Countries (LMICs): Proceedings of a Workshop (2024)

In early fall last year, I participated in a National Academy of Sciences workshop to support the National Institute on Aging in their medium- and long-range planning for programs related to research in lower- and middle-income countries. The report came out recently, and I think it's useful. Download the PDF: NAS/NIA LMIC Agenda Report.


Paper Accepted

I'm a little late with this - was actually accepted last year!


Resources for Scientific Writing, Presentations, and Curation of Data and Code


Data and Code


Interesting Papers


Pontal do Sul MTB

I broke down and bought a nice cross-country mountain bike. I've been riding around Pontal do Sul enjoying the beach and rain forest! There's been night riding, rain riding, and riding into the very strong wind on the beach. After some trial and error, I've worked out a nice 18 km loop. The only real issue is the mosquitoes - very vicious and in staggering numbers in the forest. 100% DEET is required!

Sam on the rain forest side of the loop, Pontal do Sul
Sam riding on the beach, Pontal do Sul
Riding in the rain forest in the rain, Pontal do Sul


Huge Milestone! Completed First VA Methods Development Project with a VA Study using InSilicoVA published in The Lancet Global Health

Around 2010, I started thinking about working on verbal autopsy methods. Over the next few years, Basia Zaba and I decided to coordinate work that she was leading at the ALPHA Network of health and demographic surveillance system (HDSS) sites on identifying AIDS as a cause of death with the work that Tyler McCormick, Zehang (Richard) Li, and I were beginning to do on verbal autopsy cause-coding methods. We wove things together for a grant application that Basia was working on for the Gates Foundation, and we were funded by the Gates Foundation for a couple of years to get started. During that time we came up with the basic idea for InSilicoVA, and I led the team to apply for an NIH R01 grant. We were supported by NICHD for five years to complete work on InSilicoVA and support the ALPHA Network sites and the London-based ALPHA secretariat to collect, clean, and harmonize VA data from all the sites. Basia, Tyler, Richard, and I had the idea that we'd develop a new and improved automated cause-coding method for VA and use the ALPHA data for testing and validation. We would then apply the method to the ALPHA Network VAs to produce a comparative description of cause-specific mortality through time, with a focus on HIV/AIDS. To make the method fully transparent and credible, we planned to 1) publish a technical paper in a good statistics journal, 2) create open source software to support that publication and make the method widely usable, and 3) conduct a detailed substantive investigation of the ALPHA Network's cause-specific mortality using the new method and publish that in a good public/global health journal.

We have just completed the last of those tasks with the acceptance of the ALPHA Network cause-specific mortality paper by The Lancet Global Health - effectively the best journal for material like this (summary below). Clara Calvert, Milly Marston, Yue Chu, and I, with all of the ALPHA HDSS sites, worked over many years to get this done! The methods paper was published in 2016 in the Journal of the American Statistical Association, one of the best statistics journals (see below). Finally, the software has turned into a major success and is the reference software supporting the WHO Standard VA used by many research and mortality surveillance groups worldwide. We published a paper describing the software and how to use it in The R Journal (below). In addition to the research software in R, we were supported by the NGO Vital Strategies to create a production version of the software implemented in Python and C++ by Jason Thomas. That version is very fast, easy to install in the standard way with no additional dependencies required, and easy to use through a GUI (public release in the next few weeks after final validation testing).

Along the way, we quickly convinced ourselves that the verbal autopsy interview is potentially a major source of error, omission, and general data quality issues. With Clarissa Surek-Clark, Nicole Angotti, and soon Brian Houle, we have observed the interview in many settings and are beginning work to improve and standardize it.

Altogether, especially measured by our original objectives, this project has been a total and overwhelming success! This is because of the fantastic team we had from the very beginning. Having this last paper accepted in such a great journal is particularly poignant given the arc of the project and the fact that we lost Basia halfway through.

The three key outputs of the project:

  1. The methods paper describes InSilicoVA - our new automated cause-coding algorithm for verbal autopsy. It was published in the Journal of the American Statistical Association in 2016: Probabilistic Cause-of-Death Assignment using Verbal Autopsies.
  2. We created and released open source, freely-available software for InSilicoVA and all of the other commonly-used verbal autopsy cause-coding algorithms (except Tariff 2.0) in the statistical programming environment R - called openVA. All of the packages are available at the Comprehensive R Archive Network (CRAN). We also maintain a Github repository with the code and a variety of additional resources. For users, we published a tutorial and user manual in The R Journal in 2023: The openVA Toolkit for Verbal Autopsies.
  3. The ALPHA Network cause-specific mortality paper is coming out in The Lancet Global Health sometime in the next few weeks:

    Temporal Changes in Cause of Death Among Adolescents and Adults in Six Countries in Eastern And Southern Africa: A Multi-Country Cohort Study using Verbal Autopsy Data by Yue Chu, Milly Marston, Albert Dube, Charles Festo, Eveline Geubbels, Simon Gregson, Kobus Herbst, Chodziwadziwa Kabudula, Kathleen Kahn, Tom Lutalo, Louisa Moorhouse, Robert Newton, Constance Nyamukapa, Ronald Makanga, Emma Slaymaker, Mark Urassa, Abdhalah Ziraba, Clara Calvert, and Samuel J. Clark


    Background. The absence of high-quality comprehensive civil registration and vital statistics systems across many settings in Africa has led to limited empirical data on causes of death in the region.

    Methods. We harmonized verbal autopsy (VA) and residency data from nine health and demographic surveillance system (HDSS) sites across Eastern and Southern Africa, each with variable coverage across the period 1995-2019. InSilicoVA, a probabilistic model, was used to assign cause of death based on the signs and symptoms reported in the VA. Levels and trends in all-cause and cause-specific mortality rates and cause-specific mortality fractions were calculated, stratified by HDSS site, sex, age, and calendar periods.

    Findings. All-cause mortality has generally decreased across the HDSS sites, particularly for adults aged 20-59. In many of the HDSS sites, these decreases were driven by reductions in HIV/TB-related deaths. For 2010-2014, the top causes of death were: road traffic accidents, HIV/TB and meningitis/sepsis for adolescents (12-19 years), HIV/TB for adults (20-59 years), and neoplasms and cardiovascular disease for older adults (>59 years). There was greater between-HDSS and between-sex variation in causes of death for adolescents compared to adults.

    Interpretation. This study shows that there has been progress in reducing mortality across Eastern and Southern Africa but also points to age, sex and between-HDSS differences in causes of adolescent and adult deaths. This highlights the importance of detailed local-level data to inform health needs to ensure continued improvements in survival.


Three Papers Accepted

Three papers that have been in the works for a long time have been accepted -


Reference Death Archive Kickoff at WHO

Kobus Herbst, Yue Chu, and I have developed the Reference Death Archive pilot database over the past few months. In mid November we visited WHO and started the process of handing it off to become the WHO-hosted Reference Death Archive for verbal autopsy under Doris Ma Fat's supervision. This is a significant milestone in the project, and we're all excited about the progress.

Sam, Yue, Doris, and Kobus at WHO


Moved to Brazil for a Year

For my sabbatical year, I have moved with my family to Brazil. Clarissa and I will work on our reference death project with the Department of Pathology at the University of São Paulo, and we will spend time with Clarissa's family in Curitiba.

Atlantic Rain Forest, Parana, Brazil
River in Atlantic Rain Forest, Parana, Brazil
Waterfall in Atlantic Rain Forest, Parana, Brazil


Five Manuscripts R/R and South Africa/Brazil Exchange

Super exciting: my fabulous colleagues and I have five papers at an advanced R/R stage - feels like the pandemic is finally working out of our pipeline!

Also, I'm finally doing something that I'd wanted to do for some time (again, the pandemic ...) - helping my South African and Brazilian colleagues connect around cause of death ascertainment. The colleagues involved are Luiz Fernando (Burns), who directs the mortality surveillance unit in the Department of Pathology at the University of São Paulo; Ryan Wagner, who co-directs the new minimally invasive tissue sampling (MITS) project at the Agincourt health and demographic surveillance system (HDSS) site; and Alison Castle at the Africa Health Research Institute (AHRI), who has just started an autopsy project on community deaths. Burns is visiting the South African sites this week with me and Clarissa, and a South African contingent will visit São Paulo later this year, again with me and Clarissa.


World Population Fractions Update - for 2022 WPP

Here's an update to my plot showing how the world population is distributed across major regions from 1950-2100 according to the 2022 edition of the UN Population Division's World Population Prospects. Notice the ever increasing importance of Africa! Code to do this is in this Github repo.


Contributed to an issue of the National Geographic Magazine

I spent some time with the authors and editors of a recent issue of National Geographic Magazine, editing and creating figures that interpret the UN Population Division's World Population Prospects population estimates and forecasts, including the one here: Will Nigeria’s booming population lead it to prosperity or poverty?


"The openVA Toolkit for Verbal Autopsies" appeared in The R Journal today.

Li, Z., J. Thomas, E. Choi, T.H. McCormick, and S.J. Clark (2023). The openVA Toolkit for Verbal Autopsies. The R Journal. [ Link ]

See openVA Toolkit just below for details.


Accepted in BMJ Open

Houle, B., C.W. Kabudula, D. Gareta, K. Herbst, and S.J. Clark (Accepted 2023). Household Structure, Composition, and Child Mortality in the Unfolding Antiretroviral Therapy Era in Rural South Africa: Comparative Evidence from Population Surveillance, 2000-2015. BMJ Open.


Objectives: The structure and composition of the household has important influences on child mortality. However, little is known about these factors in HIV-endemic areas and how associations may change with the introduction and widespread availability of antiretroviral treatment (ART). We use comparative, longitudinal data from two demographic surveillance sites in rural South Africa (2000-2015) on mortality of children younger than five years (n=101,105).

Design: We use multilevel discrete time event history analysis to estimate children’s probability of dying by their matrilineal residential arrangements. We also test if associations have changed over time with ART availability.

Setting: Rural South Africa.

Participants: Children younger than five years (n=101,105).

Results: 3,603 children died between 2000-2015. Mortality risks differed by co-residence patterns along with different types of kin present in the household. Children in nuclear households with both parents had the lowest risk of dying compared to all other household types. Associations with kin and child mortality were moderated by parental status. Having older siblings lowered the probability of dying only for children in a household with both parents (relative risk ratio (RRR)=0.736 95% CI [0.633, 0.855]). Only in the later ART period was there evidence that older adult kin lowered the probability of dying for children in single parent households (RRR=0.753 95% CI [0.664, 0.853]).

Conclusions: Our findings provide comparative evidence of how differential household profiles may place children at higher mortality risk. Formative research is needed to understand the role of other household kin in promoting child well-being, particularly in one-parent households that are increasingly prevalent.


"The openVA Toolkit for Verbal Autopsies" will appear in the R Journal

This is one of the longest in-preparation papers I've ever worked on. Thanks to the persistence of Richard Li and other coauthors, we finally have a statistical software paper in the R Journal - along with the collection of packages it describes in CRAN:

This software and InSilicoVA - our new automated algorithm for classifying cause of death using verbal autopsy data - are primary outputs of our project supported by NIH grant R01HD086227 from NICHD.

Recently, support for the continued development and maintenance of the software has come from Vital Strategies and the CDC Foundation as part of Bloomberg Philanthropies' Data for Health initiative.


New Grant

I have a new grant from the Bill & Melinda Gates Foundation for $2.04M to support the openVA Team to work closely with the WHO to create a pathology-informed reference death archive for verbal autopsy to be hosted by the WHO in Geneva. The first deaths will come from the MITS Alliance, the CHAMPS project, and the mortality surveillance system (SVO) in several cities in Brazil. The aim is to create pathology-informed symptom-cause information (like training data) for automated algorithms that identify likely causes for deaths with verbal autopsy.


UN World Population Prospects 2022

The UN Population Division recently released the 2022 iteration of their biennial World Population Prospects (WPP) global population estimates and projections. I led a small team including Jon Muir and Brian Houle to develop a mortality model for HIV-affected countries that was used to produce the 2022 WPP. OSU undergraduate student Michael Allen also contributed to the early stages of our work a couple of years ago.


How to Update a Web Site Using Git

Here's a markdown file and a PDF describing how to set up git to automatically update a web site.


New York Times Article on Verbal Autopsy

This is a nice article on death registration and verbal autopsy - very high level overview for lay readers!

Although the openVA Team is not mentioned, we work closely with many of the organizations and people mentioned or quoted in this article.

A Door-to-Door Effort to Find Out Who Died Helps Low-Income Countries Aid the Living


Papers in Annual Meeting of the Population Association of America (PAA) 2022


New in Global Health Action

Chandramohan, D., E. Fottrell, J. Leitao, E. Nichols, S. J. CLARK, C. Alsokhn, D. C. Munoz, C. AbouZahr, A. Di Pasquale, R. Mswia, E. Choi, F. Baiden, J. Thomas, I. Lyatuu, Z. Li, P. Larbi-Debrah, Y. Chu, S. Cheburet, O. Sankoh, A. M. Badr, D. M. Fat, P. Setel, R. Jakob, and D. de Savigny (2021). Estimating Causes Of Death Where There Is No Medical Certification: Evolution And State of The Art Of Verbal Autopsy. Global Health Action. [ DOI ]


Over the past 70 years, significant advances have been made in determining the causes of death in populations not served by official medical certification of cause at the time of death using a technique known as Verbal Autopsy (VA). VA involves an interview of the family or caregivers of the deceased after a suitable bereavement interval about the circumstances, signs and symptoms of the deceased in the period leading to death. The VA interview data are then interpreted by physicians or, more recently, computer algorithms, to assign a probable cause of death. VA was originally developed and applied in field research settings. This paper traces the evolution of VA methods with special emphasis on the World Health Organization’s (WHO) efforts to standardize VA instruments and methods for expanded use in routine health information and vital statistics systems in low- and middle-income countries (LMICs). These advances in VA methods are culminating this year with the release of the 2022 WHO Standard Verbal Autopsy (VA) Toolkit. This paper highlights the many contributions the late Professor Peter Byass made to the current VA standards and methods, most notably, the development of InterVA, the most commonly used automated computer algorithm for interpreting data collected in the WHO standard instruments, and the capacity building in low- and middle-income countries (LMICs) that he promoted. This paper also provides an overview of the methods used to improve the current WHO VA standards, a catalogue of the changes and improvements in the instruments, and a mapping of current applications of the WHO VA standard approach in LMICs. It also provides access to tools and guidance needed for VA implementation in Civil Registration and Vital Statistics Systems at scale.


New in Global Health Action

Herbst, K., S. Juvekar, M. Jasseh, Y. Berhane, N. T. K. Chuk, J. Seeley, O. Sankoh, S. J. CLARK and M. Collinson (2021). Health And Demographic Surveillance Systems In Low- And Middle-income Countries: History, State of The Art And Future Prospects. Global Health Action. [ DOI ]


Health and Demographic Surveillance Systems (HDSS) have been developed in several low- and middle-income countries (LMICs) in Africa and Asia. This paper reviews their history, state of the art and future potential and highlights substantial areas of contribution by the late Professor Peter Byass.

Historically, HDSS appeared in the second half of the twentieth century, responding to a dearth of accurate population data in poorly resourced settings to contextualise the study of interventions to improve health and well-being. The progress of the development of this network is described starting with Pholela, and progressing through Gwembe, Balabgarh, Niakhar, Matlab, Navrongo, Agincourt, Farafenni, and Butajira, and the emergence of the INDEPTH Network in the early 1990s.

The paper describes the HDSS methodology, data, strengths, and limitations. The strengths are particularly their temporal coverage, detail, dense linkage, and the fact that they exist in chronically under-documented populations in LMICs where HDSS sites operate. The main limitations are generalisability to a national population and a potential Hawthorne effect, whereby the project itself may have changed characteristics of the population.

The future will include advances in HDSS data harmonisation, accessibility, and protection. Key applications of the data are to validate and assess bias in other datasets. A strong collaboration between a national HDSS network and the national statistics office is modelled in South Africa and Sierra Leone, and it is possible that other low- to middle-income countries will see the benefit and take this approach.


New in BMC Public Health

Houle, B., C.W. Kabudula, A.M. Tilstra, S.A. Mojola, E. Schatz, S.J. CLARK, N. Angotti, F.X. Gómez-Olivé, and J. Menken (2022). Twin epidemics: The Effects of HIV and Systolic Blood Pressure on Mortality Risk in Rural South Africa, 2010-2019. BMC Public Health. [ DOI ]


Background. Sub-Saharan African settings are experiencing dual epidemics of HIV and hypertension. We investigate effects of each condition on mortality and further examine whether HIV and hypertension interact in determining mortality.

Methods. Data come from the 2010 Ha Nakekela population-based survey of individuals ages 40 and older (1,802 women; 1,107 men) nested in the Agincourt Health and socio-Demographic Surveillance System in rural South Africa, which provides mortality follow-up from population surveillance until mid-2019. Using discrete-time event history models stratified by sex, we assessed differential mortality risks according to baseline measures of HIV infection, HIV-1 RNA viral load, and systolic blood pressure.

Results. During the 8-year follow-up period, mortality was high (477 deaths). 37% of men (mortality rate 987.53/100,000, 95% CI: 986.26 to 988.79) and 25% of women (mortality rate 937.28/100,000, 95% CI: 899.7 to 974.88) died. Over a quarter of participants were living with HIV (PLWH) at baseline, over 50% of whom had unsuppressed viral loads. The share of the population with a systolic blood pressure of 140mm Hg or higher increased from 24% at ages 40-59 to 50% at ages 75-plus and was generally higher for those not living with HIV compared to PLWH. Men and women with unsuppressed viral load had elevated mortality risks (men: adjusted odds ratio (aOR) 3.23, 95% CI: 2.21 to 4.71; women: OR 2.05, 95% CI: 1.27 to 3.30). There was a weak, non-linear relationship between systolic blood pressure and higher mortality risk. We found no significant interaction between systolic blood pressure and HIV status for either men or women (p>0.05).

Conclusions. Our results indicate that HIV and elevated blood pressure are acting as separate, non-interacting epidemics affecting high proportions of the older adult population. PLWH with unsuppressed viral load were at higher mortality risk compared to those uninfected. Systolic blood pressure was a mortality risk factor independent of HIV status. As antiretroviral therapy becomes more widespread, further longitudinal follow-up is needed to understand how the dynamics of increased longevity and multimorbidity among people living with both HIV and high blood pressure, as well as the emergence of COVID-19, may alter these patterns.


I discovered something new (to me) that will be useful. Each element of the vector defined by the perpendicular projection of a vector \( \vec{p} = \{p_1,p_2\} \) onto the line \( y=x \) is the arithmetic mean of \( p_1 \) and \( p_2 \): \begin{align} \mbox{proj}_{\hat{e}}\vec{p} &= \frac{\vec{p} \cdot \hat{e}}{\hat{e} \cdot \hat{e}} \hat{e} \\ &= \frac{p_1e + p_2e}{e^2 + e^2} \hat{e} \\ &= \frac{p_1 + p_2}{2e} \hat{e} \\ &= \left\{\frac{p_1 + p_2}{2},\frac{p_1 + p_2}{2}\right\} \end{align} where '\( \cdot \)' is the dot product, \(\hat{e} = \{e_1,e_2\} \) is the unit vector in the direction of \( y=x \), and \(e = e_1 = e_2 \).
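A quick numerical check of the projection result can be sketched in Python. The function name `project_onto_direction` and the example point (2, 4.5) are illustrative assumptions chosen for this sketch; any point should work the same way.

```python
import math


def project_onto_direction(p, d):
    """Perpendicular projection of the 2-D vector p onto the line
    through the origin with direction d: (p . d) / (d . d) * d."""
    scale = (p[0] * d[0] + p[1] * d[1]) / (d[0] * d[0] + d[1] * d[1])
    return (scale * d[0], scale * d[1])


# Unit vector along y = x has equal components e = 1/sqrt(2).
e = 1 / math.sqrt(2)

# Hypothetical example point.
p = (2.0, 4.5)

proj = project_onto_direction(p, (e, e))
mean = (p[0] + p[1]) / 2

# Both elements of the projection should match the arithmetic mean
# of p_1 and p_2, up to floating-point rounding.
print(proj, mean)
```

Because the formula divides by the squared length of the direction vector, the same result holds for any direction along \( y=x \), not only the unit vector.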

I noticed something else interesting (also new to me). The projections of a point onto all of the lines through the origin form a circle. In the image below, the red point (2,4.5) is projected onto a set of lines through the origin, and each projection is marked with a black dot and connected with a red line.
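The circle can be checked numerically. If \( \vec{q} \) is the projection of \( \vec{p} \) onto a line through the origin, then \( \vec{q} \cdot (\vec{p} - \vec{q}) = 0 \), which rearranges to \( |\vec{q} - \vec{p}/2|^2 = |\vec{p}|^2/4 \): a circle centered at \( \vec{p}/2 \) with radius \( |\vec{p}|/2 \). The Python sketch below tests this for the point (2,4.5); the function name `project_onto_angle` and the choice of 18 evenly spaced lines are assumptions made for illustration.

```python
import math


def project_onto_angle(p, theta):
    """Perpendicular projection of p onto the line through the origin
    that makes angle theta with the x-axis."""
    d = (math.cos(theta), math.sin(theta))  # unit direction vector
    scale = p[0] * d[0] + p[1] * d[1]       # p . d (d has length 1)
    return (scale * d[0], scale * d[1])


p = (2.0, 4.5)

# Candidate circle: centered at p/2 with radius |p|/2.
center = (p[0] / 2, p[1] / 2)
radius = math.hypot(p[0], p[1]) / 2

# Lines through the origin every 10 degrees over a half turn
# (the other half turn repeats the same lines).
angles = [k * math.pi / 18 for k in range(18)]
feet = [project_onto_angle(p, t) for t in angles]

# Distance of each projected point from the candidate center.
distances = [math.hypot(x - center[0], y - center[1]) for x, y in feet]
max_deviation = max(abs(dist - radius) for dist in distances)

# max_deviation should be zero up to floating-point rounding.
print(max_deviation)
```

The circle passes through both the origin and the point itself, since the segment joining them is a diameter.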


Published in Annals of Epidemiology

Norris Turner, A., D. Kline, A. Norris, W.G. Phillips, E. Root, J. Wakefield, Z. Li, S. Lemeshow, M. Spahnie, A. Luff, Y. Chu, M.K. Francis, M. Gallo, P. Chakraborty, M. Lindstrom, G. Lozanski, W. Miller, S.J. CLARK (2021). Prevalence of Current and Past COVID-19 in Ohio Adults. Annals of Epidemiology. [ DOI ]


Purpose. To estimate the prevalence of current and past COVID-19 in Ohio adults.

Methods. We used stratified, probability-proportionate-to-size cluster sampling. During July 2020, we enrolled 727 randomly-sampled adult English- and Spanish-speaking participants through a household survey. Participants provided nasopharyngeal swabs and blood samples to detect current and past COVID-19. We used Bayesian latent class models with multilevel regression and poststratification to calculate the adjusted prevalence of current and past COVID-19. We accounted for the potential effects of non–ignorable non–response bias.

Results. The estimated statewide prevalence of current COVID-19 was 0.9% (95% credible interval: 0.1%–2.0%), corresponding to ∼85,000 prevalent infections (95% credible interval: 6,300–177,000) in Ohio adults during the study period. The estimated statewide prevalence of past COVID-19 was 1.3% (95% credible interval: 0.2%–2.7%), corresponding to ∼118,000 Ohio adults (95% credible interval: 22,000–240,000). Estimates did not change meaningfully due to non–response bias.

Conclusions. Total COVID-19 cases in Ohio in July 2020 were approximately 3.5 times as high as diagnosed cases. The lack of broad COVID-19 screening in the United States early in the pandemic resulted in a paucity of population-representative prevalence data, limiting the ability to measure the effects of statewide control efforts.


In July 2020 a large group of colleagues at The Ohio State University collaborated with the Ohio State Department of Health to conduct a probability-based sample survey representative of adults living in Ohio in order to estimate state-level CV19 prevalence of current and past infections.

This article describes the results of the survey for a public health audience. We developed a new Bayesian poststratification method to produce estimates from the data, described in PNAS.

Abigail Norris Turner led the overall study conducted by a large group of collaborators at The Ohio State University and Ohio State Department of Health.


New working paper on arXiv

Li, Z. R., Z. Wu, I. Chen, and S. J. CLARK (2021). Bayesian Nested Latent Class Models for Cause-of-Death Assignment using Verbal Autopsies Across Multiple Domains. arXiv Preprint arXiv:2112.12186. [ PDF ]


Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool to collect information describing deaths outside of hospitals by conducting surveys of caregivers of a deceased person. It is routinely implemented in many low- and middle-income countries. Statistical algorithms to assign cause of death using VAs are typically vulnerable to the distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs as labeled data are usually unavailable in the target population. This article proposes a Latent Class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assigns cause of death for out-of-domain observations, and estimates cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop an efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary materials for this article and the R package to implement the model are available online.


This appeared on my Twitter feed recently: Demography Abandons Its Core by Ron Lee in 2001. Unfortunately this is still highly relevant, and those of us interested in formal demography as a recognizable field need to do something! The Formal Demography Working Group is being formed as a vehicle for doing that - please join if you are interested!


Today Mary Shenk and I are discussants at the 16th De Jong Lecture in Social Demography at Penn State. Hans-Peter Kohler is the featured speaker. Link to slides.


For the past 6-7 years my colleagues and I who work on verbal autopsy methods have been developing new statistical methods to automate identification of a cause of death using verbal autopsy records. Along the way we have developed a range of open source software tools to ensure that the methods are transparent and available to anyone who wants to use them. Recently we submitted a paper and posted a preprint on the openVA Toolkit. This is a suite of open source software that can be used to apply a variety of algorithms to verbal autopsy data. Additional software that works with openVA - e.g. the openVA Pipeline that fully automates cause-coding in CRVS settings - and links to the GitHub repositories where software is maintained are available at openva.net.


Published open-access in PNAS:

Kline, D., Z. Li, Y. Chu, J. Wakefield, W.C. Miller, A. Norris Turner, and S. J. CLARK (2021). Estimating Seroprevalence of SARS-CoV-2 in Ohio: A Bayesian Multilevel Poststratification Approach with Multiple Diagnostic Tests. Proceedings of the National Academy of Sciences 118(26), e2023947118. [ DOI ]


Globally, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected more than 59 million people and killed more than 1.39 million. Designing and monitoring interventions to slow and stop the spread of the virus require knowledge of how many people have been and are currently infected, where they live, and how they interact. The first step is an accurate assessment of the population prevalence of past infections. There are very few population-representative prevalence studies of SARS-CoV-2 infections, and only two states in the United States—Indiana and Connecticut—have reported probability-based sample surveys that characterize statewide prevalence of SARS-CoV-2. One of the difficulties is the fact that tests to detect and characterize SARS-CoV-2 coronavirus antibodies are new, are not well characterized, and generally function poorly. During July 2020, a survey representing all adults in the state of Ohio in the United States collected serum samples and information on protective behavior related to SARS-CoV-2 and coronavirus disease 2019 (COVID-19). Several features of the survey make it difficult to estimate past prevalence: 1) a low response rate; 2) a very low number of positive cases; and 3) the fact that multiple poor-quality serological tests were used to detect SARS-CoV-2 antibodies. We describe a Bayesian approach for analyzing the biomarker data that simultaneously addresses these challenges and characterizes the potential effect of selective response. The model does not require survey sample weights; accounts for multiple imperfect antibody test results; and characterizes uncertainty related to the sample survey and the multiple imperfect, potentially correlated tests.


In July 2020 a large group of colleagues at The Ohio State University collaborated with the Ohio State Department of Health to conduct a probability-based sample survey representative of adults living in Ohio in order to estimate state-level CV19 prevalence of current and past infections. Conducting the survey at that time presented many challenges, including a high non-response rate, an array of tests whose performance characteristics were poorly understood (we used all that we could), and very few positive results from any test. These issues, and others including the sampling design, presented a particular challenge for analysis and led us to develop a new Bayesian/poststratification approach to estimate state-wide prevalence.

There are still few representative sample surveys of CV19 biomarkers. Most other approaches suffer from the possibility of very large, consequential bias. This paper should be useful for anyone analyzing the results from a similar survey. Although we cannot share the data easily, we have made R code available to replicate the analysis: Bayes Prevalence.

Abigail Norris Turner led the overall study conducted by a large group of collaborators at The Ohio State University and Ohio State Department of Health.

In a true team effort, Dave Kline, Richard Li, Yue Chue, Jon Wakefield, Bill Miller, Abigail Norris Turner, and I conducted the analysis and developed the overall approach. It was an enjoyable and productive experience working with this team.


APHRC - the African Population Health Research Center - in Nairobi is seeking a consultant demographer for six months to lead the redesign of the Nairobi Urban Health and Demographic Surveillance System Site (NUHDSS). Full details available here.


Today and tomorrow the openVA Team is presenting a training workshop for the CHAMPS project and our Data for Health Initiative colleagues in Thailand.


A long-in-the-making collaboration with UNICEF produced its first non-academic product recently: Subnational Under-five Mortality Estimates, 1990–2019. This work grew out of a small collaboration with Jon Wakefield at the University of Washington, see small-area estimates. Jon grew the group working on it and together with Jessica Godwin saw it through to this. Congratulations to everyone involved!


I gave a talk today at the 2021 'Berlin Demography Days': Global Population Studies in the 21st Century: Priorities & Challenges - Mortality.



I learned today that the Social and Behavioral Sciences division of the School of Arts and Sciences at The Ohio State University has chosen me as a recipient of the Joan N. Huber Faculty Fellow Award. A brief description here.

New working paper on arXiv

I posted a slightly edited version of a paper titled "Health and Demographic Surveillance Systems and the 2030 Agenda: Sustainable Development Goals" on arXiv. This paper was invited at a UN Population Division experts' group meeting titled 'Strengthening the Demographic Evidence Base for the Post-2015 Development Agenda', held in New York, USA, October 5-6, 2015.


The health and demographic surveillance system (HDSS) is an old method for intensively monitoring a population to assess the effects of healthcare or other population-level interventions - often clinical trials. The strengths of HDSS include very detailed descriptions of whole populations with frequent updates. This often provides long time series of accurate population and health indicators for the HDSS study population. The primary weakness of HDSS is that the data describe only the HDSS study population and cannot be generalized beyond that.

The 2030 agenda is the ecosystem of activities - many including population-level monitoring - that relate to the United Nations (UN) Sustainable Development Goals (SDG). With respect to the 2030 agenda, HDSS can contribute by: (1) continuing to conduct cause-and-effect studies; (2) contributing to data triangulation or amalgamation initiatives; (3) characterizing the bias in and calibrating big data; and (4) contributing more to the rapid training of data-oriented professionals, especially in the population and health fields.


Commentary in PNAS: Monitoring epidemics: Lessons from measuring population prevalence of the coronavirus with Abigail Norris Turner. We highlight the need to improve response rates and to prepare a robust measurement capability to be ready for the next pandemic. DOI.


Article out today in PLoS One

Linking the timing of a mother's and child's death: Comparative evidence from two rural South African population-based surveillance studies, 2000–2015 by Brian Houle, Chodziwadziwa W. Kabudula, Alan Stein, Dickman Gareta, Kobus Herbst, and Samuel J. Clark. DOI.


Background. The effect of the period before a mother's death on child survival has been assessed in only a few studies. We conducted a comparative investigation of the effect of the timing of a mother's death on child survival up to age five years in rural South Africa.

Methods. We used discrete time survival analysis on data from two HIV-endemic population surveillance sites (2000–2015) to estimate a child's risk of dying before and after their mother's death. We tested if this relationship varied between sites and by availability of antiretroviral therapy (ART). We assessed if related adults in the household altered the effect of a mother's death on child survival.

Findings. 3,618 children died from 2000–2015. The probability of a child dying began to increase in the 7–11 months prior to the mother's death and increased markedly in the 3 months before (2000–2003 relative risk = 22.2, 95% CI = 14.2–34.6) and 3 months following her death (2000–2003 RR = 20.1; CI = 10.3–39.4). This increased risk pattern was evident at both sites. The pattern attenuated with ART availability but remained even with availability at both sites. The father and maternal grandmother in the household lowered children's mortality risk independent of the association between timing of mother and child mortality.

Conclusions. The persistence of elevated mortality risk both before and after the mother's death for children of different ages suggests that absence of maternal care and abrupt breastfeeding cessation might be crucial risk factors. Formative research is needed to understand the circumstances for children when a mother is very ill or dies, and behavioral and other risk factors that increase both the mother and child's risk of dying. Identifying families when a mother is very ill and implementing training and support strategies for other members of the household are urgently needed to reduce preventable child mortality.
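For readers unfamiliar with the discrete time survival analysis named in the Methods, here is a minimal sketch of the data setup in Python. It is not the paper's code: the record layout, function names, and toy data are hypothetical. Each child's follow-up is expanded into one row per month at risk, and the hazard in each month is the share of children at risk who die in that month. In practice one would usually fit a logistic regression to these rows with time-varying covariates, for example time relative to the mother's death.

```python
from collections import defaultdict


def person_period(records):
    """Expand (child_id, months_observed, died) into one row per month at risk.

    The event flag is 1 only in the final observed month of a child who died.
    """
    rows = []
    for child_id, months_observed, died in records:
        for month in range(1, months_observed + 1):
            event = 1 if (died and month == months_observed) else 0
            rows.append((child_id, month, event))
    return rows


def discrete_hazard(rows):
    """Hazard per month: deaths in that month divided by children at risk."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for _, month, event in rows:
        at_risk[month] += 1
        events[month] += event
    return {month: events[month] / at_risk[month] for month in sorted(at_risk)}


if __name__ == "__main__":
    # Toy data: (child_id, months observed, died at end of observation?)
    toy = [("a", 3, True), ("b", 3, False), ("c", 1, True), ("d", 2, False)]
    print(discrete_hazard(person_period(toy)))
```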


There will be occasional updates here. For now, the news is that I finally finished this web site after I was interrupted for a year by COVID-19. It's up to date and has all the basic content I had planned. The site is written in plain HTML with a simple cascading style sheet. It's straight from 1995, but it's super easy to modify/maintain/augment with a text editor. With a shell script to upload everything, keeping the site up to date will be easy, and it does not require expensive or slow-to-learn software or rely on third parties to fix things. :-)