2026-02-13 ▲

Effect of AI Assistance While Learning to Code

Anthropic - creator of the Claude LLMs and a generally try-to-be-more-ethical AI company - ran a study to characterize the effect of AI coding tools on how well people learn to code. The results are sobering. Using a high-quality set of AI coding tools strongly decreased scores on a knowledge-based test given when training was complete. This suggests that AI (at least as used in the study) did not help people learn new coding skills. Thanks Tyler for forwarding this to me.

2026-01-30 ▲

Appointed to WHO Health Statistics TAG

The WHO Health Systems, Access and Data unit is constituting a new technical advisory group (TAG). I applied to be a member sometime last year, and this morning I received an official appointment letter. The group will contribute to developing recommendations for health systems including civil registration and vital statistics.

2026-01-09-2 ▲

Review: HDSS Data Uses

This paper is a great introduction to the unique potentials of HDSS data.

Versatility, value and limitations of using health and demographic surveillance system data for secondary analyses: guidance for researchers, using examples from existing analyses by Estelle McLean, Rebecca Sear, and Emma Slaymaker. Journal of Population Research. 43:3 (2026). DOI:10.1007/s12546-025-09411-z.

Abstract

Health and Demographic Surveillance Systems (HDSS) are geographic open cohorts operating in countries with absent/incomplete vital registration. Data on demographic events, socio-demographic indicators, and certain health conditions are regularly gathered on the whole population of a small area, sometimes for decades. In the same countries there are often also nationally-representative demographic data available from roughly quinquennial Demographic and Health Surveys (DHS). This paper uses a comparison with DHS data, and an in-depth review of complex HDSS analyses to demonstrate the utility of using HDSS data for secondary analyses, and to provide guidance on the conduct and reporting of these analyses. DHS data has advantages in terms of representativeness and data access, and HDSS in terms of scope for complex longitudinal analyses which may take household and familial contexts into account. HDSS also have issues which make interpretation of data and conclusions challenging: lack of data on in- and out-migrants when they are outside of the area, and repeatedly collected data may result in inconsistencies and/or more reliable data for longer-term residents. Despite these challenges, the reviewed HDSS data analyses demonstrate the flexibility and unique strengths of HDSS data. HDSS data users are recommended to clearly state their methods, particularly how they approached the issues specific to HDSS analyses of handling repeated data, migration and missing data: while there were interesting ways used to approach these issues, they were often not discussed. HDSS data producers are further encouraged to ensure that the data are being used to their full potential.

2026-01-09-1 ▲

HEALMOD Grant Cancelled

Over the fall of 2025, irreconcilable differences emerged on the leadership team (including me) of the HEALMOD unit. Great effort was expended by many people to resolve the issues, but a satisfactory resolution was not possible. So, today the OSU Office of Academic Affairs cancelled the grant that supports the unit. The four postdocs supported by that award will continue to be supported through the end of their contracts, and they will choose the mentor(s) who will supervise them through the remainder of their postdocs.

This is tremendously sad and the direct result of overwhelming hubris and greed.

I look forward to continuing to work with the postdocs who may continue to want support from me.

2025-12-12 ▲

Reference Data Archive (RDA) Is Now Live at WHO

The RDA is now live at https://data.who.int/rda. It launches with verbal autopsy reference deaths from the Brazilian mortality surveillance system (SVO), starting with the city of São Paulo SVO. Over the coming year a lot more verbal autopsy reference deaths with WHO 2022 VA, minimally-invasive tissue sample (MITS) results and full autopsy from Brazil will be added. Similar reference deaths will also be added from the MITS Alliance sites.

2025-12-10 ▲

Block Domains that Send Spam Email

Over the past year I have identified and blocked almost 800 domains, IP addresses, or email addresses that send spam email. Here is the list as of right now: click to download sources-of-spam.txt. This is a text file with one malicious source on each line. You may be able to easily import it into whatever blacklisting tool you use.
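If your blocking tool wants domains, IP addresses, and email addresses in separate lists, a short script can split the file first. The following is a minimal Python sketch, not a tested importer. It assumes each line holds exactly one bare entry with no comments, and that IP entries are single addresses rather than CIDR ranges. The function names `classify_source` and `parse_blocklist` are made up for illustration.

```python
import ipaddress
from collections import defaultdict


def classify_source(entry: str) -> str:
    """Return 'ip', 'email', or 'domain' for one blocklist entry."""
    try:
        # Accepts IPv4 and IPv6 literals; raises ValueError for anything else.
        ipaddress.ip_address(entry)
        return "ip"
    except ValueError:
        pass
    # Assumption: anything containing '@' is an email address,
    # and everything else is a domain name.
    return "email" if "@" in entry else "domain"


def parse_blocklist(text: str) -> dict[str, list[str]]:
    """Group the non-empty lines of a one-entry-per-line blocklist by type."""
    groups = defaultdict(list)
    for line in text.splitlines():
        entry = line.strip().lower()
        if not entry:
            continue  # skip blank lines
        groups[classify_source(entry)].append(entry)
    return dict(groups)


# Made-up sample entries using reserved example names and addresses.
sample = "spam.example\n192.0.2.10\nbad@example.net\n"
groups = parse_blocklist(sample)
```

To run it on the downloaded file, pass the file contents instead of the sample, for example `parse_blocklist(open("sources-of-spam.txt").read())`.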

2025-10-20 ▲

Used Car Checklist

If you're shopping for a used car, here's a checklist of things to note during your inspection.

2025-10-13 ▲

A Very Interesting Paper: Sex-associated Differences in Life Span

Sexual Selection Drives Sex Difference in Adult Life Expectancy Across Mammals And Birds by Johanna Staerk, Dalia A. Conde, Morgane Tidière, Jean-François Lemaître, András Liker, Balázs Vági, Samuel Pavard, Mathieu Giraudeau, Simeon Q. Smeele, Orsolya Vincze, Victor Ronget, Rita Da Silva, Zjef Pereboom, Mads F. Bertelsen, Jean-Michel Gaillard, Tamás Székely, and Fernando Colchero. Science Advances. 11, eady8433 (2025). DOI:10.1126/sciadv.ady8433.

Abstract

Across human cultures and historical periods, women, on average, live longer than men, a pattern best understood from a comparative evolutionary perspective. Here, we analyzed adult life expectancy in 528 mammal and 648 bird species in zoos. Like humans, 72% of mammals exhibited a female life expectancy advantage, while 68% of birds showed a male advantage, as expected from the harmful effects of sex chromosomes described by the heterogametic sex hypothesis. Yet, sex differences varied widely. In zoos, we found strong evidence that this variation generally correlated with both the mating system and sexual size dimorphism. Although with weaker evidence, the patterns remained consistent in populations from the wild, with an even larger effect of the mating system. Thus, even in zoos, where environmental pressures are largely reduced, precopulatory sexual selection seems to play a fundamental role in shaping sex differences in life expectancy in mammals and birds.

2025-09-27 ▲

RDA About Ready

The reference death archive - now named the reference data archive (RDA) - is about ready to go live at the WHO headquarters. We are wrapping up a few last odds and ends to integrate into WHO infrastructure and finalize data use/sharing agreements with the launch data providers. Stay tuned.

2025-09-27 ▲

Yue Chu's Dissertation

Yue Chu just finished her dissertation titled "Leveraging Language Models and Machine Learning in Verbal Autopsy Analysis". It's posted on arXiv so you can download and read it if you want. It's a very cool demonstration that pre-trained language models and the VA narrative can be super useful in classifying cause of death using VA data. The abstract is below.

Abstract

In countries without civil registration and vital statistics, verbal autopsy (VA) is a critical tool for estimating cause of death (COD) and inform policy priorities. In VA, interviewers ask proximal informants for details on the circumstances preceding a death, in the form of unstructured narratives and structured questions. Existing automated VA cause classification algorithms only use the questions and ignore the information in the narratives. In this thesis, we investigate how the VA narrative can be used for automated COD classification using pre-trained language models (PLMs) and machine learning (ML) techniques. Using empirical data from South Africa, we demonstrate that with the narrative alone, transformer-based PLMs with task-specific fine-tuning outperform leading question-only algorithms at both the individual and population levels, particularly in identifying non-communicable diseases. We explore various multi-modal fusion strategies combining narratives and questions in unified frameworks. Multi-modal approaches further improve performance in COD classification, confirming that each modality has unique contributions and may capture valuable information that is not present in the other modality. We also characterize physician-perceived information sufficiency in VA. We describe variations in sufficiency levels by age and COD and demonstrate that classification accuracy is affected by sufficiency for both physicians and models. Overall, this thesis advances the growing body of knowledge at the intersection of natural language processing, epidemiology, and global health. It demonstrates the value of narrative in enhancing COD classification. Our findings underscore the need for more high-quality data from more diverse settings to use in training and fine-tuning PLM/ML methods, and offer valuable insights to guide the rethinking and redesign of the VA instrument and interview.

2025-09-27 ▲

Leg/ankle Surgery

This has been my year for serious leg injuries!

On August 22 in a perfect storm of bad luck, I shattered my lower right fibula, broke a chip off the back of my lower right tibia, and completely broke my ankle. Late in the afternoon, I was riding my mountain bike (at 6.4mph!) on a tame trail I've ridden very many times before; the sun was in my eyes and I lost my balance and put my foot down, and super unluckily, my foot landed immediately in front of a solid root sticking up about nine inches. The foot stopped instantaneously and all my weight transferred to it. My body and leg kept going and my lower leg levered over the root ... this pushed my right fibula into the tibia, broke it into a number of pieces and forcefully opened my ankle joint resulting in a massive injury - think floppy foot!

After 12 hours in the ER, manual reduction (realignment) of the joint and creation of a strong splint, I went home and waited five days for the swelling to go down and then had surgery on August 29th to fully reduce and stabilize the joint. This "open reduction and internal fixation" procedure fully realigns everything and then uses screws, plates, and other devices to keep the bones and joint properly aligned for healing.

In my follow up visits, the surgeon says I'm recovering very well and quickly, but it feels really painful and super slow to me! I had just recovered full use of my left knee when this happened, and it has been incredibly difficult to go back to continuous bone pain and inability to move or be active.

A few pictures below - graphic images:

2025-04-10 ▲

Knee Surgery

Blowing leaves from the gutter/eaves of our house in October last year, I tripped and twisted my knee. Over the next few weeks it gradually developed into a painful and debilitating situation, and I finally went to the doctor. In mid December I was diagnosed with a medial meniscus root detachment and additional less serious damage to the left medial meniscus. Repair required a significant surgery involving drilling a hole through the top of my tibia to attach a permanent suture to hold the meniscus root in place. It seems to have worked, but the recovery is very slow and up until about now, very painful. Most people apparently take nine months to a year to be back to 'normal'. I was completely immobile on crutches for about two and a half months and am about now beginning to walk around carefully without them. I'm super pleased to have my leg working again!

2025-02-08 ▲

R Package `SVDMx` on CRAN

The R package I wrote to implement my SVD-Comp mortality model published a while ago in Demography is on CRAN as 'SVDMx'. Install using install.packages("SVDMx").

2025-02-01 ▲

HEALMOD at OSU

During 2023-24, three faculty in Biostatistics (Grzegorz Rempala) and Anthropology (Barbara Piparata and Sean Downey) and I (Sociology) created the Health and Environment Modeling Co-laboratory (HEALMOD) with support from the College of Arts and Sciences and the School of Public Health at OSU. Together we applied for and received a "Good to Great" grant from the Office of Academic Affairs to support the initial setup and growth of HEALMOD. We have two postdocs and a number of projects getting going, and we are hiring two more postdocs now.

From the HEALMOD site:

The OSU Health and Environment Modeling Co-laboratory (HEALMOD) is a community of scholars dedicated to convergence research and education related to human health and environmental sustainability.

The initiative is driven by specific and compelling problems deeply integrated across disciplines with respect to theories, methods, data, and research communities.

Purpose

HEALMOD provides a thought space and education hub for convergence researchers, including faculty, research scientists, postdoctoral scholars, graduate students and undergraduate students.

HEALMOD builds on existing OSU institutional and human resources to propel OSU to the forefront of convergence research by providing the scaffolding to support new convergence research opportunities and the training of the next generation of convergence researchers by offering a transdisciplinary convergence curriculum.

Goals

HEALMOD will make OSU a world-class center for conducting theory-guided computational research into human-environmental health and sustainability and disseminating actionable results and policy recommendations with measurable social impact, thus raising the profiles of public health, and the social sciences (Anthropology, Sociology).

2024-11-08 ▲

When All You Have Is a Hammer

John Iceland and Eric Silver have written a great article addressing the dominance of critical race theory and social justice ideology in sociology. They offer a perspective that is likely to create a better understanding of what's going on and support the creation of (actually!) effective interventions.

When all you have is a hammer: how social justice distorts what we know about racial disparities

Abstract

The sociological literature on race operates under the progressive ideological assumption that systemic racism is the predominant cause of racial disparities. This assumption has become “paradigmatic,” shaping the selection of research questions and the interpretation of research results. Consequently, the literature offers a rather narrow “Overton window” concerning what we, as sociologists, know about: (1) the causes of racial disparities, (2) the accuracy and motivation behind the public’s views on race-related issues, and (3) race-related policy preferences. A paradigm shift is needed to improve our understanding of racial disparities and devise more effective ways to address them. To achieve this end, sociologists should broaden their perspectives beyond attributing all racial disparities to systemic racism and consider additional hypotheses. From a policy perspective, to reduce racial disparities we should reconsider addressing social class and related factors early in life.

2024-10-22 ▲

Phone Number in Email Signature

My email signature below, net of institutional changes, has been the same for almost twenty years. Exactly two people have ever used my cell phone number to call me! I really wish more people would because I'm not a fan of over-full email inboxes.

Sam's email signature with cell phone number.

2024-08-12 ▲

Back from Brazil

Yesterday, I arrived back home in Ohio from Brazil, where I spent a one-year sabbatical. It was a really fabulous year!

Goodbye Brazil! Sunset in Araucarias.

2024-08-01 ▲

Lancet Global Health Paper Out

Our paper demonstrating InSilicoVA using ALPHA Network HDSS data came out last week in Lancet Global Health. See Huge Milestone! below.

Chu, Y., M. Marston, A. Dube, C. Festo, E. Geubbels, S. Gregson, K. Herbst, C. Kabudula, K. Kahn, T. Lutalo, L. Moorhouse, R. Newton, C. Nyamukapa, R. Makanga, E. Slaymaker, M. Urassa, A. Ziraba, C. Calvert, S.J. Clark (2024). Temporal Changes in Cause of Death Among Adolescents and Adults in Six Countries in Eastern And Southern Africa: A Multi-Country Cohort Study using Verbal Autopsy Data. Lancet Global Health 12:e1278–87. [ DOI ]

2024-07-04 ▲

Tanzania Ministry of Health endorses InSilicoVA

The Tanzanian Ministry of Health just released a report titled "Causes of Deaths from Community Settings in Tanzania 2018-2021" (June 2024 - local link if TZA link is broken). The report presents findings from their study evaluating various ideas, methods, and technologies to improve cause-of-death data from community settings (e.g. rural or non-urban enough to not be covered by systems in urban areas - most of Tanzania) in the context of civil registration and vital statistics. Part of that is an evaluation of approaches to classifying cause of death using data from verbal autopsy. The algorithmic approaches - including InSilicoVA - are compared to each other and to physician coding. The following directly quoted paragraph (Section 6.2, p. 50) sums up their findings on 'diagnostic algorithms' for verbal autopsy.

The experience documented in this report covering the first 3,601 verbal autopsies conducted as part of routine CRVS and DHIS2 operations provides sound evidence that the WHO standard verbal autopsy methodologies work well in Tanzania and can be taken to the next level of integration into the national health information ecosystem. Moreover, it concludes that both the InSilicoVA and the InterVA5 automated computer coded analytic procedures emulate very well the performance of physician coded VA at population level. CCVA is both considerably more rapid and cost effective than PCVA. The best performing CCVA algorithm was InSilicoVA with a CSMF physician concordance of 83%. InSilicoVA also has an advantage over InterVA5 and Tariff2 in that it does not deliver undetermined results. Moreover, the InterVA5 will likely be discontinued for further development and upgrades in favor of InSilicoVA. The implications of these results suggest that InSilicoVA should become the diagnostic method of choice for CRVS-VA in Tanzania [emphasis added].
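For readers unfamiliar with the population-level comparison in the quoted paragraph: a cause-specific mortality fraction (CSMF) is the share of deaths assigned to each cause. The quoted passage does not say exactly how its "CSMF physician concordance" was computed, so the Python sketch below is not the report's calculation. It shows CSMF accuracy, a metric widely used in the VA literature, which is one minus the summed absolute CSMF differences scaled by the worst possible error. The function names and the tiny data set are invented for illustration.

```python
from collections import Counter


def csmf(causes):
    """Cause-specific mortality fractions from a list of assigned causes."""
    counts = Counter(causes)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


def csmf_accuracy(reference, predicted):
    """CSMF accuracy of predicted fractions against reference fractions.

    1 means identical distributions; 0 means the worst possible error.
    Both arguments are dicts mapping cause -> fraction.
    """
    causes = set(reference) | set(predicted)
    abs_error = sum(
        abs(reference.get(c, 0.0) - predicted.get(c, 0.0)) for c in causes
    )
    # Worst case: every death is assigned to the rarest reference cause.
    smallest = min(reference.get(c, 0.0) for c in causes)
    return 1 - abs_error / (2 * (1 - smallest))


# Invented example: 10 deaths coded by physicians and by an algorithm.
physician = ["HIV/TB"] * 5 + ["Cardiovascular"] * 3 + ["Injury"] * 2
algorithm = ["HIV/TB"] * 4 + ["Cardiovascular"] * 4 + ["Injury"] * 2

accuracy = csmf_accuracy(csmf(physician), csmf(algorithm))
print(f"CSMF accuracy: {accuracy:.3f}")
```

Here physician coding stands in as the reference, which mirrors the comparison described in the report but should not be read as ground truth.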

Congratulations to the openVA Team, and thanks to our funders who have kept us going on this since roughly 2013 - The Bill and Melinda Gates Foundation for two rounds of support, NICHD of the NIH for a five-year R01, and the various partners of the Data for Health Initiative funded by Bloomberg Philanthropies, Vital Strategies and the CDC Foundation in particular.

2024-06-04 ▲

pyOpenVA

The openVA Team recently completed work on pyOpenVA with Jason Thomas responsible for most of the coding. pyOpenVA is a re-implementation of openVA, InterVA5, and InSilicoVA to address the fact that many users were not able to easily use the R versions of the software - maintaining installations of R and Java was not practically feasible. Additionally, crossVA and InSilicoVA were painfully slow on many users' computers. So, pyOpenVA

is fast - written in Python with computationally intensive functions in C++
installs using a traditional, familiar 'wizard'-style installer on both Windows and macOS
interacts with the user using a familiar graphical user interface
has a variety of additional features to allow users to conduct a full cause-coding workflow:

a comprehensive help system
example data and workflow
data edit capability
more and less fine-grained step-by-step workflows
ability to present results numerically or graphically
ability to save results in either tabular or graphical forms

For more information or help using the openVA tools, contact us using these email addresses:

2024-03-29 ▲

Developing an Agenda for Population Aging and Social Research in Low- and Middle-Income Countries (LMICs): Proceedings of a Workshop (2024)

In early fall last year, I participated in a National Academy of Sciences workshop to support the National Institute on Aging in their medium- and long-range planning for programs related to research in lower- and middle-income countries. The report came out recently, and I think it's useful. Download the PDF: NAS/NIA LMIC Agenda Report.

2024-03-28 ▲

Paper Accepted

I'm a little late with this - was actually accepted last year!

Mortality Disparities by Age and Causes of Death in Rural South Africa by Brian Houle, Chodziwadziwa W. Kabudula, Sanyu A. Mojola, Nicole Angotti, F. Xavier Gómez-Olivé, Dickman Gareta, Kobus Herbst, Samuel J Clark, Jane Menken, and Vladimir Canudas-Romo in BMJ Global Health

Abstract

Introduction. Understanding mortality disparities by age and cause is critical to identifying intervention and prevention actions to support vulnerable populations. We assessed mortality changes in two rural South African populations over 25-years covering pre- and peak AIDS epidemic and subsequent antiretroviral therapy (ART) availability.

Methods. Using population surveillance data from the Agincourt Health and Socio-Demographic Surveillance System (AHDSS; 1994-2018) and Africa Health Research Institute (AHRI; 2000-2018) for five-year periods, we calculated life expectancy from birth to age 85, mortality age distributions and variation, and life-years lost (LYL) decomposed into four cause-of-death groups.

Results. The AIDS epidemic shifted the age-at-death distribution to younger ages and increased LYL. For AHDSS, between 1994-1998 and 1999-2003 LYL increased for females from 13.6 years (95% CI 12.7–14.4) to 22.1 (21.2–23.0) and for males from 19.9 (18.8–20.8) to 27.1 (26.2–28.0). AHRI LYL in 2000-2003 were extremely high (females=40.7 years (39.8–41.5), males=44.8 years (44.1–45.5)). Subsequent widespread ART availability reduced LYL (2014-2018) for women (AHDSS=15.7 (15.0–16.3); AHRI=22.4 (21.7–23.1)) and men (AHDSS=21.2 (20.5–22.0); AHRI=27.4 (26.7–28.2)), primarily due to reduced HIV/AIDS/TB deaths in mid-life and other communicable disease deaths in children. External causes increased as a proportion of LYL for men (2014-2018: AHRI=25%, AHDSS=17%). The share of AHDSS LYL 2014-2018 due to noncommunicable diseases exceeded pre-HIV levels (females=43%; males=40%).

Conclusions. Our findings highlight shifting burdens in cause-specific LYL and persistent mortality disparities in two populations experiencing complex epidemiological transitions. Results show high contributions of children to LYL at the height of the AIDS epidemic. Reductions in LYL were primarily driven by lowered HIV/AIDS/TB and other communicable disease mortality during the ART periods. LYL disparities persist despite widespread ART availability, highlighting the contributions of other communicable diseases in children, HIV/AIDS/TB and external causes in mid-life, and noncommunicable diseases in older ages.

2024-03-26 ▲

Resources for Scientific Writing, Presentations, and Curation of Data and Code

Writing

How small changes to a paper can help to smooth the review process by Michael White. Nature.
How to construct a Nature summary paragraph. Nature.
Communicating Statistical Results by Jennifer Hoeting and Geof Givens.
Writing technical papers or reports by A.S.C. Ehrenberg. The American Statistician. Also at JSTOR.
Ten simple rules for effective presentation slides by K.M. Naegle. PLOS Computational Biology.
Ten simple rules for structuring papers by B. Mensh and K. Kording. PLOS Computational Biology.
Novelist Cormac McCarthy's tips on how to write a great science paper by V. Savage and P. Yeh. Nature.

Data and Code

FAIR Principles. GO FAIR.
The FAIR Guiding Principles for scientific data management and stewardship by Mark D. Wilkinson et al. Nature.
Introducing the FAIR Principles for research software by Michelle Barker, Neil P. Chue Hong, Daniel S. Katz, Anna-Lena Lamprecht, Carlos Martinez-Ortiz, Fotis Psomopoulos, Jennifer Harrow, Leyla Jael Castro, Morane Gruenpeter, Paula Andrea Martinez, and Tom Honeyman. Nature.
FAIR Versus Open Data: A Comparison of Objectives and Principles by Putu H.P. Jati, Yi Lin, Sara Nodehi, Dwy B. Cahyono, and Mirjam van Reisen. Data Intelligence.

2024-03-25 ▲

Interesting Papers

Shifting the Level of Selection in Science by Leo Tiokhin, Karthik Panchanathan, Paul E. Smaldino, and Daniël Lakens. Perspectives on Psychological Science, 17456916231182568.
Abstract

Criteria for recognizing and rewarding scientists primarily focus on individual contributions. This creates a conflict between what is best for scientists’ careers and what is best for science. In this article, we show how the theory of multilevel selection provides conceptual tools for modifying incentives to better align individual and collective interests. A core principle is the need to account for indirect effects by shifting the level at which selection operates from individuals to the groups in which individuals are embedded. This principle is used in several fields to improve collective outcomes, including animal husbandry, team sports, and professional organizations. Shifting the level of selection has the potential to ameliorate several problems in contemporary science, including accounting for scientists’ diverse contributions to knowledge generation, reducing individual-level competition, and promoting specialization and team science. We discuss the difficulties associated with shifting the level of selection and outline directions for future development in this domain.
Meaningfulness in a Scientific Career Is About More Than Tangible Outputs by A. Alexandrova. Nature 627(8004), 489-489.

2024-03-19 ▲

Pontal do Sul MTB

I broke down and bought a nice cross-country mountain bike. I've been riding around Pontal do Sul enjoying the beach and rain forest! There's been night riding, rain riding, and riding into the very strong wind on the beach. After some trial and error, I've worked out a nice 18km loop. The only real issue is mosquitos - very vicious and in staggering numbers in the forest. 100% DEET is required!

Sam on the rain forest side of the loop, Pontal do Sul

Sam riding on the beach, Pontal do Sul

Riding in the rain forest in the rain, Pontal do Sul

2024-03-15 ▲

Huge Milestone! Completed First VA Methods Development Project with a VA Study using InSilicoVA published in the Lancet Global Health

Around 2010, I started thinking about working on verbal autopsy methods. Over the next few years, Basia Zaba and I decided to coordinate work that she was leading at the ALPHA Network of health and demographic surveillance system (HDSS) sites on identifying AIDS as a cause of death with the work that Tyler McCormick, Zehang (Richard) Li, and I were beginning to do on verbal autopsy cause-coding methods. We wove things together for a grant application that Basia was working on for the Gates Foundation, and we were funded by the Gates Foundation for a couple years to get started. During that time we came up with the basic idea for InSilicoVA, and I led the team to apply for an NIH R01 grant. We were supported by NICHD for five years to complete work on InSilicoVA and support the ALPHA Network sites and the London-based ALPHA secretariat to collect, clean, and harmonize VA data from all the sites. Basia, Tyler, Richard, and I had the idea that we'd develop a new and improved automated cause-coding method for VA and use the ALPHA data for testing and validation. We would then apply the method to the ALPHA Network VAs to produce a comparative description of cause-specific mortality through time, with a focus on HIV/AIDS. To make the method fully transparent and credible, we planned to 1) publish a technical paper in a good statistics journal, 2) create open source software to support that publication and make the method widely usable, and 3) conduct a detailed substantive investigation of the ALPHA Network's cause specific mortality using the new method and publish that in a good public/global health journal.

We have just completed the last of those tasks with the acceptance of the ALPHA Network cause-specific mortality paper by the Lancet Global Health - effectively the best journal for material like this (summary below). Clara Calvert, Milly Marston, Yue Chu, and I, with all of the ALPHA HDSS sites, worked over many years to get this done! The methods paper was published in 2016 in the Journal of the American Statistical Association, one of the best statistics journals (see below). Finally, the software has turned into a major success and is the reference software supporting the WHO Standard VA used by many research and mortality surveillance groups worldwide. We published a paper describing the software and how to use it in The R Journal (also below). In addition to the research software in R, we were supported by the NGO Vital Strategies to create a production version of the software implemented in Python and C++ by Jason Thomas. That version is very fast, easy to install in the standard way with no additional dependencies required, and easy to use through a GUI (public release in the next few weeks after final validation testing).

Along the way, we quickly convinced ourselves that the verbal autopsy interview is potentially a major source of error, omission, and general data quality issues. With Clarissa Surek-Clark, Nicole Angotti, and soon Brian Houle, we have observed the interview in many settings and are beginning work to improve and standardize it.

Altogether, especially measured by our original objectives, this project has been a total and overwhelming success! This is because of the fantastic team we had from the very beginning. Having this last paper accepted in such a great journal is particularly poignant given the arc of the project and the fact that we lost Basia halfway through.

The three key outputs of the project:

The methods paper describes InSilicoVA - our new automated cause-coding algorithm for verbal autopsy. It was published in the Journal of the American Statistical Association in 2016: Probabilistic Cause-of-Death Assignment using Verbal Autopsies.
We created and released open source, freely-available software for InSilicoVA and all of the other commonly-used verbal autopsy cause-coding algorithms (except Tariff 2.0) in the statistical programming environment R - called openVA. All of the packages are available at the Comprehensive R Archive Network (CRAN). We also maintain a GitHub repository with the code and a variety of additional resources. For users, we published a tutorial and user manual in The R Journal: The openVA Toolkit for Verbal Autopsies in 2023.
The ALPHA Network cause-specific mortality paper is coming out in the Lancet Global Health sometime in the next few weeks:

Temporal Changes in Cause of Death Among Adolescents and Adults in Six Countries in Eastern And Southern Africa: A Multi-Country Cohort Study using Verbal Autopsy Data by Yue Chu, Milly Marston, Albert Dube, Charles Festo, Eveline Geubbels, Simon Gregson, Kobus Herbst, Chodziwadziwa Kabudula, Kathleen Kahn, Tom Lutalo, Louisa Moorhouse, Robert Newton, Constance Nyamukapa, Ronald Makanga, Emma Slaymaker, Mark Urassa, Abdhalah Ziraba, Clara Calvert, and Samuel J. Clark

Abstract

Background. The absence of high-quality comprehensive civil registration and vital statistics systems across many settings in Africa has led to limited empirical data on causes of death in the region.

Methods. We harmonized verbal autopsy (VA) and residency data from nine health and demographic surveillance system (HDSS) sites across Eastern and Southern Africa, each with variable coverage across the period 1995-2019. InSilicoVA, a probabilistic model, was used to assign cause of death based on the signs and symptoms reported in the VA. Levels and trends in all-cause and cause-specific mortality rates and cause-specific mortality fractions were calculated, stratified by HDSS site, sex, age, and calendar periods.

Findings. All-cause mortality has generally decreased across the HDSS sites, particularly for adults aged 20-59. In many of the HDSS sites, these decreases were driven by reductions in HIV/TB-related deaths. For 2010-2014, the top causes of death were: road traffic accidents, HIV/TB and meningitis/sepsis for adolescents (12-19 years), HIV/TB for adults (20-59 years), and neoplasms and cardiovascular disease for older adults (>59 years). There was greater between-HDSS and between-sex variation in causes of death for adolescents compared to adults.

Interpretation. This study shows that there has been progress in reducing mortality across Eastern and Southern Africa but also points to age, sex and between-HDSS differences in causes of adolescent and adult deaths. This highlights the importance of detailed local-level data to inform health needs to ensure continued improvements in survival.

2023-11-14 ▲

Three Papers Accepted

Three papers that have been in the works for a long time have been accepted -

Understanding Household Dynamics from the Ground Up: A Longitudinal Study from a Rural South African Setting by Yu Shao-Tzu, Brian Houle, Enid Schatz, Nicole Angotti, Chodziwadziwa W. Kabudula, Francesc Xavier Gómez-Olivé, Samuel J. Clark, Jane Menken, and Sanyu A. Mojola in Demography
Abstract

Investigations into household structure in low- and middle-income countries (LMICs) provide important insight into how families manage domestic life in response to resource allocation and caregiving needs during periods of rapid socio-political and health-related challenges. Recent evidence on household structure in many LMICs contrasts with longstanding viewpoints of worldwide convergence to a Western nuclearized household model. Here we adopt a household-centered theoretical and methodological framework to investigate longitudinal patterns and dynamics of household structure in a rural South African setting during a period of high AIDS-related mortality and socioeconomic change. Data come from the Agincourt Health and Socio-Demographic Surveillance System (AHDSS, 2003-2015). Using Latent Transition Models, we derived six-distinct household types by examining conditional interdependency between household head’s characteristics, members’ age composition, and migration status. Over half of households were characterized by their complex and multigenerational profiles, with considerable within-typology variation in household size and dependency structure. Transition analyses showed stability of household types under female headship, while higher proportions of nuclearized household types dissolved over time. Household dissolution was closely linked to prior mortality experiences, particularly following death of a male head. Our findings highlight the need to better conceptualize and contextualize household changes across populations and over time.
Bayesian Nested Latent Class Models For Cause-of-death Assignment Using Verbal Autopsies Across Multiple Domains by Zehang Richard Li, Zhenke Wu, Irena Chen, and Samuel J. Clark in Annals of Applied Statistics
Abstract

Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool to collect information describing deaths outside of hospitals by conducting surveys to caregivers of a deceased person. It is routinely implemented in many low- and middle-income countries. Statistical algorithms to assign cause of death using VAs are typically vulnerable to the distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs as labeled data are usually unavailable in the target population. This article proposes a Latent Class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assigns causes of death for out-of-domain observations, and estimates cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop a computationally efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary materials and reproducible analysis codes are available online. The R package LCVA implementing the method is available on GitHub.
Agreement between cause of death assignment by computer-coded verbal autopsy methods and physician coding of verbal autopsy interviews in South Africa by Pam Groenewald, Jason Thomas, Samuel J Clark, Diane Morof, Jané D. Joubert, Chodziwadziwa Kabudula, Zehang Li, and Debbie Bradshaw in Global Health Action
Abstract

Background. The South African national cause of death validation project collected a national sample of verbal autopsies (VA) with cause of death assignment by both physician-coded verbal autopsy (PCVA) and three computer coded verbal autopsy (CCVA) algorithms. This provided an opportunity to compare the performance of three CCVA algorithms with PCVA in assigning a cause of death at the individual and population level.

Methods. Seven performance metrics were used to assess agreement of cause assignment at individual and population level between PCVA and three algorithms InterVA-5, InSilicoVA and Tariff 2.0 after aggregating the VA cause list into 25 categories based on frequencies. Positive predictive value (PPV), sensitivity, overall agreement, kappa statistic, and chance corrected concordance (CCC) were used to assess agreement at individual level, using the most likely cause of death assigned by the algorithms. The cause specific mortality fraction (CSMF) accuracy, and Spearman’s rank correlation were used to assess the agreement at population level. We analyzed the agreement for age and sex subgroups as well as for deaths that occurred in and out of facilities to compare the performance of the algorithms for different sub-groups.

Results. The level of agreement demonstrated by the Kappa statistic was weak to none for all algorithms and all groups. For overall CCVA agreement, InSilicoVA was highest with moderate agreement for adults and neonates. Both InterVA-5 and Tariff 2.0 algorithms had weak to minimal agreement for all groups. For CSMF accuracy all algorithms had moderate to strong agreement for all groups with InterVA-5 showing the highest agreement (0·90) for neonates, Tariff 2.0 the highest agreement for adults (0·89) and males (0·84), and InSilicoVA the highest agreement for females (0·88), elders (0·83) and out of health facilities. For CCC the agreement was weak to none for all algorithms and groups. The sensitivity and PPV by cause show that algorithm performance is similar for most causes of death, except HIV/AIDS, TB, Maternal, other cancers, and some injuries.

Conclusions. Whilst the results of this extensive study confirmed that the algorithms all identified HIV/AIDS as a leading cause of death, it highlighted significant scope for improving the algorithms for use in South Africa.
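
For readers unfamiliar with the population-level metric named in the abstract above, here is a minimal Python sketch of CSMF accuracy as it is commonly defined in the verbal autopsy literature: one minus the summed absolute CSMF errors, divided by the largest possible total error, 2(1 - smallest true CSMF). The function name and the three-cause example are made up for illustration; this is not the code used in the paper.

```python
def csmf_accuracy(true_csmf, pred_csmf):
    """CSMF accuracy for two cause-specific mortality fraction vectors.

    Both inputs list fractions for the same causes in the same order,
    and each is assumed to sum to 1.
    """
    if len(true_csmf) != len(pred_csmf):
        raise ValueError("both CSMF vectors must cover the same causes")
    # Total absolute error between predicted and true fractions.
    total_error = sum(abs(t - p) for t, p in zip(true_csmf, pred_csmf))
    # Worst case: all deaths assigned to the cause with the smallest true fraction.
    worst_error = 2 * (1 - min(true_csmf))
    return 1 - total_error / worst_error


# Hypothetical three-cause example (illustrative numbers only).
true_fractions = [0.5, 0.3, 0.2]
predicted_fractions = [0.4, 0.4, 0.2]
accuracy = csmf_accuracy(true_fractions, predicted_fractions)
print(round(accuracy, 3))
```

A value of 1 means the predicted fractions match the reference fractions exactly, and 0 is the worst possible result.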

2023-11-10 ▲

Reference Death Archive Kickoff at WHO

Kobus Herbst, Yue Chu, and I have developed the Reference Death Archive pilot database over the past few months. In mid November we visited WHO and started the process of handing it off to become the WHO-hosted Reference Death Archive for verbal autopsy under Doris Ma Fat's supervision. This is a significant milestone in the project, and we're all excited about the progress.

Sam, Yue, Doris, and Kobus at WHO

2023-11-01 ▲

Moved to Brazil for a Year

For my sabbatical year, I have moved with my family to Brazil. Clarissa and I will work on our reference death project with the Department of Pathology at the University of São Paulo, and we will spend time with Clarissa's family in Curitiba.

Atlantic Rain Forest, Parana, Brazil

River in Atlantic Rain Forest, Parana, Brazil

Waterfall in Atlantic Rain Forest, Parana, Brazil

2023-10-07 ▲

Five Manuscripts R/R and South Africa/Brazil Exchange

Super exciting: my fabulous colleagues and I have five papers at an advanced R/R stage - feels like the pandemic is finally working out of our pipeline!

Also, finally doing something that I'd wanted to do for some time (again, the pandemic ...) - helping my South African and Brazilian colleagues connect around cause of death ascertainment. The people involved are my Brazilian colleague Luiz Fernando (Burns), who directs the mortality surveillance unit in the Department of Pathology at the University of São Paulo; Ryan Wagner, who co-directs the new minimally-invasive tissue sample (MITS) project at the Agincourt health and demographic surveillance system (HDSS) site; and Alison Castle at the Africa Health Research Institute (AHRI), who has just started an autopsy project on community deaths. Burns is visiting the South African sites this week with me and Clarissa, and a South African contingent will visit São Paulo later this year, again with me and Clarissa.

2023-09-29 ▲

World Population Fractions Update - for 2022 WPP

Here's an update to my plot showing how the world population is distributed across major regions from 1950-2100 according to the 2022 edition of the UN Population Division's World Population Prospects. Notice the ever increasing importance of Africa! Code to do this is in this GitHub repo.

2023-04-04 ▲

Contributed to an issue of the National Geographic Magazine

I spent some time working with the authors and editors of a recent issue of National Geographic Magazine, editing and creating figures that interpret the UN Population Division's World Population Prospects population estimates and forecasts, including the one here: Will Nigeria’s booming population lead it to prosperity or poverty?

2023-03-06 ▲

"The openVA Toolkit for Verbal Autopsies" appeared in The R Journal today.

Li, Z., J. Thomas, E. Choi, T.H. McCormick, and S.J. Clark (2023). The openVA Toolkit for Verbal Autopsies The R Journal [ Link ]

See openVA Toolkit just below for details.

2023-03-03 ▲

Accepted in BMJ Open

Houle, B., C.W. Kabudula, D. Gareta, K. Herbst, and S.J. Clark (Accepted 2023). Household Structure, Composition, and Child Mortality in the Unfolding Antiretroviral Therapy Era in Rural South Africa: Comparative Evidence from Population Surveillance, 2000-2015. BMJ Open.

Abstract

Objectives: The structure and composition of the household has important influences on child mortality. However, little is known about these factors in HIV-endemic areas and how associations may change with the introduction and widespread availability of antiretroviral treatment (ART). We use comparative, longitudinal data from two demographic surveillance sites in rural South Africa (2000-2015) on mortality of children younger than five years (n=101,105).

Design: We use multilevel discrete time event history analysis to estimate children’s probability of dying by their matrilineal residential arrangements. We also test if associations have changed over time with ART availability.

Setting: Rural South Africa.

Participants: Children younger than five years (n=101,105).

Results: 3,603 children died between 2000-2015. Mortality risks differed by co-residence patterns along with different types of kin present in the household. Children in nuclear households with both parents had the lowest risk of dying compared to all other household types. Associations with kin and child mortality were moderated by parental status. Having older siblings lowered the probability of dying only for children in a household with both parents (relative risk ratio (RRR)=0.736 95% CI [0.633, 0.855]). Only in the later ART period was there evidence that older adult kin lowered the probability of dying for children in single parent households (RRR=0.753 95% CI [0.664, 0.853]).

Conclusions: Our findings provide comparative evidence of how differential household profiles may place children at higher mortality risk. Formative research is needed to understand the role of other household kin in promoting child well-being, particularly in one-parent households that are increasingly prevalent.

2022-10-19 ▲

"The openVA Toolkit for Verbal Autopsies" will appear in the R Journal

This is one of the longest in-preparation papers I've ever worked on. Thanks to the persistence of Richard Li and other coauthors, we finally have a statistical software paper in the R Journal - along with the collection of packages it describes in CRAN:

This software and InSilicoVA - our new automated algorithm for classifying cause of death using verbal autopsy data - are primary outputs of our NIH project supported by NIH R01HD086227 from NICHD.

Recently, support for the continued development and maintenance of the software has come from Vital Strategies and the CDC Foundation as part of Bloomberg Philanthropies' Data for Health initiative.

2022-08-23 ▲

New Grant

I have a new grant from the Bill & Melinda Gates Foundation for $2.04M to support the openVA Team to work closely with the WHO to create a pathology-informed reference death archive for verbal autopsy to be hosted by the WHO in Geneva. The first deaths will come from the MITS Alliance, the CHAMPS project, and the mortality surveillance system (SVO) in several cities in Brazil. The aim is to create pathology-informed symptom-cause information (like training data) for automated algorithms that identify likely causes for deaths with verbal autopsy.

2022-07-14 ▲

UN World Population Prospects 2022

The UN Population Division recently released the 2022 iteration of their biennial World Population Prospects (WPP) global population estimates and projections. I led a small team including Jon Muir and Brian Houle to develop a mortality model for HIV-affected countries that was used to produce the 2022 WPP. OSU undergraduate student Michael Allen also contributed to the early stages of our work a couple years ago.

Launch site, including lots of information about the WPP.
Summary of results document, including acknowledgements.
Data in many downloadable forms.

2022-04-28 ▲

How to Update a Web Site Using Git

Here's a markdown file and a PDF describing how to set up git to automatically update a web site.

2022-04-19 ▲

New York Times Article on Verbal Autopsy

This is a nice article on death registration and verbal autopsy - very high level overview for lay readers!

Although the openVA Team is not mentioned, we work closely with many of the organizations and people mentioned or quoted in this article.

A Door-to-Door Effort to Find Out Who Died Helps Low-Income Countries Aid the Living

2022-04-19 ▲

Papers in Annual Meeting of the Population Association of America (PAA) 2022

Application of a Singular Value Decomposition-based Factorization and Parsimonious Component Model of Mortality to HIV Epidemics in Africa by Brian Houle, The Australian National University; Jonathan Muir, Emory University; Sara Hertog, United Nations; Francois Pelletier, United Nations; Samuel Clark, The Ohio State University; Patrick Gerland, United Nations Population Division. Session 41: Statistical Modeling and Smoothing in Demography. Presented by Sam Clark.
Sibling Mortality in Developing Countries: A Cross Regional Study by Jonathan Muir, Emory University; Brian Houle, The Australian National University; Samuel Clark, The Ohio State University; Solveig Cunningham, Emory University. Session 116: Early Life Health in Low and Middle Income Countries. Presented by Jon Muir.
The Timing of Sibling Mortality in the Widespread Antiretroviral Treatment Era: Comparative Evidence from Population-based Surveillance in Rural South Africa by Brian Houle, The Australian National University; Jonathan Muir, Emory University; Chodziwadziwa Kabudula, University of the Witwatersrand; Samuel Clark, The Ohio State University. Session 51: Child Health: Spillover and Spatial Effects. Presented by Jon Muir.

2022-04-10 ▲

New in Global Health Action

Chandramohan, D., E. Fottrell, J. Leitao, E. Nichols, S. J. CLARK, C. Alsokhn, D. C. Munoz, C. AbouZahr, A. Di Pasquale, R. Mswia, E. Choi, F. Baiden, J. Thomas, I. Lyatuu, Z. Li, P. Larbi-Debrah, Y. Chu, S. Cheburet, O. Sankoh, A. M. Badr, D. M. Fat, P. Setel, R. Jakob, and D. de Savigny (2021). Estimating Causes Of Death Where There Is No Medical Certification: Evolution And State of The Art Of Verbal Autopsy. Global Health Action. [ DOI ]

Abstract

Over the past 70 years, significant advances have been made in determining the causes of death in populations not served by official medical certification of cause at the time of death using a technique known as Verbal Autopsy (VA). VA involves an interview of the family or caregivers of the deceased after a suitable bereavement interval about the circumstances, signs and symptoms of the deceased in the period leading to death. The VA interview data are then interpreted by physicians or, more recently, computer algorithms, to assign a probable cause of death. VA was originally developed and applied in field research settings. This paper traces the evolution of VA methods with special emphasis on the World Health Organization’s (WHO)’s efforts to standardize VA instruments and methods for expanded use in routine health information and vital statistics systems in low- and middle-income countries (LMICs). These advances in VA methods are culminating this year with the release of the 2022 WHO Standard Verbal Autopsy (VA) Toolkit. This paper highlights the many contributions the late Professor Peter Byass made to the current VA standards and methods, most notably, the development of InterVA, the most commonly used automated computer algorithm for interpreting data collected in the WHO standard instruments, and the capacity building in low- and middle-income countries (LMICs) that he promoted. This paper also provides an overview of the methods used to improve the current WHO VA standards, a catalogue of the changes and improvements in the instruments, and a mapping of current applications of the WHO VA standard approach in LMICs. It also provides access to tools and guidance needed for VA implementation in Civil Registration and Vital Statistics Systems at scale.

2022-04-10 ▲

New in Global Health Action

Herbst, K., S. Juvekar, M. Jasseh, Y. Berhane, N. T. K. Chuk, J. Seeley, O. Sankoh, S. J. CLARK and M. Collinson (2021). Health And Demographic Surveillance Systems In Low- And Middle-income Countries: History, State of The Art And Future Prospects. Global Health Action. [ DOI ]

Abstract

Health and Demographic Surveillance Systems (HDSS) have been developed in several low- and middle-income countries (LMICs) in Africa and Asia. This paper reviews their history, state of the art and future potential and highlights substantial areas of contribution by the late Professor Peter Byass.

Historically, HDSS appeared in the second half of the twentieth century, responding to a dearth of accurate population data in poorly resourced settings to contextualise the study of interventions to improve health and well-being. The progress of the development of this network is described starting with Pholela, and progressing through Gwembe, Balabgarh, Niakhar, Matlab, Navrongo, Agincourt, Farafenni, and Butajira, and the emergence of the INDEPTH Network in the early 1990’s

The paper describes the HDSS methodology, data, strengths, and limitations. The strengths are particularly their temporal coverage, detail, dense linkage, and the fact that they exist in chronically under-documented populations in LMICs where HDSS sites operate. The main limitations are generalisability to a national population and a potential Hawthorne effect, whereby the project itself may have changed characteristics of the population.

The future will include advances in HDSS data harmonisation, accessibility, and protection. Key applications of the data are to validate and assess bias in other datasets. A strong collaboration between a national HDSS network and the national statistics office is modelled in South Africa and Sierra Leone, and it is possible that other low- to middle-income countries will see the benefit and take this approach.

2022-02-20 ▲

New in BMC Public Health

Houle, B., C.W. Kabudula, A.M. Tilstra, S.A. Mojola, E. Schatz, S.J. CLARK, N. Angotti, F.X. Gómez-Olivé, and J. Menken (2022). Twin epidemics: The Effects of HIV and Systolic Blood Pressure on Mortality Risk in Rural South Africa, 2010-2019. BMC Public Health. [ DOI ]

Abstract

Background. Sub-Saharan African settings are experiencing dual epidemics of HIV and hypertension. We investigate effects of each condition on mortality and further examine whether HIV and hypertension interact in determining mortality.

Methods. Data come from the 2010 Ha Nakekela population-based survey of individuals ages 40 and older (1,802 women; 1,107 men) nested in the Agincourt Health and socio-Demographic Surveillance System in rural South Africa, which provides mortality follow-up from population surveillance until mid-2019. Using discrete-time event history models stratified by sex, we assessed differential mortality risks according to baseline measures of HIV infection, HIV-1 RNA viral load, and systolic blood pressure.

Results. During the 8-year follow-up period, mortality was high (477 deaths). 37% of men (mortality rate 987.53/100,000, 95% CI: 986.26 to 988.79) and 25% of women (mortality rate 937.28/100,000, 95% CI: 899.7 to 974.88) died. Over a quarter of participants were living with HIV (PLWH) at baseline, over 50% of whom had unsuppressed viral loads. The share of the population with a systolic blood pressure of 140mm Hg or higher increased from 24% at ages 40-59 to 50% at ages 75-plus and was generally higher for those not living with HIV compared to PLWH. Men and women with unsuppressed viral load had elevated mortality risks (men: adjusted odds ratio (aOR) 3.23, 95% CI: 2.21 to 4.71; women: OR 2.05, 95% CI: 1.27 to 3.30). There was a weak, non-linear relationship between systolic blood pressure and higher mortality risk. We found no significant interaction between systolic blood pressure and HIV status for either men or women (p>0.05).

Conclusions. Our results indicate that HIV and elevated blood pressure are acting as separate, non-interacting epidemics affecting high proportions of the older adult population. PLWH with unsuppressed viral load were at higher mortality risk compared to those uninfected. Systolic blood pressure was a mortality risk factor independent of HIV status. As antiretroviral therapy becomes more widespread, further longitudinal follow-up is needed to understand how the dynamics of increased longevity and multimorbidity among people living with both HIV and high blood pressure, as well as the emergence of COVID-19, may alter these patterns.

2022-01-23 ▲

I discovered something new (to me) that will be useful. Each element of the perpendicular projection of a vector $ \vec{p} = \{p_1,p_2\} $ onto the line $ y=x $ is the arithmetic mean of $ p_1 $ and $ p_2 $: \begin{align} \mbox{proj}_{\hat{e}}\vec{p} &= \frac{\vec{p} \cdot \hat{e}}{\hat{e} \cdot \hat{e}} \hat{e} \\ &= \frac{p_1e + p_2e}{e^2 + e^2} \hat{e} \\ &= \frac{p_1 + p_2}{2e} \hat{e} \\ &= \left\{\frac{p_1 + p_2}{2},\frac{p_1 + p_2}{2}\right\} \end{align} where '$ \cdot $' is the dot product, $\hat{e} = \{e_1,e_2\} $ is the unit vector in the direction of $ y=x $, and $e = e_1 = e_2 $.
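
A quick numerical check of this identity, as a minimal Python sketch. The helper name `project_onto_line` and the example point are arbitrary choices for illustration; the direction vector (1, 1) points along $ y=x $ and does not need to be normalized because the formula divides by its squared length.

```python
def project_onto_line(p, d):
    """Perpendicular projection of the 2-D vector p onto the line
    through the origin with direction vector d."""
    scale = (p[0] * d[0] + p[1] * d[1]) / (d[0] * d[0] + d[1] * d[1])
    return (scale * d[0], scale * d[1])


p = (2.0, 4.5)                             # arbitrary example vector
proj = project_onto_line(p, (1.0, 1.0))    # project onto the line y = x
mean = (p[0] + p[1]) / 2                   # arithmetic mean of the elements

# Both elements of the projection should equal the mean.
print(proj, mean)
```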

I noticed something else interesting (also new to me). The projections of a point onto all of the lines through the origin lie on a circle - the one whose diameter is the segment from the origin to the point. In the image below, the red point (2,4.5) is projected onto a set of lines through the origin; each projection is marked with a black dot and connected to the red point with a red line.
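
The circle can be checked numerically with a minimal Python sketch. The reasoning is the usual right-angle argument: each projection, the origin, and the point form a right angle at the projection, so every projection sits on the circle centered at half the point with radius half its length. The function name and the choice of 36 evenly spaced lines are assumptions made for illustration, not the code behind the figure.

```python
import math


def project_onto_direction(p, theta):
    """Project the 2-D point p onto the line through the origin
    that makes angle theta with the x-axis."""
    ux, uy = math.cos(theta), math.sin(theta)   # unit vector along the line
    s = p[0] * ux + p[1] * uy                   # signed length of the projection
    return (s * ux, s * uy)


p = (2.0, 4.5)                                  # the red point from the text
center = (p[0] / 2, p[1] / 2)                   # candidate circle center
radius = math.hypot(p[0], p[1]) / 2             # candidate circle radius

# Lines through the origin every 5 degrees over a half turn.
angles = [k * math.pi / 36 for k in range(36)]
deviations = []
for theta in angles:
    q = project_onto_direction(p, theta)
    distance = math.hypot(q[0] - center[0], q[1] - center[1])
    deviations.append(abs(distance - radius))

# Largest gap between a projection's distance from the center and the radius.
max_deviation = max(deviations)
print(max_deviation)
```

The largest deviation should be zero up to floating-point rounding.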

2021-12-28 ▲

Published in Annals of Epidemiology

Norris Turner, A., D. Kline, A. Norris, W.G. Phillips, E. Root, J. Wakefield, Z. Li, S. Lemeshow, M. Spahnie, A. Luff, Y. Chu, M.K. Francis, M. Gallo, P. Chakraborty, M. Lindstrom, G. Lozanski, W. Miller, S.J. CLARK (2021). Prevalence of Current and Past COVID-19 in Ohio Adults. Annals of Epidemiology. [ DOI ]

Abstract

Purpose. To estimate the prevalence of current and past COVID-19 in Ohio adults.

Methods. We used stratified, probability-proportionate-to-size cluster sampling. During July 2020, we enrolled 727 randomly-sampled adult English- and Spanish-speaking participants through a household survey. Participants provided nasopharyngeal swabs and blood samples to detect current and past COVID-19. We used Bayesian latent class models with multilevel regression and poststratification to calculate the adjusted prevalence of current and past COVID-19. We accounted for the potential effects of non–ignorable non–response bias.

Results. The estimated statewide prevalence of current COVID-19 was 0.9% (95% credible interval: 0.1%–2.0%), corresponding to ∼85,000 prevalent infections (95% credible interval: 6,300–177,000) in Ohio adults during the study period. The estimated statewide prevalence of past COVID-19 was 1.3% (95% credible interval: 0.2%–2.7%), corresponding to ∼118,000 Ohio adults (95% credible interval: 22,000–240,000). Estimates did not change meaningfully due to non–response bias.

Conclusions. Total COVID-19 cases in Ohio in July 2020 were approximately 3.5 times as high as diagnosed cases. The lack of broad COVID-19 screening in the United States early in the pandemic resulted in a paucity of population-representative prevalence data, limiting the ability to measure the effects of statewide control efforts.

Context

This article describes the results of the survey for a public health audience. We developed a new Bayesian poststratification method to produce estimates from the data, described in PNAS.

Abigail Norris Turner led the overall study conducted by a large group of collaborators at The Ohio State University and Ohio State Department of Health.

2021-12-22 ▲

New working paper on arXiv

Li, Z. R., Z. Wu, I. Chen, and S. J. CLARK (2021). Bayesian Nested Latent Class Models for Cause-of-Death Assignment using Verbal Autopsies Across Multiple Domains. arXiv Preprint arXiv:2112.12186. [ PDF ]

Abstract

Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool to collect information describing deaths outside of hospitals by conducting surveys to caregivers of a deceased person. It is routinely implemented in many low- and middle-income countries. Statistical algorithms to assign cause of death using VAs are typically vulnerable to the distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs as labeled data are usually unavailable in the target population. This article proposes a Latent Class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assign cause of death for out-of-domain observations, and estimate cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop an efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary materials for this article and the R package to implement the model are available online.

2021-10-26 ▲

This appeared on my Twitter feed recently: Demography Abandons Its Core by Ron Lee in 2001. Unfortunately this is still highly relevant, and those of us interested in formal demography as a recognizable field need to do something! The Formal Demography Working Group is being formed as a vehicle for doing that - please join if you are interested!

2021-10-01 ▲

Today Mary Shenk and I are discussants at the 16th De Jong Lecture in Social Demography at Penn State. Hans-Peter Kohler is the featured speaker. Link to slides.

2021-09-20 ▲

For the past 6-7 years my colleagues and I who work on verbal autopsy methods have been developing new statistical methods to automate identification of a cause of death using verbal autopsy records. Along the way we have developed a range of open source software tools to ensure that the methods are transparent and available to anyone who wants to use them. Recently we submitted a paper and posted a preprint on the openVA Toolkit. This is a suite of open source software that can be used to apply a variety of algorithms to verbal autopsy data. Additional software that works with openVA - e.g. the openVA Pipeline that fully automates cause-coding in CRVS settings - and links to the GitHub repositories where software is maintained are available at openva.net.

2021-06-25 ▲

Published open-access in PNAS:

Kline, D., Z. Li, Y. Chu, J. Wakefield, W.C. Miller, A. Norris Turner, and S. J. CLARK (2021). Estimating Seroprevalence of SARS-CoV-2 in Ohio: A Bayesian Multilevel Poststratiﬁcation Approach with Multiple Diagnostic Tests. Proceedings of the National Academy of Sciences 118(26), e2023947118. [ DOI ]

Abstract

Globally, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected more than 59 million people and killed more than 1.39 million. Designing and monitoring interventions to slow and stop the spread of the virus require knowledge of how many people have been and are currently infected, where they live, and how they interact. The first step is an accurate assessment of the population prevalence of past infections. There are very few population-representative prevalence studies of SARS-CoV-2 infections, and only two states in the United States—Indiana and Connecticut—have reported probability-based sample surveys that characterize statewide prevalence of SARS-CoV-2. One of the difficulties is the fact that tests to detect and characterize SARS-CoV-2 coronavirus antibodies are new, are not well characterized, and generally function poorly. During July 2020, a survey representing all adults in the state of Ohio in the United States collected serum samples and information on protective behavior related to SARS-CoV-2 and coronavirus disease 2019 (COVID-19). Several features of the survey make it difficult to estimate past prevalence: 1) a low response rate; 2) a very low number of positive cases; and 3) the fact that multiple poor-quality serological tests were used to detect SARS-CoV-2 antibodies. We describe a Bayesian approach for analyzing the biomarker data that simultaneously addresses these challenges and characterizes the potential effect of selective response. The model does not require survey sample weights; accounts for multiple imperfect antibody test results; and characterizes uncertainty related to the sample survey and the multiple imperfect, potentially correlated tests.

Context

In July 2020 a large group of colleagues at The Ohio State University collaborated with the Ohio State Department of Health to conduct a probability-based sample survey representative of adults living in Ohio in order to estimate state-level CV19 prevalence of current and past infections. Conducting the survey at that time presented many challenges, including a large non-response rate, an array of tests whose performance characteristics were poorly understood (we used all that we could), and very few positive results from any test. These issues, and others including the sampling design, presented a particular challenge for analysis and led us to develop a new Bayesian/poststratification approach to estimate state-wide prevalence.

There are still few representative sample surveys of CV19 biomarkers. Most other approaches suffer from the possibility of very large, consequential bias. This paper should be useful for anyone analyzing the results from a similar survey. Although we cannot share the data easily, we have made R code available to replicate the analysis: Bayes Prevalence.

Abigail Norris Turner led the overall study conducted by a large group of collaborators at The Ohio State University and Ohio State Department of Health.

In a true team effort, Dave Kline, Richard Li, Yue Chu, Jon Wakefield, Bill Miller, Abigail Norris Turner, and I conducted the analysis and developed the overall approach. It was an enjoyable and productive experience working with this team.

2021-04-01:2 ▲

APHRC - the African Population Health Research Center - in Nairobi is seeking a consultant demographer for six months to lead the redesign of the Nairobi Urban Health and Demographic Surveillance System Site (NUHDSS). Full details available here.

2021-04-01:1 ▲

Today and tomorrow the openVA Team is presenting a training workshop for the CHAMPS project and our Data for Health Initiative colleagues in Thailand.

2021-03-31 ▲

A long-in-the-making collaboration with UNICEF produced its first non-academic product recently: Subnational Under-five Mortality Estimates, 1990–2019. This work grew out of a small collaboration with Jon Wakefield at the University of Washington, see small-area estimates. Jon grew the group working on it and together with Jessica Godwin saw it through to this. Congratulations to everyone involved!

2021-03-24 ▲

I gave a talk today at the 2021 'Berlin Demography Days': Global Population Studies in the 21st Century: Priorities Challenges - Mortality.

2021-03-10 ▲

Award

I learned today that the Social and Behavioral Sciences division of the School of Arts and Sciences at The Ohio State University has chosen me as a recipient of the Joan N. Huber Faculty Fellow Award. A brief description here.

New working paper on arXiv

I posted a slightly edited version of a paper titled "Health and Demographic Surveillance Systems and the 2030 Agenda: Sustainable Development Goals" on arXiv. This paper was invited for a UN Population Division experts' group meeting titled 'Strengthening the Demographic Evidence Base for the Post-2015 Development Agenda', held in New York, USA, on October 5-6, 2015.

Abstract

The health and demographic surveillance system (HDSS) is an old method for intensively monitoring a population to assess the effects of healthcare or other population-level interventions - often clinical trials. The strengths of HDSS include very detailed descriptions of whole populations with frequent updates. This often provides long time series of accurate population and health indicators for the HDSS study population. The primary weakness of HDSS is that the data describe only the HDSS study population and cannot be generalized beyond that.

The 2030 agenda is the ecosystem of activities - many including population-level monitoring - that relate to the United Nations (UN) Sustainable Development Goals (SDG). With respect to the 2030 agenda, HDSS can contribute by: (1) continuing to conduct cause-and-effect studies; (2) contributing to data triangulation or amalgamation initiatives; (3) characterizing the bias in and calibrating big data; and (4) contributing more to the rapid training of data-oriented professionals, especially in the population and health fields.

2021-02-28 ▲

Commentary in PNAS

Commentary in PNAS: Monitoring epidemics: Lessons from measuring population prevalence of the coronavirus with Abigail Norris Turner. We highlight the need to improve response rates and to prepare a robust measurement capability to be ready for the next pandemic. DOI.

2021-02-08 ▲

Article out today in PLoS One

Linking the timing of a mother's and child's death: Comparative evidence from two rural South African population-based surveillance studies, 2000–2015 by Brian Houle, Chodziwadziwa W. Kabudula, Alan Stein, Dickman Gareta, Kobus Herbst, and Samuel J. Clark. DOI.

Abstract

Background. The effect of the period before a mother's death on child survival has been assessed in only a few studies. We conducted a comparative investigation of the effect of the timing of a mother's death on child survival up to age five years in rural South Africa.

Methods. We used discrete time survival analysis on data from two HIV-endemic population surveillance sites (2000–2015) to estimate a child's risk of dying before and after their mother's death. We tested if this relationship varied between sites and by availability of antiretroviral therapy (ART). We assessed if related adults in the household altered the effect of a mother's death on child survival.

Findings. 3,618 children died from 2000–2015. The probability of a child dying began to increase in the 7–11 months prior to the mother's death and increased markedly in the 3 months before (2000–2003 relative risk = 22.2, 95% CI = 14.2–34.6) and 3 months following her death (2000–2003 RR = 20.1; CI = 10.3–39.4). This increased risk pattern was evident at both sites. The pattern attenuated with ART availability but remained even with availability at both sites. The father and maternal grandmother in the household lowered children's mortality risk independent of the association between timing of mother and child mortality.

Conclusions. The persistence of elevated mortality risk both before and after the mother's death for children of different ages suggests that absence of maternal care and abrupt breastfeeding cessation might be crucial risk factors. Formative research is needed to understand the circumstances for children when a mother is very ill or dies, and behavioral and other risk factors that increase both the mother and child's risk of dying. Identifying families when a mother is very ill and implementing training and support strategies for other members of the household are urgently needed to reduce preventable child mortality.

2021-02-03 ▲

There will be occasional updates here. For now, the news is that I finally finished this web site after COVID-19 interrupted the work for a year. It's up to date and has all the basic content I had planned. The site is written in plain HTML with a simple cascading style sheet. It's straight from 1995, but it's super easy to modify/maintain/augment with a text editor. With a shell script to upload everything, keeping the site up to date will be easy, and it does not require expensive or slow-to-learn software or rely on third parties to fix things. :-)
