Samuel Clark

Data Methods


Background

Early in my career I worked intensively with health and demographic surveillance system (HDSS) sites in Africa, particularly the Agincourt HDSS in South Africa, and often with Kobus Herbst, who has been part of the senior leadership of the Africa Centre/Africa Health Research Institute (AHRI) since its inception. At that time a group of us were keenly interested in improving the ability of HDSS sites to collect, check, manage, and analyze their data more efficiently and accurately.

The HDSS methodology keeps detailed, longitudinal records on everyone who lives within a specified geographic boundary. These records include individual-specific attributes, group-level attributes, and time-evolving links between individuals and groups and, critically, between any of these and specific locations. The data are a very complex mixture of time, place, groups, and attributes. This makes quality control, storage, manipulation, and analysis difficult.

At its core, an HDSS database is a bi- or tri-temporal record-keeping system that tracks up to three kinds of time:

  1. time in the real world when things happen to people,
  2. observation times and durations when the study was in a position to know the status of something, and
  3. database transaction time - when something was known to the database.
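The three time dimensions listed above can be illustrated with a minimal sketch. This is not the schema of any actual HDSS database: the record type `ResidenceRecord`, its field names, and the helpers `known_as_of` and `resident_on` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class ResidenceRecord:
    """Hypothetical tri-temporal record: one person living at one location."""
    person_id: str
    location_id: str
    # 1. Real-world time: when the residence actually held.
    valid_from: date
    valid_to: Optional[date]  # None means the residence is ongoing
    # 2. Observation time: the window in which the study could know the status.
    observed_from: date
    observed_to: date
    # 3. Transaction time: when the fact became known to the database.
    recorded_at: datetime


def known_as_of(records, when):
    """Return the records the database already held at transaction time `when`."""
    return [r for r in records if r.recorded_at <= when]


def resident_on(records, day):
    """Return the records whose real-world interval covers `day`."""
    return [
        r for r in records
        if r.valid_from <= day and (r.valid_to is None or day < r.valid_to)
    ]


# Illustrative data only: one person who moves from locA to locB.
records = [
    ResidenceRecord("p1", "locA", date(2001, 1, 1), date(2003, 6, 1),
                    date(2001, 3, 1), date(2003, 7, 1),
                    datetime(2003, 7, 15, 9, 0)),
    ResidenceRecord("p1", "locB", date(2003, 6, 1), None,
                    date(2003, 7, 1), date(2004, 1, 10),
                    datetime(2004, 1, 20, 9, 0)),
]
```

Keeping the three dimensions in separate fields lets a query ask where someone lived on a given day independently of what the database had recorded by a given moment, which is the distinction a bi- or tri-temporal design is meant to preserve.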

The way that data are organized in a computer can strongly affect what can be done with the data and how easy or hard it is to do.

The Structured Population Event History Register

Since at least the early 1990s, a small group (Bruce MacLeod, Justus Benzler, Kobus Herbst, and me) has been interested in designing data models, more specifically relational database schemas, to make storage, management, and analysis of HDSS data as accurate and easy as possible. In the context of this effort, I designed the Structured Population Event History Register (SPEHR). SPEHR is a very flexible, metadata-driven relational database schema for HDSS.

Unified Timestamp with Explicit Precision

At around the same time Justus Benzler and I thought deeply about how to define a unified timestamp object that could handle all of the time-valued entities we encountered in HDSS, and also accurately represent the precision of measures associated with each of them.
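To make the idea of a timestamp that carries its own precision concrete, here is a minimal sketch. It is not the definition Justus Benzler and I published: the class `PreciseTimestamp`, the four precision levels, and the methods `bounds` and `definitely_before` are assumptions made for illustration, and the sketch assumes naive (timezone-free) datetimes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical precision levels, coarsest to finest.
PRECISIONS = ("year", "month", "day", "second")


@dataclass(frozen=True)
class PreciseTimestamp:
    """A point in time plus the precision to which it is actually known."""
    value: datetime
    precision: str

    def __post_init__(self):
        if self.precision not in PRECISIONS:
            raise ValueError(f"unknown precision: {self.precision}")

    def bounds(self):
        """Return the half-open interval [start, end) that contains the true time."""
        v = self.value
        if self.precision == "year":
            start = datetime(v.year, 1, 1)
            end = datetime(v.year + 1, 1, 1)
        elif self.precision == "month":
            start = datetime(v.year, v.month, 1)
            # December rolls over into January of the following year.
            end = datetime(v.year + (v.month == 12), v.month % 12 + 1, 1)
        elif self.precision == "day":
            start = datetime(v.year, v.month, v.day)
            end = start + timedelta(days=1)
        else:  # "second"
            start = v.replace(microsecond=0)
            end = start + timedelta(seconds=1)
        return start, end

    def definitely_before(self, other):
        """True only if every possible value of self precedes every possible value of other."""
        return self.bounds()[1] <= other.bounds()[0]
```

Under these assumptions, a birth known only to the year 1985 cannot be said to precede a visit on 1985-12-31, because the birth may have happened later that same day or year; it does definitely precede a visit on 1986-01-01. Comparing intervals rather than point values is one way to keep imprecise dates from producing falsely exact orderings.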

Publications on SPEHR and our generalized definition for a timestamp:


Updated 2021-02-02